srimoyee1212/Document-Summary-and-QA

📚 Document Summarizer and Q&A System

This project implements a document summarizer and question-answering (QA) system based on Retrieval-Augmented Generation (RAG). It uses FAISS for vector search over document chunks and OpenAI GPT models for summarization and Q&A.

Watch it summarize the paper "Attention Is All You Need" ⬇️

[Demo GIF]

🚀 Features

  • Upload and process documents: Supports PDF and EPUB formats.
  • Text splitting and embedding: Uses OpenAI embeddings and FAISS for efficient text retrieval.
  • Clustering for summary: Clusters embeddings to summarize the most representative parts of the document.
  • Summarization: Generates cohesive summaries using GPT-3.5 and GPT-4.
  • Interactive Q&A: Ask questions based on the uploaded document and get answers powered by GPT models.
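The text-splitting feature can be pictured with a minimal sketch. This is not the app's code: it is a simplified fixed-window splitter, whereas LangChain's RecursiveCharacterTextSplitter also tries to break on paragraph and sentence boundaries. The function name `split_into_chunks` and the default `chunk_size` and `overlap` values are assumptions chosen for illustration.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only. Overlap keeps a sentence
    that straddles a boundary visible in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Larger chunks give the model more context per call but make retrieval coarser, so the values are usually tuned per document type.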

πŸ› οΈ Built With

  • LangChain: For managing document loaders, embeddings, chains, and summarization.
  • OpenAI API: For using GPT-3.5 and GPT-4 models in summarization and Q&A.
  • FAISS: For fast vector searches.
  • KMeans Clustering: For grouping similar document embeddings.
  • Streamlit: For building the interactive web application.
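To make the role of vector search concrete, here is a dependency-free sketch of what a vector store does at query time: rank stored embeddings by similarity to a query embedding and return the top matches. It is a brute-force stand-in, not FAISS itself. The cosine metric, the function names, and the toy 2-D vectors are assumptions; the app's FAISS index may use a different distance measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the indices of the k stored vectors most similar to the query."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine_similarity(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy "embeddings"; real ones have hundreds or thousands of dimensions.
    stored = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print(top_k([1.0, 0.0], stored, k=2))
```

A library such as FAISS performs the same ranking far more efficiently than this linear scan, which is why it suits long documents.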

💻 How It Works

  1. Document Upload: The app allows you to upload PDF or EPUB files.
  2. Text Processing:
    • The document is loaded using PyPDFLoader or UnstructuredEPubLoader.
    • Text is split into manageable chunks using RecursiveCharacterTextSplitter.
  3. Embeddings and FAISS Indexing:
    • Chunks are embedded using OpenAI’s embeddings API.
    • FAISS is used to create a vector store for efficient retrieval.
  4. Clustering:
    • Embeddings are clustered using KMeans to identify key sections.
  5. Summarization:
    • Selected document chunks are summarized using GPT-3.5-turbo.
    • A final, cohesive summary is generated by GPT-4.
  6. Q&A:
    • Users can ask questions about the document, and relevant chunks are retrieved using FAISS.
    • GPT-3.5 answers the questions based on the retrieved content.
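The clustering step (step 4) can be sketched as follows. This is a small pure-Python k-means written for illustration, not the app's implementation, which presumably relies on a library KMeans. Choosing the chunk closest to each centroid as the "representative" is an assumption about how key sections are selected; the function names and the toy 2-D embeddings are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns the k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def representative_indices(points, k, seed=0):
    """Index of the point closest to each centroid, sorted in document order."""
    centroids = kmeans(points, k, seed=seed)
    chosen = {
        min(range(len(points)), key=lambda i: squared_distance(points[i], c))
        for c in centroids
    }
    return sorted(chosen)


if __name__ == "__main__":
    # Two well-separated groups of toy chunk embeddings.
    embeddings = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]]
    print(representative_indices(embeddings, k=2))
```

Sorting the selected indices keeps the chosen chunks in their original document order, which helps the summarization step produce a summary that follows the flow of the source.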

πŸ—οΈ Installation

  1. Clone the repository:

    git clone https://github.com/srimoyee1212/Document-Summary-and-QA.git
    cd Document-Summary-and-QA
  2. Create a virtual environment and activate it:

    python -m venv venv
    source venv/bin/activate  # On Windows use: venv\Scripts\activate
  3. Install the required dependencies:

    pip install -r requirements.txt
  4. Set your OpenAI API key, either by adding it directly to the code or by using an environment variable.
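For the environment-variable route in step 4, a minimal sketch of reading the key at startup might look like this. `OPENAI_API_KEY` is the variable name conventionally read by the OpenAI Python client; whether `app.py` reads it this way is an assumption, and the helper `get_openai_api_key` is hypothetical.

```python
import os
from typing import Mapping


def get_openai_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the environment, failing early if it is missing.

    Hypothetical helper: the `env` parameter exists so the lookup can be
    exercised without touching the real environment.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting the app."
        )
    return key


if __name__ == "__main__":
    try:
        get_openai_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(err)
```

Keeping the key in the environment rather than in the source avoids committing it to the repository by accident.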

🔑 Usage

  1. Run the Streamlit app:

    streamlit run app.py
  2. Open the app in your browser at http://localhost:8501.

  3. Upload a document (PDF or EPUB).

  4. Generate a summary and/or ask questions about the document.

About

Use AI to summarize long books and research papers!
