# 🔍 **LLM-Powered PDF Question Answering System**

This project demonstrates how to build a Question Answering (QA) system over PDF documents using Large Language Models (LLMs).  
Users can upload a PDF, ask questions about its content, and receive accurate, context-based answers.

**Key Technologies**: LangChain, FAISS, HuggingFace, Gradio, PyTorch



## **Making Preparations**

***Make a utils directory***

In [1]:
!mkdir -p utils

***Download `config.json` and `custom_logger.py` from this [GitHub repo](https://github.com/iam-vsr/llm-pdf-qa), then upload them with the cell below***

In [2]:
from google.colab import files
uploaded = files.upload()

Saving config.json to config.json
Saving custom_logger.py to custom_logger.py
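
If uploading is inconvenient, `config.json` can also be written from code. The sketch below is only an illustration: the keys and values are copied from the config this notebook logs after loading the file, while `write_config`, `load_config`, and the required-key check are hypothetical helpers that are not part of the repository.

```python
import json

# Keys and values copied from the config this notebook logs after loading config.json.
DEFAULT_CONFIG = {
    "embedding_model_name": "thenlper/gte-base",
    "model_id": "meta-llama/Llama-2-7b-chat-hf",
    "chunk_size": 500,
    "chunk_overlap": 50,
    "max_length": 2000,
    "temperature": 0.05,
    "load_int8": True,
}


def write_config(path, config=DEFAULT_CONFIG):
    """Write `config` as JSON to `path` (hypothetical helper)."""
    with open(path, "w") as f:
        json.dump(config, f, indent=2)


def load_config(path, required_keys=tuple(DEFAULT_CONFIG)):
    """Load a JSON config and fail early if an expected key is missing."""
    with open(path) as f:
        config = json.load(f)
    missing = [key for key in required_keys if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {missing}")
    return config
```

Calling `write_config("config.json")` once would produce a file the later cells can read; checking keys up front makes a typo in the file fail immediately instead of deep inside model loading.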


***Code to move custom_logger.py to utils folder created earlier***

In [3]:
import shutil
shutil.move('custom_logger.py', 'utils/custom_logger.py')

'utils/custom_logger.py'

***Install Required Packages***

In [4]:
!pip install gradio langchain accelerate sentence_transformers pypdf tiktoken bitsandbytes


In [5]:
!pip install faiss-cpu

Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.8 kB)
Downloading faiss_cpu-1.11.0-cp311-cp311-manylinux_2_28_x86_64.whl (31.3 MB)
Installing collected packages: faiss-cpu
Successfully installed faiss-cpu-1.11.0


In [6]:
import faiss
print(f"FAISS version: {faiss.__version__}")

FAISS version: 1.11.0


In [7]:
!pip install -U langchain-community


***Import Libraries & Load Config***

In [8]:
# Standard library
import json
import os
import pickle

# Third-party
import gradio as gr
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# LangChain: integrations now live in langchain_community (installed above);
# the old `langchain.*` paths for these classes are deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import HuggingFacePipeline
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

# Project logger from the file uploaded earlier
from utils.custom_logger import logger


## **Document Processing & Embedding**

This section includes:
- Loading PDF pages
- Splitting documents into chunks
- Creating and storing document embeddings
- Loading embeddings if already available
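
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of overlapping windows over a string. It is an assumption-laden illustration, not what `RecursiveCharacterTextSplitter` does internally: the real splitter also tries to break on separators such as paragraphs and sentences, while `split_with_overlap` (a hypothetical helper) cuts at fixed character positions.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut `text` into fixed-size windows; consecutive windows share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
# abcdefghij
# hijklmnopq
# opqrstuvwx
# vwxyz
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.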


In [9]:
class DataLoadPDF:

    """
    A class for loading data from a PDF file.
    """

    def __init__(self, file_path):

        """
        Initialize the DataLoadPDF instance.
        Args:
            file_path (str): Path to the PDF file to load.
        """

        self.file_path = file_path

    def load_data(self):

        """
        Load data from the PDF file.
        Returns:
            list: List of pages from the PDF.
        """

        logger.info(f"Reading file {os.path.basename(self.file_path)} ... ")
        loader = PyPDFLoader(self.file_path)
        pages = loader.load()

        return pages

In [10]:
class DataSplitter:

    """
    A class for splitting data into chunks.
    """

    def __init__(self, chunk_size, chunk_overlap):

        """
        Initialize the DataSplitter instance.
        Args:
            chunk_size (int): Size of each chunk.
            chunk_overlap (int): Overlap between consecutive chunks.
        """

        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def split_data(self, pages):

        """
        Split data into chunks.

        Args:
            pages (list): List of data pages.
        Returns:
            list: List of split documents.
        """

        logger.info(f"Document splitting with chunk_size {self.chunk_size} and chunk_overlap {self.chunk_overlap} ... ")

        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=self.chunk_size,
            chunk_overlap=self.chunk_overlap,
            separators=["\n\n", "\n", ".", " ", ""]
            )

        docs = text_splitter.split_documents(pages)
        return docs

In [11]:
class EmbeddingManager:

    """
    A class for managing document embeddings.
    """

    def __init__(self, model_name):

        """
        Initialize the EmbeddingManager instance.

        Args:
            model_name (str): Name of the embedding model.
        """

        self.model_name = model_name
        logger.info(f"Loading embeddings Model {self.model_name} ... ")
        self.embeddings = HuggingFaceEmbeddings(model_name=self.model_name)

    def create_embeddings(self, docs):

        """
        Create embeddings for documents.

        Args:
            docs (list): List of documents.
        Returns:
            FAISS: Document embeddings.
        """

        logger.info(f"Creating document embeddings for {len(docs)} chunks ... ")
        self.doc_embedding = FAISS.from_documents(docs, self.embeddings)
        return self.doc_embedding

    def save_embedding(self, file_name):

        """
        Save document embeddings to a file.

        Args:
            file_name (str): Name of the file to save the embeddings.
        """

        embedding_dir = "embeddings_data"
        os.makedirs(embedding_dir, exist_ok=True)

        # The cache key is the PDF's base name, so two different PDFs with the same name share one file.
        file_name = os.path.basename(file_name)
        save_path = os.path.join(embedding_dir, file_name + ".pkl")
        logger.info(f"Saving document embeddings: {save_path} ... ")

        with open(save_path, "wb") as f:
            pickle.dump(self.doc_embedding, f)

    def load_embedding(self, file_name):

        """
        Load document embeddings from a file.

        Args:
            file_name (str): Name of the file to load the embeddings.
        Returns:
            FAISS: Loaded document embeddings.
        """

        file_name = os.path.basename(file_name)
        logger.info(f"Loading document embeddings locally: {'embeddings_data/'+file_name} ... ")

        with open("embeddings_data/"+file_name+".pkl", "rb") as f:
            self.doc_embedding = pickle.load(f)

        return self.doc_embedding

    def check_embedding_available(self, file_name):

        """
        Check if document embeddings are available in a file.
        Args:
            file_name (str): Name of the file to check.
        Returns:
            bool: True if document embeddings are available, False otherwise.
        """

        file_name = os.path.basename(file_name)
        doc_check = os.path.isfile("embeddings_data/"+file_name+".pkl")
        logger.info(f"Is document embedding found: {doc_check}")

        return doc_check
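
The check, load, and save methods above together form a load-or-build cache. A minimal sketch of that pattern with plain `pickle` follows; `load_or_build` and its boolean return flag are hypothetical and only illustrate the control flow. Two caveats apply to the class above as well: unpickling runs arbitrary code, so only load files you created yourself, and a cache keyed on the file name alone will not notice when the PDF's content or the chunk settings change.

```python
import os
import pickle


def load_or_build(cache_path, build_fn):
    """Return (object, was_cached): load from `cache_path` if present, else build and save."""
    if os.path.isfile(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f), True  # only safe for files you wrote yourself

    obj = build_fn()  # the expensive step, e.g. embedding every chunk
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as f:
        pickle.dump(obj, f)
    return obj, False
```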

In [12]:
class DocumentProcessor:

    """
    A class for processing documents and managing embeddings.
    """
    def __init__(self, model_name, chunk_size, chunk_overlap):

        """
        Initialize the DocumentProcessor instance.

        Args:
            model_name (str): Name of the embedding model.
            chunk_size (int): Size of each chunk.
            chunk_overlap (int): Overlap between consecutive chunks.
        """

        logger.info(f"Initializing document processor parameters - embedding model_name: {model_name}, chunk_size: {chunk_size}, chunk_overlap: {chunk_overlap} ... ")

        self.model_name = model_name
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap
        self.embedding_manager = EmbeddingManager(model_name)

    def process_document(self, file_path):

        """
        Process a document and manage embeddings.
        Args:
            file_path (str): Path to the document file.
        Returns:
            FAISS: Document embeddings.
        """

        if self.embedding_manager.check_embedding_available(file_path):
            return self.embedding_manager.load_embedding(file_path)

        else:
            data_loader = DataLoadPDF(file_path)
            pages = data_loader.load_data()
            data_splitter = DataSplitter(self.chunk_size, self.chunk_overlap)
            docs = data_splitter.split_data(pages)
            doc_embedding = self.embedding_manager.create_embeddings(docs)

            self.embedding_manager.save_embedding(file_path)

            return doc_embedding

## **Loading the LLM Model**

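This notebook uses `meta-llama/Llama-2-7b-chat-hf` from Hugging Face, loaded through a Transformers `pipeline`.  
The model is gated, so the Hugging Face token must belong to an account that has been granted access. Optional int8 loading roughly halves the memory needed for the weights compared with bfloat16.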
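
A back-of-the-envelope estimate shows why int8 matters on a Colab GPU. The parameter count below is an approximation for Llama-2-7B, and the figures cover weights only: activations, the KV cache, and any layers that the quantization library keeps in higher precision come on top.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Memory for the model weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 6.74e9  # approximate parameter count of Llama-2-7B (assumption)

bf16_gb = weight_memory_gb(N_PARAMS, 2)  # bfloat16 stores 2 bytes per parameter
int8_gb = weight_memory_gb(N_PARAMS, 1)  # int8 stores 1 byte per parameter

print(f"bfloat16: ~{bf16_gb:.1f} GB, int8: ~{int8_gb:.1f} GB")
```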


In [13]:
class ModelLoader:
    """
    A class responsible for loading the language model.
    """
    def __init__(self, model_id, max_length, temperature, load_int8):
        """
        Initialize the ModelLoader instance.
        Args:
            model_id (str): Identifier of the pretrained model.
            max_length (int): Maximum length of generated text.
            temperature (float): Temperature parameter for text generation.
            load_int8 (bool): If True, load the model with 8-bit quantized weights.
        """
        self.model_id = model_id
        self.max_length = max_length
        self.temperature = temperature
        self.load_int8 = load_int8

    def load_model(self):
        """
        Load the language model using the specified model_id, max_length, and temperature.

        Returns:
            HuggingFacePipeline: Loaded language model.
        """
        logger.info(f"Loading LLM model {self.model_id} with max_length {self.max_length} and temperature {self.temperature}...\n")
        tokenizer = AutoTokenizer.from_pretrained(self.model_id)
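        if self.load_int8:
            # Passing `load_in_8bit=True` directly is deprecated in Transformers;
            # the 8-bit setting goes through a BitsAndBytesConfig instead.
            from transformers import BitsAndBytesConfig
            model = AutoModelForCausalLM.from_pretrained(
                self.model_id,
                quantization_config=BitsAndBytesConfig(load_in_8bit=True),
                device_map="auto",
            )
        else:
            model = AutoModelForCausalLM.from_pretrained(self.model_id, torch_dtype=torch.bfloat16, device_map="auto")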

        logger.info("Model is loaded successfully\n")
        pipe = pipeline(
            "text-generation", model=model, tokenizer=tokenizer, max_length=self.max_length, temperature=self.temperature
        )
        llm = HuggingFacePipeline(pipeline=pipe)
        return llm

## **Setting up the Retrieval-Based QA System**

- Uses LangChain’s `RetrievalQA` chain
- Combines context retrieval and answer generation
- Prompt template is customizable


In [14]:
class QASystem:

    """
    A class representing a Question Answering (QA) system.
    """
    def __init__(self, llm):

        """
        Initialize the QASystem instance.
        Args:
            llm (HuggingFacePipeline): Loaded language model for text generation.
        """

        self.llm = llm

        self.prompt_template = """You are a helpful and concise assistant. Answer the question as best as you can.
        If you cannot find an answer, say "I don't know".
        Context:{context}
        Question: {question}
        Answer (based on the context above):"""

        PROMPT = PromptTemplate(
            template=self.prompt_template, input_variables=["context", "question"]
        )

        self.chain_type_kwargs = {
            "prompt": PROMPT,
        }

    def setup_retrieval_qa(self, doc_embedding):

        """
        Set up the retrieval-based QA system.
        Args:
            doc_embedding: Document embedding for retrieval.
        Returns:
            RetrievalQA: Configured retrieval-based QA system.
        """

        logger.info("Setting up retrieval QA system...\n")

        qa = RetrievalQA.from_chain_type(

            llm=self.llm,
            chain_type="stuff",  # "stuff" places all retrieved chunks into a single prompt
            retriever=doc_embedding.as_retriever(
                search_type="similarity_score_threshold",
                search_kwargs={"score_threshold": 0.5, "k": 6}
                ),
            chain_type_kwargs=self.chain_type_kwargs,
            )

        return qa
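
The retriever settings above (`score_threshold` of 0.5, `k` of 6) amount to "take the best-scoring chunks, at most six, and drop any that score below the threshold". A toy sketch of that filtering is shown here. It uses cosine similarity as a stand-in score; the vector store derives its own relevance score from the index's distance metric, so the numbers will differ, and `retrieve` is a hypothetical function.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunk_vecs, k=6, score_threshold=0.5):
    """Return (chunk_index, score) for the top-k chunks whose score clears the threshold."""
    scored = sorted(
        ((cosine(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)),
        reverse=True,
    )
    return [(idx, score) for score, idx in scored[:k] if score >= score_threshold]


# Toy 2-D "embeddings": only the first two chunks point in a direction similar to the query.
hits = retrieve([1.0, 0.0], [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]])
print([idx for idx, _ in hits])
```

With a threshold, a question unrelated to the PDF can retrieve nothing at all, in which case the model sees an empty context; the prompt's "I don't know" instruction is what handles that case.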

## **Adding Hugging Face token**

In [24]:
from huggingface_hub import login
from getpass import getpass

hf_token = getpass("Enter your Hugging Face token:")
login(token=hf_token)

Enter your Hugging Face token:··········
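
To avoid typing the token on every run, it can be read from the `HF_TOKEN` environment variable first, with the prompt as a fallback. This is a small sketch under that assumption; `resolve_hf_token` is a hypothetical helper, and its `env` and `prompt` parameters exist only so the logic can be exercised without a real terminal.

```python
import os
from getpass import getpass


def resolve_hf_token(env=os.environ, prompt=getpass):
    """Prefer the HF_TOKEN environment variable; otherwise ask interactively."""
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token
    return prompt("Enter your Hugging Face token:").strip()
```

The result would then be passed to `login(token=...)` exactly as in the cell above.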


## **Loading model and processing parameters from config.json for flexible and centralized configuration management**

In [16]:
with open('config.json', 'r') as config_file:
    config = json.load(config_file)

logger.info(f"Loaded config file: {config}")

2025-07-09 06:08:45,102 - INFO - ipython-input-16-1378223404.py:4 - Loaded config file: {'embedding_model_name': 'thenlper/gte-base', 'model_id': 'meta-llama/Llama-2-7b-chat-hf', 'chunk_size': 500, 'chunk_overlap': 50, 'max_length': 2000, 'temperature': 0.05, 'load_int8': True}


## **Initialize the embedding-based document processor and load the LLM based on parameters from the config.**

In [17]:
# Loading embedding model
document_processor = DocumentProcessor(model_name=config["embedding_model_name"], chunk_size=config["chunk_size"], chunk_overlap=config["chunk_overlap"])

# Load the LLM once so every request reuses it
model_loader = ModelLoader(config["model_id"], config["max_length"], config["temperature"], config["load_int8"])

llm = model_loader.load_model()

qa_system = QASystem(llm)

2025-07-09 06:08:45,111 - INFO - ipython-input-12-1836899682.py:17 - Initializing document processor parameters - embedding model_name: thenlper/gte-base, chunk_size: 500, chunk_overlap: 50 ... 
2025-07-09 06:08:45,114 - INFO - ipython-input-11-3627787106.py:17 - Loading embeddings Model thenlper/gte-base ... 
2025-07-09 06:09:05,099 - INFO - ipython-input-13-908555565.py:25 - Loading LLM model meta-llama/Llama-2-7b-chat-hf with max_length 2000 and temperature 0.05...

2025-07-09 06:21:27,309 - INFO - ipython-input-13-908555565.py:32 - Model is loaded successfully

Device set to use cuda:0


In [18]:
# Initialize global variable for doc_embedding
doc_embedding = None
pdf_file_name = None
qa = None

## **Defining Chatbot Logic**

This function:
- Handles file change detection
- Loads or creates document embeddings
- Performs retrieval + LLM-based answering


In [19]:
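def remove_duplicate_lines(text):
    """Drop blank and repeated lines, keeping the first occurrence of each in order."""
    seen = set()
    unique_lines = []
    for line in text.split("\n"):
        line = line.strip()
        if line and line not in seen:
            seen.add(line)
            unique_lines.append(line)
    return "\n".join(unique_lines)


def chatbot(pdf_file, query):
    """Answer `query` from the uploaded PDF, rebuilding the index only when the file changes."""
    global doc_embedding
    global pdf_file_name
    global qa

    # Gradio passes None when no file has been uploaded yet.
    if pdf_file is None:
        return "Please upload a PDF first."

    if pdf_file_name is None or pdf_file_name != pdf_file.name or doc_embedding is None:
        logger.info("New PDF Found Resetting doc_embedding")
        doc_embedding = None
        pdf_file_name = pdf_file.name

    if doc_embedding is None:
        logger.info("Starting for new doc_embedding")
        doc_embedding = document_processor.process_document(pdf_file.name)
        qa = qa_system.setup_retrieval_qa(doc_embedding)

    result = qa.invoke({"query": query})

    # The model sometimes repeats lines; collapse them before returning the answer.
    return remove_duplicate_lines(result["result"])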

## **Building the User Interface with Gradio**

The Gradio app allows:
- PDF Upload
- User Query Input
- Real-Time Answers from LLM

The app launches with `share=True`, which creates a temporary public URL for the demo.


***I'll be using a Competitive Programming Handbook PDF as input; you can find it in my [GitHub repo](https://github.com/iam-vsr/llm-pdf-qa)***

In [25]:
with gr.Blocks(theme=gr.themes.Default(primary_hue="red", secondary_hue="pink")) as demo:
    gr.Markdown("# Ask your Question to PDF Document")

    with gr.Row():
        with gr.Column(scale=4):
            pdf_file = gr.File(label="Upload your PDF")

    output = gr.Textbox(label="output", lines=3)
    query = gr.Textbox(label="query")
    btn = gr.Button("Submit")
    btn.click(fn=chatbot, inputs=[pdf_file, query], outputs=[output])

gr.close_all()
demo.launch(share=True, debug=True)

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://ae8b2f36668125cef9.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


2025-07-09 06:55:45,839 - INFO - ipython-input-19-1722763772.py:8 - New PDF Found Resetting doc_embedding
2025-07-09 06:55:45,842 - INFO - ipython-input-19-1722763772.py:13 - Starting for new doc_embedding
2025-07-09 06:55:45,845 - INFO - ipython-input-11-3627787106.py:86 - Is document embedding found: False
2025-07-09 06:55:45,848 - INFO - ipython-input-9-1779726145.py:25 - Reading file cp_handbook.pdf ... 

Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://eddfa5c600d2d031b2.gradio.live
Killing tunnel 127.0.0.1:7861 <> https://7916c5d6be712f3816.gradio.live
Killing tunnel 127.0.0.1:7862 <> https://0234df2ee80957d4f2.gradio.live
Killing tunnel 127.0.0.1:7863 <> https://c8accc7a09c9fd27fc.gradio.live
Killing tunnel 127.0.0.1:7864 <> https://ae8b2f36668125cef9.gradio.live




In [26]:
!jupyter nbconvert --to notebook --ClearOutputPreprocessor.enabled=True --inplace your_notebook.ipynb
