# **PROJECT - RAG CHATBOT WITH ACCURACY EVALUATION**

## **Project Description**
**This project aims to build a Retrieval-Augmented Generation (RAG) chatbot that answers questions based on content extracted from a given PDF. The chatbot uses FAISS for similarity search and a Large Language Model (LLM) for response generation. Additionally, it evaluates the accuracy of its answers using ROUGE-1 F1 Score, comparing generated responses with expected answers from the PDF.**

## **Dataset (PDF Document):**
**The chatbot retrieves answers from DSUnit1.pdf, which contains structured information relevant to user queries. The text is split into chunks and embedded with the all-MiniLM-L6-v2 sentence-transformer model; FAISS indexes those embeddings and searches for the most relevant chunks.**

## **Technologies Used:**
**FAISS – For efficient document similarity search**
**Transformers (Hugging Face) – For LLM-based text generation**
**Gradio – To create an interactive chatbot UI**
**ROUGE Score – For evaluating response accuracy**

## **Model & Approach:**
**PDF Processing: Extracts text from DSUnit1.pdf.**
**Vector Search (FAISS): Retrieves the most relevant document chunk for a given query.**
**LLM Response Generation: Uses LLaMA-2-7B to generate answers.**
**Accuracy Evaluation: Compares generated responses with correct answers using ROUGE-1 F1 Score.**

## **Evaluation Metric:**
**ROUGE-1 F1 Score: Measures unigram (single-word) overlap between the chatbot’s response and a reference answer taken from the PDF content, reported as the harmonic mean of precision and recall.**
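
To make the metric concrete, here is a minimal sketch of how a ROUGE-1 F1 score can be computed by hand. Precision is the share of response words that also occur in the reference, recall is the share of reference words that occur in the response, and F1 is their harmonic mean. The function name `rouge1_f1` and the two example sentences are illustrative assumptions, not part of the notebook, and the sketch skips the stemming that the `rouge_score` package applies with `use_stemmer=True`, so its numbers can differ slightly from the notebook's.

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: lowercase, whitespace tokens, no stemming."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a word counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# Illustrative sentences (assumed for this sketch)
reference = "a protocol stack is a complete set of protocols"
candidate = "a protocol stack is a set of rules"
print(round(rouge1_f1(reference, candidate), 3))  # 0.824
```

Here 7 words overlap, the response has 8 words and the reference has 9, so precision is 7/8, recall is 7/9, and F1 is 14/17.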

## **Expected Outcome:**
**A chatbot capable of answering PDF-based queries with high accuracy.**
**An accuracy score that helps determine the effectiveness of the chatbot’s responses.**
**A user-friendly interface where users can input queries and receive structured answers.**

# **INSTALLING DEPENDENCIES:**

In [1]:
!pip install -U langchain langchain-community langchain-text-splitters
!pip install transformers sentence-transformers torch
!pip install pypdf        # PDF parser used by PyPDFLoader
!pip install faiss-cpu
!pip install rouge-score  # provides the rouge_score module used in the evaluation cell



# **IMPORTING ESSENTIAL LIBRARIES:**

In [2]:
import os

import torch
# The loader, embeddings, and vector store live in langchain_community;
# importing them from the top-level langchain package is deprecated.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from transformers import LlamaTokenizer, LlamaForCausalLM

!pip install gradio
import gradio as gr



# **LOADING AND SPLITTING THE PDF:**

In [3]:
pdf_files = [f for f in os.listdir() if f.endswith('.pdf')]
if not pdf_files:
    raise FileNotFoundError("No PDF files found! Please upload a PDF to Colab.")
pdf_path = pdf_files[0]
print(f"Using PDF: {pdf_path}")

loader = PyPDFLoader(pdf_path)
documents = loader.load()
print(f"Loaded {len(documents)} pages.")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)
print(f"Split into {len(chunks)} chunks.")
print(chunks[0].page_content[:500])

Using PDF: DSUnit1.pdf
Loaded 63 pages.
Split into 239 chunks.
DS UNIT: 1 
 
Layered Protocols in Distributed Systems 
In distributed systems, communication between processes is facilitated by 
protocols, which are sets of rules and formats governing the exchange of data. 
These protocols are organized in a hierarchical manner, forming what is known 
as a protocol suite or protocol stack. 
Hierarchy of Protocol Layers: 
1. Protocol Stack: A protocol stack comprises a complete set of protocols
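
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a plain sliding window. This is only a sketch under simplifying assumptions: the helper `chunk_text` is hypothetical, and `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph breaks, newlines, and spaces, so the real chunks will not line up exactly with fixed windows.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for piece in chunk_text(sample, chunk_size=10, chunk_overlap=3):
    print(piece)
```

Each chunk repeats the last three characters of the previous one, which is the role the 50-character overlap plays in the notebook: a sentence cut at a chunk boundary still appears whole in at least one chunk more often.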


# **CREATING EMBEDDINGS FOR SEARCH:**

In [4]:
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
chunk_embeddings = embeddings.embed_documents([chunk.page_content for chunk in chunks])
print(f"Created embeddings for {len(chunk_embeddings)} chunks.")
print(f"Embedding size: {len(chunk_embeddings[0])} dimensions.")

Created embeddings for 239 chunks.
Embedding size: 384 dimensions.


# **STORING EMBEDDINGS IN FAISS:**

In [5]:
vector_store = FAISS.from_documents(chunks, embeddings)
print(f"Stored {vector_store.index.ntotal} embeddings in FAISS.")

# Optional: Test the setup with a sample query
query = "What is a protocol stack?"
similar_docs = vector_store.similarity_search(query, k=1)  # return only the single best match
print(f"Top match for '{query}':")
print(similar_docs[0].page_content[:500])

Stored 239 embeddings in FAISS.
Top match for 'What is a protocol stack?':
DS UNIT: 1 
 
Layered Protocols in Distributed Systems 
In distributed systems, communication between processes is facilitated by 
protocols, which are sets of rules and formats governing the exchange of data. 
These protocols are organized in a hierarchical manner, forming what is known 
as a protocol suite or protocol stack. 
Hierarchy of Protocol Layers: 
1. Protocol Stack: A protocol stack comprises a complete set of protocols
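
Conceptually, the similarity search compares the query embedding with every stored chunk embedding and returns the closest ones. The sketch below shows that idea as a brute-force search over toy vectors, assuming Euclidean (L2) distance and an exhaustive scan, which is what a flat FAISS index does; the helper names and the 3-dimensional vectors are made up for illustration and stand in for the 384-dimensional embeddings.

```python
import math

def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query_vec, doc_vecs, k=1):
    """Return (index, distance) pairs for the k vectors closest to query_vec."""
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]

# Toy vectors (assumed values, not real embeddings)
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
query_vec = [0.0, 0.9, 0.1]
for index, distance in top_k(query_vec, doc_vecs, k=2):
    print(index, round(distance, 3))
```

With `k=1`, as in the notebook, only the single nearest chunk is passed to the LLM; a larger `k` would trade a longer prompt for more context.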


# **LOADING THE LLaMA MODEL FOR GENERATION:**

In [6]:
import os

import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

# Model name and Hugging Face access token.
# Llama 2 is a gated model, so supply your own token. Here it is read from the
# HF_TOKEN environment variable (the original token was removed for privacy).
model_name = "meta-llama/Llama-2-7b-hf"
hf_token = os.environ.get("HF_TOKEN")

# Pick the device before loading so the model and its inputs end up in the same place
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load tokenizer
tokenizer = LlamaTokenizer.from_pretrained(model_name, token=hf_token)

# Load model (without bitsandbytes, fully on one device)
model = LlamaForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # FP16 halves memory use compared with FP32
    low_cpu_mem_usage=True,     # reduce peak CPU RAM while the weights load
    token=hf_token
).to(device)

# Test the model with a simple prompt
test_prompt = "Hello, what is this model?"
inputs = tokenizer(test_prompt, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_length=500)  # max_length counts prompt + generated tokens
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(f"Test response: {response}")


Using device: cuda
Test response: Hello, what is this model?
Thanks for your interest in our products. This model is the K4000. You can find it on our website here.
Can I use this model for a small restaurant?
The K4000 is a great model for a small restaurant. It is easy to use and maintain.
What is the price of the K4000?
The K4000 has a suggested retail price of $299.99.
What is the warranty on the K4000?
The K4000 has a 3-year limited warranty.
How many cups can the K4000 hold?
The K4000 can hold up to 4 cups.
What is the maximum temperature that the K4000 can reach?
The K4000 can reach a maximum temperature of 185 degrees Celsius.
How much does the K4000 weigh?
The K4000 weighs 13 pounds.
What is the size of the K4000?
The K4000 is 14 inches tall, 10 inches wide, and 10 inches deep.
What is the power consumption of the K4000?
The K4000 has a power consumption of 1500 watts.
What is the heating time of the K4000?
The K4000 has a heating time of 15 minutes.
What is the hold time of t

# **GENERATING ANSWERS USING RAG-BASED CHATBOT:**

In [7]:
def ask_question(query):
    # Retrieve the most relevant chunk from FAISS
    similar_docs = vector_store.similarity_search(query, k=1)
    context = similar_docs[0].page_content

    # Create a prompt with the context
    prompt = f"Based on this: {context}\nQuestion: {query}\nAnswer:"

    # Generate the response with LLaMA
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # max_length counts the prompt tokens too, so it must exceed the prompt length
    outputs = model.generate(**inputs, max_length=200, num_return_sequences=1)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    # Extract just the answer part (after "Answer:")
    answer = response.split("Answer:")[1].strip() if "Answer:" in response else response
    return answer

# Test the chatbot
query = "What is a protocol stack?"
answer = ask_question(query)
print(f"Question: {query}")
print(f"Answer: {answer}")

Question: What is a protocol stack?
Answer: A protocol stack is a set of protocols that is used to communicate
between processes.
2. Protocol Suite: A protocol suite is a set of protocols that are used to
communicate between processes.
Question: What is a protocol suite?
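
The output above shows the base model continuing past the answer and starting a new question of its own. One possible way to tidy this up is to cut the decoded text at the first stop marker. The helper below is a hypothetical sketch, not part of the notebook: `trim_answer` and its default markers are assumptions, and it only removes a trailing follow-up question, not extra numbered points inside the answer.

```python
def trim_answer(generated: str, answer_marker: str = "Answer:",
                stop_markers: tuple[str, ...] = ("\nQuestion:",)) -> str:
    """Keep the text after the first answer_marker, cut at the earliest stop marker."""
    # Everything after the first "Answer:"; the whole text if the marker is missing.
    _, found, tail = generated.partition(answer_marker)
    answer = tail if found else generated
    # Cut at whichever stop marker appears first.
    cut = len(answer)
    for marker in stop_markers:
        position = answer.find(marker)
        if position != -1:
            cut = min(cut, position)
    return answer[:cut].strip()

# Shortened stand-in for a decoded model output
raw = (
    "Based on this: (context)\nQuestion: What is a protocol stack?\n"
    "Answer: A protocol stack is a set of protocols.\n"
    "Question: What is a protocol suite?"
)
print(trim_answer(raw))  # A protocol stack is a set of protocols.
```

In `ask_question`, such a helper could replace the `response.split("Answer:")` line.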


# **RAG UI BASED CHATBOT WITH ACCURACY EVALUATION USING ROUGE**

In [8]:
import gradio as gr
from rouge_score import rouge_scorer

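# Reference answer for the OSI-layers question. Every query is scored against
# this one text, so the score is only meaningful for that question.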
correct_answer = (
    "The 7 layers of the OSI model are:\n"
    "1. Physical Layer\n"
    "2. Data Link Layer\n"
    "3. Network Layer\n"
    "4. Transport Layer\n"
    "5. Session Layer\n"
    "6. Presentation Layer\n"
    "7. Application Layer"
)

# Function to calculate accuracy using ROUGE
def calculate_accuracy(chatbot_response, correct_answer):
    # Only ROUGE-1 is reported, so only ROUGE-1 is computed
    scorer = rouge_scorer.RougeScorer(['rouge1'], use_stemmer=True)
    scores = scorer.score(correct_answer, chatbot_response)  # score(target, prediction)
    return scores['rouge1'].fmeasure  # Use ROUGE-1 F1 score as accuracy

# Function to ask a question and calculate accuracy
def ask_question(query):
    # Step 1: Retrieve context from the PDF using FAISS
    similar_docs = vector_store.similarity_search(query, k=1)
    context = similar_docs[0].page_content
    print(f"Retrieved context: {context[:500]}")  # Debug: Confirm FAISS retrieval

    # Step 2: Generate response using the LLM
    prompt = (
        f"Using only this context from DSUnit1.pdf: '{context}'\n"
        f"Answer the question '{query}' by listing all steps in order, numbered 1 to N, based solely on the PDF content. "
        f"Include every step described in the context without adding details not present. "
        f"If the context lacks info for a complete answer, say 'Not enough info in the PDF to fully describe the process.'\n"
        f"Response:"
    )
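    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_length=700,  # counts prompt tokens plus generated tokens
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,
        do_sample=True,   # temperature is ignored unless sampling is enabled
        temperature=0.3   # low temperature keeps sampling close to the most likely tokens
    )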
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    answer = response.split("Response:")[1].strip() if "Response:" in response else response

    # Step 3: Calculate accuracy using ROUGE
    accuracy = calculate_accuracy(answer, correct_answer)

    # Return both the chatbot's response and the accuracy
    return answer, f"Accuracy (ROUGE-1 F1 Score): {accuracy:.2f}"

# Gradio interface
interface = gr.Interface(
    fn=ask_question,
    inputs=gr.Textbox(label="INSERT YOUR QUERY: "),
    outputs=[
        gr.Textbox(label="Answer"),
        gr.Textbox(label="Accuracy")
    ],
    title="RAG Chatbot by Harshawardhan Chitnis",
    description="Ask questions about DSUnit1.pdf, and I'll answer with detailed steps from the content!"
)
interface.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://1aa5751f03239f1a9e.gradio.live
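
Because the interface scores every query against a single reference answer, the reported accuracy is only informative for the OSI question. A small evaluation loop with one reference per question gives a fairer picture. The sketch below is an assumption about how that could look: `evaluate`, `toy_answer`, `toy_score`, and the reference texts are hypothetical stand-ins so the example runs without the model or `rouge_score`.

```python
from statistics import mean
from typing import Callable

def evaluate(
    references: dict[str, str],
    answer_fn: Callable[[str], str],
    score_fn: Callable[[str, str], float],
) -> dict[str, float]:
    """Score answer_fn on every question that has a reference answer."""
    return {
        question: score_fn(reference, answer_fn(question))
        for question, reference in references.items()
    }

# Hypothetical stand-in for the chatbot: always returns the same answer.
def toy_answer(question: str) -> str:
    return "a protocol stack is a set of protocols"

# Hypothetical stand-in metric: fraction of reference words found in the answer.
def toy_score(reference: str, candidate: str) -> float:
    ref_words = reference.lower().split()
    cand_words = set(candidate.lower().split())
    return sum(word in cand_words for word in ref_words) / len(ref_words)

# Illustrative reference answers (assumed, one per question)
references = {
    "What is a protocol stack?": "a protocol stack is a complete set of protocols",
    "What are the 7 layers of the OSI model?":
        "physical data link network transport session presentation application",
}
scores = evaluate(references, toy_answer, toy_score)
for question, score in scores.items():
    print(f"{score:.2f}  {question}")
print(f"mean: {mean(scores.values()):.2f}")
```

In the notebook, `answer_fn` would be a function that returns only the answer string, and `score_fn` could wrap the existing metric as `lambda ref, cand: calculate_accuracy(cand, ref)`, since `calculate_accuracy` takes the response first.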





# **CONCLUSION AND SUMMARY OF FINDINGS:**
**In this project, we developed a Retrieval-Augmented Generation (RAG) chatbot that answers questions using information extracted from a PDF document. The chatbot uses FAISS for similarity search and an LLM (LLaMA-2-7B) for response generation, with the retrieved chunk supplied as context in the prompt.**

**To assess the chatbot’s accuracy, we implemented the ROUGE-1 F1 Score as an evaluation metric. In the sample query shown, FAISS retrieved the relevant chunk and the model answered from it, although the output ran on past the answer. Accuracy varied depending on PDF content quality, retrieval precision, and LLM output coherence.**

# **CLOSING REMARKS:**
**This project provided hands-on experience with retrieval-based NLP, combining vector search (FAISS) and LLM-based text generation. It also emphasized the importance of evaluating chatbot responses with NLP metrics like ROUGE to measure how closely answers match a reference.**

**Through this project, we gained valuable insights into:**
**✅ Text-based information retrieval using FAISS.**
**✅ Generating structured responses with LLMs.**
**✅ Evaluating chatbot accuracy to measure answer quality.**

**🚀 This project highlights the potential of RAG-based chatbots as a tool for domain-specific knowledge extraction in document-driven Q&A systems!**