# RAG (Retrieval Augmented Generation) Chatbot Implementation

This notebook demonstrates how to build a RAG chatbot that can answer questions based on PDF documents. The system combines document retrieval with LLM generation to provide accurate, context-aware responses.

## Overview of the RAG Pipeline
1. **Document Loading**: Load PDF documents from a local directory
2. **Text Chunking**: Split documents using two different strategies (recursive and semantic)
3. **Vector Storage**: Store document embeddings in ChromaDB for similarity search
4. **Hybrid Retrieval**: Combine semantic search with keyword-based BM25 retrieval
5. **Reranking**: Use ColBERT to rerank retrieved documents for better relevance
6. **Response Generation**: Generate answers using Google's Gemini model

## 1. Import Required Libraries


In [None]:
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_chroma import Chroma
import chromadb
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
import os
from dotenv import load_dotenv
load_dotenv()

## 2. Document Loading

This section loads all PDF files from a specified directory and converts them into a format suitable for processing.

In [18]:
# Load every PDF in the data directory

data_root = "./data"
docs = []

for filename in os.listdir(data_root):
    filepath = os.path.join(data_root, filename)
    if filename.lower().endswith(".pdf"):  # also accept ".PDF"
        loader = PyPDFLoader(filepath)
        data = loader.load()
        docs.extend(data)

if docs:
    print("Documents have been loaded")
else:
    print("No PDF files found in the folder.")

Documents have been loaded
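
`os.listdir` returns entries in arbitrary order, so the order of `docs` can differ between runs. A minimal sketch of a deterministic, case-insensitive listing is shown here; `find_pdfs` is a hypothetical helper introduced for illustration, not part of LangChain.

```python
from pathlib import Path


def find_pdfs(root):
    """Return the PDF files directly under `root`, sorted by file name."""
    return sorted(
        (p for p in Path(root).iterdir()
         if p.is_file() and p.suffix.lower() == ".pdf"),
        key=lambda p: p.name,
    )
```

It could replace the `os.listdir` loop, for example `for path in find_pdfs(data_root): docs.extend(PyPDFLoader(str(path)).load())`.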


## 3. Text Chunking Strategies

We implement two different chunking strategies to compare their effectiveness:

### 3.1 Recursive Character Text Splitting
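This method splits text on a prioritized list of separators (paragraphs, then lines, then words, then single characters) until each chunk fits within `chunk_size` characters. Consecutive chunks share up to `chunk_overlap` characters to maintain context between them.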


In [19]:
rec_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100, separators=["\n\n", "\n", " ", ""]
)
rec_text_splits = rec_text_splitter.split_documents(docs)
print(f"number of chunks: {len(rec_text_splits)}")

number of chunks: 167
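
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch that uses fixed character windows. It is not how `RecursiveCharacterTextSplitter` works internally (the library splits on separators and merges the pieces, so its boundaries differ); `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Fixed-size character windows; each window starts
    chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk begins with the last three characters of the previous one, which is the effect the overlap setting is meant to achieve.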


In [20]:
# Example: View a specific chunk to understand the splitting result

rec_text_splits[10]

Document(metadata={'producer': 'Adobe PDF Library 15.0', 'creator': 'Adobe InDesign 14.0 (Macintosh)', 'creationdate': '2021-04-14T15:36:17-05:00', 'moddate': '2021-04-14T15:37:00-05:00', 'source': './data/AMZN-2020-Shareholder-Letter.pdf', 'total_pages': 10, 'page': 2, 'page_label': '3'}, page_content='know that a variety of things can impact performance in any given week, day, or hour. If employees are on\ntrack to miss a performance target over a period of time, their manager talks with them and provides\ncoaching.\nCoaching is also extended to employees who are excelling and in line for increased responsibilities. In fact,\n82% of coaching is positive, provided to employees who are meeting or exceeding expectations. We terminate\nthe employment of less than 2.6% of employees due to their inability to perform their jobs (and that\nnumber was even lower in 2020 because of operational impacts of COVID-19).\nEarth’s Best Employer and Earth’s Safest Place to Work\nThe fact is, the large

### 3.2 Semantic Chunking
This method splits text at points where the embedding similarity between neighbouring sentences drops sharply, so each chunk tends to cover a single topic.

In [None]:
# Initialize the embedding model for semantic analysis
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
semantic_text_splitter = SemanticChunker(
    embedding_model,
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=85,  # split where the neighbour distance exceeds the 85th percentile
)
semantic_text_splits = semantic_text_splitter.split_documents(docs)
print(f"number of chunks: {len(semantic_text_splits)}")
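
The percentile rule can be sketched in plain Python. This is a simplified illustration under stated assumptions: it uses toy 2-D vectors instead of real embeddings and omits details such as sentence splitting and neighbour buffering, and `cosine_distance`, `percentile`, and `semantic_groups` are hypothetical helpers, not library functions.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values, pct):
    """Linearly interpolated percentile (pct in 0..100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def semantic_groups(sentences, vectors, pct=85):
    """Start a new chunk wherever the distance between neighbouring
    sentences exceeds the pct-th percentile of all neighbour distances."""
    if len(sentences) < 2:
        return [list(sentences)]
    distances = [cosine_distance(vectors[i], vectors[i + 1])
                 for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    groups, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > threshold:
            groups.append(current)
            current = []
        current.append(sentence)
    groups.append(current)
    return groups


# Toy "embeddings": the first three sentences point one way, the last two another.
sentences = ["s1", "s2", "s3", "s4", "s5"]
vectors = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.1), (0.0, 1.0), (0.1, 1.0)]
groups = semantic_groups(sentences, vectors)
print(groups)
```

Lowering the percentile produces more, smaller chunks; raising it produces fewer, larger ones.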

## 4. Vector Database Setup

We use ChromaDB (cloud version) to store document embeddings for similarity search.

In [None]:
# Connect to ChromaDB Cloud instance

client = chromadb.CloudClient(
    api_key=os.getenv('CHROMADB_API_KEY'),
    tenant=os.getenv('CHROMADB_TENANT'),
    database=os.getenv('CHROMADB_DATABASE')
)

# Create separate vector stores for recursive and semantic chunks

vecstore_rec = Chroma(
    client=client,
    collection_name="my_kb_rec",
    embedding_function=embedding_model
)

vecstore_sem = Chroma(
    client=client,
    collection_name="my_kb_sem",
    embedding_function=embedding_model
)

# Add documents to the vector stores. Run once: without explicit IDs, re-running these lines stores duplicate chunks.

# vecstore_rec.add_documents(rec_text_splits)
# vecstore_sem.add_documents(semantic_text_splits)
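
One way to make ingestion safe to re-run is to pass deterministic IDs. The sketch below derives an ID from each chunk's position and text; `stable_ids` is a hypothetical helper, and it assumes the vector store overwrites (upserts) entries whose IDs already exist, which should be verified against the installed `langchain_chroma` version.

```python
import hashlib


def stable_ids(texts, prefix="chunk"):
    """Derive a deterministic, unique ID from each chunk's index and text."""
    return [
        f"{prefix}-{hashlib.sha256(f'{i}:{text}'.encode('utf-8')).hexdigest()[:16]}"
        for i, text in enumerate(texts)
    ]


ids = stable_ids(["first chunk", "second chunk", "first chunk"])

# Hypothetical usage with the stores above (assumes upsert-on-matching-ID behaviour):
# rec_ids = stable_ids([d.page_content for d in rec_text_splits], prefix="rec")
# vecstore_rec.add_documents(rec_text_splits, ids=rec_ids)
```

Because the index is part of the hash, identical passages at different positions still receive distinct IDs.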

## 5. Basic Similarity Search
Test basic vector similarity search to see how well each chunking strategy performs.


In [46]:
query = "What's the first thing to do?"

def basic_search(vecstore, query, k=2):
    results = vecstore.similarity_search(query, k=k)
    for i, result in enumerate(results, 1):
        print(f"\n--- Result {i}")
        print(result.page_content)


print("Semantic chunking search results:")
basic_search(vecstore_sem, query)

print("\nRecursive chunking search results:")
basic_search(vecstore_rec, query)

Semantic chunking search results:

--- Result 1
Sometimes, you proactively invite it in, and sometimes it just comes
a-knocking.

--- Result 2
customers check their risk
level for COVID-19 at home. Customers can ask, “Alexa, what do I do if I think I have COVID-19?” or “Alexa,
what do I do if I think I have coronavirus?” Alexa then asks a series of questions about the person’s symptoms
and possible exposure. Based on those responses, Alexa then provides CDC-sourced guidance.

Recursive chunking search results:

--- Result 1
stairway handrails, lockers, elevator buttons, and touch screens, and disinfectant wipes and hand sanitizer are
standard across our network.
We’ve also introduced extensive social distancing measures to help protect our associates. We have eliminated
stand-up meetings during shifts, moved information sharing to bulletin boards, staggered break times, and spread
out chairs in breakrooms. While training new hires is challenging with new distancing requirements, we con
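
Conceptually, a similarity search embeds the query and returns the stored vectors closest to it. The following minimal sketch assumes cosine similarity and toy 2-D vectors; the metric actually used depends on how the Chroma collection is configured, and `top_k` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, doc_vecs, k=2):
    """Return (index, similarity) pairs for the k most similar vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


hits = top_k((1.0, 0.0), [(0.0, 1.0), (1.0, 0.1), (0.7, 0.7)], k=2)
print(hits)
```

A vague query such as the one above has no strongly aligned chunk, which is one reason the top results for each chunking strategy can look unrelated.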

## 6. Hybrid Retrieval Implementation
Hybrid retrieval combines semantic search (dense) with keyword search (sparse), so that both paraphrased matches and exact keyword matches are retrieved.


In [None]:
# TODO: compare the different results from 2 different chunking strategies

In [50]:
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever  # requires the rank_bm25 package

def hybrid_retrieve(vectorstore, k=5, documents=None):
    """
    Create a hybrid retriever that combines BM25 (keyword-based) and semantic search.

    Args:
        vectorstore: The vector database containing document embeddings
        k: Number of documents each retriever returns
        documents: Documents to index with BM25; defaults to the page-level `docs`.
            Pass the same chunks stored in `vectorstore` to keep both retrievers aligned.

    Returns:
        EnsembleRetriever: Combined retriever with weighted results
    """
    # NOTE: the default indexes whole PDF pages, not the chunks held in the vector store
    bm25_retriever = BM25Retriever.from_documents(docs if documents is None else documents)
    bm25_retriever.k = k
    semantic_retriever = vectorstore.as_retriever(search_kwargs={"k": k})
    hybrid_retriever = EnsembleRetriever(
        retrievers=[bm25_retriever, semantic_retriever],
        # 40% BM25, 60% semantic (adjust as needed). The weights scale each
        # retriever's contribution when the two ranked lists are fused with
        # weighted Reciprocal Rank Fusion; raw scores are not multiplied.
        weights=[0.4, 0.6],
    )
    return hybrid_retriever
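
As far as I can tell, `EnsembleRetriever` fuses its retrievers' results by rank position rather than by raw score, using weighted Reciprocal Rank Fusion. The sketch below illustrates that idea on plain document IDs; `weighted_rrf` is a hypothetical helper, and the constant `c=60` is an assumption taken from the common RRF default.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists of document IDs with weighted Reciprocal Rank Fusion."""
    if len(rankings) != len(weights):
        raise ValueError("need one weight per ranking")
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more the higher it ranks and the heavier the retriever.
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["d1", "d2", "d3"]
semantic_ranking = ["d3", "d1", "d4"]
fused = weighted_rrf([bm25_ranking, semantic_ranking], weights=[0.4, 0.6])
print(fused)
```

Documents returned by both retrievers accumulate two contributions, so they tend to rise above documents found by only one.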

In [51]:
# Test hybrid retrieval

query = "What is important?"

# Create hybrid retrievers for both chunking strategies

hybrid_retriever_rec = hybrid_retrieve(vecstore_rec)
hybrid_retriever_sem = hybrid_retrieve(vecstore_sem)

results_rec = hybrid_retriever_rec.invoke(query)
results_sem = hybrid_retriever_sem.invoke(query)

In [53]:
# Examine the retrieved results
print("=== Semantic Chunking Results ===")
for chunk in results_sem[:5]:
    print("-"*50)
    print(chunk.page_content)

print("\n=== Recursive Splitting Results ===")
for chunk in results_rec[:5]:
    print("-"*50)
    print(chunk.page_content)

=== Semantic Chunking Results ===
--------------------------------------------------
It’s significant, and it improves their lives.
--------------------------------------------------
But I
also know something else: it’s not the largest part of the value we’ve created. Create More Than You Consume
If you want to be successful in business (in life, actually), you have to create more than you consume. Y our
goal should be to create value for everyone you interact with. Any business that doesn’t create value for those
it touches, even if it appears successful on the surface, isn’t long for this world.
--------------------------------------------------
We create value for them.
--------------------------------------------------
If
we stopped doing all of the continuous hard work that is needed to maintain our distinctiveness in that
regard, we would quickly come into equilibrium with tyranny. We all know that distinctiveness – originality – is valuable. We are all taught to “be yourself.” W

## 7. Advanced Reranking with ColBERT
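ColBERT reranking rescores the retrieved documents by comparing every query token embedding with every document token embedding (late interaction), which usually ranks relevant passages higher than a single-vector similarity does.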


In [56]:
from rerankers import Reranker

# Initialize ColBERT reranker
colbert_ranker = Reranker("colbert-ir/colbertv2.0", model_type="colbert")

# Rerank the text of the hybrid-retrieval results (semantic chunks)
retrieved_docs = [chunk.page_content for chunk in results_sem]
colbert_rerank_results = colbert_ranker.rank(query=query, docs=retrieved_docs)

Loading ColBERTRanker model colbert-ir/colbertv2.0 (this message can be suppressed by setting verbose=0)
No device set
Using device mps
No dtype set
Using dtype torch.float32
Loading model colbert-ir/colbertv2.0, this might take a while...
Linear Dim set to: 128 for downcasting
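
ColBERT's late-interaction scoring can be sketched with toy token vectors: each query token is matched to its best document token, and those maxima are summed. This is an illustration only; `maxsim_score` is a hypothetical helper, and the `rerankers` library may normalise its scores, so the values will not match the output of the cells here.

```python
def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query token take its best-matching
    document token (max dot product), then sum over query tokens."""
    return sum(
        max(sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc_vecs)
        for q_vec in query_vecs
    )


query_tokens = [(1.0, 0.0), (0.0, 1.0)]
doc_a = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]  # covers both query tokens
doc_b = [(1.0, 0.0), (0.9, 0.1)]              # covers only the first query token
score_a = maxsim_score(query_tokens, doc_a)
score_b = maxsim_score(query_tokens, doc_b)
print(score_a, score_b)
```

Because every query token must find its own match, a document that covers only part of the query scores lower even if one token matches perfectly.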


In [57]:
print("=== Reranked Results ===")
for result in colbert_rerank_results.results:
    print(f"\nRank: {result.rank}")
    print(f"Score: {result.score:.4f}")
    print(f"Text: {result.document.text[:200]}...")

=== Reranked Results ===

Rank: 1
Score: 1.3301
Text: It’s significant, and it improves their lives....

Rank: 2
Score: 0.7695
Text: We create value for them....

Rank: 3
Score: 0.6699
Text: to take collective action from big companies, small companies, nation states, global organizations, and
individuals, and I’m excited to be part of this journey and optimistic that humanity can come to...

Rank: 4
Score: 0.6318
Text: If
we stopped doing all of the continuous hard work that is needed to maintain our distinctiveness in that
regard, we would quickly come into equilibrium with tyranny. We all know that distinctiveness...

Rank: 5
Score: 0.4986
Text: But I
also know something else: it’s not the largest part of the value we’ve created. Create More Than You Consume
If you want to be successful in business (in life, actually), you have to create more...

Rank: 6
Score: 0.4056
Text: It’s about all the other detailed aspects of the relationship too. Does your Chair take comfort in the outcome

## 8. Response Generation with Gemini
Generate final answers using Google's Gemini model based on the retrieved and reranked context.


In [None]:
from google import genai
from google.genai import types

# Initialize the Gemini client (named gemini_client so it does not overwrite the ChromaDB `client`)
gemini_client = genai.Client(api_key=os.getenv('GOOGLE_API_KEY'))

# Keep only reranked passages scoring above 0.4 as context
context = "\n\n".join([result.document.text for result in colbert_rerank_results.results if result.score > 0.4])

prompt_rag = f"""Question: {query}
Context: {context}
Based on the context above, provide a detailed answer:"""

generate_content_config = types.GenerateContentConfig(
    temperature=0.7,  # Controls randomness (lower = more deterministic, higher = more random)
    top_p=0.85,  # Nucleus sampling: sample only from the smallest token set whose cumulative probability reaches 0.85
    thinking_config=types.ThinkingConfig(
        thinking_budget=0,  # Disables thinking
    ),
)
response = gemini_client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt_rag,
    config=generate_content_config,
)
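
If no reranked passage clears the score threshold, the context string is empty and the model answers without any grounding. A minimal sketch of a more defensive context builder follows; `build_context`, its character budget, and its fallback rule are assumptions introduced for illustration.

```python
def build_context(scored_texts, min_score=0.4, max_chars=4000, fallback_top_n=1):
    """Join passages scoring above min_score, best first, within a character budget.

    Falls back to the top `fallback_top_n` passages when none clears the threshold.
    """
    ranked = sorted(scored_texts, key=lambda pair: pair[1], reverse=True)
    kept = [text for text, score in ranked if score > min_score]
    if not kept:
        kept = [text for text, _ in ranked[:fallback_top_n]]
    parts, used = [], 0
    for text in kept:
        if parts and used + len(text) > max_chars:
            break  # stop before exceeding the budget, but always keep one passage
        parts.append(text)
        used += len(text)
    return "\n\n".join(parts)
```

With the reranker output above it could be called as `build_context([(r.document.text, r.score) for r in colbert_rerank_results.results])`.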

In [None]:
# TODO: Implement conversation memory
# - Store chat history to maintain context across multiple questions
# - Use techniques like conversation summarization for long chats

# TODO: Add evaluation metrics
# - Implement RAGAS (Retrieval Augmented Generation Assessment) for quality measurement

# TODO: Implement prompt engineering once the focus of this RAG chatbot has been decided
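
For the conversation-memory TODO, one possible starting point is a bounded buffer of recent turns that is prepended to the prompt. This is only a sketch: `ChatHistory` is a hypothetical class, and it drops old turns rather than summarizing them.

```python
from collections import deque


class ChatHistory:
    """Keep the most recent question/answer turns for prompt building."""

    def __init__(self, max_turns=5):
        self.turns = deque(maxlen=max_turns)  # oldest turns are discarded automatically

    def add(self, question, answer):
        self.turns.append((question, answer))

    def as_prompt_block(self):
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)


history = ChatHistory(max_turns=2)
history.add("q1", "a1")
history.add("q2", "a2")
history.add("q3", "a3")
print(history.as_prompt_block())
```

The resulting block could be inserted ahead of the `Question:` line in `prompt_rag` so that follow-up questions keep their context.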

In [62]:
response.text

'Based on the provided text, several things are highlighted as important:\n\n1.  **Creating Value:** This is a recurring and central theme. The text explicitly states, "Your goal should be to create value for everyone you interact with. Any business that doesn’t create value for those it touches, even if it appears successful on the surface, isn’t long for this world." It\'s also mentioned that creating value for customers is significant and improves their lives.\n\n2.  **Distinctiveness/Originality:** This is presented as "of utmost importance" by the CEO. The text uses a biological metaphor to explain that just as living things must work to maintain their distinctiveness from their environment to survive, companies and individuals must work to maintain their originality. It states, "We all know that distinctiveness – originality – is valuable. We are all taught to \'be yourself.\'" It emphasizes that maintaining distinctiveness requires continuous effort and a "price," but "it\'s wor