
# Building and Evaluating a RAG Pipeline

This notebook walks through building and evaluating a Retrieval-Augmented Generation (RAG) pipeline. The pipeline uses:

- **Chroma**: A vector database used to store and efficiently search for relevant text chunks from your document.
- **OpenAI Embeddings**: Used to convert text into numerical representations (vectors) for storage and retrieval in Chroma.
- **OpenAI Chat Model**: A language model used to generate answers based on the retrieved context.

The steps cover loading data, preparing it for the pipeline, setting up and testing the core RAG components, and exploring methods to improve and evaluate the pipeline's performance.


## Load data

Load data from a source (e.g., a PDF file uploaded to Colab or stored on Google Drive) and extract its text.


In [None]:
from google.colab import drive

# Mount Google Drive to access files stored in your Drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
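# Install necessary libraries for the RAG pipeline
# (sentence-transformers is required by the HuggingFaceEmbeddings model used later in the notebook)
!pip install python-docx langchain langchain-text-splitters langchain-openai langchain-community chromadb PyMuPDF sentence-transformers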



In [None]:
import os
import fitz  # PyMuPDF

# Specify the path to your data file (replace with your actual file path)
file_path = '/content/DMV-DriverManual.pdf'

# Read the content of the PDF file
# Check the path up front: depending on the PyMuPDF version, fitz.open may not
# raise the built-in FileNotFoundError for a missing file
data = None
if os.path.exists(file_path):
    with fitz.open(file_path) as doc:
        # Extract text from each page, ending every page with a newline character
        data = "".join(page.get_text() + "\n" for page in doc)
else:
    # Handle the case where the file is not found
    print(f"Error: File not found at {file_path}")

# Display the first 500 characters of the loaded data to verify
if data:
    print(data[:500])

  
  
Ned Lamont 
Governor
Sibongile Magubane 
Commissioner
ct.gov/dmv
facebook.com/CTDMVteens
@CTDMV
Driver’s  
Manual
State of Connecticut 
Department of Motor Vehicles

Commissioner 
Sibongile Magubane
Governor Ned Lamont
An Important Message from Governor Ned Lamont and 
Commissioner Sibongile Magubane
Connecticut takes pride in its highway safety initiatives and efforts to make the roads 
safer for all who use them. This work involves driver licensing and ensuring that new and 
renewing dri


## Split text

Divide the loaded text into smaller, manageable chunks for processing.


In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Initialize a text splitter with a chunk size of 500 and an overlap of 50 characters
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
# Split the loaded data into text chunks
texts = text_splitter.split_text(data)

# Print the number of generated chunks
print(f"Number of chunks: {len(texts)}")
# Print the first chunk to verify the splitting
print("First chunk:")
print(texts[0])

Number of chunks: 499
First chunk:
Ned Lamont 
Governor
Sibongile Magubane 
Commissioner
ct.gov/dmv
facebook.com/CTDMVteens
@CTDMV
Driver’s  
Manual
State of Connecticut 
Department of Motor Vehicles
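
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break at separators such as paragraph and line breaks, so its chunks will not match this output exactly. The function name `sliding_window_chunks` and the sample string are illustrative assumptions, not part of the notebook.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Illustrative input: the lowercase alphabet, 26 characters
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, which is what the overlap is for: a sentence cut at a chunk boundary still appears intact in at least one chunk.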


## Create embeddings

Generate embeddings for the text chunks using OpenAI's embedding model.


In [None]:
# Import the OpenAIEmbeddings class for generating embeddings
from langchain_openai import OpenAIEmbeddings
from google.colab import userdata

# Initialize the OpenAIEmbeddings model, explicitly passing the API key
embeddings = OpenAIEmbeddings(openai_api_key=userdata.get("OPENAI_API_KEY"))

In [None]:
# Generate embeddings for the text chunks
# (a sanity check; the vector store built in the next cell embeds the chunks itself)
vector = embeddings.embed_documents(texts)
# Print the number of generated embeddings
print(f"Number of embeddings: {len(vector)}")

Number of embeddings: 499


In [None]:
# Import the Document class to represent text documents
from langchain_core.documents import Document
# Import the Chroma vector store class
from langchain_community.vectorstores import Chroma

# Convert the text chunks into Document objects
documents = [Document(page_content=text) for text in texts]
# Initialize the Chroma vector store from the documents and embeddings
vectorstore = Chroma.from_documents(documents, embeddings)

## Set up the retriever

Create a retriever from the Chroma database that can search for relevant text chunks based on a query.


In [None]:
# Create a retriever from the Chroma vector store
retriever = vectorstore.as_retriever()
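
Conceptually, a retriever embeds the query and returns the chunks whose vectors are closest to it. The sketch below illustrates that idea with cosine similarity over hand-made three-dimensional vectors. The vectors, chunk labels, and the helper names `cosine_similarity` and `top_k` are assumptions for illustration; the distance metric and index Chroma actually uses depend on its configuration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy data: made-up vectors standing in for real embeddings
toy_chunks = ["learner's permit rules", "parking regulations", "permit test requirements"]
toy_vectors = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
toy_query = [1.0, 0.2, 0.0]

best = top_k(toy_query, toy_vectors, k=2)
print([toy_chunks[i] for i in best])
```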

## Set up the language model

Choose and initialize a language model (e.g., an OpenAI chat model) to answer questions based on the retrieved context.


In [None]:
# Import the ChatOpenAI class for the language model
from langchain_openai import ChatOpenAI

# Initialize the ChatOpenAI language model
llm = ChatOpenAI(openai_api_key=userdata.get("OPENAI_API_KEY"))

## Create the RAG chain

Combine the retriever and the language model into a RAG chain that takes a question and returns an answer.


In [None]:
# Import necessary components for building the RAG chain
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Define the prompt template for the RAG chain
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
# Create a chat prompt template from the defined template
prompt = ChatPromptTemplate.from_template(template)

# Construct the RAG chain
rag_chain = (
    # Pass the retrieved context and the question to the prompt
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt  # Apply the prompt
    | llm  # Pass the result to the language model
    | StrOutputParser()  # Parse the output as a string
)
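
In the chain above, the retriever's output (a list of document objects) is passed to `{context}` as-is. A common refinement is to join the chunks' text into one plain string first. The sketch below shows, without calling any model, what the filled-in prompt could look like. `ToyDocument`, `format_docs`, and `build_prompt` are hypothetical names, and the two toy chunks are illustrative rather than actual retrieval results.

```python
from dataclasses import dataclass


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a retrieved document with a page_content field."""
    page_content: str


TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def format_docs(docs: list[ToyDocument]) -> str:
    """Join the text of each retrieved chunk, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt(docs: list[ToyDocument], question: str) -> str:
    """Fill the template with the formatted context and the user question."""
    return TEMPLATE.format(context=format_docs(docs), question=question)


# Toy chunks for illustration only
retrieved = [
    ToyDocument("Applicants must be at least 16 years of age."),
    ToyDocument("A vision test and a knowledge test are required."),
]
prompt_text = build_prompt(retrieved, "What is the minimum age for a learner's permit?")
print(prompt_text)
```

In a LangChain chain, a helper like `format_docs` would typically be wired in as `retriever | format_docs` for the `"context"` entry, so the model sees clean text rather than the string form of a list of objects.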

## Test the RAG chain

Ask a question to the RAG chain to test its functionality.


In [None]:
# Define the question to test the RAG chain
question = "What is the process for obtaining a learner's permit?"
# Invoke the RAG chain with the question and get the answer
answer = rag_chain.invoke(question)
# Print the answer
print(answer)

To obtain a learner's permit, the applicant must be at least 16 years of age and pass both a vision and a 25-question knowledge test. After holding the learner's permit for the required time and meeting the training requirements, the applicant can schedule a road test appointment. The applicant must also have a parent or legal guardian complete two hours of instruction concerning the laws governing drivers under age 18 and the dangers of teen driving. The permit will be valid until the driver obtains a driver's license or for 2 years from the date it is issued, whichever comes first.


# Improving and Evaluating the RAG Chain

This section explores different methods to enhance the performance and assess the effectiveness of the RAG chain. This includes:

- Experimenting with different text chunking strategies (size and overlap).
- Exploring alternative embedding models.
- Incorporating a re-ranking step to improve document relevance.
- Refining the prompt template to guide the language model's response.
- Implementing methods to evaluate the RAG chain's performance.

## Experiment with different chunk sizes and overlaps

Re-split the text with a larger chunk size and overlap (800 and 80 characters instead of 500 and 50), then rebuild the embeddings, vector store, retriever, and RAG chain, and test the result on the same question.


In [None]:
# 1. Create a new list of text chunks with different values
text_splitter_experiment = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=80)
texts_experiment = text_splitter_experiment.split_text(data)

# 2. Print the number of generated chunks and the first chunk
print(f"Number of chunks (experiment): {len(texts_experiment)}")
print("First chunk (experiment):")
print(texts_experiment[0])

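# 3. Generate embeddings for these new text chunks
#    (a sanity check; Chroma.from_documents below embeds the chunks itself)
vector_experiment = embeddings.embed_documents(texts_experiment)

# 4. Create a new Chroma vector store in its own collection so the
#    experimental chunks stay separate from the original store
documents_experiment = [Document(page_content=text) for text in texts_experiment]
vectorstore_experiment = Chroma.from_documents(
    documents_experiment, embeddings, collection_name="rag_experiment_chunk800"
)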

# 5. Create a new retriever from this experimental vector store
retriever_experiment = vectorstore_experiment.as_retriever()

# 6. Construct a new RAG chain
rag_chain_experiment = (
    {"context": retriever_experiment, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# 7. Test the rag_chain_experiment
question = "What is the process for obtaining a learner's permit?"
answer_experiment = rag_chain_experiment.invoke(question)
print("\nAnswer from experimental RAG chain:")
print(answer_experiment)

Number of chunks (experiment): 306
First chunk (experiment):
Ned Lamont 
Governor
Sibongile Magubane 
Commissioner
ct.gov/dmv
facebook.com/CTDMVteens
@CTDMV
Driver’s  
Manual
State of Connecticut 
Department of Motor Vehicles

Answer from experimental RAG chain:
To obtain a learner's permit, the applicant must be at least 16 years old and must pass both a vision test and a 25-question knowledge test. Additionally, a favorable review of the applicant's condition(s) must be obtained prior to the issuance of the learner's permit. It is recommended to contact the DMV Driver Services Division at (860) 263-5720 in advance of making the application to avoid any delays. The learner's permit will be valid until the applicant obtains a driver's license or for 2 years from the date it is issued, whichever comes first.


## Incorporate a re-ranking step

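Add a post-retrieval step to the RAG pipeline to improve the relevance of retrieved documents. The example below wraps the base retriever in a `ContextualCompressionRetriever` whose `EmbeddingsRedundantFilter` drops near-duplicate chunks. Strictly speaking, this filters the retrieved chunks rather than re-ordering them by a relevance score.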

In [None]:
# Import necessary components for re-ranking
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain_community.document_transformers import EmbeddingsRedundantFilter, EmbeddingsClusteringFilter
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Assuming 'embeddings' and 'retriever' (from the original Chroma setup) are available
# If you want to use the new_embeddings, replace 'embeddings' accordingly

# Create a redundant document filter
redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)

# You could also use a clustering filter, but for simplicity, we'll stick to redundant filter
# clustering_filter = EmbeddingsClusteringFilter(embeddings=embeddings)

# Create a document compressor pipeline
pipeline = DocumentCompressorPipeline(transformers=[redundant_filter])

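# Configure the base retriever to return a single document to keep the context short.
# Note: this mutates the shared 'retriever', so later cells that reuse it also get one chunk,
# and with a single document the redundant filter has nothing to remove.
retriever.search_kwargs['k'] = 1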

# Create a contextual compression retriever
compression_retriever = ContextualCompressionRetriever(
    base_compressor=pipeline, base_retriever=retriever
)

# Define the prompt template (same as before)
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Construct the RAG chain with re-ranking
rag_chain_rerank = (
    # Use the compression_retriever which includes the re-ranking step
    {"context": compression_retriever, "question": RunnablePassthrough()}
    | prompt  # Apply the prompt
    | llm  # Pass the result to the language model
    | StrOutputParser()  # Parse the output as a string
)

# Test the rag_chain_rerank
question = "What is the process for obtaining a learner's permit?"
answer_rerank = rag_chain_rerank.invoke(question)
print("\nAnswer from RAG chain with re-ranking:")
print(answer_rerank)


Answer from RAG chain with re-ranking:
To obtain a learner's permit, the individual must be at least 16 years of age, pass both a vision and a 25-question knowledge test, and contact the DMV Driver Services Division prior to making the application. The permit will be valid until the individual obtains a driver's license or 2 years from the date it is issued.
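
The idea behind an embeddings-based redundancy filter can be sketched in a few lines: keep a chunk only if it is not too similar to a chunk that was already kept. This is a simplified greedy variant under assumed names (`cosine_similarity`, `drop_redundant`) and toy two-dimensional vectors; the library's exact rule for which of two near-duplicates to drop, and its default threshold, may differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_redundant(vectors: list[list[float]], threshold: float = 0.95) -> list[int]:
    """Return indices to keep, skipping vectors too similar to an earlier kept one."""
    kept: list[int] = []
    for idx, vec in enumerate(vectors):
        if all(cosine_similarity(vec, vectors[j]) < threshold for j in kept):
            kept.append(idx)
    return kept


# Toy vectors: the second is nearly identical in direction to the first
toy_vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
kept = drop_redundant(toy_vectors, threshold=0.95)
print(kept)
```

Lowering the threshold removes more chunks as duplicates; raising it toward 1.0 keeps almost everything.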



## Evaluate the RAG chain
To evaluate the RAG chain, define a set of test questions with their expected answers, run the chain on each question, and compare the generated answers to the expected ones. This provides a basic, manual assessment of the chain's performance.

In [None]:
# Define a set of test questions and expected answers
test_cases = [
    {
        "question": "What is the process for obtaining a learner's permit?",
        "expected_answer": "To obtain a learner's permit, the applicant must be at least 16 years of age and pass both a vision and a 25-question knowledge test. After holding the learner's permit for the required time and meeting the training requirements, the applicant can schedule a road test appointment. The applicant must also have a parent or legal guardian complete two hours of instruction concerning the laws governing drivers under age 18 and the dangers of teen driving. The permit will be valid until the driver obtains a driver's license or for 2 years from the date it is issued, whichever comes first."
    },
    {
        "question": "What are the restrictions for a 16 or 17 year old with a learner's permit?",
        "expected_answer": "A 16 or 17 year old with a learner's permit cannot operate a motor vehicle between the hours of 11 p.m. and 5 a.m. unless for employment, school, religious activities, or medical necessity. They also cannot transport more than one passenger other than a parent, legal guardian, or a licensed driving instructor, unless the passenger is a sibling."
    }
    # Add more test cases as needed
]

# Evaluate the RAG chain with the test cases
print("Evaluating the RAG chain:")
for case in test_cases:
    question = case["question"]
    expected_answer = case["expected_answer"]
    # Using the rag_chain with re-ranking for evaluation as it's the latest improved chain
    generated_answer = rag_chain_rerank.invoke(question)

    print(f"\nQuestion: {question}")
    print(f"Expected Answer: {expected_answer}")
    print(f"Generated Answer: {generated_answer}")

    # You can add more sophisticated evaluation metrics here,
    # such as comparing similarity between generated and expected answers.
    # For simplicity, we'll just print and visually inspect for now.

Evaluating the RAG chain:

Question: What is the process for obtaining a learner's permit?
Expected Answer: To obtain a learner's permit, the applicant must be at least 16 years of age and pass both a vision and a 25-question knowledge test. After holding the learner's permit for the required time and meeting the training requirements, the applicant can schedule a road test appointment. The applicant must also have a parent or legal guardian complete two hours of instruction concerning the laws governing drivers under age 18 and the dangers of teen driving. The permit will be valid until the driver obtains a driver's license or for 2 years from the date it is issued, whichever comes first.
Generated Answer: The process for obtaining a learner's permit includes being at least 16 years of age, passing both a vision and a 25-question knowledge test, and undergoing a review of the applicant's condition(s). A favorable review must be obtained prior to issuance of the learner’s permit. The a
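
The evaluation loop above leaves the comparison to visual inspection. One simple automated metric is token-level F1 between the generated and expected answers. The sketch below is a minimal pure-Python version under assumed names (`tokenize`, `token_f1`); it measures word overlap only, so it is a rough signal and says nothing about factual correctness. The two example sentences are illustrative, not outputs from the notebook.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(generated: str, expected: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    gen_tokens = Counter(tokenize(generated))
    exp_tokens = Counter(tokenize(expected))
    overlap = sum((gen_tokens & exp_tokens).values())  # shared tokens, with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_tokens.values())
    recall = overlap / sum(exp_tokens.values())
    return 2 * precision * recall / (precision + recall)


# Illustrative sentences
expected = "The applicant must be at least 16 years of age."
generated = "You must be at least 16 years old."
score = token_f1(generated, expected)
print(f"Token F1: {score:.2f}")
```

Inside the loop, a call such as `token_f1(generated_answer, expected_answer)` could be printed next to each answer to give a number to track across pipeline variants.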

## Refine the prompt template

To refine the prompt, define a new template with clearer instructions (here: answer concisely, and say so when the answer is not in the context), build a new RAG chain with it, and test whether it yields better results.

In [None]:
# Import necessary components for building the RAG chain
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Define a refined prompt template
# This template encourages the model to be more concise and directly answer the question
refined_template = """Using the following context, answer the question concisely and directly.
If the answer is not in the context, state that you don't know.

Context: {context}

Question: {question}
"""
# Create a chat prompt template from the refined template
refined_prompt = ChatPromptTemplate.from_template(refined_template)

# Construct a new RAG chain with the refined prompt
# We'll use the original retriever and language model for this example
rag_chain_refined_prompt = (
    {"context": retriever, "question": RunnablePassthrough()}
    | refined_prompt  # Apply the refined prompt
    | llm  # Pass the result to the language model
    | StrOutputParser()  # Parse the output as a string
)

# Test the rag_chain_refined_prompt
question = "What is the process for obtaining a learner's permit?"
answer_refined_prompt = rag_chain_refined_prompt.invoke(question)
print("\nAnswer from RAG chain with refined prompt:")
print(answer_refined_prompt)


Answer from RAG chain with refined prompt:
To obtain a learner's permit, you must be at least 16 years of age, pass both a vision and a 25-question knowledge test, and undergo a favorable review of your condition(s) prior to issuance. You can schedule your knowledge test and make payment online by visiting CT.GOV/DMV.



## Explore different embedding models

To explore a different embedding model, import and initialize the chosen model, use it to build a new vector store and retriever, and then test the RAG chain with the new embeddings.



In [None]:
# Import a different embedding model, for example, HuggingFaceEmbeddings
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain_community.vectorstores import Chroma
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser


# Initialize the new embedding model
# You can specify a different model name from the Hugging Face Model Hub
# For example, "sentence-transformers/all-MiniLM-L6-v2" is a commonly used model
new_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Generate embeddings for the text chunks using the new model
# vector_new = new_embeddings.embed_documents(texts) # This line is not needed for Chroma.from_documents

# Create a new Chroma vector store with the new embeddings
# Make sure to use a different collection name if you want to keep the old one
vectorstore_new = Chroma.from_documents(documents, new_embeddings, collection_name="rag_experiment_huggingface")


# Create a new retriever from this new vector store
retriever_new = vectorstore_new.as_retriever()

# Construct a new RAG chain with the new retriever
# Assuming 'prompt' and 'llm' are already defined
rag_chain_new_embeddings = (
    {"context": retriever_new, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Test the rag_chain_new_embeddings
question = "What are the restrictions for a 16 or 17 year old with a learner's permit?"
answer_new_embeddings = rag_chain_new_embeddings.invoke(question)
print("\nAnswer from RAG chain with new embeddings:")
print(answer_new_embeddings)


Answer from RAG chain with new embeddings:
The restrictions for a 16 or 17 year old with a learner's permit include passenger restrictions.
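
With several chain variants available (different chunking, retrievers, prompts, and embeddings), it can help to run them all on the same test cases and compare one score per variant. The sketch below is a minimal harness using keyword recall as the score. The names `keyword_recall` and `compare_chains`, the `keywords` field, and the stub chains are all assumptions for illustration; the stubs return canned strings, not real model output.

```python
from typing import Callable


def keyword_recall(answer: str, keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the answer (case-insensitive)."""
    if not keywords:
        return 0.0
    answer_lower = answer.lower()
    hits = sum(1 for kw in keywords if kw.lower() in answer_lower)
    return hits / len(keywords)


def compare_chains(chains: dict[str, Callable[[str], str]], cases: list[dict]) -> dict[str, float]:
    """Return the mean keyword recall for each named chain over all test cases."""
    results = {}
    for name, ask in chains.items():
        scores = [keyword_recall(ask(case["question"]), case["keywords"]) for case in cases]
        results[name] = sum(scores) / len(scores) if scores else 0.0
    return results


# Stub "chains": plain functions standing in for real chains
stub_chains = {
    "baseline": lambda question: "Applicants must be at least 16 and pass a vision test.",
    "variant": lambda question: "There are passenger restrictions.",
}
cases = [
    {
        "question": "What is required for a learner's permit?",
        "keywords": ["16", "vision", "knowledge test"],
    }
]

results = compare_chains(stub_chains, cases)
for name, mean_recall in results.items():
    print(f"{name}: {mean_recall:.2f}")
```

In the notebook, each stub could be replaced by a bound method such as `rag_chain.invoke` or `rag_chain_new_embeddings.invoke`, since both take a question string and return an answer string.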
