# Evaluation Of RAG using Ragas

We will follow these steps to evaluate RAG;
- Data laoding and pre-processing
- Generating synthetic test data
- Building RAG
- Evaluating RAG using Test data

## Data laoding and pre-processing

Document Preprocessing

In [10]:
from langchain_community.document_loaders import PDFPlumberLoader

file_path = 'data/Human-Rights.pdf'

# Create document loader
loader = PDFPlumberLoader(file_path)
# Load documents
docs = loader.load()

# Get the number of document pages
len(docs)

8

## Generating synthetic test data

We will be using OpenAI LLM and embedding model to generate test data

In [None]:
import os
os.environ["OPENAI_API_KEY"] = ""  # replace with your API key

The LLM and embedding model need to be wrapped in `LangchainLLMWrapper` so that they can be used with Ragas.

In [12]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

  from .autonotebook import tqdm as notebook_tqdm


Create a test set generator and generate test dataset

In [13]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs, testset_size=10)

0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
Generating personas: 100%|██████████| 3/3 [00:01<00:00,  1.99it/s]                                           
Generating Scenarios: 100%|██████████| 2/2 [00:06<00:00,  3.27s/it]
Generating Samples: 100%|██████████| 10/10 [00:02<00:00,  3.71it/s]


The generated test dataset consists of columns: `user_input`(query), `reference_contexts`(reference chunks for comparing), `reference`(response) and `synthesizer_name`(type of generated query). For more details refer the Ragas documentation.

In [14]:
dataset.to_pandas()

Unnamed: 0,user_input,reference_contexts,reference,synthesizer_name
0,Why is the United Nations important for human ...,[Universal Declaration of Human Rights\nPreamb...,The United Nations is important for human righ...,single_hop_specifc_query_synthesizer
1,What does Article 4 say about slavery and serv...,[teaching and education to promote respect for...,Article 4 states that no one shall be held in ...,single_hop_specifc_query_synthesizer
2,What rights are guaranteed under Article 15 re...,[penalty be imposed than the one that was appl...,Article 15 guarantees that everyone has the ri...,single_hop_specifc_query_synthesizer
3,Wht r the rights outlined in Article 24?,[1. Everyone has the right to take part in the...,Article 24 states that everyone has the right ...,single_hop_specifc_query_synthesizer
4,What rights are guaranteed under Article 25?,[Article 25\n1. Everyone has the right to a st...,Article 25 guarantees everyone the right to a ...,single_hop_specifc_query_synthesizer
5,What rights are guaranteed under Article 5 and...,[<1-hop>\n\n1. Everyone has the right to take ...,Article 5 guarantees that no one shall be subj...,multi_hop_specific_query_synthesizer
6,How do Articles 22 and 27 of the human rights ...,[<1-hop>\n\nArticle 25\n1. Everyone has the ri...,"Article 22 ensures that everyone, as a member ...",multi_hop_specific_query_synthesizer
7,How do Articles 12 and 25 of the human rights ...,[<1-hop>\n\nArticle 25\n1. Everyone has the ri...,Article 12 ensures that no one is subjected to...,multi_hop_specific_query_synthesizer
8,What rights are protected under Article 4 rega...,[<1-hop>\n\nArticle 25\n1. Everyone has the ri...,Article 4 states that no one shall be held in ...,multi_hop_specific_query_synthesizer
9,What are the rights guaranteed under Article I...,[<1-hop>\n\npenalty be imposed than the one th...,Article I guarantees that all human beings are...,multi_hop_specific_query_synthesizer


Save created test dataset to disk

In [15]:
dataset.to_pandas().to_csv("data/ragas_synthetic_dataset.csv", index=False)

## Building RAG

In [17]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Step 1: Load Documents
loader = PyMuPDFLoader("data/Human-Rights.pdf")
docs = loader.load()

# Step 2: Split Documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
split_documents = text_splitter.split_documents(docs)

# Step 3: Create Embeddings
embeddings = OpenAIEmbeddings()

# Step 4: Create DB and Save
vectorstore = FAISS.from_documents(documents=split_documents, embedding=embeddings)

# Step 5: Create Retriever
retriever = vectorstore.as_retriever()

# Step 6: Create Prompt
prompt = PromptTemplate.from_template(
    """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the question. 
If you don't know the answer, just say that you don't know. 

#Context: 
{context}

#Question:
{question}

#Answer:"""
)

# Step 7: Create LLM
llm = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

# Step 8: Create Chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

## Evaluating RAG using Test data

Load the Ragas generated test dataset that we saved in the previous step.


In [18]:
import pandas as pd
df = pd.read_csv("data/ragas_synthetic_dataset.csv")
df.head()

Unnamed: 0,user_input,reference_contexts,reference,synthesizer_name
0,Why is the United Nations important for human ...,['Universal Declaration of Human Rights\nPream...,The United Nations is important for human righ...,single_hop_specifc_query_synthesizer
1,What does Article 4 say about slavery and serv...,['teaching and education to promote respect fo...,Article 4 states that no one shall be held in ...,single_hop_specifc_query_synthesizer
2,What rights are guaranteed under Article 15 re...,['penalty be imposed than the one that was app...,Article 15 guarantees that everyone has the ri...,single_hop_specifc_query_synthesizer
3,Wht r the rights outlined in Article 24?,['1. Everyone has the right to take part in th...,Article 24 states that everyone has the right ...,single_hop_specifc_query_synthesizer
4,What rights are guaranteed under Article 25?,['Article 25\n1. Everyone has the right to a s...,Article 25 guarantees everyone the right to a ...,single_hop_specifc_query_synthesizer


In [51]:
from datasets import Dataset
test_dataset = Dataset.from_pandas(df)
test_dataset

Dataset({
    features: ['user_input', 'reference_contexts', 'reference', 'synthesizer_name'],
    num_rows: 10
})

In [52]:
test_dataset[1]['reference_contexts']

"['teaching and education to promote respect for these rights and freedoms and by\\nprogressive measures, national and international, to secure their universal and\\neffective recognition and observance, both among the peoples of Member States\\nthemselves and among the peoples of territories under their jurisdiction.\\nArticle I\\nAll human beings are born free and equal in dignity and rights. They are\\nendowed with reason and conscience and should act towards one another in a\\nspirit of brotherhood.\\nArticle 2\\nEveryone is entitled to all the rights and freedoms set forth in this Declaration,\\nwithout distinction of any kind, such as race, colour, sex, language, religion,\\npolitical or other opinion, national or social origin, property, birth or other status.\\nFurthermore, no distinction shall be made on the basis of the political,\\njurisdictional or international status of the country or territory to which a person\\nbelongs, whether it be independent, trust, non-self-govern

As seen above the `reference_contexts` column items are strings containg list so these need to be converted from strings to list for further processing

In [53]:
import ast

# Convert contexts column from string to list
def convert_to_list(data):
    contexts = ast.literal_eval(data["reference_contexts"])
    return {"reference_contexts": contexts}

test_dataset = test_dataset.map(convert_to_list)
print(test_dataset)

Map: 100%|██████████| 10/10 [00:00<00:00, 1077.45 examples/s]

Dataset({
    features: ['user_input', 'reference_contexts', 'reference', 'synthesizer_name'],
    num_rows: 10
})





In [54]:
test_dataset[1]["reference_contexts"]

['teaching and education to promote respect for these rights and freedoms and by\nprogressive measures, national and international, to secure their universal and\neffective recognition and observance, both among the peoples of Member States\nthemselves and among the peoples of territories under their jurisdiction.\nArticle I\nAll human beings are born free and equal in dignity and rights. They are\nendowed with reason and conscience and should act towards one another in a\nspirit of brotherhood.\nArticle 2\nEveryone is entitled to all the rights and freedoms set forth in this Declaration,\nwithout distinction of any kind, such as race, colour, sex, language, religion,\npolitical or other opinion, national or social origin, property, birth or other status.\nFurthermore, no distinction shall be made on the basis of the political,\njurisdictional or international status of the country or territory to which a person\nbelongs, whether it be independent, trust, non-self-governing or under an

### Generate response using RAG
Now we will input the test queries in to our RAG app. Instead of passing single query we will create a batch of quereies and pass it to RAG chain to get set of answers one each for every query.

Batch dataset is useful when you want to process a large number of questions at once. We will create a batch dataset by assigning the questions to `batch_dataset`.

In [42]:
batch_dataset = [question for question in test_dataset["user_input"]]
batch_dataset[:5]

['Why is the United Nations important for human rights and what does it say about the rights of all people?',
 'What does Article 4 say about slavery and servitude and how it relates to human rights?',
 'What rights are guaranteed under Article 15 regarding nationality?',
 'Wht r the rights outlined in Article 24?',
 'What rights are guaranteed under Article 25?']

Call `batch()` to get answers for the batch dataset ( `batch_dataset` ).

In [55]:
answer = chain.batch(batch_dataset)
answer[:5]

['The United Nations is important for human rights because it serves as a platform for promoting universal respect for and observance of human rights and fundamental freedoms. The Universal Declaration of Human Rights, proclaimed by the General Assembly, establishes a common standard of achievement for all peoples and nations. It emphasizes that all human beings are born free and equal in dignity and rights, and that everyone is entitled to all the rights and freedoms set forth in the Declaration without any distinction of any kind. This underscores the importance of protecting human rights through the rule of law and fostering friendly relations between nations, which are essential for freedom, justice, and peace in the world.',
 'Article 4 states that no one shall be held in slavery or servitude, and it prohibits slavery and the slave trade in all their forms. This article emphasizes the importance of human rights by asserting that individuals should not be subjected to any form of s

Store the answers generated by the LLM in a dataset column `answer`.

In [56]:
# Overwrite or add 'answer' column
if "answer" in test_dataset.column_names:
    test_dataset = test_dataset.remove_columns(["answer"]).add_column("answer", answer)
else:
    test_dataset = test_dataset.add_column("answer", answer)

In [57]:
test_dataset.to_pandas().head()

Unnamed: 0,user_input,reference_contexts,reference,synthesizer_name,answer
0,Why is the United Nations important for human ...,[Universal Declaration of Human Rights\nPreamb...,The United Nations is important for human righ...,single_hop_specifc_query_synthesizer,The United Nations is important for human righ...
1,What does Article 4 say about slavery and serv...,[teaching and education to promote respect for...,Article 4 states that no one shall be held in ...,single_hop_specifc_query_synthesizer,Article 4 states that no one shall be held in ...
2,What rights are guaranteed under Article 15 re...,[penalty be imposed than the one that was appl...,Article 15 guarantees that everyone has the ri...,single_hop_specifc_query_synthesizer,Article 15 guarantees that everyone has the ri...
3,Wht r the rights outlined in Article 24?,[1. Everyone has the right to take part in the...,Article 24 states that everyone has the right ...,single_hop_specifc_query_synthesizer,Article 24 outlines the following rights:\n\n1...
4,What rights are guaranteed under Article 25?,[Article 25\n1. Everyone has the right to a st...,Article 25 guarantees everyone the right to a ...,single_hop_specifc_query_synthesizer,Article 25 guarantees the following rights:\n\...


Evaluate the test dataset using Ragas metrics

In [58]:
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)

# Format dataset structure
formatted_dataset = []
for item in test_dataset:
    formatted_item = {
        "question": item["user_input"],
        "answer": item["answer"],
        "reference": item["answer"],
        "contexts": item["reference_contexts"],
        "retrieved_contexts": item["reference_contexts"],
    }
    formatted_dataset.append(formatted_item)

# Convert to RAGAS dataset
ragas_dataset = Dataset.from_list(formatted_dataset)

result = evaluate(
    dataset=ragas_dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        context_recall,
    ],
)

result

Evaluating: 100%|██████████| 40/40 [00:24<00:00,  1.65it/s]


{'context_precision': 0.8000, 'faithfulness': 0.8433, 'answer_relevancy': 0.9459, 'context_recall': 0.8050}

In [59]:
result_df = result.to_pandas()
result_df.head()

Unnamed: 0,user_input,retrieved_contexts,response,reference,context_precision,faithfulness,answer_relevancy,context_recall
0,Why is the United Nations important for human ...,[Universal Declaration of Human Rights\nPreamb...,The United Nations is important for human righ...,The United Nations is important for human righ...,1.0,0.75,0.95049,0.75
1,What does Article 4 say about slavery and serv...,[teaching and education to promote respect for...,Article 4 states that no one shall be held in ...,Article 4 states that no one shall be held in ...,1.0,0.6,0.958473,0.5
2,What rights are guaranteed under Article 15 re...,[penalty be imposed than the one that was appl...,Article 15 guarantees that everyone has the ri...,Article 15 guarantees that everyone has the ri...,1.0,1.0,0.935646,1.0
3,Wht r the rights outlined in Article 24?,[1. Everyone has the right to take part in the...,Article 24 outlines the following rights:\n\n1...,Article 24 outlines the following rights:\n\n1...,1.0,0.75,0.980299,1.0
4,What rights are guaranteed under Article 25?,[Article 25\n1. Everyone has the right to a st...,Article 25 guarantees the following rights:\n\...,Article 25 guarantees the following rights:\n\...,1.0,1.0,0.998046,1.0


Extract results of evaluation

In [60]:
result_df.loc[:, "context_precision":"context_recall"]

Unnamed: 0,context_precision,faithfulness,answer_relevancy,context_recall
0,1.0,0.75,0.95049,0.75
1,1.0,0.6,0.958473,0.5
2,1.0,1.0,0.935646,1.0
3,1.0,0.75,0.980299,1.0
4,1.0,1.0,0.998046,1.0
5,1.0,1.0,0.901518,0.6
6,0.5,0.666667,0.959502,0.6
7,0.0,1.0,0.945045,1.0
8,0.5,0.666667,0.915635,0.6
9,1.0,1.0,0.914644,1.0


Store results in a file

In [61]:
result_df.to_csv("data/ragas_evaluation_result.csv", index=False)