# Naive RAG evaluation with RAGAs

In this notebook, we build a naive RAG pipeline with LangChain and evaluate it with the RAGAs framework.

### Install dependencies

In [16]:
!pip -q install langchain langchain-groq faiss-cpu sentence_transformers pypdf
!pip -q install ragas

### Setup API keys

You must create a Groq API key [here](https://wow.groq.com/).

In [17]:
import os

os.environ['GROQ_API_KEY'] = "gsk_xxxxxxxxxxxxxx"

### Load data

All our PDF files are stored in the "./data" folder; we use DirectoryLoader with PyPDFLoader to load them.

Because the loaded documents are too big to fit in a single context, we'll split them into smaller text chunks with RecursiveCharacterTextSplitter.

A critical parameter that you must tune to get good RAG performance is chunk_size (set to 1024 below).

* A large chunk size gives each chunk more context, but it also introduces more noise, costs more compute, and can trigger the “Lost in the Middle” effect (when the LLM context is very long, the model tends to focus on the beginning and the end and overlook what is in the middle).

* A smaller chunk size passes less noise to the model, but a chunk may not contain the full context needed for the answer.

So we need a chunk size that balances these two extremes; the evaluation later in this notebook gives a way to compare candidate values.

The chunk_overlap parameter (set to 128 below) repeats the end of one chunk at the start of the next, which keeps some continuity between text chunks. It can be tuned as well.
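
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sketch of a fixed-size character chunker. It is only an illustration: `chunk_text` is a hypothetical helper, and RecursiveCharacterTextSplitter is more elaborate because it prefers to split on separators such as paragraphs and sentences.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap (illustrative only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last 3 characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.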

In [18]:
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader

loader = DirectoryLoader(
    "./data",
    loader_cls=PyPDFLoader,
    show_progress=True
)

documents = loader.load()
print(f"Loaded {len(documents)} docs")

100%|██████████| 1/1 [00:00<00:00,  1.08it/s]

Loaded 21 docs





In [19]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=128)
text_chunks = text_splitter.split_documents(documents)

In [20]:
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-small-en-v1.5")

In [21]:
from langchain_community.vectorstores import FAISS

vectorstore = FAISS.from_documents(text_chunks, embeddings)

In [22]:
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 4})

In [23]:
from langchain_core.prompts import ChatPromptTemplate

template = """You are a helpful assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, just say that you don't know.
Question: {question}
Context: {context}
"""
prompt = ChatPromptTemplate.from_template(template)

In [24]:
from langchain_groq import ChatGroq
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

llm = ChatGroq(model="llama3-70b-8192", api_key=os.getenv("GROQ_API_KEY"))

# build retrieval chain using LCEL
# this will take the user query and generate the answer
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [25]:
result = rag_chain.invoke("How to get your first customers?")
print(result)

Based on the provided context, to get your first customers, you can follow these steps:

1. Create a landing page: Buy a domain name and build a basic web page with a form for users to provide their contact information, social media links, and your logo.

2. Reach out to your immediate network: Start selling to your friends, family, and colleagues.

Additionally, the context suggests that you should focus on getting customers rather than spending too much time on building a website or designing a logo. It's more important to start selling and getting customers early on.

Note that the context does not provide a comprehensive guide to getting your first customers, but these two steps can be a good starting point.


In [63]:
result = rag_chain.invoke("How to increase your sales efforts?")
print(result)

Based on the provided context, here are some tips to increase your sales efforts:

1. Identify your authentic competitive advantage and leverage that to pitch your product.
2. Start selling immediately, don't spend too much time planning. Reach out to at least 10 people in the next 24 hours to practice taking action.
3. Increase your online searchability by generating valuable content.
4. Hustle and reach out to as many qualified prospects as possible, pitch your service, and close deals.
5. Create a sales script and establish a conversion funnel to optimize your sales process.
6. Iterate and refine your sales process by segmenting your sales team and empowering them to tweak or overhaul the process.
7. Analyze your customer acquisition costs (CAC) and make informed decisions to optimize your existing marketing channels.

Remember, the key is to take action, start selling, and continuously refine your sales process to increase your sales efforts.


# RAG evaluation with RAGAs

RAGAs is a framework for evaluating the performance of LLM applications, and RAG pipelines in particular. It can also generate test data from the original RAG knowledge documents. Evaluation runs on a dataset with the following features:

* questions: user queries which are given as input to the RAG pipeline.

* answers: the output generated by the RAG pipeline

* context: the documents retrieved from the vector store and used to answer the question

* ground truth: the real/correct answer to the question (can be human-annotated information)

Using the generated test set, RAGAs lets us evaluate the performance of our RAG pipeline. The evaluation can be component-wise, meaning on the retrieval or the generation component individually, or it can cover the full pipeline (retrieval + generation).

Many metrics can be used in the evaluation; the most common ones are:

### For retrieval:

- context_precision: evaluates whether the items in the retrieved context are relevant to the given question.

- context_recall: checks whether all the information needed to answer the question was retrieved. It is computed against the ground truth.
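
The two retrieval metrics can be sketched on toy data. This is a simplified illustration based on our reading of the RAGAs documentation, not the library's implementation: in RAGAs the relevance judgments come from an LLM, whereas here they are hard-coded, and the `toy_` function names are hypothetical.

```python
def toy_context_precision(relevance: list[int]) -> float:
    """Rank-aware precision: mean of precision@k over the ranks that hold a relevant chunk."""
    hits = 0
    score = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            score += hits / k  # precision@k, counted only at relevant ranks
    return score / hits if hits else 0.0


def toy_context_recall(attributed: list[bool]) -> float:
    """Share of ground-truth statements that can be found in the retrieved context."""
    return sum(attributed) / len(attributed) if attributed else 0.0


# Four retrieved chunks, only the last one relevant: relevant content is ranked low
print(toy_context_precision([0, 0, 0, 1]))  # 0.25
# All four chunks relevant
print(toy_context_precision([1, 1, 1, 1]))  # 1.0
# Two of three ground-truth statements are covered by the retrieved chunks
print(toy_context_recall([True, True, False]))
```

Precision penalizes relevant chunks that are ranked low, while recall penalizes ground-truth information that is missing from the retrieved chunks.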

### For generation:

- Faithfulness: measures the quality of the RAG pipeline's responses by evaluating whether the response factually aligns with the contents of the provided context.

- Answer relevancy: measures how relevant the generated response is to the given question.
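
As a rough sketch of the generation metrics, under our reading of the RAGAs documentation: faithfulness is the share of the answer's claims that the context supports, and answer relevancy is the mean cosine similarity between the original question and questions generated back from the answer. The claims, verdicts and 2-dimensional vectors below are made up for illustration; in RAGAs they come from an LLM and an embedding model.

```python
import math


def toy_faithfulness(claim_supported: dict[str, bool]) -> float:
    """Fraction of the answer's claims that the retrieved context supports."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported.values()) / len(claim_supported)


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either one is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def toy_answer_relevancy(question_vec: list[float], generated_vecs: list[list[float]]) -> float:
    """Mean cosine similarity between the question and questions re-derived from the answer."""
    sims = [cosine(question_vec, g) for g in generated_vecs]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical verdicts: two claims supported by the context, one not
verdicts = {
    "Create a landing page": True,
    "Sell to friends and family": True,
    "Hire a sales team first": False,
}
faithfulness_score = toy_faithfulness(verdicts)

question_vec = [1.0, 0.0]
generated = [[1.0, 0.0], [0.0, 1.0]]  # one on-topic question, one unrelated
relevancy_score = toy_answer_relevancy(question_vec, generated)
print(faithfulness_score, relevancy_score)
```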

### For the full pipeline:

- Semantic Answer Similarity: evaluates predicted answers using ground truth labels. It checks the semantic similarity of the predicted answer against the ground truth answer.

## Generating Testset

We can either generate the test data synthetically with the RAGAs TestsetGenerator, or write the question-answer pairs by hand and build the dataset from them.

### Manual method

We have already created the question-answer data and stored it in the "data/evaluation.txt" file.

In [29]:
# Initialize empty lists to store questions and correct answers (ground_truths)
questions = []
ground_truths = []

# Open the file for reading
with open("data/evaluation.txt", "r") as file:
    # Read each line from the file
    lines = file.readlines()

    # Loop through the lines
    i = 0
    while i < len(lines):
        # Check if the line starts with "Question:"
        if lines[i].startswith("Question:"):
            # Extract the question and answer
            question = lines[i][10:].strip()
            answer = lines[i+1][7:].strip()
            # Append to lists
            questions.append(question)
            ground_truths.append(answer)
            # Move to the next question
            i += 2
        else:
            i += 1
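
The file layout is not shown in this notebook; judging from the parsing logic, each "Question:" line is expected to be followed directly by an "Answer:" line. Under that assumption, the same logic can be wrapped in a function (a hypothetical `parse_qa` helper) and checked on an in-memory sample. This variant also skips a question with no answer line instead of raising an IndexError.

```python
def parse_qa(lines: list[str]) -> tuple[list[str], list[str]]:
    """Collect questions and answers from paired 'Question:' / 'Answer:' lines."""
    questions, ground_truths = [], []
    i = 0
    while i < len(lines):
        line = lines[i].strip()
        has_answer = i + 1 < len(lines) and lines[i + 1].strip().startswith("Answer:")
        if line.startswith("Question:") and has_answer:
            questions.append(line[len("Question:"):].strip())
            ground_truths.append(lines[i + 1].strip()[len("Answer:"):].strip())
            i += 2  # skip the answer line we just consumed
        else:
            i += 1  # blank line or unpaired entry: move on
    return questions, ground_truths


# Assumed layout; the last entry is deliberately left without an answer
sample = """Question: What is the primary focus for startups in their early stages?
Answer: Startups should prioritize acquiring customers to validate market demand and generate revenue.

Question: An unanswered question at the end of the file
"""
qs, gts = parse_qa(sample.splitlines())
print(len(qs), qs[0])
```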

In [30]:
print("Evaluation length:", len(questions))
print("Question:", questions[0])
print("Answer:", ground_truths[0])

Evaluation length: 40
Question: What is the primary focus for startups in their early stages?
Answer: Startups should prioritize acquiring customers to validate market demand and generate revenue.


In [32]:
from datasets import Dataset

answers = []
contexts = []

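# Inference: run the chain and record the retrieved chunks for every question
for query in questions:
    answers.append(rag_chain.invoke(query))
    # retriever.invoke replaces the deprecated get_relevant_documents
    contexts.append([doc.page_content for doc in retriever.invoke(query)])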

In [33]:
# To dict
data = {
    "question": questions,
    "answer": answers,
    "contexts": contexts,
    "ground_truth": ground_truths
}

# Convert dict to dataset
dataset = Dataset.from_dict(data)

### RAGAs generator

In [None]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

# add file name metadata to our documents
for document in documents:
    document.metadata['filename'] = document.metadata['source']

# generator will use groq llama3-70b & BAAI/bge-small-en-v1.5 embeddings
generator = TestsetGenerator.from_langchain(
    generator_llm=llm,
    critic_llm=llm,
    embeddings=embeddings
)

# generate a testset of 10 questions: 50% simple, 25% reasoning, 25% multi-context
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)

## Evaluation

In [None]:
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
)

result = evaluate(
    llm=llm,
    embeddings=embeddings,
    dataset=dataset,
    metrics=[
        context_precision,
        context_recall,
        faithfulness,
        answer_relevancy,
    ],
)

df = result.to_pandas()

In [96]:
df.head()

| | question | answer | contexts | ground_truth | context_precision | context_recall | faithfulness | answer_relevancy |
|---|---|---|---|---|---|---|---|---|
| 0 | What is the primary focus for startups in thei... | Based on the provided context, the primary foc... | [They focus primarily on the analytics featur... | Startups should prioritize acquiring customers... | 0.25 | 1.0 | 1.0 | 0.0 |
| 1 | What's the first step suggested to land the in... | According to the provided context, the first s... | [Milestone 1: The first 10 customers\nIf you d... | The first step recommended is to create a basi... | 1.0 | 1.0 | 1.0 | 0.986531 |
| 2 | How can one leverage their immediate network t... | To leverage your immediate network to find cus... | [You can add to your site later; right now all... | Entrepreneurs can leverage personal connection... | 1.0 | 1.0 | 1.0 | 0.994681 |
