[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/evals/ragas-evaluation.ipynb)

# RAG Series Part 2: How to evaluate your RAG application

This notebook shows how to evaluate a RAG application using the [RAGAS](https://docs.ragas.io/en/stable/index.html) framework.

## Step 1: Install required libraries

In [42]:
! pip install -qU ragas datasets pandas langchain langchain-mongodb langchain-openai langchain-anthropic openai pymongo tiktoken tqdm nest_asyncio

## Step 2: Set up prerequisites

* Set the MongoDB connection string. Follow the steps [here](https://www.mongodb.com/docs/manual/reference/connection-string/) to get the connection string from the Atlas UI.

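* Set the OpenAI API key. Steps to obtain an API key are [here](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key).

* Set the Anthropic API key, which Step 7 uses to generate answers with a Claude model. You can create a key in the Anthropic Console.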

In [2]:
import os
import getpass
from openai import OpenAI

In [3]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")
openai_client = OpenAI()

Enter your OpenAI API Key:········


In [4]:
MONGODB_URI = getpass.getpass("Enter your MongoDB connection string:")

Enter your MongoDB connection string:········


In [46]:
os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter your Anthropic API Key:")

Enter your Anthropic API Key:········
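Re-entering three secrets every time the kernel restarts gets tedious. A minimal sketch, assuming you may already have exported some of these values as environment variables, that prompts only for the ones that are missing. The helper name `get_secret` is hypothetical, not part of any library.

```python
import os
import getpass


def get_secret(env_var: str, prompt: str) -> str:
    """Return env_var from the environment, prompting (and caching) only if it is unset."""
    value = os.environ.get(env_var)
    if not value:
        # Nothing exported yet: ask interactively without echoing the input
        value = getpass.getpass(prompt)
        # Cache it so client libraries that read os.environ pick it up as well
        os.environ[env_var] = value
    return value


# Hypothetical usage mirroring the cells above:
# get_secret("OPENAI_API_KEY", "Enter your OpenAI API Key:")
# get_secret("ANTHROPIC_API_KEY", "Enter your Anthropic API Key:")
# MONGODB_URI = get_secret("MONGODB_URI", "Enter your MongoDB connection string:")
```

The OpenAI and Anthropic clients read their keys from the environment, so caching the values there is all they need; `MONGODB_URI` is only stored the same way for uniformity.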


## Step 3: Download the Hugging Face dataset

In [5]:
from datasets import load_dataset
import pandas as pd

In [6]:
data = load_dataset("explodinggradients/ragas-wikiqa", split="train", streaming=True)
data_head = data.take(50)
df = pd.DataFrame(data_head)

In [7]:
df.head(1)

| Unnamed: 0 | question | correct_answer | incorrect_answer | question_id | generated_with_rag | context | generated_without_rag |
|---|---|---|---|---|---|---|---|
| 0 | HOW AFRICAN AMERICANS WERE IMMIGRATED TO THE US | As such, African immigrants are to be distingu... | From the Immigration and Nationality Act of 19... | Q0 | \nAfrican Americans were immigrated to the Uni... | [African immigration to the United States refe... | African Americans were immigrated to the US in... |


## Step 4: Chunk up the documents

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [9]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",
    keep_separator=False,
    chunk_size=200,
    chunk_overlap=30)
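`from_tiktoken_encoder` makes the splitter measure `chunk_size` and `chunk_overlap` in `cl100k_base` tokens rather than characters. `RecursiveCharacterTextSplitter` itself first splits on separators (by default paragraphs, then lines, then spaces) and merges the pieces back up to the size limit, so its boundaries tend to fall on natural breaks. The sketch below is a deliberately simplified fixed-window splitter, not LangChain's algorithm, that only shows how the two parameters interact: each window advances by `chunk_size - chunk_overlap` tokens, so in this sketch every chunk repeats the last 30 tokens of the previous one.

```python
from typing import List, Sequence


def sliding_window_chunks(
    tokens: Sequence[str], chunk_size: int = 200, chunk_overlap: int = 30
) -> List[List[str]]:
    """Split a token sequence into fixed-size windows that overlap by chunk_overlap tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window reaches the end, so no chunk consists only of overlap
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Made-up "words" stand in for tiktoken tokens here
words = [f"w{i}" for i in range(450)]
chunks = sliding_window_chunks(words)
print([len(c) for c in chunks])  # [200, 200, 110]
```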

In [10]:
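def split_texts(texts):
    """Split each passage in `texts` into token-bounded chunks and return them as one flat list."""
    chunked_texts = []
    for text in texts:
        # split_text returns the chunk strings directly, without Document wrappers
        chunked_texts.extend(text_splitter.split_text(text))
    return chunked_texts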

In [11]:
# Each row's `context` is a list of passages, so every row yields a list of chunks
df["chunks"] = df["context"].apply(split_texts)

In [12]:
# Flatten the per-row chunk lists into a single list of chunk strings
docs = [chunk for row_chunks in df["chunks"] for chunk in row_chunks]

In [13]:
len(docs)

778

In [14]:
docs[100]

'Figgis had problems because permits were not issued for some street scenes. This caused him to film some scenes on the Las Vegas strip in one take to avoid the police, which Figgis said benefited production and the authenticity of the acting, remarking "I\'ve always hated the convention of shooting on a street, and then having to stop the traffic, and then having to tell the actors, \'Well, there\'s meant to be traffic here, so you\'re going to have to shout.\' And they\'re shouting, but it\'s quiet and they feel really stupid, because it\'s unnatural. You put them up against a couple of trucks, with it all happening around them, and their voices become great". Filming took place over 28 days.'

## Step 5: Create embeddings and ingest them into MongoDB

In [15]:
from typing import List
from pymongo import MongoClient
from tqdm.auto import tqdm

In [16]:
def get_embeddings(
    docs: List[str], model: str = "text-embedding-3-large"
) -> List[List[float]]:
    """
    Generate embeddings using the OpenAI API.

    Args:
        docs (List[str]): List of texts to embed
        model (str, optional): Model name. Defaults to "text-embedding-3-large".

    Returns:
        List[List[float]]: One embedding vector per input text
    """
    # Replace newlines, which can negatively affect performance.
    docs = [doc.replace("\n", " ") for doc in docs]
    response = openai_client.embeddings.create(input=docs, model=model)
    return [r.embedding for r in response.data]

In [17]:
client = MongoClient(MONGODB_URI)
DB_NAME = "ragas_evals"
db = client[DB_NAME]

In [18]:
batch_size = 128
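The ingestion loop below sends chunks to the embeddings endpoint in slices of `batch_size` rather than one request per chunk. With the 778 chunks produced in Step 4 and `batch_size = 128`, that is ceil(778 / 128) = 7 requests per model, the last one carrying the remaining 10 chunks. A small stdlib generator expresses the same slicing; the name `batched` is hypothetical here, although Python 3.12 ships a similar `itertools.batched` that yields tuples.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items, like docs[i:i + batch_size]."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


batch_sizes = [len(b) for b in batched(range(778), 128)]
print(batch_sizes)  # [128, 128, 128, 128, 128, 128, 10]
```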

In [19]:
EVAL_EMBEDDING_MODELS = ["text-embedding-ada-002", "text-embedding-3-small"]

In [20]:
for model in EVAL_EMBEDDING_MODELS:
    embedded_docs = []
    print(f"Getting embeddings for the {model} model")
    for i in tqdm(range(0, len(docs), batch_size)):
        batch = docs[i : i + batch_size]  # slicing past the end is safe in Python
        # Generate embeddings for the current batch
        batch_embeddings = get_embeddings(batch, model)
        # Pair each chunk with its vector; the retriever later reads "text" and "embedding"
        embedded_docs.extend(
            {"text": text, "embedding": embedding}
            for text, embedding in zip(batch, batch_embeddings)
        )
    print(f"Finished getting embeddings for the {model} model")

    print(f"Inserting embeddings for the {model} model")
    # One collection per embedding model, named after the model
    collection = db[model]
    collection.delete_many({})  # clear documents left over from a previous run
    collection.insert_many(embedded_docs)
    print(f"Finished inserting embeddings for the {model} model")

Getting embeddings for the text-embedding-ada-002 model


Finished getting embeddings for the text-embedding-ada-002 model
Inserting embeddings for the text-embedding-ada-002 model
Finished inserting embeddings for the text-embedding-ada-002 model
Getting embeddings for the text-embedding-3-small model


Finished getting embeddings for the text-embedding-3-small model
Inserting embeddings for the text-embedding-3-small model
Finished inserting embeddings for the text-embedding-3-small model
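`get_retriever` in Step 6 queries an Atlas Vector Search index named `vector_index` on each collection, and queries will not return results until that index exists and has finished building. If you have not created it in the Atlas UI, the following is a minimal sketch that creates it from Python. It assumes PyMongo 4.7 or later (for the `type` argument of `SearchIndexModel`), an Atlas cluster, and cosine similarity; both `text-embedding-ada-002` and `text-embedding-3-small` return 1536-dimensional vectors by default. The helper names are hypothetical.

```python
from typing import Any, Dict


def vector_index_definition(num_dimensions: int = 1536, path: str = "embedding") -> Dict[str, Any]:
    """Atlas Vector Search index definition for the field written during ingestion."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,  # must match the field name used in insert_many
                "numDimensions": num_dimensions,  # must match the embedding model's output size
                "similarity": "cosine",
            }
        ]
    }


def create_vector_index(collection, name: str = "vector_index", num_dimensions: int = 1536) -> str:
    """Create the index on a PyMongo collection and return its name; Atlas builds it asynchronously."""
    # Imported here so vector_index_definition works without PyMongo installed
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=vector_index_definition(num_dimensions),
        name=name,
        type="vectorSearch",
    )
    return collection.create_search_index(model=model)


# Usage against the collections created above (requires the Atlas connection):
# for model in EVAL_EMBEDDING_MODELS:
#     create_vector_index(db[model])
```

Because the build is asynchronous, wait until the index shows as ready in the Atlas UI (or in the output of `collection.list_search_indexes()`) before running the retrieval evaluation.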


## Step 6: Compare Embedding Models for the Retriever

In [21]:
from langchain_openai import OpenAIEmbeddings
from langchain_mongodb import MongoDBAtlasVectorSearch
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, context_recall
import nest_asyncio

# Jupyter already runs an event loop; this lets RAGAS start its async evaluation inside it
nest_asyncio.apply()

In [26]:
def get_retriever(model):
    """Return a top-5 similarity retriever over the collection holding `model`'s embeddings."""
    # Queries must be embedded with the same model that produced the stored vectors
    embeddings = OpenAIEmbeddings(model=model)

    vector_store = MongoDBAtlasVectorSearch.from_connection_string(
        connection_string=MONGODB_URI,
        namespace=f"{DB_NAME}.{model}",
        embedding=embeddings,
        index_name="vector_index",
        text_key="text",
    )

    retriever = vector_store.as_retriever(search_type="similarity", search_kwargs={"k": 5})
    return retriever
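With `search_type="similarity"` and `k=5`, the retriever embeds the question with the same model and asks Atlas for the five stored chunks whose vectors are closest to it. Atlas answers this with approximate nearest-neighbour search; the exact brute-force version below shows the ranking being approximated, assuming the cosine similarity configured on the index. It is pure Python with toy 3-dimensional vectors standing in for real 1536-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], stored: List[Tuple[str, Sequence[float]]], k: int = 5) -> List[str]:
    """Return the texts of the k stored vectors most similar to the query, best first."""
    ranked = sorted(stored, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "chunks" with hand-made vectors standing in for real embeddings
stored = [
    ("chunk about immigration", [0.9, 0.1, 0.0]),
    ("chunk about filming", [0.0, 0.2, 0.9]),
    ("chunk about census data", [0.7, 0.3, 0.1]),
]
print(top_k([1.0, 0.2, 0.0], stored, k=2))  # ['chunk about immigration', 'chunk about census data']
```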

In [23]:
QUESTIONS = df["question"].to_list()
GROUND_TRUTH = df["correct_answer"].tolist()

In [28]:
for model in EVAL_EMBEDDING_MODELS:
    retriever = get_retriever(model)
    # Retrieve the top-5 chunks for every question with this model's retriever
    contexts = [
        [doc.page_content for doc in retriever.invoke(question)]
        for question in tqdm(QUESTIONS)
    ]
    # One row per question: the question, its reference answer, and the retrieved chunks
    dataset = Dataset.from_dict(
        {"question": QUESTIONS, "ground_truth": GROUND_TRUTH, "contexts": contexts}
    )

    result = evaluate(dataset=dataset, metrics=[context_precision, context_recall])
    print(f"Result for the {model} model: {result}")

Result for the text-embedding-ada-002 model: {'context_precision': 0.8580, 'context_recall': 0.9050}


Result for the text-embedding-3-small model: {'context_precision': 0.8586, 'context_recall': 0.9217}
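Both retriever metrics are computed from per-item verdicts that RAGAS obtains from an LLM judge. Per the RAGAS documentation, context precision rewards rankings that put relevant chunks first: it sums precision@k over the ranks k that hold a relevant chunk and divides by the number of relevant chunks in the top K. Context recall is the fraction of statements in the ground-truth answer that can be attributed to the retrieved context. The sketch below reproduces only that final arithmetic; the verdicts, which RAGAS gets by prompting an LLM, are passed in by hand, and RAGAS's exact prompts and edge-case handling may differ.

```python
from typing import Sequence


def context_precision(relevant: Sequence[bool]) -> float:
    """Mean of precision@k over the 1-based ranks k whose chunk was judged relevant."""
    hits = 0
    total = 0.0
    for k, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / k  # precision@k, counted only where the chunk at rank k is relevant
    return total / hits if hits else 0.0


def context_recall(attributed: Sequence[bool]) -> float:
    """Share of ground-truth statements that the retrieved context supports."""
    return sum(attributed) / len(attributed) if attributed else 0.0


# Five retrieved chunks; the judge marked the 1st and 3rd as relevant
print(round(context_precision([True, False, True, False, False]), 4))  # 0.8333
# Ground truth split into four statements; three are supported by the context
print(context_recall([True, True, True, False]))  # 0.75
```

The printed results are means of these per-question scores over the 50 questions; in the run above, `text-embedding-3-small` scores higher on both metrics, and it is the model the next step uses for retrieval.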


## Step 7: Compare Completion Models for the Generator

In [44]:
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from ragas.metrics import faithfulness, answer_relevancy

In [33]:
def format_docs(docs):
    """Join the retrieved chunks into one context string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def get_rag_chain(model):
    # Retrieve context with text-embedding-3-small (the higher-scoring model in Step 6),
    # and pass the user question through unchanged
    retrieve = {
        "context": get_retriever("text-embedding-3-small") | format_docs,
        "question": RunnablePassthrough(),
    }
    template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
    # Defining the chat prompt
    prompt = ChatPromptTemplate.from_template(template)
    # Defining the model to be used for chat completion
    if model == "openai":
        llm = ChatOpenAI(temperature=0)
    elif model == "anthropic":
        # Reads ANTHROPIC_API_KEY from the environment (set in Step 2)
        llm = ChatAnthropic(temperature=0, model="claude-3-sonnet-20240229")
    else:
        raise ValueError(f"Unsupported model: {model!r}")
    # Parse output as a string
    parse_output = StrOutputParser()

    # Naive RAG chain
    rag_chain = retrieve | prompt | llm | parse_output
    return rag_chain
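The dictionary at the head of the chain runs two branches on the same input: the question goes to the retriever, whose documents are joined into one string, and is also passed through unchanged; the two results fill the `{context}` and `{question}` slots of the prompt. A plain-Python sketch of that assembly, with `str.format` standing in for `ChatPromptTemplate` and a hypothetical `fake_retrieve` returning made-up chunks in place of the Atlas retriever:

```python
from typing import Callable, List

TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def build_prompt(question: str, retrieve: Callable[[str], List[str]]) -> str:
    """Fill the RAG template the way the chain's first two steps do."""
    # "context" branch: retrieve chunks for the question and join them with blank lines
    context = "\n\n".join(retrieve(question))
    # "question" branch: the input passes through unchanged
    return TEMPLATE.format(context=context, question=question)


def fake_retrieve(question: str) -> List[str]:
    """Hypothetical stand-in for the Atlas retriever; ignores the question."""
    return ["First retrieved chunk.", "Second retrieved chunk."]


print(build_prompt("Who directed the film?", fake_retrieve))
```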

In [None]:
for model in ["anthropic", "openai"]:
    # Same retriever the chain uses, so the evaluated contexts should match what the LLM saw
    retriever = get_retriever("text-embedding-3-small")
    rag_chain = get_rag_chain(model)

    answers, contexts = [], []
    for question in tqdm(QUESTIONS):
        answers.append(rag_chain.invoke(question))
        contexts.append([doc.page_content for doc in retriever.invoke(question)])

    dataset = Dataset.from_dict(
        {"question": QUESTIONS, "ground_truth": GROUND_TRUTH, "contexts": contexts, "answer": answers}
    )

    result = evaluate(dataset=dataset, metrics=[faithfulness, answer_relevancy])
    print(f"Result for the {model} model: {result}")
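Per the RAGAS documentation, the two generator metrics work differently. Faithfulness breaks the answer into individual claims, has an LLM check each one against the retrieved context, and reports the supported fraction. Answer relevancy has an LLM generate a few questions from the answer and averages the cosine similarity between their embeddings and the embedding of the original question, so an answer that drifts off-topic scores lower. The sketch below covers only the final arithmetic, with hand-written verdicts and toy 2-dimensional vectors in place of the LLM and embedding calls.

```python
import math
from typing import Sequence


def faithfulness_score(claim_supported: Sequence[bool]) -> float:
    """Fraction of the answer's claims that the retrieved context supports."""
    return sum(claim_supported) / len(claim_supported) if claim_supported else 0.0


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def answer_relevancy_score(
    question_vec: Sequence[float], generated_vecs: Sequence[Sequence[float]]
) -> float:
    """Mean cosine similarity between the original question and questions generated from the answer."""
    if not generated_vecs:
        return 0.0
    return sum(cosine(question_vec, g) for g in generated_vecs) / len(generated_vecs)


# The answer made four claims; the context backs three of them
print(faithfulness_score([True, True, False, True]))  # 0.75
# One generated question matches the original exactly, one is orthogonal to it
print(answer_relevancy_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 0.5
```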