[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/mongodb-developer/GenAI-Showcase/blob/main/notebooks/evals/ragas-evaluation.ipynb)


# RAG Series Part 3: Data Modeling Strategies for RAG

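In this notebook, we explore and evaluate several chunking strategies for RAG: fixed-size token chunking (with and without overlap), recursive splitting, language-aware recursive splitting for Python, and semantic chunking. For each strategy, we measure retrieval quality with the RAGAS context precision and context recall metrics. MongoDB Atlas Vector Search serves as the vector store.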


## Step 1: Install required libraries


In [1]:
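# ragas is pinned below 0.2 because the test set generation API used in Step 5 (ragas.testset.evolutions) changed in 0.2
! pip install -qU langchain langchain-openai langchain-mongodb langchain-experimental langchain-community "ragas<0.2" pymongo tqdm datasets nest_asyncio beautifulsoup4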

## Step 2: Set up prerequisites

- Set the MongoDB connection string. Follow the steps [here](https://www.mongodb.com/docs/manual/reference/connection-string/) to get the connection string from the Atlas UI.

- Set the OpenAI API key. Follow the steps [here](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key) to obtain an API key.


In [2]:
import os
import getpass
from openai import OpenAI

In [3]:
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")
openai_client = OpenAI()

In [4]:
MONGODB_URI = getpass.getpass("Enter your MongoDB connection string:")

## Step 3: Load the dataset


In [5]:
from langchain_community.document_loaders import WebBaseLoader

web_loader = WebBaseLoader(
    [
        "https://peps.python.org/pep-0483/",
        "https://peps.python.org/pep-0008/",
        "https://peps.python.org/pep-0257/",
    ]
)

pages = web_loader.load()

In [6]:
len(pages)

3

## Step 4: Define chunking functions


In [7]:
from langchain.text_splitter import (
    Language,
    RecursiveCharacterTextSplitter,
    TokenTextSplitter,
)
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings

In [25]:
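def fixed_token(docs, chunk_size, chunk_overlap):
    # Cut the text into fixed windows of `chunk_size` tokens; consecutive windows
    # share `chunk_overlap` tokens. TokenTextSplitter uses the "gpt2" tiktoken
    # encoding unless `encoding_name` is passed.
    splitter = TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return splitter.split_documents(docs)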
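To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a simplified, hypothetical re-implementation of a fixed-size sliding window over a list of token IDs. It is not LangChain's code (`TokenTextSplitter` also encodes and decodes the text with tiktoken), and the helper name `sliding_windows` is ours.

```python
from typing import List, Sequence


def sliding_windows(tokens: Sequence[int], chunk_size: int, chunk_overlap: int) -> List[List[int]]:
    """Cut `tokens` into windows of at most `chunk_size` items.

    Consecutive windows share `chunk_overlap` items, so each new window
    starts `chunk_size - chunk_overlap` positions after the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    windows = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        windows.append(list(tokens[start:end]))
        if end == len(tokens):
            break  # the last window already reaches the end of the input
        start += step
    return windows


# 300 fake token IDs with the notebook's smallest setting: 100 tokens, 15% overlap
chunks = sliding_windows(list(range(300)), chunk_size=100, chunk_overlap=15)
print([(c[0], c[-1]) for c in chunks])
# → [(0, 99), (85, 184), (170, 269), (255, 299)]
```

With `chunk_overlap=0` the step equals `chunk_size`. This corresponds to the "Fixed token without overlap" runs in Step 6.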

In [9]:
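def recursive_split(docs, chunk_size, chunk_overlap):
    # Split recursively on paragraph breaks, then line breaks, then spaces, then
    # characters, until each chunk fits; length is measured in cl100k_base tokens
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        encoding_name="cl100k_base",
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    return splitter.split_documents(docs)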
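The recursive strategy differs from the fixed window in *where* it cuts. It prefers paragraph breaks, then line breaks, then spaces, and only cuts mid-word as a last resort. The sketch below is a simplified, hypothetical version of that idea (`recursive_split_text` is our own name). Unlike `RecursiveCharacterTextSplitter`, it measures length in characters rather than tiktoken tokens and does not add overlap.

```python
from typing import List, Sequence

DEFAULT_SEPARATORS = ("\n\n", "\n", " ", "")


def recursive_split_text(text: str, chunk_size: int, separators: Sequence[str] = DEFAULT_SEPARATORS) -> List[str]:
    """Split `text` into chunks of at most `chunk_size` characters,
    trying coarse separators before finer ones."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # No separator left: fall back to hard cuts every `chunk_size` characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: split it with the next, finer separator
            chunks.extend(recursive_split_text(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


print(recursive_split_text("aaaa bbbb\n\ncccc dddd eeee", chunk_size=10))
# → ['aaaa bbbb', 'cccc dddd', 'eeee']
```

The first paragraph fits and stays whole. The second is too long, so the splitter falls through to spaces instead of cutting it at an arbitrary character.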

In [10]:
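def recursive_python_split(docs, chunk_size, chunk_overlap, language):
    # Language-aware recursive split: tries the language's own separators first
    # (for Python, class and function definitions), then paragraphs, lines and
    # words. Note that chunk length is measured in characters here, not tokens.
    splitter = RecursiveCharacterTextSplitter.from_language(
        language=language,
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    return splitter.split_documents(docs)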
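`from_language` only changes which separators the recursive splitter tries first. To our knowledge, for `Language.PYTHON` LangChain tries class and function boundaries before blank lines, lines and spaces. The tuple below reproduces that ordering as an assumption; you can confirm it with `RecursiveCharacterTextSplitter.get_separators_for_language(Language.PYTHON)`. The helper `first_matching_separator` is hypothetical and only shows which separator would drive the first split.

```python
from typing import Sequence

# Assumed separator priority for Language.PYTHON (coarsest first)
PYTHON_SEPARATORS = ("\nclass ", "\ndef ", "\n\tdef ", "\n\n", "\n", " ", "")


def first_matching_separator(text: str, separators: Sequence[str] = PYTHON_SEPARATORS) -> str:
    """Return the highest-priority separator present in `text` ("" means a hard cut)."""
    for sep in separators:
        if sep == "" or sep in text:
            return sep
    return ""


code = "import os\n\ndef load():\n    pass\n\ndef save():\n    pass\n"
prose = "Use 4 spaces per indentation level."

print(repr(first_matching_separator(code)))   # → '\ndef '
print(repr(first_matching_separator(prose)))  # → ' '
```

On source code this keeps whole functions together where possible. On ordinary prose it falls back to the same paragraph, line and word boundaries as the generic recursive splitter.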

In [11]:
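def semantic_split(docs):
    # Start a new chunk wherever the embedding distance between consecutive
    # sentences exceeds a percentile (95th by default) of all such distances
    splitter = SemanticChunker(
        OpenAIEmbeddings(), breakpoint_threshold_type="percentile"
    )
    return splitter.split_documents(docs)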
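`SemanticChunker` with `breakpoint_threshold_type="percentile"` works as follows. It embeds the text sentence by sentence and measures the cosine distance between neighboring sentences. It then starts a new chunk wherever that distance is above a percentile of all the distances (95 by default, to our knowledge). Below is a hypothetical, dependency-free sketch of that rule (`semantic_chunks`, `cosine_distance` and `percentile` are our own names). It uses toy 2-D vectors in place of OpenAI embeddings and skips the sentence splitting and neighbor buffering that the real class performs.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (same convention as numpy's default)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def semantic_chunks(sentences: List[str], embeddings: List[Sequence[float]], breakpoint_percentile: float = 95) -> List[str]:
    # Distance between each sentence and the next one
    distances = [cosine_distance(embeddings[i], embeddings[i + 1]) for i in range(len(embeddings) - 1)]
    if not distances:
        return [" ".join(sentences)] if sentences else []
    threshold = percentile(distances, breakpoint_percentile)

    chunks, start = [], 0
    for i, distance in enumerate(distances):
        if distance > threshold:  # topic shift: close the current chunk after sentence i
            chunks.append(" ".join(sentences[start:i + 1]))
            start = i + 1
    chunks.append(" ".join(sentences[start:]))
    return chunks


sentences = ["Use 4 spaces.", "Never mix tabs.", "Docstrings use triple quotes.", "Start with a summary line."]
vectors = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
print(semantic_chunks(sentences, vectors))
# → ['Use 4 spaces. Never mix tabs.', 'Docstrings use triple quotes. Start with a summary line.']
```

The threshold is relative: it is a percentile of the document's own distances. As a result, chunk boundaries follow the content rather than a fixed size. This is why semantic chunking runs only once in Step 6, outside the chunk-size sweep.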

## Step 5: Generate the evaluation dataset


In [12]:
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Generator LLM that writes the questions, critic LLM that reviews them, and embeddings for the documents
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)

# Question type distribution: 50% simple, 40% multi-context, 10% reasoning
distributions = {simple: 0.5, multi_context: 0.4, reasoning: 0.1}

# Generate a test set of 10 questions from the loaded pages
testset = generator.generate_with_langchain_docs(pages, 10, distributions)

Filename and doc_id are the same for all nodes.                 
Generating: 100%|██████████| 10/10 [01:41<00:00, 10.13s/it]
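With only 10 questions, the `distributions` weights translate into very few questions of each type. The hypothetical helper below computes nominal per-type counts using the largest-remainder method. RAGAS may round differently internally. It is a quick sanity check on whether the smaller categories are represented at all.

```python
import math
from typing import Dict


def nominal_counts(distribution: Dict[str, float], test_size: int) -> Dict[str, int]:
    """Split `test_size` across question types in proportion to their weights."""
    if not math.isclose(sum(distribution.values()), 1.0):
        raise ValueError("distribution weights must sum to 1")
    raw = {name: weight * test_size for name, weight in distribution.items()}
    counts = {name: math.floor(value) for name, value in raw.items()}
    # Hand out the remaining questions to the largest fractional parts
    leftover = test_size - sum(counts.values())
    by_remainder = sorted(raw, key=lambda name: raw[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


print(nominal_counts({"simple": 0.5, "multi_context": 0.4, "reasoning": 0.1}, 10))
# → {'simple': 5, 'multi_context': 4, 'reasoning': 1}
```

So at most one reasoning question is expected here. Any conclusion about that question type rests on a single example.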


In [13]:
testset = testset.to_pandas()
testset.head()

|   | question | contexts | ground_truth | evolution_type | metadata | episode_done |
|---|---|---|---|---|---|---|
| 0 | What is the recommended approach for implement... | [ations.\n\nComparisons to singletons like Non... | When implementing ordering operations with ric... | simple | [{'source': 'https://peps.python.org/pep-0008/... | True |
| 1 | What is the recommended encoding for code in t... | [ in adjacent columns.\nThe default wrapping i... | Code in the core Python distribution should al... | simple | [{'source': 'https://peps.python.org/pep-0008/... | True |
| 2 | What are function annotations and how are they... | [ presence increases code understandability.\n... | The Python standard library should be conserva... | simple | [{'source': 'https://peps.python.org/pep-0008/... | True |
| 3 | What are the guidelines for writing Pythonic c... | [ be removed.\nWe don’t use the term “private”... | Public attributes should have no leading under... | simple | [{'source': 'https://peps.python.org/pep-0008/... | True |
| 4 | What is the difference between classes and typ... | [ at the top (it has all values)\nand bottom (... |  | simple | [{'source': 'https://peps.python.org/pep-0483/... | True |


## Step 6: Evaluate chunking strategies


In [14]:
from langchain_mongodb import MongoDBAtlasVectorSearch
from langchain_core.vectorstores import VectorStoreRetriever
from pymongo import MongoClient

client = MongoClient(MONGODB_URI)
DB_NAME = "evals"
COLLECTION_NAME = "chunking"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "vector_index"
MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]
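The cells below assume an Atlas Vector Search index named `vector_index` already exists on `evals.chunking`. If it does not, the following sketch creates one with PyMongo. To our knowledge, `SearchIndexModel` with `type="vectorSearch"` requires PyMongo 4.7 or later. The helper names are hypothetical. The sketch assumes LangChain's default `embedding` field for vectors and 1536 dimensions (the output size of `text-embedding-3-small`), so adjust both if your setup differs.

```python
from typing import Any, Dict


def vector_index_definition(num_dimensions: int = 1536, path: str = "embedding", similarity: str = "cosine") -> Dict[str, Any]:
    """Atlas Vector Search index definition for a single vector field."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


def create_vector_index(collection, name: str = "vector_index") -> str:
    """Create the index on `collection` (a pymongo Collection) and return its name."""
    # Imported here so the definition above can be inspected without PyMongo installed
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(definition=vector_index_definition(), name=name, type="vectorSearch")
    return collection.create_search_index(model=model)


# In the notebook: create_vector_index(MONGODB_COLLECTION, ATLAS_VECTOR_SEARCH_INDEX_NAME)
```

Index builds are asynchronous, so the index may take a short while to become queryable after this call returns.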

In [19]:
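def get_retriever(docs) -> MongoDBAtlasVectorSearch:
    # Embed the chunks with text-embedding-3-small and insert them into the collection.
    # Note: this returns the vector store itself (not a VectorStoreRetriever), which
    # perform_eval() queries directly via similarity_search().
    vector_store = MongoDBAtlasVectorSearch.from_documents(
        documents=docs,
        embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
        collection=MONGODB_COLLECTION,
        index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
    )

    return vector_store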

In [29]:
from tqdm import tqdm
import time
from datasets import Dataset
from ragas import evaluate, RunConfig
from ragas.metrics import context_precision, context_recall
import nest_asyncio

# Allow nested use of asyncio (used by RAGAS)
nest_asyncio.apply()

# Disable tqdm locks
tqdm.get_lock().locks = []

QUESTIONS = testset.question.to_list()
GROUND_TRUTH = testset.ground_truth.to_list()
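Row 4 of the test set preview above has an empty `ground_truth`. Context recall is computed against the ground truth, so such rows may distort the scores. Below is a minimal, hypothetical filter that is not part of the original notebook. It keeps only questions whose ground truth is a non-empty string, so it drops the missing value whether pandas stored it as `NaN` (a float) or as an empty string. The scores shown below were produced without this filter.

```python
from typing import Any, List, Sequence, Tuple


def drop_missing_ground_truth(questions: Sequence[str], ground_truths: Sequence[Any]) -> Tuple[List[str], List[str]]:
    """Keep only (question, ground_truth) pairs whose ground truth is a non-empty string."""
    kept = [
        (question, truth)
        for question, truth in zip(questions, ground_truths)
        if isinstance(truth, str) and truth.strip()
    ]
    return [q for q, _ in kept], [t for _, t in kept]


# Usage in the notebook, before perform_eval() runs:
# QUESTIONS, GROUND_TRUTH = drop_missing_ground_truth(QUESTIONS, GROUND_TRUTH)
qs, gts = drop_missing_ground_truth(["q1", "q2", "q3"], ["answer 1", float("nan"), "  "])
print(qs, gts)  # → ['q1'] ['answer 1']
```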

In [30]:
def perform_eval(docs):
    eval_data = {
        "question": QUESTIONS,
        "ground_truth": GROUND_TRUTH,
        "contexts": [],
    }

    # Start from an empty collection so only this strategy's chunks are searched
    print(f"Deleting existing documents in the collection {DB_NAME}.{COLLECTION_NAME}")
    MONGODB_COLLECTION.delete_many({})
    print("Deletion complete")
    vector_store = get_retriever(docs)
    # Atlas Vector Search indexes new documents asynchronously; wait briefly so the
    # fresh chunks are searchable (10 seconds is a rough guess, not a documented value)
    time.sleep(10)

    # Retrieve the top 3 chunks for each question in the evaluation dataset
    print("Getting contexts for evaluation set")
    for question in tqdm(QUESTIONS):
        eval_data["contexts"].append(
            [doc.page_content for doc in vector_store.similarity_search(question, k=3)]
        )
    # RAGAS expects a Hugging Face Dataset object
    dataset = Dataset.from_dict(eval_data)
    print("Running evals")
    # RAGAS runtime settings to avoid hitting OpenAI rate limits
    run_config = RunConfig(max_workers=4, max_wait=180)
    result = evaluate(
        dataset=dataset,
        metrics=[context_precision, context_recall],
        run_config=run_config,
        raise_exceptions=False,
    )
    return result
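To read the results, it helps to know what the two metrics compute once the LLM judge has produced its verdicts. The sketch below follows the formulas as we understand them from RAGAS 0.1. Context precision is an average precision over the ranked contexts, where each context is judged useful or not for the ground truth. Context recall is the fraction of ground-truth statements that can be attributed to the retrieved contexts. The function names mirror the metrics but are our own, and the verdict lists are made-up inputs; in RAGAS they come from LLM calls.

```python
from typing import Sequence


def context_precision(verdicts: Sequence[int]) -> float:
    """Average precision over ranked contexts; verdicts[i] is 1 if the i-th context is useful."""
    relevant = sum(verdicts)
    if relevant == 0:
        return 0.0
    score = 0.0
    for k, verdict in enumerate(verdicts, start=1):
        if verdict:
            score += sum(verdicts[:k]) / k  # precision@k, counted only at useful positions
    return score / relevant


def context_recall(attributed: Sequence[int]) -> float:
    """Fraction of ground-truth statements supported by the retrieved contexts."""
    return sum(attributed) / len(attributed) if attributed else 0.0


# k=3 contexts, as in perform_eval()
print(round(context_precision([1, 0, 1]), 4))  # useful chunks at ranks 1 and 3 → 0.8333
print(round(context_precision([0, 1, 1]), 4))  # useful chunks at ranks 2 and 3 → 0.5833
print(context_recall([1, 1, 0, 1]))            # 3 of 4 statements supported → 0.75
```

This is why ranking matters for precision. The same two useful chunks score 0.8333 at ranks 1 and 3, but only 0.5833 at ranks 2 and 3.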

In [31]:
for chunk_size in [100, 200, 500, 1000]:
    # The "with overlap" strategies use an overlap of 15% of the chunk size
    chunk_overlap = int(0.15 * chunk_size)
    print(f"CHUNK SIZE: {chunk_size}")
    print("------ Fixed token without overlap ------")
    print(f"Result: {perform_eval(fixed_token(pages, chunk_size, 0))}")
    print("------ Fixed token with overlap ------")
    print(f"Result: {perform_eval(fixed_token(pages, chunk_size, chunk_overlap))}")
    print("------ Recursive with overlap ------")
    print(f"Result: {perform_eval(recursive_split(pages, chunk_size, chunk_overlap))}")
    print("------ Recursive Python splitter with overlap ------")
    # from_language() measures length in characters rather than tokens, so sizes are
    # scaled by 4 (roughly 4 characters per token) to stay comparable
    print(
        f"Result: {perform_eval(recursive_python_split(pages, 4*chunk_size, 4*chunk_overlap, Language.PYTHON))}"
    )
# Semantic chunking has no chunk size parameter, so it runs once, outside the loop
print("------ Semantic chunking ------")
print(f"Result: {perform_eval(semantic_split(pages))}")

CHUNK SIZE: 100
------ Fixed token without overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:02<00:00,  4.92it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:21<00:00,  1.08s/it]


Result: {'context_precision': 0.9000, 'context_recall': 0.8300}
------ Fixed token with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:03<00:00,  3.33it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:20<00:00,  1.03s/it]


Result: {'context_precision': 0.8500, 'context_recall': 0.7643}
------ Recursive with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:01<00:00,  5.17it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:21<00:00,  1.07s/it]


Result: {'context_precision': 0.9333, 'context_recall': 0.8436}
------ Recursive Python splitter with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:02<00:00,  4.01it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:25<00:00,  1.27s/it]


Result: {'context_precision': 0.9833, 'context_recall': 0.9013}
CHUNK SIZE: 200
------ Fixed token without overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:01<00:00,  5.15it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:31<00:00,  1.56s/it]


Result: {'context_precision': 0.8583, 'context_recall': 0.8913}
------ Fixed token with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:02<00:00,  4.66it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:21<00:00,  1.06s/it]


Result: {'context_precision': 0.9500, 'context_recall': 0.8765}
------ Recursive with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:01<00:00,  5.35it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:24<00:00,  1.21s/it]


Result: {'context_precision': 1.0000, 'context_recall': 0.8226}
------ Recursive Python splitter with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:02<00:00,  4.54it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:29<00:00,  1.49s/it]


Result: {'context_precision': 0.8417, 'context_recall': 0.7953}
CHUNK SIZE: 500
------ Fixed token without overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:02<00:00,  4.80it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:17<00:00,  1.11it/s]


Result: {'context_precision': 0.6000, 'context_recall': 0.8209}
------ Fixed token with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:02<00:00,  4.99it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:20<00:00,  1.04s/it]


Result: {'context_precision': 0.5000, 'context_recall': 0.9050}
------ Recursive with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:01<00:00,  5.27it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:22<00:00,  1.14s/it]


Result: {'context_precision': 0.5583, 'context_recall': 0.8686}
------ Recursive Python splitter with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:02<00:00,  3.41it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:31<00:00,  1.57s/it]


Result: {'context_precision': 0.8833, 'context_recall': 1.0000}
CHUNK SIZE: 1000
------ Fixed token without overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:01<00:00,  5.03it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:25<00:00,  1.29s/it]


Result: {'context_precision': 0.6000, 'context_recall': 0.9219}
------ Fixed token with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:01<00:00,  5.13it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:18<00:00,  1.06it/s]


Result: {'context_precision': 0.7583, 'context_recall': 0.9000}
------ Recursive with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:02<00:00,  4.58it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:21<00:00,  1.07s/it]


Result: {'context_precision': 0.8000, 'context_recall': 0.8600}
------ Recursive Python splitter with overlap ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:02<00:00,  4.28it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [01:24<00:00,  4.21s/it]


Result: {'context_precision': 0.7833, 'context_recall': 0.9000}
------ Semantic chunking ------
Deleting existing documents in the collection evals.chunking
Deletion complete
Getting contexts for evaluation set


100%|██████████| 10/10 [00:03<00:00,  3.19it/s]


Running evals


Evaluating: 100%|██████████| 20/20 [00:19<00:00,  1.04it/s]

Result: {'context_precision': 0.7500, 'context_recall': 0.9000}
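The printed results are easier to compare side by side. The sketch below copies the scores from the output above into a list. It then ranks the strategies by the harmonic mean of context precision and context recall. That combined score is our own illustrative choice (an F1-style balance), not a RAGAS metric, and `rank_strategies` is a hypothetical helper. In a live run you could build the same rows from the `perform_eval()` results, which, as far as we know, support dict-style access such as `result["context_precision"]` in RAGAS 0.1.

```python
from statistics import harmonic_mean

# (strategy, chunk size, context_precision, context_recall), copied from the output above.
# Chunk sizes are in tokens; the Python splitter actually used 4x as many characters.
RESULTS = [
    ("Fixed token without overlap", 100, 0.9000, 0.8300),
    ("Fixed token with overlap", 100, 0.8500, 0.7643),
    ("Recursive with overlap", 100, 0.9333, 0.8436),
    ("Recursive Python splitter with overlap", 100, 0.9833, 0.9013),
    ("Fixed token without overlap", 200, 0.8583, 0.8913),
    ("Fixed token with overlap", 200, 0.9500, 0.8765),
    ("Recursive with overlap", 200, 1.0000, 0.8226),
    ("Recursive Python splitter with overlap", 200, 0.8417, 0.7953),
    ("Fixed token without overlap", 500, 0.6000, 0.8209),
    ("Fixed token with overlap", 500, 0.5000, 0.9050),
    ("Recursive with overlap", 500, 0.5583, 0.8686),
    ("Recursive Python splitter with overlap", 500, 0.8833, 1.0000),
    ("Fixed token without overlap", 1000, 0.6000, 0.9219),
    ("Fixed token with overlap", 1000, 0.7583, 0.9000),
    ("Recursive with overlap", 1000, 0.8000, 0.8600),
    ("Recursive Python splitter with overlap", 1000, 0.7833, 0.9000),
    ("Semantic chunking", None, 0.7500, 0.9000),
]


def rank_strategies(results):
    """Sort rows by the harmonic mean of precision and recall, best first."""
    scored = [(name, size, harmonic_mean([p, r])) for name, size, p, r in results]
    return sorted(scored, key=lambda row: row[2], reverse=True)


for name, size, score in rank_strategies(RESULTS)[:3]:
    print(f"{name} (chunk size {size}): {score:.4f}")
```

On this 10-question test set, the Python-aware recursive splitter comes out on top at both 100 and 500 tokens. However, the margin between those two runs is small and each score comes from a single LLM-judged run. Treat the ranking as indicative rather than conclusive.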



