# RAG Chunking Systems and Effectiveness Analysis

**Introduction**
 Retrieval-Augmented Generation (RAG) systems enhance the effectiveness of large language models (LLMs) by incorporating external knowledge sources.
 This notebook explores different chunking methods used in RAG systems and evaluates their effectiveness.

 **Objectives:**
 1. Define each chunking method.
 2. Implement code snippets for demonstration.
 3. Use recent papers for evaluation insights.
 4. Incorporate practical examples from the [Chunking Strategies Tutorial](https://github.com/ALucek/chunking-strategies/blob/main/chunking.ipynb).

In [1]:
!pip install -qU langchain_experimental langchain_openai langchain_community langchain ragas faiss-cpu tiktoken

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/209.2 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m209.2/209.2 kB[0m [31m13.7 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/54.3 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m54.3/54.3 kB[0m [31m4.8 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m60.3 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.0/1.0 MB[0m [31m35.7 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m176.9/176.9 kB[0m [31m14.1 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m27.5/27.5 MB[0m [31m44.5 MB/s[0m eta [36m0:00:00[0m
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m

In [7]:
import numpy as np
import matplotlib.pyplot as plt
from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
from ragas import evaluate
from datasets import Dataset
from sklearn.cluster import KMeans
from transformers import AutoTokenizer, AutoModelForCausalLM

**1. Fixed-Size Chunking**
Fixed-size chunking splits the document into equal-sized chunks, irrespective of content boundaries.

In [5]:
import requests

url = "https://huyenchip.com/2025/01/07/agents.html"
response = requests.get(url)

if response.status_code == 200:
    source_text = response.text
    # print(source_text)
else:
    print(f"Failed to fetch the URL. Status code: {response.status_code}")

In [22]:
fixed_splitter = CharacterTextSplitter(chunk_size=100, chunk_overlap=10)
fixed_chunks = fixed_splitter.split_text(source_text)
# print("Fixed-Size Chunks:")
# for chunk in fixed_chunks:
#     print(chunk)



 **2. Recursive Character Text Splitting (Naive Chunking)**
 Traditional non-semantic chunking splits text based on character limits and ignores semantic coherence.

In [24]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=0,
    length_function=len,
    is_separator_regex=False
)

naive_chunks = text_splitter.split_text(source_text)
# print("Naive Chunks:")
# for chunk in naive_chunks:
#     print(chunk)

 **3. Semantic Chunking**
 Semantic Chunking identifies meaningful boundaries using embedding models.

In [12]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

Enter your OpenAI API Key:··········


In [25]:
# Example: Semantic Chunking
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings


semantic_chunker = SemanticChunker(OpenAIEmbeddings(model="text-embedding-3-large"), breakpoint_threshold_type="percentile")
semantic_chunks = semantic_chunker.create_documents([source_text])

# for semantic_chunk in semantic_chunks:
#     print(semantic_chunk.page_content)
#     print(len(semantic_chunk.page_content))

**4. RAG Pipeline with Semantic Chunking**
 Create a RAG LCEL chain leveraging the Semantic Chunking method.

In [26]:
from langchain.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Prepare Vectorstore
vectorstore = FAISS.from_texts(
    [chunk.page_content for chunk in semantic_chunks],
    embedding=OpenAIEmbeddings(model="text-embedding-3-large")
)
semantic_chunk_retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

In [27]:
# Prepare RAG Chain
rag_template = """
Use the following context to answer the user's query. If you cannot answer, please respond with 'I don't know'.

User's Query:
{question}

Context:
{context}
"""
rag_prompt = ChatPromptTemplate.from_template(rag_template)
base_model = ChatOpenAI()

semantic_rag_chain = (
    {"context": semantic_chunk_retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | base_model
    | StrOutputParser()
)

# Test RAG Pipeline
question = "How do agents orchestrate LLMs and tools?"
response = semantic_rag_chain.invoke(question)
print(response)

To orchestrate LLMs and tools, agents need access to external tools that allow them to perceive the environment (read-only actions) and act upon it (write actions). The tools in an agent's tool inventory significantly impact what tasks the agent can accomplish. Agents also require a strong AI planner to determine sequences of actions needed to complete tasks successfully.


In [31]:
naive_chunk_vectorstore = FAISS.from_texts(naive_chunks, embedding=OpenAIEmbeddings(model="text-embedding-3-large"))
naive_chunk_retriever = naive_chunk_vectorstore.as_retriever(search_kwargs={"k" : 15})

naive_rag_chain = (
    {"context" : naive_chunk_retriever, "question" : RunnablePassthrough()}
    | rag_prompt
    | base_model
    | StrOutputParser()
)

# Test RAG Pipeline
question = "How do agents orchestrate LLMs and tools?"
response = naive_rag_chain.invoke(question)
print(response)

Agents orchestrate LLMs and tools by utilizing various tools that enhance their capabilities. More tools give an agent more capabilities, but it can also make it challenging to understand and utilize them efficiently. Agents are equipped with different sets of tools based on their environment and the specific task at hand. Experimentation and careful consideration are required when selecting the best set of tools for an agent to use.


 **5. Evaluation with Ragas**
 Evaluate RAG performance using metrics like answer relevancy and context precision.

In [32]:
# Generate Evaluation Dataset
questions = [
    "What are agents used for in the context of LLMs?",
    "How do agents orchestrate tools efficiently?",
]
answers = [semantic_rag_chain.invoke(q) for q in questions]
contexts = [[chunk.page_content for chunk in semantic_chunks[:5]]] * len(questions)

qagc_list = [
    {"question": q, "answer": a, "contexts": c, "ground_truth": a}
    for q, a, c in zip(questions, answers, contexts)
]

eval_dataset = Dataset.from_list(qagc_list)

# Evaluate with Ragas
result = evaluate(
    eval_dataset,
    metrics=[context_precision, faithfulness, answer_relevancy, context_recall],
)
print(result)

Evaluating:   0%|          | 0/8 [00:00<?, ?it/s]

{'context_precision': 1.0000, 'faithfulness': 0.8750, 'answer_relevancy': 0.9345, 'context_recall': 1.0000}


**6. Ragas Assessment Comparison**
 Leverage Ragas to compare chunking strategies.

In [33]:
# Split documents using a different chunking strategy
synthetic_data_splitter = RecursiveCharacterTextSplitter(
    chunk_size=400,
    chunk_overlap=0,
    length_function=len,
    is_separator_regex=False
)
synthetic_data_chunks = synthetic_data_splitter.create_documents([source_text])

# Generate synthetic questions, contexts, and ground truths
questions = []
ground_truths_semantic = []
contexts = []
answers = []

question_prompt = ChatPromptTemplate.from_template("""
You are a teacher preparing a test. Please create a question that can be answered by referencing the following context.

Context:
{context}
""")

ground_truth_prompt = ChatPromptTemplate.from_template("""
Use the following context and question to answer this question using *only* the provided context.

Question:
{question}

Context:
{context}
""")

In [34]:
question_chain = question_prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()
ground_truth_chain = ground_truth_prompt | ChatOpenAI(model="gpt-4-turbo-preview") | StrOutputParser()

for chunk in synthetic_data_chunks[10:20]:
    questions.append(question_chain.invoke({"context": chunk.page_content}))
    contexts.append([chunk.page_content])
    ground_truths_semantic.append(ground_truth_chain.invoke({"question": questions[-1], "context": contexts[-1]}))
    answers.append(semantic_rag_chain.invoke(questions[-1]))


In [35]:
# Create evaluation dataset
qagc_list = []
for question, answer, context, ground_truth in zip(questions, answers, contexts, ground_truths_semantic):
    qagc_list.append({
        "question": question,
        "answer": answer,
        "contexts": context,
        "ground_truth": ground_truth
    })

eval_dataset = Dataset.from_list(qagc_list)

# Evaluate with Ragas
result = evaluate(
    eval_dataset,
    metrics=[context_precision, faithfulness, answer_relevancy, context_recall],
)
print(result)

Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]

{'context_precision': 1.0000, 'faithfulness': 0.3500, 'answer_relevancy': 0.3688, 'context_recall': 0.9750}


In [39]:
# Compare to naive strategy
semantic_answers = []
for chunk in synthetic_data_chunks[10:20]:
    semantic_answers.append(semantic_rag_chain.invoke(question_chain.invoke({"context": chunk.page_content})))

semantic_result = evaluate(
    eval_dataset,
    metrics=[context_precision, faithfulness, answer_relevancy, context_recall],
)

print("Naive Strategy Result:", semantic_result)

Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]

Naive Strategy Result: {'context_precision': 1.0000, 'faithfulness': 0.2500, 'answer_relevancy': 0.3688, 'context_recall': 1.0000}


In [41]:
# Compare to naive strategy
naive_answers = []
for chunk in synthetic_data_chunks[10:20]:
    naive_answers.append(naive_rag_chain.invoke(question_chain.invoke({"context": chunk.page_content})))

naive_result = evaluate(
    eval_dataset,
    metrics=[context_precision, faithfulness, answer_relevancy, context_recall],
)

print("Naive Strategy Result:", naive_result)

Evaluating:   0%|          | 0/40 [00:00<?, ?it/s]

Naive Strategy Result: {'context_precision': 1.0000, 'faithfulness': 0.3500, 'answer_relevancy': 0.3688, 'context_recall': 0.9750}
