# Session 11: Advanced Retrieval with LangChain

## Learning Objectives:

- Understand and implement multiple retrieval strategies for RAG
- Compare naive, BM25, multi-query, parent-document, contextual compression, ensemble, and semantic chunking approaches
- Build RAG chains over a health and wellness knowledge base using LangChain and QDrant

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

---

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

> NOTE: Create a `.env` file in this directory with `OPENAI_API_KEY` and `COHERE_API_KEY` to avoid being prompted each time.

In [1]:
import os
import getpass
from dotenv import load_dotenv

load_dotenv()

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
if not os.environ.get("COHERE_API_KEY"):
    os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Health and Wellness Guide - a comprehensive resource covering exercise, nutrition, sleep, stress management, habits, and common health concerns.

### Data Preparation

We'll load the wellness guide as a single document, then split it into smaller chunks using a `RecursiveCharacterTextSplitter` for our vector store. We also keep the raw (unsplit) document for use with the Parent Document Retriever and Semantic Chunker later.
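To build intuition for what `chunk_size` and `chunk_overlap` control, here's a minimal sketch of fixed-size chunking with overlap. It's a deliberate simplification: `RecursiveCharacterTextSplitter` also tries to split on natural separators (paragraphs, lines, spaces) before falling back to raw characters. The function name `chunk_with_overlap` and the sample string are hypothetical.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.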

In [3]:
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = TextLoader("data/HealthWellnessGuide.txt")
raw_docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
wellness_docs = text_splitter.split_documents(raw_docs)

Let's verify our data was loaded and split correctly!

In [4]:
print(f"Raw documents: {len(raw_docs)}")
print(f"Split chunks: {len(wellness_docs)}")
print(f"\nExample chunk:\n{wellness_docs[0]}")

Raw documents: 1
Split chunks: 45

Example chunk:
page_content='The Personal Wellness Guide
A Comprehensive Resource for Health and Well-being

PART 1: EXERCISE AND MOVEMENT

Chapter 1: Understanding Exercise Basics

Exercise is one of the most important things you can do for your health. Regular physical activity can improve your brain health, help manage weight, reduce the risk of disease, strengthen bones and muscles, and improve your ability to do everyday activities.' metadata={'source': 'data/HealthWellnessGuide.txt'}


## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "wellness_guide".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_qdrant import QdrantVectorStore
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = QdrantVectorStore.from_documents(
    wellness_docs,
    embeddings,
    location=":memory:",
    collection_name="wellness_guide",
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each chunk of the guide as a document, and use cosine similarity to fetch the 10 most relevant chunks.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later
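As a rough illustration of what "cosine similarity, top `k`" means, here's a minimal sketch in plain Python. The three-dimensional vectors and document ids are made up for the example; real embeddings from `text-embedding-3-small` have far more dimensions, and the vector store does this search for us.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" - hypothetical values chosen for illustration only
toy_vectors = {
    "back_pain": [0.9, 0.1, 0.0],
    "sleep": [0.1, 0.9, 0.1],
    "stress": [0.2, 0.3, 0.9],
}
query = [1.0, 0.2, 0.1]
top_two = top_k(query, toy_vectors, k=2)
print(top_two)
```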

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned to itself via RunnablePassthrough.assign, which passes every other key
    #              (here, "question") through unchanged to the next step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What exercises can help with lower back pain?"})["response"].content

'Exercises that can help alleviate lower back pain include:\n\n- **Cat-Cow Stretch:** Start on your hands and knees, alternate between arching your back up (cat) and letting it sag down (cow). Perform 10-15 repetitions.\n- **Bird Dog:** From hands and knees, extend the opposite arm and leg while keeping your core engaged. Hold each extension for about 5 seconds, then switch sides. Do 10 repetitions per side.\n- **Pelvic Tilts:** Lie on your back with knees bent, flatten your back against the floor by tightening your abs and tilting your pelvis up slightly. Hold for 10 seconds and repeat 8-12 times.\n\nThese exercises are recommended to help stretch and strengthen the muscles supporting your lower back, potentially providing relief from pain.'

In [11]:
naive_retrieval_chain.invoke({"question" : "How does sleep affect overall health?"})["response"].content

'Sleep plays a vital role in overall health by supporting physical restoration, mental well-being, and cognitive function. During sleep, the body repairs tissues, consolidates memories, and releases hormones that regulate growth and appetite. Adequate sleep (7-9 hours for adults) helps strengthen the immune system, improves mood, and enhances learning and memory. Poor sleep or sleep disturbances like insomnia can negatively impact health, increasing the risk of issues such as weakened immunity, mental health problems, and chronic conditions. Maintaining good sleep hygiene and creating an optimal sleep environment are important for ensuring restorative sleep and supporting overall health.'

In [12]:
naive_retrieval_chain.invoke({"question" : "What are some natural remedies for stress and headaches?"})["response"].content

'Some natural remedies for stress include deep breathing exercises, progressive muscle relaxation, grounding techniques (such as naming things you see, hear, feel, smell, and taste), taking short walks in nature, and listening to calming music. \n\nFor headaches, natural remedies include staying well-hydrated by drinking water, applying cold or warm compresses to the head or neck, resting in a dark, quiet room, gentle massage of the temples and neck, using essential oils like peppermint or lavender, maintaining a regular sleep schedule, and managing triggers such as dehydration, stress, and poor sleep.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model), which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
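To make the scoring idea concrete, here's a minimal sketch of BM25 over whitespace tokens. It uses one common variant of the formula (with a non-negative IDF term); the library behind `BM25Retriever` may differ in tokenization and in details of the IDF, so treat this as an illustration rather than its implementation. The defaults `k1=1.5` and `b=0.75` and the toy documents are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with a common BM25 variant."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents each term appears in
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # terms missing from the document contribute nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates via k1 and is normalized by document length via b
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


toy_docs = [
    "cat-cow stretch for lower back pain",
    "sleep hygiene tips for better sleep",
    "deep breathing for stress relief",
]
scores = bm25_scores("lower back pain", toy_docs)
best = max(range(len(toy_docs)), key=scores.__getitem__)
print(toy_docs[best])
```

Only the first document shares any terms with the query, so it is the only one with a non-zero score. That exact-match behaviour is both the strength and the limitation of a sparse method.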

This retriever is very straightforward to set up! Let's see it happen down below!


In [13]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(wellness_docs)

We'll construct the same chain - only changing the retriever.

In [14]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [15]:
bm25_retrieval_chain.invoke({"question" : "What exercises can help with lower back pain?"})["response"].content

'Exercises that can help with lower back pain include:\n\n- **Cat-Cow Stretch:** Start on your hands and knees. Alternate between arching your back up (like a cat) and letting it sag down (like a cow). Do 10-15 repetitions.\n- **Bird Dog:** From hands and knees, extend opposite arm and leg while keeping your core engaged. Hold each extension for 5 seconds, then switch sides. Perform 10 repetitions per side.\n- **Pelvic Tilts:** Lie on your back with knees bent. Flatten your lower back against the floor by tightening your abdominal muscles and tilting your pelvis upward. Hold for 10 seconds and repeat 8-12 times.\n\nThese gentle stretching and strengthening exercises can help alleviate lower back discomfort and reduce future episodes.'

In [16]:
bm25_retrieval_chain.invoke({"question" : "How does sleep affect overall health?"})["response"].content

'Sleep plays a crucial role in overall health. It occurs in cycles of about 90 minutes, with different stages including light sleep, deep sleep, and REM sleep. Adequate sleep, typically 7-9 hours for adults, allows the body to repair and regenerate tissues during deep sleep and supports memory and learning during REM sleep. Proper sleep also helps regulate immune function, mood, and overall mental health. Maintaining a healthy sleep environment—such as a cool, dark, quiet, and comfortable space—can improve sleep quality and positively impact overall wellness.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "What are some natural remedies for stress and headaches?"})["response"].content

'Some natural remedies for stress and headaches include relaxation techniques such as progressive muscle relaxation, meditation, and deep breathing exercises. Herbal teas like chamomile or valerian root may also help reduce headache symptoms and promote relaxation. Additionally, ensuring proper hydration, maintaining good sleep hygiene, and managing stress through mindfulness or gentle physical activities can be beneficial. Remember to consult with a healthcare provider before starting any supplements or new remedies.'

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do - the second half of the notebook will cover this.)

### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

##### ✅ Answer:
An example query where BM25 would likely perform better is: **“What does the guide say about the 10-15 repetitions for Cat-Cow stretch?”**

BM25 would work well here because the question contains very specific words and numbers that probably appear exactly the same in the document. Since BM25 is based on keyword matching, it is very strong when the query uses the same terms as the source text.

Embeddings focus more on overall meaning, so they are better for paraphrased or conceptual questions, but for precise wording and exact matches, BM25 can be more accurate.

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
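The two steps above can be sketched as a small pipeline. Both scoring functions here are hypothetical stand-ins: the first stage counts shared words, and `toy_rerank_score` is a toy heuristic, not how Cohere's Rerank model works. The point is only the shape: retrieve `k` candidates cheaply, re-score them, keep the best `top_n`.

```python
def first_stage(query: str, docs: list[str], k: int) -> list[str]:
    """Cheap, recall-oriented retrieval: rank by number of words shared with the query."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]


def toy_rerank_score(query: str, doc: str) -> float:
    """Stand-in for a reranking model: fraction of the doc's words found in the query."""
    q = set(query.lower().split())
    words = doc.lower().split()
    return sum(word in q for word in words) / len(words)


def retrieve_then_rerank(query: str, docs: list[str], k: int, top_n: int) -> list[str]:
    candidates = first_stage(query, docs, k)  # many, roughly relevant
    reranked = sorted(candidates, key=lambda d: toy_rerank_score(query, d), reverse=True)
    return reranked[:top_n]  # few, most relevant


docs = [
    "this long introduction lists many topics such as sleep, diet, stress, "
    "lower body strength, back care and pain in passing",
    "stretches for lower back pain",
    "sleep hygiene tips",
]
query = "lower back pain"
result = retrieve_then_rerank(query, docs, k=2, top_n=1)
print(result)
```

The first stage ranks the long introduction first (it happens to mention every query word), while the second stage promotes the short, focused passage and drops the rest.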

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [18]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [19]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [20]:
contextual_compression_retrieval_chain.invoke({"question" : "What exercises can help with lower back pain?"})["response"].content

'Exercises that can help with lower back pain include gentle stretches and strengthening movements such as:\n\n- **Cat-Cow Stretch:** Start on hands and knees, alternate between arching your back up (cat position) and letting it sag down (cow position), doing 10-15 repetitions.\n- **Bird Dog:** From hands and knees, extend opposite arm and leg while engaging your core, hold for 5 seconds, then switch sides. Aim for 10 repetitions per side.\n- **Pelvic Tilts:** Lie on your back with knees bent, flatten your back against the floor by tightening your abs and tilting your pelvis slightly upwards. Hold for 10 seconds and repeat 8-12 times.\n\nThese exercises can help alleviate discomfort and prevent future episodes of lower back pain. However, it’s always best to consult with a healthcare professional before starting any new exercise routine.'

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "How does sleep affect overall health?"})["response"].content

'Sleep has a significant impact on overall health. It is essential for physical recovery, as the body repairs tissues and regenerates cells during sleep. Additionally, sleep supports mental well-being and cognitive functions, including memory consolidation and learning. Adequate sleep helps regulate hormones related to growth and appetite, contributing to overall bodily balance. Poor or insufficient sleep can impair these processes and negatively affect health.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "What are some natural remedies for stress and headaches?"})["response"].content

'Some natural remedies for stress and headaches include practicing deep breathing, progressive muscle relaxation, and grounding techniques. Taking a short walk in nature and listening to calming music can also help reduce stress. For headaches specifically, remedies include drinking water to stay hydrated, applying cold or warm compresses to the head or neck, resting in a dark, quiet room, gently massaging the temples and neck, using peppermint or lavender essential oils, and maintaining a regular sleep schedule.'

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context
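Here's a minimal sketch of those three steps without any LLM or vector store. `generate_variants` is a hypothetical stand-in for the LLM rewriting step (it returns hard-coded rephrasings), and `keyword_retrieve` is a toy retriever, so this shows only the control flow, not `MultiQueryRetriever` itself.

```python
def generate_variants(question: str) -> list[str]:
    """Hypothetical stand-in for the LLM call that rewrites the question."""
    return [
        question,
        question.replace("exercises", "stretches"),
        question.replace("lower back pain", "a sore back"),
    ]


def keyword_retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank docs by words shared with the query, dropping non-matches."""
    q = set(query.lower().rstrip("?").split())
    scored = [(len(q & set(d.lower().split())), d) for d in docs]
    scored = [(score, d) for score, d in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def multi_query_retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    unique_docs: list[str] = []
    for variant in generate_variants(question):          # step 1: several queries
        for doc in keyword_retrieve(variant, docs, k):   # step 2: retrieve for each
            if doc not in unique_docs:                   # step 3: keep unique docs
                unique_docs.append(doc)
    return unique_docs


docs = [
    "cat-cow and bird dog exercises strengthen the core",
    "gentle stretches ease a sore back",
    "lower back pain is common in adults",
    "sleep hygiene improves rest",
]
question = "What exercises help lower back pain?"
result = multi_query_retrieve(question, docs, k=2)
print(result)
```

With `k=2`, the original question alone misses the "gentle stretches" document; the rephrased variants pull it in.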

So, how hard is it to set up? Not bad! Let's see it down below!



In [23]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
) 

In [24]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
multi_query_retrieval_chain.invoke({"question" : "What exercises can help with lower back pain?"})["response"].content

'Exercises that can help with lower back pain include:\n\n- Cat-Cow Stretch: Start on hands and knees, alternate between arching your back up (cat) and letting it sag down (cow). Do 10-15 repetitions.\n- Bird Dog: From hands and knees, extend opposite arm and leg while keeping your core engaged. Hold for 5 seconds, then switch sides. Do 10 repetitions per side.\n- Partial Crunches: Lie on your back with knees bent, cross arms over your chest, tighten your stomach muscles, and raise shoulders off the floor. Hold briefly, then lower. Do 8-12 repetitions.\n- Knee-to-Chest Stretch: Lie on your back, pull one knee toward your chest while keeping the other foot flat. Hold for 15-30 seconds, then switch legs.\n- Pelvic Tilts: Lie on your back with knees bent, flatten your back against the floor by tightening abs and tilting your pelvis slightly upward. Hold for 10 seconds, repeat 8-12 times.\n\nEngaging in these gentle stretching and strengthening exercises can help alleviate lower back disco

In [26]:
multi_query_retrieval_chain.invoke({"question" : "How does sleep affect overall health?"})["response"].content

'Sleep has a significant impact on overall health. It is essential for physical health, mental well-being, and cognitive function. During sleep, the body repairs tissues, consolidates memories, and releases hormones that regulate growth and appetite. Adequate sleep (7-9 hours per night) supports a strong immune system, helps manage stress, and promotes mood stability. Poor sleep or sleep disorders like insomnia can negatively affect health by impairing immune function, increasing stress levels, and contributing to chronic conditions. Therefore, maintaining good sleep hygiene and ensuring restful sleep are vital for overall wellness.'

In [27]:
multi_query_retrieval_chain.invoke({"question" : "What are some natural remedies for stress and headaches?"})["response"].content

'Some natural remedies for stress include practicing deep breathing, progressive muscle relaxation, grounding techniques (like identifying things you see, hear, feel, smell, and taste), taking a short walk in nature, and listening to calming music. For headaches, natural remedies encompass drinking water to stay hydrated, applying cold or warm compresses to the head or neck, resting in a dark and quiet room, giving gentle massage to the temples and neck, using essential oils such as peppermint or lavender, maintaining a regular sleep schedule, and managing stress effectively.'

### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

##### ✅ Answer:

Generating multiple reformulations of a user query improves recall because different versions of the same question may match different parts of the documents. A single query might use specific wording that does not perfectly align with how the information is written in the text. By creating alternative phrasings, we increase the chances of retrieving additional relevant documents that might otherwise be missed.

Multiple query variations help cover more semantic ground, reducing the risk that important information is overlooked due to wording differences.

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. We split the full document into large "parent" chunks (e.g. 2000 characters).
2. Each parent chunk is further split into smaller "child" chunks (e.g. 400 characters).
3. The child chunks are stored in a VectorStore, while the parent chunks are stored in an in-memory docstore.
4. When we query our Retriever, we do a similarity search comparing our query vector to the child chunks.
5. Instead of returning the child chunks, we return their associated parent chunks.

The basic idea is:

- **Search** for small, focused chunks (better semantic matching)
- **Return** big chunks (richer surrounding context)

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
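A minimal sketch of the small-to-big idea, using word counts instead of characters and word overlap instead of embeddings. All names (`build_index`, `retrieve_parents`) and sizes are hypothetical; the dictionary plays the role of the docstore and the list of pairs plays the role of the vector store.

```python
def split_words(words: list[str], size: int) -> list[list[str]]:
    """Split a list of words into consecutive groups of `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def build_index(document: str, parent_size: int, child_size: int):
    parents = {}   # parent_id -> parent text (the "docstore")
    children = []  # (child text, parent_id) pairs (the "vector store")
    for parent_id, parent_words in enumerate(split_words(document.split(), parent_size)):
        parents[parent_id] = " ".join(parent_words)
        for child_words in split_words(parent_words, child_size):
            children.append((" ".join(child_words), parent_id))
    return parents, children


def retrieve_parents(query: str, parents: dict, children: list, k: int = 2) -> list[str]:
    """Search over the small child chunks, but return their (deduplicated) parents."""
    q = set(query.lower().split())
    ranked = sorted(children, key=lambda c: len(q & set(c[0].lower().split())), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]


document = "cat-cow stretch eases back pain gently deep sleep repairs tissue each night"
parents, children = build_index(document, parent_size=6, child_size=3)
result = retrieve_parents("back pain", parents, children, k=2)
print(result)
```

The query matches the child "back pain gently", but the retriever hands back the whole parent, which also names the stretch that helps.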

Let's start by defining our parent and child splitters.

In [28]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [29]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="wellness_parent_child",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="wellness_parent_child", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [30]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore=parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [31]:
parent_document_retriever.add_documents(raw_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [32]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [33]:
parent_document_retrieval_chain.invoke({"question" : "What exercises can help with lower back pain?"})["response"].content

'To help with lower back pain, gentle stretching and strengthening exercises can be beneficial. Some recommended exercises include:\n\n- Cat-Cow Stretch: Start on hands and knees, alternate between arching your back up (cat) and letting it sag down (cow). Do 10-15 repetitions.\n- Bird Dog: From hands and knees, extend opposite arm and leg while keeping your core engaged. Hold for 5 seconds, then switch sides. Do 10 repetitions per side.\n- Partial Crunches: Lie on your back with knees bent, cross arms over chest, tighten stomach muscles, and raise shoulders off the floor. Do 8-12 repetitions.\n- Knee-to-Chest Stretch: Lie on your back, pull one knee toward your chest while keeping the other foot flat. Hold for 15-30 seconds, then switch legs.\n- Pelvic Tilts: Lie on your back with knees bent, flatten your back against the floor by tightening your abs and tilting pelvis up slightly. Hold for 10 seconds, then repeat 8-12 times.\n\nAlways consult with a healthcare professional before star

In [34]:
parent_document_retrieval_chain.invoke({"question" : "How does sleep affect overall health?"})["response"].content

'Sleep has a significant impact on overall health. It is crucial for physical repair, mental well-being, and cognitive functions. During sleep, the body repairs tissues, consolidates memories, and releases hormones that help regulate growth and appetite. Adequate sleep, typically 7-9 hours per night for adults, is essential for maintaining good health, supporting immune function, and enhancing learning and memory. Poor sleep quality or insufficient sleep can lead to fatigue, low energy, headaches, and other health issues. Therefore, practicing good sleep hygiene and creating a restful sleep environment are important for ensuring restorative sleep and overall wellness.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "What are some natural remedies for stress and headaches?"})["response"].content

'Some natural remedies for stress and headaches include practicing deep breathing exercises, engaging in mindfulness or meditation, doing gentle stretches or yoga, taking a warm bath, and listening to calming music. For headaches specifically, remedies also include drinking plenty of water to stay hydrated, applying cold or warm compresses to the head or neck, resting in a dark and quiet room, gently massaging the temples and neck, and using essential oils like peppermint or lavender.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
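For intuition, here's a minimal sketch of weighted Reciprocal Rank Fusion: each retriever contributes `weight / (c + rank)` for every document it returns, and documents are sorted by their summed score. The constant `c=60` follows the RRF paper; the function name and the toy rankings are hypothetical, and this is an illustration rather than `EnsembleRetriever`'s actual code.

```python
from collections import defaultdict


def weighted_rrf(rankings: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids into one ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked documents (smaller rank) contribute more
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from two retrievers
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]
fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Documents that appear near the top of several lists (`doc_b`, `doc_a`) outrank documents that only one retriever found, and only ranks are needed, not comparable scores.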

In [36]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [37]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [38]:
ensemble_retrieval_chain.invoke({"question" : "What exercises can help with lower back pain?"})["response"].content

'Exercises that can help with lower back pain include:\n\n- **Cat-Cow Stretch:** Start on hands and knees, alternate between arching your back up (cat) and letting it sag down (cow). Do 10-15 repetitions.\n- **Bird Dog:** From hands and knees, extend opposite arm and leg while keeping your core engaged. Hold for 5 seconds, then switch sides. Do 10 repetitions per side.\n- **Pelvic Tilts:** Lie on your back with knees bent, flatten your back against the floor by tightening abs and tilting pelvis up slightly. Hold for 10 seconds, repeat 8-12 times.\n- **Partial Crunches:** Lie on your back with knees bent, cross arms over chest, tighten stomach muscles and raise shoulders off floor. Hold briefly, then lower. Do 8-12 repetitions.\n- **Knee-to-Chest Stretch:** Lie on your back, pull one knee toward your chest while keeping the other foot flat. Hold for 15-30 seconds, then switch legs.\n\nThese gentle stretching and strengthening exercises are recommended to alleviate lower back discomfort 

In [39]:
ensemble_retrieval_chain.invoke({"question" : "How does sleep affect overall health?"})["response"].content

'Sleep has a significant impact on overall health. It is essential for physical repair, mental well-being, and cognitive function. During sleep, the body repairs tissues, consolidates memories, and releases hormones that regulate growth and appetite. Adequate sleep (7-9 hours per night) supports immune function, helps manage stress, and maintains hormonal balance. Poor sleep or sleep disturbances like insomnia can lead to physical and mental health issues, including increased stress, impaired immune response, and cognitive difficulties. Creating a conducive sleep environment and practicing good sleep hygiene can improve sleep quality and, consequently, enhance overall health.'

In [40]:
ensemble_retrieval_chain.invoke({"question" : "What are some natural remedies for stress and headaches?"})["response"].content

'Some natural remedies for stress and headaches include:\n\n- Deep breathing exercises to promote relaxation\n- Progressive muscle relaxation to reduce tension\n- Grounding techniques, such as naming objects around you to stay present\n- Taking short walks, preferably in nature\n- Listening to calming music\n- Applying cold or warm compresses to the head or neck\n- Resting in a dark, quiet environment\n- Gentle massage of temples and neck\n- Using essential oils like peppermint or lavender\n- Maintaining a regular sleep schedule\n- Drinking plenty of water to stay hydrated\n\nThese approaches can help manage stress and alleviate headache symptoms naturally.'

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
   - `percentile`
   - `standard_deviation`
   - `interquartile`
   - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distance between each pair of consecutive sentences, and then split the text wherever a distance exceeds a given percentile of all distances.
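As a rough sketch of the percentile idea, the code below works on hand-made two-dimensional "embeddings" rather than real ones. The helper names, the toy vectors, and the choice of the 95th percentile are assumptions for illustration; `SemanticChunker` handles the embedding calls and other details itself.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm


def percentile(values: list[float], pct: float) -> float:
    """Percentile with linear interpolation between the closest ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def semantic_chunks(sentences: list[str], vectors: list[list[float]], pct: float = 95) -> list[str]:
    # Distance between each pair of consecutive sentences
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:  # large semantic jump: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "Stretching eases back pain.",
    "Cat-cow is a gentle stretch.",
    "Sleep repairs tissue.",
    "Adults need 7-9 hours of sleep.",
]
# Toy vectors: the first two point one way, the last two another
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
chunks = semantic_chunks(sentences, vectors, pct=95)
print(chunks)
```

The only distance above the threshold is the jump from the stretching sentences to the sleep sentences, so the text splits into two topical chunks.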

In [41]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [42]:
semantic_documents = semantic_chunker.split_documents(raw_docs)

Let's create a new vector store.

In [43]:
semantic_vectorstore = QdrantVectorStore.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="wellness_guide_semantic_chunks"
)

We'll use naive retrieval for this example.

In [44]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [45]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [46]:
semantic_retrieval_chain.invoke({"question" : "What exercises can help with lower back pain?"})["response"].content

'Exercises that can help with lower back pain include:\n\n- **Cat-Cow Stretch:** Start on hands and knees, alternate between arching your back up (cat) and letting it sag down (cow). Do 10-15 repetitions.\n- **Partial Crunches:** Lie on your back with knees bent, cross arms over chest, tighten stomach muscles, and raise shoulders off the floor. Hold briefly, then lower. Do 8-12 repetitions.\n- **Knee-to-Chest Stretch:** Lie on your back, pull one knee toward your chest while keeping the other foot flat. Hold for 15-30 seconds, then switch legs.\n- **Pelvic Tilts:** Lie on your back with knees bent, flatten your back against the floor by tightening your abs and tilting your pelvis up slightly. Hold for 10 seconds, repeat 8-12 times.\n\nGentle stretching and strengthening exercises like these can alleviate discomfort and help prevent future episodes of lower back pain.'

In [47]:
semantic_retrieval_chain.invoke({"question" : "How does sleep affect overall health?"})["response"].content

'Sleep plays a vital role in maintaining overall health. It is essential for physical health, mental well-being, and cognitive function. During sleep, the body repairs tissues, consolidates memories, and releases hormones that regulate growth and appetite. Adults generally need 7-9 hours of sleep per night, with sleep occurring in cycles that include REM and non-REM stages. Good sleep quality is promoted by maintaining a consistent sleep schedule, creating a relaxing bedtime routine, and optimizing the sleep environment (such as keeping the room cool, dark, and quiet). Proper sleep hygiene and managing issues like insomnia can improve sleep quality, which in turn supports various aspects of overall health.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "What are some natural remedies for stress and headaches?"})["response"].content

'Some natural remedies for stress include deep breathing exercises, progressive muscle relaxation, mindfulness and meditation practices, grounding techniques (such as naming things you see, hear, feel, smell, and taste), taking short walks in nature, and listening to calming music. \n\nFor headaches, natural remedies include staying well-hydrated by drinking water, applying cold or warm compresses to the head or neck, resting in a dark, quiet room, gently massaging the temples and neck, using essential oils like peppermint or lavender, and maintaining a regular sleep schedule.'

### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

##### ✅ Answer:

If sentences are short and highly repetitive, semantic chunking might struggle to detect meaningful differences between them. Since many sentences would be semantically similar, the distance between embeddings would be small, which could result in very large chunks being grouped together instead of cleanly separated sections.

To adjust for this, I would make the thresholding strategy more sensitive, for example by lowering the percentile threshold so that smaller semantic differences trigger a split. Alternatively, I could combine semantic chunking with a maximum chunk size limit to prevent overly large chunks from forming.
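The second adjustment, a maximum chunk size, can be sketched without any embedding calls. This is an illustrative post-processing step and not part of `SemanticChunker`; the function name `cap_chunk_size` and the `max_sentences` limit are hypothetical. For the first adjustment, `SemanticChunker` accepts a `breakpoint_threshold_amount` argument alongside `breakpoint_threshold_type`, though a suitable value for an FAQ-style corpus would have to be found by experiment.

```python
def cap_chunk_size(chunks, max_sentences=3):
    """Split any chunk (a list of sentences) that is longer than
    max_sentences into consecutive pieces of at most that length."""
    if max_sentences < 1:
        raise ValueError("max_sentences must be at least 1")
    capped = []
    for chunk in chunks:
        for start in range(0, len(chunk), max_sentences):
            capped.append(chunk[start:start + max_sentences])
    return capped


# Hypothetical FAQ-style input: seven near-identical sentences that a
# semantic chunker merged into one chunk, plus one short chunk
faq_chunks = [
    [f"FAQ answer {i}" for i in range(1, 8)],
    ["Unrelated closing note"],
]

capped_chunks = cap_chunk_size(faq_chunks, max_sentences=3)
print([len(chunk) for chunk in capped_chunks])
```

The cap keeps sentence order intact and only touches oversized chunks, so well-separated sections pass through unchanged.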

---

# 🤝 Breakout Room Part #2

### 🏗️ Activity #1:

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

### 🏗️ Step 1: Generating a Synthetic Golden Dataset

- I use the **RAGAS `TestsetGenerator`** with an LLM and an embedding model to automatically create realistic evaluation questions and corresponding ground-truth answers based on our source documents.
- This synthetic dataset serves as our **gold standard** for comparing retriever performance in a consistent and reproducible way.

In [49]:
from ragas.testset import TestsetGenerator
from langchain_openai import ChatOpenAI
from ragas.embeddings import LangchainEmbeddingsWrapper

generator_llm = ChatOpenAI(model="gpt-4.1-nano")

# Ragas-wrapped embeddings, reused below for the transforms and later for evaluation
ragas_embeddings = LangchainEmbeddingsWrapper(embeddings)

# In recent Ragas versions, from_langchain takes (llm, embedding_model) and wraps both itself,
# so a second "critic" LLM passed positionally would be treated as the embedding model
gen = TestsetGenerator.from_langchain(generator_llm, embeddings)

testset_size = 20
testset = gen.generate_with_langchain_docs(
    raw_docs,
    testset_size=testset_size,
    transforms_embedding_model=ragas_embeddings
)
testset


Skipping multi_hop_abstract_query_synthesizer due to unexpected error: No relationships match the provided condition. Cannot form clusters.



Testset(samples=[TestsetSample(eval_sample=SingleTurnSample(user_input='What is chapter 1 about', retrieved_contexts=None, reference_contexts=['PART 1: EXERCISE AND MOVEMENT\n\nChapter 1: Understanding Exercise Basics\n\nExercise is one of the most important things you can do for your health. Regular physical activity can improve your brain health, help manage weight, reduce the risk of disease, strengthen bones and muscles, and improve your ability to do everyday activities.\n\nThe four main types of exercise are aerobic (cardio), strength training, flexibility, and balance exercises. A well-rounded fitness routine includes all four types. Adults should aim for at least 150 minutes of moderate-intensity aerobic activity per week, along with muscle-strengthening activities on 2 or more days per week.\n\nChapter 2: Exercises for Common Problems\n\nLower Back Pain Relief\nLower back pain affects approximately 80% of adults at some point in their lives. Gentle stretching and strengthening

### 🏗️ Step 2: Preparing the Golden Dataset

- I convert the synthetic RAGAS testset into a structured DataFrame and standardize the column names (`question`, `ground_truth`) to simplify the evaluation pipeline.
- I save the dataset locally to ensure reproducibility and to avoid regenerating synthetic data (and additional API cost) in future runs.

In [50]:
import pandas as pd

# 1) Convert Ragas testset -> DataFrame
df_raw = testset.to_pandas()

# 2) Standardize column names (so the rest of the eval code is simpler)
df = df_raw.rename(columns={
    "user_input": "question",
    "reference": "ground_truth"
})

# 3) Keep only what we need for evaluation
df = df[["question", "ground_truth"]].dropna().reset_index(drop=True)

# 4) Save locally (reproducibility + avoid regen cost)
out_path = "data/golden_dataset.csv"
df.to_csv(out_path, index=False)

print("Saved:", out_path)
print("Rows:", len(df))

# 5) Quick preview
df.head(10)

Saved: data/golden_dataset.csv
Rows: 22


| | question | ground_truth |
|---|---|---|
| 0 | What is chapter 1 about | Chapter 1: Understanding Exercise Basics expla... |
| 1 | What are some effective exercises to incorpora... | The context discusses the importance of a well... |
| 2 | Is Shanghai a good place for wellness activities? | The provided context does not include informat... |
| 3 | What role do fats play in a healthy diet accor... | Fats are necessary for hormone production and ... |
| 4 | What role do carbohydrates play in a healthy d... | Carbohydrates are described as the primary ene... |
| 5 | What are micronutrients and why are they impor... | Micronutrients are inorganic elements like vit... |
| 6 | What is Chapter 18 about? | Chapter 18 discusses boosting immune function,... |
| 7 | What information is covered in Chapter 19 of t... | Chapter 19 discusses maintaining work-life bal... |
| 8 | Can you tell me about Chapter 15 and what it s... | Chapter 15 discusses evening wind-down routine... |
| 9 | Can you explain what the chapter titled "Chapt... | Chapter 11 discusses stress reduction techniqu... |


### 🏗️ Step 3: Setting Up Retrieval Evaluation

- I import the required libraries for benchmarking, including RAGAS evaluation tools and retrieval-specific metrics (`context_precision`, `context_recall`).
- I define the evaluation metrics that will be used consistently across all retriever strategies to ensure a fair and controlled comparison.
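As a rough mental model of what these two metrics measure, the sketch below computes them from hand-written relevance judgments. In Ragas the judgments come from an LLM; here they are hard-coded. The function names are illustrative (prefixed with `toy_` so they do not shadow the Ragas imports), and the precision formula follows the rank-weighted definition in the Ragas documentation as I understand it.

```python
def toy_context_precision(verdicts):
    """Rank-weighted precision over 0/1 relevance verdicts,
    ordered from the top-ranked retrieved chunk downwards."""
    relevant_total = sum(verdicts)
    if relevant_total == 0:
        return 0.0
    hits = 0
    weighted_sum = 0.0
    for rank, verdict in enumerate(verdicts, start=1):
        hits += verdict
        # precision@rank only contributes at ranks holding a relevant chunk
        weighted_sum += (hits / rank) * verdict
    return weighted_sum / relevant_total


def toy_context_recall(claims_supported):
    """Share of reference-answer claims that the retrieved context supports."""
    if not claims_supported:
        return 0.0
    return sum(claims_supported) / len(claims_supported)


# Hypothetical judgments for one question:
# chunks at ranks 1 and 3 are relevant, the chunk at rank 2 is noise
toy_precision = toy_context_precision([1, 0, 1])

# Three of the four claims in the reference answer are found in the context
toy_recall = toy_context_recall([True, True, False, True])

print(round(toy_precision, 3), toy_recall)
```

Because precision is rank-weighted, the same relevant chunks score lower when noise is ranked above them, which is why retrievers that return many loosely related chunks tend to lose precision even when recall is high.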

In [63]:
import time
import pandas as pd
from tqdm import tqdm

from ragas import evaluate
from ragas.dataset_schema import EvaluationDataset
from ragas.metrics import context_precision, context_recall  # legacy, but works

METRICS = [context_precision, context_recall]



### 🏗️ Step 4: Building the Retrieval Evaluation Dataset

- I retrieve relevant document chunks for a subset of questions from the golden dataset and construct a structured evaluation DataFrame containing `user_input`, `retrieved_contexts`, and `reference`.
- I measure total retrieval time and latency per question, allowing me to compare not only retrieval quality but also performance across different retriever strategies.

In [68]:
def build_eval_df_for_retriever(retriever, df_gold, n_questions=10):
    """
    Returns:
      eval_df: columns = user_input, retrieved_contexts, reference
      timing: retrieval_total_sec, retrieval_sec_per_q, n_questions
    """
    # Take the first N questions (could switch to .sample later)
    df_subset = df_gold.head(n_questions).copy()
    questions = df_subset["question"].tolist()

    rows = []
    t0 = time.perf_counter()

    for q in tqdm(questions, desc="Retrieving"):
        # .invoke() replaces the deprecated get_relevant_documents()
        docs = retriever.invoke(q)
        rows.append({
            "user_input": q,
            "retrieved_contexts": [d.page_content for d in docs],
        })

    retrieval_total = time.perf_counter() - t0
    retrieval_per_q = retrieval_total / max(len(questions), 1)

    retrieval_df = pd.DataFrame(rows)

    # Merge in the reference (ground truth) from the golden dataset
    eval_df = retrieval_df.merge(
        df_subset.rename(columns={"ground_truth": "reference", "question": "user_input"})[["user_input", "reference"]],
        on="user_input",
        how="left"
    )

    # Safety: drop rows that have no reference
    eval_df = eval_df.dropna(subset=["reference"]).reset_index(drop=True)

    timing = {
        "retrieval_total_sec": retrieval_total,
        "retrieval_sec_per_q": retrieval_per_q,
        "n_questions": len(eval_df),
    }
    return eval_df, timing
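The function above reports only the average retrieval time per question. If the spread matters (for example, one slow query hiding behind a good average), a per-question timing loop is a small extension. The sketch below is standalone and uses a hypothetical `StubRetriever` in place of a real LangChain retriever, so it runs without any API calls; `time_retriever` is an illustrative name, not part of the notebook.

```python
import statistics
import time


class StubRetriever:
    """Hypothetical stand-in for a LangChain retriever; it only mimics .invoke()."""

    def invoke(self, query):
        return [f"chunk about {query}"]


def time_retriever(retriever, questions):
    """Return one wall-clock latency (in seconds) per question."""
    latencies = []
    for question in questions:
        start = time.perf_counter()
        retriever.invoke(question)
        latencies.append(time.perf_counter() - start)
    return latencies


stub_latencies = time_retriever(StubRetriever(), ["sleep", "stress", "back pain"])

latency_summary = {
    "mean_sec": statistics.mean(stub_latencies),
    "median_sec": statistics.median(stub_latencies),
    "max_sec": max(stub_latencies),
}
print(latency_summary)
```

Swapping `StubRetriever()` for any of the retrievers built earlier should work as long as the object exposes `.invoke()`.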

### 🏗️ Step 5: Evaluating Retrieval Quality with RAGAS

- I convert the evaluation DataFrame into a RAGAS `EvaluationDataset` and compute retrieval-specific metrics (`context_precision`, `context_recall`).
- I measure total evaluation time and latency per question to understand the computational cost of the retrieval evaluation process.

In [69]:
def ragas_retrieval_eval(eval_df, ragas_embeddings):
    """
    eval_df columns: user_input, retrieved_contexts, reference
    Returns: (scores_dict, timing_dict)
    """
    dataset = EvaluationDataset.from_list(eval_df.to_dict(orient="records"))

    t0 = time.perf_counter()
    scores = evaluate(
        dataset,
        metrics=METRICS,
        embeddings=ragas_embeddings
    )
    ragas_total = time.perf_counter() - t0
    ragas_per_q = ragas_total / max(len(eval_df), 1)

    timing = {
        "ragas_total_sec": ragas_total,
        "ragas_sec_per_q": ragas_per_q,
    }
    return scores, timing

### 🏗️ Step 6: Running the Retriever Benchmark and Comparing Results

- I run a standardized evaluation for each retriever: build the evaluation dataset, compute RAGAS retrieval metrics (precision/recall), and track latency per question (retrieval time + evaluation time).
- I compile all results into a single summary table (sorted by recall) to compare performance and practical trade-offs across retriever strategies.

In [74]:
import numpy as np

def _to_mean(x):
    # Ragas may return a list (one score per question) or a scalar (aggregate)
    if isinstance(x, (list, tuple)):
        x = [v for v in x if v is not None]
        return float(np.mean(x)) if len(x) else float("nan")
    return float(x)

def run_retriever_eval(name, retriever, df_gold, ragas_embeddings, n_questions=4, cost_note=""):
    # 1) build eval df
    eval_df, t_retr = build_eval_df_for_retriever(retriever, df_gold, n_questions=n_questions)

    # 2) ragas eval
    scores, t_ragas = ragas_retrieval_eval(eval_df, ragas_embeddings)

    total_per_q = t_retr["retrieval_sec_per_q"] + t_ragas["ragas_sec_per_q"]

    row = {
        "retriever": name, # Name of the retrieval strategy being evaluated
        "n_questions": t_retr["n_questions"],  # Number of questions used in this benchmark run
        "context_precision": _to_mean(scores["context_precision"]), # How many of the retrieved chunks are actually relevant
                                                                    # (lower noise = higher precision)
        "context_recall": _to_mean(scores["context_recall"]), # How much of the ground-truth information was successfully retrieved
                                                                # (higher recall = better coverage)
        "retrieval_sec_per_q": round(t_retr["retrieval_sec_per_q"], 2), # Average time (in seconds) spent retrieving documents per question
        "ragas_sec_per_q": round(t_ragas["ragas_sec_per_q"], 2), # Average time (in seconds) spent running RAGAS evaluation per question
        "total_sec_per_q": round(total_per_q, 2), # Combined latency per question (retrieval + evaluation)
    }
    return row

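# Evaluate all retrieval strategies on the same subset of N questions
# to ensure a fair and controlled comparison
N = 10

# Assumption: df_gold is the golden dataset saved in Step 2
# (it is not defined in any cell shown above)
df_gold = pd.read_csv("data/golden_dataset.csv")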

rows = []
rows.append(run_retriever_eval("naive", naive_retriever, df_gold, ragas_embeddings, n_questions=N))
rows.append(run_retriever_eval("bm25", bm25_retriever, df_gold, ragas_embeddings, n_questions=N))
rows.append(run_retriever_eval("parent", parent_document_retriever, df_gold, ragas_embeddings, n_questions=N))
rows.append(run_retriever_eval("multi_query", multi_query_retriever, df_gold, ragas_embeddings, n_questions=N))
rows.append(run_retriever_eval("compression", compression_retriever, df_gold, ragas_embeddings, n_questions=N))
rows.append(run_retriever_eval("ensemble", ensemble_retriever, df_gold, ragas_embeddings, n_questions=N))

# Final comparison table sorted by recall (recall often prioritized in RAG systems to reduce missed context)
summary = pd.DataFrame(rows).sort_values(by="context_recall", ascending=False).reset_index(drop=True)
summary


| | retriever | n_questions | context_precision | context_recall | retrieval_sec_per_q | ragas_sec_per_q | total_sec_per_q |
|---|---|---|---|---|---|---|---|
| 0 | ensemble | 10 | 0.411848 | 0.808333 | 2.15 | 44.93 | 47.07 |
| 1 | parent | 10 | 0.9 | 0.741667 | 0.14 | 11.73 | 11.87 |
| 2 | compression | 10 | 0.716667 | 0.675 | 0.39 | 11.94 | 12.33 |
| 3 | multi_query | 10 | 0.569286 | 0.666667 | 1.04 | 31.69 | 32.73 |
| 4 | naive | 10 | 0.608889 | 0.591667 | 0.35 | 24.82 | 25.17 |
| 5 | bm25 | 10 | 0.191667 | 0.158333 | 0.0 | 13.04 | 13.05 |


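### 📌 Conclusion: Retriever Comparison

In this activity, I evaluated six retriever strategies on 10 questions from a synthetic golden dataset, using the RAGAS retrieval metrics `context_precision` and `context_recall` along with per-question timing. With only 10 questions, the differences should be read as indicative rather than conclusive.

The **Parent Document Retriever** provides the best overall balance for this dataset. It achieves the highest precision (0.90) and the second-highest recall (0.74), meaning it retrieves highly relevant context while still covering most of the ground-truth information. Its retrieval latency is also low, at about 0.14 seconds per question; the roughly 12 seconds of total time per question is dominated by the RAGAS evaluation step, not by retrieval itself.

Although the **Ensemble retriever** achieved the highest recall (0.81), it introduced significantly more noise (precision 0.41) and had the slowest retrieval (about 2.15 seconds per question), making it less practical for real-world deployment. Multi-query and compression both improved recall over naive retrieval (0.67 and 0.68 vs. 0.59). Multi-query paid for this with slower retrieval (1.04 vs. 0.35 seconds per question) because of its LLM-based query generation, while compression added very little retrieval time (0.39 seconds) and also raised precision (0.72 vs. 0.61). BM25 was the weakest on both metrics (precision 0.19, recall 0.16).

Overall, the Parent Document Retriever offers the best trade-off between retrieval quality, latency, and practical usability for this dataset.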
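One way to check the "best balance" claim is to collapse precision and recall into a single F1 score (their harmonic mean). This was not part of the evaluation above; it is an extra sanity check that assumes precision and recall matter equally. The scores are copied from the summary table, and the helper name `f1` is illustrative.

```python
# (context_precision, context_recall) copied from the summary table
scores = {
    "ensemble": (0.411848, 0.808333),
    "parent": (0.9, 0.741667),
    "compression": (0.716667, 0.675),
    "multi_query": (0.569286, 0.666667),
    "naive": (0.608889, 0.591667),
    "bm25": (0.191667, 0.158333),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_by_retriever = {name: f1(p, r) for name, (p, r) in scores.items()}

# Highest F1 first
ranking = sorted(f1_by_retriever, key=f1_by_retriever.get, reverse=True)

for name in ranking:
    print(f"{name:12s} F1 = {f1_by_retriever[name]:.2f}")
```

On these numbers the parent retriever ranks first and the ensemble drops to fifth, since its low precision outweighs its recall advantage under equal weighting. A recall-heavy application could weight the two differently and reach a different ordering.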

In [None]:
### YOUR CODE HERE