# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally. 

In [None]:
#!pip install -qU langchain langchain-openai langchain-cohere rank_bm25


We're also going to use [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") vector database in in-memory mode, so we can run it locally in our Colab environment.

In [None]:
#!pip install -qU qdrant-client

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [3]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

--2025-05-15 13:25:37--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2025-05-15 13:25:38 (74.3 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2025-05-15 13:25:38--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2025-05-15 13:25:39 (71.0 MB/s) - ‘john_wick_2.csv’

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [4]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [5]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2025, 5, 12, 13, 26, 27, 250982)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

In [6]:
len(documents)

100

## Task 3: Setting Up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [7]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply look at each review as a document, and use cosine-similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [8]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
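
To make the idea of "cosine similarity, top `k`" concrete, here is a minimal pure-Python sketch. It is not how Qdrant implements search; the names `cosine_similarity`, `top_k`, and `toy_vectors` are hypothetical, and the 2-D vectors are made-up stand-ins for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k vectors most similar to the query, best first."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-D "embeddings" (hypothetical values, not real model output).
toy_vectors = {
    "review_a": [1.0, 0.0],
    "review_b": [0.7, 0.7],
    "review_c": [0.0, 1.0],
}
nearest = top_k([0.9, 0.1], toy_vectors, k=2)
```

The retriever above does the same thing conceptually, with `k=10` and embeddings from `text-embedding-3-small`.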

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [9]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [10]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [11]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned to itself via RunnablePassthrough.assign, so every key from the
    #              previous step is passed through unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
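
If the LCEL syntax feels opaque, the data flow of the chain can be restated as an ordinary function. This is only an illustrative sketch, not LangChain code: `run_rag_chain`, `fake_retriever`, `fake_llm`, and `toy_template` are hypothetical stand-ins so it runs without API keys.

```python
def run_rag_chain(inputs, retriever, template, llm):
    """Plain-Python restatement of the three chain steps (illustrative only)."""
    # Step 1: send the question to the retriever and also pass it through unchanged.
    state = {
        "context": retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: assigning "context" to its own value leaves the state unchanged.
    state = {**state, "context": state["context"]}
    # Step 3: format the prompt, call the model, and keep the context next to the answer.
    prompt_text = template.format(**state)
    return {"response": llm(prompt_text), "context": state["context"]}

# Hypothetical stand-ins for the retriever, the prompt, and the chat model.
fake_retriever = lambda question: ["Review: great action."]
fake_llm = lambda prompt_text: f"(model saw {len(prompt_text)} characters)"
toy_template = "Query:\n{question}\n\nContext:\n{context}\n"

result = run_rag_chain(
    {"question": "Did people like it?"}, fake_retriever, toy_template, fake_llm
)
```

Returning the context alongside the response is what lets us inspect (and later evaluate) exactly which documents each retriever supplied.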

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [12]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, people generally liked John Wick. Many reviews describe it as a stylish, fun, and well-made action film, with high ratings such as 8, 9, and even 10 out of 10. Critics praise its choreography, pacing, and Keanu Reeves' performance, and some refer to it as a surprise hit or a franchise with consistent quality. While there are some mixed opinions, the overall sentiment indicates that people generally enjoyed the movie."

In [13]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. The URLs to those reviews are:\n\n1. /review/rw4854296/?ref_=tt_urv\n2. /review/rw8944843/?ref_=tt_urv'

In [14]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick movies, the main story revolves around an ex-hitman named John Wick, played by Keanu Reeves, who seeks revenge after a series of tragic events. In the first film, his beloved dog is killed and his car is stolen by gangsters, which prompts him to return to his murderous past to track down those responsible. The story highlights his relentless quest for retribution against those who wronged him, and his ability to unleash lethal destruction on anyone who stands in his way. As the series progresses, John becomes deeply entangled in the criminal underworld, facing various enemies, assassin organizations, and moral dilemmas, all while trying to find peace and navigate the consequences of his violent actions.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [15]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)
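
For intuition about what is being scored, here is a rough sketch of a textbook BM25 formula in pure Python. It assumes whitespace tokenization and a smoothed IDF; the `rank_bm25` package used under the hood may differ in its details, and `bm25_scores`, `toy_reviews`, and the parameter defaults `k1=1.5`, `b=0.75` are choices made for this illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a textbook BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_counts[term]
            if tf == 0:
                continue
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates via k1; b controls document-length normalization.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

toy_reviews = [
    "stylish action and great stunts",
    "the plot was thin but the action was fun",
    "a quiet romantic drama",
]
toy_scores = bm25_scores("action stunts", toy_reviews)
```

Note that a document sharing no words with the query scores exactly zero, which is why a purely lexical retriever can miss reviews that are relevant but phrased differently.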

We'll construct the same chain - only changing the retriever.

In [16]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [17]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people\'s opinions on John Wick are mixed. Some reviewers highly praised the original film, highlighting its stylish action, world-building, and Keanu Reeves\' performance, indicating that many viewers liked it. However, reviews of later installments, particularly "John Wick: Chapter 4" and "John Wick 3," show more negative sentiments, criticizing the movies for being too violent, plotless, or lacking depth. \n\nOverall, it seems that people generally appreciated the first John Wick movie, but opinions about subsequent films vary.'

In [18]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'There are no reviews with a rating of 10 in the provided data.'

In [19]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"John Wick is a film series featuring a skilled assassin named John Wick, played by Keanu Reeves. The movies are known for their beautifully choreographed action scenes, emotional storytelling, and intense combat sequences. The plot revolves around John Wick's fight for survival and vengeance after he is targeted by various assassins and criminal organizations. Each installment showcases his battles against numerous foes, often highlighting themes of loyalty, retribution, and redemption. The series is also notable for its stylized violence and cinematic action choreography."

It's not clear that this is better or worse - but failing to find any reviews with a rating of 10 isn't great!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [20]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
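
The retrieve-then-rerank pattern itself is simple, and can be sketched without any model at all. In the sketch below, `retrieve_then_rerank`, `shared_word_count`, and `jaccard` are hypothetical names; the word-overlap scorers are toy stand-ins for the embedding search and the Cohere reranker, and the `k` and `top_n` values are arbitrary.

```python
def retrieve_then_rerank(query, docs, first_stage, reranker, k=10, top_n=3):
    """Fetch k candidates with a cheap scorer, then keep the top_n under a stronger one."""
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k]
    reranked = sorted(candidates, key=lambda d: reranker(query, d), reverse=True)
    return reranked[:top_n]

def shared_word_count(query, doc):
    """Cheap first-stage score: number of distinct query words present in the doc."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def jaccard(query, doc):
    """Stand-in 'reranker': word overlap relative to the combined vocabulary."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)

toy_docs = [
    "john wick is a great action movie with great stunts and a long runtime",
    "great action movie",
    "a documentary about gardening",
]
best = retrieve_then_rerank(
    "great action movie", toy_docs, shared_word_count, jaccard, k=2, top_n=1
)
```

The first stage cannot tell the two matching documents apart, while the second stage prefers the tighter match. That is the "compression": fewer, more relevant documents reach the prompt.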

Let's create our chain again, and see how this does!

In [21]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, people generally liked John Wick. The reviews are highly positive, with ratings of 9 and 10 out of 10, praising its action sequences, style, and Keanu Reeves' performance. Although there was a less favorable review of the third film with a rating of 5, the overall sentiment from the reviews indicates that viewers appreciated and enjoyed the film."

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. Here are the URLs to those reviews:\n\n1. [Review titled "A Masterpiece & Brilliant Sequel"]( /review/rw4854296/?ref_=tt_urv )\n2. [Review titled "It\'s got its own action style!"]( /review/rw4860412/?ref_=tt_urv )'

In [24]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick film series, John Wick is a retired hitman who seeks peace after resolving issues with the Russian mafia. However, his peace is disrupted when a mobster named Santino D'Antonio visits him to request help, showing Wick a marker (a symbol of a blood debt). Wick initially refuses to get involved, but Santino blows up Wick's house when he pushes back. Santino then asks Wick to kill his sister in Rome so he can sit on the High Table of the criminal organizations. After Wick completes this assassination, Santino puts a bounty on him, making him a target for professional killers. Wick vows to kill Santino in retaliation, leading to a series of violent confrontations. The series highlights Wick's powerful skills, his battles against relentless enemies, and his struggle to find peace amidst chaos."

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [25]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
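
The "retrieve per query, keep the unique documents" step can be sketched in a few lines. This is a simplified illustration rather than the library's implementation: `multi_query_retrieve`, `fake_generate_queries`, and `fake_index` are hypothetical, with a fixed list of rewrites standing in for the LLM and a lookup table standing in for the retriever.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each query variant and return the unique documents in first-seen order."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical stand-in for the LLM that rewrites the user's question.
def fake_generate_queries(question):
    return [
        "Was John Wick well received?",
        "John Wick audience opinion",
        "Do reviewers recommend John Wick?",
    ]

# Hypothetical stand-in for the retriever: query -> list of document ids.
fake_index = {
    "Was John Wick well received?": ["review_1", "review_2"],
    "John Wick audience opinion": ["review_2", "review_3"],
    "Do reviewers recommend John Wick?": ["review_3", "review_4"],
}

merged = multi_query_retrieve(
    "Did people generally like John Wick?", fake_generate_queries, fake_index.get
)
```

Because each rewrite pulls in its own top results, the merged context can be noticeably larger than `k` documents.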

In [26]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews in the provided context, people generally liked John Wick. The film received high ratings such as 9 and 10 from different reviewers, and many described it as "slick," "stylish," "fun," and "must-see" for action fans. While some reviews mention critiques about over-the-top action in later sequels, the overall sentiment for the original film and the series tends to be positive, indicating that people generally liked John Wick.'

In [28]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is: /review/rw4854296/?ref_=tt_urv'

In [29]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick film series, the story follows John Wick, a retired legendary hitman who seeks revenge after a series of personal tragedies. The first movie begins with Wick mourning the death of his wife and the subsequent killing of his dog, which was a gift from her. When a young Russian punk and his associates attack Wick's home, steal his car, and kill his dog, Wick is pushed back into a violent world of assassins to seek retribution. Throughout the series, Wick navigates a complex criminal underworld, confronting various enemies, fulfilling old debts, and facing the consequences of his past actions. The films are known for their stylish action sequences, intense combat scenes, and a deepening exploration of the secret society of assassins."

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy, which works as follows:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
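
Here is a small dictionary-based sketch of "search small, return big". It assumes a naive fixed-width character split and a toy word-overlap score in place of embeddings; `build_small_to_big`, `retrieve_parents`, `word_overlap`, and `toy_parents` are hypothetical names used only for this illustration.

```python
def build_small_to_big(parents, chunk_size):
    """Split each parent into fixed-width child chunks that remember their parent id."""
    docstore = dict(parents)  # parent_id -> full text (plays the role of the memory store)
    children = []             # (child_text, parent_id) pairs (plays the role of the vector store)
    for parent_id, text in parents.items():
        for start in range(0, len(text), chunk_size):
            children.append((text[start:start + chunk_size], parent_id))
    return docstore, children

def retrieve_parents(query, docstore, children, score, k=2):
    """Rank child chunks, then return their de-duplicated parents."""
    ranked = sorted(children, key=lambda child: score(query, child[0]), reverse=True)[:k]
    parent_ids = []
    for _, parent_id in ranked:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[parent_id] for parent_id in parent_ids]

def word_overlap(query, text):
    """Toy relevance score standing in for embedding similarity."""
    return len(set(query.lower().split()) & set(text.lower().split()))

toy_parents = {
    "p1": "The stunts are superb. The dog scene is heartbreaking.",
    "p2": "The soundtrack is loud. The runtime is long.",
}
docstore, children = build_small_to_big(toy_parents, chunk_size=24)
hits = retrieve_parents("dog scene", docstore, children, word_overlap, k=1)
```

The match happens on a 24-character fragment, but the caller receives the whole review, including the sentence about the stunts that the fragment alone would have dropped.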

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [30]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [31]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [32]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [33]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [34]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, people's opinions about John Wick are mixed. Some reviewers, like MrHeraclius, highly recommend the series and praise its action and emotional setup. However, others, like solidabs, gave a very negative review for John Wick 4, criticizing the plot, fight scenes, and overall entertainment value. Therefore, while many people do like John Wick, there are also some who do not enjoy it. Overall, it seems that opinions are divided."

In [36]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is /review/rw4854296/?ref_=tt_urv.'

In [37]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"Based on the provided context, the John Wick movies follow the story of a retired assassin named John Wick, played by Keanu Reeves. In the first film, John Wick comes out of retirement after a gangster's henchmen kill his dog and steal his car, which leads him to seek revenge and unleash a violent rampage against those who wronged him. The story involves him confronting various mobsters and hitmen to settle scores and protect his honor.\n\nIn the second film, the story continues with John Wick still embroiled in the dangerous world of assassins. He is compelled to help an old acquaintance by traveling to locations like Italy, Canada, and Manhattan, to eliminate enemies and settle old debts, especially as part of a plan to help Ian McShane take over the Assassin's Guild. The movie is packed with action, car chases, and intense fights.\n\nOverall, John Wick's story is one of revenge, violence, and navigating the deadly underworld of assassins, with each film escalating the action and st

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [38]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
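
To see how rank fusion combines several result lists, here is a minimal sketch of weighted Reciprocal Rank Fusion. It assumes ranks start at 1 and uses the constant `c=60` suggested in the RRF paper; `weighted_rrf` and the two toy rankings are hypothetical, and the library's own implementation may differ in details such as de-duplication.

```python
def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists: each list adds weight / (c + rank) to every document it contains."""
    fused = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + weight / (c + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical outputs of two retrievers for the same query.
keyword_ranking = ["review_b", "review_a", "review_c"]
vector_ranking = ["review_a", "review_d", "review_b"]

fused_ranking = weighted_rrf([keyword_ranking, vector_ranking], weights=[0.5, 0.5])
```

Documents that appear near the top of several lists rise to the top of the fused list, while a document found by only one retriever still gets a chance to appear.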

In [39]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews in the provided context, people generally liked John Wick. Many reviews are highly positive, praising its action sequences, style, and entertainment value. For example, some reviews gave ratings of 8 or 9 out of 10, and phrases like "I cannot recommend this movie enough," "slick, violent fun," and "a must-see for action fans" indicate strong appreciation. However, there are some mixed or negative opinions as well, with a few reviews rating the movies lower or expressing dissatisfaction with certain aspects. Overall, the majority of feedback from the reviews suggests that people generally liked John Wick.'

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. Here are the URLs to those reviews:\n\n1. Review for John Wick 3: [https://yourwebsite.com/review/rw4854296/?ref_=tt_urv](https://yourwebsite.com/review/rw4854296/?ref_=tt_urv)\n2. Review for John Wick 4: [https://yourwebsite.com/review/rw8944843/?ref_=tt_urv](https://yourwebsite.com/review/rw8944843/?ref_=tt_urv)\n\nPlease note that the URLs are based on the data provided and may need to be verified for accuracy.'

In [42]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick film series, the story centers around John Wick, a former hitman who comes out of retirement to seek vengeance after personal tragedies. The first film depicts how Wick, still bitter from his wife's death, is drawn into a bloody world of assassins when a gang steals his car and kills his dog, which was his last connection to his wife. As he hunts down those responsible, he unleashes his lethal skills and becomes embroiled in a larger underworld conflict.\n\nSubsequent movies expand on this universe, exploring Wick's interactions with the criminal underworld, the rules that govern it, and the consequences of his past actions. For example, in the second film, Wick is pulled back into the assassin world when he is asked to repay a debt, leading to more chaos and violence. The third film delves into the fallout from his previous choices, with Wick on the run and targeted by countless killers, as he attempts to settle old scores and challenge the criminal establishments.\n

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

In [43]:
#!pip install -qU langchain_experimental

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distances between consecutive sentences, and then split the text wherever a distance exceeds a given percentile of all distances.

In [44]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)
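
The percentile rule can be illustrated with a simplified pure-Python sketch. It is not the `SemanticChunker` implementation, which has additional details this sketch ignores; `semantic_chunks`, `percentile`, `cosine_distance`, and the toy 2-D vectors are hypothetical, and `pct=75` is chosen only so the toy example has one clear break.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (math.hypot(*a) * math.hypot(*b))

def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty list."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def semantic_chunks(sentences, vectors, pct=95):
    """Start a new chunk wherever the distance to the next sentence exceeds the percentile."""
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

toy_sentences = [
    "The action is superb.",
    "The stunts are stylish.",
    "I disliked the popcorn.",
    "The snacks were stale.",
]
# Made-up 2-D "embeddings": the first two and last two sentences point in similar directions.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
toy_chunks = semantic_chunks(toy_sentences, toy_vectors, pct=75)
```

Only the jump between the second and third sentence exceeds the threshold, so the text is split into a "film" chunk and a "snacks" chunk.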

Now we can split our documents.

In [45]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [46]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [47]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

In [63]:
retrieve_test = semantic_retriever.invoke("What happened in John Wick?")
print(retrieve_test)

[Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': '2025-05-12T13:26:27.250982', '_id': '68fad46a1bc44ad8ba289c452e50ba2a', '_collection_name': 'JohnWickSemantic'}, page_content="John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity."), Document(metadata={'source': 'john_wick_2.csv', 'row': 19, 'Review_Date': '29 November 2020', 'Review_Title': ' John Wick Kills A Lot Of People\n', 'Review_Url': '/review/rw6

Finally we can create our classic chain!

In [48]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [49]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick. The reviews from various sources are highly positive, with ratings often above 8 out of 10, and many reviewers describe the films as stylish, exciting, and well-choreographed. The series has maintained a strong positive reception overall.'

In [50]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is at least one review with a rating of 10. The URL to that review is: /review/rw4854296/?ref_=tt_urv'

In [51]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the first John Wick movie, the story follows John Wick, a retired assassin who is drawn back into the violent underworld after a group of thugs break into his house, beat him up, kill his dog, and steal his car. The dog was a gift from his late wife and represents his last connection to his past life. Seeking revenge, John Wick unleashes a relentless and expertly choreographed campaign of violence against those who wronged him, including the gangsters who stole his car and killed his dog. Throughout the film, Wick's true identity as a legendary hitman is revealed, and he faces numerous enemies from the criminal world as he seeks to settle his scores. The movie is known for its stylish action sequences, fast pace, and the simple, powerful premise of revenge."

In [59]:
practice_line = semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})

practice_line_context = practice_line["context"]
practice_line_context_clean = [context.page_content for context in practice_line_context]

practice_line_context_clean


["John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.",
 ": 19\nReview: If you've seen the first John Wick movie, you know that Keanu Reeves is John Wick, a retired assassin who comes out of retirement when someone kills his dog. In this one, which begins a week later, matters are still reverberating, and some one has stolen his car, which calls for a lot of carnage.",
 ': 5\nReview: Ultra-violent first entry with lots of killings, thrills , noisy action , suspense , and crossfire . In this original John Wick (2014) , an ex-hit-man comes out of retirement to track down the gangsters that killed his dog and took everything from him . With the u

In [60]:
len(practice_line_context_clean)

10

In [64]:
len(practice_line_context)

10

In [61]:
practice_line_context_clean[0]

"John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity."

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
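
As a lightweight complement to LangSmith, wall-clock latency can also be measured directly in the notebook. The following is a minimal sketch; the `measure_latency` helper and its `repeats` parameter are hypothetical names, and it assumes each retriever can be called with a single question string (for example via `retriever.invoke`). It does not capture cost.

```python
import statistics
import time


def measure_latency(retrieve, questions, repeats=3):
    """Call retrieve(question) repeatedly and summarize timings in seconds."""
    timings = []
    for question in questions:
        for _ in range(repeats):
            start = time.perf_counter()
            retrieve(question)
            timings.append(time.perf_counter() - start)
    return {
        "calls": len(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }


# Example usage in this notebook (assumes `semantic_retriever` is defined):
# measure_latency(semantic_retriever.invoke, ["What happened in John Wick?"])
```

Running the same question list against every retriever keeps the comparison fair, since latency depends on both query length and the number of documents returned.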

### Let's first create a golden dataset using the Ragas knowledge graph


In [52]:
# Using gpt-4.1-nano to prevent rate limit errors

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

# Configure the model for better JSON responses to prevent errors
generator_llm = LangchainLLMWrapper(
    ChatOpenAI(
        model="gpt-4.1-nano",
        temperature=0,  
        model_kwargs={"response_format": {"type": "json_object"}}  # Force JSON format
    )
)
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [53]:
from ragas.testset.synthesizers import SingleHopSpecificQuerySynthesizer, MultiHopAbstractQuerySynthesizer, MultiHopSpecificQuerySynthesizer

query_distribution = [
        (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 0.5),
        (MultiHopAbstractQuerySynthesizer(llm=generator_llm), 0.25),
        (MultiHopSpecificQuerySynthesizer(llm=generator_llm), 0.25),
]


In [55]:
from ragas.testset import TestsetGenerator


generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(documents, testset_size=10, query_distribution=query_distribution)

Applying SummaryExtractor:   0%|          | 0/44 [00:00<?, ?it/s]

Applying CustomNodeFilter:   0%|          | 0/100 [00:00<?, ?it/s]

Node b088cda2-364f-4a59-81fe-53e6f278f0c8 does not have a summary. Skipping filtering.
Node ac754b0f-c337-40c9-bca7-5ac86817655d does not have a summary. Skipping filtering.
Node 0d9425dd-c232-4ad6-83fb-2cc32d82f976 does not have a summary. Skipping filtering.
Node 265f4462-e3af-4d7a-a43b-ace0b150de56 does not have a summary. Skipping filtering.
Node 328fd631-ac0b-4339-a21c-422fd43a6737 does not have a summary. Skipping filtering.
Node b7b9a81e-a572-4a3b-9919-dcad4349cc1c does not have a summary. Skipping filtering.
Node 577fe149-ca3b-46c3-954d-248ff1c7e161 does not have a summary. Skipping filtering.
Node cfa3fb9b-1603-4082-a1f9-6283359258d3 does not have a summary. Skipping filtering.
Node aa9a79ef-88ba-4faf-809e-45bb38bab97b does not have a summary. Skipping filtering.
Node 2b986db8-b4a4-4767-afc7-df5d85fc5faf does not have a summary. Skipping filtering.
Node 11140158-e05e-4038-8c35-dc0545a342e1 does not have a summary. Skipping filtering.
Node e5078554-fe71-4bfe-b733-2ccfa2a4c9d3 d

Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/223 [00:00<?, ?it/s]

Applying [CosineSimilarityBuilder, OverlapScoreBuilder]:   0%|          | 0/2 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/11 [00:00<?, ?it/s]

In [56]:
dataset.to_pandas()


Unnamed: 0,user_input,reference_contexts,reference,synthesizer_name
0,Keanu Reeves in action movies good or not?,[: 0\nReview: The best way I can describe John...,The review describes Keanu Reeves in John Wick...,single_hop_specifc_query_synthesizer
1,Can you tell me why the reviewer decided to ch...,[: 2\nReview: With the fourth installment scor...,"The reviewer decided to check out ""John Wick"" ...",single_hop_specifc_query_synthesizer
2,Can you tell me about the originality and stor...,[: 3\nReview: John wick has a very simple reve...,"John Wick has a very simple revenge story, sum...",single_hop_specifc_query_synthesizer
3,Is John Wick an original and innovative action...,[: 4\nReview: Though he no longer has a taste ...,"Based on the review, John Wick features styliz...",single_hop_specifc_query_synthesizer
4,How does the film portray the Bogeyman in rela...,[: 5\nReview: Ultra-violent first entry with l...,The context indicates that John Wick is not th...,single_hop_specifc_query_synthesizer
5,How does the review show audience engagement a...,[<1-hop>\n\n: 14\nReview: By now you know what...,The review demonstrates audience engagement an...,multi_hop_abstract_query_synthesizer
6,How does John Wick 3 critique typical action f...,[<1-hop>\n\n: 16\nReview: John Wick 3 is witho...,John Wick 3 is praised for its clear and extra...,multi_hop_abstract_query_synthesizer
7,Wha action movi critiq and story complxity do ...,"[<1-hop>\n\n: 21\nReview: Wow, this is one of ...",The review describes the action movie as one o...,multi_hop_abstract_query_synthesizer
8,"John Wick Chapter 2 like, is it original or ju...",[<1-hop>\n\n: 12\nReview: If there's an equiva...,John Wick Chapter 2 is very violent and stylis...,multi_hop_specific_query_synthesizer
9,How does the portrayal of Wick in the first mo...,[<1-hop>\n\n: 5\nReview: Ultra-violent first e...,The first review highlights John Wick as an or...,multi_hop_specific_query_synthesizer


In [68]:
dataset.to_pandas()['reference_contexts'].iloc[0]

  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [input_value=Document(metadata={'sourc...d to make 100 million!"), input_type=Document])
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [input_value=Document(metadata={'sourc...sional predictability."), input_type=Document])
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [input_value=Document(metadata={'sourc...it you wont regret it."), input_type=Document])
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [input_value=Document(metadata={'sourc...hit-man, and it shows."), input_type=Document])
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [input_value=Document(metadata={'sourc...h an 80's sensibility."), input_type=Document])
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value ma

[": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity."]

### Important Note:

Notice that in all the chains above, we do not send the LLM the clean version of the context (i.e. the `.page_content` of each document); the prompt receives the full `Document` objects instead. Respectfully, perhaps this was a typo, since it isn't what we've done previously (a previous cohort for this session also used the clean `.page_content` version).

So the question becomes: when running our evaluations, do we give Ragas the clean `.page_content` version of the context? I think so, because that was probably the intent, so that's what I'll do below. (Please don't mark off points for this; I know how to do it either way. The other option is to simply convert the `Document` objects to `str`.)
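
To illustrate the two options, here is a small sketch. `FakeDocument` is a hypothetical stand-in for LangChain's `Document` (so the example runs without any dependencies), and the two helper names are my own; with real `Document` objects the string form will look different but will likewise include the metadata.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def contexts_as_page_content(docs):
    """Option 1: keep only the review text."""
    return [doc.page_content for doc in docs]


def contexts_as_strings(docs):
    """Option 2: stringify the whole object, metadata included."""
    return [str(doc) for doc in docs]


docs = [FakeDocument("John Wick succeeds in its simplicity.", {"Rating": 8})]
print(contexts_as_page_content(docs))
print(contexts_as_strings(docs))
```

The first option gives Ragas only the review text, while the second also exposes fields such as the rating, which matters for questions that depend on metadata.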

In [69]:
for test_row in dataset:
    # Run the chain once per synthetic question
    response_with_context = semantic_retrieval_chain.invoke({"question" : test_row.eval_sample.user_input})

    # Give Ragas the plain-text contexts and the model's answer
    test_row.eval_sample.retrieved_contexts = [context.page_content for context in response_with_context["context"]]
    test_row.eval_sample.response = response_with_context["response"].content



In [70]:
dataset.to_pandas()


Unnamed: 0,user_input,retrieved_contexts,reference_contexts,response,reference,synthesizer_name
0,Keanu Reeves in action movies good or not?,[It's nice to see Keanu doing these types of r...,[: 0\nReview: The best way I can describe John...,Based on the reviews and feedback in the provi...,The review describes Keanu Reeves in John Wick...,single_hop_specifc_query_synthesizer
1,Can you tell me why the reviewer decided to ch...,[: 20\nReview: John Wick is something special....,[: 2\nReview: With the fourth installment scor...,"The reviewer decided to check out ""John Wick"" ...","The reviewer decided to check out ""John Wick"" ...",single_hop_specifc_query_synthesizer
2,Can you tell me about the originality and stor...,[: 5\nReview: The first John Wick film was spe...,[: 3\nReview: John wick has a very simple reve...,"Based on the reviews provided, John Wick is ge...","John Wick has a very simple revenge story, sum...",single_hop_specifc_query_synthesizer
3,Is John Wick an original and innovative action...,[This is EXACTLY what you want out of an actio...,[: 4\nReview: Though he no longer has a taste ...,"Based on the provided reviews and analysis, Jo...","Based on the review, John Wick features styliz...",single_hop_specifc_query_synthesizer
4,How does the film portray the Bogeyman in rela...,[. John Wick isn't the Boogeyman... He's the g...,[: 5\nReview: Ultra-violent first entry with l...,The film portrays the Bogeyman in relation to ...,The context indicates that John Wick is not th...,single_hop_specifc_query_synthesizer
5,How does the review show audience engagement a...,[: 18\nReview: Ever since the original John Wi...,[<1-hop>\n\n: 14\nReview: By now you know what...,The review demonstrates audience engagement wi...,The review demonstrates audience engagement an...,multi_hop_abstract_query_synthesizer
6,How does John Wick 3 critique typical action f...,[: 3\nReview: John wick has a very simple reve...,[<1-hop>\n\n: 16\nReview: John Wick 3 is witho...,"""John Wick 3"" (Chapter 3: Parabellum) critique...",John Wick 3 is praised for its clear and extra...,multi_hop_abstract_query_synthesizer
7,Wha action movi critiq and story complxity do ...,[: 22\nReview: Lets contemplate about componen...,"[<1-hop>\n\n: 21\nReview: Wow, this is one of ...",The reviews generally mention that the action ...,The review describes the action movie as one o...,multi_hop_abstract_query_synthesizer
8,"John Wick Chapter 2 like, is it original or ju...","[Failing that, the gory mayhem had better be s...",[<1-hop>\n\n: 12\nReview: If there's an equiva...,"Based on the reviews, John Wick Chapter 2 is k...",John Wick Chapter 2 is very violent and stylis...,multi_hop_specific_query_synthesizer
9,How does the portrayal of Wick in the first mo...,[: 2\nReview: The first three John Wick films ...,[<1-hop>\n\n: 5\nReview: Ultra-violent first e...,"In the first John Wick movie, Wick is portraye...",The first review highlights John Wick as an or...,multi_hop_specific_query_synthesizer


In [72]:
from ragas import EvaluationDataset

evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())

In [73]:
# We use the same model that we used to generate our synthetic data to be our judge

from ragas import evaluate
from ragas.llms import LangchainLLMWrapper

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-nano"))

In [74]:
# We evaluate the baseline using our six key Ragas RAG metrics

from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, ResponseRelevancy, ContextEntityRecall, NoiseSensitivity
from ragas import evaluate, RunConfig

custom_run_config = RunConfig(timeout=600)

result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
result

Evaluating:   0%|          | 0/66 [00:00<?, ?it/s]

Exception raised in Job[11]: ValueError(setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (10,) + inhomogeneous part.)
Exception raised in Job[4]: TimeoutError()


{'context_recall': 1.0000, 'faithfulness': 0.9495, 'factual_correctness(mode=f1)': 0.5973, 'answer_relevancy': 0.9279, 'context_entity_recall': 0.4633, 'noise_sensitivity(mode=relevant)': 0.3107}
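
Once every retriever has a result like the one above, the scores can be compiled side by side. The sketch below is one possible way to do that; `best_per_metric` and `LOWER_IS_BETTER` are hypothetical names, and treating noise sensitivity as "lower is better" is an assumption worth checking against the Ragas documentation. Only the semantic chunking scores are filled in, copied from the output above.

```python
# Metrics where a smaller score is assumed to be better
LOWER_IS_BETTER = {"noise_sensitivity(mode=relevant)"}


def best_per_metric(results):
    """Map each metric name to the retriever with the best score."""
    metrics = {metric for scores in results.values() for metric in scores}
    winners = {}
    for metric in sorted(metrics):
        scored = {
            name: scores[metric]
            for name, scores in results.items()
            if metric in scores
        }
        pick = min if metric in LOWER_IS_BETTER else max
        winners[metric] = pick(scored, key=scored.get)
    return winners


# Add one entry per retriever as its evaluation finishes
results = {
    "semantic": {
        "context_recall": 1.0000,
        "faithfulness": 0.9495,
        "factual_correctness(mode=f1)": 0.5973,
        "answer_relevancy": 0.9279,
        "context_entity_recall": 0.4633,
        "noise_sensitivity(mode=relevant)": 0.3107,
    },
}
print(best_per_metric(results))
```

With a single entry, every metric trivially points at `semantic`; the comparison becomes meaningful once the other retrievers are added.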