# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally. 

In [None]:
#!pip install -qU langchain langchain-openai langchain-cohere rank_bm25


We'll also use [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") VectorDB in "memory" mode, so we can run it locally in our Colab environment.

In [None]:
#!pip install -qU qdrant-client

We'll also provide our OpenAI key, as well as our Cohere API key.

In [25]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [88]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [None]:
# !wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
# !wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
# !wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
# !wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv


### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [27]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [28]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2025, 5, 15, 11, 57, 30, 848670)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [29]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply look at each review as a document, and use cosine-similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [30]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
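
To make "cosine-similarity top-`k`" concrete, here is a minimal plain-Python sketch of the idea. The two-dimensional vectors and the helper names (`cosine_similarity`, `top_k`) are made up for illustration; a real vector store works on the 1536-dimensional embeddings and uses an optimized index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "not similar to anything"
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Toy "embeddings" (hypothetical values, not real model output).
toy_doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
toy_query = [1.0, 0.2]

print(top_k(toy_query, toy_doc_vecs, k=2))
```

The `k` argument here plays the same role as `search_kwargs={"k": 10}` in the retriever above.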

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [31]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [32]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")


### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [33]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
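
If the data flow through the chain is hard to picture, the sketch below mimics the same three steps with ordinary Python functions. It is only an analogy: `fake_retriever`, `fake_llm`, and `format_prompt` are hypothetical stand-ins, and this is not how LCEL is implemented internally.

```python
def fake_retriever(question):
    # Hypothetical stand-in for naive_retriever: returns a fixed list of "documents".
    return [f"doc about: {question}"]

def fake_llm(prompt):
    # Hypothetical stand-in for chat_model: reports how long the prompt was.
    return f"answer based on {len(prompt)} prompt characters"

def format_prompt(inputs):
    # Stand-in for rag_prompt: fills the question and context into a template.
    return f"Query:\n{inputs['question']}\n\nContext:\n{inputs['context']}"

def run_chain(inputs):
    # Step 1: build {"context", "question"} from the incoming {"question"} dict.
    step1 = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: pass everything through, re-assigning "context" to itself.
    step2 = {**step1, "context": step1["context"]}
    # Step 3: produce the response and keep the retrieved context next to it.
    return {
        "response": fake_llm(format_prompt(step2)),
        "context": step2["context"],
    }

result = run_chain({"question": "Did people generally like John Wick?"})
print(sorted(result.keys()))
```

Returning the context alongside the response is what makes the retrieved documents available for later evaluation.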

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [12]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people generally liked John Wick. Several reviews gave high ratings (such as 9 or 10 out of 10) and praised its action, style, and overall entertainment value. While there are some mixed reviews with lower ratings (around 5 or 6), the overall trend suggests that most viewers appreciated the film and considered it a strong action movie.'

In [13]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is: /review/rw4854296/?ref_=tt_urv'

In [14]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick series, the story revolves around a retired hitman named John Wick, played by Keanu Reeves, who seeks revenge after his dog is killed and his car is stolen by a young Russian punk and his gang. The series explores his relentless quest for retribution, revealing his lethal abilities as a former assassin, and the complex criminal underworld he operates in. Throughout the series, John Wick faces numerous enemies, bounty hunters, and mobsters, leading to intense action sequences and a world filled with criminal organizations, rules, and alliances. His actions have significant consequences, and the films depict his journey through violence and vengeance to find peace or complete his vendettas.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [34]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)
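
For intuition about what happens inside a BM25 scorer, here is a small self-contained sketch of one common textbook variant of the formula. The whitespace tokenizer, the `k1=1.5` and `b=0.75` values, and the "+1" smoothed IDF are assumptions made for this example; the library behind `BM25Retriever` may differ in these details.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a textbook BM25 variant."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = tokenize(query)
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue  # terms missing from the document contribute nothing
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

toy_reviews = [
    "john wick is a stylish action movie",
    "the dog scene made me cry",
    "great action and great stunts",
]
scores = bm25_scores("action stunts", toy_reviews)
best = max(range(len(scores)), key=scores.__getitem__)
print(best)
```

Because the score depends only on shared words, a review that uses different vocabulary from the query scores zero no matter how relevant it is semantically.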

We'll construct the same chain - only changing the retriever.

In [35]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [17]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, people's opinions on John Wick vary. Some reviews highly praise the series, especially the first film, calling it stylish, exciting, and a must-see for action fans. Others have mixed feelings, noting that later installments can be less engaging or overly violent. There is at least one negative review criticizing the third film for being dull and stereotypical. Overall, while many viewers enjoyed the movies, opinions are mixed, and not everyone liked John Wick."

In [18]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Based on the provided reviews, there are no reviews with a rating of 10.'

In [19]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick movies, the story revolves around John Wick, a former hitman who is drawn back into violence and chaos after a series of events. The original film depicts how Wick seeks vengeance against those who wronged him, especially after the death of his beloved dog, which was a gift from his late wife. Throughout the series, Wick faces numerous enemies, including assassins and criminal organizations, engaging in highly choreographed and brutal combat. The movies are known for their exceptional action sequences, emotional depth, and a fictional criminal underworld that drives the plot.'

It's not clear that this is better or worse - but the claim that no reviews have a rating of 10 is wrong, which isn't great!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [90]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
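
The "retrieve broadly, then keep only the best" pattern can be sketched without any external service. In the example below the word-overlap scorer is a deliberately crude, hypothetical stand-in for a learned reranking model such as Cohere's; only the overall shape (score every candidate against the query, sort, truncate to `top_n`) is the point.

```python
def overlap_score(query, text):
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)

def rerank(query, candidates, top_n=3, score_fn=overlap_score):
    """Re-order candidates by score_fn and keep only the top_n best."""
    ranked = sorted(candidates, key=lambda text: score_fn(query, text), reverse=True)
    return ranked[:top_n]

# Pretend these four texts came back from a first-stage retriever.
candidates = [
    "the soundtrack is loud",
    "keanu reeves delivers great action",
    "great action and great stunts",
    "popcorn was stale",
]
compressed = rerank("great action stunts", candidates, top_n=2)
print(compressed)
```

This is why the base retriever uses a generous `k`: the reranker can only promote documents that the first stage already returned.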

Let's create our chain again, and see how this does!

In [91]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [92]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people generally liked John Wick. The first two reviews highly praise the film, giving it ratings of 9 and 10 out of 10, and describe it as a fun, stylish, and exciting action movie. However, the third review gives a lower rating of 5 out of 10 for John Wick 3, indicating some disappointment. Overall, the general reception appears to be positive, especially for the first film, but opinions on subsequent installments vary.'

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. Here are the URLs to those reviews:\n\n1. [Review of John Wick 3 titled "A Masterpiece & Brilliant Sequel"]( /review/rw4854296/?ref_=tt_urv )  \n2. [Review of John Wick 4 titled "How Can Anyone Choose to Watch Marvel Over This?"]( /review/rw8944843/?ref_=tt_urv )  \n3. [Review of John Wick 3 titled "It\'s got its own action style!"]( /review/rw4860412/?ref_=tt_urv )  \n\nLet me know if you\'d like more details!'

In [27]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, after resolving his issues with the Russian mafia, John Wick is forced back into action when mobster Santino D'Antonio asks him to kill his sister in Rome. When Wick completes the task, Santino puts a bounty on him, leading to professional killers coming after him. Wick then seeks revenge on Santino."

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [38]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
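
The three steps listed above can be sketched in plain Python. Both `generate_queries` (standing in for the LLM that rewrites the question) and `keyword_retriever` (standing in for the base retriever) are hypothetical placeholders; the part worth noticing is the de-duplicated union at the end.

```python
def generate_queries(question):
    # Hypothetical stand-in for the LLM call that rewrites the user's question.
    return [question, f"reviews about {question}", f"opinions on {question}"]

def keyword_retriever(query, corpus):
    # Hypothetical stand-in for the base retriever: match on any shared word.
    words = set(query.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().split())]

def multi_query_retrieve(question, corpus):
    """Retrieve for every generated query and return the unique documents."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in keyword_retriever(query, corpus):
            if doc not in seen:  # keep the first occurrence only
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

corpus = [
    "action was great",
    "mixed opinions overall",
    "many reviews praise it",
    "unrelated text",
]
print(multi_query_retrieve("action", corpus))
```

Note the cost trade-off: every user question now triggers an extra LLM call plus one retrieval per generated query.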

In [39]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [26]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews in the provided context, people generally liked John Wick. Many reviews gave high ratings and used positive language to describe the film\'s action, style, and entertainment value. For example, some reviews rated it 9 or 10 out of 10 and called it "slick," "brilliant," "insanely fun," and "remarkable." However, there are a few mixed or negative opinions as well, with some reviewers giving lower ratings and expressing that the movie became over-the-top or lost some of its appeal over multiple sequels. Overall, the majority of reviews suggest that people generally enjoyed John Wick.'

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. One such review has the URL: /review/rw4854296/?ref_=tt_urv.'

In [28]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick film series, the story follows John Wick, a retired hitman who is drawn back into the violent world of assassins after personal tragedies. The original film begins with Wick seeking revenge after gangsters kill his dog and steal his car, which prompts him to unleash a relentless and highly skilled campaign of vengeance against those responsible. The series explores a fictional underworld filled with crime syndicates and strict rules, with Wick navigating this dangerous world while confronting various enemies and personal enemies. Throughout the movies, Wick deals with the consequences of his actions, goes on brutal missions, and becomes a legendary figure feared by many in the criminal underworld.'

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
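
As a rough illustration of "search small, return big", the sketch below keeps full documents in a dictionary and searches over fixed-width child chunks that remember their parent's id. The fixed-width splitter and the substring match are simplifying assumptions standing in for `RecursiveCharacterTextSplitter` and embedding similarity; all function names are made up for this example.

```python
def split_into_chunks(text, chunk_size):
    """Naive fixed-width splitter (a stand-in for a real text splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def build_index(parent_docs, chunk_size):
    docstore = dict(enumerate(parent_docs))  # parent_id -> full document
    child_index = []                         # (child_chunk, parent_id) pairs
    for parent_id, text in docstore.items():
        for chunk in split_into_chunks(text, chunk_size):
            child_index.append((chunk, parent_id))
    return docstore, child_index

def retrieve_parents(query, docstore, child_index):
    """Match against child chunks, but return the de-duplicated parent documents."""
    parent_ids = []
    for chunk, parent_id in child_index:
        # Toy "similarity": a child matches if it contains the query string.
        if query.lower() in chunk.lower() and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[pid] for pid in parent_ids]

parents = [
    "John Wick is out for revenge. The action is relentless and stylish.",
    "The sequel travels to Rome. The plot is thinner than the first film.",
]
docstore, child_index = build_index(parents, chunk_size=30)
print(retrieve_parents("rome", docstore, child_index))
```

The match happens on a 30-character chunk, yet the caller receives the whole review, including the sentence about the plot that the chunk alone would have lost.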

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [40]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [41]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)



Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [42]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [43]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [44]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [34]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people\'s opinions on John Wick vary. Some reviewers, like MrHeraclius, highly recommend the series and praise its action and emotional depth, indicating they like the movies. Others, like solidabs, give a very negative review of John Wick 4, calling it "horrible" and criticizing its plot and action scenes, which suggests they did not like that installment despite generally liking the series.\n\nTherefore, it can be said that people generally have mixed feelings about John Wick. Many fans enjoy the series, but there are also notable negative opinions.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL for that review is: /review/rw4854296/?ref_=tt_urv'

In [36]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick movies, John Wick is a retired hitman who comes out of retirement to seek vengeance and handle various dangerous situations. The first film depicts him returning to the violent world after his dog is killed and his car is stolen, leading him to unleash a ruthless and orchestrated revenge against those who wronged him. The second film continues his adventures, involving him dealing with old debts, an international journey, and being pulled back into the assassin world, resulting in extensive action and killing. Overall, the series features John Wick's relentless combat, his quest for retribution, and his efforts to navigate a dangerous underworld."

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [93]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
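
To show what rank fusion does with those weights, here is a minimal sketch of weighted Reciprocal Rank Fusion. The constant `c=60` comes from the RRF paper linked above; treating each retriever's weight as a simple multiplier is an assumption of this sketch, and the document ids are toy values.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several rankings: each document earns weight / (c + rank) per list."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Two toy rankings of the same three documents (ids are made up).
bm25_ranking = ["a", "b", "c"]
dense_ranking = ["b", "c", "a"]

fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Only ranks are used, never raw scores, which is what lets retrievers with incomparable scoring scales (BM25 scores vs. cosine similarities) be combined.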

In [94]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [95]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews in the provided context, people generally liked John Wick. The reviews mention high ratings, praise for its stylish action sequences, and positive impressions of the film and its sequels. While some reviews express criticism of certain aspects, the overall sentiment indicates that John Wick is well-received and appreciated, especially by action fans.'

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. \n\nThe URLs to those reviews are:\n- /review/rw4854296/?ref_=tt_urv\n- /review/rw8946038/?ref_=tt_urv'

In [41]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick film series, the story revolves around John Wick, a retired hitman who is drawn back into the violent underworld he thought he had left behind. The first movie starts with Wick seeking revenge after a group of gangsters steal his car and kill his dog, which was a gift from his deceased wife. As he uncovers the betrayal, Wick unleashes a relentless and expertly choreographed series of violent retributions against those who wronged him, showcasing his lethal skills and setting the tone for the franchise.\n\nSubsequent films explore Wick's ongoing conflicts within the assassin underworld, with themes of revenge, loyalty, and the consequences of his past actions. The series features a richly developed criminal world, including hotels that serve as neutral grounds, a powerful criminal hierarchy, and moral codes that even assassins follow. Over time, Wick's quest for peace turns into a continuous cycle of violence, as he faces numerous adversaries, Old contacts, and new ene

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

In [None]:
#!pip install -qU langchain_experimental


We'll use the `percentile` thresholding method for this example, which calculates the distance between each pair of adjacent sentences and then breaks the sequence wherever a distance exceeds a given percentile of all distances.

In [47]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)
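
The percentile rule can be sketched in a few lines of plain Python. This is a simplified illustration, not the `SemanticChunker` implementation: the two-dimensional "embeddings" are invented, the `pct=80` value is chosen only to suit the toy data, and the helper names are hypothetical.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def semantic_chunks(sentences, vectors, pct):
    """Start a new chunk wherever the distance to the previous sentence is unusually large."""
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:  # semantic break: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

sentences = [
    "The action is great.",
    "The stunts are superb.",
    "My popcorn was stale.",
    "The seats were sticky.",
]
# Toy embeddings: the first two point one way, the last two another.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(semantic_chunks(sentences, vectors, pct=80))
```

Short reviews with a single topic will often end up as one chunk each, which is why this technique pays off mainly on corpora with clear semantic breaks.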

Now we can split our documents.

In [48]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [49]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [50]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [51]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [47]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews in the provided context, people generally liked John Wick. Many reviews are highly positive, praising its action sequences, style, and Keanu Reeves' performance, with ratings often around 8 to 10. However, there are a few mixed or negative reviews, with some ratings as low as 0 and 2, indicating that not everyone enjoyed it. Overall, the majority of reviews suggest that people tend to like John Wick."

In [48]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is: /review/rw4854296/?ref_=tt_urv'

In [49]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the film "John Wick," the main character, played by Keanu Reeves, is a retired assassin who seeks revenge after a group of thugs break into his home, beat him up, kill his dog, and steal his car. The dog was a gift from his late wife, and its killing sparks his return to a life of violence. As he hunts down those responsible, he reveals himself to be a highly skilled and deadly hitman. The story centers around his relentless pursuit of vengeance against the gangsters who wronged him, with the narrative showcasing intense action, stylish stunts, and a vivid underworld setting.'

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

## Step 1: Generate Synthetic Data

First, we generate the test data that we will use to evaluate the performance of our retrievers. We use the RAGAS synthetic data generator and store the data in a CSV file so that we can reuse it for each retriever.

> `testset.csv` is generated using `grok-3` and `Snowflake/snowflake-arctic-embed-l` in the `grok_3.ipynb` notebook.

## Step 2: Generate Eval Dataset

Next, we create a LangSmith dataset and run the evaluation with the RAGAS metrics `answer_relevancy`, `context_precision`, `faithfulness`, and `context_recall`.
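The two retrieval-focused metrics are easier to interpret with the arithmetic written out. The sketch below follows the definitions as we understand them from the Ragas documentation; it is not the library's code. In Ragas the per-chunk and per-claim verdicts come from an LLM judge, whereas here they are supplied by hand, and the function names are hypothetical.

```python
def context_precision(relevance):
    """Rank-aware precision of a retrieved list.

    `relevance` holds one 0/1 verdict per retrieved chunk, in rank order.
    The score is the mean of precision@k taken over the ranks k that
    contain a relevant chunk, so relevant chunks near the top score higher.
    """
    hits = 0
    weighted_sum = 0.0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            weighted_sum += hits / k  # precision@k at a relevant rank
    return weighted_sum / hits if hits else 0.0


def context_recall(claim_supported):
    """Fraction of reference-answer claims supported by the retrieved context."""
    if not claim_supported:
        return 0.0
    return sum(1 for supported in claim_supported if supported) / len(claim_supported)


# Same two relevant chunks, different ranks: earlier is better.
print(context_precision([1, 0, 1]))
print(context_precision([0, 1, 1]))
# Two of three reference claims are covered by the retrieved context.
print(context_recall([True, True, False]))
```

Under these definitions, a retriever that returns few but relevant chunks can reach a context precision of 1.0 while still missing reference claims, which lowers its context recall.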

## Step 3: Evaluation

Each retriever chain will be evaluated as a separate experiment on the LangSmith dataset.


## Step 4: Write Report

In this final step, we build a comparison table and write a report. We will get cost, latency, and performance data from LangSmith.
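As a starting point for the comparison table, here is a small sketch that collects the mean scores printed in the summary statistics of the evaluation runs in this notebook. The dictionary layout and the helpers `rank_by` and `format_table` are illustrative assumptions. Cost is not part of those outputs and still has to be read from LangSmith, and `execution_time` is used here only as a rough latency proxy.

```python
# Mean values copied from the per-retriever summary statistics printed by the
# evaluation cells in this notebook. Some means cover only 10 or 11 of the 12
# examples because a few evaluator calls failed.
COLUMNS = (
    "answer_relevancy",
    "context_precision",
    "faithfulness",
    "context_recall",
    "execution_time",
)

MEANS = {
    "Naive": (0.924541, 0.987558, 0.731377, 0.816667, 2.448142),
    "BM25": (0.844969, 0.883838, 0.664769, 0.666667, 1.661484),
    "Parent Document": (0.769292, 1.0, 0.806527, 0.644444, 2.841522),
    "Contextual Compression": (0.826003, 1.0, 0.720531, 0.65, 1.782861),
    "Multi-Query": (0.845563, 0.949432, 0.805762, 0.863636, 3.35719),
    "Ensemble": (0.928326, 0.970445, 0.813321, 0.939394, 4.410931),
}


def rank_by(column, highest_first=True):
    """Return retriever names ordered by one column of MEANS."""
    index = COLUMNS.index(column)
    return sorted(MEANS, key=lambda name: MEANS[name][index], reverse=highest_first)


def format_table():
    """Render MEANS as a Markdown table, one row per retriever."""
    header = "| retriever | " + " | ".join(COLUMNS) + " |"
    divider = "|" + " --- |" * (len(COLUMNS) + 1)
    rows = [
        "| " + name + " | " + " | ".join(f"{value:.3f}" for value in values) + " |"
        for name, values in MEANS.items()
    ]
    return "\n".join([header, divider, *rows])


print(format_table())
print("Highest mean context_recall:", rank_by("context_recall")[0])
print("Lowest mean execution_time:", rank_by("execution_time", highest_first=False)[0])
```

With only 12 examples per run and large standard deviations, differences between the means should be read as indicative rather than conclusive.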



In [None]:
from ragas.testset import Testset
import ast
import pandas as pd

def get_testset_from_csv(csv_path="testset.csv"):
    """
    Load a test set from a CSV file.
    
    The CSV file should have the following columns:
    - 'user_input'
    - 'reference_contexts'
    - 'reference'
    - 'synthesizer_name'
    """
    df = pd.read_csv(csv_path)
    # Convert string representations of lists to actual Python lists
    df['reference_contexts'] = df['reference_contexts'].apply(ast.literal_eval)
    
    testset = Testset.from_pandas(df)
    
    return testset
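The `ast.literal_eval` step is needed because a list stored in a CSV cell comes back as a single string. The sketch below shows that round trip with only the standard library; the sample row is made up, and the `csv` module stands in for pandas here.

```python
import ast
import csv
import io

# A made-up row shaped like one testset entry.
row = {
    "user_input": "Did people generally like John Wick?",
    "reference_contexts": ["Great action scenes.", "Stylish and fun."],
}

# Write the row to an in-memory CSV file. The list is serialized via str().
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)

# Read it back: every cell is now a plain string.
buffer.seek(0)
loaded = next(csv.DictReader(buffer))
raw_contexts = loaded["reference_contexts"]

# literal_eval safely parses the string back into a Python list.
parsed_contexts = ast.literal_eval(raw_contexts)

print(type(raw_contexts).__name__, type(parsed_contexts).__name__)
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for data read from a file.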

In [14]:
testset = get_testset_from_csv()

In [8]:
import os
import getpass

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangChain API Key:")
os.environ["LANGCHAIN_PROJECT"] = "AIE6 - Advanced RAG - John Wick"

## Upload LangSmith Dataset

In [None]:
# from ragas.integrations.langsmith import upload_dataset

from __future__ import annotations

import typing as t

from langchain.smith import RunEvalConfig

from ragas.integrations.langchain import EvaluatorChain

if t.TYPE_CHECKING:
    from langsmith.schemas import Dataset as LangsmithDataset

    from ragas.testset import Testset

try:
    from langsmith import Client
    from langsmith.utils import LangSmithNotFoundError
except ImportError:
    raise ImportError(
        "Please install langsmith to use this feature. You can install it via pip install langsmith"
    )


def upload_dataset(
    dataset: pd.DataFrame, dataset_name: str, dataset_desc: str = ""
) -> LangsmithDataset:
    """
    Uploads a pandas DataFrame to LangSmith as a new dataset. If a dataset with
    the specified name already exists, the function raises an error.

    Parameters
    ----------
    dataset : pd.DataFrame
        The dataset to be uploaded. It must contain the `question` and
        `ground_truth` columns, which are used as input and output keys.
    dataset_name : str
        The name for the new dataset in LangSmith.
    dataset_desc : str, optional
        A description for the new dataset. The default is an empty string.

    Returns
    -------
    LangsmithDataset
        The dataset object as stored in LangSmith after upload.

    Raises
    ------
    ValueError
        If a dataset with the specified name already exists in LangSmith.

    Notes
    -----
    The function attempts to read a dataset by the given name to check its existence.
    If not found, it uploads the DataFrame, using `question` as the input key and
    `ground_truth` as the output key.
    """
    client = Client()
    try:
        # check if dataset exists
        langsmith_dataset: LangsmithDataset = client.read_dataset(
            dataset_name=dataset_name
        )
        raise ValueError(
            f"Dataset {dataset_name} already exists in langsmith. [{langsmith_dataset}]"
        )
    except LangSmithNotFoundError:
        # if not create a new one with the generated query examples
        langsmith_dataset: LangsmithDataset = client.upload_dataframe(
            df=dataset,
            name=dataset_name,
            input_keys=["question"],
            output_keys=["ground_truth"],
            #metadata_keys=["context"],
            description=dataset_desc,
        )

        print(
            f"Created a new dataset '{langsmith_dataset.name}'. Dataset is accessible at {langsmith_dataset.url}"
        )
        return langsmith_dataset
    
# Load the test set from a CSV file
df = pd.read_csv("testset.csv")
# Convert string representations of lists to actual Python lists
df['reference_contexts'] = df['reference_contexts'].apply(ast.literal_eval)
# set columns to question, context, ground_truth
df = df.rename(columns={
    'user_input': 'question',
    'reference_contexts': 'context',
    'reference': 'ground_truth'
})

upload_dataset(
    dataset=df,
    dataset_name="John Wick Reviews",
    dataset_desc="A test set of John Wick reviews",
)    

Created a new dataset 'John Wick Reviews'. Dataset is accessible at https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348


Dataset(name='John Wick Reviews', description='A test set of John Wick reviews', data_type=<DataType.kv: 'kv'>, id=UUID('889df422-5a6c-48f0-9309-af01b44a9348'), created_at=datetime.datetime(2025, 5, 18, 16, 55, 4, 395106, tzinfo=datetime.timezone.utc), modified_at=datetime.datetime(2025, 5, 18, 16, 55, 4, 395106, tzinfo=datetime.timezone.utc), example_count=0, session_count=0, last_session_start_time=None, inputs_schema=None, outputs_schema=None, transformations=None)

## Run Evaluation

In [None]:
from operator import itemgetter

from langchain_core.output_parsers import StrOutputParser
from ragas.integrations.langsmith import evaluate
from ragas.metrics import context_recall, faithfulness, context_precision, answer_relevancy

# build the evaluation metrics
metrics = [answer_relevancy, context_precision, faithfulness, context_recall]

# Create a list of chains to evaluate
chain_list = [
    ("Naive Retrieval", naive_retrieval_chain),
    ("BM25 Retrieval", bm25_retrieval_chain),
    ("Parent Document Retrieval", parent_document_retrieval_chain),
    ("Contextual Compression Retrieval", contextual_compression_retrieval_chain),
    ("Multi-Query Retrieval", multi_query_retrieval_chain),
    ("Ensemble Retrieval", ensemble_retrieval_chain),
]


# Run evaluation on each chain
for chain_name, chain in chain_list:
    print(f"Evaluating {chain_name}...")

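    # Wrap the chain so its output uses the keys expected by the evaluation:
    # "answer" (the response as a string) and "contexts" (the retrieved documents)
    rag_chain = chain | {
        "answer": itemgetter("response") | StrOutputParser(),
        "contexts": itemgetter("context"),
    }

    evaluate(
        dataset_name="John Wick Reviews",
        llm_or_chain_factory=rag_chain,
        experiment_name=chain_name,  # fixed: the comma was missing after this argument
        metrics=metrics,
    )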


In [None]:
a_chain = naive_retrieval_chain | {
    "answer": itemgetter("response") | StrOutputParser(),
    "contexts": itemgetter("context"),
}

evaluate(
    dataset_name="John Wick Reviews",
    llm_or_chain_factory=a_chain,
    experiment_name="Naive Retrieval Chain",
    metrics=metrics,
    verbose=True,
)

View the evaluation results for project 'Naive Retrieval Chain 2' at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348/compare?selectedSessions=2944ff47-e738-4f2e-8540-3183d2e1480a

View all tests for Dataset John Wick Reviews at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348
[>                                                 ] 0/12

Error evaluating run 2975100f-0c8d-4cbd-9e96-1c9efb05f6a1 with EvaluatorChain
Traceback (most recent call last):
  File "/home/mafzaal/source/AIE6/13_Advanced_Retrieval/.venv/lib/python3.13/site-packages/openai/_base_client.py", line 1484, in request
    response = await self._client.send(
               ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
    )
    ^
  File "/home/mafzaal/source/AIE6/13_Advanced_Retrieval/.venv/lib/python3.13/site-packages/httpx/_client.py", line 1629, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<4 lines>...
    )
    ^
  File "/home/mafzaal/source/AIE6/13_Advanced_Retrieval/.venv/lib/python3.13/site-packages/httpx/_client.py", line 1657, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<3 lines>...
    )
    ^
  File "/home/mafzaal/source/AIE6/13_Advanced_Retrieval/.venv/lib/python3.13/site-packages/httpx/_clie

[--->                                              ] 1/12

Error evaluating run 2a282776-41ee-4ea8-aebf-e785a8b2ac21 with EvaluatorChain

[------------------------------------------------->] 12/12

| statistic | feedback.answer_relevancy | feedback.context_precision | feedback.faithfulness | feedback.context_recall | error | execution_time | run_id |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 11.0 | 10.0 | 10.0 | 10.0 | 0.0 | 12.0 | 12 |
| unique | | | | | 0.0 | | 12 |
| top | | | | | | | 999ae328-18c6-4e6b-aa16-0e1bfaa559d3 |
| freq | | | | | | | 1 |
| mean | 0.924541 | 0.987558 | 0.731377 | 0.816667 | | 2.448142 | |
| std | 0.046807 | 0.025256 | 0.40135 | 0.337474 | | 1.004864 | |
| min | 0.854073 | 0.924036 | 0.0 | 0.0 | | 0.989464 | |
| 25% | 0.885428 | 0.991667 | 0.559211 | 0.75 | | 1.828652 | |
| 50% | 0.931781 | 1.0 | 1.0 | 1.0 | | 2.5673 | |
| 75% | 0.959721 | 1.0 | 1.0 | 1.0 | | 2.792025 | |


{'project_name': 'Naive Retrieval Chain 2',
 'results': {'2db262f2-1977-4517-bfd7-20fc41ada2d8': {'input': {'question': 'Hey, can u tell me what’s the deal with John Wick movie, like what’s the story about and why it’s so cool, ya know?'},
   'feedback': [EvaluationResult(key='answer_relevancy', score=np.float64(0.8677742808428901), value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('7291571f-090c-4f64-b484-e16cdbd92675'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='context_precision', score=0.9888888888779013, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('9800f3e5-a089-4447-b96a-8a97058fed29'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='faithfulness', score=1.0, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('3acb10da-2049-40bb-85f9-1d77c32b0913'))}

In [70]:
a_chain = bm25_retrieval_chain | {
    "answer": itemgetter("response") | StrOutputParser(),
    "contexts": itemgetter("context"),
}

evaluate(
    dataset_name="John Wick Reviews",
    llm_or_chain_factory=a_chain,
    experiment_name="BM25 Retrieval Chain",
    metrics=metrics,
    verbose=True,
)

View the evaluation results for project 'BM25 Retrieval Chain' at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348/compare?selectedSessions=c46c065c-40d1-41cd-97e1-d2829f407f23

View all tests for Dataset John Wick Reviews at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348
[>                                                 ] 0/12

Error evaluating run f5796b2c-a20c-48ca-bb19-b6d2620e962e with EvaluatorChain

[-------------------->                             ] 5/12

Error evaluating run 95f9b499-ca7d-4360-a001-22d4f305fb16 with EvaluatorChain

[------------------------------------------------->] 12/12

| statistic | feedback.answer_relevancy | feedback.context_precision | feedback.faithfulness | feedback.context_recall | error | execution_time | run_id |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 11.0 | 11.0 | 11.0 | 10.0 | 0.0 | 12.0 | 12 |
| unique | | | | | 0.0 | | 12 |
| top | | | | | | | 0c47aa33-c4b0-4c21-9139-dcb75eb4a596 |
| freq | | | | | | | 1 |
| mean | 0.844969 | 0.883838 | 0.664769 | 0.666667 | | 1.661484 | |
| std | 0.281685 | 0.299387 | 0.34207 | 0.451335 | | 0.766325 | |
| min | 0.0 | 0.0 | 0.0 | 0.0 | | 0.406423 | |
| 25% | 0.895429 | 0.958333 | 0.452381 | 0.25 | | 1.169496 | |
| 50% | 0.928726 | 1.0 | 0.666667 | 1.0 | | 1.838358 | |
| 75% | 0.940275 | 1.0 | 1.0 | 1.0 | | 2.120933 | |


{'project_name': 'BM25 Retrieval Chain',
 'results': {'2db262f2-1977-4517-bfd7-20fc41ada2d8': {'input': {'question': 'Hey, can u tell me what’s the deal with John Wick movie, like what’s the story about and why it’s so cool, ya know?'},
   'feedback': [EvaluationResult(key='answer_relevancy', score=np.float64(0.8919927601609879), value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('99fe46f3-baf6-4511-a8ae-a5e4211c87ee'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='context_precision', score=0.9999999999, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('a8cbd208-0722-4a20-88a3-247e3fc76b4f'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='faithfulness', score=0.6666666666666666, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('7b151f48-37c5-4d2a-8051-caaaeb054f

In [71]:
a_chain = parent_document_retrieval_chain | {
    "answer": itemgetter("response") | StrOutputParser(),
    "contexts": itemgetter("context"),
}

evaluate(
    dataset_name="John Wick Reviews",
    llm_or_chain_factory=a_chain,
    experiment_name="Parent Document Retrieval Chain",
    metrics=metrics,
    verbose=True,
)

View the evaluation results for project 'Parent Document Retrieval Chain' at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348/compare?selectedSessions=685991d5-6b07-4577-a945-ebe1096a257d

View all tests for Dataset John Wick Reviews at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348
[------------------------------------------------->] 12/12

| statistic | feedback.answer_relevancy | feedback.context_precision | feedback.faithfulness | feedback.context_recall | error | execution_time | run_id |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 12.0 | 12.0 | 12.0 | 12.0 | 0.0 | 12.0 | 12 |
| unique | | | | | 0.0 | | 12 |
| top | | | | | | | 245e219a-3e7a-4dbe-b39e-80639113e745 |
| freq | | | | | | | 1 |
| mean | 0.769292 | 1.0 | 0.806527 | 0.644444 | | 2.841522 | |
| std | 0.361485 | 2.112174e-11 | 0.295983 | 0.384638 | | 1.442535 | |
| min | 0.0 | 1.0 | 0.0 | 0.0 | | 0.933219 | |
| 25% | 0.864809 | 1.0 | 0.75 | 0.291667 | | 1.618455 | |
| 50% | 0.924088 | 1.0 | 0.916084 | 0.733333 | | 2.687184 | |
| 75% | 0.946889 | 1.0 | 1.0 | 1.0 | | 3.696229 | |


{'project_name': 'Parent Document Retrieval Chain',
 'results': {'2db262f2-1977-4517-bfd7-20fc41ada2d8': {'input': {'question': 'Hey, can u tell me what’s the deal with John Wick movie, like what’s the story about and why it’s so cool, ya know?'},
   'feedback': [EvaluationResult(key='answer_relevancy', score=np.float64(0.8897836481665472), value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('19bc1f6d-73d7-4399-9798-7f6628b0ca06'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='context_precision', score=0.9999999999, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('3b48ce54-2797-4b2d-8261-f560b028b40e'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='faithfulness', score=0.75, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('cd77fdba-3fe7-4746-b7b1-d282c2b7971a'

In [96]:
a_chain = contextual_compression_retrieval_chain | {
    "answer": itemgetter("response") | StrOutputParser(),
    "contexts": itemgetter("context"),
}

evaluate(
    dataset_name="John Wick Reviews",
    llm_or_chain_factory=a_chain,
    experiment_name="Contextual Compression Retrieval Chain",
    metrics=metrics,
    verbose=True,
)

View the evaluation results for project 'Contextual Compression Retrieval Chain' at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348/compare?selectedSessions=67dfdc0f-7ff4-4d42-94d6-f866505b7deb

View all tests for Dataset John Wick Reviews at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348
[>                                                 ] 0/12

Error evaluating run 6075740f-d254-46f0-9e90-dc58c05e070f with EvaluatorChain

[--->                                              ] 1/12

Error evaluating run 9a027e8f-f164-4255-8791-ce66987b2af0 with EvaluatorChain

[------------------------------------------------->] 12/12

| statistic | feedback.answer_relevancy | feedback.context_precision | feedback.faithfulness | feedback.context_recall | error | execution_time | run_id |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 11.0 | 10.0 | 10.0 | 10.0 | 0.0 | 12.0 | 12 |
| unique | | | | | 0.0 | | 12 |
| top | | | | | | | c4aa1f87-390c-4be5-b9cf-3219a3ebff42 |
| freq | | | | | | | 1 |
| mean | 0.826003 | 1.0 | 0.720531 | 0.65 | | 1.782861 | |
| std | 0.27549 | 0.0 | 0.247887 | 0.372264 | | 0.491622 | |
| min | 0.0 | 1.0 | 0.285714 | 0.0 | | 1.098517 | |
| 25% | 0.880317 | 1.0 | 0.525 | 0.416667 | | 1.394877 | |
| 50% | 0.907138 | 1.0 | 0.763393 | 0.666667 | | 1.840992 | |
| 75% | 0.919627 | 1.0 | 0.922222 | 1.0 | | 1.988306 | |


{'project_name': 'Contextual Compression Retrieval Chain',
 'results': {'2db262f2-1977-4517-bfd7-20fc41ada2d8': {'input': {'question': 'Hey, can u tell me what’s the deal with John Wick movie, like what’s the story about and why it’s so cool, ya know?'},
   'feedback': [EvaluationResult(key='answer_relevancy', score=np.float64(0.8692283652385302), value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('f08bacdd-5f11-4621-9dc8-1ed1213ea61d'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='context_precision', score=0.9999999999666667, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('ae184e61-e9a0-4198-a428-4c516db94d4d'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='faithfulness', score=0.9333333333333333, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('1b5d0b49-c

In [75]:
a_chain = multi_query_retrieval_chain | {
    "answer": itemgetter("response") | StrOutputParser(),
    "contexts": itemgetter("context"),
}

evaluate(
    dataset_name="John Wick Reviews",
    llm_or_chain_factory=a_chain,
    experiment_name="Multi Query Retrieval Chain",
    metrics=metrics,
    verbose=True,
)

View the evaluation results for project 'Multi Query Retrieval Chain' at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348/compare?selectedSessions=c5b683f0-f212-4381-9b89-d428600e67f3

View all tests for Dataset John Wick Reviews at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348
[>                                                 ] 0/12

Error evaluating run 175f52ac-87ab-4d0f-b819-66a338d6b6e3 with EvaluatorChain

[------------------------------------------------->] 12/12

| statistic | feedback.answer_relevancy | feedback.context_precision | feedback.faithfulness | feedback.context_recall | error | execution_time | run_id |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 11.0 | 11.0 | 11.0 | 11.0 | 0.0 | 12.0 | 12 |
| unique | | | | | 0.0 | | 12 |
| top | | | | | | | b8e5b601-bdac-40d2-9f53-420d7c469748 |
| freq | | | | | | | 1 |
| mean | 0.845563 | 0.949432 | 0.805762 | 0.863636 | | 3.35719 | |
| std | 0.283299 | 0.058264 | 0.229425 | 0.245155 | | 1.029957 | |
| min | 0.0 | 0.860018 | 0.363636 | 0.333333 | | 1.537914 | |
| 25% | 0.886642 | 0.891734 | 0.642857 | 0.833333 | | 2.787685 | |
| 50% | 0.925608 | 0.96692 | 0.916667 | 1.0 | | 3.203811 | |
| 75% | 0.955203 | 1.0 | 0.973684 | 1.0 | | 3.995018 | |


{'project_name': 'Multi Query Retrieval Chain',
 'results': {'2db262f2-1977-4517-bfd7-20fc41ada2d8': {'input': {'question': 'Hey, can u tell me what’s the deal with John Wick movie, like what’s the story about and why it’s so cool, ya know?'},
   'feedback': [EvaluationResult(key='answer_relevancy', score=np.float64(0.8692457203603787), value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('61d4e425-6e0a-4c1b-a4f5-a6b11ed1b955'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='context_precision', score=0.9614769489689367, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('3968ba95-3deb-4eb8-bc9f-2d671dbbf895'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='faithfulness', score=1.0, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('d94402a0-5d92-4469-9005-a7b97cb27ebf

In [97]:
a_chain = ensemble_retrieval_chain | {
    "answer": itemgetter("response") | StrOutputParser(),
    "contexts": itemgetter("context"),
}

evaluate(
    dataset_name="John Wick Reviews",
    llm_or_chain_factory=a_chain,
    experiment_name="Ensemble Retrieval Chain",
    metrics=metrics,
    verbose=True,
)

View the evaluation results for project 'Ensemble Retrieval Chain' at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348/compare?selectedSessions=22dfd104-b18a-4f79-85db-4ee852f275f6

View all tests for Dataset John Wick Reviews at:
https://smith.langchain.com/o/e106fdae-1163-4ad0-b46b-09a4850df972/datasets/889df422-5a6c-48f0-9309-af01b44a9348
[-------------------->                             ] 5/12

Error evaluating run bf5a7511-ea76-4876-ab22-e942c9dce2a7 with EvaluatorChain

[------------------------------------------------->] 12/12

| statistic | feedback.answer_relevancy | feedback.context_precision | feedback.faithfulness | feedback.context_recall | error | execution_time | run_id |
| --- | --- | --- | --- | --- | --- | --- | --- |
| count | 12.0 | 12.0 | 11.0 | 11.0 | 0.0 | 12.0 | 12 |
| unique | | | | | 0.0 | | 12 |
| top | | | | | | | 9d391de3-2ad3-4921-86e3-65693c256346 |
| freq | | | | | | | 1 |
| mean | 0.928326 | 0.970445 | 0.813321 | 0.939394 | | 4.410931 | |
| std | 0.040837 | 0.049922 | 0.317336 | 0.201008 | | 1.121921 | |
| min | 0.864494 | 0.863276 | 0.0 | 0.333333 | | 3.275321 | |
| 25% | 0.909279 | 0.964896 | 0.738889 | 1.0 | | 3.721385 | |
| 50% | 0.928985 | 1.0 | 1.0 | 1.0 | | 3.766979 | |
| 75% | 0.957635 | 1.0 | 1.0 | 1.0 | | 4.922066 | |


{'project_name': 'Ensemble Retrieval Chain',
 'results': {'2db262f2-1977-4517-bfd7-20fc41ada2d8': {'input': {'question': 'Hey, can u tell me what’s the deal with John Wick movie, like what’s the story about and why it’s so cool, ya know?'},
   'feedback': [EvaluationResult(key='answer_relevancy', score=np.float64(0.8710523342037025), value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('010cc0be-4a53-4a69-9df4-4624e292cf2d'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='context_precision', score=0.8816807951191897, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('17a8be35-8951-4c91-8f3e-9a97e299e2c8'))}, feedback_config=None, source_run_id=None, target_run_id=None, extra=None),
    EvaluationResult(key='faithfulness', score=1.0, value=None, comment=None, correction=None, evaluator_info={'__run': RunInfo(run_id=UUID('6aa27afa-4f80-45eb-9751-b6fb0d923f58'))