# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally. 

In [2]:
#!pip install -qU langchain langchain-openai langchain-cohere rank_bm25

We're also going to be leveraging [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") VectorDB in "memory" mode (so we can leverage it locally in our colab environment).

In [3]:
#!pip install -qU qdrant-client

We'll also provide our OpenAI key, as well as our Cohere API key.

In [75]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [76]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDB, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [6]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

--2025-05-19 22:50:41--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19628 (19K) [text/plain]
Saving to: ‘john_wick_1.csv’


2025-05-19 22:50:42 (3.16 MB/s) - ‘john_wick_1.csv’ saved [19628/19628]

--2025-05-19 22:50:42--  https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 14747 (14K) [text/plain]
Saving to: ‘john_wick_2.csv’


2025-05-19 22:50:42 (5.51 MB/s) - ‘john_wick_2.csv’

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [77]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [78]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2025, 5, 17, 22, 50, 9, 934900)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

## Task 3: Setting Up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [79]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply look at each review as a document, and use cosine-similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [80]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
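
To make the "top `k` by cosine similarity" idea concrete, here is a minimal pure-Python sketch. The 3-dimensional vectors and the helper names (`cosine_similarity`, `top_k`) are made up for illustration; in the notebook, the embedding model and Qdrant do this work for us.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k vectors most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy "embeddings" (hypothetical values, not real model output).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [0.8, 0.2, 0.0]

print(top_k(toy_query_vec, toy_doc_vecs, k=2))
```

A real vector store typically uses an approximate nearest-neighbour index rather than scoring every vector, but the ranking idea is the same.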

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [81]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [82]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")


### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [83]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # RunnablePassthrough.assign passes the whole dict through and (re)sets "context"
    # to the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
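
If the pipe syntax feels opaque, the data flow can be mimicked with plain dicts and function calls. This is only a sketch of the shapes moving through the chain, not how LCEL is implemented; `run_rag_chain` and the three `fake_*` helpers are hypothetical stand-ins so it runs without API keys.

```python
def run_rag_chain(inputs, retrieve_fn, format_prompt_fn, llm_fn):
    """Mimic the three chain steps with plain dicts and function calls."""
    # Step 1: build {"context": ..., "question": ...} from the input dict.
    question = inputs["question"]
    state = {"context": retrieve_fn(question), "question": question}

    # Step 2: the passthrough-assign step keeps every key and re-sets
    # "context" to its current value, so the dict is unchanged here.
    state = {**state, "context": state["context"]}

    # Step 3: format the prompt, call the model, and keep the context alongside.
    prompt_text = format_prompt_fn(state)
    return {"response": llm_fn(prompt_text), "context": state["context"]}


# Hypothetical stand-ins for the retriever, prompt, and LLM.
def fake_retrieve(question):
    return ["Review: stylish action.", "Review: great stunts."]

def fake_format_prompt(state):
    return f"Query:\n{state['question']}\n\nContext:\n{state['context']}"

def fake_llm(prompt_text):
    return f"(model output for a prompt of {len(prompt_text)} characters)"

result = run_rag_chain(
    {"question": "Did people generally like John Wick?"},
    fake_retrieve,
    fake_format_prompt,
    fake_llm,
)
print(sorted(result.keys()))
```

Returning the context next to the response is what lets us inspect (and later evaluate) exactly which documents each retriever supplied.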

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [14]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people generally liked John Wick. Many reviews are highly positive, praising its stylish action, choreography, and overall entertainment value. Some reviewers gave it ratings of 8, 9, or even 10 out of 10, indicating strong approval. Although there are a few mixed or slightly negative opinions, the overall sentiment suggests that the film was well-received and appreciated by most viewers.'

In [15]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. The URLs to those reviews are:\n\n1. [Review URL](https://yourdomain.com/review/rw4854296/?ref_=tt_urv) (Review titled "A Masterpiece & Brilliant Sequel")\n2. [Review URL](https://yourdomain.com/review/rw4860412/?ref_=tt_urv) (Review titled "It\'s got its own action style!")\n\n(Note: The actual URLs are relative paths from the data: "/review/rw4854296/?ref_=tt_urv" and "/review/rw4860412/?ref_=tt_urv". You may need to prepend your domain or access directly as is.)'

In [16]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the film "John Wick," the story follows a retired hitman named John Wick who lives a peaceful life after leaving his violent profession. His world is turned upside down when a gang of criminals, including a young Russian punk, break into his home, beat him senseless, kill his dog (a cherished companion), and steal his car. It is revealed that John Wick was once a legendary assassin, and his beastly reputation makes him a formidable force. Enraged and driven by grief and revenge, Wick relentlessly hunts down those responsible, unleashing a trail of destruction and violence to recover his stolen possessions and seek retribution. The story emphasizes themes of vengeance, consequence, and the aftermath of a life committed to violence.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [84]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)
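
To see what a bag-of-words score looks like, here is a minimal sketch of a textbook Okapi BM25 scorer. It is an illustration under stated assumptions: whitespace tokenization, a smoothed IDF, and the commonly used `k1=1.5` and `b=0.75`. The `rank_bm25` package that backs `BM25Retriever` may differ in its details, and the toy corpus is made up.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in a non-empty corpus against the query."""
    tokenized = [tokenize(doc) for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            # Rare terms get a higher weight (smoothed, never negative).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates, and long documents are penalised.
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores

toy_corpus = [
    "john wick is a stylish action movie",
    "the dog was a gift from his wife",
    "great action and great stunts",
]
toy_scores = bm25_scores("stylish action", toy_corpus)
print(toy_scores)
```

Note that a document sharing no words with the query scores exactly zero, which is why purely lexical retrieval can miss reviews that answer a question in different words.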

We'll construct the same chain - only changing the retriever.

In [85]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [86]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews provided, people's opinions on John Wick vary. Some reviewers, like IceSkateUpHill and lnvicta, highly praised the first movie, highlighting its stylish action, world-building, and entertainment value. However, other reviews, such as Phil_H's review of John Wick 4, describe it as lacking plot and being somewhat disappointing. Additionally, JanetWilkinson rated John Wick 3 very poorly, criticizing it for being mindless and overly violent. Overall, while many fans seem to enjoy the series, there are also negative opinions, indicating that people's preferences are mixed."

In [13]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Based on the provided reviews, none of them have a rating of 10.'

In [14]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick film series, the story centers around John Wick, a former assassin who is drawn back into the violent world he tried to leave behind. The movies depict his relentless fight against various enemies and assassins, often involving highly choreographed combat, gunfights, and a code of honor among killers. Throughout the series, John Wick seeks revenge, justice, or simply survival, while navigating a dangerous underworld filled with complex characters and rules.'

It's not clear that this is better or worse - but failing to find any reviews rated 10 isn't great!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [87]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
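
The retrieve-then-compress pattern itself is small enough to sketch in plain Python. Here a toy word-overlap score stands in for the reranking model (it is NOT how Cohere's Rerank works), and `overlap_score`, `rerank`, and the candidate texts are hypothetical names and data for illustration.

```python
def overlap_score(query, text):
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)

def rerank(query, candidates, score_fn, top_n=3):
    """Re-order first-stage candidates by score_fn and keep only the best top_n."""
    ranked = sorted(candidates, key=lambda text: score_fn(query, text), reverse=True)
    return ranked[:top_n]

# Pretend these came back from a first-stage retriever with a generous k.
toy_candidates = [
    "the soundtrack is loud",
    "keanu reeves delivers great action",
    "action scenes are well shot",
    "reeves is a relatable hero",
    "the plot is simple",
]
toy_compressed = rerank("keanu reeves action", toy_candidates, overlap_score, top_n=2)
print(toy_compressed)
```

In the notebook, the naive retriever plays the role of the first stage and the reranking model plays the role of `score_fn`, so the LLM sees fewer but more relevant documents.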

Let's create our chain again, and see how this does!

In [88]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [17]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people generally liked John Wick. The first two reviews give high ratings (9 and 10 out of 10) and describe the film as stylish, fun, and highly entertaining, especially for action fans. The third review, which is more critical with a rating of 5, still acknowledges that the original John Wick was a unique and stylish film that broke the mold. Overall, the positive reviews suggest that most people appreciated the movie.'

In [25]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. Here are the URLs to those reviews:\n\n1. Review Title: "A Masterpiece & Brilliant Sequel"  \n   URL: /review/rw4854296/?ref_=tt_urv\n\n2. Review Title: "Most American action flicks released these days have poor screenplays and overuse computer-generated imagery."  \n   URL: /review/rw4860412/?ref_=tt_urv'

In [26]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In the John Wick film series, the story follows an ex-hitman named John Wick (played by Keanu Reeves) who comes out of retirement following personal tragedies. The first movie depicts how Wick's peaceful life is disrupted when a gang of mobsters steals his car and kills his dog, which was a gift from his deceased wife. This betrayal triggers Wick's return to a life of violence as he seeks revenge against those who wronged him. Throughout the series, Wick faces various criminal organizations, including the Russian mafia, and must navigate a world governed by strict rules and honor codes, especially concerning the mysterious Continental hotel and the High Table of criminal elites. Wick's quest for vengeance and survival involves intense action, gunfights, and strategies to outsmart his enemies."

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [90]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
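
Steps 2 and 3 above (retrieve per query, then keep the unique documents) can be sketched without any LLM. The query rewrites are hard-coded here to stand in for step 1, and `unique_union`, `multi_query_retrieve`, and the toy lookup table are hypothetical names for illustration only.

```python
def unique_union(result_lists):
    """Merge several result lists, dropping duplicates but keeping first-seen order."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

def multi_query_retrieve(queries, retrieve_fn):
    """Run the retriever once per query and return the de-duplicated union."""
    return unique_union([retrieve_fn(query) for query in queries])

# Hard-coded rewrites standing in for the LLM-generated queries.
toy_queries = [
    "Did people generally like John Wick?",
    "What was the audience reaction to John Wick?",
]

# Toy retriever: a lookup table from query to document IDs.
toy_results = {
    toy_queries[0]: ["review_a", "review_b"],
    toy_queries[1]: ["review_b", "review_c"],
}
toy_merged = multi_query_retrieve(toy_queries, toy_results.get)
print(toy_merged)
```

Because every rewrite triggers its own retrieval, the context can grow well beyond `k` documents, which is worth keeping in mind for cost and latency.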

In [91]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [92]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews in the provided context, people generally liked John Wick. Many reviews praise the film's action sequences, style, and Keanu Reeves' performance, often giving it high ratings like 8 or 9 out of 10. Reviewers describe it as fun, stylish, and a must-see for action fans, indicating a positive reception. However, there are some mixed or negative opinions as well, with a few critics expressing that the series has become over-the-top or frenetic, and some giving lower ratings. Overall, the majority of reviews suggest that people generally liked John Wick."

In [30]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is: /review/rw4854296/?ref_=tt_urv'

In [21]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick film series, Keanu Reeves plays the character John Wick, a retired assassin who comes out of retirement to seek revenge. The first movie begins with John Wick mourning the death of his wife and is then brutalized when a gang steals his car and kills his dog, which was a gift from his wife. This betrayal prompts him to re-enter the violent underworld of assassins to exact revenge.\n\nSubsequent films in the series expand on this world of assassins, exploring its rules, conflicts, and the consequences of Wick\'s actions. In "John Wick 2," he is drawn back into the criminal underworld when he is forced to assist in helping a crime boss take over the Assassin\'s Guild. "John Wick 3" continues his saga, where his actions have broader repercussions, leading him on an even more intense and violent odyssey. The series is known for its stylish action sequences, choreographed fight scenes, and a deepening exploration of the assassin universe.'

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
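
Here is a minimal sketch of "search small, return big" in plain Python. To keep it runnable without an embedding model, it makes some loud assumptions: chunks are split by word count rather than characters, and a shared-word count stands in for vector similarity. All names and the two toy documents are hypothetical.

```python
def split_into_chunks(text, words_per_chunk):
    """Split text into child chunks of at most words_per_chunk words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]

def build_child_index(parent_store, words_per_chunk):
    """Return (child_chunk, parent_id) pairs for every parent document."""
    return [
        (chunk, parent_id)
        for parent_id, text in parent_store.items()
        for chunk in split_into_chunks(text, words_per_chunk)
    ]

def retrieve_parents(query, child_index, parent_store, k=2):
    """Search the small chunks, then return the unique parents they belong to."""
    query_words = set(query.lower().split())

    def score(chunk):
        # Toy stand-in for vector similarity: count of shared words.
        return len(query_words & set(chunk.lower().split()))

    best_children = sorted(child_index, key=lambda pair: score(pair[0]), reverse=True)[:k]
    parent_ids = []
    for _, parent_id in best_children:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parent_store[parent_id] for parent_id in parent_ids]

toy_parent_store = {
    "p1": "John Wick is a retired hitman. His dog was a gift from his late wife.",
    "p2": "The soundtrack is loud. The fight choreography is superb.",
}
toy_child_index = build_child_index(toy_parent_store, words_per_chunk=6)
print(retrieve_parents("fight choreography", toy_child_index, toy_parent_store, k=2))
```

Both matching child chunks point at the same parent, so only one full document comes back. That de-duplication is why this strategy can return fewer documents than the number of chunks it searched for.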

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [93]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [94]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [95]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [96]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [97]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [27]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the provided reviews, people\'s opinions on John Wick vary. Some reviews are very positive, praising the series and the first film highly, indicating that many people do like it. However, there are also negative reviews, such as one calling John Wick 4 "horrible" and criticizing its plot and fights. Overall, while there is a significant fan appreciation for the series and some individuals strongly like it, there are also critics who did not enjoy certain installments. Therefore, people\'s general opinion is mixed.'

In [28]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The URL to that review is: /review/rw4854296/?ref_=tt_urv'

In [29]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the John Wick film series, John Wick is a retired hitman who initially comes out of retirement after personal tragedy—specifically, the death of his wife and the killing of his dog, which was a gift from her. In the first movie, he seeks vengeance against those who stole his car and killed his dog, leading to a violent rampage against the gangsters responsible.\n\nIn "John Wick: Chapter 2," the story continues with John being pulled back into the assassin world when an Italian crime lord calls in a favor, forcing him to undertake dangerous tasks across cities like Italy, Canada, and Manhattan. The film involves numerous action sequences, car chases, and a broader exploration of the assassin underworld.\n\nThe subsequent movies, such as "John Wick 2," depict his ongoing struggles with the violent consequences of his actions, deepening his involvement in a criminal society, and his relentless efforts to find peace or escape from his violent life.\n\nOverall, the series is characteriz

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes 2, or more, retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [98]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
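
For intuition, weighted Reciprocal Rank Fusion fits in a few lines: each retriever contributes `weight / (c + rank)` for every document it returns, and documents are sorted by their summed score. This is a sketch rather than LangChain's implementation; the constant `c=60` follows the value suggested in the RRF paper, and the document IDs are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, weights, c=60):
    """Fuse several best-first rankings into a single best-first ranking."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        # Ranks start at 1, so the top document contributes weight / (c + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Two toy rankings from two hypothetical retrievers.
toy_ranking_a = ["a", "b", "c"]
toy_ranking_b = ["b", "c", "d"]
toy_fused = reciprocal_rank_fusion([toy_ranking_a, toy_ranking_b], weights=[0.5, 0.5])
print(toy_fused)
```

Documents that several retrievers agree on (here `b` and `c`) rise above documents that only one retriever ranked highly, since only ranks are used and raw scores never need to be comparable across retrievers.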

In [99]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [32]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"Based on the reviews in the provided context, people generally liked John Wick. Several reviews gave high ratings (such as 9 or 10), praised the action sequences, style, and Keanu Reeves' performance, and recommended the films, especially to action fans. However, there were also some negative reviews with low ratings and criticisms of the plot, pacing, or over-the-top violence. Overall, the majority of reviews seem to be positive, indicating that many people generally liked John Wick."

In [33]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there are reviews with a rating of 10. Here are the URLs to those reviews:\n\n1. [Review for John Wick 3](https://example.com/review/rw4854296/?ref_=tt_urv)\n2. [Review for John Wick 4](https://example.com/review/rw8944843/?ref_=tt_urv)\n\nPlease note that the URLs provided are based on the review paths in the data. If you need the exact full URLs, they would typically be prefixed with the website domain.'

In [34]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the "John Wick" film series, the story centers around John Wick, a retired hitman who seeks revenge after a personal tragedy. The first movie depicts how Wick comes out of retirement to avenge the killing of his dog and the theft of his car, which leads to a violent rampage against gangsters and assassins. The sequels expand on his ongoing conflicts with criminal organizations, involving complex rules of the assassin world, old debts, and vendettas. Throughout the series, Wick faces numerous enemies, bounty hunters, and betrayals, all while trying to find peace and survival in a dangerous underworld. The series is known for its stylish action sequences, intricate world-building, and the relentless pursuit of vengeance by Wick.'

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

In [45]:
#!pip install -qU langchain_experimental

We'll use the `percentile` thresholding method for this example, which will calculate all distances between consecutive sentences, and then split sequences of sentences wherever the distance exceeds a given percentile of all distances.

In [100]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)
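
The percentile idea can be sketched in plain Python: measure the distance between each pair of neighbouring sentences, pick a threshold at some percentile of those distances, and start a new chunk wherever the distance exceeds it. This is a simplified illustration, not the `SemanticChunker` implementation; the 2-dimensional vectors, the helper names, and the choice of the 75th percentile are all assumptions made for the toy example.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(position), math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

def semantic_chunks(sentences, vectors, pct):
    """Split wherever the distance to the next sentence exceeds the percentile."""
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

toy_sentences = [
    "John Wick is a retired hitman.",
    "He comes out of retirement.",
    "The soundtrack is loud.",
    "The score suits the action.",
]
# Made-up "embeddings": the first two and last two sentences point the same way.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_chunks = semantic_chunks(toy_sentences, toy_vectors, pct=75)
print(toy_chunks)
```

The only large distance sits between the second and third sentences, so that is where the split lands.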

Now we can split our documents.

In [101]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [102]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [103]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [104]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [40]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, people generally liked John Wick. The reviews are predominantly positive, with many reviewers praising the action sequences, style, and overall entertainment value of the films. Ratings for the series tend to be high, with some reviews giving scores as high as 10 and 9 out of 10, indicating strong approval. However, there are a few mixed or negative reviews, but overall, the sentiment suggests that most people appreciated and enjoyed the John Wick movies.'

In [41]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. Here is the URL to that review:\n- [https://<base_url>/review/rw4854296/?ref_=tt_urv](https://<base_url>/review/rw4854296/?ref_=tt_urv)'

In [42]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In the movie John Wick, the main character, played by Keanu Reeves, is a retired assassin who seeks revenge after a violent attack on his home. The story begins when a young punk and his associates beat him up, kill his dog, and steal his car—events that happen because Wick refuses to sell his beloved car to the punk. It is soon revealed that Wick is a super-assassin, and the attack was a mistake by the punk’s father, who is a Russian mobster. This betrayal and loss push Wick to come out of retirement to confront those responsible, leading to a violent vendetta against the criminal underworld. Throughout the film, Wick unleashes a series of intense action sequences, showcasing his lethal skills as he fights to reclaim his peace and exact revenge.'

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.
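
To illustrate what a "retriever-specific" metric looks at, here is a deliberately simplified sketch of context recall. It scores only the retrieved contexts against the reference contexts, ignoring the generated answer. This is an exact-match approximation written for illustration; the Ragas `LLMContextRecall` metric used later relies on an LLM judge instead, and the function name and toy strings below are hypothetical.

```python
def exact_match_context_recall(reference_contexts, retrieved_contexts):
    """Fraction of reference contexts that appear verbatim among the retrieved contexts."""
    if not reference_contexts:
        return float("nan")  # undefined when there is nothing to recall
    retrieved = {context.strip() for context in retrieved_contexts}
    hits = sum(1 for reference in reference_contexts if reference.strip() in retrieved)
    return hits / len(reference_contexts)


# Hypothetical toy example: one of the two reference contexts was retrieved.
toy_recall = exact_match_context_recall(
    reference_contexts=["Review A", "Review B"],
    retrieved_contexts=["Review B", "Review C", "Review D"],
)
```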

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
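
If you also want a quick local latency number, a small timing helper is one option. The sketch below assumes only that a chain exposes an `invoke`-style callable taking `{"question": ...}`; `time_chain` and `fake_invoke` are hypothetical names, and the stub sleeps instead of calling a real model.

```python
import statistics
import time


def time_chain(invoke, questions):
    """Call `invoke` once per question and summarize wall-clock latency in seconds."""
    latencies = []
    for question in questions:
        start = time.perf_counter()
        invoke({"question": question})
        latencies.append(time.perf_counter() - start)
    return {
        "calls": len(latencies),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }


def fake_invoke(inputs):
    # Stand-in for chain.invoke: pretend the call takes about 10 ms.
    time.sleep(0.01)
    return {"response": "stub", "context": []}


latency_report = time_chain(fake_invoke, ["q1", "q2", "q3"])
```

With a real chain you would pass something like `naive_retrieval_chain.invoke` in place of `fake_invoke`; note that this measures latency only and says nothing about cost.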

### Create Golden Dataset

In [105]:
### YOUR CODE HERE
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to /home/patil/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /home/patil/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [107]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# LLM used to synthesize the test set; JSON mode is requested for its responses
generator_llm = LangchainLLMWrapper(
    ChatOpenAI(
        model="gpt-4.1-nano",
        temperature=0,
        model_kwargs={"response_format": {"type": "json_object"}},
    )
)
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [108]:
from ragas.testset.synthesizers import SingleHopSpecificQuerySynthesizer, MultiHopAbstractQuerySynthesizer, MultiHopSpecificQuerySynthesizer

query_distribution = [
        (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 0.5),
        (MultiHopAbstractQuerySynthesizer(llm=generator_llm), 0.25),
        (MultiHopSpecificQuerySynthesizer(llm=generator_llm), 0.25),
]

In [109]:
from ragas.testset import TestsetGenerator

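generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
# Note: `query_distribution` from the previous cell is not passed here,
# so the generator uses its own default distribution.
dataset = generator.generate_with_langchain_docs(documents, testset_size=10)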

In [110]:
dataset.to_pandas()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | So like Reeves is he the guy in John Wick righ... | `[: 0\nReview: The best way I can describe John...` | John Wick (Reeves) is out to seek revenge on t... | single_hop_specifc_query_synthesizer |
| 1 | Why John Wick so popular? | `[: 2\nReview: With the fourth installment scor...` | The context mentions that after three previous... | single_hop_specifc_query_synthesizer |
| 2 | What makes John Wick stand out among action mo... | `[: 3\nReview: John wick has a very simple reve...` | John Wick stands out because it features virtu... | single_hop_specifc_query_synthesizer |
| 3 | Why Reeves is so cool in action movies and why... | `[: 4\nReview: Though he no longer has a taste ...` | In the context, Reeves is described as a savvy... | single_hop_specifc_query_synthesizer |
| 4 | Who is John Wick? | `[: 5\nReview: Ultra-violent first entry with l...` | In the original John Wick (2014), he is an ex-... | single_hop_specifc_query_synthesizer |
| 5 | Hooow is The Matrix connected to The Marquis i... | `[<1-hop>\n\n: 20\nReview: John Wick is somethi...` | In the context of the action film, the review ... | multi_hop_specific_query_synthesizer |
| 6 | How does Keanu Reeve's portrayal of John Wick ... | `[<1-hop>\n\n: 18\nReview: Ever since the origi...` | Keanu Reeve's portrayal of John Wick is centra... | multi_hop_specific_query_synthesizer |
| 7 | How does 'John Wick: Chapter 4' build upon the... | `[<1-hop>\n\n: 19\nReview: John Wick: Chapter 4...` | John Wick: Chapter 4 continues from the fallou... | multi_hop_specific_query_synthesizer |
| 8 | Reeve in John Wick how he fight Reeves bad guys? | `[<1-hop>\n\n: 20\nReview: John Wick is somethi...` | In John Wick, Reeves's character fights the Ru... | multi_hop_specific_query_synthesizer |
| 9 | Is Ian McShane in John Wick and does he play a... | `[<1-hop>\n\n: 9\nReview: At first glance, John...` | Yes, Ian McShane appears in both the first and... | multi_hop_specific_query_synthesizer |


### Set Up LangSmith

In [111]:
import os
import getpass
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangChain API Key:")
os.environ["LANGCHAIN_PROJECT"] = f"AIE6 - Assignment 13  - {uuid4().hex[0:8]}"

In [112]:
from ragas import EvaluationDataset, evaluate
from ragas.llms import LangchainLLMWrapper

# gpt-4.1-mini acts as the judge model for the Ragas metrics
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-mini"))


In [113]:
# We evaluate the baseline using our six key Ragas RAG metrics

from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, ResponseRelevancy, ContextEntityRecall, NoiseSensitivity
from ragas import evaluate, RunConfig

custom_run_config = RunConfig(timeout=600)

In [65]:
#dataset_naive = dataset

In [117]:
import copy
import pandas as pd
import numpy as np
from ragas.evaluation import evaluate
from ragas.metrics import (
    LLMContextRecall, Faithfulness, FactualCorrectness,
    ResponseRelevancy, ContextEntityRecall, NoiseSensitivity
)

# Retrieval chains to compare, as (name, chain) pairs
retrieval_chains = [
    ("naive_retrieval_chain", naive_retrieval_chain),
    ("bm25_retrieval_chain", bm25_retrieval_chain),
    ("contextual_compression_retrieval_chain", contextual_compression_retrieval_chain),
    ("multi_query_retrieval_chain", multi_query_retrieval_chain),
    ("parent_document_retrieval_chain", parent_document_retrieval_chain),
    ("ensemble_retrieval_chain", ensemble_retrieval_chain),
    ("semantic_retrieval_chain", semantic_retrieval_chain)
]

results = []

# Evaluate each retrieval chain
for chain_name, chain in retrieval_chains:
    # Create dataset copy so we don't have mutation issues
    dataset_copy = copy.deepcopy(dataset)
    
    # Process dataset with current chain
    for test_row in dataset_copy:
        response = chain.invoke({"question": test_row.eval_sample.user_input})
        test_row.eval_sample.retrieved_contexts = [ctx.page_content for ctx in response["context"]]
        test_row.eval_sample.response = response["response"].content
    
    # Run evaluation
    eval_dataset = EvaluationDataset.from_pandas(dataset_copy.to_pandas())
    result = evaluate(
        dataset=eval_dataset,
        metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall()],
        llm=evaluator_llm,
        run_config=custom_run_config
    )
    
    # Convert metric lists to average values
    results.append({
        "Chain": chain_name,
        "context_recall": np.nanmean(result["context_recall"]),
        "faithfulness": np.nanmean(result["faithfulness"]),
        "factual_correctness": np.nanmean(result["factual_correctness"]),
        "answer_relevancy": np.nanmean(result["answer_relevancy"]),
        "context_entity_recall": np.nanmean(result["context_entity_recall"]),
        #"noise_sensitivity(mode=relevant)": np.nanmean(result["noise_sensitivity(mode=relevant)"])
    })

# Create DataFrame for comparison
df = pd.DataFrame(results).set_index("Chain")
df = df[
    [
        'context_recall',
        'faithfulness',
        'factual_correctness',
        'answer_relevancy',
        'context_entity_recall',
        #'noise_sensitivity(mode=relevant)'
    ]
].round(4)

# Show final evaluation results
df


Exception raised in Job[17]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[7]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[46]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[12]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[11]: AttributeError('StringIO' object has no attribute 'statements')

| Chain | context_recall | faithfulness | factual_correctness | answer_relevancy | context_entity_recall |
|---|---|---|---|---|---|
| naive_retrieval_chain | 0.84 | 0.8162 | 0.4344 | 0.9484 | 0.5783 |
| bm25_retrieval_chain | 0.74 | 0.6942 | 0.5522 | 0.9521 | 0.5817 |
| contextual_compression_retrieval_chain | 0.7233 | 0.7715 | 0.491 | 0.9542 | 0.6017 |
| multi_query_retrieval_chain | 0.975 | 0.8389 | 0.422 | 0.9521 | 0.6817 |
| parent_document_retrieval_chain | 0.6367 | 0.6589 | 0.4367 | 0.8509 | 0.565 |
| ensemble_retrieval_chain | 0.96 | 0.8228 | 0.403 | 0.9506 | 0.5983 |
| semantic_retrieval_chain | 0.86 | 0.8281 | 0.446 | 0.9547 | 0.7317 |
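
As a starting point for the write-up, one simple way to summarize the scores is to find which chain leads each metric. The sketch below copies the averages from the results above into a plain dictionary; the variable names are hypothetical, and in the notebook the same lookup could be done directly on `df`.

```python
metrics = [
    "context_recall",
    "faithfulness",
    "factual_correctness",
    "answer_relevancy",
    "context_entity_recall",
]

# Average scores per chain, in the same order as `metrics`.
raw_scores = {
    "naive_retrieval_chain": [0.84, 0.8162, 0.4344, 0.9484, 0.5783],
    "bm25_retrieval_chain": [0.74, 0.6942, 0.5522, 0.9521, 0.5817],
    "contextual_compression_retrieval_chain": [0.7233, 0.7715, 0.491, 0.9542, 0.6017],
    "multi_query_retrieval_chain": [0.975, 0.8389, 0.422, 0.9521, 0.6817],
    "parent_document_retrieval_chain": [0.6367, 0.6589, 0.4367, 0.8509, 0.565],
    "ensemble_retrieval_chain": [0.96, 0.8228, 0.403, 0.9506, 0.5983],
    "semantic_retrieval_chain": [0.86, 0.8281, 0.446, 0.9547, 0.7317],
}
scores = {chain: dict(zip(metrics, values)) for chain, values in raw_scores.items()}

# For each metric, pick the chain with the highest average score.
best_per_metric = {
    metric: max(scores, key=lambda chain: scores[chain][metric])
    for metric in metrics
}

for metric, chain in best_per_metric.items():
    print(f"{metric}: {chain} ({scores[chain][metric]})")
```

Keep in mind that the test set has only 10 questions, so small gaps between chains (for example in `answer_relevancy`) may not be meaningful, and that cost and latency still need to be weighed separately.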


Evaluate naive_retrieval_chain