# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

> You do not need to run the following cells if you are running this notebook locally. 

In [None]:
#!pip install -qU langchain langchain-openai langchain-cohere rank_bm25

We're also going to be leveraging [Qdrant's](https://qdrant.tech/documentation/frameworks/langchain/) (pronounced "Quadrant") VectorDB in "memory" mode (so we can leverage it locally in our colab environment).

In [None]:
#!pip install -qU qdrant-client

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

True

## Task 2: Data Collection and Preparation

We'll be using some reviews from the 4 movies in the John Wick franchise today to explore the different retrieval strategies.

These were obtained from IMDb, and are available in the [AIM Data Repository](https://github.com/AI-Maker-Space/DataRepository).

### Data Collection

We can simply `wget` these from GitHub.

You could use any review data you wanted in this step - just be careful to make sure your metadata is aligned with your choice.

In [None]:
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw1.csv -O john_wick_1.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw2.csv -O john_wick_2.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw3.csv -O john_wick_3.csv
!wget https://raw.githubusercontent.com/AI-Maker-Space/DataRepository/main/jw4.csv -O john_wick_4.csv

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

- Self-Query: Wants as much metadata as we can provide
- Time-weighted: Wants temporal data

> NOTE: While we're creating a temporal relationship based on when these movies came out for illustrative purposes, it needs to be clear that the "time-weighting" in the Time-weighted Retriever is based on when the document was *accessed* last - not when it was created.

In [2]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

documents = []

for i in range(1, 5):
  loader = CSVLoader(
      file_path=f"john_wick_{i}.csv",
      metadata_columns=["Review_Date", "Review_Title", "Review_Url", "Author", "Rating"]
  )

  movie_docs = loader.load()
  for doc in movie_docs:

    # Add the "Movie Title" (John Wick 1, 2, ...)
    doc.metadata["Movie_Title"] = f"John Wick {i}"

    # convert "Rating" to an `int`, if no rating is provided - assume 0 rating
    doc.metadata["Rating"] = int(doc.metadata["Rating"]) if doc.metadata["Rating"] else 0

    # newer movies have a more recent "last_accessed_at"
    doc.metadata["last_accessed_at"] = datetime.now() - timedelta(days=4-i)

  documents.extend(movie_docs)

Let's look at an example document to see if everything worked as expected!

In [3]:
documents[0]

Document(metadata={'source': 'john_wick_1.csv', 'row': 0, 'Review_Date': '6 May 2015', 'Review_Title': ' Kinetic, concise, and stylish; John Wick kicks ass.\n', 'Review_Url': '/review/rw3233896/?ref_=tt_urv', 'Author': 'lnvicta', 'Rating': 8, 'Movie_Title': 'John Wick 1', 'last_accessed_at': datetime.datetime(2025, 2, 26, 14, 42, 9, 823159)}, page_content=": 0\nReview: The best way I can describe John Wick is to picture Taken but instead of Liam Neeson it's Keanu Reeves and instead of his daughter it's his dog. That's essentially the plot of the movie. John Wick (Reeves) is out to seek revenge on the people who took something he loved from him. It's a beautifully simple premise for an action movie - when action movies get convoluted, they get bad i.e. A Good Day to Die Hard. John Wick gives the viewers what they want: Awesome action, stylish stunts, kinetic chaos, and a relatable hero to tie it all together. John Wick succeeds in its simplicity.")

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "JohnWick".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [4]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWick"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever treats each review as a document and uses cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later.

In [5]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
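
To make "cosine similarity, top `k`" concrete, here's a minimal pure-Python sketch. The 2-D vectors are made-up toy values (not real `text-embedding-3-small` output), and `cosine_similarity` / `top_k` are illustrative helpers, not part of LangChain or Qdrant.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy 2-D "embeddings" (hypothetical values, for illustration only).
toy_doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
toy_query_vec = [0.9, 0.1]

print(top_k(toy_query_vec, toy_doc_vecs, k=2))  # [0, 1]
```

A real vector store does the same ranking, just over many high-dimensional vectors and usually with an approximate index rather than a full sort.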

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [6]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-3.5-turbo` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [7]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-3.5-turbo")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [8]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned from the previous step's "context" value; every other key
    #              ("question") passes through unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
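
If the pipe syntax is hard to follow, here's a rough plain-Python analogy of the three stages. This is only a sketch of the data flow, not how LCEL is implemented; `run_rag_steps` and the `toy_*` functions are hypothetical stand-ins for the retriever, prompt, and chat model.

```python
def run_rag_steps(inputs, retrieve, format_prompt, generate):
    """Mimic the three chain stages with plain function calls."""
    # Stage 1: look up context for the question and carry the question forward.
    state = {"context": retrieve(inputs["question"]), "question": inputs["question"]}
    # Stage 2: re-assign "context" and pass every other key through unchanged.
    state = {**state, "context": state["context"]}
    # Stage 3: format the prompt, generate a response, and keep the context alongside it.
    return {"response": generate(format_prompt(**state)), "context": state["context"]}

# Hypothetical stand-ins (no API calls are made).
def toy_retrieve(question):
    return ["review 1", "review 2"]

def toy_format_prompt(question, context):
    return f"Query:\n{question}\n\nContext:\n{context}"

def toy_generate(prompt):
    return f"(model output for a {len(prompt)}-character prompt)"

result = run_rag_steps(
    {"question": "Did people generally like John Wick?"},
    toy_retrieve,
    toy_format_prompt,
    toy_generate,
)
print(sorted(result))  # ['context', 'response']
```

Returning the context next to the response is what lets us inspect (and later evaluate) what each retriever actually fetched.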

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [9]:
naive_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"From the reviews provided, it seems that the majority of people really liked John Wick. The film was praised for its action sequences, Keanu Reeves' performance, and the overall entertainment value it provided. Therefore, it can be concluded that people generally liked John Wick."

In [10]:
naive_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10 for "John Wick 3". Here is the URL to that review: \'/review/rw4854296/?ref_=tt_urv\'.'

In [11]:
naive_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, an ex-hitman seeks vengeance after gangsters kill his dog and steal his car, leading to a series of violent and action-packed events that unfold in the movie.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [12]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(documents)
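
For intuition about what happens under the hood, here's a minimal sketch of Okapi BM25 scoring in a common textbook form. It assumes whitespace tokenization and the typical defaults `k1=1.5` and `b=0.75`; the `rank_bm25` package that `BM25Retriever` relies on may differ in details such as its IDF formula. `bm25_scores` and the toy corpus are illustrative only.

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in tokenized for term in set(doc))

    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse-document-frequency weight.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates (k1) and is normalized by document length (b).
            tf = term_freq[term]
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores

toy_corpus = [
    "john wick loves his dog",
    "the action scenes are great",
    "a dog movie with great action",
]
toy_scores = bm25_scores("dog action", toy_corpus)
best = max(range(len(toy_corpus)), key=toy_scores.__getitem__)
print(best, toy_scores)
```

Only the third toy document contains both query words, so it scores highest. Notice that nothing here understands meaning: a query word that never appears in a document contributes nothing to that document's score.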

We'll construct the same chain - only changing the retriever.

In [13]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [14]:
bm25_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"People's opinions on John Wick seem divided based on the reviews provided. Some reviewers loved the action and style of the movie, while others found it lacking in plot and substance. It ultimately depends on personal preference."

In [15]:
bm25_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"I'm sorry, there are no reviews with a rating of 10 in the provided context."

In [16]:
bm25_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, the main character John Wick is a retired hitman seeking vengeance for the death of his dog, which was a final gift from his deceased wife. This sets off a chain of events where Wick goes on a violent rampage against those responsible for the dog's death. The movie is known for its intense action sequences and stylish choreography."

It's not clear that this is better or worse - but the `I don't know` isn't great!

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [17]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-english-v3.0")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
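
The two-stage idea (retrieve broadly, then keep only the best-scoring few) can be sketched without any API calls. Here the "retriever" and "reranker" are deliberately crude, hypothetical scoring functions standing in for the embedding search and Cohere's Rerank model; only the control flow is the point.

```python
def retrieve_then_rerank(query, docs, first_stage_score, rerank_score, k=10, top_n=3):
    """Fetch k candidates with a cheap scorer, then keep the top_n by a stronger scorer."""
    candidates = sorted(docs, key=lambda d: first_stage_score(query, d), reverse=True)[:k]
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:top_n]

# Hypothetical stand-in for the first-stage retriever: count shared words.
def word_overlap(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

# Hypothetical stand-in for the reranker: shared words relative to document length.
def overlap_ratio(query, doc):
    return word_overlap(query, doc) / len(doc.lower().split())

toy_docs = [
    "great action",
    "great action but a thin plot and a very long running time",
    "great dog",
    "the plot is thin",
]

compressed = retrieve_then_rerank(
    "great action", toy_docs, word_overlap, overlap_ratio, k=3, top_n=2
)
print(compressed)  # ['great action', 'great dog']
```

The first stage lets the long, loosely related review through. The second stage drops it, which is the "compression" we're after.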

Let's create our chain again, and see how this does!

In [18]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [19]:
contextual_compression_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick based on the positive reviews provided in the context.'

In [20]:
contextual_compression_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is a review with a rating of 10. Here is the URL to that review: '/review/rw4854296/?ref_=tt_urv'"

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, after resolving his issues with the Russian mafia, John Wick returns home. However, the mobster Santino D'Antonio visits him and asks for his help, but Wick refuses. As a result, Santino blows up his house. Wick is then asked to kill Santino's sister in Rome, which he does. This leads to a contract being put on John Wick, attracting professional killers from everywhere. Wick promises to kill Santino, who is no longer protected by his marker."

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context.

So, how hard is it to set up? Not bad! Let's see it down below!



In [22]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
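
Here's a minimal sketch of the three steps listed above, with the LLM and the retriever replaced by hypothetical stubs (`toy_generate_queries`, `toy_retrieve`) so it runs offline. The real `MultiQueryRetriever` prompts an LLM for the rephrasings; the part this sketch illustrates is the "unique union" step.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for several rephrasings of a question and return the unique union."""
    seen, unique_docs = set(), []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:  # keep only the first occurrence of each document
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical stand-ins: a fixed set of "LLM-generated" queries and their results.
toy_index = {
    "did people like john wick": ["review A", "review B"],
    "was john wick well received": ["review B", "review C"],
    "john wick audience reaction": ["review C", "review D"],
}

def toy_generate_queries(question):
    return list(toy_index)

def toy_retrieve(query):
    return toy_index.get(query, [])

merged = multi_query_retrieve(
    "Did people generally like John Wick?", toy_generate_queries, toy_retrieve
)
print(merged)  # ['review A', 'review B', 'review C', 'review D']
```

Each individual query only surfaces two reviews, but together they cover four. The trade-off is an extra LLM call plus one retrieval per generated query.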

In [23]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [24]:
multi_query_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Yes, people generally liked John Wick. The reviews mention how it was a cool, stylish, and fun action film that was highly recommended, with some even calling it the best action film of the year.'

In [25]:
multi_query_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

"Yes, there is a review with a rating of 10. Here is the URL to that review: '/review/rw4854296/?ref_=tt_urv'"

In [26]:
multi_query_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, the main character, played by Keanu Reeves, is a retired assassin who is forced back into action after someone kills his dog and steals his car. He goes on a mission to pay off an old debt by helping take over the Assassin's Guild in various locations, resulting in a lot of carnage and action-packed sequences."

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
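
The "search small, return big" bookkeeping can be sketched in a few lines. This is a toy illustration under simplifying assumptions: a fixed-width splitter instead of `RecursiveCharacterTextSplitter`, word overlap instead of embedding similarity, and hypothetical helper names throughout.

```python
def split_into_chunks(text, chunk_size):
    """Naive fixed-width splitter (a stand-in for a real text splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def build_child_index(parents, chunk_size):
    """Pair every child chunk with the id of the parent it came from."""
    return [
        (parent_id, chunk)
        for parent_id, text in parents.items()
        for chunk in split_into_chunks(text, chunk_size)
    ]

def retrieve_parents(query, parents, child_index, score, k=2):
    """Search the child chunks, but return the de-duplicated parent documents."""
    best_children = sorted(child_index, key=lambda item: score(query, item[1]), reverse=True)[:k]
    parent_ids = []
    for parent_id, _ in best_children:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]

# Hypothetical similarity function: number of shared lowercase words.
def word_overlap(query, text):
    return len(set(query.lower().split()) & set(text.lower().split()))

toy_parents = {
    "p1": "The dog scene is sad. The action is superb.",
    "p2": "Too long and the plot is thin.",
}
toy_child_index = build_child_index(toy_parents, chunk_size=22)

hits = retrieve_parents("action", toy_parents, toy_child_index, word_overlap, k=2)
print(hits)  # ['The dog scene is sad. The action is superb.']
```

The match happens on a small chunk, but the caller gets the whole review back, including the sentence about the dog that the chunk alone would have lost.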

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [27]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = documents
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [28]:
client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = Qdrant(
    collection_name="full_documents", embeddings=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [29]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [30]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [31]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [32]:
parent_document_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

"People's opinions on John Wick seem to vary. Some really enjoy the series and find it consistent and well-received, while others find it to be horrible and nonsensical."

In [33]:
parent_document_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10. The review for "John Wick 3" by user \'ymyuseda\' has a rating of 10. The URL to that review is: \'/review/rw4854296/?ref_=tt_urv\''

In [34]:
parent_document_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

"In John Wick, Keanu Reeves plays John Wick, a retired assassin who comes out of retirement to seek vengeance after his dog is killed and his car is stolen. He is forced back into the world of assassins to pay off an old debt and help take over the Assassin's Guild. John travels to Italy, Canada, and Manhattan, killing many assassins along the way."

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [35]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)
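
To show what rank fusion does with those weights, here's a minimal sketch of weighted Reciprocal Rank Fusion: each document earns `weight / (c + rank)` from every ranked list it appears in. The constant `c=60` follows the RRF paper; treat the function name and the toy rankings as illustrative assumptions rather than LangChain's exact implementation.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Fuse ranked lists: each doc earns weight / (c + rank) per list it appears in."""
    fused = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] += weight / (c + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

# Two hypothetical retrievers that partly disagree.
toy_rankings = [
    ["A", "B", "C"],  # e.g. a keyword retriever
    ["B", "C", "D"],  # e.g. a dense retriever
]

print(weighted_rrf(toy_rankings, weights=[0.5, 0.5]))  # ['B', 'C', 'A', 'D']
```

Documents that several retrievers agree on ("B" and "C") rise above a document that only one retriever ranked first ("A"). Only ranks are used, so the retrievers' raw scores never need to be comparable.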

We'll pack *all* of these retrievers together in an ensemble.

In [36]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [37]:
ensemble_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, it seems that people generally liked John Wick. The positive feedback from reviewers indicates that the movie was well-received by many.'

In [38]:
ensemble_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10 for the movie "John Wick 3". Here is the URL to that review: \'/review/rw4854296/?ref_=tt_urv\'.'

In [39]:
ensemble_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In John Wick, an ex-hitman comes out of retirement to seek vengeance against the gangsters who killed his dog and took everything from him. This leads to intense action, shootouts, and thrilling fights as he faces off against various adversaries in a quest for retribution.'

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

> NOTE: You do not need to run this cell if you're running this locally

We'll use the `percentile` thresholding method for this example, which calculates the distances between consecutive sentences and then breaks a sequence of sentences apart wherever the distance exceeds a given percentile of all distances.

In [40]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)
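
Here's a simplified sketch of percentile-based splitting. It assumes one toy vector per sentence (made-up 2-D values) and a cutoff at the 95th percentile of consecutive-sentence cosine distances. The real `SemanticChunker` does more (for example, it groups neighboring sentences before embedding), so read this as an illustration of the thresholding idea only.

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list (pct in 0-100)."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def semantic_chunks(sentences, vectors, pct=95):
    """Start a new chunk wherever the distance to the next sentence exceeds the percentile."""
    if len(sentences) < 2:
        return list(sentences)
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:  # a semantic "break" between neighbors
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

toy_sentences = [
    "The action is great.",
    "The stunts are stylish.",
    "The plot is thin.",
    "The story drags.",
]
# Hypothetical embeddings: the first two point one way, the last two another.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

for chunk in semantic_chunks(toy_sentences, toy_vectors):
    print(chunk)
```

The only large jump in distance sits between the second and third sentences, so the toy review splits into an "action" chunk and a "plot" chunk.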

Now we can split our documents.

In [41]:
semantic_documents = semantic_chunker.split_documents(documents)

Let's create a new vector store.

In [42]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="JohnWickSemantic"
)

We'll use naive retrieval for this example.

In [43]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [44]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [45]:
semantic_retrieval_chain.invoke({"question" : "Did people generally like John Wick?"})["response"].content

'Based on the reviews provided, it seems that people generally liked John Wick.'

In [46]:
semantic_retrieval_chain.invoke({"question" : "Do any reviews have a rating of 10? If so - can I have the URLs to those reviews?"})["response"].content

'Yes, there is a review with a rating of 10 for the movie "John Wick 3." Here is the URL to that review: \'/review/rw4854296/?ref_=tt_urv\'.'

In [47]:
semantic_retrieval_chain.invoke({"question" : "What happened in John Wick?"})["response"].content

'In "John Wick," the main character seeks revenge on the people who took something he loved from him. Specifically, his dog is killed, which sets off a chain of events leading to intense action scenes and a quest for vengeance.'

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
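
If you'd also like a quick local latency number, a small timing helper is one option. This is a sketch, not part of the assignment: `measure_latency` is a hypothetical helper, and `toy_chain` stands in for something like a chain's `invoke` so the example runs without API calls.

```python
import statistics
import time

def measure_latency(fn, inputs, repeats=3):
    """Return the median wall-clock seconds per call of `fn` over the given inputs."""
    timings = []
    for item in inputs:
        for _ in range(repeats):
            start = time.perf_counter()
            fn(item)
            timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Hypothetical stand-in for a chain call (sleeps instead of calling an LLM).
def toy_chain(payload):
    time.sleep(0.01)
    return {"response": payload["question"].upper()}

median_seconds = measure_latency(toy_chain, [{"question": "What happened in John Wick?"}])
print(f"median latency: {median_seconds:.3f}s")
```

Using the median rather than the mean keeps one slow network call from dominating the comparison between retrievers.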

### Create a Golden Test Dataset

In [48]:
### YOUR CODE HERE
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [49]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
golden_dataset = generator.generate_with_langchain_docs(documents, testset_size=10)


In [54]:
golden_dataset.to_pandas()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | How does the film 'A Good Day to Die Hard' com... | `[: 0\nReview: The best way I can describe John...` | 'A Good Day to Die Hard' is mentioned as an ex... | single_hop_specifc_query_synthesizer |
| 1 | What is the general reception of the John Wick... | `[: 2\nReview: With the fourth installment scor...` | The John Wick film series is apparently loved ... | single_hop_specifc_query_synthesizer |
| 2 | Who Chad Stahelski and what he do in John Wick? | `[: 3\nReview: John wick has a very simple reve...` | Chad Stahelski is the director of John Wick, a... | single_hop_specifc_query_synthesizer |
| 3 | What John Wick movie about, and how it do with... | `[: 4\nReview: Though he no longer has a taste ...` | The John Wick movie is about a retired assassi... | single_hop_specifc_query_synthesizer |
| 4 | What is the plot of the first John Wick movie? | `[: 5\nReview: Ultra-violent first entry with l...` | In the original John Wick (2014), an ex-hit-ma... | single_hop_specifc_query_synthesizer |
| 5 | What are the criticisms of John Wick Chapter 2... | `[<1-hop>\n\n: 23\nReview: I love me a bit of t...` | Critics of John Wick Chapter 2 have pointed ou... | multi_hop_specific_query_synthesizer |
| 6 | Why John Wick 2 not surprise like first movie ... | `[<1-hop>\n\n: 10\nReview: The first John Wick ...` | John Wick 2 doesn't have the ability to surpri... | multi_hop_specific_query_synthesizer |
| 7 | How does the action and narrative development ... | `[<1-hop>\n\n: 10\nReview: The first John Wick ...` | John Wick 2, while unable to surprise audience... | multi_hop_specific_query_synthesizer |
| 8 | In what ways does 'John Wick: Chapter 4' build... | `[<1-hop>\n\n: 8\nReview: About 6 months ago I ...` | 'John Wick: Chapter 4' builds upon the stylist... | multi_hop_specific_query_synthesizer |
| 9 | How does 'John Wick: Chapter 3 - Parabellum' b... | `[<1-hop>\n\n: 8\nReview: About 6 months ago I ...` | 'John Wick: Chapter 3 - Parabellum' is praised... | multi_hop_specific_query_synthesizer |


In [50]:

from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"AIE5 - Retrieval_Chain_Evaluation - {uuid4().hex[0:8]}"

In [51]:
from langsmith import Client

client = Client()

dataset_name = f"Retrievals_Eval_Dataset - {uuid4().hex[0:8]}"

langsmith_dataset = client.create_dataset(
    dataset_name=dataset_name,
    description="Questions about John Wick movies for evaluating multiple retrievers."
)
     

In [52]:
# Upload each golden-dataset row to LangSmith as an example
for _, row in golden_dataset.to_pandas().iterrows():
  print(row["user_input"])
  print(row["reference"])
  client.create_example(
      inputs={"question": row["user_input"]},
      outputs={"answer": row["reference"]},
      metadata={"context": row["reference_contexts"]},
      dataset_id=langsmith_dataset.id
  )


How does the film 'A Good Day to Die Hard' compare to 'John Wick' in terms of plot complexity and effectiveness as an action movie?
'A Good Day to Die Hard' is mentioned as an example of an action movie that becomes bad when it gets convoluted, in contrast to 'John Wick,' which succeeds due to its beautifully simple premise. 'John Wick' is praised for delivering awesome action, stylish stunts, kinetic chaos, and a relatable hero, all tied together by its simplicity.
What is the general reception of the John Wick film series?
The John Wick film series is apparently loved by everyone else in the world, with the fourth installment scoring immensely at the cinemas.
Who Chad Stahelski and what he do in John Wick?
Chad Stahelski is the director of John Wick, and as a stunt specialist, he showcases his expertise through the film's virtuoso action sequences and well-made choreographies.
What John Wick movie about, and how it do with story and action?
The John Wick movie is about a retired assa

### LangSmith Evaluation Set-up

In [53]:
eval_llm = ChatOpenAI(model="gpt-4o")

In [55]:
from langsmith.evaluation import LangChainStringEvaluator, evaluate

# Define prepare_data to skip null outputs
def prepare_data(run, example):
    if run.outputs is None or not run.outputs:  # Check if outputs is None or empty
        print(f"Skipping run {run.id} due to null or empty output")
        return None  # This skips the evaluation for this run
    prediction = run.outputs["response"].content
    if isinstance(prediction, list):  # Handle list outputs (e.g., documents)
        prediction = " ".join(doc.page_content if hasattr(doc, "page_content") else str(doc) for doc in prediction)
    return {
        "prediction": prediction,
        "reference": example.outputs["answer"],
        "input": example.inputs["question"],
    }

def prepare_data_with_context(run, example):
    if run.outputs is None or not run.outputs:
        print(f"Skipping run {run.id} due to null or empty output")
        return None
    prediction = run.outputs["response"].content
    # The reference contexts were stored in the example metadata when the dataset was created
    context = example.metadata["context"]
    if isinstance(context, list):
        context = " ".join(doc.page_content if hasattr(doc, "page_content") else str(doc) for doc in context)
    return {
        "prediction": prediction,
        "reference": example.outputs["answer"],
        "input": example.inputs["question"],
        "context": context,  # Add context for precision/recall
    }

qa_evaluator = LangChainStringEvaluator("qa", config={"llm" : eval_llm})

answer_relevancy_evaluator = LangChainStringEvaluator(
    "labeled_score_string",
    config={
        "criteria": {
            "relevancy": "How relevant is this prediction to the question on a scale of 1-10? Focus on whether the answer addresses the query, ignoring factual accuracy."
        },
        "normalize_by": 10,
    },
    prepare_data=prepare_data  # Reuses existing prepare_data
)

context_precision_evaluator = LangChainStringEvaluator(
    "labeled_score_string",
    config={
        "criteria": {
            "precision": "How much of the provided context is relevant to the question and answer on a scale of 1-10? Penalize irrelevant or extraneous information."
        },
        "normalize_by": 10,
    },
    prepare_data=prepare_data_with_context
)

context_recall_evaluator = LangChainStringEvaluator(
    "labeled_score_string",
    config={
        "criteria": {
            "recall": "How much of the necessary context to answer the question correctly is present in the provided context on a scale of 1-10? Penalize missing relevant information."
        },
        "normalize_by": 10,
    },
    prepare_data=prepare_data_with_context  # Reuses context-enabled prepare_data
)

faithfulness_evaluator = LangChainStringEvaluator(
    "labeled_score_string",
    config={
        "criteria": {
            "faithfulness": "How faithful is the prediction to the provided context on a scale of 1-10? Penalize answers that introduce information not supported by the context."
        },
        "normalize_by": 10,
    },
    prepare_data=prepare_data_with_context
)
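
The `"normalize_by": 10` setting is why every score in the result tables below falls between 0.1 and 1.0: the judge rates on a 1-10 scale and the rating is divided by 10. Here is a minimal sketch of that step. The helper `normalize_rating` is hypothetical, and it assumes the judge writes its rating in double square brackets (our understanding of how the score-string evaluators read it), so treat it as an illustration rather than the library's implementation.

```python
import re


def normalize_rating(verdict: str, normalize_by: float = 10) -> float:
    """Pull a rating written as [[N]] out of the judge's text and rescale it."""
    match = re.search(r"\[\[(\d+(?:\.\d+)?)\]\]", verdict)
    if match is None:
        raise ValueError("no [[rating]] found in the verdict")
    return float(match.group(1)) / normalize_by


# Made-up judge output, for illustration only
print(normalize_rating("The answer addresses the question. Rating: [[9]]"))
```

A rating of 1 (the lowest the judge can give) becomes 0.1, which is what the "I don't know" responses receive in the tables that follow.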


### Evaluate Naive Retrieval by using LangSmith Customized Evaluators

In [56]:
evaluate(
    naive_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        answer_relevancy_evaluator,
        context_precision_evaluator,
        context_recall_evaluator,
        faithfulness_evaluator,
    ],
    metadata={"retriever": "naive_retriever"},
)

View the evaluation results for experiment: 'best-month-50' at:
https://smith.langchain.com/o/48c839b0-a50d-4327-8b62-d77e548f014c/datasets/299968e3-f9b8-43be-bdfb-858258f2f342/compare?selectedSessions=0bb063b0-a321-4573-bfba-b48d4f55a4d3





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.score_string:relevancy,feedback.score_string:precision,feedback.score_string:recall,feedback.score_string:faithfulness,execution_time,example_id,id
0,How does 'John Wick: Chapter 3 - Parabellum' b...,"content=""I don't know the specific details of ...",[page_content=': 0\nReview: It is 5 years sinc...,,'John Wick: Chapter 3 - Parabellum' is praised...,0.6,0.6,0.3,0.6,1.625579,83b8e7d6-0fc5-492c-abb7-b3df6676c720,579da5a4-ecc0-4c6f-a999-62fc9314df8c
1,In what ways does 'John Wick: Chapter 4' build...,"content=""In 'John Wick: Chapter 4', the film b...",[page_content=': 19\nReview: John Wick: Chapte...,,'John Wick: Chapter 4' builds upon the stylist...,1.0,1.0,0.9,1.0,2.047453,c428323a-3182-4c8b-84d1-0a507b5fa2f9,6c9d05b9-86d9-4d1c-9e82-89d34d6421b4
2,How does the action and narrative development ...,"content=""I don't know."" additional_kwargs={'re...",[page_content=': 0\nReview: It is 5 years sinc...,,"John Wick 2, while unable to surprise audience...",0.1,0.1,0.1,0.1,0.814736,e4b4fb23-5414-4932-85f9-84d76c34b4db,2c8ff229-4640-4dfa-9159-dd5d9d06e2b3
3,Why John Wick 2 not surprise like first movie ...,content='The first John Wick movie was surpris...,"[page_content=': 9\nReview: ""John Wick: Chapte...",,John Wick 2 doesn't have the ability to surpri...,1.0,0.9,0.8,0.9,1.635964,2141564b-a959-461b-85ae-7492503d76ef,e3b4757c-7fd5-4f88-9338-1b1f5322fd22
4,What are the criticisms of John Wick Chapter 2...,content='Some criticisms of John Wick Chapter ...,[page_content=': 23\nReview: I love me a bit o...,,Critics of John Wick Chapter 2 have pointed ou...,0.9,0.7,0.7,0.6,2.103687,6b9598f4-ceff-4b09-939a-f493186d516c,79f37925-3b94-4592-b16e-3883bce35b76
5,What is the plot of the first John Wick movie?,content='The plot of the first John Wick movie...,[page_content=': 5\nReview: Ultra-violent firs...,,"In the original John Wick (2014), an ex-hit-ma...",0.8,0.8,0.7,0.7,1.063877,0ef03896-2e2b-4410-b353-a802616458c3,756c16cf-5c91-4170-a189-7179e1d49b36
6,"What John Wick movie about, and how it do with...",content='John Wick is about a former hitman se...,[page_content=': 0\nReview: The best way I can...,,The John Wick movie is about a retired assassi...,0.9,0.8,0.7,0.8,1.838484,d563f5e1-7c57-4b4b-bf18-ab352b7a0320,27d7c896-125d-412b-888b-18518f0ce176
7,Who Chad Stahelski and what he do in John Wick?,content='Chad Stahelski is a stunt specialist ...,[page_content=': 3\nReview: John wick has a ve...,,"Chad Stahelski is the director of John Wick, a...",1.0,1.0,1.0,1.0,1.04778,8c924974-dbad-4ded-9c20-9012056429ef,278f2da5-ba42-4711-99d9-3066f163e884
8,What is the general reception of the John Wick...,"content=""The John Wick film series has been ge...","[page_content=': 9\nReview: At first glance, J...",,The John Wick film series is apparently loved ...,1.0,1.0,0.8,1.0,0.989175,8112ac9f-1486-4382-aa39-6b59adebede8,6f5ed88f-5cad-47db-a213-7d93903a36d1
9,How does the film 'A Good Day to Die Hard' com...,"content='Based on the context provided, ""John ...",[page_content=': 0\nReview: The best way I can...,,'A Good Day to Die Hard' is mentioned as an ex...,1.0,1.0,1.0,1.0,2.040199,3c2e3603-7399-4bc3-9741-2140da5af483,db180ea5-6d58-4517-9cdb-43cbdb818cb0
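
Ten rows of per-example scores are hard to compare at a glance, so it can help to average each metric. The sketch below does this for the naive retriever, using the four score columns copied by hand from the table above. The `summarize` helper and the variable names are ours, not part of LangSmith.

```python
from statistics import mean

METRICS = ("relevancy", "precision", "recall", "faithfulness")

# Scores copied from the naive-retrieval table above, one tuple per example,
# in the order (relevancy, precision, recall, faithfulness).
naive_scores = [
    (0.6, 0.6, 0.3, 0.6),
    (1.0, 1.0, 0.9, 1.0),
    (0.1, 0.1, 0.1, 0.1),
    (1.0, 0.9, 0.8, 0.9),
    (0.9, 0.7, 0.7, 0.6),
    (0.8, 0.8, 0.7, 0.7),
    (0.9, 0.8, 0.7, 0.8),
    (1.0, 1.0, 1.0, 1.0),
    (1.0, 1.0, 0.8, 1.0),
    (1.0, 1.0, 1.0, 1.0),
]


def summarize(rows):
    """Return the mean score per metric, rounded to two decimals."""
    columns = zip(*rows)  # transpose: one sequence of scores per metric
    return {name: round(mean(col), 2) for name, col in zip(METRICS, columns)}


naive_summary = summarize(naive_scores)
print(naive_summary)
```

The same helper can be pointed at the score columns of any of the other result tables to get comparable averages.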


### Evaluate Best-Matching 25 Retrieval by using LangSmith Customized Evaluators

In [57]:
evaluate(
    bm25_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        answer_relevancy_evaluator,
        context_precision_evaluator,
        context_recall_evaluator,
        faithfulness_evaluator,
    ],
    metadata={"retriever": "bm25_retriever"},
)

View the evaluation results for experiment: 'abandoned-print-25' at:
https://smith.langchain.com/o/48c839b0-a50d-4327-8b62-d77e548f014c/datasets/299968e3-f9b8-43be-bdfb-858258f2f342/compare?selectedSessions=e0ce42c1-515e-4fa1-bc7c-d9e4ec5faebf





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.score_string:relevancy,feedback.score_string:precision,feedback.score_string:recall,feedback.score_string:faithfulness,execution_time,example_id,id
0,How does 'John Wick: Chapter 3 - Parabellum' b...,"content=""I'm sorry, I don't have the specific ...",[page_content=': 8\nReview: About 6 months ago...,,'John Wick: Chapter 3 - Parabellum' is praised...,0.1,0.1,0.1,0.1,0.676443,83b8e7d6-0fc5-492c-abb7-b3df6676c720,634b9add-8c05-4132-8ad5-5fde011b0330
1,In what ways does 'John Wick: Chapter 4' build...,"content=""In 'John Wick: Chapter 4', the action...",[page_content=': 19\nReview: John Wick: Chapte...,,'John Wick: Chapter 4' builds upon the stylist...,1.0,1.0,0.7,0.8,1.231243,c428323a-3182-4c8b-84d1-0a507b5fa2f9,3389bba6-d288-43cc-8242-dbf7473b13e9
2,How does the action and narrative development ...,"content=""I don't know the specific details of ...",[page_content=': 18\nReview: Ever since the or...,,"John Wick 2, while unable to surprise audience...",0.2,0.2,0.2,0.1,1.239985,e4b4fb23-5414-4932-85f9-84d76c34b4db,bc59299f-78fc-4cbd-b309-0dcdd3cc0e4e
3,Why John Wick 2 not surprise like first movie ...,"content=""John Wick: Chapter 2 may not surprise...",[page_content=': 16\nReview: John Wick Chapter...,,John Wick 2 doesn't have the ability to surpri...,0.9,1.0,0.7,0.8,0.869902,2141564b-a959-461b-85ae-7492503d76ef,e876a4e7-2c7b-4079-acef-b74c45a0ba71
4,What are the criticisms of John Wick Chapter 2...,content='The criticisms of John Wick Chapter 2...,[page_content=': 23\nReview: I love me a bit o...,,Critics of John Wick Chapter 2 have pointed ou...,0.8,0.7,0.8,0.8,0.676614,6b9598f4-ceff-4b09-939a-f493186d516c,55e525db-824a-412a-ba23-0444f36ba430
5,What is the plot of the first John Wick movie?,"content=""I'm sorry, I don't know the plot of t...",[page_content=': 5\nReview: What is all the ra...,,"In the original John Wick (2014), an ex-hit-ma...",0.1,0.1,0.1,0.1,0.59303,0ef03896-2e2b-4410-b353-a802616458c3,03cbe976-d8ab-4f45-873c-9a6c5b8009a6
6,"What John Wick movie about, and how it do with...","content=""The John Wick movies are about a form...",[page_content=': 22\nReview: Lets contemplate ...,,The John Wick movie is about a retired assassi...,0.8,0.7,0.7,0.7,0.856432,d563f5e1-7c57-4b4b-bf18-ab352b7a0320,ed3d1d72-cf72-4e2c-88c5-61c307d44d7c
7,Who Chad Stahelski and what he do in John Wick?,content='Chad Stahelski is a stuntman turned w...,[page_content=': 11\nReview: Who needs a 2hr a...,,"Chad Stahelski is the director of John Wick, a...",0.7,0.7,0.6,0.7,0.580968,8c924974-dbad-4ded-9c20-9012056429ef,8eb5fc54-11cb-4972-95ea-c71cbd924f57
8,What is the general reception of the John Wick...,"content=""The general reception of the John Wic...",[page_content=': 8\nReview: It's hard to find ...,,The John Wick film series is apparently loved ...,1.0,1.0,0.8,0.9,0.953862,8112ac9f-1486-4382-aa39-6b59adebede8,64339165-c353-4371-8837-8d6a4ac15282
9,How does the film 'A Good Day to Die Hard' com...,content='In terms of plot complexity and effec...,[page_content=': 0\nReview: The best way I can...,,'A Good Day to Die Hard' is mentioned as an ex...,1.0,1.0,0.9,1.0,1.197196,3c2e3603-7399-4bc3-9741-2140da5af483,b7162bff-faaf-4489-a7b7-5746d8858bd3


### Evaluate Contextual Compression Retrieval by using LangSmith Customized Evaluators

In [58]:
evaluate(
    contextual_compression_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        answer_relevancy_evaluator,
        context_precision_evaluator,
        context_recall_evaluator,
        faithfulness_evaluator,
    ],
    metadata={"retriever": "contextual_compression_retriever"},
)

View the evaluation results for experiment: 'reflecting-tub-67' at:
https://smith.langchain.com/o/48c839b0-a50d-4327-8b62-d77e548f014c/datasets/299968e3-f9b8-43be-bdfb-858258f2f342/compare?selectedSessions=1a1cff93-f370-414e-9544-34178e5f544d





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.score_string:relevancy,feedback.score_string:precision,feedback.score_string:recall,feedback.score_string:faithfulness,execution_time,example_id,id
0,How does 'John Wick: Chapter 3 - Parabellum' b...,"content=""I'm sorry, I don't have the specific ...","[page_content=': 11\nReview: The overrated ""Jo...",,'John Wick: Chapter 3 - Parabellum' is praised...,0.1,0.1,0.1,0.1,2.281386,83b8e7d6-0fc5-492c-abb7-b3df6676c720,39d92dc3-c822-4825-8b34-e001f353fcea
1,In what ways does 'John Wick: Chapter 4' build...,"content=""Based on the provided reviews, 'John ...",[page_content=': 19\nReview: John Wick: Chapte...,,'John Wick: Chapter 4' builds upon the stylist...,1.0,1.0,1.0,1.0,2.227369,c428323a-3182-4c8b-84d1-0a507b5fa2f9,06c96382-783c-4031-9abf-7f6bf86745f6
2,How does the action and narrative development ...,"content=""I'm sorry, I don't have the specific ...",[page_content=': 8\nReview: In this 2nd instal...,,"John Wick 2, while unable to surprise audience...",0.1,0.1,0.1,0.1,1.367513,e4b4fb23-5414-4932-85f9-84d76c34b4db,eaef27a5-805f-4b36-9ad0-a7ca9cfd5ee9
3,Why John Wick 2 not surprise like first movie ...,content='John Wick 2 may not have surprised li...,[page_content=': 16\nReview: John Wick Chapter...,,John Wick 2 doesn't have the ability to surpri...,1.0,1.0,1.0,1.0,1.442475,2141564b-a959-461b-85ae-7492503d76ef,14417a9b-e7c9-410c-a9af-13972bcb28d7
4,What are the criticisms of John Wick Chapter 2...,content='The criticisms of John Wick Chapter 2...,[page_content=': 11\nReview: Don't believe the...,,Critics of John Wick Chapter 2 have pointed ou...,0.8,0.7,0.7,0.6,2.301835,6b9598f4-ceff-4b09-939a-f493186d516c,6237e021-2f45-4dcc-8a1d-d52a71110d56
5,What is the plot of the first John Wick movie?,content='The plot of the first John Wick movie...,[page_content=': 0\nReview: The best way I can...,,"In the original John Wick (2014), an ex-hit-ma...",0.6,0.7,0.5,0.6,1.16872,0ef03896-2e2b-4410-b353-a802616458c3,f2961ce3-d324-4c1a-a466-3e875c2aaafc
6,"What John Wick movie about, and how it do with...","content=""John Wick is a movie about a man seek...",[page_content=': 0\nReview: The best way I can...,,The John Wick movie is about a retired assassi...,0.8,0.8,0.6,0.7,1.785372,d563f5e1-7c57-4b4b-bf18-ab352b7a0320,42f63bbd-674a-4279-9238-ca6b80d1f98a
7,Who Chad Stahelski and what he do in John Wick?,content='Chad Stahelski is a stuntman turned w...,[page_content=': 17\nReview: Stuntman turned w...,,"Chad Stahelski is the director of John Wick, a...",0.7,0.8,0.6,0.7,0.969143,8c924974-dbad-4ded-9c20-9012056429ef,ab9b367c-0e10-4e77-89c6-6df1f8744629
8,What is the general reception of the John Wick...,content='The general reception of the John Wic...,[page_content=': 20\nReview: In a world where ...,,The John Wick film series is apparently loved ...,1.0,1.0,0.8,0.8,1.856211,8112ac9f-1486-4382-aa39-6b59adebede8,aae4b5b0-0a63-4764-94ba-74a1530d1788
9,How does the film 'A Good Day to Die Hard' com...,"content=""Based on the context provided, 'John ...",[page_content=': 0\nReview: The best way I can...,,'A Good Day to Die Hard' is mentioned as an ex...,1.0,1.0,1.0,1.0,1.733475,3c2e3603-7399-4bc3-9741-2140da5af483,881be1a6-d9a6-4ecb-873d-ca3ebaa7fbc1


### Evaluate Multi-Query Retrieval by using LangSmith Customized Evaluators

In [59]:
evaluate(
    multi_query_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        answer_relevancy_evaluator,
        context_precision_evaluator,
        context_recall_evaluator,
        faithfulness_evaluator,
    ],
    metadata={"retriever": "multi_query_retriever"},
)

View the evaluation results for experiment: 'excellent-dust-16' at:
https://smith.langchain.com/o/48c839b0-a50d-4327-8b62-d77e548f014c/datasets/299968e3-f9b8-43be-bdfb-858258f2f342/compare?selectedSessions=7b677c70-9daf-43bb-8977-7c4a95e7797c





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.score_string:relevancy,feedback.score_string:precision,feedback.score_string:recall,feedback.score_string:faithfulness,execution_time,example_id,id
0,How does 'John Wick: Chapter 3 - Parabellum' b...,"content=""I don't know the specific details abo...",[page_content=': 0\nReview: It is 5 years sinc...,,'John Wick: Chapter 3 - Parabellum' is praised...,0.1,0.1,0.1,0.1,3.880544,83b8e7d6-0fc5-492c-abb7-b3df6676c720,b9d06e9a-61cc-4172-8299-ec5a7105d3ca
1,In what ways does 'John Wick: Chapter 4' build...,"content=""In 'John Wick: Chapter 4', the stylis...",[page_content=': 19\nReview: John Wick: Chapte...,,'John Wick: Chapter 4' builds upon the stylist...,1.0,1.0,0.9,1.0,6.138372,c428323a-3182-4c8b-84d1-0a507b5fa2f9,7b49e087-b0c7-4a49-a7e1-72c909b74a32
2,How does the action and narrative development ...,"content=""I'm sorry, I don't have the specific ...","[page_content=': 9\nReview: ""John Wick: Chapte...",,"John Wick 2, while unable to surprise audience...",0.1,0.1,0.1,0.1,2.946474,e4b4fb23-5414-4932-85f9-84d76c34b4db,999d0fba-5dd4-4ade-915a-cbd0a1c74992
3,Why John Wick 2 not surprise like first movie ...,content='John Wick 2 may not have been as surp...,[page_content=': 10\nReview: The first John Wi...,,John Wick 2 doesn't have the ability to surpri...,0.9,0.9,0.8,0.8,3.461864,2141564b-a959-461b-85ae-7492503d76ef,76d6b4f0-df8e-421b-b9b1-96ba02feb164
4,What are the criticisms of John Wick Chapter 2...,content='The criticisms of John Wick: Chapter ...,"[page_content=': 9\nReview: ""John Wick: Chapte...",,Critics of John Wick Chapter 2 have pointed ou...,0.8,0.8,0.8,0.8,5.532222,6b9598f4-ceff-4b09-939a-f493186d516c,81062b21-b984-450c-9c96-0942ec36855e
5,What is the plot of the first John Wick movie?,"content=""The plot of the first John Wick movie...",[page_content=': 18\nReview: When the story be...,,"In the original John Wick (2014), an ex-hit-ma...",1.0,1.0,0.7,1.0,3.845059,0ef03896-2e2b-4410-b353-a802616458c3,f7792337-26c5-4573-8f77-ae41c5957ed6
6,"What John Wick movie about, and how it do with...","content=""The John Wick movies are action-packe...",[page_content=': 1\nReview: I'm a fan of the J...,,The John Wick movie is about a retired assassi...,1.0,0.8,0.7,0.8,4.33374,d563f5e1-7c57-4b4b-bf18-ab352b7a0320,a7243d1a-c767-4012-92a8-267878914f98
7,Who Chad Stahelski and what he do in John Wick?,content='Chad Stahelski is a stunt specialist ...,[page_content=': 3\nReview: John wick has a ve...,,"Chad Stahelski is the director of John Wick, a...",1.0,1.0,1.0,1.0,3.392288,8c924974-dbad-4ded-9c20-9012056429ef,950d82e8-2f90-4343-9000-1914f50315a2
8,What is the general reception of the John Wick...,"content=""The general reception of the John Wic...",[page_content=': 20\nReview: In a world where ...,,The John Wick film series is apparently loved ...,1.0,1.0,0.8,0.9,2.883424,8112ac9f-1486-4382-aa39-6b59adebede8,44f6ca39-f438-4934-a2d4-06267e3728ff
9,How does the film 'A Good Day to Die Hard' com...,"content=""Based on the context provided, the fi...",[page_content=': 0\nReview: The best way I can...,,'A Good Day to Die Hard' is mentioned as an ex...,1.0,1.0,1.0,1.0,3.157135,3c2e3603-7399-4bc3-9741-2140da5af483,47f1de02-ac90-448d-9de5-c81c7901f29d


### Evaluate Parent Document Retrieval by using LangSmith Customized Evaluators

In [60]:
evaluate(
    parent_document_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        answer_relevancy_evaluator,
        context_precision_evaluator,
        context_recall_evaluator,
        faithfulness_evaluator,
    ],
    metadata={"retriever": "parent_document_retriever"},
)

View the evaluation results for experiment: 'mealy-income-6' at:
https://smith.langchain.com/o/48c839b0-a50d-4327-8b62-d77e548f014c/datasets/299968e3-f9b8-43be-bdfb-858258f2f342/compare?selectedSessions=882e9e30-e7a9-4b53-8f9f-660a40e60e7a





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.score_string:relevancy,feedback.score_string:precision,feedback.score_string:recall,feedback.score_string:faithfulness,execution_time,example_id,id
0,How does 'John Wick: Chapter 3 - Parabellum' b...,"content=""John Wick: Chapter 3 - Parabellum bal...",[page_content=': 13\nReview: Following on from...,,'John Wick: Chapter 3 - Parabellum' is praised...,0.5,0.4,0.3,0.3,1.660241,83b8e7d6-0fc5-492c-abb7-b3df6676c720,2a33da9e-59d0-4cb2-b05c-037c0636f82f
1,In what ways does 'John Wick: Chapter 4' build...,"content=""Based on the provided context, 'John ...",[page_content=': 19\nReview: John Wick: Chapte...,,'John Wick: Chapter 4' builds upon the stylist...,1.0,1.0,1.0,1.0,2.740898,c428323a-3182-4c8b-84d1-0a507b5fa2f9,965112aa-fca9-4a6d-8d00-cc4212e9ea44
2,How does the action and narrative development ...,"content=""I'm sorry, I don't have the specific ...",[page_content=': 14\nReview: By now you know w...,,"John Wick 2, while unable to surprise audience...",0.1,0.1,0.1,0.2,0.964714,e4b4fb23-5414-4932-85f9-84d76c34b4db,a142de55-0657-4c8f-abc1-6aab961b39c7
3,Why John Wick 2 not surprise like first movie ...,content='John Wick Chapter 2 may not have surp...,[page_content=': 16\nReview: John Wick Chapter...,,John Wick 2 doesn't have the ability to surpri...,0.9,0.9,0.8,0.7,1.286105,2141564b-a959-461b-85ae-7492503d76ef,5851d1d0-9e19-45d8-9cbd-66a3893146e7
4,What are the criticisms of John Wick Chapter 2...,content='The criticisms of John Wick Chapter 2...,[page_content=': 11\nReview: Don't believe the...,,Critics of John Wick Chapter 2 have pointed ou...,0.6,0.5,0.3,0.4,1.026806,6b9598f4-ceff-4b09-939a-f493186d516c,8dba4e3b-e0a4-409a-950b-69b4aadc1842
5,What is the plot of the first John Wick movie?,content='The plot of the first John Wick movie...,[page_content=': 0\nReview: The best way I can...,,"In the original John Wick (2014), an ex-hit-ma...",0.8,0.7,0.7,0.7,1.237546,0ef03896-2e2b-4410-b353-a802616458c3,397a0ef8-bc41-488a-bb12-b34e453c6772
6,"What John Wick movie about, and how it do with...",content='The John Wick movie series follows th...,[page_content=': 8\nReview: It's hard to find ...,,The John Wick movie is about a retired assassi...,0.7,0.7,0.7,0.7,1.866116,d563f5e1-7c57-4b4b-bf18-ab352b7a0320,3510c6b4-1648-4b32-80d3-3ad5a0d5d593
7,Who Chad Stahelski and what he do in John Wick?,content='Chad Stahelski is a stuntman turned w...,[page_content=': 17\nReview: Stuntman turned w...,,"Chad Stahelski is the director of John Wick, a...",1.0,1.0,1.0,0.9,0.996398,8c924974-dbad-4ded-9c20-9012056429ef,7628289a-259b-439a-b3d9-593e66232a7f
8,What is the general reception of the John Wick...,content='The general reception of the John Wic...,[page_content=': 20\nReview: In a world where ...,,The John Wick film series is apparently loved ...,0.8,0.8,0.7,0.6,1.086273,8112ac9f-1486-4382-aa39-6b59adebede8,2637a911-3e23-4b7f-8ac3-45593b9432e6
9,How does the film 'A Good Day to Die Hard' com...,content='In terms of plot complexity and effec...,[page_content=': 0\nReview: The best way I can...,,'A Good Day to Die Hard' is mentioned as an ex...,1.0,1.0,1.0,1.0,2.237478,3c2e3603-7399-4bc3-9741-2140da5af483,75ef6e21-7555-4527-a3bd-664d416cf507


### Evaluate Ensemble Retrieval by using LangSmith Customized Evaluators

In [61]:
evaluate(
    ensemble_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        answer_relevancy_evaluator,
        context_precision_evaluator,
        context_recall_evaluator,
        faithfulness_evaluator,
    ],
    metadata={"retriever": "ensemble_retriever"},
)

View the evaluation results for experiment: 'long-wing-13' at:
https://smith.langchain.com/o/48c839b0-a50d-4327-8b62-d77e548f014c/datasets/299968e3-f9b8-43be-bdfb-858258f2f342/compare?selectedSessions=e74215bf-c916-4a1f-b4b3-f72db767a0f8





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.score_string:relevancy,feedback.score_string:precision,feedback.score_string:recall,feedback.score_string:faithfulness,execution_time,example_id,id
0,How does 'John Wick: Chapter 3 - Parabellum' b...,"content=""I don't know."" additional_kwargs={'re...",[page_content=': 13\nReview: Following on from...,,'John Wick: Chapter 3 - Parabellum' is praised...,0.1,0.1,0.1,0.1,3.620429,83b8e7d6-0fc5-492c-abb7-b3df6676c720,7043b5c8-f70b-496d-a56b-74e720276fdd
1,In what ways does 'John Wick: Chapter 4' build...,"content=""In 'John Wick: Chapter 4', the stylis...",[page_content=': 19\nReview: John Wick: Chapte...,,'John Wick: Chapter 4' builds upon the stylist...,1.0,1.0,0.9,1.0,8.24629,c428323a-3182-4c8b-84d1-0a507b5fa2f9,8f4dba4c-727f-4c0f-8c91-a3473033585d
2,How does the action and narrative development ...,"content=""I don't know the specific details of ...",[page_content=': 8\nReview: In this 2nd instal...,,"John Wick 2, while unable to surprise audience...",0.1,0.1,0.1,0.1,3.719703,e4b4fb23-5414-4932-85f9-84d76c34b4db,6f4557d6-2d24-4589-8dc0-7b05716a35fe
3,Why John Wick 2 not surprise like first movie ...,content='John Wick Chapter 2 may not have been...,[page_content=': 16\nReview: John Wick Chapter...,,John Wick 2 doesn't have the ability to surpri...,1.0,0.9,0.8,1.0,3.919303,2141564b-a959-461b-85ae-7492503d76ef,23cd0c6e-fa75-48aa-91b0-cfda5fa27a19
4,What are the criticisms of John Wick Chapter 2...,content='The criticisms of John Wick Chapter 2...,[page_content=': 23\nReview: I love me a bit o...,,Critics of John Wick Chapter 2 have pointed ou...,0.9,0.8,0.7,0.7,3.709244,6b9598f4-ceff-4b09-939a-f493186d516c,2d1a62d3-5736-4783-a405-bf897e4d8b6b
5,What is the plot of the first John Wick movie?,content='The plot of the first John Wick movie...,[page_content=': 5\nReview: Ultra-violent firs...,,"In the original John Wick (2014), an ex-hit-ma...",0.8,0.8,0.6,0.7,4.836296,0ef03896-2e2b-4410-b353-a802616458c3,1addeccc-6ead-4124-b50a-3575a805fc51
6,"What John Wick movie about, and how it do with...",content='John Wick is a movie series that foll...,[page_content=': 0\nReview: The best way I can...,,The John Wick movie is about a retired assassi...,1.0,0.8,0.7,0.8,3.887344,d563f5e1-7c57-4b4b-bf18-ab352b7a0320,9722252c-9a05-49f2-8257-19e9a78d9a8a
7,Who Chad Stahelski and what he do in John Wick?,content='Chad Stahelski is a stuntman turned w...,[page_content=': 17\nReview: Stuntman turned w...,,"Chad Stahelski is the director of John Wick, a...",1.0,1.0,0.8,0.9,3.14839,8c924974-dbad-4ded-9c20-9012056429ef,0cd489f4-8ceb-450b-9303-9d721bce1194
8,What is the general reception of the John Wick...,"content=""The general reception of the John Wic...",[page_content=': 20\nReview: In a world where ...,,The John Wick film series is apparently loved ...,1.0,1.0,1.0,1.0,3.890408,8112ac9f-1486-4382-aa39-6b59adebede8,2977bf04-580a-4603-bed4-805348f83670
9,How does the film 'A Good Day to Die Hard' com...,"content='Based on the context provided, ""John ...",[page_content=': 0\nReview: The best way I can...,,'A Good Day to Die Hard' is mentioned as an ex...,1.0,1.0,1.0,1.0,8.866274,3c2e3603-7399-4bc3-9741-2140da5af483,7cfaf0b2-acee-4c03-bf46-46839c6bd90a


### Evaluate Semantic Chunking with Naive Retrieval by using LangSmith Customized Evaluators

In [62]:
evaluate(
    semantic_retrieval_chain.invoke,
    data=dataset_name,
    evaluators=[
        answer_relevancy_evaluator,
        context_precision_evaluator,
        context_recall_evaluator,
        faithfulness_evaluator,
    ],
    metadata={"retriever": "semantic_retriever"},
)

View the evaluation results for experiment: 'abandoned-death-7' at:
https://smith.langchain.com/o/48c839b0-a50d-4327-8b62-d77e548f014c/datasets/299968e3-f9b8-43be-bdfb-858258f2f342/compare?selectedSessions=f6dc7a92-2517-45a7-b8ac-ff179dc08182





Unnamed: 0,inputs.question,outputs.response,outputs.context,error,reference.answer,feedback.score_string:relevancy,feedback.score_string:precision,feedback.score_string:recall,feedback.score_string:faithfulness,execution_time,example_id,id
0,How does 'John Wick: Chapter 3 - Parabellum' b...,"content=""I don't know."" additional_kwargs={'re...",[page_content=': 13\nReview: Following on from...,,'John Wick: Chapter 3 - Parabellum' is praised...,0.1,0.1,0.1,0.1,0.705761,83b8e7d6-0fc5-492c-abb7-b3df6676c720,1459df39-2292-4a87-9756-645184d761c5
1,In what ways does 'John Wick: Chapter 4' build...,"content=""In 'John Wick: Chapter 4', the stylis...",[page_content=': 19\nReview: John Wick: Chapte...,,'John Wick: Chapter 4' builds upon the stylist...,1.0,1.0,0.9,1.0,2.038134,c428323a-3182-4c8b-84d1-0a507b5fa2f9,cc86c9a2-bff5-4c14-bd66-efbb2f6cba90
2,How does the action and narrative development ...,"content=""I don't know the specific details of ...",[page_content=': 0\nReview: It is 5 years sinc...,,"John Wick 2, while unable to surprise audience...",0.1,0.1,0.1,0.1,0.868458,e4b4fb23-5414-4932-85f9-84d76c34b4db,375956a1-83cf-4574-abf4-0649b58d7dc1
3,Why John Wick 2 not surprise like first movie ...,"content=""I don't know why John Wick 2 did not ...",[page_content='This is EXACTLY what you want o...,,John Wick 2 doesn't have the ability to surpri...,0.7,0.6,0.4,0.6,0.988735,2141564b-a959-461b-85ae-7492503d76ef,bece7987-6b14-4dd4-b133-1bcc98bdce85
4,What are the criticisms of John Wick Chapter 2...,content='The criticisms of John Wick Chapter 2...,"[page_content='Failing that, the gory mayhem h...",,Critics of John Wick Chapter 2 have pointed ou...,0.7,0.5,0.5,0.5,1.178022,6b9598f4-ceff-4b09-939a-f493186d516c,01e102a6-1c53-45d7-824b-047a53a7ba01
5,What is the plot of the first John Wick movie?,content='The plot of the first John Wick movie...,[page_content='John Wick (Reeves) is out to se...,,"In the original John Wick (2014), an ex-hit-ma...",0.8,0.7,0.7,0.7,1.139589,0ef03896-2e2b-4410-b353-a802616458c3,4a8300a6-9101-4837-ae62-6317dce3489f
6,"What John Wick movie about, and how it do with...",content='John Wick is about a former hitman se...,[page_content='John Wick (Reeves) is out to se...,,The John Wick movie is about a retired assassi...,1.0,1.0,0.7,0.8,1.600873,d563f5e1-7c57-4b4b-bf18-ab352b7a0320,bba3cb3b-8fc5-4b4d-aae1-6bb32a681df8
7,Who Chad Stahelski and what he do in John Wick?,content='Chad Stahelski is a stunt specialist ...,[page_content='Directed by Chad Stahelski who'...,,"Chad Stahelski is the director of John Wick, a...",1.0,1.0,1.0,1.0,0.994378,8c924974-dbad-4ded-9c20-9012056429ef,63449f33-a08f-4337-b040-3073e51f937d
8,What is the general reception of the John Wick...,"content=""The John Wick film series has been we...",[page_content=': 20\nReview: In a world where ...,,The John Wick film series is apparently loved ...,1.0,1.0,0.8,0.9,1.173549,8112ac9f-1486-4382-aa39-6b59adebede8,1af0b202-1f6f-4865-8a5e-d399602f443a
9,How does the film 'A Good Day to Die Hard' com...,"content=""Based on the context provided, 'John ...",[page_content='John Wick (Reeves) is out to se...,,'A Good Day to Die Hard' is mentioned as an ex...,1.0,0.9,0.9,0.9,2.450341,3c2e3603-7399-4bc3-9741-2140da5af483,1abc624f-4bb0-4e17-a80b-4d1341c42249


## Evaluate Each Retriever's Performance by Using Ragas

In [65]:
from ragas import EvaluationDataset, RunConfig
# Aliased so it does not shadow the `evaluate` imported from langsmith.evaluation earlier
from ragas import evaluate as ragas_evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, ResponseRelevancy, ContextEntityRecall, NoiseSensitivity

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o", seed=10))
custom_run_config = RunConfig(timeout=1440)


def ragas_evaluate_one_chain(evaluator_llm, dataset, the_chain, custom_run_config):

    # Run the chain on every test question and record its answer and retrieved contexts
    for test_row in dataset:
        response = the_chain.invoke({"question" : test_row.eval_sample.user_input})
        test_row.eval_sample.response = response["response"].content
        test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

    evaluation_dataset = EvaluationDataset.from_pandas(dataset.to_pandas())

    # Score the populated dataset with all six Ragas metrics
    result = ragas_evaluate(
        dataset=evaluation_dataset,
        metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
        llm=evaluator_llm,
        run_config=custom_run_config
    )

    return result


In [66]:
ragas_evaluate_one_chain(evaluator_llm, golden_dataset, naive_retrieval_chain, custom_run_config)

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

{'context_recall': 0.7817, 'faithfulness': 0.7732, 'factual_correctness': 0.5470, 'answer_relevancy': 0.6678, 'context_entity_recall': 0.5750, 'noise_sensitivity_relevant': 0.3390}

In [68]:
ragas_evaluate_one_chain(evaluator_llm, golden_dataset, bm25_retrieval_chain, custom_run_config)

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

{'context_recall': 0.6600, 'faithfulness': 0.6908, 'factual_correctness': 0.4410, 'answer_relevancy': 0.6679, 'context_entity_recall': 0.5217, 'noise_sensitivity_relevant': 0.1175}

In [70]:
contextual_compression_performance=ragas_evaluate_one_chain(evaluator_llm, golden_dataset, contextual_compression_retrieval_chain, custom_run_config)

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

In [71]:
contextual_compression_performance

{'context_recall': 0.7233, 'faithfulness': 0.8703, 'factual_correctness': 0.4090, 'answer_relevancy': 0.7667, 'context_entity_recall': 0.4833, 'noise_sensitivity_relevant': 0.3703}

In [67]:
ragas_evaluate_one_chain(evaluator_llm, golden_dataset, multi_query_retrieval_chain, custom_run_config)

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

{'context_recall': 0.7495, 'faithfulness': 0.7515, 'factual_correctness': 0.4950, 'answer_relevancy': 0.7554, 'context_entity_recall': 0.5250, 'noise_sensitivity_relevant': 0.3858}

In [72]:
parent_document_performance = ragas_evaluate_one_chain(evaluator_llm, golden_dataset, parent_document_retrieval_chain, custom_run_config)
parent_document_performance

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

{'context_recall': 0.5567, 'faithfulness': 0.8917, 'factual_correctness': 0.3960, 'answer_relevancy': 0.8660, 'context_entity_recall': 0.4583, 'noise_sensitivity_relevant': 0.3292}

In [73]:
ensemble_performance = ragas_evaluate_one_chain(evaluator_llm, golden_dataset, ensemble_retrieval_chain, custom_run_config)
ensemble_performance

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[35]: RateLimitError(Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}})
Exception raised in Job[59]: RateLimitError(Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}})
Exception raised in Job[53]: RateLimitError(Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': '

{'context_recall': 0.8567, 'faithfulness': 0.8909, 'factual_correctness': 0.5460, 'answer_relevancy': 0.7682, 'context_entity_recall': 0.5167, 'noise_sensitivity_relevant': 0.1250}

### Semantic Chunking with Naive Retriever

In [75]:
semantic_performance = ragas_evaluate_one_chain(evaluator_llm, golden_dataset, semantic_retrieval_chain, custom_run_config)
semantic_performance

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

{'context_recall': 0.8314, 'faithfulness': 0.7092, 'factual_correctness': 0.4860, 'answer_relevancy': 0.7651, 'context_entity_recall': 0.5583, 'noise_sensitivity_relevant': 0.2597}
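To compare the seven runs side by side, the scores printed above can be gathered into one structure and the best retriever picked per metric. This is a minimal sketch: the names `METRICS`, `RESULTS`, `LOWER_IS_BETTER` and `best_per_metric` are my own, and I am assuming that lower is better for `noise_sensitivity_relevant` while higher is better for the other five metrics.

```python
# Scores copied from the Ragas outputs printed above, in the order of METRICS.
METRICS = [
    "context_recall", "faithfulness", "factual_correctness",
    "answer_relevancy", "context_entity_recall", "noise_sensitivity_relevant",
]

RESULTS = {
    "Naive":                  [0.7817, 0.7732, 0.5470, 0.6678, 0.5750, 0.3390],
    "BM25":                   [0.6600, 0.6908, 0.4410, 0.6679, 0.5217, 0.1175],
    "Contextual Compression": [0.7233, 0.8703, 0.4090, 0.7667, 0.4833, 0.3703],
    "Multi-Query":            [0.7495, 0.7515, 0.4950, 0.7554, 0.5250, 0.3858],
    "Parent Document":        [0.5567, 0.8917, 0.3960, 0.8660, 0.4583, 0.3292],
    "Ensemble":               [0.8567, 0.8909, 0.5460, 0.7682, 0.5167, 0.1250],
    "Semantic Chunking":      [0.8314, 0.7092, 0.4860, 0.7651, 0.5583, 0.2597],
}

# Assumption: noise sensitivity is the only metric here where a lower score is better.
LOWER_IS_BETTER = {"noise_sensitivity_relevant"}


def best_per_metric(results):
    """Return {metric: (retriever_name, score)} for the best score on each metric."""
    best = {}
    for i, metric in enumerate(METRICS):
        pick = min if metric in LOWER_IS_BETTER else max
        name = pick(results, key=lambda r: results[r][i])
        best[metric] = (name, results[name][i])
    return best


best = best_per_metric(RESULTS)
for metric, (name, score) in best.items():
    print(f"{metric:28s} {name:24s} {score:.4f}")
```

Keeping the scores in one place like this makes it easier to see that no single retriever wins on every metric.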

## Evaluation Summary on Retrievers' Cost, Latency, Performance

#### Summary on Evaluation Steps

    1. Used Ragas to create a golden dataset with 10 examples of questions and related answers
    2. Created 4 customized LangSmith labeled_score_evaluators for relevancy, precision, recall and faithfulness, and ran them on 7 retrievers using the golden dataset
    3. After comparing the LangSmith evaluation results, I found that the cost and latency results seem to make sense, while the relevancy, precision, recall and faithfulness scores generated by the customized evaluators do not seem trustworthy because Naive Retrieval got the best scores
    4. Ran the Ragas evaluator on each retrieval chain to compare 6 RAG metrics

#### Summary on my Evaluation Results
1. Cost
    - The most expensive retriever was Ensemble Retrieval, because it runs multiple retrievers and merges their results. I am not sure why its cost was not double or triple that of the others.
    - The second most expensive was Multi-Query Retrieval: it uses a language model to generate multiple queries, and the additional retrieval passes and result aggregation add to resource usage.
2. Latency
    - Ensemble Retrieval had the longest latency. LangChain's EnsembleRetriever typically executes its retrievers in parallel where possible, which keeps the added latency down, so its latency was higher than the other retrievers' but not by much.
    - Multi-Query Retrieval had the second-longest latency. Multiple queries mean multiple retrieval operations, which increases latency compared to single-query methods.
3. Performance
    - Ensemble Retrieval got the highest context_recall (0.8567) and came within 0.001 of the top score on faithfulness (0.8909 vs. 0.8917 for Parent Document Retrieval) and factual_correctness (0.5460 vs. 0.5470 for Naive Retrieval). It was also second on answer_relevancy, behind Parent Document Retrieval. This is plausibly because it combines the strengths of different methods (e.g., the lexical precision of BM25 and the semantic depth of embeddings), leading to robust recall and precision.
    - Parent Document Retrieval got the highest scores on faithfulness and answer_relevancy, likely because it returns full parent documents rather than isolated snippets, which improves relevance for tasks needing broader understanding.
    - Contextual Compression (Rerank) Retrieval is designed to improve precision by reranking on semantic relevance, which should suit queries requiring nuanced relevance, but it did not stand out in these evaluation results compared with the other retrievers.
    - With only 10 examples and a single run per retriever, these results are not conclusive. Several jobs in the Ensemble run also failed with rate-limit errors (see the output above), which may have affected its scores.
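To make the small-sample caveat concrete, here is a rough arithmetic sketch. It assumes each per-sample metric score lies in [0, 1], so changing a single sample's score can move a retriever's mean by at most 1/10 = 0.1. The names `LEADS` and `fragile_leads` are mine, and the scores are copied from the Ragas outputs above.

```python
N_EXAMPLES = 10  # size of the golden dataset

# Assumption: per-sample scores lie in [0, 1], so one sample can shift
# a retriever's mean score by at most 1 / N_EXAMPLES.
MAX_SHIFT = 1 / N_EXAMPLES

# (metric, leader, leader score, runner-up, runner-up score)
LEADS = [
    ("context_recall", "Ensemble", 0.8567, "Semantic Chunking", 0.8314),
    ("faithfulness", "Parent Document", 0.8917, "Ensemble", 0.8909),
    ("factual_correctness", "Naive", 0.5470, "Ensemble", 0.5460),
    ("answer_relevancy", "Parent Document", 0.8660, "Ensemble", 0.7682),
]


def fragile_leads(leads, max_shift):
    """Return the metrics whose lead is smaller than a one-sample shift."""
    return [metric for metric, _, top, _, second in leads if top - second < max_shift]


fragile = fragile_leads(LEADS, MAX_SHIFT)
for metric, leader, top, runner_up, second in LEADS:
    print(f"{metric:22s} {leader} leads {runner_up} by {top - second:.4f}")
print("Leads that one sample could overturn:", fragile)
```

Under that assumption, every one of these leads is smaller than the shift a single example could cause, so the rankings should be read as tentative.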

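For intuition on how Ensemble Retrieval "merges results", here is a minimal sketch of weighted Reciprocal Rank Fusion, which, to my understanding, is the approach LangChain's EnsembleRetriever is based on. This is not LangChain's implementation: the function name `weighted_rrf`, the document IDs, and the constant `c=60` are illustrative assumptions.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns weight / (rank + c) from every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from a lexical (BM25) and a dense (embedding) retriever.
bm25_ranking = ["a", "b", "c"]
dense_ranking = ["b", "d", "a"]

fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

In this toy example, document "b" ends up first because both retrievers rank it highly, even though BM25 alone put "a" on top.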

##### Runs List:

    1. Naive Retrieval
    2. Best Matching 25 (BM25) Retrieval
    3. Contextual Compression Retrieval
    4. Multi-query Retrieval
    5. Parent-document Retrieval 
    6. Ensemble Retrieval
    7. Semantic Chunk Retrieval


<img src="img/LangSmith_Comparing.jpg" />

<img src="img/LangSmith_customized_evaulators.jpg" />

<img src="img/Ragas_Performance_Comparing.jpg" />