# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
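
To make the "cosine similarity, top `k`" idea concrete, here is a minimal pure-Python sketch. It is not how Qdrant is implemented; the helper names (`cosine_similarity`, `top_k`) and the tiny 3-dimensional vectors are illustrative assumptions standing in for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Toy "embeddings" (hypothetical values, not real model output).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [1.0, 0.05, 0.0]

toy_top_2 = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(toy_top_2)
```

A vector store does the same ranking at scale, typically with an approximate nearest-neighbour index rather than scoring every vector.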

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # RunnablePassthrough.assign forwards every key from the previous step unchanged and
    #              (re)assigns "context" from the previous step's "context" value
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints, one of the most common issues with loans appears to be problems related to mismanagement and errors by loan servicers. This includes errors in loan balances, misapplied payments, wrongful denials of payment plans, incorrect reporting of account status, and difficulties in applying payments correctly. Many complaints also involve discrepancies in loan balances, incorrect or outdated information on credit reports, unauthorized transfers of loans without proper notification, and trouble with loan information and repayment terms.\n\nTherefore, the most common issue seems to be **problems caused by errors, mismanagement, or mishandling by loan servicers**, which often lead to inaccurate loan information, credit report inaccuracies, and repayment difficulties for borrowers.'

In [11]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided data, it appears that several complaints were not handled in a timely manner. For example:\n\n- The complaint received on 03/28/25 from MOHELA was marked as **"Timely response?": No**, indicating it was not handled promptly.\n- The complaint received on 04/14/25 from Nelnet, Inc. also was marked as **"Timely response?": Yes**, so it was handled promptly.\n- The complaint received on 04/24/25 from Maximus Federal Services, Inc. was also marked as **"Timely response?": Yes**.\n- However, the complaint received on 04/24/25 from Nelnet, Inc. was handled in a timely manner as well.\n- The complaint from 04/18/25 from EdFinancial Services was handled promptly.\n- The complaint received on 04/05/25 from Maximus Federal Services, Inc. was marked as **"Timely response?": Yes**.\n- The complaint from 04/14/25 from EdFinancial Services was handled promptly.\n\nBut the complaint from 03/28/25 from MOHELA was explicitly not handled in time, which indicates at least one compla

In [12]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans mainly because of financial hardships arising from unmanageable interest accumulation, miscommunication or lack of proper information from lenders or servicers, and the inability to afford increased payments. Many borrowers experienced complications such as interest continuously accruing despite making payments, being placed in forbearance or deferment options that extended repayment periods and increased total interest owed, and not being properly notified about their loan status or payment schedules. Additionally, some borrowers faced issues like transfers between loan servicers without proper communication, incorrect or confusing information about repayment obligations, or difficulties in applying extra payments towards principal. These factors, combined with stagnant wages and economic challenges, made timely loan repayment difficult or impossible for many, leading to delinquency or default.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
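
As a rough illustration of the word-overlap scoring, here is a small sketch of an Okapi-style BM25 function in plain Python. The whitespace tokenizer, the `k1` and `b` defaults, and the IDF smoothing are assumptions chosen for readability; the library behind LangChain's `BM25Retriever` may use a slightly different variant.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with an Okapi-style BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Longer-than-average documents are penalised slightly.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Toy corpus (made-up sentences, not rows from the complaints CSV).
toy_docs = [
    "my payment was applied to the wrong loan",
    "the servicer never answered my phone calls",
    "nelnet doubled my monthly payment after forbearance",
]
toy_scores = bm25_scores("nelnet payment", toy_docs)
print(toy_scores)
```

Only documents that literally contain a query term receive a non-zero score, which is why exact names and identifiers tend to suit this kind of retriever.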

This retriever is very straightforward to set up! Let's see it in action below!


In [13]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [14]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [15]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems with dealing with lenders or servicers, including issues like incorrect or bad information about loans, difficulty applying payments correctly, and disputes over fees or loan details. Specifically, complaints highlight issues such as:\n\n- Disputes over fees charged or incorrect billing\n- Trouble with how payments are applied, often in ways that are unfavorable to borrowers\n- Receiving inaccurate or incomplete information about loan balances or terms\n- Problems with loan repayment terms and understanding of interest calculations\n\nTherefore, the most common issue with loans, as reflected in the complaints, is difficulties in communication and trust with lenders or servicers, including mismanagement, misinformation, and predatory practices.'

In [16]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, it appears that several complaints were addressed with a response from the companies, and all responses indicate that they were handled in a timely manner. Specifically, the complaints received responses marked as "Yes" for "Timely response?" from the companies involved. \n\nHowever, there are complaints where the consumer expressed ongoing frustration, particularly regarding issues with account corrections and responses, but there is no indication that these unresolved issues were left unhandled entirely. \n\nTherefore, according to the provided data, no complaints were explicitly stated as not being handled in a timely manner.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans due to various issues such as miscommunication, improper management by lenders or servicers, and lack of clarity about their loan status. For example, some borrowers experienced automatic payments being canceled or not received due to errors or lack of proper notification, leading to late or missed payments. Others struggled because they were not informed about loan transfers, repayment suspensions, or changes in autopay arrangements, which resulted in their accounts being marked as overdue and negatively impacting their credit scores. Additionally, delayed or insufficient responses from loan servicing companies and difficulties in navigating payment plans or forbearance options contribute to repayment failures.'

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do. The second half of the notebook will cover this.)

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Answer:

Since Best-Matching 25 relies on the frequency of terms, its strength is in exact matching. In my queries, I gave specific terms which would aid in retrieval.

</div>

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
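
The two steps above can be sketched in a few lines of Python. This is only a shape-of-the-idea example: `toy_overlap_score` is a hypothetical stand-in for a real reranking model, and the candidate strings are made up.

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Score every candidate against the query and keep only the top_n best."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def toy_overlap_score(query, doc):
    """Hypothetical stand-in for a reranker: fraction of query words found in doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these came back from a first-stage retriever with a generous k.
toy_candidates = [
    "the servicer reported my account late",
    "my payment was late because autopay was cancelled",
    "i asked about forgiveness options",
    "late payment fees were charged twice",
]

toy_reranked = rerank("late payment fees", toy_candidates, toy_overlap_score, top_n=2)
print(toy_reranked)
```

A real reranker scores each query-document pair with a model rather than word overlap, but the retrieve-many, keep-few structure is the same.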

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [18]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [19]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [20]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, including errors in loan balances, misapplied payments, wrongful denials of payment plans, incorrect or inconsistent loan information, and improper handling of loan data. Many complaints involve incorrect or confusing information about loan balances, interest, and payment requirements, as well as issues with communication and documentation.\n\nIf I had to sum it up, a prevalent issue is **mismanagement and miscommunication by loan servicers**, leading to errors in account information, unpaid balances, and disputes over loan handling.'

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, there are complaints that did not get handled in a timely manner. For example, one consumer\'s complaint regarding the status of their loan account has been open for over 18 months with no resolution, and they are still awaiting responses and account adjustments. Additionally, another complaint regarding the issue of payments not being applied to their student loan account was not resolved promptly. Although the company responses to these complaints were marked as "Closed with explanation" and responses were timely, the complaints themselves remained unresolved for an extended period, indicating that some issues were not addressed in a timely manner.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons indicated in the complaints:\n\n1. **Lack of Understanding and Miscommunication:** Borrowers were often unaware that they were required to repay their loans or did not receive clear information about the repayment obligations, interest accumulation, or loan terms. For example, one individual was unaware they had to repay their student aid until years later because they were not informed by financial aid officers.\n\n2. **Administrative Issues and Poor Communication:** Borrowers reported that loan servicers did not notify them when they bought out other lenders, failed to inform them about due dates, or did not provide proper notifications about payments or account status. This lack of communication made it difficult for borrowers to stay informed or manage their loans effectively.\n\n3. **Interest Accumulation and Loan Management Options:** Although options like forbearance or deferment were available, interest continued to acc

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context
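
The three steps above can be sketched as follows. The function names, the canned reformulations, and the `TOY_INDEX` lookup are all hypothetical stand-ins: in the real retriever an LLM writes the reformulations and a vector store does the retrieval.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each reformulated query and return the unique union of documents."""
    unique_docs = []
    seen = set()
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:  # keep the first occurrence, drop duplicates
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

def toy_generate_queries(question):
    """Hypothetical stand-in for the LLM that rewrites the user question."""
    return [question, "reasons for missed loan payments", "causes of loan default"]

# Hypothetical stand-in for a retriever: a fixed query -> documents lookup.
TOY_INDEX = {
    "Why did people fail to pay back their loans?": ["doc_a", "doc_b"],
    "reasons for missed loan payments": ["doc_b", "doc_c"],
    "causes of loan default": ["doc_c", "doc_d"],
}

def toy_retrieve(query):
    return TOY_INDEX.get(query, [])

toy_context = multi_query_retrieve(
    "Why did people fail to pay back their loans?", toy_generate_queries, toy_retrieve
)
print(toy_context)
```

Note that the final context can be larger than `k`, since each reformulation contributes its own results before deduplication.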

So, how hard is it to set up? Not bad! Let's see it down below!



In [23]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [24]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issues with student loans, based on the complaints provided, tend to involve:\n\n- Mismanagement by servicers, including errors in loan balances and interest calculations.\n- Difficulty in obtaining accurate or clear information about loan terms, balances, or forgiveness options.\n- Problems with loan transfers and lack of proper notification.\n- Unauthorized or incorrect reporting to credit bureaus.\n- Challenges with repayment plans, such as being steered into forbearance with accruing interest, or inability to apply extra payments to principal.\n- Mishandling of data breaches or privacy violations.\n- Issues with loan forgiveness, cancellation, or discharge processes.\n\nIn summary, the most common issue appears to be **mismanagement and miscommunication by loan servicers**, leading to errors, lack of transparency, and difficulties for borrowers trying to manage or resolve their student loans.'

In [26]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, some complaints indicate that complaints were not handled in a timely manner. Specifically:\n\n- One complaint (Complaint #12709087) about a delay in processing a graduated loan application, where the complaint response was "Not timely" and the complaint was unresolved for over 15 days.\n- Another complaint (Complaint #12739706) about failure to inform about delinquent loans, which was marked "Not timely" and had been unresolved for over 18 months.\n- Several complaints mention that the organizations failed to respond or resolve issues for extended periods, ranging from several months to over a year.\n- Additionally, some complaints note that the companies\' responses were "Closed with explanation" despite ongoing issues, indicating they may not have addressed the issues promptly.\n\nIn summary, multiple complaints suggest that certain issues were not handled in a timely manner by the organizations involved.'

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily due to a combination of financial hardships, lack of clear information, and systemic issues with loan servicing. Specific reasons include:\n\n1. **Accruing interest and repayment difficulties:** Borrowers often had limited options besides forbearance or deferment, which allowed interest to continue accumulating, making loans harder to pay off over time.\n\n2. **Lack of alternative repayment options:** Many borrowers were not informed about income-driven repayment plans, loan forgiveness programs, or rehabilitation options that could have made repayment more manageable.\n\n3. **Systemic mismanagement by servicers:** Problems such as improper handling of accounts, unauthorized transfers, and failure to follow federal regulations led to inaccurate reporting, defaults, and credit score drops.\n\n4. **Lack of sufficient income or job stability:** Borrowers faced stagnant wages, unemployment, or job loss, making it impossible to keep up with p

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Answer:
- I think that since reformulation can paraphrase, highlight, or expand on the original user query, it provides more comprehensive coverage and a deeper, more guided retrieval.
- And since each of the `n` reformulated queries retrieves `k` related documents, the chances of finding relevant documents for quality context increase.
</div>

## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy that works as follows:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
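
Here is a minimal sketch of the "search small, return big" flow in plain Python. The sentence-based `toy_split`, the word-overlap `toy_score`, and the two sample parents are illustrative assumptions; the actual retriever uses a text splitter, embeddings, and a vector store instead.

```python
def toy_split(text):
    """Hypothetical stand-in for a text splitter: one child chunk per sentence."""
    return [part.strip() for part in text.split(".") if part.strip()]

def toy_score(query, chunk):
    """Hypothetical stand-in for vector similarity: number of shared words."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def build_child_index(parent_docs):
    """Pair every child chunk with the id of the parent it was cut from."""
    return [
        (chunk, parent_id)
        for parent_id, text in parent_docs.items()
        for chunk in toy_split(text)
    ]

def retrieve_parents(query, child_index, parent_docs, k=2):
    """Rank child chunks, then return their (deduplicated) parent documents."""
    ranked = sorted(child_index, key=lambda item: toy_score(query, item[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parent_docs[parent_id] for parent_id in parent_ids]

# Made-up parent documents keyed by id (this dict plays the role of the docstore).
TOY_PARENTS = {
    "p1": "My loan was transferred without notice. Autopay was cancelled and I missed a payment.",
    "p2": "I requested a forbearance. The servicer denied it with no explanation.",
}

toy_child_index = build_child_index(TOY_PARENTS)
toy_parents = retrieve_parents("autopay cancelled", toy_child_index, TOY_PARENTS, k=2)
print(toy_parents)
```

The match happens on a single sentence, but the caller receives the whole complaint, including the surrounding context about the loan transfer.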

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [28]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [29]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [30]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [31]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [32]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [33]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to federal student loan servicing. Specific sub-issues include errors in loan balances, misapplied payments, wrongful denials of payment plans, and disputes over loan information such as interest rates and account status. Many complaints also involve misconduct from loan servicers, including errors, unfair practices, and issues with credit reporting and verification.\n\nIn summary, the most prevalent problem highlighted is misconduct or errors by loan servicers in managing federal student loans.'

In [34]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided context, several complaints, including the one received on 03/28/25 and others on 04/11/25, indicate that some complaints were not handled in a timely manner. Specifically, the complaint dated 03/28/25 mentions that the consumer was told it would take 15 days for someone to reach out, but as of the date of the complaint, no one had contacted them yet. Similarly, the complaints from 04/11/25 regarding various issues with Mohela were also marked as "No" for timely response, which suggests they were not handled promptly. \n\nTherefore, yes, there were complaints that did not get handled in a timely manner.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans mainly due to a lack of proper communication and understanding of their payment obligations, as well as financial hardship. In some cases, issues such as erroneous reporting, failure to notify borrowers about payment requirements, and the inability to access information about loan ownership or payment plans contributed to missed payments. Additionally, borrowers faced severe financial difficulties, such as inability to find employment or experiencing health issues, which made it difficult or impossible to keep up with loan repayments.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes 2, or more, retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
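
For intuition, here is a small sketch of weighted Reciprocal Rank Fusion in plain Python. The constant `c=60` follows the value commonly used with RRF, and the function name and toy rankings are assumptions for illustration; this is not a copy of the `EnsembleRetriever` source.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, weights, c=60):
    """Fuse several rankings: each document earns weight / (c + rank) per list."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (c + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Toy rankings from two hypothetical retrievers (best match first).
toy_bm25_ranking = ["doc_a", "doc_b", "doc_c"]
toy_dense_ranking = ["doc_b", "doc_d", "doc_a"]

toy_fused = reciprocal_rank_fusion(
    [toy_bm25_ranking, toy_dense_ranking], weights=[0.5, 0.5]
)
print(toy_fused)
```

Because only ranks are used, the retrievers' raw scores never need to be on a comparable scale, which is what makes it practical to mix sparse and dense retrievers.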

In [36]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [37]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [38]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided data, the most common issues with student loans include:\n\n1. Dealing with lenders or servicers, often involving errors in loan balances, misapplied payments, or wrongful denials of payment plans.\n2. Problems with how payments are handled, such as restrictions on applying extra payments to principal or issues with payment application methods.\n3. Discrepancies in reported loan status or balances, including inaccurate credit reporting and account status errors.\n4. Bad information or misinformation about loan terms, interest, or repayment options.\n5. Problems with loan management related to transfers, legal disputes, or improper documentation.\n6. Struggles with repayment, including issues like interest accumulation, difficulty with repayment plans or forgiveness programs, and complications caused by loan servicing practices.\n7. Privacy violations and improper handling of personal data during loan processes or transfers.\n\nWhile multiple issues are frequent, 

In [39]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, based on the provided complaints, some complaints did not get handled in a timely manner. For example, the complaint with ID \'12739706\' from MOHELA filed on 04/01/25 was marked as "Timely response?": "No," indicating it was not responded to in a timely manner. Additionally, several other complaints with statuses such as "Closed with explanation" or "Non-response" suggest delays or lack of timely handling.'

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content



## Task 10: Semantic Chunking

Although semantic chunking is not a retrieval method, it *is* an effective way to improve retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
   - `percentile`
   - `standard_deviation`
   - `interquartile`
   - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distance between each pair of consecutive sentences, and then split the text wherever that distance exceeds a given percentile of all distances.
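To make the percentile rule concrete, here is a minimal pure-Python sketch of the idea. It is not the `SemanticChunker` implementation: the helper names (`cosine_distance`, `percentile`, `percentile_chunks`) and the two-dimensional toy vectors are assumptions made for illustration, and the real splitter does more work (for example, it embeds the sentences with an actual embedding model).

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def percentile_chunks(sentences, vectors, pct=95.0):
    """Split wherever the distance to the next sentence exceeds the pct-th percentile."""
    if len(sentences) < 2:
        return [" ".join(sentences)]
    # Distance between each sentence and the one that follows it
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            # Large semantic jump: close the current chunk and start a new one
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Hypothetical toy data: hand-written 2-D "embeddings" for four sentences
toy_sentences = [
    "My payment was misapplied.",
    "The servicer applied it to the wrong loan.",
    "My credit report is wrong.",
    "It shows a default that never happened.",
]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

chunks = percentile_chunks(toy_sentences, toy_vectors, pct=50.0)
for chunk in chunks:
    print(chunk)
```

With these toy vectors the only distance above the 50th percentile is the jump between the second and third sentences, so the text splits into a payment chunk and a credit-report chunk.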

In [41]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [42]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [43]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [44]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [45]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [46]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the information provided, a common issue with loans appears to be problems related to loan servicing and administrative errors. Specific frequent issues include:\n\n- Struggling to repay loans or problems with forgiveness, cancellation, or discharge.\n- Improper reporting or use of credit reports.\n- Issues with loan payments, such as auto-debit problems, incorrect payment amounts, or delays in re-amortization after forbearance.\n- Discrepancies or errors in loan account statuses, such as mistakenly being reported in default or delinquency despite never being in such status.\n- Difficulties in obtaining clear information about loan balances, servicer changes, or loan terms.\n- Unauthorized access or breaches related to personal and financial information.\n\nOverall, the most common issue appears to be administrative mistakes or miscommunications by loan servicers, leading to hardships in repayment, incorrect reporting, and challenges in managing loan accounts.'

In [47]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that most complaints received timely responses, but many involve issues with payments, account corrections, or disputes that may take time to resolve. \n\nSpecifically, one complaint (row 14) indicates that the complaint was handled with a "Closed with explanation" response and a "Yes" for timely response. The other complaints also received responses within the expected time frames, but some involve ongoing disputes or complicated issues like breaches of contract or unlawful data reporting, which may not have been fully resolved yet.\n\nTherefore, from the information given, it does not clearly indicate that any complaints were not handled in a timely manner. However, some complaints involve complex issues that might still be unresolved, but the data provided does not specify delays or failures to respond in time.\n\n**In conclusion:** No, there is no explicit evidence in the provided data that any complaints were not handled in a timely ma

In [48]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including difficulties with loan management, issues with documentation and verification, disputes over the legitimacy of debts, problems with the reporting and handling of their loans, and errors or delays caused by loan servicers. Some specific issues highlighted include lack of clarity and transparency from lenders, poor communication, delays in re-amortizing payments after forbearance, and potentially illegal or improper reporting of debts. Additionally, some borrowers faced challenges due to alleged mismanagement or breaches of privacy and contract law by loan servicers, which further complicated their ability to repay or resolve their loan issues.'

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

</div>

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Answer:

Semantic chunking would likely produce many small chunks whose embeddings are very similar to one another. As a result, retrieval would return repetitive, unhelpful documents.

To adjust for this, I would increase `breakpoint_threshold_amount` and/or `min_chunk_size`, both of which are parameters of `SemanticChunker` according to the linked documentation. Beyond that, if possible, combining each question-and-answer pair into a single chunk and adding metadata (e.g. tag, company) might help.

</div>
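The idea of keeping each question with its answer can be sketched without any chunker at all. The snippet below is a hypothetical illustration: the `pair_faq_entries` helper, the `Q:` / `A:` line prefixes, and the sample FAQ lines are all assumptions, not part of the loan or bills data.

```python
def pair_faq_entries(lines, source="faq"):
    """Group 'Q:' / 'A:' lines into one chunk per question-answer pair."""
    chunks = []
    question = None
    for line in lines:
        text = line.strip()
        if text.startswith("Q:"):
            # Remember the question until its answer arrives
            question = text
        elif text.startswith("A:") and question is not None:
            chunks.append({
                "page_content": f"{question}\n{text}",
                "metadata": {"source": source, "question": question[2:].strip()},
            })
            question = None
    return chunks

# Made-up FAQ lines, for illustration only
faq_lines = [
    "Q: How do I reset my password?",
    "A: Use the reset link on the sign-in page.",
    "Q: How do I change my email?",
    "A: Update it under account settings.",
]

faq_chunks = pair_faq_entries(faq_lines)
for chunk in faq_chunks:
    print(chunk["metadata"]["question"], "->", len(chunk["page_content"]), "chars")
```

Each resulting chunk carries a full answer, so near-duplicate one-line chunks are less likely to crowd the top-k results.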

# 🤝 Breakout Room Part #2

In [None]:
import pandas as pd
import time
from datetime import datetime
from typing import List, Dict, Any
import numpy as np
import os
import getpass

# Import Ragas components
from ragas.testset import TestsetGenerator
from ragas import evaluate
from ragas.metrics import (
    context_precision,
    context_recall,
    answer_relevancy,
    faithfulness,
    answer_correctness
)

# Import LangSmith for tracking
from langchain.callbacks import LangChainTracer
from langsmith import Client

# Set up LangSmith tracking
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("Enter your LangChain API Key:")
os.environ["LANGCHAIN_PROJECT"] = "Retrievers Eval"
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

print("API key found?" , bool(os.environ.get("LANGCHAIN_API_KEY")))
print("Tracing v2 enabled?" , os.environ.get("LANGCHAIN_TRACING_V2"))



API key found? True
Tracing v2 enabled? true


In [2]:
client = Client()

# Imports and setup
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from operator import itemgetter

# Load the bills documents
path = "bills/"
loader = DirectoryLoader(path, glob="*.pdf", loader_cls=PyMuPDFLoader)
docs = loader.load()

# Step 1: Create Golden Dataset
print("Step 1: Creating Golden Dataset...")

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini", temperature=0))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings(model="text-embedding-3-small"))

# Generate test dataset
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)

# Use a subset for cost management
test_docs = docs[:15]
dataset = generator.generate_with_langchain_docs(
    test_docs, 
    testset_size=20,  # Increased for better evaluation
    with_debugging_logs=True
)

# Convert to DataFrame and save
test_df = dataset.to_pandas()
test_df.to_csv('bills_evaluation_dataset.csv', index=False)
print(f"Generated {len(test_df)} test cases")

# Step 2: Set up all retrievers (using the bills data)
print("\nStep 2: Setting up retrievers...")

from langchain_community.vectorstores import Qdrant
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain.retrievers import ParentDocumentRetriever, EnsembleRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models
from langchain_qdrant import QdrantVectorStore

# Initialize embeddings and chat model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
chat_model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Create base vectorstore
vectorstore = Qdrant.from_documents(
    test_docs,
    embeddings,
    location=":memory:",
    collection_name="BillsComplaints"
)

Step 1: Creating Golden Dataset...


Applying SummaryExtractor:   0%|          | 0/15 [00:00<?, ?it/s]

Applying CustomNodeFilter:   0%|          | 0/15 [00:00<?, ?it/s]

Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/45 [00:00<?, ?it/s]

Applying [CosineSimilarityBuilder, OverlapScoreBuilder]:   0%|          | 0/2 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/21 [00:00<?, ?it/s]

Generated 21 test cases

Step 2: Setting up retrievers...


In [4]:
# 1. Naive Retriever
naive_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# 2. BM25 Retriever
bm25_retriever = BM25Retriever.from_documents(test_docs, k=10)

# 3. Contextual Compression Retriever (with Cohere Rerank)
compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, 
    base_retriever=naive_retriever
)

# 4. Multi-Query Retriever
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, 
    llm=chat_model,
    include_original=True
)

# 5. Parent Document Retriever
client_parent = QdrantClient(location=":memory:")
client_parent.create_collection(
    collection_name="parent_docs",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_vectorstore = QdrantVectorStore(
    collection_name="parent_docs", 
    embedding=embeddings, 
    client=client_parent
)

store = InMemoryStore()
child_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

parent_document_retriever = ParentDocumentRetriever(
    vectorstore=parent_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)
parent_document_retriever.add_documents(test_docs)

# 6. Ensemble Retriever
ensemble_retriever = EnsembleRetriever(
    retrievers=[naive_retriever, bm25_retriever],
    weights=[0.5, 0.5]
)



In [5]:
# Define retrievers to evaluate
retrievers = {
    "Naive": naive_retriever,
    "BM25": bm25_retriever,
    "Contextual_Compression": compression_retriever,
    "Multi_Query": multi_query_retriever,
    "Parent_Document": parent_document_retriever,
    "Ensemble": ensemble_retriever
}

# Step 3: Create RAG chains for each retriever
print("\nStep 3: Creating RAG chains...")

RAG_TEMPLATE = """You are a helpful assistant. Use the context provided below to answer the question accurately.

If you do not know the answer, or are unsure, say you don't know.

Question: {question}

Context: {context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

def create_rag_chain(retriever):
    """Create a RAG chain with the given retriever"""
    return (
        {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
        | RunnablePassthrough.assign(context=itemgetter("context"))
        | {"response": rag_prompt | chat_model | StrOutputParser(), "context": itemgetter("context")}
    )

# Create chains for each retriever
rag_chains = {name: create_rag_chain(retriever) for name, retriever in retrievers.items()}

# Step 4: Evaluate each retriever
print("\nStep 4: Starting evaluation...")

def format_docs(docs):
    """Format documents for Ragas evaluation"""
    if isinstance(docs, list):
        return [doc.page_content if hasattr(doc, 'page_content') else str(doc) for doc in docs]
    return [str(docs)]

# Metrics to evaluate
metrics = [
    context_precision,
    context_recall,
    answer_relevancy,
    faithfulness,
    answer_correctness
]



Step 3: Creating RAG chains...

Step 4: Starting evaluation...


In [None]:
results = {}
cost_tracking = {}
latency_tracking = {}


for retriever_name, chain in rag_chains.items():
    print(f"\nEvaluating {retriever_name}...")
    
    # Track costs and latency
    start_time = time.time()
    
    # Prepare data for evaluation
    questions = test_df['user_input'].tolist()
    ground_truths = test_df['reference'].tolist()
    
    # Generate answers and contexts
    answers = []
    contexts = []

    
    # Attach run metadata through `config` so traces can be filtered per retriever in LangSmith
    for question in questions:
        try:
            result = chain.invoke(
                {"question": question},
                config={"metadata": {"revision_id": "default_chain_init", "project_name": f"ragas-{retriever_name}"}},
            )
            answers.append(result["response"])
            contexts.append(format_docs(result["context"]))
        except Exception as e:
            print(f"Error processing question: {e}")
            answers.append("Error generating answer")
            contexts.append(["Error retrieving context"])
    
    # Calculate latency
    end_time = time.time()
    avg_latency = (end_time - start_time) / len(questions)
    latency_tracking[retriever_name] = avg_latency
    
    # Create evaluation dataset
    eval_data = {
        "question": questions,
        "answer": answers,
        "contexts": contexts,
        "ground_truth": ground_truths
    }

    # Run Ragas evaluation
    try:
        from datasets import Dataset
        
        # Convert to Ragas dataset format
        ragas_dataset = Dataset.from_dict(eval_data)
        
        ragas_result = evaluate(
            dataset=ragas_dataset,  # Now using HuggingFace Dataset
            metrics=metrics,
            llm=generator_llm,
            embeddings=generator_embeddings
        )
        
        results[retriever_name] = {
            "scores": ragas_result,
            "avg_latency": avg_latency
        }
        
        print(f"{retriever_name} - Average Latency: {avg_latency:.2f}s")
        print(f"{retriever_name} - Scores: {ragas_result}")
        
    except Exception as e:
        print(f"Error evaluating {retriever_name}: {e}")
        results[retriever_name] = {"error": str(e), "avg_latency": avg_latency}



Evaluating Naive...


Evaluating:   0%|          | 0/105 [00:00<?, ?it/s]

KeyboardInterrupt: 

Exception raised in Job[35]: AttributeError('NoneType' object has no attribute 'generate')
Exception raised in Job[52]: AssertionError(LLM is not set)
Exception raised in Job[53]: AssertionError(LLM is not set)
Exception raised in Job[30]: AttributeError('NoneType' object has no attribute 'generate')
Exception raised in Job[40]: AttributeError('NoneType' object has no attribute 'generate')
Exception raised in Job[45]: AttributeError('NoneType' object has no attribute 'generate')
Exception raised in Job[50]: AttributeError('NoneType' object has no attribute 'generate')
Exception raised in Job[48]: AssertionError(llm must be set to compute score)
Exception raised in Job[54]: AssertionError(LLM must be set)
Exception raised in Job[55]: AssertionError(LLM is not set)
Exception raised in Job[56]: AssertionError(set LLM before use)
Exception raised in Job[57]: AssertionError(LLM is not set)
Exception raised in Job[58]: AssertionError(LLM is not set)
Exception raised in Job[59]: AssertionErro

In [None]:
# Step 5: Compile results and analysis
print("\n" + "="*50)
print("EVALUATION RESULTS")
print("="*50)

# Create comparison DataFrame
comparison_data = []
for retriever_name, result in results.items():
    if "error" not in result:
        scores = result["scores"]
        
        # Average each metric over all test cases; to_pandas() returns one row per sample
        metric_names = ["context_precision", "context_recall", "answer_relevancy", "faithfulness", "answer_correctness"]
        score_dict = {}
        if hasattr(scores, "to_pandas"):
            per_sample_scores = scores.to_pandas()
            for metric_name in metric_names:
                if metric_name in per_sample_scores.columns:
                    score_dict[metric_name] = per_sample_scores[metric_name].mean()
        
        print(f"{retriever_name} extracted scores: {score_dict}")
        
        row = {
            "Retriever": retriever_name,
            "Context Precision": score_dict.get("context_precision", "N/A"),
            "Context Recall": score_dict.get("context_recall", "N/A"),
            "Answer Relevancy": score_dict.get("answer_relevancy", "N/A"),
            "Faithfulness": score_dict.get("faithfulness", "N/A"),
            "Answer Correctness": score_dict.get("answer_correctness", "N/A"),
            "Avg Latency (s)": result["avg_latency"]
        }
        comparison_data.append(row)

comparison_df = pd.DataFrame(comparison_data)
print(comparison_df.to_string(index=False))

# Save results
comparison_df.to_csv('retriever_evaluation_results.csv', index=False)

# Analysis and Recommendations
print("\n" + "="*50)
print("ANALYSIS & RECOMMENDATIONS")
print("="*50)

# Cost Analysis (based on model usage)
print("\n📊 COST ANALYSIS:")
print("- Naive, BM25, Parent Document: Low cost (basic embedding + LLM)")
print("- Contextual Compression: Medium-High cost (additional Cohere Rerank API)")
print("- Multi-Query: Medium cost (additional LLM calls for query generation)")
print("- Ensemble: Medium cost (combines multiple retrieval methods)")

# Latency Analysis
print("\n⏱️ LATENCY ANALYSIS:")
latency_sorted = sorted(latency_tracking.items(), key=lambda x: x[1])
for retriever, latency in latency_sorted:
    print(f"- {retriever}: {latency:.2f}s")

# Performance Analysis
if comparison_data:
    print("\n🎯 PERFORMANCE ANALYSIS:")
    
    # Find best performer for each metric
    metrics_to_check = ["Context Precision", "Context Recall", "Answer Relevancy", "Faithfulness", "Answer Correctness"]
    
    for metric in metrics_to_check:
        if metric in comparison_df.columns:
            valid_scores = comparison_df[comparison_df[metric] != "N/A"][metric]
            if not valid_scores.empty:
                best_idx = valid_scores.astype(float).idxmax()
                best_retriever = comparison_df.loc[best_idx, "Retriever"]
                best_score = valid_scores.loc[best_idx]  # idxmax() returns an index label, not a position
                print(f"- Best {metric}: {best_retriever} ({best_score})")

print("\n📝 FINAL RECOMMENDATION:")
print("""
General guidance to weigh against the scores printed above:

1. **For Production Use**: Consider the balance of cost, latency, and performance
2. **For High Accuracy**: Contextual Compression often performs best but at higher cost
3. **For Fast Response**: Naive or BM25 retrievers offer good speed-performance trade-offs
4. **For Comprehensive Coverage**: Multi-Query and Ensemble methods provide broader context

Choose based on your specific requirements for accuracy vs. speed vs. cost.
""")

# Traces are sent to LangSmith automatically while LANGCHAIN_TRACING_V2 is enabled
print(f"\n🔗 Evaluation traces are logged to the LangSmith project: {os.environ.get('LANGCHAIN_PROJECT')}")
print("Check your LangSmith dashboard for detailed cost and latency metrics.")

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other. 
You can use the loans or bills dataset.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

</div>
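When choosing retriever-specific metrics, it helps to see what one of them rewards. The sketch below is a simplified take on the context precision idea as described in the Ragas documentation: average precision@k over the ranks that hold a relevant chunk. The function name and the hand-written 0/1 relevance lists are assumptions for illustration; in Ragas itself the relevance verdicts come from an LLM judge.

```python
def context_precision_at_k(relevance):
    """Mean of precision@k over the ranks k that hold a relevant chunk.

    relevance: list of 0/1 verdicts for the retrieved chunks, in rank order.
    """
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0
    score = 0.0
    hits = 0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            score += hits / k  # precision@k, counted only at relevant ranks
    return score / total_relevant

# Hypothetical verdicts: same number of relevant chunks, different ranking
relevant_first = context_precision_at_k([1, 1, 0, 0])
relevant_last = context_precision_at_k([0, 0, 1, 1])
print(f"relevant chunks ranked first: {relevant_first:.3f}")
print(f"relevant chunks ranked last:  {relevant_last:.3f}")
```

The two rankings contain the same chunks, yet the one that puts relevant chunks first scores higher. This is why a reranker can raise context precision without changing recall.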

#### Task 1: Retriever Evaluation with Ragas

#### All Retriever Evaluation with Ragas

##### HINTS:

- LangSmith provides detailed information about latency and cost.
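As a local cross-check on the latency numbers reported by LangSmith, per-call timings can also be collected directly. This is a minimal sketch, assuming a hypothetical `measure_latency` helper and a `fake_chain` stub that stands in for `chain.invoke`; swap in a real chain call to use it.

```python
import statistics
import time

def measure_latency(fn, inputs):
    """Call fn on each input and return per-call latency statistics in seconds."""
    durations = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        durations.append(time.perf_counter() - start)
    return {
        "calls": len(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
    }

def fake_chain(question):
    # Stand-in for chain.invoke({"question": question}); sleeps to simulate work
    time.sleep(0.01)
    return {"response": "stub"}

stats = measure_latency(fake_chain, ["q1", "q2", "q3"])
print(f"{stats['calls']} calls, mean {stats['mean']:.3f}s, max {stats['max']:.3f}s")
```

Timing each call separately also exposes outliers (the `max` value), which an average over the whole loop would hide.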

<div style="background-color: #204B8E; color: white; padding: 10px; border-radius: 5px;">

### Analysis & Observations:


</div>