# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic Chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Tasks 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI and Cohere API keys.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path=f"./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    # Embed and search the complaint narrative itself; every column stays available in metadata
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later
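Under the hood, the vector store ranks the stored complaints by the cosine similarity between each complaint's embedding and the query's embedding, then returns the top `k`. The sketch below does this by brute force on made-up 2-D vectors (real `text-embedding-3-small` vectors have 1536 dimensions); Qdrant typically uses an index instead of scoring every vector, but the ranking criterion is the same.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query, best first."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]

# Toy 2-D "embeddings" standing in for real 1536-dimensional ones
doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [0.9, 0.1]]
print(top_k([1.0, 0.0], doc_vecs, k=2))  # [0, 3]
```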

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned via RunnablePassthrough.assign to the "context" value from the previous step;
    #              every other key (here, "question") is passed through unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object, which is then piped
    #              into the LLM; the LLM's message is stored under the "response" key
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
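To make the data flow of this chain explicit, here is a rough plain-Python equivalent. It is only an illustration: `toy_retrieve`, `toy_format_prompt`, and `toy_llm` are hypothetical stand-ins for `naive_retriever`, `rag_prompt`, and `chat_model`, and real LCEL runnables add batching, streaming, and parallel execution that this function does not.

```python
from operator import itemgetter

def run_rag_chain(inputs, retrieve, format_prompt, llm):
    """Plain-Python equivalent of the LCEL chain's data flow (illustrative only)."""
    # Step 1: fetch context for the question and keep the question alongside it
    question = itemgetter("question")(inputs)
    state = {"context": retrieve(question), "question": question}
    # Step 2: RunnablePassthrough.assign re-sets "context" to the same value
    state = {**state, "context": itemgetter("context")(state)}
    # Step 3: format the prompt, call the LLM, and return the answer with its context
    response = llm(format_prompt(question=state["question"], context=state["context"]))
    return {"response": response, "context": state["context"]}

# Hypothetical stand-ins for naive_retriever, rag_prompt, and chat_model
def toy_retrieve(question):
    return [f"complaint mentioning {question}"]

def toy_format_prompt(question, context):
    return f"Query: {question} | Context: {len(context)} document(s)"

def toy_llm(prompt):
    return prompt.upper()

print(run_rag_chain({"question": "late fees"}, toy_retrieve, toy_format_prompt, toy_llm))
```

Returning the context next to the response mirrors the real chain, which is what lets us inspect (and later evaluate) what the retriever fed the LLM.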

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be dealing with the mishandling and mismanagement of the loans by lenders or servicers. This includes errors in loan balances, misapplied payments, wrongful denials of payment plans, unnotified transfers of loan accounts, inaccurate reporting of account status, and improper handling of loan data. Many complaints also highlight issues with incorrect information on credit reports, difficulties in applying payments correctly, and challenges in obtaining clear and accurate information regarding their loans.\n\nIn summary, a prevalent issue is the mishandling of loans by servicers, leading to errors, misinformation, and financial hardship for borrowers.'

In [11]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, it appears that some complaints did not get handled in a timely manner. Specifically, one complaint (Complaint ID: 12709087) submitted to MOHELA on 03/28/25 was marked with "Timely response?": "No," indicating it was not addressed promptly. The complaint involved delays in processing a loan application, with the consumer noting that they had not heard from the company despite previous assurances, and that the issue remained unresolved despite being ongoing for months.\n\nAdditionally, multiple other complaints involve delays or lack of response, such as a complaint to Maximus Federal Services on 04/24/25, which was marked as "Timely response?": "Yes," but still involved ongoing issues and delays in correcting loan information, suggesting some resolutions after delays.\n\nIn summary, at least some complaints, notably the one to MOHELA, were not handled in a timely manner.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to issues such as lack of clear communication from loan servicers about loan status and repayment requirements, unexpected resumption of payments without proper notification, and difficulties in managing or understanding complex interest accumulation and account information. Additionally, some borrowers faced hardships like financial instability, stagnant wages, or increased interest that made repayments unmanageable, especially when options like forbearance or deferment led to ongoing interest accrual that extended the repayment period and increased total debt. In some cases, borrowers were not adequately informed about changes in their loan servicing or transfer of their loans between different companies, which contributed to missed payments and credit impact. Overall, these factors—mismanagement, poor communication, and financial hardship—contributed to failure to repay loans.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

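Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is a ranking function built on the [Bag-of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) model, which is a sparse representation of text.

In essence, it scores how relevant a document is to a query based on the query words the document contains: a matching word counts more when it appears often in that document and rarely across the rest of the corpus, with a penalty for very long documents.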
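To make that concrete, here is a from-scratch sketch of Okapi BM25 scoring on a toy corpus. It assumes lowercase whitespace tokenization, the common defaults `k1=1.5` (how quickly repeated terms stop adding score) and `b=0.75` (how strongly long documents are penalized), and the Lucene-style IDF `log(1 + (N - n + 0.5) / (n + 0.5))`. LangChain's `BM25Retriever` delegates to the `rank_bm25` package, whose IDF variant differs slightly, so treat this as an illustration rather than a reproduction of its scores.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in docs against query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # absent query terms contribute nothing
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "my loan was transferred to a new servicer without notice",
    "the servicer misapplied my payment",
    "i missed the fafsa deadline because the portal was down",
]
scores = bm25_scores("fafsa deadline", docs)
print(max(range(len(docs)), key=scores.__getitem__))  # 2
```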

This retriever is very straightforward to set up! Let's see it happen down below!


In [13]:
from langchain_community.retrievers import BM25Retriever

# Build a sparse BM25 index over the same complaint documents
bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [14]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [15]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be dealing with the lender or servicer, particularly related to misunderstandings or disputes about fees, payments, interest calculations, and loan information. Many complaints highlight problems such as incorrect fees, difficulty in applying payments correctly, incorrect or confusing loan data, and inadequate or unhelpful responses from the companies managing the loans.'

In [16]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the information provided, all the complaints listed in the context indicate that the companies responded with a public response of "Closed with explanation" and are marked as "Timely response?": "Yes." This suggests that the complaints were handled in a timely manner according to their records. \n\nTherefore, no complaints appear to have been left unhandled or not handled in a timely manner.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including issues with their payment plans, mismanagement or misdirection by lenders, and poor communication. Some specific causes highlighted in the complaints include:\n\n- Being steered into the wrong types of forbearances or unable to access the appropriate repayment options.\n- Lack of response or acknowledgment from loan servicers after applying for forbearance or deferment.\n- Automatic payment systems not functioning correctly, leading to missed or reversed payments, and a lack of clear communication about payment status.\n- Being unaware of transferring or changes in loan management, such as loans being transferred to new servicers without proper notification.\n- Errors or deceptive practices by loan servicers that result in increased loan balances, negative reporting to credit bureaus, and unexpected bills.\n- Poor customer service and failure to respond to borrower inquiries or concerns, which hampers borrowers' abil

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILER: we do, and the second half of the notebook covers it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

#### ✅ Answer

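BM25 is likely to perform better than embeddings when searching for specific keywords or phrases that appear verbatim in the complaints. For example, for the query "FAFSA deadline", BM25 gives high weight to documents containing those exact terms: if "FAFSA" is rare in the corpus, its inverse document frequency is high, so the complaints that mention it rank at the top. An embedding model, by contrast, may rank complaints about financial aid or deadlines in general just as highly, even when they never mention the FAFSA. In short, BM25 scores on term FREQUENCY, not SEMANTICS. I found this video to be very helpful in explaining the difference: https://www.youtube.com/watch?v=3FbJOKhLv9M.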



## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
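A minimal sketch of that retrieve-then-rerank step: score every (query, document) pair, sort, and keep only the best `top_n`. The word-overlap `overlap_score` function is a hypothetical stand-in for Cohere's rerank model, which scores relevance with a learned model; `CohereRerank` similarly keeps a fixed number of top documents via its `top_n` parameter (3 by default, as far as we know).

```python
def overlap_score(query, doc):
    """Hypothetical relevance score: fraction of query words that appear in doc."""
    query_words = set(query.lower().split())
    return len(query_words & set(doc.lower().split())) / len(query_words)

def rerank(query, docs, score_fn, top_n=3):
    """Score each (query, doc) pair and keep the top_n highest-scoring docs."""
    # sorted() is stable, so tied docs keep their original retrieval order
    scored = sorted(docs, key=lambda doc: score_fn(query, doc), reverse=True)
    return scored[:top_n]

# Pretend these came back from the base (naive) retriever
retrieved = [
    "payment was late",
    "servicer misapplied my payment",
    "loan transferred without notice",
    "my payment was misapplied by the servicer",
]
print(rerank("misapplied payment servicer", retrieved, overlap_score, top_n=2))
```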

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [18]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [19]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [20]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans, particularly student loans, involve errors and mismanagement related to the handling of loans by servicers. These issues include receiving bad information about loans, errors in loan balances, misapplied payments, wrongful denials of payment plans, and mishandling of loan data. Additionally, complaints often involve lack of communication, incorrect or inconsistent loan information, unauthorized transfers, privacy violations, and administrative errors.'

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, some complaints did not get handled in a timely manner. For example, one complaint about student loan issues with Maximus Federal Services has been open for over a year with no resolution, and the complainant is still awaiting a response. Additionally, a prior complaint regarding EdFinancial Services involved ongoing issues with no resolution within a few weeks to months despite multiple follow-ups. \n\nHowever, for the complaint about EdFinancial Services regarding the unmapped payments, the response indicates that it was closed with an explanation, and the response was timely.\n\nIn summary, at least some complaints, such as the case with Maximus Federal Services, appeared to not be handled in a timely manner.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans mainly due to lack of awareness, misinformation, and communication issues. Some borrowers were unaware that they needed to repay their loans after receiving financial aid, often because financial aid officers did not inform them about repayment requirements. Others faced difficulties because their loan information was transferred without their knowledge or consent, leading to confusion about payment obligations. Additionally, many borrowers encountered problems with their loan servicers, such as incorrect or inconsistent account information, inability to access their online accounts, and inadequate notifications about payments or due dates. \n\nFurthermore, borrowers faced challenges related to the accumulation of interest during periods of forbearance or deferment, making it harder to pay off the loan over time. Some found that the available repayment options, like decreased monthly payments, caused their interest to grow faster than they could p

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had....more than one query!

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and using an LLM to generate `n` new queries from it.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context.
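Steps 1-3 can be sketched in plain Python as below. The LLM and retriever are replaced by hypothetical toy functions (`toy_generate_queries`, `toy_retrieve`), and the `include_original` flag is an illustrative option of this sketch for also searching with the user's own query. In LangChain, `MultiQueryRetriever.from_llm` prompts the LLM for the rewrites (three by default, if we read its default prompt correctly) and deduplicates the combined results.

```python
def multi_query_retrieve(question, generate_queries, retrieve, include_original=False):
    """Retrieve for each generated query and return the unique docs in first-seen order."""
    queries = generate_queries(question)
    if include_original:
        queries = [question] + queries
    seen, unique_docs = set(), []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:  # deduplicate across queries
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical stand-ins for the LLM and the naive retriever
def toy_generate_queries(question):
    return [
        "problems with my loan servicer",
        "payment was misapplied",
        "loan transfer without notice",
    ]

KEYWORD_INDEX = {"servicer": ["doc_a", "doc_b"], "payment": ["doc_b", "doc_c"], "transfer": ["doc_d"]}

def toy_retrieve(query):
    return [doc for keyword, docs in KEYWORD_INDEX.items() if keyword in query for doc in docs]

print(multi_query_retrieve("What is the most common issue with loans?", toy_generate_queries, toy_retrieve))
```

Note that the original question matches nothing in the toy index, while the rewrites surface four distinct documents: that is the recall gain the technique is after.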

So, how hard is it to set up? Not bad! Let's see it down below!



In [23]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [24]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints and concerns, the most common issues with loans, particularly student loans, appear to be:\n\n- Trouble with how payments are being handled, including misapplied payments, cancellations, or delays in processing payments.\n- Errors in loan balances, interest calculations, and inaccurate reporting of account status to credit bureaus.\n- Lack of proper communication or notification from servicers regarding loan status, default, or transfer of loan ownership.\n- Unauthorized access or transfer of loans without proper notice.\n- Problems with loan forgiveness, cancellation, or discharge not being processed correctly.\n- Disputes over incorrect or outdated information on credit reports.\n- Harassment or repeated calls without resolution.\n\nOverall, a key and recurring theme is **mismanagement and communication issues by loan servicers**, leading to errors in account status, incorrect reporting, and financial hardship for borrowers. This includes failure to 

In [26]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the information provided in the context, yes, some complaints were not handled in a timely manner. For example, one complaint filed on 04/18/25 with EdFinancial Services was responded to "Closed with explanation" and marked as "Timely response?": Yes, but the complainant indicated ongoing issues with delayed or unaddressed concerns spanning over months, including unrecorded payments and failure to respond adequately. \n\nAdditionally, a complaint against Mohela reported delays in processing documentation and continued reporting of delinquency despite proof of payments, with wait times for customer service exceeding 3 hours and messages left without reply. Another complaint regarding a dispute settlement with Nelnet noted that over 30 days had passed without response.\n\nFurthermore, there are multiple complaints where the issue persisted for months or over a year without proper resolution or acknowledgment, indicating that some complaints did not get handled promptly.\n\nIn s

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans mainly because of financial hardships, mismanagement, and lack of clear information about repayment options. Many borrowers faced difficulties such as not qualifying for forgiveness programs, experiencing unexpected interest accumulation due to forbearance or deferment, or being misled about repayment obligations. Additionally, systemic issues like poor communication from lenders and servicers, incorrect reporting of account status, and obstacles in managing payments contributed to their inability to repay loans. In some cases, borrowers also encountered longstanding complex procedures, unreliable customer support, or were steered into arrangements like long-term forbearance that increased their debt over time, making repayment unmanageable.\n\nIf you need more specific details or examples from the complaints, I can provide that as well.'

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

#### ✅ Answer
Generating multiple reformulations of a user query improves recall in a similar way to how image generators expand user prompts. Just like how an image generator might take a simple prompt like "dog" and expand it to "a golden retriever puppy sitting in a grassy field on a sunny day" to get more specific and relevant results, the MultiQueryRetriever takes a basic query and generates multiple variations that capture different aspects and phrasings of the same information need.

This improves recall because:
1. Different phrasings may match different relevant documents that use varying terminology
2. Multiple queries explore different semantic angles of the same question
3. By casting a wider net with related queries, we're more likely to catch relevant documents that might be missed by a single query
4. The LLM can add helpful context and specifications, just like how image generators flesh out scene details

For example, a query about "loan issues" might be expanded to:

- "What are common problems people face with their loan servicers?"
- "What types of complaints do borrowers report about loan payments?"
- "What difficulties do people encounter when dealing with student loan companies?"

Each variation increases the chance of matching relevant documents in the corpus.



## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each unsplit "document" will be designated as a "parent document". (You could use larger chunks of a document as well, but our data format lets us treat each whole complaint as the parent chunk.)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
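Here is a minimal sketch of "search small, return big" under simplifying assumptions: `split_words` (a hypothetical helper) cuts each parent into fixed-size word windows instead of using `RecursiveCharacterTextSplitter`, and a shared-word count stands in for embedding similarity. Each child remembers its parent's ID, and the retriever maps the best-matching children back to unique parents.

```python
def split_words(text, words_per_chunk):
    """Hypothetical child splitter: fixed-size windows of words."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk]) for i in range(0, len(words), words_per_chunk)]

def build_child_index(parents, words_per_chunk):
    """Return (child_text, parent_id) pairs; the parents stay in their own store."""
    return [
        (child, parent_id)
        for parent_id, parent in enumerate(parents)
        for child in split_words(parent, words_per_chunk)
    ]

def overlap(query, text):
    """Stand-in for vector similarity: number of shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def parent_document_retrieve(query, parents, child_index, k=1):
    """Search the children, then return each matching child's full parent once."""
    ranked = sorted(child_index, key=lambda pair: overlap(query, pair[0]), reverse=True)
    results, seen = [], set()
    for _, parent_id in ranked[:k]:
        if parent_id not in seen:
            seen.add(parent_id)
            results.append(parents[parent_id])
    return results

parents = [
    "payments resumed after forbearance ended and my bill doubled overnight",
    "mohela never processed my graduate loan application after several weeks",
]
child_index = build_child_index(parents, words_per_chunk=4)
print(parent_document_retrieve("graduate loan application", parents, child_index))
```

The match happens on the four-word child "graduate loan application after", but the whole complaint comes back as context.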

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [28]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [29]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [30]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [31]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [32]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [33]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be related to misconduct by loan servicers, including errors in loan balances, misapplied payments, wrongful denials of payment plans, and issues with loan reporting and legitimacy. Many complaints involve inaccurate information on credit reports, unexpected interest rate increases, and difficulties in managing or understanding loan accounts.'

In [34]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that at least one complaint was not handled in a timely manner. Specifically, the complaint lodged with MOHELA regarding the unprocessed graduate loan application (Complaint ID: 12709087) indicates that the consumer was told it would take 15 days for someone to reach out, but as of the date of the complaint, no one had contacted them. Additionally, similar complaints about delays and unresolved issues with Mohela suggest that timely handling was not achieved. \n\nTherefore, yes, there were complaints that did not get handled in a timely manner.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to various issues such as being unaware of payment obligations, experiencing severe financial hardship, and facing challenges related to loan mismanagement. For example, some individuals had their payments incorrectly resumed before the end of the grace period without proper notification, or they were not informed about when payments were due. Others faced difficulties because their educational institution misrepresented the value of their degree, closed unexpectedly, and failed to provide adequate financial guidance, leading to long-term financial hardship and inability to repay loans. Additionally, issues like unverified debts, legal complications, and poor communication from loan servicers have also contributed to difficulties in repayment.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
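Weighted Reciprocal Rank Fusion gives each document `weight / (c + rank)` from every retriever that returned it (ranks start at 1) and sorts by the summed score, so documents that rank well in several lists rise to the top. A minimal sketch is below; `c=60` is the constant from the RRF paper linked above, and we believe it is also the default in LangChain's `EnsembleRetriever`. Documents are plain string IDs here for simplicity.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of doc IDs with weighted Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (c + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d2", "d4", "d1"]
print(weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5]))  # ['d2', 'd1', 'd4', 'd3']
```

`d2` wins because it ranks highly in both lists, even though BM25 put `d1` first; only rank positions matter, so the retrievers' raw scores never need to be comparable.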

In [36]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We've packed *all* of these retrievers together in a single ensemble - now we can drop it into the same chain.

In [37]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [38]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"The most common issue with loans, based on the provided complaints, appears to be dealing with the lender or servicer, specifically problems related to inaccurate or misleading information, errors in loan balances, misapplied payments, wrongful denials of repayment or forgiveness, and poor communication or lack of transparency from the loan servicers. These issues frequently lead to negative impacts on borrowers' credit reports, scores, and financial well-being."

In [39]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, some complaints were not handled in a timely manner. For example, one complaint (Complaint ID: 12935889) filed on 04/11/25 by Mohela had a response marked as "No" for being timely, indicating it was handled past the expected timeframe. Additionally, several other complaints, such as those involving unresolved issues with student loan servicing and credit reporting errors, show delays in responses or resolutions despite ongoing follow-up efforts.'

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily due to several interconnected reasons highlighted in the complaints:\n\n1. **Complex and Opaque Repayment Options:** Borrowers often were not adequately informed about available repayment plans like income-driven repayment, rehabilitation, or forgiveness programs, leading them to rely on forbearance or deferment, which often resulted in accruing interest and growing balances.\n\n2. **Mismanagement and Lack of Communication:** Many borrowers reported that loan servicers failed to notify them of important changes, such as transfer of loan accounts, upcoming payments, or delinquency notices—sometimes delivering notices via email or mail they did not receive—leading to unintentional delinquencies.\n\n3. **Interest Accumulation and Compounding:** A common theme was the escalation of debt due to unanticipated interest, especially when loans were placed into long-term forbearance, which allowed interest to accrue and capitalize, increasing the 

## Task 10: Semantic Chunking

While this is not a retrieval method, it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distance between each pair of consecutive sentences, and then split the text wherever a distance exceeds a given percentile of all those distances.
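The percentile rule can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: it assumes one precomputed embedding vector per sentence, skips the neighbor-sentence buffering that `SemanticChunker` applies before embedding, and uses 95 as an assumed default percentile. The helper names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def percentile(values, p):
    """p-th percentile with linear interpolation between closest ranks."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def percentile_chunks(sentences, vectors, p=95.0):
    """Group consecutive sentences, breaking where the gap exceeds the p-th percentile."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []
    # Distance between each sentence and the next one
    distances = [
        cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(sentences) - 1)
    ]
    threshold = percentile(distances, p)
    chunks, current = [], [sentences[0]]
    for i, d in enumerate(distances):
        if d > threshold:  # a large semantic jump starts a new chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks
```

In this notebook the vectors could come from `embeddings.embed_documents(sentences)`.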

In [41]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [42]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [43]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [44]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [45]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [46]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints, a common issue with loans appears to be problems related to loan servicing and reporting, including:\n\n- Struggling to repay or issues with repayment plans.\n- Errors in loan account status, such as loans being reported in default without proper cause.\n- Disputes over loan information, including incorrect reporting, account status, or unauthorized disclosure.\n- Difficulties in communication with servicers, delays, or lack of transparency.\n- Problems with accessing or understanding loan details and documentation.\n\nWhile these complaints highlight various specific issues, a recurring theme is dissatisfaction with loan servicing processes, errors in reporting or account status, and communication problems. \n\nTherefore, the most common issues seem to involve **errors or disputes related to loan reporting, status, and servicing communications**.'

In [47]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, all of the complaints indicate that the issues were addressed in a timely manner. Specifically, each complaint shows a response status of "Closed with explanation," and the responses are marked as "Yes" for being timely. Therefore, there do not appear to be any complaints that were not handled in a timely manner.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often failed to pay back their loans due to a variety of issues, including:\n\n1. Miscommunication or lack of transparency from loan servicers, leading borrowers to be unsure about their loan status or repayment obligations.\n2. Problems with the handling and verification of documentation, which can delay or complicate forgiveness or discharge processes.\n3. Disputes over the legitimacy or accuracy of their debt, including incorrect reporting or records showing false defaults.\n4. Errors or delays in re-amortization after forbearance periods, resulting in higher payments than expected.\n5. technical or administrative issues, such as payments not being processed correctly despite being made.\n6. Allegations of improper or illegal collection and reporting practices, which can cause borrowers to believe they are ineligible to repay or have their debts invalidated.\n7. Personal or financial difficulties, though these are less explicitly mentioned in the provided complaints, can als

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

##### ✅ Answer:
With short, repetitive sentences like FAQs, semantic chunking may face several challenges:

1. Overlapping Embeddings: Similar or repetitive sentences will have very similar embeddings, making it harder to distinguish between chunks and potentially leading to redundant retrievals.

2. Loss of Context: Short sentences may lack sufficient context for the embedding model to capture meaningful semantic relationships.

3. Inefficient Chunking: Because the percentile threshold is relative to the distances within the document itself, the algorithm still places breakpoints among near-identical sentences and may create many small chunks that don't effectively group related content.

To address these issues, you could:

- Increase chunk size to capture more context around each FAQ (for `SemanticChunker`, e.g., a larger `buffer_size` or a higher `breakpoint_threshold_amount`)
- Use custom chunking rules that keep question-answer pairs together
- Add metadata/tags to help distinguish between similar FAQs
- Consider alternative chunking strategies like combining related FAQs into topical groups
- Use hybrid retrieval approaches that combine semantic and keyword matching
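One way to keep question-answer pairs together is to split on question markers before any semantic step. The sketch below assumes a plain-text FAQ in which each question line starts with `Q:` and each answer with `A:`; that format and the `chunk_faq` helper are hypothetical.

```python
import re


def chunk_faq(text: str) -> list[str]:
    """Split an FAQ into one chunk per question-answer pair (assumes 'Q:' / 'A:' prefixes)."""
    # Zero-width split right before every line that starts with "Q:",
    # so each marker stays attached to its own pair
    parts = re.split(r"(?m)^(?=Q:)", text)
    # Drop empty fragments (e.g. the empty string before a leading "Q:") and trim whitespace
    return [p.strip() for p in parts if p.strip()]
```

Each chunk could then be wrapped in a `Document` with topic tags in `metadata`, so near-duplicate questions remain separable.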


# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.
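For step 2, it helps to know what the retriever-specific metrics measure. Per the Ragas documentation, Context Precision@K averages precision@k over the ranks that hold a relevant chunk, and Context Recall is the share of reference claims supported by the retrieved context. In Ragas the per-chunk and per-claim verdicts come from an LLM judge; the sketch below assumes those verdicts are already available as booleans, and its function names are hypothetical.

```python
def context_precision(relevant: list[bool]) -> float:
    """Context Precision@K from rank-ordered verdicts (index 0 = top-ranked chunk)."""
    hits = 0
    weighted_sum = 0.0
    for k, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            weighted_sum += hits / k  # precision@k, counted only at relevant ranks
    return weighted_sum / hits if hits else 0.0


def context_recall(claim_supported: list[bool]) -> float:
    """Fraction of reference claims attributable to the retrieved contexts."""
    return sum(claim_supported) / len(claim_supported) if claim_supported else 0.0
```

Ranking matters for precision: the same two relevant chunks score higher at ranks 1 and 2 than at ranks 2 and 3, while recall ignores order entirely.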

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

In [49]:
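from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyMuPDFLoader

# NOTE: this loads the PDFs in data/, whereas the retrievers above index the complaints CSV
# (`loan_complaint_data`). Retriever metrics are only meaningful if the golden dataset is
# generated from the same corpus the retrievers search.
path = "data/"
loader = DirectoryLoader(path, glob="*.pdf", loader_cls=PyMuPDFLoader)
docs = loader.load()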

In [50]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [51]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(
    llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs[:20], testset_size=10)

unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node


Property 'summary' already exists in node 'f85bbf'. Skipping!
Property 'summary' already exists in node '418762'. Skipping!
Property 'summary' already exists in node '9a591f'. Skipping!
Property 'summary' already exists in node 'c77b54'. Skipping!
Property 'summary' already exists in node '2e2d9f'. Skipping!
Property 'summary' already exists in node '650612'. Skipping!
Property 'summary' already exists in node 'ae87a8'. Skipping!
Property 'summary' already exists in node 'b678f1'. Skipping!
Property 'summary' already exists in node '7c757a'. Skipping!
Property 'summary' already exists in node '309804'. Skipping!
Property 'summary' already exists in node '043daf'. Skipping!
Property 'summary' already exists in node 'c04c06'. Skipping!
Property 'summary' already exists in node '73e646'. Skipping!
Property 'summary' already exists in node '5f8aa8'. Skipping!


Property 'summary_embedding' already exists in node '9a591f'. Skipping!
Property 'summary_embedding' already exists in node '73e646'. Skipping!
Property 'summary_embedding' already exists in node 'c04c06'. Skipping!
Property 'summary_embedding' already exists in node 'ae87a8'. Skipping!
Property 'summary_embedding' already exists in node '5f8aa8'. Skipping!
Property 'summary_embedding' already exists in node '043daf'. Skipping!
Property 'summary_embedding' already exists in node '650612'. Skipping!
Property 'summary_embedding' already exists in node '418762'. Skipping!
Property 'summary_embedding' already exists in node 'b678f1'. Skipping!
Property 'summary_embedding' already exists in node '7c757a'. Skipping!
Property 'summary_embedding' already exists in node '2e2d9f'. Skipping!
Property 'summary_embedding' already exists in node 'f85bbf'. Skipping!
Property 'summary_embedding' already exists in node '309804'. Skipping!
Property 'summary_embedding' already exists in node 'c77b54'. Sk


In [52]:
dataset.to_pandas().head()

|   | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | What is BBAY 2 in the context of monitoring Di... | [non-term (includes clock-hour calendars), or ... | BBAY 2 is one of the options, along with Sched... | single_hop_specifc_query_synthesizer |
| 1 | How do skool districs affect the includsion of... | [Inclusion of Clinical Work in a Standard Term... | School districts may affect the inclusion of c... | single_hop_specifc_query_synthesizer |
| 2 | Is the Federal Work-Study program subject to p... | [Non-Term Characteristics A program that measu... | The payment period is applicable to all Title ... | single_hop_specifc_query_synthesizer |
| 3 | How Direct Loan work if student finish more ho... | [both the credit or clock hours and the weeks ... | If a student in a clock-hour or non-term credi... | single_hop_specifc_query_synthesizer |
| 4 | How do the disbursement requirements for feder... | `[<1-hop>\n\nboth the credit or clock hours and...` | In clock-hour or non-term credit-hour programs... | multi_hop_abstract_query_synthesizer |


In [53]:
# Import required libraries for evaluation
from ragas.metrics import ContextPrecision, ContextRecall, ContextRelevance
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.dataset_schema import SingleTurnSample, EvaluationDataset
import pandas as pd
import time
from langsmith import traceable
from datetime import datetime
import os
import getpass

In [74]:
# Get LangSmith API key (avoid hard-coding secrets in the notebook)
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("LangSmith API Key:")

# Enable LangSmith tracing
os.environ["LANGSMITH_TRACING"] = "true"
#os.environ["LANGSMITH_PROJECT"] = "retriever-evaluation-max"
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"

print("✅ LangSmith tracing enabled!")
#print(f"📊 Project: {os.environ['LANGSMITH_PROJECT']}")
print("🔗 Visit https://smith.langchain.com to view your traces")

✅ LangSmith tracing enabled!
🔗 Visit https://smith.langchain.com to view your traces


In [None]:
# Create LLM with rate limit handling
# langchain_openai's ChatOpenAI already retries rate-limit (429) errors with exponential
# backoff through the underlying OpenAI client, so no retrying subclass is needed.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(
    model="gpt-4.1-mini",
    max_retries=5,       # client-side retries, including on rate limits
    request_timeout=30   # seconds per request
))
evaluator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

# Define RAGAS metrics for retriever evaluation
ragas_metrics = [
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm),
    ContextRelevance(llm=evaluator_llm)
]

print("RAGAS evaluator setup complete with rate limit handling!")

RAGAS evaluator setup complete!
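Exponential backoff doubles the wait after each failed attempt (1s, 2s, 4s, ...), unlike a linear schedule such as `1 + attempt * 2`. If a hand-rolled retry is ever needed around a rate-limited call, a minimal sketch with "full jitter" looks like this. `call_with_backoff` is a hypothetical helper, and detecting rate limits by searching the error text for `"rate_limit"` mirrors this notebook's heuristic rather than any library contract.

```python
import random
import time


def call_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(), retrying rate-limit errors with exponential backoff and full jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as e:
            # Heuristic: only errors mentioning "rate_limit" are retried; the last failure is re-raised
            if "rate_limit" not in str(e).lower() or attempt == max_retries - 1:
                raise
            # The delay cap doubles each attempt: base, 2*base, 4*base, ... (bounded by max_delay)
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: wait a random time up to the cap so parallel workers don't retry in lockstep
            sleep(random.uniform(0, cap))
```

For example, `call_with_backoff(lambda: chat_model.invoke("ping"))`. The `sleep` parameter exists only so the waits can be inspected without actually sleeping.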


In [56]:
# Convert the generated dataset to RAGAS format
test_df = dataset.to_pandas()

# Create evaluation samples from the test dataset
evaluation_samples = []
for idx, row in test_df.iterrows():
    sample = {
        'user_input': row['user_input'],
        'reference_contexts': row['reference_contexts'],
        'reference': row['reference']
    }
    evaluation_samples.append(sample)

print(f"Created {len(evaluation_samples)} evaluation samples")
print("Sample structure:", evaluation_samples[0].keys())

Created 12 evaluation samples
Sample structure: dict_keys(['user_input', 'reference_contexts', 'reference'])


In [57]:
evaluation_samples

[{'user_input': 'What is BBAY 2 in the context of monitoring Direct Loan annual loan limit progression?',
  'reference_contexts': ['non-term (includes clock-hour calendars), or subscription-based. In a standard term or nonstandard term academic calendar, a term is generally a period in which all classes are scheduled to begin and end within a set time frame, and academic progress is measured in credit hours. In a non-term academic calendar, classes do not begin and end within a set time frame, such as a term. Academic progress in a non-term program can be measured in either credit or clock hours. In some cases (as discussed below), a program with terms must be treated as a non-term program for Title IV purposes. A subscription-based academic calendar is used only by subscription-based programs. A subscription-based program is a term-based program in which the school charges a student for each term on a subscription basis with the expectation that the student will complete a specified n

In [58]:
retrievers_to_evaluate = {
    "naive": naive_retriever,
    "bm25": bm25_retriever, 
    "contextual_compression": compression_retriever,
    "multi_query": multi_query_retriever,
    "parent_document": parent_document_retriever,
    "ensemble": ensemble_retriever,
    "semantic": semantic_retriever
}

print(f"Will evaluate {len(retrievers_to_evaluate)} retrieval methods")

Will evaluate 7 retrieval methods


In [93]:
from langsmith.run_helpers import traceable

@traceable(run_type="chain", name="retriever_evaluation")
def evaluate_retriever(retriever, retriever_name, evaluation_samples):
    """Evaluate a single retriever using RAGAS metrics"""
    
    print(f"Evaluating {retriever_name}...")
    start_time = time.time()
    
    # Prepare evaluation data
    ragas_samples = []
    successful_samples = 0
    
    for sample in evaluation_samples:
        try:
            # Retrieve documents for this question
            retrieved_docs = retriever.invoke(sample['user_input'])
            retrieved_contexts = [doc.page_content for doc in retrieved_docs]
            
            # Create RAGAS sample
            ragas_sample = SingleTurnSample(
                user_input=sample['user_input'],
                retrieved_contexts=retrieved_contexts,
                reference_contexts=sample['reference_contexts'],
                reference=sample['reference']
            )
            ragas_samples.append(ragas_sample)
            successful_samples += 1
            
        except Exception as e:
            print(f"Error processing sample for {retriever_name}: {e}")
            continue
    
    # Calculate timing
    end_time = time.time()
    avg_latency = (end_time - start_time) / len(evaluation_samples)
    
    # Evaluate with RAGAS
    if ragas_samples:
        eval_dataset = EvaluationDataset(samples=ragas_samples)
        try:
            print("Running RAGAS evaluation...")
            from ragas.run_config import RunConfig

            # evaluate() returns an EvaluationResult; fewer parallel workers keeps
            # token usage under the per-minute (TPM) rate limit
            eval_results = evaluate(
                dataset=eval_dataset,
                metrics=ragas_metrics,
                run_config=RunConfig(max_workers=4),
            )

            # Per-sample scores as a DataFrame; mean() skips NaN scores
            # (Ragas assigns NaN to samples that failed, e.g. after a 429)
            scores_df = eval_results.to_pandas()
            metric_columns = {
                'context_precision': 'ContextPrecision',
                'context_recall': 'ContextRecall',
                'nv_context_relevance': 'ContextRelevance',
            }
            metrics_dict = {
                name: float(scores_df[column].mean())
                for column, name in metric_columns.items()
                if column in scores_df.columns
            }

            print(f"Extracted metrics: {metrics_dict}")

            if not metrics_dict:
                print("❌ No metrics were extracted from the evaluation results")
                print(f"Evaluation results type: {type(eval_results)}")
                print(f"Evaluation results content: {eval_results}")
                return None
            
            # Store results in run metadata via the traceable decorator
            run = {
                'retriever_name': retriever_name,
                'metrics': metrics_dict,
                'avg_latency_seconds': avg_latency,
                'total_samples': len(evaluation_samples),
                'successful_samples': successful_samples
            }
            
            return run
            
        except Exception as e:
            print(f"❌ Error during evaluation: {e}")
            print(f"Error type: {type(e)}")
            print(f"Error details: {str(e)}")
            return None
    else:
        return None

print("Evaluation function defined with LangSmith integration!")

Evaluation function defined with LangSmith integration!


In [None]:
print("Starting retriever evaluation...")
print("="*60)

results = []
for retriever_name, retriever in retrievers_to_evaluate.items():
    max_retries = 3
    retry_count = 0
    
    while retry_count < max_retries:
        try:
            print(f"\nEvaluating {retriever_name} (attempt {retry_count + 1}/{max_retries})...")
            result = evaluate_retriever(retriever, retriever_name, evaluation_samples)
            if result:
                results.append(result)
                print(f"✅ {retriever_name} evaluation completed")
                break  # Success, move to next retriever
            else:
                print(f"⚠️ {retriever_name} evaluation returned no results")
                retry_count += 1
        except Exception as e:
            print(f"❌ Error evaluating {retriever_name}: {e}")
            if "rate_limit" in str(e).lower():
                print(f"⚠️ Rate limit hit, waiting 60 seconds before retry...")
                time.sleep(60)
                retry_count += 1
            else:
                break  # Non-rate-limit error, move to next retriever
    
    # Pause between retrievers to avoid rate limits (skipped only when every retry for this retriever was used up)
    if retry_count < max_retries:
        print("Waiting 30 seconds before next retriever...")
        time.sleep(30)

print(f"\nCompleted evaluation of {len(results)} retrievers")

# Print results summary if we have any
if results:
    print("\nResults Summary:")
    print("="*60)
    for result in results:
        print(f"\n{result['retriever_name']}:")
        print("-" * len(result['retriever_name']))
        print(f"Successful samples: {result['successful_samples']}/{result['total_samples']}")
        print(f"Average latency: {result['avg_latency_seconds']:.2f} seconds")
        print("Metrics:")
        for metric_name, value in result['metrics'].items():
            print(f"  {metric_name}: {value:.4f}")

Starting retriever evaluation...

Evaluating naive (attempt 1/3)...
Evaluating naive...
Running RAGAS evaluation...


An error occurred: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1-mini in organization org-j1CMbmsNyPI0ZBMA3XAQnXoH on tokens per min (TPM): Limit 200000, Used 199229, Requested 1993. Please try again in 366ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}. Skipping a sample by assigning it nan score.
An error occurred: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1-mini in organization org-j1CMbmsNyPI0ZBMA3XAQnXoH on tokens per min (TPM): Limit 200000, Used 200000, Requested 2005. Please try again in 601ms. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}. Skipping a sample by assigning it nan score.
An error occurred: Error code: 429 - {'error': {'message': 'Rate limit reached for gpt-4.1-mini in organization org-j1CMbmsNyPI0ZBMA3XAQnXoH on tokens per min (TPM)

Extracted metrics: {'ContextPrecision': 0.0833, 'ContextRecall': 0.0, 'ContextRelevance': 0.0625}
✅ bm25 evaluation completed
Waiting 30 seconds before next retriever...
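To compile the list in step 3 of the activity, the `results` entries built above (each with `retriever_name`, `metrics`, and `avg_latency_seconds`) can be flattened into rows and ranked. `rank_retrievers` is a hypothetical helper, and ranking by `ContextRecall` is an assumption; any other key in `metrics` works the same way. Cost is not captured in `results`; per the hint above, LangSmith reports it.

```python
import math


def rank_retrievers(results, sort_metric="ContextRecall"):
    """Flatten evaluate_retriever() outputs into rows sorted best-first by one metric."""
    rows = []
    for r in results:
        row = {"retriever": r["retriever_name"], "latency_s": round(r["avg_latency_seconds"], 3)}
        row.update({name: round(value, 4) for name, value in r["metrics"].items()})
        rows.append(row)

    def sort_key(row):
        value = row.get(sort_metric, float("nan"))
        missing = math.isnan(value)
        # Rows with a score come first (highest score first); missing or NaN scores go last
        return (missing, 0.0 if missing else -value)

    return sorted(rows, key=sort_key)
```

For example, `for row in rank_retrievers(results): print(row)`.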
