# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [92]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]
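
To make the preparation step concrete, here is a minimal stdlib-only sketch of the same idea: each CSV row becomes one record, the listed columns go into its metadata, and the narrative becomes the text we later embed. The two-row CSV and the `rows_to_docs` helper are hypothetical and only for illustration; this is not how `CSVLoader` is implemented.

```python
import csv
import io

# Hypothetical two-row sample standing in for ./data/complaints.csv.
SAMPLE_CSV = """Complaint ID,Company,Consumer complaint narrative
1,Nelnet,Payments were not re-amortized.
2,MOHELA,No reply to my messages.
"""

def rows_to_docs(csv_text, metadata_columns):
    """Turn each CSV row into a dict with 'page_content' and 'metadata'."""
    docs = []
    for row_index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        metadata = {"row": row_index}
        metadata.update({col: row[col] for col in metadata_columns})
        # Mirror the notebook: the narrative becomes the text that gets embedded.
        docs.append({
            "page_content": row["Consumer complaint narrative"],
            "metadata": metadata,
        })
    return docs

docs = rows_to_docs(
    SAMPLE_CSV, ["Complaint ID", "Company", "Consumer complaint narrative"]
)
print(docs[0]["page_content"])
```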

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
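
As a rough illustration of what "cosine similarity, top `k`" means, the sketch below ranks a few toy vectors against a query vector. The 3-dimensional vectors and the `top_k` helper are hypothetical; a real vector store uses the embedding model's vectors and an optimized index rather than a full sort.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, doc_vectors, k):
    """Return the ids of the k documents most similar to the query."""
    scored = sorted(
        doc_vectors.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Hypothetical toy "embeddings", for illustration only.
toy_vectors = {
    "complaint_a": [1.0, 0.0, 0.0],
    "complaint_b": [0.7, 0.7, 0.0],
    "complaint_c": [0.0, 0.0, 1.0],
}
nearest = top_k([1.0, 0.1, 0.0], toy_vectors, k=2)
print(nearest)
```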

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned to its own value from the previous step; every other key
    #              (here, "question") passes through unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object, which is
    #              then piped into the LLM; the result is stored under the "response" key
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
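
If the piping syntax is hard to follow, the plain-Python sketch below walks through the same three steps as ordinary function calls. `fake_retriever`, `fake_llm`, and `run_chain` are hypothetical stand-ins, not LangChain objects; the point is only to show how the dictionary changes from step to step.

```python
def fake_retriever(question):
    # Hypothetical stand-in for naive_retriever: returns canned documents.
    return [f"doc about: {question}"]

def fake_llm(prompt):
    # Hypothetical stand-in for chat_model: reports the prompt length.
    return f"answer based on {len(prompt)} prompt characters"

def run_chain(inputs):
    # Step 1: build {"context", "question"} from the incoming question.
    state = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: re-assign "context" to its own value; the other keys pass
    # through unchanged.
    state = {**state, "context": state["context"]}
    # Step 3: format the prompt, call the model, and keep the context
    # alongside the response.
    prompt = f"Query:\n{state['question']}\n\nContext:\n{state['context']}"
    return {"response": fake_llm(prompt), "context": state["context"]}

result = run_chain({"question": "What is the most common issue with loans?"})
print(sorted(result))
```

Returning the context next to the response is what lets us inspect (and later evaluate) exactly which documents the answer was based on.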

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided information, the most common issues with loans appear to be:\n\n- Errors and discrepancies in loan balances and account information.\n- Problems with loan servicing, such as misapplied payments, inability to make principal payments, or account transfers without notice.\n- Difficulty with repayment plans, including being steered into long-term forbearance or facing unexpected interest capitalization.\n- Issues with inaccurate or outdated information impacting credit reports.\n- Lack of clear communication or documentation from loan servicers.\n- Allegations of mismanagement, unethical practices, or violations of privacy laws.\n\nThe most frequent issue seems to revolve around mismanagement and errors in handling the loan account details, which leads to complications in repayment, inaccurate reporting, and customer frustration. \n\nIf you need a specific answer, it could be summarized as: \n\nThe most common issue with loans is **errors and mismanagement related to

In [11]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, some complaints did not get handled in a timely manner. Specifically, the complaint filed with MOHELA on 03/28/25 was marked as "No" under "Timely response," indicating it was not responded to within the expected timeframe. Additionally, in the case of another complaint concerning Maximus Federal Services, Inc. (received on 04/05/25), the response was marked as "Yes," indicating timely handling there. However, the MOHELA complaint clearly was not handled promptly.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans due to a combination of factors such as lack of clear information from loan servicers, difficulties in managing accruing interest, limited or no access to income-driven repayment options, and unexpected administrative issues like mismanagement or transfers of loan accounts. In many cases, borrowers are unaware of the exact terms, due dates, or changes in their loan status, which can lead to missed payments or delinquency. Additionally, financial hardships, stagnant wages, or underemployment can make it impossible for borrowers to keep up with payments, especially when repayment options are limited or not properly communicated. Overall, systemic issues such as administrative errors, lack of transparency, and inadequate financial guidance contribute significantly to loan repayment failures.'

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [13]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
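
For intuition about what happens under the hood, here is a minimal sketch of one common BM25 scoring variant in pure Python. The whitespace tokenizer, the `k1` and `b` defaults, the IDF formula, and the mini-corpus are all assumptions made for illustration; the library backing `BM25Retriever` may differ in those details.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with a common BM25 variant."""
    tokenized = [doc.lower().split() for doc in documents]
    avg_len = sum(len(tokens) for tokens in tokenized) / len(tokenized)
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a larger inverse document frequency.
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1
            )
            tf = term_freq[term]
            # Term frequency saturates (k1) and is normalized by length (b).
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical mini-corpus for illustration.
corpus = [
    "nelnet misapplied my payment",
    "mohela never answered my call",
    "my payment to nelnet was reversed by nelnet",
]
scores = bm25_scores("nelnet payment", corpus)
print(scores)
```

Note that a document sharing no words with the query scores exactly zero, which is the key difference from embedding-based similarity.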

We'll construct the same chain - only changing the retriever.

In [14]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [15]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided information, the most common issue with loans appears to be problems related to dealing with lenders or servicers, such as discrepancies or disputes over fees, payment application, loan information, and handling of loan terms. Specifically, complaints frequently mention issues like incorrect or bad information, difficulty in understanding or accessing loan details, and disputes over fees or the application of payments.'

In [16]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints listed indicate that the companies responded and the responses were categorized as "Closed with explanation" and marked as "Timely response? Yes." This suggests that these complaints were handled in a timely manner. Therefore, there is no evidence from the data to suggest that any complaints did not get handled in a timely manner.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans due to a variety of issues, such as misunderstandings or mismanagement by loan servicers, improper communication, or challenges in adhering to payment plans. For example, some borrowers experience problems with their payment plans, having been steered into the wrong types of forbearances or having their autopayments inadvertently discontinued without proper notification. Others face technical issues, such as payments being reversed or not processed correctly due to errors on the part of the service provider, which can impact credit scores and create confusion. Additionally, there are cases where borrowers have applied for deferments or forbearances but did not receive timely responses or guidance, leading to missed payments. In some instances, bad communication or procedural errors by loan servicers and a lack of clear information can prevent borrowers from meeting their repayment obligations.'

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, and the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.
##### ✅ Answer:
'Why Aid Avantage not tell me who they is and why they put my loans in default when I never was late?'

This query comes from the golden dataset in Activity #1. My guess is that embeddings did not function well here because the query uses the wrong name for the company. When the stored documents share the same "peculiarities" as the query, a semantic embedding may not pick up on them as well as a comparison based on matching words.


## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [112]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
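
The retrieve-then-compress pattern can be sketched without any external service. In the sketch below, `word_overlap` is a hypothetical stand-in for a real reranking model, and the candidate list plays the role of the documents returned by a first-stage retriever; only the control flow is meant to carry over.

```python
def rerank(query, candidates, score_fn, top_n):
    """Keep the top_n candidates according to a second-stage scorer."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def word_overlap(query, doc):
    # Hypothetical stand-in for a reranking model: counts shared words.
    return len(set(query.lower().split()) & set(doc.lower().split()))

# Pretend these came back from a first-stage retriever with a large k.
candidates = [
    "my autopay was cancelled without notice",
    "the servicer did not respond in a timely manner",
    "interest was capitalized after forbearance",
    "no timely response to my dispute",
]
compressed = rerank(
    "timely response from the servicer", candidates, word_overlap, top_n=2
)
print(compressed)
```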

Let's create our chain again, and see how this does!

In [113]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [114]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, it appears that a common issue with loans, particularly student loans, is dealing with errors and mismanagement by lenders or servicers. Specific frequent issues include errors in loan balances, misapplied payments, wrongful denials of payment plans, incorrect or conflicting account information, and inadequate communication or documentation. \n\nTherefore, the most common issue with loans is **mismanagement or errors in loan servicing, including inaccurate information, misapplied payments, and poor communication from lenders or servicers.**'

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, at least one complaint was not handled in a timely manner. Specifically, the complaint regarding the long-term issue with the student loan account and the lack of response for over a year indicates a delay in resolution. The report mentions that it has been nearly 18 months with no resolution, despite ongoing requests and follow-ups. Therefore, yes, some complaints did not get handled in a timely manner.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for several reasons highlighted in the complaints:\n\n1. Lack of Clear Communication and Awareness: Borrowers were often unaware that they had to repay their student loans, as they were not told by financial aid officers about the repayment obligations. Some also received bad or confusing information regarding their loan status and payment requirements.\n\n2. Transfer and Mismanagement by Servicers: Loans were transferred between different entities (e.g., from Great Lakes to NelNet) without the borrower's knowledge or consent, leading to confusion and difficulty in managing accounts and making payments.\n\n3. Issues with Payment Options and Accumulating Interest: The only options provided—such as forbearance or deferment—allowed interest to continue accumulating, which increased the total amount owed even when payments were being made. Lowering payments often resulted in interest compounding, making repayment harder over time.\n\n4. Technical and 

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [23]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
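
The three steps listed above can be sketched in plain Python as follows. `fake_rewrites` and `fake_retrieve` are hypothetical stand-ins for the LLM that rewrites the query and for the base retriever; the part worth noticing is the de-duplicated union at the end.

```python
def multi_query_retrieve(question, rewrite_fn, retrieve_fn):
    """Retrieve for each rewritten query and keep every unique document once."""
    seen = set()
    unique_docs = []
    for query in rewrite_fn(question):
        for doc in retrieve_fn(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical stand-ins for the LLM rewriter and the base retriever.
def fake_rewrites(question):
    return ["late payment problems", "servicer communication problems"]

FAKE_INDEX = {
    "late payment problems": ["doc_1", "doc_2"],
    "servicer communication problems": ["doc_2", "doc_3"],
}

def fake_retrieve(query):
    return FAKE_INDEX.get(query, [])

merged = multi_query_retrieve(
    "Why did people fail to pay?", fake_rewrites, fake_retrieve
)
print(merged)
```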

In [24]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints and context, it appears that the most common issues with loans, particularly federal student loans, revolve around:\n\n- Mismanagement of loan balances and interest (e.g., unexpected increases, capitalization of interest, errors in balances)\n- Problems with how payments are handled and applied (e.g., payments being applied to interest only, inability to pay down principal)\n- Lack of proper documentation and verification (e.g., missing Master Promissory Notes, proof of loan validation)\n- Lack of clear communication and misinforming borrowers about repayment options or account status\n- Data and reporting errors affecting credit scores (e.g., incorrect late payments, incorrect account status)\n- Issues arising from loan transfers between servicers without proper notification\n- Violations of borrower rights regarding privacy, documentation, and dispute handling\n\nWhile these issues vary, the most common overarching theme is **mismanagement and lack o

In [26]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

"Based on the provided complaints, yes, some complaints indicate that issues were not handled in a timely manner. For example:\n\n- A complaint against MOHELA received on 04/01/25 noted that their response was 'No' to being timely, and it was marked as 'Timely response? No' with wait times exceeding 3 hours and no reply to multiple messages over several weeks.\n- Several complaints mention delays, such as applications or requests that took over a year to resolve or for which no response was received despite multiple follow-ups.\n- Other complaints, particularly against Maximus Federal Services, show delayed responses, with some issues still unresolved after many months and complaints marked as 'Closed with explanation' but citing ongoing problems.\n\nTherefore, yes, there were complaints that did not get handled in a timely manner."

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily due to issues such as mismanagement and misconduct by loan servicers, including errors in loan balances, misapplied payments, wrongful denials of payment plans, and inadequate or lack of communication about repayment obligations. Some borrowers were steered into long-term forbearances instead of income-driven repayment or rehabilitation options, which led to interest accrual and ballooning balances. Others experienced systemic failures like improper reporting of delinquencies, failure to notify borrowers about repayment resumption, or technical issues with payment application, all of which contributed to missed payments and financial hardship. Additionally, systemic failures in record retention and inaccurate account reporting further hindered borrowers' ability to manage and repay their loans effectively."

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.
##### ✅ Answer:
If by recall we mean the fraction of relevant documents that actually get retrieved (relevant documents retrieved / all relevant documents), then generating multiple reformulations of the query helps because each reformulation gets its own chance to match relevant documents that are worded differently from the original query. Taking the union of the results gives more chances and more variation, so more of the relevant context ends up in front of the LLM.
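
As a small worked example, assuming recall is defined as the share of relevant documents that were retrieved, the sketch below compares a single query against the union of several reformulations. The document ids are made up for illustration.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that were retrieved."""
    relevant = set(relevant)
    return len(relevant & set(retrieved)) / len(relevant)

# Hypothetical ground truth and retrieval results.
relevant_docs = {"doc_1", "doc_2", "doc_3", "doc_4"}
single_query_hits = ["doc_1", "doc_9"]
multi_query_hits = ["doc_1", "doc_9", "doc_2", "doc_3"]

single_recall = recall(single_query_hits, relevant_docs)
multi_recall = recall(multi_query_hits, relevant_docs)
print(single_recall, multi_recall)
```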


## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
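
Here is a minimal sketch of the "search small, return big" idea using plain dictionaries. The fixed-size character splitter and the word-overlap `score` function are hypothetical simplifications standing in for the text splitter and the vector similarity search.

```python
def build_index(parent_docs, chunk_size):
    """Split each parent into child chunks that remember their parent id."""
    children = []
    for parent_id, text in parent_docs.items():
        for start in range(0, len(text), chunk_size):
            children.append(
                {"parent_id": parent_id, "text": text[start:start + chunk_size]}
            )
    return children

def retrieve_parents(query, children, parent_docs, k):
    """Search the children, but return the de-duplicated parents."""
    def score(child):
        # Hypothetical stand-in for vector similarity: count shared words.
        query_words = set(query.lower().split())
        return len(query_words & set(child["text"].lower().split()))

    best_children = sorted(children, key=score, reverse=True)[:k]
    parent_ids = []
    for child in best_children:
        if child["parent_id"] not in parent_ids:
            parent_ids.append(child["parent_id"])
    return [parent_docs[parent_id] for parent_id in parent_ids]

# Hypothetical parent documents.
parents = {
    "p1": "My loan was transferred without notice. Later my autopay stopped.",
    "p2": "The servicer reported a late payment that I made on time.",
}
children = build_index(parents, chunk_size=40)
results = retrieve_parents("autopay stopped", children, parents, k=1)
print(results)
```

The match happens on the small second chunk of `p1`, but the caller receives the whole parent, including the sentence about the transfer.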

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [28]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [29]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [30]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore=parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [31]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [32]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [33]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be related to errors or problems with loan servicing, such as incorrect information, misapplied payments, wrongful denials of payment plans, and discrepancies in loan balances and interest rates. These issues reflect systemic breakdowns and mismanagement by loan servicers, causing significant hardship for borrowers.'

In [34]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, several did not get handled in a timely manner. For example, complaints with IDs 12709087 and 12935889 were marked as "No" for timely response, indicating they were not responded to promptly. Additionally, the complaint with ID 13205525 about dispute settlements took over 30 days without a response, which also suggests it was not handled in a timely manner.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors such as financial hardship, lack of proper information, or mismanagement by the loan servicers. For example, some borrowers experienced severe financial difficulties after graduation and relied on deferrals or forbearance, which increased the interest owed. Others faced issues with miscommunication or lack of transparency from the lenders or loan servicers, such as not being notified of payment obligations or loan transfers. Additionally, some borrowers took out loans based on misrepresentations about their educational institutions, leading to long-term financial consequences and difficulty in repaying the loans.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [36]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
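
For intuition, a weighted form of Reciprocal Rank Fusion can be sketched in a few lines. The constant `c=60` follows the value suggested in the RRF paper; the `weighted_rrf` helper and the example rankings are hypothetical, and the library's exact tie-breaking and de-duplication may differ.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document ids into one ranking."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked highly by any retriever collect more score.
            scores[doc_id] += weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from two retrievers.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]
fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Because only ranks are used, retrievers whose raw scores live on completely different scales (BM25 scores versus cosine similarities) can be combined without any normalization.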

In [37]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [38]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided data, the most common issues with loans include:\n\n- Dealing with lenders or servicers and receiving bad information about the loan.\n- Problems with how payments are handled, such as restrictions on applying additional funds to principal or paying off loans more quickly.\n- Incorrect information on credit reports, such as misreported delinquency periods or account status.\n- Discrepancies and lack of transparency regarding loan balances, interest calculations, and loan terms.\n- Mishandling of loan applications, including errors in consolidations, forbearance, or application processing.\n- Problems related to loan discharges, cancellations, or misunderstandings about repayment plans.\n\nAmong these, a recurring theme is **misinformation, mismanagement, and lack of transparency** in loan handling and reporting. Errors like incorrect delinquency reporting, uninformative or misleading communication about loan status, and improper handling of application processes 

In [39]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, based on the provided complaints, some complaints were not handled in a timely manner. Specifically, there are instances where the response was marked as "No" or "Delayed" — for example:\n\n- Complaint ID 12935889 (MOHELA, MD): Response was "No" indicating it was not timely.\n- Complaint ID 12654977 (MOHELA, MD): Response was "No."\n- Complaint ID 12779326 (Maximus/EdFinancial, GA): Response was "Yes," but the respondent still noted that the complaint was not addressed timely in the narrative.\n- Complaint ID 13056764 (EdFinancial, IN): Response was "Yes," but the narrative indicates ongoing issues with response delays and unresolved concerns.\n\nAdditionally, multiple complaints mention long wait times, delayed follow-ups, or failure to respond within expected time frames, indicating some complaints did not get handled promptly.'

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to issues such as:\n- Lack of clear or adequate communication from lenders or servicers about payment start dates, delinquency notices, or account status.\n- Difficulties in accessing or understanding loan information, leading to unintentional delinquency.\n- The accumulation of interest during forbearance or deferment, which increased the total amount owed and made repayment unmanageable.\n- Being incorrectly reported as delinquent or in default due to errors, mismanagement, or lack of proper notification.\n- Predatory practices like steering into long-term forbearances, misleading information about repayment options, and improper loan handling.\n- Financial hardships, such as unemployment, illness, or low income, making standard payments unfeasible, especially when compounded by mismanagement and lack of transparency.\n- Transfers between multiple servicers and poor record-keeping, leading to confusion and inaccurate credit reporti

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which calculates the distances between adjacent sentences and then breaks a sequence of sentences wherever the distance exceeds a given percentile of all distances.
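
To make the percentile idea concrete, here is a minimal, standard-library-only sketch. It is a simplification and not LangChain's actual implementation: the helper names (`cosine_distance`, `percentile`, `percentile_chunks`), the toy 2-D "embeddings", and the 95th-percentile cutoff are all assumptions chosen for illustration.

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def percentile(values, q):
    """Linearly interpolated percentile of `values`, with q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def percentile_chunks(sentences, vectors, q=95.0):
    """Split wherever the distance to the next sentence exceeds the q-th percentile."""
    if len(sentences) < 2:
        return list(sentences)
    # Distance between each sentence and the one that follows it.
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, q)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            # A large jump in meaning: close the current chunk.
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Toy data: hypothetical 2-D vectors standing in for real sentence embeddings.
sentences = [
    "Loans accrue interest.",
    "Interest is capitalized.",
    "The servicer never replied.",
    "Emails went unanswered.",
]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
chunks = percentile_chunks(sentences, vectors)
print(chunks)
```

Only the jump between the second and third sentence exceeds the cutoff, so the toy corpus splits into two chunks. Lowering `q` would produce more, smaller chunks.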

In [41]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [42]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [43]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [44]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [45]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [46]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the complaints provided, appears to be problems related to "Dealing with your lender or servicer," such as:\n\n- Receiving incorrect or bad information about your loan (e.g., misreported account status, payment amounts, or loan details).\n- Issues with loan repayment plans and calculations, including incorrect payment amounts or delays in processing income-driven repayment applications.\n- Problems with auto-debit setups and unauthorized deductions.\n- Challenges with loan account management, such as being in default falsely, or difficulties in verifying where the loan is or its status.\n- Privacy breaches or illegal reporting related to student loans.\n\nOverall, many complaints highlight inadequate communication, misinformation, and mishandling by loan servicers, which often lead to financial and emotional distress for borrowers.\n\nIf you need a more specific summary or have other questions, please let me know!'

In [47]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, according to the provided complaints, several complaints were handled in a timely manner as indicated by the "Timely response?" field noting "Yes" for each case where a response was given. The company responses generally state "Closed with explanation" and responses are marked "Yes" for timeliness. \n\nHowever, it is also noted that multiple complaints mention issues with handling or responses from the companies, sometimes involving lack of resolution or significant delays, but based on the specific data provided, the complaints that received a response were handled in a timely manner. \n\nTherefore, yes, some complaints did get handled in a timely manner.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People may fail to pay back their loans for various reasons, including issues with loan servicing, miscommunication, or disputes over the legitimacy or accuracy of the debt. For example, some borrowers face difficulties dealing with their lenders or servicers due to lack of transparency, bad information, or administrative delays. Others experience problems with payment processing or mismatched records, which can lead to missed payments. Additionally, disputes over the validity of the debt, such as claims that loans are unverified, legally void, or improperly reported, can also impact repayment. In some cases, borrowers are unable to continue payments due to financial hardship or because they believe their rights have been violated through breaches of privacy, contractual obligations, or legal protections. Overall, complications with servicing, documentation, and legal disputes contribute significantly to difficulties in repaying loans.'

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

##### ✅ Answer:
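I would expect semantic chunking to have trouble creating meaningful chunks from this sort of data, because short, repetitive sentences tend to produce very similar embeddings and leave few clear semantic breaks. This could be mitigated by increasing the threshold.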



# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
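
As a rough local complement to LangSmith's latency numbers, wall-clock time per call can also be measured directly. The sketch below is an assumption-laden illustration: `measure_latency`, `summarize`, and `fake_invoke` are hypothetical helpers, and `fake_invoke` merely stands in for something like `naive_retrieval_chain.invoke`.

```python
import statistics
import time

def measure_latency(invoke, questions):
    """Return the wall-clock latency in seconds of each call to a chain-like callable."""
    latencies = []
    for question in questions:
        start = time.perf_counter()
        invoke({"question": question})
        latencies.append(time.perf_counter() - start)
    return latencies

def summarize(latencies):
    """Collapse per-call latencies into a few summary statistics."""
    return {
        "n": len(latencies),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }

def fake_invoke(payload):
    # Hypothetical stand-in for a real chain; sleeps briefly instead of calling an LLM.
    time.sleep(0.001)
    return {"response": "stub", "context": []}

summary = summarize(measure_latency(fake_invoke, ["q1", "q2", "q3"]))
print(summary)
```

Passing each retrieval chain's `invoke` method in place of `fake_invoke`, with the same list of questions, would give comparable per-retriever numbers, though this measures end-to-end time (retrieval plus generation) and says nothing about cost.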

In [52]:
### YOUR CODE HERE

In [102]:
from uuid import uuid4

unique_id = uuid4().hex[0:8]

os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com/"
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com/"

os.environ["LANGSMITH_PROJECT"] = f"Advanced_Retrieval_Assignment - {unique_id}"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass('Enter your LangSmith API key: ')

In [104]:
from langchain_core.tracers import LangChainTracer

tracer = LangChainTracer(project_name=os.environ["LANGSMITH_PROJECT"])

In [49]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())


In [95]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
golden_dataset = generator.generate_with_langchain_docs(loan_complaint_data[:20], testset_size=15)


Node ec221bfc-2415-4605-9931-10296055382d does not have a summary. Skipping filtering.
Node cf735d39-e8e8-46e2-91c1-7ca3f36f1600 does not have a summary. Skipping filtering.
Node 4fe63e18-80fc-439c-845d-566566625c51 does not have a summary. Skipping filtering.
Node fc6c2d6c-d200-45b3-b94d-67a74a4b52ea does not have a summary. Skipping filtering.
Node 113ac015-1f87-4f3b-aa6f-02d4ea8052a3 does not have a summary. Skipping filtering.
Node 6e7d3b61-04bc-4e56-a491-58b1d2564516 does not have a summary. Skipping filtering.


In [96]:
golden_dataset.to_pandas()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | Why Nelnet wait so long to change my payment a... | [The federal student loan COVID-19 forbearance... | Payments on federal student loans serviced by ... | single_hop_specifc_query_synthesizer |
| 1 | Wut is the proccess for gettin the rite IBR pa... | [I submitted my annual Income-Driven Repayment... | If you submit your Income-Driven Repayment (ID... | single_hop_specifc_query_synthesizer |
| 2 | Howw did FERPA get violatted when my personal ... | [My personal and financial data was compromise... | My personal and financial data was compromised... | single_hop_specifc_query_synthesizer |
| 3 | Whyy is nelnet sayin my issuer is somewher els... | [According to Studentaid.gov, Im to get an ema... | Studentaid.gov says that my issuer is nelnet, ... | single_hop_specifc_query_synthesizer |
| 4 | Wut problums hav peple had with federel loan p... | [Since the resumption of federal loan payments... | Since the resumption of federal loan payments,... | single_hop_specifc_query_synthesizer |
| 5 | What Consumer Financial Protection Bureau say ... | [I am writing to formally dispute inaccurate i... | Recent findings by the Consumer Financial Prot... | single_hop_specifc_query_synthesizer |
| 6 | Why Aid Avantage not tell me who they is and w... | [I am devastated. I would like to report a sit... | I did not know who the servicer was until I re... | single_hop_specifc_query_synthesizer |
| 7 | Why Department of Education let DOGE team get ... | [On XXXX XXXX XXXX, XXXX XXXX instructed his t... | On XXXX XXXX XXXX, XXXX XXXX told his Departme... | single_hop_specifc_query_synthesizer |
| 8 | why credit bureaus keep reportin student loans... | [<1-hop>\n\nIllegal Student Loan Reporting & C... | credit bureaus keep reportin student loans eve... | multi_hop_specific_query_synthesizer |
| 9 | What legal arguments are presented for demandi... | [<1-hop>\n\nXX/XX/XXXX I increased the amount ... | The legal arguments for demanding NelNet to ce... | multi_hop_specific_query_synthesizer |


In [98]:
import copy
naive_dataset = copy.deepcopy(golden_dataset)
bm25_dataset = copy.deepcopy(golden_dataset)
mqr_dataset = copy.deepcopy(golden_dataset)
pdr_dataset = copy.deepcopy(golden_dataset)
ccr_dataset = copy.deepcopy(golden_dataset)
er_dataset = copy.deepcopy(golden_dataset)

In [106]:
from ragas import EvaluationDataset, RunConfig, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, ContextEntityRecall, NoiseSensitivity

def eval_retrieval(dataset_, graph, run_name):
  # Run every golden question through the chain and record the answer and
  # retrieved contexts on the sample (this mutates dataset_ in place).
  for test_row in dataset_:
    response = graph.invoke({"question" : test_row.eval_sample.user_input}, {"tags" : [run_name],"callbacks" : [tracer]})
    test_row.eval_sample.response = response["response"].content
    test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

  # Judge model used by the Ragas metrics.
  evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-mini"))
  evaluation_dataset = EvaluationDataset.from_pandas(dataset_.to_pandas())

  # Raise the timeout to 360 seconds to reduce TimeoutError failures.
  custom_run_config = RunConfig(timeout=360)

  result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), ContextEntityRecall(), NoiseSensitivity()],
    llm=evaluator_llm,
    run_config=custom_run_config
  )
  return result

In [107]:
naive_result = eval_retrieval(naive_dataset, naive_retrieval_chain, "naive_retriever")

Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

Exception raised in Job[2]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[5]: TimeoutError()
Exception raised in Job[20]: TimeoutError()
Exception raised in Job[29]: TimeoutError()
Exception raised in Job[32]: TimeoutError()
Exception raised in Job[35]: TimeoutError()
Exception raised in Job[38]: TimeoutError()
Exception raised in Job[41]: TimeoutError()
Exception raised in Job[43]: TimeoutError()
Exception raised in Job[44]: TimeoutError()
Exception raised in Job[47]: TimeoutError()


In [101]:
naive_result

{'context_recall': 0.8719, 'context_entity_recall': 0.5152, 'noise_sensitivity_relevant': 0.0294}

In [108]:
bm25_result = eval_retrieval(bm25_dataset, bm25_retrieval_chain, "bm25_retriever")

Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

Exception raised in Job[10]: TimeoutError()
Exception raised in Job[31]: TimeoutError()


In [109]:
bm25_result

{'context_recall': 0.6604, 'context_entity_recall': 0.2604, 'noise_sensitivity_relevant': 0.2359}

In [116]:
cc_result = eval_retrieval(ccr_dataset, contextual_compression_retrieval_chain, "contextual_compression_retriever_rerun")
cc_result

Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

Exception raised in Job[35]: AttributeError('StringIO' object has no attribute 'statements')


{'context_recall': 0.7917, 'context_entity_recall': 0.5032, 'noise_sensitivity_relevant': 0.2215}

In [117]:
mqr_result = eval_retrieval(mqr_dataset, multi_query_retrieval_chain, "multi_query_retriever")
mqr_result

Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

Exception raised in Job[2]: TimeoutError()
Exception raised in Job[5]: TimeoutError()
Exception raised in Job[8]: TimeoutError()
Exception raised in Job[11]: TimeoutError()
Exception raised in Job[14]: TimeoutError()
Exception raised in Job[17]: TimeoutError()
Exception raised in Job[20]: TimeoutError()
Exception raised in Job[26]: TimeoutError()
Exception raised in Job[29]: TimeoutError()
Exception raised in Job[32]: TimeoutError()
Exception raised in Job[35]: TimeoutError()
Exception raised in Job[38]: TimeoutError()
Exception raised in Job[41]: TimeoutError()
Exception raised in Job[44]: TimeoutError()
Exception raised in Job[47]: TimeoutError()


{'context_recall': 0.9417, 'context_entity_recall': 0.4879, 'noise_sensitivity_relevant': 0.0769}

In [118]:
pdr_result = eval_retrieval(pdr_dataset, parent_document_retrieval_chain, "parent_document_retriever")
pdr_result

Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

Exception raised in Job[26]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[35]: TimeoutError()


{'context_recall': 0.7625, 'context_entity_recall': 0.4191, 'noise_sensitivity_relevant': 0.3097}

In [119]:
er_result = eval_retrieval(er_dataset, ensemble_retrieval_chain, "ensemble_retriever")
er_result

Evaluating:   0%|          | 0/48 [00:00<?, ?it/s]

Exception raised in Job[26]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[2]: TimeoutError()
Exception raised in Job[5]: TimeoutError()
Exception raised in Job[8]: TimeoutError()
Exception raised in Job[11]: TimeoutError()
Exception raised in Job[14]: TimeoutError()
Exception raised in Job[16]: TimeoutError()
Exception raised in Job[17]: TimeoutError()
Exception raised in Job[19]: TimeoutError()
Exception raised in Job[20]: TimeoutError()
Exception raised in Job[25]: TimeoutError()
Exception raised in Job[29]: TimeoutError()
Exception raised in Job[32]: TimeoutError()
Exception raised in Job[35]: TimeoutError()
Exception raised in Job[37]: TimeoutError()
Exception raised in Job[38]: TimeoutError()
Exception raised in Job[41]: TimeoutError()
Exception raised in Job[43]: TimeoutError()
Exception raised in Job[44]: TimeoutError()
Exception raised in Job[47]: TimeoutError()


{'context_recall': 0.9208, 'context_entity_recall': 0.5602, 'noise_sensitivity_relevant': 0.5556}

In [None]:
import matplotlib.pyplot as plt
import numpy as np

In [146]:
# Collect results into a list for easier processing
results = [
    ("Ensemble Retriever", er_result),
    ("Parent Document Retriever", pdr_result),
    ("Multi-Query Retriever", mqr_result),
    ("Contextual Compression Retriever", cc_result),
    ("BM25 Retriever", bm25_result),
    ("Naive Retriever", naive_result),
]

# Extract metrics for each retriever
retriever_names = []
context_recall = []
context_entity_recall = []
noise_sensitivity = []

for name, res in results:
  retriever_names.append(name)
  context_recall.append(np.nanmean(res['context_recall']))
  context_entity_recall.append(np.nanmean(res['context_entity_recall']))
  noise_sensitivity.append(np.nanmean(res['noise_sensitivity_relevant']))
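
Because several evaluation jobs above ended in `TimeoutError`, it can help to report how many samples actually contributed to each mean. The sketch below assumes that failed jobs show up as NaN in the per-sample scores. `nan_aware_summary` and `example_scores` are hypothetical, illustrative names, not part of Ragas.

```python
import math

def nan_aware_summary(scores):
    """Mean over valid scores, plus how many samples were valid out of the total."""
    valid = [s for s in scores if s is not None and not math.isnan(s)]
    mean = sum(valid) / len(valid) if valid else float("nan")
    return {"mean": mean, "valid": len(valid), "total": len(scores)}

# Made-up per-sample scores; NaN marks a job assumed to have failed or timed out.
example_scores = [1.0, float("nan"), 0.5, float("nan")]
summary = nan_aware_summary(example_scores)
print(summary)
```

A mean backed by only a handful of valid samples deserves less weight when comparing retrievers than one computed over the full dataset.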


In [None]:
# Create a grouped bar chart for the three metrics per retriever
x = np.arange(len(retriever_names))  # label locations
width = 0.25  # width of each bar

fig, ax = plt.subplots(figsize=(10, 6))
rects1 = ax.bar(x - width, context_recall, width, label='Context Recall')
rects2 = ax.bar(x, context_entity_recall, width, label='Context Entity Recall')
rects3 = ax.bar(x + width, noise_sensitivity, width, label='Noise Sensitivity')

# Add labels, title, and custom x-axis tick labels
ax.set_ylabel('Metric Value')
ax.set_xlabel('Retriever')
ax.set_title('Retriever Performance Metrics')
ax.set_xticks(x)
ax.set_xticklabels(retriever_names, rotation=20, ha='right')
ax.legend()

# Optionally, add value labels on top of bars
def autolabel(rects):
    for rect in rects:
        height = rect.get_height()
        ax.annotate(f'{height:.2f}',
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3),  # 3 points vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom', fontsize=8)

autolabel(rects1)
autolabel(rects2)
autolabel(rects3)

plt.tight_layout()
plt.savefig('Performance.png', dpi=300, bbox_inches='tight')
plt.close()

## Results Comparison
![Retriever Performance Analysis](Performance.png)
![Cost Analysis](Cost.png)

![Latency Analysis](Latency.png)

## Which is better and why?

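As usual, it depends. If you aren't concerned with latency and cost, the ensemble retriever is a strong choice on the retrieval metrics measured: it has the highest context entity recall (0.56) and the second-highest context recall (0.92, just behind the multi-query retriever at 0.94). It also has the highest noise sensitivity (0.56), where lower is better, so it is not the best on every metric. Beyond that, identify the metric or metrics that matter most for your use case and choose the implementation that best fits them.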
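
One way to compile the scores printed in the evaluation cells above into a single comparison is a small ranking helper. This is only a sketch: the numbers are copied from the outputs reported in this notebook, while the `reported` dictionary and the `rank` function are illustrative names introduced here.

```python
# Scores copied from the evaluation outputs reported in this notebook.
reported = {
    "Naive": {"context_recall": 0.8719, "context_entity_recall": 0.5152, "noise_sensitivity": 0.0294},
    "BM25": {"context_recall": 0.6604, "context_entity_recall": 0.2604, "noise_sensitivity": 0.2359},
    "Contextual Compression": {"context_recall": 0.7917, "context_entity_recall": 0.5032, "noise_sensitivity": 0.2215},
    "Multi-Query": {"context_recall": 0.9417, "context_entity_recall": 0.4879, "noise_sensitivity": 0.0769},
    "Parent Document": {"context_recall": 0.7625, "context_entity_recall": 0.4191, "noise_sensitivity": 0.3097},
    "Ensemble": {"context_recall": 0.9208, "context_entity_recall": 0.5602, "noise_sensitivity": 0.5556},
}

def rank(metric, higher_is_better=True):
    """Return retriever names ordered from best to worst on one metric."""
    return sorted(reported, key=lambda name: reported[name][metric], reverse=higher_is_better)

recall_ranking = rank("context_recall")
entity_ranking = rank("context_entity_recall")
# Lower noise sensitivity is better, so sort ascending.
noise_ranking = rank("noise_sensitivity", higher_is_better=False)

print(f"{'Retriever':<24}{'recall':>8}{'entity':>8}{'noise':>8}")
for name in recall_ranking:
    row = reported[name]
    print(f"{name:<24}{row['context_recall']:>8.4f}"
          f"{row['context_entity_recall']:>8.4f}{row['noise_sensitivity']:>8.4f}")
```

Ranking each metric separately makes the trade-off visible: no single retriever leads on all three, which is why the answer depends on which metric, along with cost and latency, matters most.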
