# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting Up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
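
To make the "cosine similarity, top `k`" idea concrete, here's a minimal sketch in plain Python. The toy 3-dimensional vectors and the `cosine_similarity` / `top_k` helpers are made up for illustration; in the notebook itself, Qdrant does this work over the real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy "embeddings" (hypothetical values, not real model output).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [1.0, 0.05, 0.0]

# The two vectors pointing in nearly the same direction as the query win.
top_two = top_k(toy_query_vec, toy_doc_vecs, k=2)
```
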

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
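
If the LCEL pipe syntax feels opaque, it may help to trace the shape of the dictionary at each step. The sketch below mirrors the three stages with ordinary functions. `toy_retriever` and `toy_model` are hypothetical stand-ins, and unlike the real chain (where `"response"` is a chat message with a `.content` attribute), the response here is just a string.

```python
def toy_retriever(question):
    """Hypothetical stand-in for a retriever: returns a list of 'documents'."""
    return ["doc about " + question]

def toy_model(prompt):
    """Hypothetical stand-in for the LLM: returns a plain string."""
    return "answer based on: " + prompt

def toy_chain(inputs):
    # Stage 1: build {"context", "question"} from the incoming {"question"} dict.
    stage_one = {
        "context": toy_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Stage 2: assign(context=...) re-assigns "context" to itself,
    # so the dict passes through with the same keys and values.
    stage_two = {**stage_one, "context": stage_one["context"]}
    # Stage 3: format the prompt, call the model, and keep the context alongside.
    prompt = f"Query:\n{stage_two['question']}\n\nContext:\n{stage_two['context']}"
    return {"response": toy_model(prompt), "context": stage_two["context"]}

toy_result = toy_chain({"question": "loans"})
```

Returning the context next to the response is what lets us inspect (and later evaluate) exactly which documents the retriever supplied.
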

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided information, the most common issues with loans appear to involve mismanagement and errors by servicers, including:\n\n- Errors in loan balances and balances growing despite payments\n- Incorrect or inconsistent reporting of loan status (e.g., showing as delinquent when current)\n- Difficulty applying payments correctly or paying off loans faster\n- Disputes over interest rates, fees, and loan terms\n- Unauthorized loan transfers and lack of communication about such transfers\n- Mishandling of loan data, including privacy violations\n- Challenges in getting accurate information about loans and repayment terms\n\nWhile multiple issues are reported, the most frequent theme is problems related to servicing errors, misreported information, and lack of transparency in handling loans.'

In [11]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, some complaints were not handled in a timely manner. Specifically, there are complaints where the response status indicates "No" for timely response. For example:\n\n- Complaint ID 12709087 (submitted on 03/28/25 by MOHELA) was marked as "Timely response? No."\n- Complaint ID 12973003 (submitted on 04/14/25 by EdFinancial Services) was marked as "Timely response? Yes," so this one was handled timely.\n- Complaint ID 13062402 (submitted on 04/18/25 by Nelnet, Inc.) was "Timely response? Yes."\n- Complaint ID 12832400 (submitted on 04/05/25 by Maximus Federal Services) was "Timely response? Yes."\n- Complaint ID 12975634 (submitted on 04/14/25 by Maximus Federal Services) was "Timely response? Yes."\n\nTherefore, at least one complaint (ID 12709087) was not handled in a timely manner.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans mainly due to a combination of factors such as:\n\n1. **Accumulation of interest during forbearance or deferment:** Many borrowers had limited options besides forbearance or deferment, which allowed interest to continue accruing, increasing the total amount owed and making repayment more difficult once payments resumed.\n\n2. **Financial hardships and stagnant wages:** Some borrowers experienced financial difficulties, including unemployment, low income, high living expenses, or unexpected expenses like foreclosure or bankruptcy, which impaired their ability to make payments.\n\n3. **Lack of clear communication and information:** Several complaints highlighted a lack of notification or transparency from loan servicers regarding loan transfer dates, repayment deadlines, or changes in payment plans, leading borrowers to become delinquent unknowingly.\n\n4. **Difficulty in managing repayment plans:** Borrowers reported challenges in applying payments

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [13]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
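
For intuition about what a BM25 score looks like, here's a rough sketch of the Okapi BM25 formula in plain Python. Everything in it is an assumption made for illustration: whitespace tokenization, the common `k1=1.5` and `b=0.75` settings, and a smoothed IDF term. It is not necessarily the exact variant `BM25Retriever` uses under the hood.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a higher weight (smoothed, always non-negative IDF).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates, and long documents are penalized via b.
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores

# Toy corpus (hypothetical complaint snippets).
toy_docs = [
    "my payment was applied to the wrong loan",
    "the servicer never answered my phone calls",
    "interest kept growing during forbearance",
]
toy_scores = bm25_scores("wrong payment", toy_docs)
```

Only the first snippet shares words with the query, so it's the only one with a non-zero score. That's the "sparse" part: no shared words, no score.
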

We'll construct the same chain - only changing the retriever.

In [14]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [15]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, is dealing with the lender or servicer, particularly issues related to the handling of payments, obtaining accurate information about loans, and disputes over fees or information. Specifically, common problems include trouble applying payments correctly, receiving incorrect or bad information about loan balances or terms, and issues with repayment practices that may be perceived as predatory.'

In [16]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints in the context received a response from the companies, and each response is marked as "Closed with explanation" and noted as "Timely response? Yes." This indicates that the complaints were handled in a timely manner. Therefore, there are no complaints in the provided data that were left unhandled or not handled in a timely manner.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans due to a variety of reasons, including issues with their loan servicers and payment plans. For example, some complain about being unable to get proper assistance or corrections despite fulfilling requirements, as seen in cases where loan servicers steer borrowers into wrong types of forbearances or fail to communicate effectively. Others face problems such as payment reversals or billing errors, which can lead to missed payments and damage to credit scores. Additionally, some borrowers are not properly notified about changes in their loan status or repayment obligations, leading to unintentional overdue payments. All these factors contribute to difficulties in repayment and can result in failure to pay back loans.'

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILER: We do, and the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

****

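BM25 excels at exact keyword matching, so it's more likely to retrieve documents that literally contain the words in the query. For example, a query built around a rare, specific token such as a servicer name ("Nelnet re-amortization") favors BM25: the exact term strongly identifies the matching complaints, whereas a dense embedding may rank semantically similar complaints about other servicers just as highly.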

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [18]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
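
The "retrieve many, keep few" pattern can be sketched without any external service. In the snippet below, `overlap_score` is a deliberately crude, hypothetical stand-in for a real reranking model, and `rerank` with its `top_n` parameter is an illustrative helper rather than part of any library.

```python
def overlap_score(query, doc):
    """Hypothetical relevance score: fraction of query words found in the doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

def rerank(query, candidates, score_fn, top_n=3):
    """Re-score every candidate against the query and keep only the best top_n."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

# Pretend these came back from a first-pass retriever with a generous k.
toy_candidates = [
    "servicer changed without notice",
    "late fee charged after autopay failed",
    "autopay failed and a late fee was added",
    "interest rate question",
]

# "Compress" four candidates down to the two most relevant ones.
toy_top = rerank("autopay late fee", toy_candidates, overlap_score, top_n=2)
```
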

Let's create our chain again, and see how this does!

In [19]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [20]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided data, the most common issue with loans appears to be problems related to dealing with lenders or servicers. Specifically, many complaints involve errors in loan information, mismanagement, lack of communication, incorrect balances, unauthorized transfers, and mishandling of personal data. These issues often result in confusion and disputes over loan balances, interest, and repayment obligations.'

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the information provided, it appears that at least one complaint was not handled in a timely manner. Specifically, the complaint regarding the student loan account review and request for account adjustments has been ongoing for over 18 months with no resolution. Additionally, the complainant reports that they have not received a response despite multiple requests, indicating a delay exceeding a year in addressing their concerns.\n\nHowever, in the case of the complaint from EdFinancial Services about auto pay issues and unapplied payments, the response was marked as "Closed with explanation" and indicated that the response was timely.\n\nIn summary, yes, there was at least one complaint that did not get handled in a timely manner.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons, including:\n\n1. Lack of awareness or understanding: Many borrowers, especially first-generation college students, were not informed about the requirement to repay loans and did not realize they had to pay back the borrowed amounts.\n\n2. Administrative issues and miscommunication: Borrowers experienced problems such as being unaware of loan transfers between servicers, not receiving proper notifications about due payments or account updates, and having difficulty accessing their accounts due to incorrect information or technical issues.\n\n3. Accumulation of interest and repayment challenges: Borrowers faced ongoing interest accrual, especially when loans were deferred or put into forbearance, which extended the repayment period and increased the total amount owed. Some found it difficult to increase payments due to financial hardship, stagnant wages, or other financial obligations.\n\n4. Inadequate or confusing options for r

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [23]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
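
The three steps above boil down to "fan out, then de-duplicate". Here's a small sketch of that control flow. `toy_generate_queries` stands in for the LLM that rewrites the question, and `toy_retrieve` stands in for the base retriever; both, along with the canned data, are hypothetical.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each generated query and return the unique documents in order."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Canned results per rewritten query (hypothetical data).
toy_index = {
    "Why were borrowers unable to repay?": ["complaint_1", "complaint_2"],
    "What caused borrowers to miss payments?": ["complaint_2", "complaint_3"],
    "What hardships prevented loan repayment?": ["complaint_4", "complaint_1"],
}

def toy_generate_queries(question):
    """Stand-in for the LLM: always returns the same three rewrites."""
    return list(toy_index)

def toy_retrieve(query):
    """Stand-in for the base retriever: looks up canned results."""
    return toy_index.get(query, [])

toy_docs = multi_query_retrieve(
    "Why did people fail to pay back their loans?", toy_generate_queries, toy_retrieve
)
```

Each rewrite surfaces at least one document the others missed, which is where the recall gain comes from.
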

In [24]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the complaints provided, appears to be problems related to loan servicer misconduct. This includes errors in loan balances, misapplied payments, wrongful denials of payment plans, incorrect reporting of account status (such as delinquency or overdue status), improper handling of forbearance or deferment, and inadequate communication or notification about account changes or issues. Many complaints also involve confusion or inaccuracies about loan terms, interest calculations, and account transfer procedures, which contribute to financial hardship and credit problems for borrowers.'

In [26]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, several complaints indicate that they were not handled in a timely manner. Specifically:\n\n- One complaint (Complaint ID: 12709087) regarding a loan application not being processed was marked "No" in timely response, with the complaint dating from late March 2025.\n- Another complaint (Complaint ID: 12654977) about a payment status issue was marked "No" in timely response, with the complaint also from late March 2025.\n- Multiple other complaints mention delays over various periods, with some persistent issues lasting several months and reactions from companies being delayed or insufficient.\n\nIn summary, yes, numerous complaints appear to have not been handled promptly, with at least two explicitly marked as "not timely."'

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often failed to pay back their loans due to a variety of reasons highlighted in the complaints. Some common issues include:\n\n1. **Errors and Misreporting**: Incorrect account status updates, such as being reported as delinquent or in default when not truly in default, which adversely affected credit scores.\n\n2. **Lack of Proper Communication**: Borrowers were not adequately informed about their repayment status, loan transfer details, or the expiration of forbearance periods, leading to unintentional missed payments.\n\n3. **Servicer misconduct**: Practices such as "forbearance steering," where borrowers were placed into long-term forbearance instead of being informed about income-driven repayment or rehabilitation options, resulted in accruing interest and ballooning balances.\n\n4. **Difficulty in Navigating Repayment Options**: Borrowers were often not provided with clear or sufficient guidance on available repayment plans or forgiveness programs, making it challenging t

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

****

- Captures synonyms and paraphrases.
- Reformulations allow the retriever to explore different angles or interpretations of the question, reducing the risk of missing useful information.
- Bypasses vocabulary mismatches: users and documents often use different terminology.


## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
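
Here's a toy version of "search small, return big" using only dictionaries and lists. The sentence-level splitting, the word-overlap matching (a stand-in for vector similarity), and all names are assumptions made for illustration.

```python
# Parent store: full documents keyed by id (hypothetical data).
parent_store = {
    "p1": "I was placed in forbearance. Interest kept accruing every month.",
    "p2": "My autopay was cancelled. A late fee appeared on my account.",
}

# Child index: small chunks, each remembering which parent it came from.
child_index = []
for parent_id, text in parent_store.items():
    for child in text.split(". "):
        child_index.append((child, parent_id))

def small_to_big(query, k=1):
    """Match the query against child chunks, then return their full parents."""
    query_words = set(query.lower().split())
    ranked = sorted(
        child_index,
        key=lambda pair: len(query_words & set(pair[0].lower().split())),
        reverse=True,
    )
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:  # several children may share one parent
            parent_ids.append(parent_id)
    return [parent_store[parent_id] for parent_id in parent_ids]

toy_parents = small_to_big("late fee")
```

The query matches only the short "late fee" chunk, but what comes back is the whole complaint, including the autopay sentence that explains why the fee appeared.
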

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [28]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [29]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [30]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [31]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [32]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [33]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the context provided, appears to be problems related to federal student loan servicing, including errors in loan balances, misapplied payments, wrongful denials of payment plans, and issues with inaccurate or misleading credit reporting. Many complaints highlight difficulties with loan management, such as discrepancies in balances and interest rates, unfair or unjustified increases, and problems verifying the legitimacy of debts.'

In [34]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided context, all the complaints included note that responses from the companies were not handled in a timely manner. Specifically, the complaint with ID 12709087 and ID 12935889 both indicate that responses were delayed beyond acceptable timeframes ("Timely response?": "No"). Additionally, the complaint with ID 13205525 about dispute settlement responses reflects that over 30 days had passed without a reply, which suggests it was also not handled promptly.\n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'Based on the provided context, people failed to pay back their loans primarily due to various issues such as financial hardship, mismanagement by loan servicers, lack of proper communication, and misrepresentation by educational institutions. For example, one individual mentioned severe financial hardship after graduation and reliance on deferment and forbearance, which increased interest and made repayment difficult. Others experienced problems with loan servicing, such as unfair collection practices, failure to notify about payment obligations, and issues related to the legitimacy of their debts. Additionally, some borrowers faced difficulties because their educational institutions were misrepresented or had closed, leaving them unprepared for the financial consequences of their loans.\n\nIn summary, reasons for failure to repay included financial difficulties, lack of clear information or support from loan providers, and the impact of misrepresented or mismanaged educational ties.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [36]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)
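
To see how rank fusion combines several result lists, here's a minimal sketch of weighted Reciprocal Rank Fusion. Each document earns `weight / (c + rank)` from every list it appears in, and the totals decide the final order. The constant `c=60` follows the value suggested in the RRF paper; the function name and the toy rankings are hypothetical.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists into one using weighted Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Three retrievers, each returning its own ordering (hypothetical data).
toy_rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
toy_weights = [1 / len(toy_rankings)] * len(toy_rankings)

fused = weighted_rrf(toy_rankings, toy_weights)
```

`doc_b` ends up first because it ranks near the top of all three lists, while `doc_d` trails because only one retriever found it.
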

We'll pack *all* of these retrievers together in an ensemble.

In [37]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [38]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints and data, the most common issues with student loans tend to be:\n\n- Errors or bad information about loan balances and interest calculations\n- Mishandling of repayment plans, including difficulty applying payments to principal\n- Inaccurate or incorrect reporting of account status and delinquencies\n- Confusing or inadequate communication about loan terms, transfers, or changes\n- Problems with loan classification (e.g., FFELP vs. HEAL), misclassification, or wrongful ending of deferments\n- Issues with loan ownership verification and validation\n- Unexplained increases in debt due to interest or mismanagement\n- Challenges accessing accurate loan information or documentation\n\nIn summary, a predominant issue appears to be mismanagement and inaccuracies in loan information, which leads to financial hardship, credit reporting errors, and confusion about loan status and repayment obligations.'

In [39]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, yes, some complaints did not get handled in a timely manner. For example:\n\n- Complaint ID 12935889 against MOHELA (filed on 04/11/25) was marked as "No" for timely response, indicating it was not handled promptly.\n- Complaint ID 12668396 against MOHELA (filed on 03/26/25) was also marked as "No" for timely response.\n- Additionally, there are multiple complaints (such as 13062402, 13056764, 13070546) where the public response indicates the companies did not resolve the issues or respond within the expected period.\n\nTherefore, it appears that several complaints went unresolved or were not addressed in a timely manner.'

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including:\n\n- Lack of proper notification or communication from loan servicers about payment due dates, account status changes, or loan transfers, leading to unintentional delinquency and negative impacts on credit scores.\n- Difficulties in understanding or managing complex loan terms, interest accumulation, and eligibility for repayment plans such as income-driven repayment or loan forgiveness, often exacerbated by misinformation or insufficient guidance from servicers.\n- Financial hardships such as unemployment, medical issues, or homelessness, which made it impossible to keep up with payments.\n- Problems with the handling and transfer of loans, including incorrect or inconsistent reporting, unauthorized transfers, or poor recordkeeping, resulting in discrepancies and errors in credit reports.\n- Servicers steering borrowers into long-term forbearances or alternative options that increase interest and debt over time, ra

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distances between consecutive sentences, and then split the sequence of sentences wherever a distance exceeds a given percentile of all distances.
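To make the percentile rule concrete, here is a minimal pure-Python sketch. It is a simplification and not LangChain's actual implementation: the helper names, the toy 2-D vectors, and the `pct` value are all assumptions made for illustration.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def percentile(values, pct):
    """Linearly interpolated percentile (pct in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def split_by_percentile(sentences, vectors, pct=95.0):
    """Group consecutive sentences, breaking wherever the distance to the
    next sentence is strictly greater than the percentile threshold."""
    if len(sentences) < 2:
        return [list(sentences)]
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for i, distance in enumerate(distances):
        if distance > threshold:
            chunks.append(current)
            current = []
        current.append(sentences[i + 1])
    chunks.append(current)
    return chunks

# Toy 2-D "embeddings" (hypothetical values, not real model output).
toy_sentences = [
    "My payment was lost.",
    "The payment never posted.",
    "My credit report is wrong.",
    "The bureau lists a late payment.",
]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_chunks = split_by_percentile(toy_sentences, toy_vectors, pct=50.0)
```

With these toy vectors the only large distance sits between the second and third sentences, so the sketch yields two chunks of two sentences each.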

In [41]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [42]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [43]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [44]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [45]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [46]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the complaints provided, appears to be problems related to loan servicing and communication failures. Specific issues include:\n\n- Difficulty in obtaining clear information about loan status, payments, and servicer changes.\n- Errors or delays in processing payments, auto-debit setups, and re-amortization.\n- Discrepancies in loan account status, such as loans being reported in default without borrower action.\n- Issues with loan reporting and reporting company used reports improperly or illegally.\n- Challenges in verifying loan authenticity or legitimacy, especially following administrative or legal changes.\n\nOverall, poor communication, administrative errors, and mismanagement of loan accounts are prominently featured as key issues.\n\nIf you have any further questions, feel free to ask!'

In [47]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, some complaints were not handled in a timely manner. Specifically, in the complaint regarding the transfer to Nelnet (Complaint ID: 13331376), the consumer indicated that Nelnet never responded to the certified mail complaints they sent, despite acknowledgment of receipt. The company responded by closing the complaint with an explanation, which suggests the issue was not fully resolved or addressed promptly.\n\nAdditionally, several complaints indicate delays or lack of proper response, but the documented responses from companies in these cases state that responses were "Closed with explanation," which often implies the complaints were not handled as promptly or effectively as desired.\n\nTherefore, the answer is: Yes, some complaints did not get handled in a timely manner.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'The provided complaint data suggests several reasons why people failed to pay back their loans:\n\n1. **Miscommunication and Lack of Transparency:** Borrowers received bad information about their loans, such as incorrect forbearance statuses or payment obligations, leading to confusion and default.\n\n2. **Problems with Loan Servicers:** Issues like missing payments, inability to access account information, and failure to properly apply payments caused difficulties in managing repayment.\n\n3. **Unapproved Default or Delinquency Notices:** Borrowers reported being classified as delinquent or in default without valid reasons, sometimes due to administrative errors or misreporting.\n\n4. **Legal and Contractual Disputes:** Some borrowers believe their loans are illegitimate or have been wrongly reported, which affects their ability or willingness to repay.\n\n5. **Data Breach or Unauthorized Access:** Cases where personal information was compromised or mishandled, leading to disputes ov

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

- **Few "semantic shifts" detected, leading to very long chunks.** When every sentence has a nearly identical embedding (because wording and topics repeat), the similarity curve stays flat. The percentile-based breakpoint detector may find no point that is different enough, so it merges many sentences into one oversized chunk.
- **Tiny embedding distances dominated by noise.** If every sentence is only a line or two, the cosine-distance signal that the splitter watches becomes extremely small (sometimes smaller than embedding noise), so breakpoints fluctuate unpredictably.
- **Context window wasted on duplicate information.** A chunk stuffed with 20 near-identical FAQ lines gives the LLM less useful variety than a mixed-topic chunk of the same length.
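One possible adjustment, sketched here under assumptions rather than taken from the notebook, is to drop near-duplicate sentences before chunking and to cap the number of sentences per chunk, so a flat similarity curve cannot produce one oversized chunk. The helper names, the `max_similarity` threshold, and the `max_sentences` cap are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def drop_near_duplicates(sentences, vectors, max_similarity=0.98):
    """Keep a sentence only if it is not nearly identical to one already kept."""
    kept_sentences, kept_vectors = [], []
    for sentence, vector in zip(sentences, vectors):
        if all(cosine_similarity(vector, kept) < max_similarity for kept in kept_vectors):
            kept_sentences.append(sentence)
            kept_vectors.append(vector)
    return kept_sentences, kept_vectors

def cap_chunk_length(chunks, max_sentences=5):
    """Split any chunk longer than max_sentences into consecutive pieces."""
    capped = []
    for chunk in chunks:
        for start in range(0, len(chunk), max_sentences):
            capped.append(chunk[start:start + max_sentences])
    return capped

# Toy FAQ data with hypothetical 2-D vectors (not real embeddings).
faq = ["How do I pay?", "How can I pay?", "Where is my statement?"]
faq_vectors = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
unique_faq, unique_vectors = drop_near_duplicates(faq, faq_vectors)

capped = cap_chunk_length([["s1", "s2", "s3", "s4", "s5", "s6", "s7"]], max_sentences=3)
```

Deduplicating first addresses the wasted context window, while the cap bounds chunk size even when no breakpoint is detected.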

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

In [49]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [50]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(loan_complaint_data[:20], testset_size=10)

Applying SummaryExtractor:   0%|          | 0/14 [00:00<?, ?it/s]

Applying CustomNodeFilter:   0%|          | 0/20 [00:00<?, ?it/s]

Node e0ed1655-a61a-4501-9ba3-97901d0d4a1c does not have a summary. Skipping filtering.
Node 717222b5-dce6-438c-947f-65231a8b72f1 does not have a summary. Skipping filtering.
Node f2b11fa4-899b-4c61-9ab4-e69bd5730e57 does not have a summary. Skipping filtering.
Node 5eddbbe7-753b-4ed6-9215-dfe727940962 does not have a summary. Skipping filtering.
Node 21f46ea5-22cc-4a55-aeeb-7010aa7a9a92 does not have a summary. Skipping filtering.
Node 90657afc-407f-40ef-bfe2-3f748adc45ff does not have a summary. Skipping filtering.


Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/54 [00:00<?, ?it/s]

Applying OverlapScoreBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/2 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/10 [00:00<?, ?it/s]

In [51]:
dataset.to_pandas()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | When did the federal student loan COVID-19 for... | [The federal student loan COVID-19 forbearance... | The federal student loan COVID-19 forbearance ... | single_hop_specifc_query_synthesizer |
| 1 | Wut is Aidvantge doin with my IDR aplication? | [I submitted my annual Income-Driven Repayment... | Aidvantage has not procesed my IDR aplication ... | single_hop_specifc_query_synthesizer |
| 2 | How come my info got out when FERPA supposed t... | [My personal and financial data was compromise... | My personal and financial data was compromised... | single_hop_specifc_query_synthesizer |
| 3 | Accordng to studentaid.gov, am I suppossed to ... | [According to Studentaid.gov, Im to get an ema... | According to Studentaid.gov, you are supposed ... | single_hop_specifc_query_synthesizer |
| 4 | Sinsce the resumption of fedral loan paymnts, ... | [Since the resumption of federal loan payments... | Since the resumption of federal loan payments,... | single_hop_specifc_query_synthesizer |
| 5 | How could a privacy advocate address issues ar... | [<1-hop>\n\nI set up autopay with AidVantage, ... | A privacy advocate could address these issues ... | multi_hop_specific_query_synthesizer |
| 6 | How does the CFPB complaint system play a role... | [<1-hop>\n\nBreach of Contract - All four bran... | The CFPB (Consumer Financial Protection Bureau... | multi_hop_specific_query_synthesizer |
| 7 | How can a borrower challenge negative credit r... | [<1-hop>\n\nI am devastated. I would like to r... | A borrower can challenge negative credit repor... | multi_hop_specific_query_synthesizer |
| 8 | how come credit bureaus and loan servicers sti... | [<1-hop>\n\nI am writing to formally dispute i... | credit bureaus and loan servicers still report... | multi_hop_specific_query_synthesizer |
| 9 | why nelnet keep reportin my student loan on cr... | [<1-hop>\n\nXX/XX/XXXX I increased the amount ... | nelnet still reportin your student loan on cre... | multi_hop_specific_query_synthesizer |


In [52]:
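def retrieve(state):
  # The chain expects a dict with a "question" key and returns both "response" and "context".
  result = naive_retrieval_chain.invoke({"question" : state["question"]})
  return {"context" : result["context"]}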

In [53]:
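# Run every synthetic question through the naive chain and attach the outputs to its sample.
for test_row in dataset:
  response = naive_retrieval_chain.invoke({"question" : test_row.eval_sample.user_input})
  # NOTE: response["response"] is an AIMessage rather than a str; it is normalized to text before evaluation.
  test_row.eval_sample.response = response["response"]
  test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]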

In [54]:
dataset.samples[0].eval_sample.response

AIMessage(content='The federal student loan COVID-19 forbearance program ended in 2023.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 17, 'prompt_tokens': 5057, 'total_tokens': 5074, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_38343a2f8f', 'id': 'chatcmpl-BzZfSOAhBogWLx1V5bIs1QoSMNAMY', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--db855d4b-ea06-4d79-8537-63e7cd11d6a8-0', usage_metadata={'input_tokens': 5057, 'output_tokens': 17, 'total_tokens': 5074, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

In [55]:
type(dataset)

ragas.testset.synthesizers.testset_schema.Testset

In [56]:
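from ragas import EvaluationDataset

# Convert to a DataFrame so the response column can be normalized to plain text.
evaluation_dataset = dataset.to_pandas()
# Responses were stored as chat messages; once serialized they may show up as dicts with a "content" key.
evaluation_dataset["response"] = evaluation_dataset["response"].apply(lambda r: r["content"] if isinstance(r, dict) and "content" in r else r)
evaluation_dataset = EvaluationDataset.from_pandas(evaluation_dataset)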

  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [input_value=AIMessage(content='The fe...o': 0, 'reasoning': 0}}), input_type=AIMessage])
  return self.__pydantic_serializer__.to_python(
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [input_value=AIMessage(content='Based ...o': 0, 'reasoning': 0}}), input_type=AIMessage])
  return self.__pydantic_serializer__.to_python(
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [input_value=AIMessage(content="I'm so...o': 0, 'reasoning': 0}}), input_type=AIMessage])
  return self.__pydantic_serializer__.to_python(
  PydanticSerializationUnexpectedValue(Expected `str` - serialized value may not be as expected [input_value=AIMessage(content='Accord...o': 0, 'reasoning': 0}}), input_type=AIMessage])
  return self.__pydantic_serializer__.to_python(
  PydanticSerializationUnexpectedValue(Expected `str` - seri
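The warnings above indicate that the `response` field expects a `str` but received an `AIMessage`. A minimal sketch of a normalizer that accepts a string, a dict with a `content` key, or any object exposing a `.content` attribute is shown below. The helper `response_to_text` and the `FakeMessage` stand-in are hypothetical and exist only for this illustration.

```python
def response_to_text(response):
    """Return plain text from a str, a dict with a 'content' key,
    or an object that exposes a .content attribute."""
    if isinstance(response, str):
        return response
    if isinstance(response, dict) and "content" in response:
        return response["content"]
    content = getattr(response, "content", None)
    # Fall back to str() so the result is always a string.
    return content if content is not None else str(response)

class FakeMessage:
    """Stand-in for a chat message object (hypothetical, demo only)."""
    def __init__(self, content):
        self.content = content

normalized = [
    response_to_text("already text"),
    response_to_text({"content": "from a dict"}),
    response_to_text(FakeMessage("from a message object")),
]
```

Applying such a helper at the point where the response is stored on each sample would be one way to keep the field a plain string throughout.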

In [57]:
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-mini"))

In [58]:
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness, ResponseRelevancy, ContextEntityRecall, NoiseSensitivity
from ragas import evaluate, RunConfig
import time

custom_run_config = RunConfig(timeout=360)

start = time.time()

result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), ResponseRelevancy(), ContextEntityRecall(), NoiseSensitivity()],
    llm=evaluator_llm,
    run_config=custom_run_config
)
print(result)

end = time.time()

print("📊 Performance metrics:", result)
print("⏱️ Total latency:", round(end - start, 2), "seconds")
print("📈 Token usage: not available for gpt-4.1-mini")
print("💰 Estimated cost: manually estimate based on number of samples × avg token usage")

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[35]: TimeoutError()
Exception raised in Job[41]: TimeoutError()
Exception raised in Job[47]: TimeoutError()
Exception raised in Job[53]: TimeoutError()
Exception raised in Job[59]: TimeoutError()


{'context_recall': 0.8552, 'faithfulness': 0.7755, 'factual_correctness': 0.4380, 'answer_relevancy': 0.7605, 'context_entity_recall': 0.5372, 'noise_sensitivity_relevant': 0.3813}
📊 Performance metrics: {'context_recall': 0.8552, 'faithfulness': 0.7755, 'factual_correctness': 0.4380, 'answer_relevancy': 0.7605, 'context_entity_recall': 0.5372, 'noise_sensitivity_relevant': 0.3813}
⏱️  Total latency: 434.22 seconds
📈 Token usage: not available for gpt-4.1-mini
💰 Estimated cost: manually estimate based on number of samples × avg token usage


naive_retrieval

{'context_recall': 0.8567, 'faithfulness': 0.7587, 'factual_correctness': 0.6400, 'answer_relevancy': 0.4831, 'context_entity_recall': 0.4300, 'noise_sensitivity_relevant': 0.2911}
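To compile results like the two score dictionaries above into a comparison, a small helper can compute per-metric differences and rank runs. This is a sketch under assumptions: the run label `cell_58_run` is a hypothetical name for the unlabeled result printed by `evaluate`, and the helper names are not part of Ragas.

```python
def metric_deltas(baseline, candidate):
    """Return candidate - baseline for each shared metric, rounded to 4 places."""
    return {
        name: round(candidate[name] - baseline[name], 4)
        for name in baseline
        if name in candidate
    }

def rank_by_metric(results, metric, higher_is_better=True):
    """Return run names ordered from best to worst on one metric."""
    return sorted(
        results,
        key=lambda name: results[name][metric],
        reverse=higher_is_better,
    )

# Scores copied from the two results shown above.
results = {
    "cell_58_run": {  # hypothetical label for the unlabeled evaluate() output
        "context_recall": 0.8552, "faithfulness": 0.7755,
        "factual_correctness": 0.4380, "answer_relevancy": 0.7605,
        "context_entity_recall": 0.5372, "noise_sensitivity_relevant": 0.3813,
    },
    "naive_retrieval": {
        "context_recall": 0.8567, "faithfulness": 0.7587,
        "factual_correctness": 0.6400, "answer_relevancy": 0.4831,
        "context_entity_recall": 0.4300, "noise_sensitivity_relevant": 0.2911,
    },
}

deltas = metric_deltas(results["cell_58_run"], results["naive_retrieval"])
best_relevancy = rank_by_metric(results, "answer_relevancy")
# Lower is better for noise sensitivity, so the ordering is reversed.
least_noisy = rank_by_metric(results, "noise_sensitivity_relevant", higher_is_better=False)
```

The same structure extends to one entry per retriever once each has been evaluated.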

In [84]:
#!pip install -qU langchain-community==0.3.14 langchain-openai==0.3.7 unstructured==0.16.12 langgraph==0.2.61 langchain-qdrant==0.2.0



In [None]:
import os
import getpass

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangChain API Key:")

In [86]:
from uuid import uuid4

os.environ["LANGCHAIN_PROJECT"] = f"NAIVE2 - {uuid4().hex[0:8]}"

In [87]:
from langsmith.evaluation import LangChainStringEvaluator, evaluate

eval_llm = ChatOpenAI(model="gpt-4.1-mini")

qa_evaluator = LangChainStringEvaluator("qa", config={"llm" : eval_llm})

In [96]:
evaluate(
    naive_retrieval_chain.invoke,
    data=evaluation_dataset,
    evaluators=[
    ],
    metadata={"revision_id": "empathy_rag_chain"},
)

AttributeError: 'SingleTurnSample' object has no attribute 'dataset_id'
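The `AttributeError` above most likely arises because `evaluation_dataset` is a Ragas `EvaluationDataset`, whereas LangSmith's `evaluate` expects a LangSmith dataset (by name or ID) or LangSmith examples. A minimal sketch of reshaping rows with `user_input` and `reference` fields into input/output pairs follows. The helper name, the `question`/`answer` keys, the demo row, and the dataset name in the comments are all hypothetical, and the upload calls are left as comments because they need network access and an API key.

```python
def to_langsmith_pairs(rows, input_key="question", output_key="answer"):
    """Reshape rows with 'user_input' and 'reference' fields into
    parallel lists of input dicts and output dicts."""
    inputs = [{input_key: row["user_input"]} for row in rows]
    outputs = [{output_key: row["reference"]} for row in rows]
    return inputs, outputs

# Hypothetical demo row; in the notebook the rows could come from
# dataset.to_pandas().to_dict("records").
demo_rows = [
    {"user_input": "Example question?", "reference": "Example reference answer."},
]
inputs, outputs = to_langsmith_pairs(demo_rows)

# Upload and evaluate (not executed here):
#   from langsmith import Client
#   client = Client()
#   ls_dataset = client.create_dataset(dataset_name="loan-complaints-golden")  # hypothetical name
#   client.create_examples(inputs=inputs, outputs=outputs, dataset_id=ls_dataset.id)
#   evaluate(naive_retrieval_chain.invoke, data=ls_dataset.name, evaluators=[qa_evaluator])
```

Using `question` as the input key matches what the retrieval chains in this notebook read from their input dict.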