# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [125]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [126]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [127]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [128]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [129]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [130]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
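
To make "cosine similarity, top `k`" concrete, here is a minimal pure-Python sketch of what a naive retriever does conceptually. The helper names (`cosine_similarity`, `top_k`) and the tiny 3-dimensional vectors are made up for illustration; the real vector store works on 1536-dimensional embeddings and uses an index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k most similar documents, best first."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Hypothetical toy "embeddings" (illustration only, not real model output).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [0.8, 0.1, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)
```

Document 1 points in almost exactly the same direction as the query, so it ranks ahead of document 0 even though both are close.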

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [131]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [132]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [133]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # RunnablePassthrough.assign passes every key through unchanged and re-assigns "context"
    # to the value it already holds, so both "question" and "context" reach the next step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
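
If the shape of the data at each stage is hard to picture, the following plain-Python sketch traces the same three stages without LCEL. It is only an analogy: `fake_retriever`, `fake_llm`, and `run_chain` are hypothetical stand-ins, not LangChain objects.

```python
def fake_retriever(question):
    # Hypothetical stand-in for naive_retriever: returns a list of "documents".
    return [f"doc about: {question}"]

def fake_llm(prompt_text):
    # Hypothetical stand-in for `rag_prompt | chat_model`.
    return f"answer based on {len(prompt_text)} prompt characters"

def run_chain(inputs):
    # Stage 1: build {"context", "question"} from the incoming question.
    stage1 = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Stage 2: mirrors RunnablePassthrough.assign(context=...): keep every key
    # and re-assign "context" to the value it already holds.
    stage2 = {**stage1, "context": stage1["context"]}
    # Stage 3: format the prompt, call the "LLM", and carry the context along.
    prompt_text = f"Query:\n{stage2['question']}\n\nContext:\n{stage2['context']}"
    return {"response": fake_llm(prompt_text), "context": stage2["context"]}

result = run_chain({"question": "What is the most common issue with loans?"})
print(sorted(result.keys()))
```

Returning `"context"` next to `"response"` is what lets us inspect the retrieved documents afterwards.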

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [134]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided data, the most common issues with student loans appear to be related to problems with loan servicing and administration. These include:\n\n- Dealing with lenders or servicers, such as receiving bad information, errors in loan balances, misapplied payments, wrongful denials of payment plans, and difficulties with how payments are being handled.\n- Incorrect or inconsistent information on credit reports and account status.\n- Discrepancies in loan balances and interest accumulation due to mismanagement or lack of transparency.\n- Problems with loan transfer notifications and changes in servicers without proper communication.\n- Issues with payment application, especially in applying extra funds to interest instead of principal, or restrictions on paying off smaller loans more quickly.\n- Challenges with understanding or accessing repayment options, including forbearance, deferment, and loan forgiveness programs.\n\nWhile the specific questions about the "most commo

In [135]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the information provided, yes, some complaints did not get handled in a timely manner. Specifically, there is at least one complaint where the response was noted as "No" under the "Timely response?" category, indicating it was not handled promptly. For example, the complaint received on 03/28/25 by MOHELA involved delays in addressing the issue, with the consumer reporting that they had not received any response or resolution after several weeks. \n\nAdditionally, multiple complaints mention extended periods without resolution—over a year in some cases—highlighting ongoing delays and issues with timely handling.\n\nTherefore, the answer is: Yes, some complaints did not get handled in a timely manner.'

In [136]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors including: \n\n1. Lack of clear communication and notification from loan servicers about when payments were to resume or any changes in loan status.\n2. The complexity of repayment options, such as forbearance or deferment, which often lead to accruing interest that increases the total amount owed over time.\n3. Financial hardships such as stagnant wages, inflation, and unforeseen economic downturns, making it difficult for individuals to afford increasing or unchanged payments.\n4. Mismanagement or lack of transparency regarding loan transfers, interest calculations, and delinquency reporting, which can cause confusion and unintended delinquency.\n5. Limited access to or awareness of income-driven repayment plans or forgiveness programs, leading individuals to default when they cannot meet standard repayment terms.\n6. Problems with loan servicing practices, where payments are applied in a manner that fav

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
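
To see what "based on the words they both contain" means in practice, here is a minimal sketch of a textbook BM25 scoring function. It assumes whitespace tokenization and the common defaults `k1=1.5` and `b=0.75`; the function name `bm25_scores` and the toy documents are made up for illustration, and library implementations differ in details such as the exact IDF formula.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a textbook BM25 variant."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a larger inverse-document-frequency weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Longer-than-average documents are penalized via length normalization.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical toy corpus (not taken from the complaints data).
toy_docs = [
    "my payment was misapplied by the servicer",
    "the servicer never answered my call",
    "interest kept accruing during forbearance",
]
toy_scores = bm25_scores("misapplied payment", toy_docs)
best = max(range(len(toy_docs)), key=lambda i: toy_scores[i])
print(best, toy_scores)
```

Only the first document shares any words with the query, so it is the only one with a non-zero score; there is no notion of "similar meaning" here.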

This retriever is very straightforward to set up! Let's see it happen down below!


In [137]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data, k=10)

We'll construct the same chain - only changing the retriever.

In [138]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [139]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans, specifically student loans, appears to be problems dealing with lenders or servicers. Common sub-issues include disputes over fees charged, trouble with how payments are being handled (such as difficulty applying payments correctly), and receiving incorrect or bad information about loan balances and terms. These issues often involve a lack of trust in the loan servicers, allegations of predatory practices, and difficulties in understanding or managing loan details.\n\nTherefore, the most common issue with loans, as indicated by the complaints, is problems related to dealing with lenders or servicers, including mismanagement, miscommunication, and issues with payment application and fees.'

In [140]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all of the complaints referenced were responded to in a timely manner. Specifically, the responses to complaints numbered 13197090, 12792958, 13160766, and 13410623 were all marked as "Yes" for timely response.'

In [141]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans due to issues such as incorrect or problematic payment plans, lack of communication from the lender or servicer, or difficulties stemming from mismanagement or deception by the loan servicing companies. For example, some individuals have experienced their autopayments being unenrolled or not properly communicated, leading to missed or late payments. Others have faced challenges with repayment plans that do not suit their financial situation or have been misled about their account status, resulting in unpaid bills and damaged credit scores. Additionally, some borrowers have reported the transfer of their loans to different companies without proper notification, causing confusion and unintended missed payments. Overall, failures to repay are often linked to poor communication, administrative errors, or problematic servicing practices.'

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do - the second half of the notebook covers it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

##### ✅ Answer:
Example: "How to fix PostgreSQL error 23505 duplicate key violation" in documents related to databases/general CS.

Explanation: BM25 scores documents on exact term matches, so a rare token such as the error code `23505` carries a lot of weight and the documents that mention it surface easily. The naive (embedding) retriever may return documents that look similar, such as other PostgreSQL errors, but never mention this specific error.

General principle: BM25 performs well where exact term matches matter. The naive retriever focuses on semantically similar documents, while BM25 focuses on documents that contain the query's exact terms.

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [142]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
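
Conceptually, the retriever above runs two passes: a cheap first pass that fetches candidates, then a more careful scoring pass that keeps only the best few. The sketch below shows that second pass in isolation. `overlap_score` is a deliberately crude, hypothetical stand-in for a learned reranking model, and the candidate strings are made up.

```python
def rerank(query, candidates, score_fn, top_n):
    """Keep the top_n candidates according to a (more expensive) scoring function."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def overlap_score(query, doc):
    # Hypothetical relevance score: fraction of query words found in the document.
    # A real reranker uses a learned model instead of word overlap.
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these came back from a first-pass retriever with k=4.
first_pass = [
    "the servicer changed without notice",
    "late response to my complaint about fees",
    "no timely response to my complaint",
    "interest accrued during forbearance",
]
compressed = rerank("timely response complaint", first_pass, overlap_score, top_n=2)
print(compressed)
```

The "compression" is simply that fewer, better-matched documents reach the prompt.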

Let's create our chain again, and see how this does!

In [143]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [144]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans, particularly student loans, appears to be problems related to the handling of payments, interest accumulation, and information accuracy. Many complaints highlight issues such as:\n\n- Interest continuing to accrue during forbearance or deferment, leading to increased total debt.\n- Lack of transparent information about loan balances, interest calculations, and payment breakdowns.\n- Errors and inconsistencies in loan balances and interest in credit reports.\n- Difficulties in managing payments, with options like forbearance or deferment often leading to increased total debt rather than reducing it.\n\nOverall, a primary issue is the mismanagement and lack of clear communication from loan servicers regarding how loans are handled, especially around interest accumulation and repayment options.'

In [145]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, there are indications that some complaints did not get handled in a timely manner. For example, a complaint about a delayed response to a request has been open for over 1 year without resolution, and nearly 18 months with no resolution regarding account review and violations. Additionally, issues related to payments not being applied correctly were ongoing, with no resolution mentioned. \n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [146]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily due to a lack of clear communication, misunderstandings about loan obligations, and the ongoing accumulation of interest. Many borrowers were not adequately informed about their repayment responsibilities, the effect of interest, or changes in their loan status, such as transfers between servicers without notice. Additionally, options like forbearance or deferment, while accessible, often led to interest continuing to accrue, making it harder to pay off loans over time. Economic hardships, stagnant wages, and unmanageable payment plans also contributed to borrowers' difficulties in repayment."

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context.
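
The three steps above can be sketched in a few lines of plain Python. Everything here is a stand-in: `toy_generate_queries` plays the role of the LLM that rewrites the question, and `toy_retrieve` plays the role of the vector store. Both are hypothetical, as are the document ids.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for each reformulated query and return the unique documents."""
    unique_docs = []
    seen = set()
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:  # de-duplicate while preserving first-seen order
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

def toy_generate_queries(question):
    # Hypothetical stand-in for the LLM call that rewrites the question.
    return [
        "loan repayment problems",
        "reasons for missed loan payments",
    ]

# Hypothetical lookup table standing in for a vector-store search.
toy_index = {
    "loan repayment problems": ["doc_a", "doc_b"],
    "reasons for missed loan payments": ["doc_b", "doc_c"],
}

def toy_retrieve(query):
    return toy_index.get(query, [])

docs = multi_query_retrieve(
    "Why did people fail to pay back their loans?",
    toy_generate_queries,
    toy_retrieve,
)
print(docs)
```

Note that `doc_b` is returned by both queries but appears only once in the final context.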

So, how hard is it to set up? Not bad! Let's see it down below!



In [147]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [148]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [149]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans include:\n\n- Errors in loan balances, misapplied payments, and wrongful denials of payment plans.\n- Problems with how payments are being handled, such as inability to apply additional funds to the principal.\n- Discrepancies and inaccuracies in loan account status and reporting, including incorrect default or delinquency reports.\n- Lack of communication or inadequate notification from lenders or servicers.\n- Unauthorized transfers or mishandling of loans.\n- Problems related to loan forgiveness, discharge, or cancellation.\n- Issues with loan servicer misconduct, including errors due to bad information or improper processing.\n- Problems with credit reporting, including incorrect account status or account data.\n- Challenges in managing repayment plans, especially for borrowers facing financial hardship.\n- Negative impacts on credit scores due to improper reporting or delays in communication.\n\nWhile the specific "

In [150]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that some complaints were not handled in a timely manner. Specifically:\n\n- One complaint from 03/28/25 regarding problems with customer service at MOHELA was marked as "Timely response? No," indicating it was not handled within the expected timeframe.\n- Another complaint from 03/25/25 also about MOHELA\'s customer service was marked as "Timely response? No."\n- Several other complaints, such as those from 04/18/25 and 04/24/25, were marked as "Yes" for timely responses, indicating they were handled within the expected period.\n- However, there are complaints from 04/24/25 and 05/06/25 noted as "Yes," implying some complaints are handled on time, but the ones specifically marked "No" suggest delays.\n\nTherefore, yes, there were complaints that did not get handled in a timely manner.'

In [151]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans mainly due to a combination of systemic issues, lack of transparent information, and the way loan servicers handled repayment options. The key reasons include:\n\n1. **Limited or Misleading Payment Options:** Many borrowers were only offered forbearance or deferment, which allowed interest to continue accumulating, making loans more difficult to pay off over time and increasing the total debt.\n\n2. **Lack of Awareness of Alternative Repayment Plans:** Borrowers were often not informed about income-driven repayment plans or loan forgiveness programs that could make repayment more manageable.\n\n3. **Interest Accumulation and Capitalization:** Without proper guidance, borrowers did not understand how interest would compound, leading to loans ballooning in size despite ongoing payments.\n\n4. **Mismanagement and Poor Communication:** Several complaints highlight that servicers did not properly notify borrowers about their account status, missed paym

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

##### ✅ Answer:
1. Slightly reformulated user queries can lead to better-targeted retrieval when the original user query is incomplete.
2. If we do not reduce the number of documents retrieved per query, we simply end up with more documents in the context. Given that recall = (retrieved relevant docs) / (total relevant docs), retrieving more documents can raise the numerator while the denominator stays fixed.

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
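
Here is a minimal sketch of the "search small, return big" idea using only dictionaries and lists. It assumes a naive fixed-width splitter and a word-overlap score in place of embeddings; every name (`split_into_children`, `word_overlap`, `retrieve_parents`) and both toy documents are hypothetical.

```python
def split_into_children(text, chunk_size):
    """Naive fixed-width splitter; a simplified stand-in for a real text splitter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# Docstore stand-in: parent id -> full parent document (toy text).
parents = {
    "p1": "The servicer doubled my monthly payment. Nobody explained the re-amortization.",
    "p2": "My loan was moved to a new servicer. I was never notified of the transfer.",
}

# "Vector store" stand-in: a list of (child_text, parent_id) pairs.
child_index = []
for parent_id, text in parents.items():
    for child in split_into_children(text, chunk_size=40):
        child_index.append((child, parent_id))

def word_overlap(query, text):
    # Hypothetical similarity: count of shared lowercase words (no embeddings).
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve_parents(query, k=1):
    """Search the small chunks, then return the de-duplicated parent documents."""
    ranked = sorted(child_index, key=lambda pair: word_overlap(query, pair[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[pid] for pid in parent_ids]

hits = retrieve_parents("never notified of the transfer")
print(hits)
```

The match happens on a 40-character child chunk, but the caller receives the whole parent document, including the sentence about the loan being moved.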

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [152]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension (`1536` for `text-embedding-3-small`). You'll need to change this if you're using a different embedding model.

In [153]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [154]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [155]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [156]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [157]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be related to errors and misconduct by loan servicers, including misreported account information, errors in loan balances, misapplied payments, wrongful denials of payment plans, and issues with loan reporting to credit bureaus. Additionally, complaints frequently mention difficulties in understanding or verifying loan terms, interest rate discrepancies, and improper handling of loans after sale or transfer.'

In [158]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints identified were marked as "No" under the "Timely response?" field, indicating they were not handled in a timely manner. Specifically, two complaints involving Mohela were explicitly noted as "No," meaning they did not receive a timely response.'

In [159]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including:\n\n1. Lack of clear communication or proper notification from loan servicers about payment obligations, due dates, or changes in the servicing company.\n2. Financial hardship or severe economic difficulties that made it impossible to afford repayments.\n3. Misrepresentations or misleading information regarding the value or manageability of the loans, especially in cases where institutions misled students about career prospects or the stability of their schools.\n4. Long-term financial consequences that were not fully understood at the time of borrowing, such as accruing interest during deferment or forbearance.\n5. Institutional issues, such as schools closing or facing financial problems, which impacted graduates' ability to secure employment and ultimately repay loans.\n6. Administrative errors or disputes, including incorrect reporting of payments, unauthorized collection efforts, or issues verifying the legitima

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
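
For intuition, here is a minimal sketch of weighted Reciprocal Rank Fusion. It assumes each retriever returns an ordered list of document ids and uses the constant `c=60` suggested in the RRF paper; the function name and the toy rankings are made up for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns weight / (c + rank) from every list it appears in,
    where rank starts at 1 for the top result.
    """
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += weight / (c + rank)
    return sorted(fused, key=lambda doc_id: fused[doc_id], reverse=True)

# Hypothetical rankings from a keyword retriever and a vector retriever.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused_ranking = reciprocal_rank_fusion(
    [keyword_ranking, vector_ranking], weights=[0.5, 0.5]
)
print(fused_ranking)
```

`doc_b` wins because it ranks highly in both lists, while documents that appear in only one list fall to the bottom. Only ranks are used, so the retrievers' raw scores never need to be on the same scale.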

In [160]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [161]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [162]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the complaints provided, appears to be dealing with or problematic handling by loan servicers and lenders. Specific sub-issues frequently reported include errors in loan balances, misapplied payments, wrongful denials of payment plans or forgiveness, incorrect account statuses, poor communication or lack of transparency, and unfair or predatory payment enforcement practices. Many complaints also mention the mishandling of loan transfers, incorrect classification of loan types, inaccurate reporting of delinquency status, and issues with understanding or managing interest calculations.\n\nIn summary, the predominant problem is the mishandling and mismanagement of student loans by servicers, leading to errors, misinformation, and financial hardship for borrowers.'

In [163]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, according to the provided complaints, some complaints were not handled in a timely manner. For example, complaint ID 12935889 regarding issues with Mohela was marked as "No" response to being timely, indicating it was not addressed promptly. Similarly, complaint ID 12654977 involving Mohela was also marked as "No" for timely response. \n\nHowever, there are other complaints, such as IDs 13205525 and 13056764, which were marked as "Yes" for timely responses. Overall, at least some complaints did not get handled in a timely manner.'

In [164]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons, including:\n\n1. **Lack of clear and accurate information from lenders or servicers:** Many borrowers were misled or not properly informed about repayment options, interest accumulation, and the long-term costs, leading to confusion and unmanageable debt.\n\n2. **Financial hardships:** Borrowers faced unforeseen circumstances such as unemployment, health issues, or economic downturns, making it difficult or impossible to make payments.\n\n3. **Inadequate communication and notification:** Several complaints highlighted that servicers did not notify borrowers about the start of repayment, missed payments, or changes in loan status, which sometimes resulted in unintentional delinquency.\n\n4. **Problematic loan management practices:** Issues such as improper transfers of loans, errors in account reporting, misapplied payments, and difficulty applying extra payments directly to the principal caused loan balances to grow or payment

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distances between consecutive sentences, and then split the sequence wherever a distance exceeds a given percentile of all distances.
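
To make the percentile idea concrete, here is a minimal pure-Python sketch. It is not the `SemanticChunker` implementation, which differs in details such as how sentences are grouped before embedding. The toy sentences, the 2-D vectors, the helper names, and the 95th-percentile cutoff are all assumptions made up for illustration.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def chunk_by_percentile(sentences, vectors, pct=95.0):
    """Start a new chunk wherever the gap to the previous sentence exceeds the percentile."""
    distances = [
        cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Hypothetical toy data: two sentences about payments, then two about credit reports.
toy_sentences = [
    "My payment was misapplied.",
    "The servicer applied it to the wrong loan.",
    "My credit report shows a late payment.",
    "The bureau has not corrected it.",
]
toy_vectors = [[1.0, 0.0], [0.98, 0.2], [0.1, 1.0], [0.0, 1.0]]

toy_chunks = chunk_by_percentile(toy_sentences, toy_vectors, pct=95.0)
for chunk in toy_chunks:
    print(chunk)
```

Only the gap between the second and third sentence is larger than the threshold, so the toy corpus is split into a payments chunk and a credit-report chunk.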

In [165]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [166]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [167]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [168]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [169]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [170]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints data, the most common issues with loans tend to involve problems related to borrower communication and account handling, including:\n\n- Trouble with how payments are being handled (e.g., auto-debit issues, re-amortization delays)\n- Receiving incorrect or bad information about the loan status or terms\n- Problems with loan reporting and credit reporting errors\n- Difficulties contacting or receiving responsive assistance from the lender or servicer\n- Disputes over unauthorized or illegal collection practices or privacy breaches\n\nOverall, issues related to miscommunication, improper handling of payments, and reporting errors appear most frequently. \n\nIf I had to identify the most common issue from this data, it seems to be related to **"dealing with your lender or servicer,"** especially issues such as incorrect information, billing problems, and lack of clear communication.\n\nIf you need a precise summary, the most recurring theme appears to be 

In [171]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, several complaints indicated that they were not handled in a timely manner. For example, one complaint about Nelnet (received on 05/04/25) detailed that despite sending multiple letters by Certified Mail and acknowledging receipt, Nelnet never responded to the complaint. Similarly, another complaint (received on 04/13/25) about Nelnet involved ongoing issues with unresolved disputes and alleged violations, though the response from the company was described as "Closed with explanation." \n\nAdditionally, multiple complaints mention delays or lack of responses from the service providers, suggesting that not all complaints were addressed promptly. Therefore, based on this data, some complaints did not get handled in a timely manner.'

In [172]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons reflected in the complaints. Some common reasons include:\n\n1. **Lack of Communication and Transparency:** Borrowers experienced poor communication from lenders or servicers, making it difficult to understand their loan status or repayment options. For example, one individual received inconsistent information about their forbearance status and had trouble logging into their accounts.\n\n2. **Disputes Over Loan Accuracy and Legitimacy:** Borrowers faced issues with incorrect or disputed loan information, such as loans wrongly reported as delinquent or in default, or discrepancies about payments made. Some were unaware their loans had gone into default due to miscommunications or errors in reporting.\n\n3. **Problems with Loan Servicers and Mishandling of Documentation:** Several complaints involved servicers stalling or refusing to process forgiveness or discharge paperwork, or mishandling documentation related to loan forgiven

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

##### ✅ Answer:

Potential problems:
1. Embeddings will look alike because the sentences often share a similar pattern ("How do I...?").
2. Semantic chunking may merge far too much because of the unnaturally high similarity.
3. Overmerged chunks are large and carry irrelevant information, so generation may surface irrelevant information to the user.

Fixes I could think of:
1. Adjust the breakpoint threshold so that more splits are created, which prevents overmerging.
2. Cap chunk length to avoid overmerging (although that may still produce chunks that mix unrelated information, as well as incomplete chunks).
3. Remove repetitive boilerplate from the documents (e.g., "How do", "Thank you", etc.).
4. For FAQs specifically, first chunk each question together with its answer.
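
As a rough illustration of the last fix, the sketch below keeps each question with its answer before any semantic chunking happens. The `Q:` / `A:` line prefixes, the sample text, and the `chunk_faq` helper are hypothetical; a real FAQ would need its own parsing rule.

```python
def chunk_faq(text):
    """Return one chunk per question/answer pair.

    Assumes (for illustration only) that every question starts a line with "Q:".
    """
    chunks, current = [], []
    for line in text.splitlines():
        # A new question closes the pair collected so far.
        if line.startswith("Q:") and current:
            chunks.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        chunks.append("\n".join(current))
    return chunks

# Hypothetical FAQ text with two repetitive "How do I" questions.
faq_text = """Q: How do I change my payment date?
A: Call your servicer and request a new due date.

Q: How do I apply extra payments to principal?
A: Include written instructions with the payment.
"""

faq_chunks = chunk_faq(faq_text)
for chunk in faq_chunks:
    print(chunk)
    print("---")
```

Because the pairs are fixed structurally, the near-identical question openings can no longer cause unrelated answers to be merged.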

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
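
If you also want a rough local cross-check of latency, the sketch below times repeated calls and reports P50 and P99 using the nearest-rank method. `fake_chain` is a hypothetical stand-in for a real `chain.invoke` call, and LangSmith may compute its percentiles differently, so expect the numbers to differ somewhat.

```python
import math
import time

def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct% of the samples are at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def time_calls(fn, inputs):
    """Return the wall-clock duration in seconds of each call fn(item)."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        latencies.append(time.perf_counter() - start)
    return latencies

def fake_chain(question):
    """Hypothetical stand-in for chain.invoke; sleeps instead of calling an LLM."""
    time.sleep(0.001)
    return {"response": f"answer to {question}"}

latencies = time_calls(fake_chain, [f"question {i}" for i in range(20)])
p50 = nearest_rank_percentile(latencies, 50)
p99 = nearest_rank_percentile(latencies, 99)
print(f"P50: {p50:.4f}s, P99: {p99:.4f}s")
```

With only a handful of samples, P99 is effectively the slowest call, so treat it as a worst-case indicator rather than a stable statistic.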

In [114]:
from langsmith import utils
utils.get_env_var.cache_clear()

In [174]:
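from uuid import uuid4

# Trace into a uniquely named LangSmith project so latency and cost can be read per run.
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = f"AIE7 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("LangSmith API Key: ")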

In [208]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
from ragas.testset import TestsetGenerator
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams
from langchain.prompts import ChatPromptTemplate
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict
from langchain_core.documents import Document
import copy

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(loan_complaint_data[:20], testset_size=10)

dataset.to_pandas()

# All retrieval chains under evaluation.
all_retrievers = [
    naive_retrieval_chain,
    bm25_retrieval_chain,
    parent_document_retrieval_chain,
    contextual_compression_retrieval_chain,
    multi_query_retrieval_chain,
    ensemble_retrieval_chain,
]

dataset_for_naive = copy.deepcopy(dataset)
dataset_for_bm25 = copy.deepcopy(dataset)
dataset_for_parent_document = copy.deepcopy(dataset)
dataset_for_contextual_compression = copy.deepcopy(dataset)
dataset_for_multi_query = copy.deepcopy(dataset)
dataset_for_ensemble = copy.deepcopy(dataset)

retriever_dataset_pairs = [
    (naive_retrieval_chain,          dataset_for_naive),
    (bm25_retrieval_chain,           dataset_for_bm25),
    (parent_document_retrieval_chain, dataset_for_parent_document),
    (contextual_compression_retrieval_chain, dataset_for_contextual_compression),
    (multi_query_retrieval_chain,    dataset_for_multi_query),
    (ensemble_retrieval_chain,             dataset_for_ensemble),
]

# Run every chain over its own copy of the test set, recording the generated
# response and the retrieved contexts that the Ragas metrics will score.
for retriever, ds in retriever_dataset_pairs:
    print("Attempting to run retrieval chain")
    for test_row in ds:
        response = retriever.invoke({"question": test_row.eval_sample.user_input})
        test_row.eval_sample.response = response["response"].content
        test_row.eval_sample.retrieved_contexts = [context.page_content for context in response["context"]]

Node dd29059b-359d-4141-a870-8d763c1591fe does not have a summary. Skipping filtering.
Node c142025d-29e2-4f06-b20e-7958e6fd6b81 does not have a summary. Skipping filtering.
Node 7c013f96-9fa3-42dd-beaf-b6b33e2732f8 does not have a summary. Skipping filtering.
Node a18f74d8-131d-4681-a1e5-e7d72bdf49b8 does not have a summary. Skipping filtering.
Node 70f90cfe-2066-4f16-be53-7a0d6d2ab15f does not have a summary. Skipping filtering.
Node 3381621b-f62c-47fc-ad47-7f6f2a08ea30 does not have a summary. Skipping filtering.


Attempting to run retrieval chain
Attempting to run retrieval chain
Attempting to run retrieval chain
Attempting to run retrieval chain
Attempting to run retrieval chain
Attempting to run retrieval chain


In [218]:
from ragas import EvaluationDataset, RunConfig, evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import (
    ContextEntityRecall,
    LLMContextPrecisionWithoutReference,
    LLMContextRecall,
    NoiseSensitivity,
)

# Judge model used by the LLM-based metrics.
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-mini"))

# Generous per-job timeout, since the LLM-based metrics are slow.
custom_run_config = RunConfig(timeout=960)

# Maps "retriever_<i>" to its result, in the order of retriever_dataset_pairs.
all_results = {}

for i, (retriever, ds) in enumerate(retriever_dataset_pairs, start=1):
    print(f"Working on retrieval chain {i}")

    # Evaluate only the first 8 samples to keep cost and runtime manageable.
    temp_eval_dataset = EvaluationDataset.from_pandas(ds.to_pandas()[:8])

    result = evaluate(
        dataset=temp_eval_dataset,
        metrics=[
            LLMContextPrecisionWithoutReference(),
            LLMContextRecall(),
            ContextEntityRecall(),
            NoiseSensitivity(),
        ],
        llm=evaluator_llm,
        run_config=custom_run_config,
    )

    print(f"Retrieval chain {i} results")
    print(result)
    all_results[f"retriever_{i}"] = result

Working on retrieval chain 1


Retrieval chain 1 results
{'llm_context_precision_without_reference': 0.9691, 'context_recall': 0.9375, 'context_entity_recall': 0.5133, 'noise_sensitivity(mode=relevant)': 0.2467}
Working on retrieval chain 2


Retrieval chain 2 results
{'llm_context_precision_without_reference': 0.9375, 'context_recall': 0.6542, 'context_entity_recall': 0.3843, 'noise_sensitivity(mode=relevant)': 0.2107}
Working on retrieval chain 3


Retrieval chain 3 results
{'llm_context_precision_without_reference': 1.0000, 'context_recall': 0.8667, 'context_entity_recall': 0.4927, 'noise_sensitivity(mode=relevant)': 0.3550}
Working on retrieval chain 4


Retrieval chain 4 results
{'llm_context_precision_without_reference': 1.0000, 'context_recall': 0.8125, 'context_entity_recall': 0.5509, 'noise_sensitivity(mode=relevant)': 0.1763}
Working on retrieval chain 5


Exception raised in Job[31]: TimeoutError()


Retrieval chain 5 results
{'llm_context_precision_without_reference': 0.9178, 'context_recall': 0.9583, 'context_entity_recall': 0.5269, 'noise_sensitivity(mode=relevant)': 0.3699}
Working on retrieval chain 6


Exception raised in Job[7]: TimeoutError()
Exception raised in Job[19]: TimeoutError()
Exception raised in Job[27]: TimeoutError()


Retrieval chain 6 results
{'llm_context_precision_without_reference': 0.9418, 'context_recall': 0.9792, 'context_entity_recall': 0.5083, 'noise_sensitivity(mode=relevant)': 0.3593}


##### ✅ Answer:
I had to cut down on the number of metrics and examples; otherwise the evaluation could not finish, even with the small dataset we have (Ragas is extremely costly and time-consuming).

##### Comparison table
| Metric | Naive | BM25 | Parent document | Contextual compression | Multi-query | Ensemble |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Context precision (no reference) | 0.9691 | 0.9375 | **✅ 1.0000** | **✅ 1.0000** | 0.9178 | 0.9418 |
| Context recall | 0.9375 | 0.6542 | 0.8667 | 0.8125 | **✅ 0.9583** | **✅ 0.9792** |
| Context entity recall | **✅ 0.5133** | 0.3843 | **✅ 0.4927** | **✅ 0.5509** | **✅ 0.5269** | **✅ 0.5083** |
| Noise sensitivity (lower is better) | **✅ 0.2467** | **✅ 0.2107** | 0.3550 | **✅ 0.1763** | 0.3699 | 0.3593 |
| Tokens | 93,065 | **✅ 50,255** | **✅ 48,888** | **✅ 30,548 + ???** | 146,417 | 202,328 |
| Cost | **✅ $0.01** | **✅ $0.01** | **✅ $0.01** | **✅ $0.005 + $0.002** | $0.02 | $0.02 |
| P50 latency | **✅ 3.66s** | **✅ 2.95s** | **✅ 3.62s** | **✅ 3.72s** | 6.90s | 6.81s |
| P99 latency | **✅ 7.33s** | **✅ 5.77s** | **✅ 6.45s** | **✅ 5.67s** | 11.20s | 10.44s |

Disclaimer: despite the low sample size, I treat this table as if the sample were large enough to be significant; otherwise there is no meaningful interpretation. Ragas also runs multiple tests to make evals more robust (which doesn't fix things if the underlying dataset is no good, of course).
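
As a cross-check, the four Ragas scores printed in the evaluation output can be ranked programmatically. This is only a sketch: the dictionary layout and the `best_per_metric` helper are my own, and it reports the strict best value per metric rather than every value I consider good enough to mark in the table.

```python
# Scores copied from the evaluation output above; order follows retriever_dataset_pairs.
retriever_names = [
    "naive", "bm25", "parent_document",
    "contextual_compression", "multi_query", "ensemble",
]
scores = {
    "context_precision": [0.9691, 0.9375, 1.0000, 1.0000, 0.9178, 0.9418],
    "context_recall": [0.9375, 0.6542, 0.8667, 0.8125, 0.9583, 0.9792],
    "context_entity_recall": [0.5133, 0.3843, 0.4927, 0.5509, 0.5269, 0.5083],
    "noise_sensitivity": [0.2467, 0.2107, 0.3550, 0.1763, 0.3699, 0.3593],
}
# Noise sensitivity is the only metric here where a lower score is better.
lower_is_better = {"noise_sensitivity"}

def best_per_metric(scores, names, lower_is_better):
    """Return, for each metric, the retriever(s) holding the best score."""
    best = {}
    for metric, values in scores.items():
        target = min(values) if metric in lower_is_better else max(values)
        best[metric] = [name for name, value in zip(names, values) if value == target]
    return best

best = best_per_metric(scores, retriever_names, lower_is_better)
for metric, winners in best.items():
    print(f"{metric}: {', '.join(winners)}")
```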

My conclusion:
1. Context precision is quite similar across retrievers. The contextual compression and parent document retrievers outperform the others: some retrievers may be too simplistic (naive, BM25) to target context precisely, while others may simply bring in too much context, including the context from the simplistic retrievers (ensemble).
2. Context recall is high for retrievers that return many documents, as the metric rewards not missing important documents. BM25 underperforms as expected, because its simplistic matching fits poorly with queries that are not exact matches. The contextual compression retriever underperforms because it cuts down the number of documents included.
3. Context entity recall is mostly similar across retrievers.
4. Noise sensitivity (lower is better) is good for retrievers that return few documents, because this prevents context overload, where the LLM gets too much information.
5. Token count, cost, and latency all explode for the multi-query and ensemble retrievers.

My recommendation:
1. The contextual compression retriever shows strong results across all performance measures while being among the best in both cost and latency. For general cases, I'd prefer the contextual compression retriever. I also like the parent document retriever's performance and its underlying logic.
2. However, our current ensemble retriever includes all the other retrievers. We'd probably get much better cost and latency if we cut down on the retrievers used; since multiple retrievers also introduce redundancy and context overload, we could probably improve the ensemble's performance by using only a few retrievers instead of the current setup.
3. This homework took hours (waiting for data, etc.), so I'm not going to re-run the ensemble before the homework deadline, but I will definitely explore other configurations on my own.