# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

# Use the complaint narrative as the text to embed; every column stays available in metadata.
for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
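
To make the "cosine similarity, top `k`" idea concrete, here is a minimal pure-Python sketch. It is not how Qdrant implements search internally; the toy 2-D vectors and the helper names `cosine_similarity` and `top_k` are assumptions made for illustration only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=10):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy 2-D "embeddings" (hypothetical values; real embedding vectors are much longer).
toy_doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
toy_query_vec = [0.9, 0.1]
nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
```

A vector store does the same ranking, only with an index that avoids comparing the query against every stored vector.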

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned with RunnablePassthrough.assign, which passes every existing key
    #              ("question" and "context") through unchanged to the next step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans appear to be related to mismanagement and errors by loan servicers, including:\n\n- Errors in loan balances and misapplied payments\n- Wrongful denials of payment plans\n- Incorrect information reported on credit reports\n- Unauthorized or unnotified transfers of loans between servicers\n- Discrepancies in account status and balances\n- Problems with payment application, such as funds being applied mostly to interest rather than principal\n- Issues with loan forgiveness, discharge, or settlement that are not properly handled\n- Bad or incorrect information about loans affecting credit and financial hardship\n\nOverall, the most prevalent issue seems to involve the mishandling and misreporting of loan information by servicers, leading to errors, misapplied payments, incorrect balances, and lack of clear communication. This indicates that a common and significant problem with loans is poor management and communication from

In [11]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

"Based on the provided context, yes, some complaints were not handled in a timely manner. Specifically, the complaint with Complaint ID '12709087' submitted to MOHELA on 03/28/25 was marked as 'No' for the response being timely. The individual indicates that despite waiting beyond the expected response time (originally 15 days), they have not received a reply, and the issue remains unresolved.\n\nAdditionally, the complaint with Complaint ID '12973003' submitted to EdFinancial Services on 04/14/25 was responded to as 'Yes' for timely response, indicating it was handled within the expected timeframe. \n\nMost other complaints either were marked as handled in a timely manner or do not specify delays. \n\nTherefore, the answer is that at least one complaint did **not** get handled in a timely manner."

In [12]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for several reasons, including:\n\n1. Lack of clear or timely information from lenders or servicers about when repayment was to resume or changes in their loan status, leading to unawareness of payment obligations.\n2. The accumulation of interest during forbearance or deferment periods, which increased the total amount owed and extended the repayment timeline.\n3. Financial hardships, such as stagnant wages, inflation, or insufficient income, making it difficult to afford payments or increase monthly contributions.\n4. Problems with loan management, including miscommunication, transfer of loans without proper notification, or errors in account information, which caused delinquency and credit report issues.\n5. Servicer practices that made it difficult to apply additional payments toward the principal or pay off loans more quickly, often resulting in prolonged debt.\n6. Unanticipated administrative issues like incorrect reporting, failure to notif

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
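
As a rough illustration of that scoring, here is a minimal sketch of one common BM25 variant in pure Python. The whitespace tokenizer, the `k1` and `b` defaults, and the toy documents are assumptions for this example, and the exact formula is not necessarily the one LangChain's `BM25Retriever` uses.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a BM25-style formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency (IDF).
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Term frequency saturates (k1) and is normalized by document length (b).
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        scores.append(score)
    return scores

# Toy corpus (hypothetical sentences, not rows from the dataset).
toy_docs = [
    "my payment was applied to interest instead of principal",
    "complaint 12709087 was not answered on time",
    "my payment plan was denied",
]
id_scores = bm25_scores("12709087", toy_docs)
payment_scores = bm25_scores("payment", toy_docs)
```

Only the document that literally contains the ID gets a non-zero score for the first query, which is the exact-match behavior that makes sparse retrieval useful.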

This retriever is very straightforward to set up! Let's see it happen down below!


In [13]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [14]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [15]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to dealing with lenders or servicers, such as incorrect or bad information about the loan, issues with how payments are processed or applied, and disputes over fees or loan terms. Many complaints highlight difficulties in obtaining accurate loan information, unfair payment practices, and the inability to resolve issues with the servicing companies.'

In [16]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, it appears that several complaints were marked as responded to with a response status of "Closed with explanation" and "Yes" for being timely. However, in the case of a particular complaint (row 423), the consumer indicates ongoing issues and seems dissatisfied with the company\'s response, but the response was still marked as "timely."\n\nSince the records show multiple complaints where the response was timely, there is no clear evidence that any complaints were not handled in a timely manner. Nonetheless, some complaints reflect unresolved or ongoing issues, but the data notes that the responses to these complaints were given within the expected timeframes.\n\n**Therefore, based on the available data, no complaints appear to have gone unhandled or unresponded in a timely manner.**'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including issues with their payment plans, miscommunication or lack of communication from loan servicers, and incorrect or delayed handling of their forbearance applications. Some common problems highlighted in the complaints include:\n\n- Being steered into wrong types of forbearances or having their forbearance requests ignored or mishandled.\n- Having loans transferred to new servicers without proper notification, leading to missed payments or automatic disenrollment from autopay.\n- Receiving bad or no information about their loan status, repayment requirements, or changes in their loan servicing.\n- Technical issues with payments, such as payments being reversed repeatedly due to errors with the servicer or bank, and lack of effective customer support to resolve these issues.\n- Filing for deferments or forbearances but not receiving follow-up or confirmation, resulting in accumulating unpaid bills.\n- Being negatively im

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, and the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

##### ✅ Answer:

BM25 is keyword-based: it scores documents by how often the query terms appear in them, weighting rare terms more heavily and normalizing for document length.

This means it’s great when we need an exact match for the literal words we typed.

BM25 is better for queries containing rare, literal terms such as error codes or log lines, because it rewards exact word matches, whereas embeddings may return results that are semantically similar but do not contain the exact term (e.g., a query for a specific Oracle database error code).

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
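
The two stages above can be sketched in a few lines of plain Python. This is only an illustration of the retrieve-then-rerank pattern: `word_overlap` and `overlap_density` are hypothetical stand-ins for the vector search and for a learned reranking model, and the toy documents are made up.

```python
def word_overlap(query, doc):
    """Cheap first-stage score: number of distinct query words found in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def overlap_density(query, doc):
    """Stand-in for a reranker: share of the document's words that match the query."""
    doc_words = doc.lower().split()
    query_words = set(query.lower().split())
    return sum(word in query_words for word in doc_words) / len(doc_words)

def retrieve_then_rerank(query, docs, first_stage_score, rerank_score, k=10, top_n=3):
    """Fetch k candidates with a cheap score, then keep the top_n by a stronger score."""
    candidates = sorted(docs, key=lambda d: first_stage_score(query, d), reverse=True)[:k]
    return sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)[:top_n]

toy_docs = [
    "late payment fee charged twice on my loan account this month",
    "late payment fee",
    "the weather is nice",
    "loan payment",
]
compressed = retrieve_then_rerank(
    "late payment fee", toy_docs, word_overlap, overlap_density, k=3, top_n=2
)
```

The first stage is tuned for recall (a generous `k`), and the second stage trades extra compute per document for better precision on the small candidate set.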

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [18]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [19]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [20]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans, particularly student loans, appears to be problems related to dealing with lenders or servicers. Common complaints include receiving bad or incorrect information, errors in loan balances, misapplied payments, wrongful denials of payment plans, lack of communication or documentation, and mishandling of loan data. These issues often involve inaccuracies in account information, unauthorized transfers, privacy violations, and disputes over loan balances and interest calculations.'

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, there are indications that some complaints did not get handled in a timely manner. Specifically, one complaint mentions waiting over 1 year for a response and resolution, which had not been resolved even after nearly 18 months. Although the response to that particular complaint was marked as "Yes" for timely response, the timeline suggests delays in resolving the issue. \n\nIn general, there is evidence of delays and unresolved issues with some complaints, indicating that not all complaints were handled in a timely manner.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for several reasons, including:\n\n1. Lack of Awareness and Communication: Borrowers were often not informed by financial aid officers or loan servicers about the requirement to repay loans, leading to confusion about their obligations. For example, some were unaware they needed to begin payments or were not properly notified when their loans were bought out or transferred to new servicers.\n\n2. Increased and Unmanageable Balances: Despite making payments, some borrowers experienced their balances growing due to accumulating interest, especially when loans were placed in forbearance or deferment. Interest continued to accrue during these periods, making the total debt larger over time and creating a cycle that is difficult to break.\n\n3. Poor Account Management and Errors: There were issues with incorrect or inconsistent account information, mismatched balances, and lack of detailed breakdowns of loan or interest calculations. These errors hinde

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context
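
The three steps above can be sketched without any LLM. In this hedged example, `fake_rewrite` stands in for the LLM that generates reformulations and `fake_retrieve` stands in for the retriever; both, along with the toy index, are hypothetical.

```python
def multi_query_retrieve(question, rewrite_fn, retrieve_fn):
    """Retrieve for each rewritten query and return the unique documents, in first-seen order."""
    seen, merged = set(), []
    for query in rewrite_fn(question):
        for doc in retrieve_fn(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

# Toy index mapping a query string to the documents it would retrieve (hypothetical).
toy_index = {
    "late fees": ["doc_a", "doc_b"],
    "payment delays": ["doc_b", "doc_c"],
    "servicer errors": ["doc_c", "doc_d"],
}

def fake_rewrite(question):
    """Stand-in for the LLM: always returns the same three reformulations."""
    return list(toy_index)

def fake_retrieve(query):
    """Stand-in for the retriever: look the query up in the toy index."""
    return toy_index.get(query, [])

merged_docs = multi_query_retrieve("What goes wrong with loans?", fake_rewrite, fake_retrieve)
```

Each reformulation contributes documents the others miss, while the deduplication keeps the context from filling up with repeats.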

So, how hard is it to set up? Not bad! Let's see it down below!



In [23]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [24]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"The most common issue with loans, based on the complaints provided, appears to be problems related to poor handling and mismanagement by lenders or servicers. This includes issues such as receiving bad or inaccurate information about the loan, errors in loan balances and interest calculations, failure to provide proper documentation (like original promissory notes), incorrect reporting of account statuses (e.g., late payments or defaults), and lack of communication or transparency. These problems often lead to negative impacts on borrowers' credit scores, financial planning, and legal rights.\n\nIn summary, the most common issue is **mismanagement by loan servicers leading to inaccurate information, poor communication, and improper reporting**, which significantly harms borrowers' financial stability."

In [26]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided data, yes, some complaints did not get handled in a timely manner. Specifically, at least two complaints against MOHELA have clear indications of delayed responses:\n\n- Complaint with ID \'12739706\' received on 04/01/25 was marked as "No" in the "Timely response?" field, indicating it was not responded to promptly.\n- Complaint with ID \'12668396\' received on 03/26/25 was marked as "No" for timely response as well.\n\nAdditionally, multiple complaints mention being kept waiting for extended periods or receiving no resolution after considerable delays (over weeks or months), which suggests that handling was not always timely.\n\nIn contrast, some complaints were marked "Yes" for timely response, indicating administrative responses within the expected timeframes, but there are notable instances where delays occurred.\n\nTherefore, the records show that some complaints did not receive timely handling.'

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to issues such as:\n\n- Limited or confusing repayment options (e.g., only being offered forbearance or deferment, which accrue interest and extend the repayment period).\n- Lack of proper information and communication from loan servicers about available repayment plans, especially income-driven repayment options or loan forgiveness programs.\n- Long-term forbearance steering practices that pushed borrowers into delayed repayment methods, resulting in increased interest and balance.\n- Unexpected loan management problems like incorrect account information, reported delinquencies without proper notification, or mishandling during transfer of loan servicing.\n- Financial hardships such as unemployment, health issues, or unforeseen circumstances (e.g., homelessness, accidents) which made consistent repayment difficult.\n- Systemic issues such as mismanagement, errors in reported balances, or failure to properly process communication, wh

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

##### ✅ Answer:
Generating multiple reformulations increases recall because it broadens search coverage: each reformulation can match relevant documents that express the same intent with different wording or from a different perspective, so documents the original phrasing would have missed are still retrieved.

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
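
Here is a minimal sketch of the small-to-big idea using plain dictionaries and lists. The fixed-width splitter, the substring-count score, and the toy parent texts are all assumptions for illustration; a real setup scores child chunks by embedding similarity.

```python
def split_into_chunks(text, chunk_size):
    """Naive fixed-width splitter (stand-in for a real text splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def build_parent_child_index(parent_docs, chunk_size):
    """Store full parents in a docstore and (chunk, parent_id) pairs in a child index."""
    docstore, child_index = {}, []
    for parent_id, text in enumerate(parent_docs):
        docstore[parent_id] = text
        for chunk in split_into_chunks(text, chunk_size):
            child_index.append((chunk, parent_id))
    return docstore, child_index

def retrieve_parents(query, docstore, child_index, score_fn, k=2):
    """Search over the child chunks, but return the unique parents they belong to."""
    ranked = sorted(child_index, key=lambda item: score_fn(query, item[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[parent_id] for parent_id in parent_ids]

def substring_count(query, chunk):
    """Toy relevance score: how often the query string occurs in the chunk."""
    return chunk.lower().count(query.lower())

toy_parents = [
    "My loan was transferred without notice. Then autopay was cancelled.",
    "Interest kept accruing during forbearance.",
]
docstore, child_index = build_parent_child_index(toy_parents, chunk_size=40)
results = retrieve_parents("autopay", docstore, child_index, substring_count, k=1)
```

The match happens on a 40-character chunk, yet the caller receives the whole parent text, including the sentence about the transfer that the chunk alone would have dropped.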

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [28]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [29]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [30]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [31]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [32]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [33]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided context, appears to be related to mistakes or problems with loan servicing, such as errors in loan balances, misapplied payments, wrongful denials of payment plans, and issues arising from misreporting or incorrect information on credit reports. Many complaints involve miscommunications, unfair practices, or discrepancies in loan handling by servicers.'

In [34]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, several complaints indicate that they were not handled in a timely manner. Specifically:\n\n- One complaint from a consumer regarding their student loan with MOHELA received a response that was "Closed with explanation" and was marked as "No" for timely response, suggesting it was not handled promptly.\n- Another complaint from the same consumer also indicated delays, with no indication of a timely resolution.\n- Additionally, multiple complaints mention extensive wait times, lack of responses, and prolonged unresolved issues (e.g., waiting on hold for hours or not receiving expected contact within the stated timeframe).\n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily due to various issues such as financial hardships, misrepresentations by educational institutions, lack of proper communication from loan servicers, and difficulties in managing or verifying their debt. For example, some borrowers experienced severe financial hardship after graduation and relied on deferment or forbearance, which increased the total debt due to accruing interest. Others faced problems like being unaware of their payment obligations, having their payments initiated before the end of the grace period, or dealing with unverified or questionable debt collection practices. Additionally, students who attended institutions with financial instability or false promises had difficulty securing employment and repaying their loans. Overall, these issues contributed to borrowers' inability to fulfill their repayment obligations."

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes 2, or more, retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
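
For intuition, Reciprocal Rank Fusion gives each document a score of `weight / (c + rank)` from every ranked list it appears in and sums those scores. The sketch below is a minimal pure-Python version under that definition; the constant `c=60` follows the value suggested in the RRF paper, and the toy rankings are hypothetical.

```python
def reciprocal_rank_fusion(ranked_lists, weights=None, c=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    fused = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            # A document earns more the nearer it is to the top of each list.
            fused[doc] = fused.get(doc, 0.0) + weight / (c + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Toy rankings from two retrievers (hypothetical document IDs).
sparse_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]
fused_ranking = reciprocal_rank_fusion([sparse_ranking, dense_ranking], weights=[0.5, 0.5])
```

Because only ranks are used, the retrievers' raw scores never need to be on the same scale, which is what makes it practical to mix sparse and dense retrievers.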

In [36]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [37]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [38]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to revolve around problems with loan servicing, including mismanagement, inaccuracies, and inadequate communication. Specific issues frequently reported include:\n\n- Errors in loan balances, interest calculations, and credit reporting\n- Misapplication or mismanagement of payments, leading to increased balances or incorrect delinquency reports\n- Lack of transparency and insufficient communication from loan servicers\n- Unauthorized transfers of loans without proper notification\n- Problems with repayment plans, forbearance, and forgiveness applications\n- Inability to access or verify loan documentation\n\nOverall, mishandling by loan servicers, leading to inaccurate information and lack of clear communication, is identified as the most common issue.'

In [39]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, there are several instances where complaints were not handled in a timely manner. Specifically:\n\n- Complaint ID 12739706 (Mohela): Response was marked as "No" for timely response, indicating it was not addressed promptly.\n- Complaint ID 12935889 (EdFinancial Services): Also marked as "No" for timely response.\n- Complaint ID 13056764 (EdFinancial Services): Marked as "Yes" for timely response, so handled promptly.\n- Complaint ID 13117223 (Experian): Marked as "Yes" for timely response.\n- Complaint ID 13205525 (Nelnet): Marked as "Yes" for timely response.\n- Complaint ID 13298273 (EdFinancial Services): Marked "Yes."\n- Complaint ID 13410623 (Maximus Federal Services / Aidvantage): Marked as "Yes."\n- Several complaints from Maximus/Aidvantage (e.g., 12950199, 12950199, 12975634, 13131123, etc.) indicate they responded within the time frames, although some complainants expressed ongoing frustration with the handling or follow-up.\n\nIn summa

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, often related to issues with loan management and communication from loan servicers. The main reasons include:\n\n1. **Lack of Proper Notification and Communication:** Many borrowers were not adequately informed when their payments were due, when their loans were transferred to different servicers, or when their payment status changed. This lack of notice led to misunderstandings and missed payments.\n\n2. **Compounding and Unmanageable Interest:** Borrowers reported that interest continued to accrue and compound, especially during periods of forbearance, making the total debt grow faster than they could repay, even with consistent payments.\n\n3. **Limited Payment Options and Misleading Guidance:** Many borrowers were only offered options like forbearance or deferment, which often led to increased interest and extended repayment periods. There was a lack of guidance on income-driven repayment plans or loan forgiveness programs

## Task 10: Semantic Chunking

While this is not a retrieval method, it *is* an effective way to improve retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences based on their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distances between consecutive sentences, and then split the sequence wherever a distance exceeds a given percentile of all distances.
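To make the percentile rule concrete, here is a minimal, standard-library-only sketch of the idea. It is not the `SemanticChunker` implementation: the function names (`cosine_distance`, `percentile`, `split_by_percentile`), the toy 2-D "embeddings", and the `q=95.0` default are all assumptions made for illustration, and real chunkers may add steps this sketch omits (for example, buffering neighbouring sentences before embedding).

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values, q):
    """Linear-interpolation percentile of `values`, with q in [0, 100]."""
    ordered = sorted(values)
    rank = (q / 100.0) * (len(ordered) - 1)
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def split_by_percentile(sentences, vectors, q=95.0):
    """Group sentences, starting a new chunk wherever the distance between
    consecutive sentence vectors exceeds the q-th percentile of all such
    distances. `q=95.0` is a hypothetical default for this sketch."""
    if not sentences:
        return []
    if len(sentences) == 1:
        return [list(sentences)]

    # One distance per adjacent pair: (s0, s1), (s1, s2), ...
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, q)

    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:  # semantic break: close the current chunk
            chunks.append(current)
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks


# Toy data: two "topics" with hand-made 2-D vectors (not real embeddings).
toy_sentences = [
    "My payment was misapplied.",
    "The servicer applied it to the wrong loan.",
    "My credit report is wrong.",
    "The bureau lists a default that never happened.",
]
toy_vectors = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
print(split_by_percentile(toy_sentences, toy_vectors))
```

Because the threshold is computed from the document's own distance distribution, a higher percentile yields fewer, larger chunks, and a lower percentile yields more, smaller ones.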

In [41]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [42]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [43]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [44]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [45]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [46]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans involve problems such as difficulties in repayment (e.g., struggling to repay, incorrect payment amounts), improper or unauthorized reporting (e.g., reporting loans as delinquent or in default without proper basis), issues with loan servicing (e.g., trouble with how payments are being handled, auto-debit problems), and mismanagement or mishandling of borrower information (e.g., privacy breaches, data breaches, unauthorized access). \n\nMany complaints highlight disputes over the accuracy of loan status, processing delays, and the challenges borrowers face in obtaining clear information or making corrections. \n\nTherefore, a key common issue appears to be **problems related to loan servicing and reporting inaccuracies**, which can lead to financial and credit reporting complications for borrowers.'

In [47]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, several complaints indicate that they were handled in a timely manner, with responses marked as "Closed with explanation" and responses labeled "Yes" under the "Timely response?" category. For example, complaints submitted on 04/28/25, 05/01/25, 05/05/25, 04/13/25, and 05/09/25 were all responded to promptly.\n\nHowever, the complaints generally note that the responses were "Closed with explanation," and many involve unresolved issues or ongoing disputes. The data does not explicitly mention any complaints that were not handled in a timely manner. All entries provided indicate that responses were given within the expected timeframe.\n\nTherefore, based on the available information, there is no evidence that any complaints did not get handled in a timely manner.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"Based on the provided context, people failed to pay back their loans mainly due to issues such as poor communication and lack of transparency from loan servicers, difficulties in accessing accurate or official information about their loan status, and alleged deliberate stalling or unfair practices by loan providers. These challenges have led to misunderstandings about loan terms, default judgments, and even claims of illegal reporting or data breaches, which can hinder borrowers' ability to make payments or resolve issues effectively."

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

##### ✅ Answer:

With short, repetitive sentences like FAQs, semantic chunking can produce many nearly identical, tiny chunks, which hurts retrieval quality. To fix this, we'd merge related questions into larger topical chunks, deduplicate similar sentences, and add context tags, giving the model richer, more distinct representations for retrieval.
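As a rough illustration of the "deduplicate and merge" adjustment, the sketch below post-processes a list of chunk strings. The helper name `dedupe_and_merge` and the `min_chars` parameter are hypothetical, and the deduplication here only catches duplicates that match after lowercasing and whitespace normalization; catching near-duplicates would need an embedding-similarity check instead.

```python
def dedupe_and_merge(chunks, min_chars=200):
    """Drop duplicate chunks (ignoring case and whitespace), then merge
    consecutive chunks until each merged chunk has at least `min_chars`
    characters. `min_chars` is an illustrative, untuned value."""
    seen = set()
    unique = []
    for chunk in chunks:
        key = " ".join(chunk.lower().split())  # normalized form for comparison
        if key and key not in seen:
            seen.add(key)
            unique.append(chunk.strip())

    merged = []
    buffer = ""
    for chunk in unique:
        buffer = f"{buffer} {chunk}".strip()
        if len(buffer) >= min_chars:
            merged.append(buffer)
            buffer = ""

    # Attach any short leftover to the previous chunk rather than emit a tiny one.
    if buffer:
        if merged:
            merged[-1] = f"{merged[-1]} {buffer}"
        else:
            merged.append(buffer)
    return merged


faq_chunks = [
    "How do I pay?",
    "how do i  pay?",  # duplicate once case and spacing are normalized
    "Where is my bill?",
    "Who do I call?",
]
print(dedupe_and_merge(faq_chunks, min_chars=25))
```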

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
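If you also want a quick local latency number alongside LangSmith, one option is to time each retriever directly. The sketch below is only an assumption about how you might do that: `time_retriever` is a hypothetical helper, and it accepts any callable that takes a question string, so it does not depend on a particular retriever class.

```python
import statistics
import time


def time_retriever(retrieve, questions, repeats=3):
    """Call `retrieve(question)` `repeats` times per question and return
    simple wall-clock statistics in seconds."""
    timings = []
    for question in questions:
        for _ in range(repeats):
            start = time.perf_counter()
            retrieve(question)
            timings.append(time.perf_counter() - start)
    return {
        "calls": len(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }
```

With the retrievers built earlier in this notebook, a call might look like `time_retriever(semantic_retriever.invoke, ["What is the most common issue with loans?"])`. Wall-clock timings include network and embedding-API latency, so repeat the measurement a few times before comparing methods.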