# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [3]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [4]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [5]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [6]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [7]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [8]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
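To make "cosine similarity, top `k`" concrete, here is a minimal pure-Python sketch of what a naive retriever does conceptually. The 3-dimensional vectors are made-up toy values (not real `text-embedding-3-small` output), and `cosine_similarity` and `top_k` are illustrative helpers, not part of LangChain or Qdrant.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy "embeddings" (hypothetical values chosen for illustration only).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [1.0, 0.2, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)
```

A real vector store does the same ranking, just with much longer vectors and an index that avoids comparing the query against every document.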

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here; we want this to mostly be about the Retrieval process.

In [9]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [10]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [11]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # RunnablePassthrough.assign passes the previous step's dict through and (re)assigns
    #              "context" to the value it already holds, so both keys reach the next step unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [12]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the complaints data, appears to be problems related to mismanagement and improper handling by loan servicers, including mistakes in loan balances, incorrect reporting on credit reports, and trouble with repayment plans. Many complaints highlight issues such as receiving bad or conflicting information, being unable to verify or access accurate loan details, problems with payments — especially applying extra funds, interest capitalization, and transfers of loans without proper notification. \n\nIn particular, issues like errors in loan balances, incorrect reporting on credit reports (such as wrongly indicating delinquencies), and difficulties in handling payments and repayment plans are prominent. These issues often result in financial hardship and credit score damage for borrowers.\n\nTherefore, the most common issue with loans, as reflected in these complaints, is **mismanagement and errors in loan servicing and information accuracy.**'

In [13]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, some complaints did not get handled in a timely manner. Specifically, at least one complaint (Complaint ID: 12709087) was marked as "Not in a timely response," indicating that the response from the company was delayed beyond the expected timeframe. The others appear to have been responded to within the designated periods or have responses noted as "Closed with explanation."'

In [14]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including:\n\n1. Lack of clear information or communication from loan servicers about payment resumption, transfer of loans, or due dates, leading to missed payments and delinquency.\n2. Difficulty managing interest accumulation, especially when options like forbearance or deferment lead to continued interest growth, making repayment unmanageable over time.\n3. Financial hardships such as stagnating wages, inflation, or unexpected expenses, which can make monthly payments unaffordable.\n4. Complicated or predatory loan repayment processes, including inability to allocate payments directly to principal or pay off smaller loans faster.\n5. Lack of understanding about the terms of loans, interest, or repayment plans, often due to insufficient information from lenders or financial aid officers.\n6. Errors or mismanagement by loan servicers, including failure to notify borrowers, incorrect reporting of delinquency, or improper hand

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on the [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) model, which is a sparse representation of text.

In essence, it's a way to score how well a piece of text matches a query based on the words they both contain, giving more weight to rare words and adjusting for document length.

This retriever is very straightforward to set up! Let's see it happen down below!


In [15]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
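For intuition about what happens under the hood, here is a minimal sketch of BM25 scoring in pure Python. It assumes whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and a non-negative IDF variant; the library backing `BM25Retriever` may differ in these details, and the three toy documents are invented for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a simple BM25 variant."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a larger inverse-document-frequency weight.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates (k1) and is normalized by document length (b).
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / (term_freq[term] + length_norm)
        scores.append(score)
    return scores

toy_docs = [
    "my payment was misapplied by the servicer",
    "the servicer reported error code ora-01722 on my account",
    "the servicer never answered my payment question",
]
scores = bm25_scores("ora-01722 servicer", toy_docs)
best = scores.index(max(scores))
print(best, scores)
```

Every toy document mentions "servicer", so that word barely moves the scores; the rare token "ora-01722" is what pushes the second document to the top.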

We'll construct the same chain - only changing the retriever.

In [16]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [17]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, such as discrepancies in fees, difficulties with payment application, or receiving incorrect or bad information about the loan. Many complaints involve a lack of clear communication, misapplied payments, or inadequate explanations from the loan servicers.'

In [18]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, it appears that several complaints received timely responses from the companies. For example, the complaints with Complaint IDs 13197090, 12792958, 13160766, and 13410623 all indicate that the companies responded "Yes" to providing a response within the appropriate timeframe ("Timely response?": "Yes"). The responses to these complaints were also "Closed with explanation," suggesting they were handled in a timely manner.\n\nThe one complaint with detailed ongoing issues (Complaint ID 13160766) from April 24, 2025, shows that the company responded with a "Closed with explanation" and maintained a timely response, even though the consumer expressed ongoing frustration.\n\nTherefore, based on the available data, no complaints were left unresolved or unhandled in a timely manner.'

In [19]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans due to various issues, including misunderstandings or errors related to their payment plans, lack of clear communication from lenders or servicers, being misled into incorrect types of forbearances, or technical problems with payment processing. In some cases, borrowers face difficulties because their loan accounts are transferred between different entities without proper notification, leading to missed payments, decreased credit scores, or unawareness of payment status. Additionally, delays or failures in response from loan servicers, as well as situations where borrowers are not properly informed about changes or issues with their loans, contribute to repayment problems.'

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, and the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.


*Example query: a very specific Oracle database error message, such as “ORA‑01722: invalid number”.*

---

#### Why BM25 is the better retriever here

| Factor | BM25 | Dense embeddings |
|--------|------|-----------------|
| **Exact‐token overlap** | Scores the document that contains the *exact* string “ORA‑01722” (plus the colon and phrase) highest, because all rare tokens match. | Sub‑word tokenisers may split “ORA‑01722” into fragments such as `["ORA", "-", "017", "22"]` (an illustrative split); those pieces carry little semantic meaning on their own, so the vector may not be strongly distinctive. |
| **Rarity of terms** | The error code is extremely rare in the corpus → BM25’s *inverse‑document‑frequency* boost makes any doc that contains it shoot to the top. | The rare code contributes only a handful of sub‑word fragments to the final vector, so its signal is easily diluted by the rest of the text. |
| **Need for verbatim context** | Troubleshooting requires the *exact* error text to locate the relevant fix or SQL snippet. | A semantic match like “numeric conversion failed” might be *similar* but could surface docs for a **different** Oracle error, leading to wrong instructions. |
| **Zero‑shot generalisation not helpful** | The user isn’t asking for conceptual advice—they need the precise solution that matches the code. | Embeddings shine when paraphrase or broader meaning matters, not for pinpoint code/look‑up queries. |

**Result**  
With BM25, the top result is likely to be the official Oracle docs or a Stack Overflow post whose title begins with *“ORA‑01722: invalid number – how to fix”*. Embedding‑only retrieval can bury that exact match behind semantically “similar” errors, making the user scroll or fail to retrieve the right chunk.

*In short, when the query hinges on a rare, literal token (error codes, part numbers, legal citations, etc.), lexical BM25 retrieval outperforms purely semantic vectors.*


## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [20]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
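The retrieve-then-compress idea can be sketched without any API calls. In the toy example below, `toy_relevance` is a hypothetical stand-in for a reranking model (it only counts shared words, which is far cruder than Cohere's Rerank), and the candidate strings are invented; the point is just the two-stage shape: score every candidate against the query, then keep the best `top_n`.

```python
def toy_relevance(query, text):
    """Hypothetical stand-in for a reranking model: fraction of query words found in the text."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    return len(query_terms & text_terms) / len(query_terms)

def rerank(query, candidates, top_n):
    """Score every candidate against the query and keep the top_n highest scoring."""
    ranked = sorted(candidates, key=lambda text: toy_relevance(query, text), reverse=True)
    return ranked[:top_n]

# Pretend these came back from a first-stage retriever with a generous k.
candidates = [
    "my loan was transferred without notice",
    "late payment fees were charged twice",
    "the servicer charged late fees",
    "interest was capitalized during forbearance",
]
compressed = rerank("late payment fees", candidates, top_n=2)
print(compressed)
```

This is also why the base retriever uses a fairly large `k`: the reranker can only promote documents that the first stage actually returned.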

Let's create our chain again, and see how this does!

In [21]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be "Dealing with your lender or servicer," which often involves problems such as incorrect information about the loan, errors in loan balances, misapplied payments, wrongful denials of payment plans, and mishandling of loan data. Many complaints highlight issues like inaccurate balances, lack of communication, unapproved transfers, and violations of rights under privacy laws.'

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, at least one complaint was marked as handled in a timely manner ("Timely response?": "Yes" for both complaints). However, the complaints regarding delayed responses and unresolved issues indicate that some complaints were not handled promptly or fully resolved over an extended period. Specifically, one complaint mentions it has been nearly 18 months without resolution, and another states that it has been over 1 year without response.\n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [24]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons, including:\n\n1. Lack of awareness or understanding about the obligation to repay student loans, especially if they were not informed by financial aid officers about the repayment requirements.\n2. Administrative issues, such as loans being transferred or taken over by different companies (e.g., NelNet or EdFinancial Services) without proper notification, leading to confusion and missed payments.\n3. Difficulties accessing or updating account information due to technical issues or incorrect records.\n4. Limited or no communication from lenders or servicers regarding payment due dates, options for repayment plans, or notifications about changes in the loan servicing.\n5. Financial hardship caused by the accumulation of interest, especially when options like forbearance or deferment allow interest to grow, making the total debt larger over time.\n6. The inability to afford increased payments needed to pay off loans quickly, comp

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [25]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
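Steps 2 and 3 of the multi-query idea (retrieve per query, then keep the unique documents) can be sketched in a few lines. This is an illustration under stated assumptions: the query rewrites are hard-coded rather than generated by an LLM, and `toy_retrieve` is a hypothetical lookup table standing in for a real retriever.

```python
def multi_query_retrieve(queries, retrieve):
    """Run every query through `retrieve` and return the unique documents in first-seen order."""
    seen = set()
    unique_docs = []
    for query in queries:
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical rewrites of one user question, mapped to made-up retrieval results.
toy_index = {
    "why did people fail to repay loans": ["doc_a", "doc_b"],
    "reasons borrowers missed loan payments": ["doc_b", "doc_c"],
    "causes of student loan default": ["doc_c", "doc_d"],
}

def toy_retrieve(query):
    """Stand-in retriever: look the query up in the toy index."""
    return toy_index.get(query, [])

results = multi_query_retrieve(list(toy_index), toy_retrieve)
print(results)
```

No single query surfaces all four documents here, but the union does, which is the recall gain we're after. The trade-off is a larger context and an extra LLM call to generate the rewrites.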

In [26]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [27]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"The most common issues with student loans, based on the complaints provided, include:\n\n- Mismanagement and mishandling of loans, leading to incorrect loan classifications, lost or improper transfer of loans, and inaccurate account information.\n- Problems with dealing with lenders or servicers, such as lack of transparency, poor communication, and unhelpful or misleading guidance.\n- Issues with loan repayment plans, including difficulties applying payments correctly, being steered into inappropriate forbearance or consolidation, and unanticipated interest accumulation.\n- Errors and discrepancies in loan balances, interest calculations, and account status reporting, often impacting borrowers’ credit scores and financial plans.\n- Problems with loan forgiveness, cancellation, or discharge, including mismanagement of applications for income-driven repayment (IDR) and Public Service Loan Forgiveness (PSLF), and wrongful reporting of delinquency.\n- Cases of loan servicer misconduct, s

In [28]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, several complaints explicitly indicate that issues were not handled in a timely manner. For example:\n\n- The complaint received on 03/28/25 involving a student loan application at MOHELA was marked as "Timely response? No," with the consumer indicating they heard nothing despite multiple follow-ups.\n- The complaint received on 04/01/25 involving another MOHELA issue was marked as "Timely response? Yes," but involved delays over 4+ hours on calls with no resolution.\n- The complaint from 04/14/25 about an Auto Pay issue with EdFinancial Services was marked "Timely response? Yes," but the consumer still experienced ongoing issues.\n- Multiple other complaints from April and May 2025 about unresolved issues, delays, or responses not received within acceptable timeframes suggest that some complaints did not get handled promptly.\n\nTherefore, yes, there are complaints indicating that some complaints were not handled in a timely manner.'

In [29]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors including a lack of proper information about repayment options, the accumulation of interest during forbearance or deferment, financial hardships such as unemployment or medical issues, mismanagement by servicers, and systemic issues within the student loan system that made repayment difficult or seemingly unmanageable. Many borrowers were not informed about income-driven repayment plans, loan forgiveness options, or the impact of interest capitalization. Additionally, some faced unexpected or unauthorized transfer of loans, inaccurate reporting, and inadequate communication from lenders or servicers, which further complicated their ability to manage and repay their loans.'

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

### Why *multiple reformulations* lift recall in a RAG pipeline

1. **Covers lexical mismatch**  
   - A single phrasing may use words that never appear in the corpus.  
   - Paraphrases swap in synonyms or domain‑specific jargon, so at least one version shares tokens with the relevant chunk → higher chance of a BM25 / hybrid hit.

2. **Hits diverse semantic neighborhoods**  
   - Dense‑vector models embed each reformulation at a slightly different spot in embedding space.  
   - When you query with all of them and **union** the top‑k results, you sweep a broader radius around the user’s intent, surfacing documents that any one vector might have missed.

3. **Balances granularity**  
   - Long, detailed questions can be broken into *atomic sub‑queries* (“Who?”, “When?”, “Why?”).  
   - Short, ambiguous questions can be expanded into richer versions that add context.  
   Both moves retrieve evidence that matches either the coarse or fine aspects of the user’s need.

4. **Mitigates mistakes in user wording**  
   - Typos, wrong acronyms, or partial product names in the original query get corrected in some reformulations, rescuing what would otherwise be a miss.

5. **Reduces recall variance across corpora**  
   - Different documents describe the same concept in different ways (e.g., “heart attack”, “myocardial infarction”). Multiple rewrites align with multiple author styles.

---

#### Quick example

| Variant | What it targets |
|---------|-----------------|
| “**best iPhone battery life**” | conversational blogs / reviews |
| “**iPhone with highest mAh capacity**” | spec sheets, tech forums |
| “**longest‑lasting iPhone model in 2025**” | news coverage & benchmarks |

Unioning results from all three queries captures both marketing copy and engineering data—boosting recall compared to any single phrasing.

> **Rule of thumb:** More *diverse* reformulations (different vocab, length, focus) → more recall gains; but you still cap the total retrieved passages to stay within the context window.


## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
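Here is a toy, dependency-free sketch of that "search small, return big" flow. It makes some loud simplifications: `split_into_chunks` is a naive fixed-width splitter, and plain substring matching stands in for the vector similarity search, so it only illustrates the bookkeeping between child chunks and their parents. All names and the two sample complaints are invented for illustration.

```python
def split_into_chunks(text, chunk_size):
    """Naive fixed-width splitter (a stand-in for a real text splitter)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def build_indexes(parent_docs, chunk_size):
    """Store full parents in a docstore and their chunks in a separate child index."""
    docstore = {}     # parent_id -> full parent text
    child_index = []  # (child_chunk, parent_id) pairs
    for parent_id, text in enumerate(parent_docs):
        docstore[parent_id] = text
        for chunk in split_into_chunks(text, chunk_size):
            child_index.append((chunk, parent_id))
    return docstore, child_index

def retrieve_parents(query, docstore, child_index):
    """Match the query against child chunks, then return the de-duplicated parents."""
    parent_ids = []
    for chunk, parent_id in child_index:
        # Substring match is a placeholder for a similarity search over embeddings.
        if query.lower() in chunk.lower() and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [docstore[parent_id] for parent_id in parent_ids]

toy_parents = [
    "My servicer misapplied my payment. I called many times and nobody could explain the balance.",
    "My loan was transferred to a new company without any notice.",
]
docstore, child_index = build_indexes(toy_parents, chunk_size=40)
hits = retrieve_parents("misapplied", docstore, child_index)
print(hits)
```

The match happens on a 40-character chunk, but the caller gets the whole complaint back, including the follow-up sentence that the matching chunk alone would have cut off.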

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [30]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension (`1536` for `text-embedding-3-small`); you'll need to change this if you're using a different embedding model.

In [31]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [32]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [33]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [34]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [35]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be related to problems with federal student loan servicing. Specific sub-issues include errors in loan balances, misapplied payments, wrongful denials of payment plans, discrepancies in interest rates, and issues with credit reporting or identity theft protection services. Many complaints highlight systemic breakdowns, errors, and misconduct in loan management and servicing processes.'

In [36]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, several complaints were marked as "No" under the "Timely response?" field, indicating they were not handled in a timely manner. Specifically, one complaint involving MOHELA (Complaint ID: 12709087) received a response, but it was noted as "No" for timely response. \n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [37]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to reasons such as experiencing severe financial hardship, lack of proper information about repayment terms, employment difficulties, and the inability to secure adequate income after graduation. For example, some individuals faced financial hardships that made consistent payments impossible, rely on deferment or forbearance which increase interest, or were misled about the manageability of their loans. Additionally, issues like poor communication from loan servicers, improper reporting, or misinformation about payment obligations also contributed to the failure to repay.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [38]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
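To see how rank fusion combines those lists, here is a minimal sketch of weighted Reciprocal Rank Fusion. It assumes the constant `c=60` suggested in the RRF paper and adds `weight / (rank + c)` for each list a document appears in; treat it as an illustration of the idea rather than the exact `EnsembleRetriever` implementation. The two rankings are made-up toy data.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists: each doc earns weight / (rank + c) for every list it appears in."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from two retrievers for the same query.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Note that only ranks matter, not raw scores, so retrievers with incomparable scoring scales (BM25 vs. cosine similarity) can be fused without any normalization. Documents that rank well in several lists rise to the top.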

In [39]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [40]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"Based on the provided complaints and common themes, the most common issue with loans appears to be mismanagement and poor communication from loan servicers, leading to errors such as incorrect account information, improper reporting to credit bureaus, unnotified transfers or status changes, and difficulties in obtaining accurate information or resolving disputes. Additionally, many borrowers report problems with the handling of their repayment plans, interest accumulation, and lack of transparency, resulting in credit score damage and financial hardship.\n\nIn summary, the most common issue is **servicer misconduct involving miscommunication, incorrect information, and lack of transparency, which adversely affects borrowers' credit and financial stability.**"

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, several complaints indicate that complaints were not handled in a timely manner. Specifically:\n\n- The complaint with Complaint ID \'12935889\' (row 400) explicitly states "Timely response?": "No". The complaint involves a failure to respond within the expected timeframe, leading to unresolved issues and potential credit impact.\n\n- Similarly, Complaint ID \'12668396\' (row 611) also states "Timely response?": "No", indicating the complaint was not addressed promptly.\n\n- Many other complaints mention prolonged or indefinite wait times, repeated escalations, or lack of response over extended periods (often over 30 days), which suggests that some issues went unhandled or were delayed significantly.\n\nTherefore, yes, there are complaints within this dataset that were not handled in a timely manner.'

In [42]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including:\n\n- Lack of proper notification about payment due dates, changes in loan servicers, or transfer of loans, leading to unawareness.\n- Financial hardships such as unemployment, low income, homelessness, or unexpected expenses.\n- Mismanagement or miscommunication by loan servicers, including incorrect or delayed information about account status, delinquency, or payment requirements.\n- High interest accumulation, especially due to forbearance or deferment that allowed interest to grow without principal payments.\n- Confusing or misleading information from loan servicers about payment plans, forgiveness options, or the impact of payments.\n- Technical issues such as errors in online accounts, payment reversals, or incorrect reporting to credit bureaus.\n- Lack of transparency surrounding loan details, interest calculations, or transfer of loans between companies.\n- In some cases, borrowers were misled or not fully in

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which calculates the distance between each pair of consecutive sentences and then splits the text wherever a distance exceeds a given percentile of all distances.
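
The percentile idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the `SemanticChunker` source: the helper names (`cosine_distance`, `percentile`, `percentile_chunks`) are hypothetical, the 2-D vectors stand in for real embeddings, and the default of `pct=95` is chosen only for illustration. The real splitter does additional work (for example around sentence splitting) that is omitted here.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def percentile_chunks(sentences, vectors, pct=95):
    """Split wherever the distance to the next sentence exceeds the percentile."""
    if not sentences:
        return []
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    if not distances:
        return [sentences[0]]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            # Large semantic jump: close the current chunk and start a new one.
            chunks.append(" ".join(current))
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy example: two sentences about one topic, then two about another.
sentences = ["Loans accrue interest.", "Interest compounds.", "Cats purr.", "Cats nap."]
vectors = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
print(percentile_chunks(sentences, vectors, pct=60))
```

Because the threshold is relative to the distances in the document itself, a percentile split always produces some breakpoints once a document has enough sentences, even when all of them are semantically close.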

In [43]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents (here, just the first 20 in `loan_complaint_data`).

In [44]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [45]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [46]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [47]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [48]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the complaints provided, appears to be problems related to the handling and servicing of student loans. This includes issues such as:\n\n- Struggling to repay loans due to problematic forgiveness, cancellation, or discharge processes.\n- Improper or illegal reporting and collection practices, including reporting loans that are no longer valid or legally questionable.\n- Difficulty with communication, transparency, and account management, such as not receiving proper notices about servicer changes or payment requirements.\n- Errors in account status, such as loans being incorrectly reported as in default despite the borrower never defaulting.\n- Disputes over account information, loan amounts, or repayment plans, often with alleged mishandling or miscalculation by servicers.\n\nOverall, these complaints suggest that a significant issue with loans is the mismanagement and mishandling of student loan information by servicers, leading to confusio

In [49]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, all the complaints indicate that responses from the companies were marked as "Closed with explanation" and responses were listed as "Yes" for being timely. There is no explicit information suggesting that any complaints were not handled in a timely manner. Therefore, the answer is that, according to the data shared, no complaints were reported as not being handled in a timely manner.'

In [50]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including issues with loan servicing, problems with payment processing, lack of transparency or bad information from lenders, and legal or administrative disputes. Some specific examples from the complaints include:\n\n- Receiving incorrect or unclear information about their loan status or repayment terms.\n- Difficulties logging in or managing their accounts with loan servicers.\n- Loan servicers stalling or providing incomplete documentation, making it hard for borrowers to verify their loan details or qualify for forgiveness.\n- Errors in reporting loan status leading to defaults or negative credit impacts.\n- Administrative issues like unverified or illegal collection practices, or disputes over whether loans are still valid following legal changes or department closures.\n\nIn summary, failures to pay back loans often stem from administrative errors, lack of clear communication, or legal complications, rather than the bor

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

#### Answer to Question #3: Semantic chunking with short, repetitive sentences (e.g., FAQs)

| Issue you’ll observe | Why it happens | How to adjust the chunking algorithm |
|----------------------|----------------|--------------------------------------|
| **Explosive number of nearly‑identical chunks** | Each FAQ line (“Q: … A: …”) is only a few dozen tokens, so default rules (e.g., 200–500 token target) slice *every* line into its own chunk → creates redundancy and bloats the index. | **Merge by similarity or heading**<br>• Group consecutive Q‑A pairs that share the same topic/keyword.<br>• Set a **minimum chunk length** (e.g., ≥ 100 tokens) so very short sentences are combined. |
| **High vector–space collision** | Repetitive wording (“How do I reset my password?” vs. “How do I change my password?”) yields embeddings that are almost identical; ANN search may treat them as duplicates and skip some useful variants. | **Add “semantic padding”**<br>Include the *section title* or parent category (“Account › Password”) as metadata in the chunk text to diversify the vectors. |
| **Low recall despite many chunks** | Key terms like “refund”, “shipping” appear in dozens of small chunks; retriever returns top‑k duplicates about the *same* answer, crowding out other FAQs. | **Post‑split deduplication & diversity filter**<br>• After embedding, cluster chunks at cosine ≥ 0.9 and keep only one per cluster.<br>• During retrieval, apply Maximal Marginal Relevance (MMR) so top‑k results cover different questions. |
| **Boundary context loss** | A short answer might reference the question implicitly (“Yes, you can.”) and make little sense alone. | **Concatenate Q + A** so the question always travels with its answer, or, when using a size-based splitter such as `RecursiveCharacterTextSplitter`, increase `chunk_overlap`. |
| **Evaluation metrics plateau** | Because queries often match the *exact* wording present, any chunk containing the same sentence will score the same, making it hard to see improvements. | **Switch to query‑synth plus rerank**<br>Generate paraphrased eval questions; then a reranker (e.g., ColBERT or an LLM) can distinguish which chunk best satisfies the *intent* rather than a verbatim match. |
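
The diversity filter mentioned in the table can be illustrated with a small pure-Python sketch of Maximal Marginal Relevance. It is a simplified illustration, not a library implementation: the function names, the toy 2-D vectors, and the default `lambda_mult=0.5` are assumptions for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedily pick k document indices, trading relevance against redundancy.

    lambda_mult=1.0 ranks purely by relevance to the query;
    lower values penalize documents similar to ones already selected.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))

    def score(i):
        relevance = cosine(query_vec, doc_vecs[i])
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.1]
docs = [
    [1.0, 0.0],   # near-duplicate of docs[1]
    [1.0, 0.05],  # most relevant to the query
    [0.5, 0.5],   # less relevant, but covers something different
]
print(mmr(query, docs, k=2, lambda_mult=1.0))  # [1, 0]: pure relevance keeps both duplicates
print(mmr(query, docs, k=2, lambda_mult=0.5))  # [1, 2]: the diverse document replaces the duplicate
```

In LangChain, a comparable effect is available on vector store retrievers by passing `search_type="mmr"` to `as_retriever`.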

> **Practical recipe**
> 1. **Pre‑group** FAQs by heading.  
> 2. **Combine** each Q with its A into one chunk.  
> 3. **Enforce a minimum size** (e.g., merge consecutive Q‑A pairs until ≥ 80 tokens).  
> 4. **Embed with metadata** (`title`, `category`).  
> 5. **Enable MMR** or reranking to avoid duplicate retrievals.

This balances index size, improves semantic diversity, and ensures each retrieved chunk is self‑contained and meaningful to the LLM.
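
Steps 2 and 3 of the recipe could be sketched as follows. This is a minimal illustration under stated assumptions: `merge_faq_pairs` is a hypothetical helper, tokens are approximated by whitespace-separated words rather than a real tokenizer, and any leftover pairs at the end simply form a final, shorter chunk.

```python
def merge_faq_pairs(pairs, min_tokens=80):
    """Keep each question with its answer and merge pairs until a chunk is big enough.

    pairs: list of (question, answer) tuples, already grouped by heading.
    min_tokens: approximate minimum size, counted in whitespace-separated words.
    """
    chunks, buffer, buffer_tokens = [], [], 0
    for question, answer in pairs:
        # Step 2: the question always travels with its answer.
        text = f"Q: {question}\nA: {answer}"
        buffer.append(text)
        buffer_tokens += len(text.split())
        # Step 3: only emit a chunk once it reaches the minimum size.
        if buffer_tokens >= min_tokens:
            chunks.append("\n\n".join(buffer))
            buffer, buffer_tokens = [], 0
    if buffer:
        # Leftover pairs that never reached the minimum become a final chunk.
        chunks.append("\n\n".join(buffer))
    return chunks


faq = [
    ("How do I reset my password?", "Use the reset link on the login page."),
    ("How do I change my password?", "Open account settings and choose a new one."),
    ("Can I get a refund?", "Yes, within 30 days of purchase."),
]
for chunk in merge_faq_pairs(faq, min_tokens=25):
    print(chunk)
    print("---")
```

Each resulting chunk could then be embedded together with its `title` and `category` metadata, as described in step 4.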


# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.