# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic Chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few LangChain packages, including the OpenAI integration (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and the Cohere integration (for our [Reranker](https://cohere.com/rerank)).

We'll also need to provide our OpenAI and Cohere API keys.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [3]:
from langchain_community.document_loaders.csv_loader import CSVLoader

loader = CSVLoader(
    file_path="./data/complaints.csv",
    # Store every column as metadata on each Document
    metadata_columns=[
        "Date received",
        "Product",
        "Sub-product",
        "Issue",
        "Sub-issue",
        "Consumer complaint narrative",
        "Company public response",
        "Company",
        "State",
        "ZIP code",
        "Tags",
        "Consumer consent provided?",
        "Submitted via",
        "Date sent to company",
        "Company response to consumer",
        "Timely response?",
        "Consumer disputed?",
        "Complaint ID",
    ],
)

loan_complaint_data = loader.load()

# Use the complaint narrative as the text that gets embedded and searched
for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [4]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [5]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough candidate documents for our reranking step later.
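To make the idea concrete, here is a brute-force sketch of cosine-similarity top-`k` search in NumPy. It is an illustration, not what Qdrant actually runs (Qdrant searches an index rather than scanning every vector); the `top_k_cosine` helper and the toy 2-D vectors are hypothetical stand-ins for real `text-embedding-3-small` vectors.

```python
import numpy as np

def top_k_cosine(query: np.ndarray, doc_vectors: np.ndarray, k: int = 10) -> list[int]:
    """Return the indices of the k document vectors most cosine-similar to the query."""
    # Normalize to unit length so that a dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    d = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    scores = d @ q
    # Sort by descending similarity and keep the first k indices
    return np.argsort(-scores)[:k].tolist()

# Hypothetical 2-D "embeddings" for three documents
doc_vectors = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(top_k_cosine(np.array([1.0, 0.1]), doc_vectors, k=2))  # [0, 2]
```

Normalizing first means the ranking depends only on the angle between vectors, not their length.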

In [6]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [7]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [8]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [9]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # RunnablePassthrough.assign passes every key through unchanged and (re)sets "context"
    #              to the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual strengths of each retrieval strategy - you'd be correct!

In [10]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be mishandling by loan servicers, including errors in loan balances, misapplied payments, improper transfer or sale of loans without proper notification, and faulty or confusing information about account status and repayment terms. A significant theme is the frustration with how payments are applied, inaccurate reporting to credit bureaus, and lack of transparency or communication from loan servicers.'

In [11]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, there were complaints that did not get handled in a timely manner. For example, the complaint submitted to MOHELA on 03/28/25 was marked as "Not timely" response, indicating that the company failed to respond within the expected timeframe. Similarly, a complaint to Maximus Federal Services on 04/14/25 was handled "within" the expected window. Overall, at least one complaint was not handled in a timely manner.'

In [12]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"Based on the provided information, people failed to pay back their loans mainly because of a combination of factors, including:\n\n1. **Accumulating interest during forbearance or deferment:** Borrowers often lowered or paused payments but interest continued to accrue, increasing the total amount owed and making it difficult to pay off the principal later.\n\n2. **Inadequate or confusing communication from loan servicers:** Many borrowers reported not being properly notified about payment resumption, loan transfer, or changes in repayment status, leading to missed payments or delinquency.\n\n3. **Financial hardships and stagnant wages:** Many borrowers faced economic challenges, making it difficult to increase payments or manage existing debt, especially when repayment options would extend the loan term and increase overall costs.\n\n4. **Complex and restrictive repayment options:** Some found that available options like income-driven repayment or loan forgiveness programs were inacce

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

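In essence, it's a way to compare how similar two pieces of text are based on the words they share: a term counts for more the more often it appears in a document (with diminishing returns), rare terms across the corpus count for more than common ones, and scores are normalized by document length so long documents don't win just by being long.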
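The scoring behind this can be sketched in plain Python. This is a minimal Okapi BM25 implementation using the common defaults `k1 = 1.5` and `b = 0.75` and the "+1" IDF variant that keeps scores non-negative; the lowercase-and-split tokenizer, the `bm25_scores` helper, and the tiny corpus are assumptions for illustration, and `BM25Retriever`'s own scoring may differ in details.

```python
import math
from collections import Counter

def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in `corpus` against `query` with Okapi BM25."""
    # Assumed tokenizer for this sketch: lowercase, then split on whitespace
    docs = [text.lower().split() for text in corpus]
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Rare terms get a higher IDF ("+1" variant keeps it non-negative)
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1; b controls the penalty for long documents
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores

corpus = [
    "my fafsa deadline was missed",
    "payments were misapplied by the servicer",
    "the servicer lost my payments",
]
# Only the first document contains "fafsa" or "deadline", so only it scores above zero
print(bm25_scores("fafsa deadline", corpus))
```

Note that the second and third documents both contain "servicer" and "payments", but the shorter third document scores higher for that query because of the length normalization.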

This retriever is very straightforward to set up! Let's see it in action below!


In [13]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [14]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [15]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, one of the most common issues with loans appears to be problems related to dealing with lenders or servicers. Specific sub-issues include disputes over fees charged, difficulties in applying payments correctly, receiving bad or unclear information about loan balances or terms, and issues with loan repayment calculations. These problems often involve miscommunication, lack of transparency, or predatory practices by loan servicers.'

In [16]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that at least one complaint was handled in a timely manner, as indicated by the "Timely response?" field being marked "Yes" for multiple complaints. However, there are also complaints where the individual reports ongoing issues and indicates delays or insufficient responses. Specifically, there is mention of repeated issues over several years and frustration with response times, but the records show that many responses were marked as "Closed with explanation" and responded to within the legal or organizational deadlines.\n\nTherefore, the answer is: Yes, some complaints did get handled in a timely manner.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including issues with payment plans, miscommunications, and disputes with loan servicers. Some specific reasons include:\n\n1. Problems with repayment plans, such as being steered into the wrong types of forbearances or having their autopayments discontinued without proper notification.\n2. Lack of communication from the loan servicer, leading borrowers to be unaware of their payment status or changes, resulting in missed or late payments.\n3. Technical issues with payment processing, such as payments being reversed repeatedly or not being correctly credited due to errors or misunderstandings.\n4. Disputes over loan transfers and the management of automatic payments, especially when borrowers are unaware of transfers to new servicers.\n5. Failure of loan servicers to respond to requests for deferment or forbearance, leading to continued billing and financial hardship.\n6. Borrowers experiencing confusion or frustration because

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILER: we do, and the second half of the notebook covers it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

#### ✅ Answer

BM25 is likely to perform better than embeddings when searching for specific keywords or rare terms that appear verbatim in the complaints. For example, for the query "FAFSA deadline", BM25 will strongly favor documents that contain those exact terms, while an embedding model may also surface documents that are only loosely related in meaning. BM25 is based on term FREQUENCY, not SEMANTICS. I found this video very helpful for explaining the difference: https://www.youtube.com/watch?v=3FbJOKhLv9M.



## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
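The retrieve-then-rerank idea can be sketched as follows. A real reranker like Cohere Rerank is a learned model that scores each (query, document) pair jointly; here the hypothetical `overlap_score` function (fraction of query words found in the document) stands in for it purely so the example runs offline. The `rerank` helper and its `top_n` argument are illustrative names for this sketch, not LangChain's implementation.

```python
from typing import Callable

def rerank(query: str, candidates: list[str], score_fn: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    """Re-score each (query, document) pair and keep only the top_n candidates."""
    # The reranker looks at the query and each document together
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    # Stable sort: ties keep their first-stage order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]

def overlap_score(query: str, doc: str) -> float:
    # Hypothetical stand-in for a learned reranker: fraction of query words present in the doc
    q_terms = set(query.lower().split())
    return len(q_terms & set(doc.lower().split())) / len(q_terms)

# Pretend these came back from the first-stage retriever
candidates = ["late fee charged", "loan payment was late", "interest rate question", "late loan payment fee"]
print(rerank("late loan payment", candidates, overlap_score, top_n=2))
# ['loan payment was late', 'late loan payment fee']
```

The first stage only needs good recall; the reranker then spends more effort on a handful of candidates to improve precision.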

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [18]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [19]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [20]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to dealing with loan servicers, including receiving bad or incorrect information about the loan, errors in loan balances, misapplied payments, wrongful denials of payment plans, and mishandling or mishandling of loan data. Many complaints highlight issues such as lack of communication, inaccurate information, unauthorized transfers, privacy violations, and difficulties in resolving disputes with loan servicers.'

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, there are complaints that have not been handled in a timely manner. Specifically, one complaint regarding the student loan account review has been open for over 18 months without resolution, and the complainant has not received a response despite multiple requests over that period.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People have failed to pay back their loans for several reasons, including:\n\n1. Lack of Awareness and Information: Some borrowers were not informed or aware that they would need to repay their financial aid or student loans, especially if they did not receive clear communication from financial aid officers about repayment obligations.\n\n2. Issues with Loan Management and Transfers: Borrowers experienced problems such as loans being transferred between servicers without their knowledge or consent, and difficulty accessing or understanding their online account information and balances.\n\n3. Accumulation of Interest and Unmanageable Payments: Many borrowers found that interest continued to accrue even when they entered forbearance or deferment, making it difficult to pay down the principal. Lowering payments often resulted in interest compounding, extending the loan payoff period and increasing total debt.\n\n4. Insufficient or Poor Communication: Borrowers reported not receiving noti

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and using an LLM to generate `n` alternative versions of it.
2. Retrieving documents for each query.
3. Using the unique union of all retrieved documents as context.
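Steps 2 and 3 amount to a de-duplicated union over the results for each query, which can be sketched as below. In `MultiQueryRetriever` an LLM writes the query variants; here they are hard-coded, and `toy_retrieve` is a hypothetical keyword retriever standing in for `naive_retriever`.

```python
from typing import Callable

def multi_query_retrieve(queries: list[str], retrieve: Callable[[str], list[str]]) -> list[str]:
    """Retrieve for every query and return the unique union of results, in first-seen order."""
    seen: set[str] = set()
    results: list[str] = []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                results.append(doc)
    return results

corpus = ["servicer misapplied my payment", "loan transferred without notice", "payment posted late"]

def toy_retrieve(query: str) -> list[str]:
    # Hypothetical retriever: return documents sharing at least one word with the query
    words = set(query.lower().split())
    return [doc for doc in corpus if words & set(doc.split())]

# In MultiQueryRetriever an LLM would generate these variants
variants = ["payment problems", "loan transferred", "servicer errors"]
print(multi_query_retrieve(variants, toy_retrieve))
```

A single query such as "payment problems" finds only two of the three documents here; the extra variants pull in the third, which is the recall gain the technique is after.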

So, how hard is it to set up? Not bad! Let's see it below!



In [23]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [24]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [25]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"Based on the provided context, the most common issues with loans, particularly student loans, involve problems with loan servicing and mismanagement. Specifically, the frequent issues include:\n\n- Trouble with how payments are being handled or applied, often leading to delayed or incorrect payments and difficulties in making extra payments toward principal.\n- Receiving bad or inaccurate information about loan balances, interest, or repayment options.\n- Problems with loan transfers, such as unauthorized or unnotified transfers between servicers.\n- Errors in reported account status, such as incorrect delinquency or balance information affecting credit scores.\n- Failure to provide legally required documentation like original Master Promissory Notes (MPNs).\n- Unsatisfactory customer service, including unhelpful or dismissive interactions and lack of support in managing repayment difficulties.\n- Alleged predatory practices like forbearance steering, coercive consolidation, and incor

In [26]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, some complaints indicate that complaints did not get handled in a timely manner. Specifically, there are instances where:\n\n- The company response was marked as "Closed with explanation," and in some cases, the responses were categorized as "timely response" but still did not resolve the issues (e.g., pages with complaints received in April 2025 where the consumer indicated lack of response over extended periods).\n- Multiple complaints state that the complainants have been waiting over a year (and in some cases nearly 18 months or more) without resolution despite follow-ups and repeated efforts.\n- Several complaints mention that the issue remains unresolved despite the consumer repeatedly reaching out over months or years.\n- In at least one case (page with complaint received 04/14/25, complaint ID 13056764), the response was "Closed with explanation," yet the complainant reports ongoing issues and no resolution.\n\nTherefore, it appears that 

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People often fail to pay back their loans due to a variety of systemic and servicer-related issues, including:\n\n- Misinformation or lack of proper communication from loan servicers about repayment options, leading borrowers to remain unaware of income-driven repayment plans, loan rehabilitation, or forgiveness programs.\n- Being steered into long-term forbearances or consolidations without being adequately informed of the consequences, such as interest capitalization, increased balances, and reset forgiveness periods.\n- Errors in loan balance reporting, improper account handling, or mismanagement that create confusion and hinder timely repayment.\n- Difficulty applying extra payments toward principal or reducing debt more quickly due to servicer policies, which can be interpreted as predatory or unhelpful.\n- Lack of transparency and documentation, errors in loan attribution, or transfer of loans without proper notice, causing miscalculations and credit reporting issues.\n- Insuffi

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

#### ✅ Answer:
Generating multiple reformulations of a user query improves recall in a way similar to how image generators expand user prompts. Just as an image generator might take a simple prompt like "dog" and expand it to "a golden retriever puppy sitting in a grassy field on a sunny day" to get more specific and relevant results, the MultiQueryRetriever takes a basic query and generates multiple variations that capture different aspects and phrasings of the same information need.

This improves recall because:
1. Different phrasings may match different relevant documents that use varying terminology
2. Multiple queries explore different semantic angles of the same question
3. By casting a wider net with related queries, we're more likely to catch relevant documents that might be missed by a single query
4. The LLM can add helpful context and specifications, just like how image generators flesh out scene details

For example, a query about "loan issues" might be expanded to:

- "What are common problems people face with their loan servicers?"
- "What types of complaints do borrowers report about loan payments?"
- "What difficulties do people encounter when dealing with student loan companies?"

Each variation increases the chance of matching relevant documents in the corpus.



## Task 8: Parent Document Retriever

The Parent Document Retriever follows a "small-to-big" strategy:

1. Each un-split "document" will be designated as a "parent document". (You could also use larger chunks of a document as parents, but our data format lets us treat each whole complaint as the parent chunk.)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
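A minimal sketch of the small-to-big lookup, assuming word-based child chunks (a simplification of `RecursiveCharacterTextSplitter`) and a hypothetical word-overlap score in place of embedding similarity. The helpers `word_chunks`, `overlap`, and `small_to_big` are illustrative only. Children are ranked, then mapped back to their parents, with duplicate parents removed in rank order.

```python
from typing import Callable

def word_chunks(text: str, size: int) -> list[str]:
    # Simplified child splitter (assumption): fixed-size runs of words
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def overlap(query: str, chunk: str) -> int:
    # Hypothetical relevance score standing in for embedding similarity
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def small_to_big(
    query: str,
    parents: dict[str, str],
    score_fn: Callable[[str, str], int],
    child_size: int = 6,
    k: int = 2,
) -> list[str]:
    # Index: every child chunk remembers the ID of the parent it came from
    children = [(chunk, pid) for pid, text in parents.items() for chunk in word_chunks(text, child_size)]
    # Search over the small chunks only (stable sort keeps index order on ties)
    ranked = sorted(children, key=lambda pair: score_fn(query, pair[0]), reverse=True)
    # Return the parents of the top-k children, de-duplicated in rank order
    parent_ids: list[str] = []
    for _, pid in ranked[:k]:
        if pid not in parent_ids:
            parent_ids.append(pid)
    return [parents[pid] for pid in parent_ids]

parents = {
    "c1": "I applied for income driven repayment and the servicer never processed my application",
    "c2": "my loan was transferred to a new servicer and nobody told me about the transfer",
}
# Both top-2 children belong to "c2", so only that one full complaint is returned
print(small_to_big("nobody told me about the transfer", parents, overlap, k=2))
```

The search precision comes from the small chunks, while the LLM still receives the complete complaint.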

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [28]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [29]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [30]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [31]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [32]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [33]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be related to problems with federal student loan servicing. Specific issues include errors in loan balances, misapplied payments, wrongful denials of payment plans, and misconduct by servicers such as providing bad or incorrect information about loans, errors in balances, and collection problems. Additionally, issues with loan reporting, unfair practices, and disputes over interest rates and terms are frequent concerns.'

In [34]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, it appears that several complaints were not handled in a timely manner. Specifically, in the cases involving Mohela (complaint IDs 12709087 and 12935889), the responses were marked as "No" for timely response, indicating delays in handling these complaints. Additionally, in the complaint involving Aidvantage, the response was marked as "Yes" for timely response, suggesting it was handled promptly. The complaint about credit bureau disputes (complaint ID 13205525) was responded to in a timely manner as well.\n\nTherefore, yes, some complaints did not get handled in a timely manner, specifically those related to Mohela.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans due to various reasons such as financial hardship, lack of proper information, or mismanagement. For example, some borrowers experience severe financial difficulties after graduation, relying on deferment or forbearance, which can increase the overall debt due to accrued interest. Others may be misinformed or unaware of when their payments are expected to start, especially if loan servicers do not communicate payment schedules clearly or fail to notify them about changes in their loan status. In some cases, inability to secure employment or adverse personal circumstances also contribute to their difficulty in repaying loans.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the question about performance more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
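Weighted Reciprocal Rank Fusion gives each document the score `sum over lists of w_i / (c + rank_i)`, counting only the lists it appears in. Because it uses ranks rather than raw scores, BM25 scores and cosine similarities never have to be put on the same scale. The sketch below uses `c = 60`, the constant suggested in the RRF paper; the `weighted_rrf` helper and the document IDs are illustrative, and `EnsembleRetriever`'s internals may differ in details.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs with weighted Reciprocal Rank Fusion."""
    if len(ranked_lists) != len(weights):
        raise ValueError("need exactly one weight per ranked list")
    scores: dict[str, float] = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs, start=1):
            # Each list contributes weight / (c + rank); top ranks contribute the most
            scores[doc] += weight / (c + rank)
    # Highest fused score first (sorted() is stable, so ties keep first-seen order)
    return sorted(scores, key=scores.__getitem__, reverse=True)

bm25_ranking = ["a", "b", "c"]
dense_ranking = ["b", "d", "a"]
print(weighted_rrf([bm25_ranking, dense_ranking], [0.5, 0.5]))  # ['b', 'a', 'd', 'c']
```

Document `b` wins because it sits near the top of both lists, even though it is first in only one of them. The notebook's `equal_weighting` applies the same idea with five retrievers at weight 1/5 each.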

In [37]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [38]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [39]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints and data, the most common issues with loans appear to be:\n\n1. **Dealing with lenders or servicers**, including problems with how payments are being handled, incorrect or bad information about loans, and difficulty communicating effectively.\n2. **Errors or discrepancies in loan account information**, such as incorrect balance reporting, missing or misreported payment history, or inaccurate account status updates.\n3. **Problems with loan management practices**, including improper transfer of loans, lack of transparency, mismanagement during consolidation, and issues with loan servicing companies.\n4. **Problems related to credit reporting**, such as inaccurate credit reports showing late payments or increased balances without explanation.\n5. **Struggles to understand or access loan terms**, including difficulty in obtaining clear loan information, interest calculations, or proper documentation.\n6. **Issues with repayment plans**, including problems

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, several complaints indicate that complaints did not get handled in a timely manner. Specifically:\n\n- Complaint ID 12935889 (Maximus Federal Services, NJ): The response was marked as "No" for timely response.\n- Complaint ID 12668396 (MOHELA, NJ): The response was marked "No" for timely response.\n- Complaint ID 12739706 (MOHELA, NJ): The response was marked "No" for timely response.\n- Complaint ID 12650717 (MOHELA, NJ): The response was marked "No" for timely response.\n\nAdditionally, multiple other complaints mention delays in responses, failed follow-ups, or prolonged wait times, suggesting that a number of complaints were not handled in a timely manner.'

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors highlighted in the complaints:\n\n1. **Unfavorable repayment options and interest accumulation:** Borrowers were often limited to options like forbearance or deferment, which allowed interest to continue to accrue, making the total payoff amount grow over time and prolonging the repayment period.\n\n2. **Misleading or lack of clear information:** Many borrowers reported not being properly informed about interest capitalization, repayment options, loan transfers, or the true extent of their debt, leading to unawareness of their obligations or the growth of their balances.\n\n3. **Financial hardship and unrealistic expectations:** Borrowers took loans expecting to repay them easily after graduation, but stagnant wages, economic downturns, job instability, or health issues prevented them from making payments.\n\n4. **Servicer misconduct and poor communication:** Complaints include lack of notifications, misma

## Task 10: Semantic Chunking

While this is not a retrieval method, it *is* an effective way of improving retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences by their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
   - `percentile`
   - `standard_deviation`
   - `interquartile`
   - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which calculates the distance between every pair of adjacent sentences and then splits the sequence wherever a distance exceeds a given percentile (the 95th by default) of all those distances.
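
The `percentile` rule can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not `SemanticChunker`'s actual implementation: it assumes the sentences are already split and embedded (the toy 2-D vectors below stand in for real embeddings), and it skips the step where the LangChain class combines each sentence with its neighbours before embedding (controlled by its `buffer_size` argument). The helper name `percentile_chunks` is hypothetical.

```python
import numpy as np


def percentile_chunks(sentences, embeddings, percentile=95.0):
    """Split consecutive sentences wherever the cosine distance to the next
    sentence exceeds the given percentile of all neighbour distances.
    Hypothetical helper; not LangChain's SemanticChunker implementation."""
    if len(sentences) < 2:
        return [" ".join(sentences)]
    vectors = np.asarray(embeddings, dtype=float)
    # Normalise rows so that a dot product equals cosine similarity
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    # Cosine distance between each sentence and the one that follows it
    distances = 1.0 - np.sum(unit[:-1] * unit[1:], axis=1)
    threshold = np.percentile(distances, percentile)

    chunks, current = [], [sentences[0]]
    for i, distance in enumerate(distances):
        if distance > threshold:
            # Large semantic jump: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks


sentences = [
    "My loan payment was misapplied.",
    "The servicer then reported it as late.",
    "Now my credit score has dropped.",
    "Separately, I moved to a new state.",
    "My address change was never recorded.",
]
# Toy 2-D vectors standing in for real sentence embeddings
toy_embeddings = [[1, 0], [1, 0.1], [1, 0.2], [0, 1], [0.1, 1]]
for chunk in percentile_chunks(sentences, toy_embeddings):
    print(chunk)
# Prints two chunks: the first three sentences, then the last two
```

Because the threshold is computed from the corpus's own distances, roughly the largest 5% of gaps become breakpoints no matter how similar the text is overall; `SemanticChunker` exposes `breakpoint_threshold_amount` to change that percentile.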

In [42]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [43]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [44]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [45]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally, we can create our classic chain!

In [46]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [47]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided data, the most common issue with loans appears to be related to problems with loan servicing and communication. Many complaints involve difficulties in accurate reporting, understanding loan balances, payment processing, and administrative errors such as incorrect repayment amounts, delayed or missing notifications about servicer changes, and issues with auto-debit setups. While specific issues like loan repayment struggles and improper reporting are prominent, overall, administrative and communication-related problems with loan servicers seem to be the most frequent issues reported.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, there are multiple complaints indicating that complaints were not handled in a timely manner. Specifically, several complaints mention that the companies did not respond promptly or at all:\n\n- One complaint states that despite multiple letters and acknowledgment of receipt, Nelnet never responded to the complainant\'s letters, indicating a lack of timely handling.\n- Other complaints mention issues with delays or lack of response from companies like Maximus Federal Services and EdFinancial Services, although these are noted as responses being "Closed with explanation," which might suggest the matter was resolved or closed without timely communication.\n\nTherefore, it appears that several complaints were not handled promptly or in a timely manner.'

In [49]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including issues related to mismanagement, lack of transparency, and legal complications. Specific examples from the provided data include:\n\n- Difficulties in understanding or receiving proper information about their loan status or repayment obligations, leading to unintentional default.\n- Problems with loan servicing processes, such as missing or unrecorded payments, technical errors, or miscommunication from servicers like Navient or Nelnet, which can result in being incorrectly reported as delinquent or in default.\n- Disputes over the legitimacy of debt, especially when loans are reported as legally void or when there are breaches of privacy and legal violations by loan servicers or the Department of Education, causing confusion and hindering repayment efforts.\n- In some cases, borrowers reported that their efforts to provide documentation or resolve issues were met with stall tactics or unhelpful responses, leading to

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

##### ✅ Answer:
With short, repetitive sentences like FAQs, semantic chunking may face several challenges:

1. Overlapping Embeddings: Similar/repetitive sentences will have very similar embeddings, making it harder to distinguish between chunks and potentially leading to redundant retrievals

2. Loss of Context: Short sentences may lack sufficient context for the embedding model to capture meaningful semantic relationships

3. Inefficient Chunking: The algorithm may create many small chunks that don't effectively group related content

To address these issues, you could:

- Increase chunk size to capture more context around each FAQ
- Use custom chunking rules that keep question-answer pairs together
- Add metadata/tags to help distinguish between similar FAQs
- Consider alternative chunking strategies, like combining related FAQs into topical groups
- Use hybrid retrieval approaches that combine semantic and keyword matching
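
As a minimal sketch of the second suggestion, a rule-based splitter can keep each question together with its answer before anything is embedded. It assumes a hypothetical FAQ format where every entry starts on a line beginning with `Q:` and its answer with `A:`; the function name `split_faq` and the sample FAQ text are made up for illustration and are not a LangChain API.

```python
import re


def split_faq(text):
    """Return one chunk per question-answer pair.
    Assumes a hypothetical format where each entry starts with a line beginning 'Q:'."""
    # Zero-width split just before every line that starts a new question
    entries = re.split(r"(?m)^(?=Q:)", text)
    # Trim whitespace and drop the empty piece produced when text begins with 'Q:'
    return [entry.strip() for entry in entries if entry.strip()]


# Made-up FAQ text in the assumed format
faq = """Q: How do I change my payment due date?
A: Contact your loan servicer.
Q: How do I update my mailing address?
A: Update it in your online account.
"""
for chunk in split_faq(faq):
    print(chunk, end="\n---\n")
```

Each chunk could then be wrapped in a LangChain `Document` and embedded as-is, so near-duplicate questions are never separated from their own answers; a topic tag in the `Document`'s `metadata` would cover the third suggestion.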


# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a short paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
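
LangSmith reports latency per run, but a quick local measurement can also help when comparing retrievers side by side. The sketch below is an illustrative helper, not part of LangSmith: `measure_latency` (a hypothetical name) times any callable with `time.perf_counter`, which is intended for measuring short intervals, and returns the mean and an approximate 95th-percentile latency. The `time.sleep` call is only a stand-in for a real retriever.

```python
import statistics
import time


def measure_latency(fn, inputs):
    """Call fn once per input; return (mean, approx. p95) wall-clock latency in seconds."""
    timings = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        timings.append(time.perf_counter() - start)
    mean = statistics.mean(timings)
    if len(timings) < 2:
        # quantiles() needs at least two data points on older Python versions
        return mean, timings[0]
    # With n=20 there are 19 cut points; the last one is the 95th percentile.
    # method="inclusive" keeps the estimate within the observed min..max range.
    p95 = statistics.quantiles(timings, n=20, method="inclusive")[-1]
    return mean, p95


# Stand-in for a retriever call that takes about 10 ms
mean_s, p95_s = measure_latency(lambda question: time.sleep(0.01), ["q1", "q2", "q3"])
print(f"mean={mean_s:.3f}s  p95={p95_s:.3f}s")
```

In this notebook the callable would be a retriever's `invoke` method, e.g. `measure_latency(naive_retriever.invoke, questions)`, where `questions` is a placeholder for a list of question strings.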

In [51]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyMuPDFLoader

# Load every PDF in data/; PyMuPDFLoader yields one Document per page by default
path = "data/"
loader = DirectoryLoader(path, glob="*.pdf", loader_cls=PyMuPDFLoader)
docs = loader.load()

In [54]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [57]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(
    llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs[:20], testset_size=10)


In [58]:
dataset.to_pandas().head()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | Howw doo the Pell Grant formulas 1, 2, or 3 ap... | [non-term (includes clock-hour calendars), or ... | In a subscription-based academic calendar, the... | single_hop_specifc_query_synthesizer |
| 1 | Whaat aree thee exseptions to thee normal disb... | [Inclusion of Clinical Work in a Standard Term... | See Volume 8, Chapter 3 for additional guidanc... | single_hop_specifc_query_synthesizer |
| 2 | How are credit-hours related to determining wh... | [Non-Term Characteristics A program that measu... | A program that measures progress in credit-hou... | single_hop_specifc_query_synthesizer |
| 3 | where i find examples in Appendix A? | [both the credit or clock hours and the weeks ... | The principles described above are illustrated... | single_hop_specifc_query_synthesizer |
| 4 | What are the disbursement requirements for fed... | [<1-hop>\n\nboth the credit or clock hours and... | In clock-hour or non-term credit-hour programs... | multi_hop_abstract_query_synthesizer |


In [59]:
# Import required libraries for evaluation
from ragas.metrics import ContextPrecision, ContextRecall, ContextRelevance
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.dataset_schema import SingleTurnSample, EvaluationDataset
import pandas as pd
import time
from langsmith import traceable
from datetime import datetime
import os
import getpass

In [60]:
# Get LangSmith API key
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("LangSmith API Key:")

# Enable LangSmith tracing
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = "retriever-evaluation-max"

print("✅ LangSmith tracing enabled!")
print(f"📊 Project: {os.environ['LANGSMITH_PROJECT']}")
print("🔗 Visit https://smith.langchain.com to view your traces")

✅ LangSmith tracing enabled!
📊 Project: retriever-evaluation-max
🔗 Visit https://smith.langchain.com to view your traces


In [61]:
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-mini"))
evaluator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

# Define RAGAS metrics for retriever evaluation
ragas_metrics = [
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm), 
    ContextRelevance(llm=evaluator_llm)
]

print("RAGAS evaluator setup complete!")

RAGAS evaluator setup complete!
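
Ragas documents `ContextPrecision` as a rank-weighted score: the evaluator LLM (`evaluator_llm` here) first gives each retrieved chunk a relevant/not-relevant verdict against the reference, and the final score averages precision@k over the ranks that hold relevant chunks. The sketch below reproduces only that last arithmetic step from hand-written 0/1 verdicts; it makes no LLM calls, and `context_precision_at_k` is a hypothetical helper rather than a Ragas function.

```python
def context_precision_at_k(verdicts):
    """Rank-weighted precision over binary relevance verdicts (1 = relevant).
    Hypothetical helper that mirrors the formula described in the Ragas docs."""
    relevant_total = sum(verdicts)
    if relevant_total == 0:
        return 0.0
    score, hits = 0.0, 0
    for k, verdict in enumerate(verdicts, start=1):
        hits += verdict
        # precision@k only contributes at ranks that hold a relevant chunk
        score += (hits / k) * verdict
    return score / relevant_total


print(context_precision_at_k([1, 1, 0]))  # relevant chunks ranked first -> 1.0
print(context_precision_at_k([0, 1, 1]))  # same chunks ranked lower -> ~0.583
```

Both orderings contain two relevant chunks out of three, yet the one that ranks them first scores higher. `ContextRecall` works differently: it asks what fraction of the claims in the `reference` answer can be attributed to the retrieved contexts, so retrieving more chunks tends to help recall even when it lowers precision.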


In [62]:
# Convert the generated testset to a DataFrame so each row can become an evaluation sample
test_df = dataset.to_pandas()

# Create evaluation samples from the test dataset
evaluation_samples = []
for idx, row in test_df.iterrows():
    sample = {
        'user_input': row['user_input'],
        'reference_contexts': row['reference_contexts'],
        'reference': row['reference']
    }
    evaluation_samples.append(sample)

print(f"Created {len(evaluation_samples)} evaluation samples")
print("Sample structure:", evaluation_samples[0].keys())

Created 12 evaluation samples
Sample structure: dict_keys(['user_input', 'reference_contexts', 'reference'])


In [64]:
evaluation_samples

[{'user_input': 'Howw doo the Pell Grant formulas 1, 2, or 3 applyy to subscription-based academic calendars?',
  'reference_contexts': ['non-term (includes clock-hour calendars), or subscription-based. In a standard term or nonstandard term academic calendar, a term is generally a period in which all classes are scheduled to begin and end within a set time frame, and academic progress is measured in credit hours. In a non-term academic calendar, classes do not begin and end within a set time frame, such as a term. Academic progress in a non-term program can be measured in either credit or clock hours. In some cases (as discussed below), a program with terms must be treated as a non-term program for Title IV purposes. A subscription-based academic calendar is used only by subscription-based programs. A subscription-based program is a term-based program in which the school charges a student for each term on a subscription basis with the expectation that the student will complete a speci

In [63]:
retrievers_to_evaluate = {
    "naive": naive_retriever,
    "bm25": bm25_retriever, 
    "contextual_compression": compression_retriever,
    "multi_query": multi_query_retriever,
    "parent_document": parent_document_retriever,
    "ensemble": ensemble_retriever,
    "semantic": semantic_retriever
}

print(f"Will evaluate {len(retrievers_to_evaluate)} retrieval methods")

Will evaluate 7 retrieval methods


In [None]:
from langchain_core.callbacks.manager import trace_as_chain_group

@traceable
def evaluate_retriever(retriever, retriever_name, evaluation_samples):
    """Evaluate a single retriever using RAGAS metrics"""
    
    print(f"Evaluating {retriever_name}...")
    start_time = time.time()
    
    # Prepare evaluation data
    ragas_samples = []
    successful_samples = 0
    
    for sample in evaluation_samples:
        try:
            # Retrieve documents for this question inside a named LangSmith chain group
            with trace_as_chain_group(
                f"{retriever_name}_retrieval",
                inputs={"question": sample['user_input']},
            ) as manager:
                # Passing the group's callback manager nests the retriever run under the group
                retrieved_docs = retriever.invoke(sample['user_input'], config={"callbacks": manager})
                retrieved_contexts = [doc.page_content for doc in retrieved_docs]
                
                # Log metadata about the retrieval as the group's outputs
                manager.on_chain_end({
                    "num_docs_retrieved": len(retrieved_docs),
                    "retriever_type": retriever_name
                })
            
            # Create RAGAS sample
            ragas_sample = SingleTurnSample(
                user_input=sample['user_input'],
                retrieved_contexts=retrieved_contexts,
                reference_contexts=sample['reference_contexts'],
                reference=sample['reference']
            )
            ragas_samples.append(ragas_sample)
            successful_samples += 1
            
        except Exception as e:
            print(f"Error processing sample for {retriever_name}: {e}")
            continue
    
    # Calculate timing (averaged over all samples, including failed ones)
    end_time = time.time()
    avg_latency = (end_time - start_time) / len(evaluation_samples)
    
    # Evaluate with RAGAS
    if ragas_samples:
        eval_dataset = EvaluationDataset(samples=ragas_samples)
        with trace_as_chain_group(
            f"{retriever_name}_ragas_evaluation",
            inputs={"retriever_name": retriever_name},
        ) as manager:
            evaluation_results = evaluate(dataset=eval_dataset, metrics=ragas_metrics)
            
            # Mean score per metric: to_pandas() has one numeric column per metric
            mean_scores = evaluation_results.to_pandas().mean(numeric_only=True).to_dict()
            
            # Log the evaluation results to LangSmith
            manager.on_chain_end({
                "retriever_name": retriever_name,
                "metrics": mean_scores,
                "avg_latency_seconds": avg_latency,
                "total_samples": len(evaluation_samples),
                "successful_samples": successful_samples
            })
        
        return {
            'retriever_name': retriever_name,
            'metrics': evaluation_results,
            'avg_latency_seconds': avg_latency,
            'total_samples': len(evaluation_samples),
            'successful_samples': successful_samples
        }
    else:
        return None

print("Evaluation function defined with LangSmith integration!")

Evaluation function defined!


In [None]:
print("Starting retriever evaluation...")
print("="*60)

results = []
for retriever_name, retriever in retrievers_to_evaluate.items():
    try:
        result = evaluate_retriever(retriever, retriever_name, evaluation_samples)
        if result:
            results.append(result)
            print(f"✅ {retriever_name} evaluation completed")
        else:
            print(f"❌ {retriever_name} evaluation failed")
    except Exception as e:
        print(f"❌ Error evaluating {retriever_name}: {e}")
        continue
    # Pause between retrievers (e.g. to stay under API rate limits)
    time.sleep(60)

print(f"\nCompleted evaluation of {len(results)} retrievers")

Starting retriever evaluation...
Evaluating naive...
✅ naive evaluation completed
Evaluating bm25...
✅ bm25 evaluation completed
Evaluating contextual_compression...
✅ contextual_compression evaluation completed
Evaluating multi_query...
