# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [2]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [3]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [4]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [5]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [6]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [7]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
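
To make "cosine similarity, top `k`" concrete, here is a minimal sketch in plain Python. The three-dimensional vectors and the names `cosine_similarity`, `top_k`, `toy_doc_vecs` and `toy_query_vec` are made up for illustration; the actual retriever compares `text-embedding-3-small` vectors inside the vector store, and its internals may differ from this.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]

# Toy 3-dimensional "embeddings" (assumed values, not real model output).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [1.0, 0.2, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)
```

Document 1 ranks ahead of document 0 because it points in almost the same direction as the query, even though document 0 has the larger first component; cosine similarity only cares about direction, not magnitude.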

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [8]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [9]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [10]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : re-assigned to itself with RunnablePassthrough.assign, so "question" and "context"
    #              both pass through unchanged to the next step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [11]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided data, the most common issue with loans, particularly student loans, appears to be related to mishandling and errors by loan servicers. This includes problems such as:\n\n- Dealing with the lender or servicer (e.g., receiving bad information, errors in loan balances, misapplied payments, wrongful denials of payment plans).\n- Incorrect or conflicting information on credit reports and account status (e.g., accounts shown as delinquent or in default when they are not).\n- Challenges with payment application (e.g., unable to apply payments toward principal, payments only applied to interest, or payments being misapplied).\n- Loan transfers and lack of proper notification or transparency during those processes.\n- Disputes over interest, balances, or settlement eligibility.\n- Mishandling of forbearance and deferment options, leading to increased debt or prolonged repayment periods.\n- Privacy violations and improper data handling.\n\nOverall, a prevalent issue is the

In [12]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the information in the provided context, yes, some complaints did not get handled in a timely manner. Specifically, there are complaints with the response status marked as "No" or indicating a delay:\n\n- A complaint received on 03/28/25 (Complaint ID: 12709087) involved a delay in handling, with the consumer stating they had not heard from the company despite multiple follow-ups. The complaint was marked as "Timely response?": "No."\n  \n- Other complaints from the same period involve significant delays or lack of response, indicating that not all complaints were handled promptly.\n\nTherefore, the answer is: Yes, some complaints did not get handled in a timely manner.'

In [13]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People often fail to pay back their loans due to a combination of factors including:\n\n1. **Limited options for manageable repayment:** Many borrowers are only offered forbearance or deferment options, which allow them to temporarily pause payments but continue to accrue interest, making the total debt larger over time and extending the repayment period.\n\n2. **Accumulation of interest and growing debt:** Even when making payments, interest can continue to accumulate, especially if payments are lowered or deferred, resulting in a growing balance that is difficult to pay off.\n\n3. **Financial hardships and stagnating wages:** Borrowers often face economic challenges such as stagnant wages, unemployment, or unexpected expenses that make consistent repayment difficult.\n\n4. **Lack of clear communication and information:** Borrowers report inadequate communication from loan servicers about repayment terms, loan status, interest calculations, or changes in loan servicers, leading to co

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model), which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
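
As a rough illustration of that word-overlap idea, here is a small BM25 scorer in plain Python. This is a sketch under assumptions, not the retriever's actual implementation: the function name `bm25_scores`, the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF formula are all choices made for this example, and the toy complaints are invented.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_counts = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_counts[term]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            # Rare terms get a larger weight than common ones (assumed smoothed IDF).
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Longer-than-average documents are penalised via length normalisation.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Invented example texts, not rows from the dataset.
toy_docs = [
    "my autopay payment was misapplied by the servicer",
    "the servicer never answered my forbearance request",
    "interest kept growing during forbearance",
]
toy_scores = bm25_scores("forbearance request", toy_docs)
print(toy_scores)
```

The first document shares no words with the query and scores zero, no matter how related it might be in meaning. The second document matches both query words, including the rarer "request", so it outranks the third.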

This retriever is very straightforward to set up! Let's see it happen down below!


In [14]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [15]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [16]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with student loan complaints appears to be related to dealing with lenders or servicers, specifically issues like the following:\n\n- Disagreements over fees charged\n- Problems with how payments are being handled, such as applying additional funds or paying off loans more quickly\n- Receiving incorrect or bad information about loan balances, interest, or repayment terms\n- Issues with loan information accuracy and transparency\n\nMany complaints mention miscommunication, lack of proper explanations, or perceived predatory behavior by servicers and lenders.\n\nTherefore, the most common issue with loans, as reflected in these complaints, is difficulties in communication and disputes with loan servicers or lenders regarding fees, payment application, and loan information accuracy.'

In [17]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints mentioned in the context received responses labeled as "Closed with explanation," and the responses are indicated as "Yes" for being timely. Therefore, there is no evidence or indication that any complaints did not get handled in a timely manner.'

In [18]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failing to pay back their loans often do so because they encounter issues such as improper or confusing payment plans, difficulty with automated payments, or miscommunication from loan servicers. For example, some individuals have reported being misled into unsuitable forbearance options, having their payments or autopayments discontinued without proper notification, or facing unexpected charges due to interest capitalization. Additionally, others have experienced their loans being transferred to new companies without their awareness, leading to missed communications and unaddressed repayment issues. These problems can result in missed or reversed payments, legal penalties, and negative impacts on credit scores, especially when loan servicers do not provide clear guidance or timely assistance.'

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do - the second half of the notebook will cover this.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

#### ✅ Answer

BM25 is likely to perform better than embeddings when searching for specific keywords or phrases that appear verbatim in the complaints. For example, when searching for "FAFSA deadline", BM25 might give higher weight to documents that contain this exact phrase. BM25 scores documents by term FREQUENCY, not SEMANTICS. I found this video to be very helpful in explaining the difference: https://www.youtube.com/watch?v=3FbJOKhLv9M.



## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
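
The two steps above can be sketched in a few lines of plain Python. Everything here is a stand-in: `toy_relevance` is a hypothetical word-overlap score used only so the example runs without an API key, and it is not how a learned reranking model scores documents; `rerank_and_compress` and the candidate texts are likewise invented for illustration.

```python
def toy_relevance(query, text):
    """Hypothetical stand-in for a reranker score: fraction of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)

def rerank_and_compress(query, candidates, top_n):
    """Re-score every candidate against the query and keep only the top_n best."""
    ranked = sorted(candidates, key=lambda text: toy_relevance(query, text), reverse=True)
    return ranked[:top_n]

# Pretend these came back from a first-stage retriever with a generous k.
toy_candidates = [
    "my loan was transferred without notice",
    "the servicer gave a late response to my complaint",
    "interest accrued during forbearance",
    "no response to my complaint after many weeks",
]
compressed = rerank_and_compress("late response to complaint", toy_candidates, top_n=2)
print(compressed)
```

The shape is the important part: the first stage is allowed to over-retrieve, and the second stage looks at the query and each candidate together before discarding most of them.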

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [19]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [20]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be dealing with the borrower’s interactions with their lender or servicer, specifically issues related to incorrect or misleading information, mishandling of loans, and lack of proper communication. Many complaints involve errors in loan balances, misapplication of payments, wrongful denials of payment plans, and disputes over loan details, which can lead to confusion and credit issues for borrowers.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, there are indications that some complaints did not get handled in a timely manner. For example, the complaint logged under Complaint ID 12975634 mentions it has been over a year since the initial request and nearly 18 months without resolution, despite the response being marked as "Timely response? Yes." Additionally, in the complaint with Complaint ID 12973003, the issue has persisted for over 2-3 weeks, and the secondary complaint about unpaid payments remains unresolved. \n\nTherefore, yes, some complaints appeared to be delayed or not resolved promptly.'

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for several reasons, based on the provided context:\n\n1. **Lack of Awareness and Information:** Many borrowers were not adequately informed about their obligation to repay loans when taking out financial aid, especially if they were the first in their family to attend college. This led to confusion about repayment responsibilities.\n\n2. **Administrative Issues and Communication Failures:** Borrowers experienced poor communication from loan servicers, including failure to notify them about payment due dates, loan transfers without their knowledge, and lack of proper notifications or updates about their loan status.\n\n3. **Compounding Interest and Payment Difficulties:** Even when borrowers made payments, interest continued to accrue—sometimes outpacing their payments—making it difficult to reduce principal and leading to increasing balances over time.\n\n4. **Limited or Unavailable Payment Options:** Borrowers were often only offered options lik

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context
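
Those three steps can be sketched without any LLM at all. In this minimal example, `fake_generate_queries`, `fake_retrieve` and `FAKE_INDEX` are hypothetical stand-ins for the LLM call and the vector store lookup, and `multi_query_retrieve` is a name chosen for this sketch rather than a library function.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for every reformulated query and return the de-duplicated union."""
    unique_docs = []
    seen = set()
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:  # keep only the first occurrence of each document
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

# Hypothetical stand-ins: a real setup would call an LLM and a vector store here.
def fake_generate_queries(question):
    return [question, "problems with loan servicers", "complaints about loan payments"]

FAKE_INDEX = {
    "loan issues": ["doc_a", "doc_b"],
    "problems with loan servicers": ["doc_b", "doc_c"],
    "complaints about loan payments": ["doc_d", "doc_a"],
}

def fake_retrieve(query):
    return FAKE_INDEX.get(query, [])

merged = multi_query_retrieve("loan issues", fake_generate_queries, fake_retrieve)
print(merged)
```

The original question alone would have surfaced only `doc_a` and `doc_b`; the two reformulations add `doc_c` and `doc_d`. Including the original question among the queries is a choice made in this sketch.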

So, how hard is it to set up? Not bad! Let's see it down below!



In [24]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [25]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [26]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided context, appears to be problems related to **dealing with lenders or servicers**, specifically:\n\n- Mismanagement of loan accounts (errors in balances, misapplied payments)\n- Problems with payment handling (difficulty applying payments correctly, inability to pay principal)\n- Receiving bad or conflicting information about loans\n- Unauthorized account activity or advances in delinquency status\n- Lack of clear communication, transparency, or proper notices\n- Inaccurate reporting of account status to credit bureaus\n- Problems with loan classification or proper handling of deferments and forbearances\n\nIn particular, many complaints focus on poor customer service, inaccurate loan information (including balances and delinquency status), and mishandling of payment plans, income-driven repayment options, or loan forgiveness processes. These issues seem to be the most prevalent and recurring in the complaints data.\n\nSo, the mos

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, some complaints indicate that issues were not handled in a timely manner. Several complaints mention delays, lack of responses, or failure to resolve issues within expected timeframes. For example:\n\n- A complaint from 04/18/25 states that Mohela failed to correct inaccuracies within the required time, and response was delayed beyond the expected window.\n- Multiple complaints mention that the companies did not respond promptly to disputes or inquiries, with some delays exceeding the typical response times.\n\nHowever, some responses were marked as "Timely" (e.g., responses on 04/05/25 and 04/22/25), but even in these cases, the complaints highlight ongoing unresolved issues or delays in resolution.\n\nIn summary, while some organizations responded in a timely manner, many complaints indicate that certain issues were not handled in a timely manner, leading to ongoing frustration for the consumers.'

In [28]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including:\n\n- Administrative errors or mismanagement by loan servicers, such as inaccurate reporting or transfer of loans without proper notification.\n- Lack of proper communication and guidance from servicers regarding payment resumption, delinquency notices, and available repayment options.\n- Excessive accumulation of interest due to forbearance or deferment practices, often leading to balances that grow over time rather than decrease.\n- Coercive practices like forbearance steering, where borrowers are not informed about income-driven repayment or rehabilitation options, resulting in higher balances and difficulty in repayment.\n- Financial hardship or misunderstanding of loan terms, including high interest rates and lack of transparency about repayment plans and loan balance details.\n- Systemic issues and systemic failures in loan servicing, leading to incomplete or incorrect information, negative credit reporting, an

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

#### ✅ Answer
Generating multiple reformulations of a user query improves recall in a similar way to how image generators expand user prompts. Just like how an image generator might take a simple prompt like "dog" and expand it to "a golden retriever puppy sitting in a grassy field on a sunny day" to get more specific and relevant results, the MultiQueryRetriever takes a basic query and generates multiple variations that capture different aspects and phrasings of the same information need.

This improves recall because:
1. Different phrasings may match different relevant documents that use varying terminology
2. Multiple queries explore different semantic angles of the same question
3. By casting a wider net with related queries, we're more likely to catch relevant documents that might be missed by a single query
4. The LLM can add helpful context and specifications, just like how image generators flesh out scene details

For example, a query about "loan issues" might be expanded to:

- "What are common problems people face with their loan servicers?"
- "What types of complaints do borrowers report about loan payments?"
- "What difficulties do people encounter when dealing with student loan companies?"

Each variation increases the chance of matching relevant documents in the corpus.



## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
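
Here is a minimal "search small, return big" sketch in plain Python. It makes several simplifying assumptions: children are fixed-size character slices rather than the output of a recursive splitter, `word_overlap` is a hypothetical stand-in for embedding similarity, and the function names and toy texts are invented for this example.

```python
def build_small_to_big(parents, chunk_size):
    """Split each parent into fixed-size child chunks that remember their parent id."""
    docstore = dict(enumerate(parents))  # parent_id -> full parent text
    children = []                        # (child_text, parent_id) pairs
    for parent_id, text in docstore.items():
        for start in range(0, len(text), chunk_size):
            children.append((text[start:start + chunk_size], parent_id))
    return docstore, children

def retrieve_parents(query, docstore, children, score, k):
    """Rank child chunks against the query, then return their unique parents in rank order."""
    ranked = sorted(children, key=lambda child: score(query, child[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:  # several children may share one parent
            parent_ids.append(parent_id)
    return [docstore[pid] for pid in parent_ids]

def word_overlap(query, text):
    """Hypothetical stand-in for embedding similarity: number of shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

toy_parents = [
    "autopay was cancelled without notice. later my credit score dropped sharply.",
    "the school closed before graduation. i asked for a discharge of the loan.",
]
docstore, children = build_small_to_big(toy_parents, chunk_size=40)
results = retrieve_parents("credit score dropped", docstore, children, word_overlap, k=2)
print(results)
```

The match happens on the second half of the first complaint, but the caller gets the whole complaint back, including the autopay detail that the matching child chunk never mentions.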

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [29]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [30]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [31]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [32]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [33]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [34]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to involve misconduct or errors in loan servicing. Specific issues include errors in loan balances, misapplied payments, wrongful denials of payment plans, unfair increases in interest rates, inappropriate credit reporting, and difficulties with loan consolidation or verification. These problems indicate systemic shortcomings in how loans are managed and reported, leading to significant financial and credit impacts on borrowers.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, some complaints did not get handled in a timely manner. Specifically, the complaints regarding student loan servicing by MOHELA (Complaint IDs 12709087 and 12935889) indicate that responses were delayed beyond the expected timeframes ("Timely response?": "No"). Additionally, one complaint about dispute settlement by Nelnet (Complaint ID 13205525) was responded to within 30 days but still represents a longer-than-acceptable processing time, and the complaint itself highlights ongoing neglect.\n\nTherefore, the answer is: Yes, some complaints were not handled in a timely manner.'

In [36]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors such as financial hardship, lack of proper information, mismanagement, and external circumstances. For example, some borrowers experienced severe financial difficulties after graduation, making it difficult to keep up with payments. Others were misled about the manageability of their loans or were not properly informed about payment requirements and changes, such as loan servicers not notifying them about when payments should begin or about loan buyouts and account transfers. Additionally, issues like the closure of educational institutions, employment challenges, and health problems also contributed to their inability to repay loans.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.
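
To show what weighted rank fusion does with those lists and weights, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. The function name `weighted_rrf`, the toy rankings, and the smoothing constant `c=60` (the value suggested in the linked paper) are assumptions of this example; the library's own implementation may differ in details such as tie handling.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    fused = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            # Each appearance contributes weight / (c + rank); higher ranks contribute more.
            fused[doc_id] += weight / (c + rank)
    # Highest fused score first.
    return sorted(fused, key=fused.get, reverse=True)

toy_rankings = [
    ["doc_a", "doc_b", "doc_c"],  # e.g. from a keyword retriever
    ["doc_b", "doc_d", "doc_a"],  # e.g. from a dense retriever
]
toy_weights = [0.5, 0.5]
fused_order = weighted_rrf(toy_rankings, toy_weights)
print(fused_order)
```

Only ranks are used, never raw scores, which is why retrievers with incomparable scoring scales (BM25 scores versus cosine similarities) can be combined this way. `doc_b` wins because it sits near the top of both lists.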

In [37]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.

In [38]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [39]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content



In [40]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, yes, several complaints indicate that issues were not handled in a timely manner. Specifically:\n\n- Complaint ID 12709087 (MOHELA) was marked as "Not timely response," with the user stating the issue was not resolved within 10 days despite promises.\n- Complaint ID 12823876 (EdFinancial Services) was "Handled within the response time."\n- Complaint ID 12935889 (Maximus/Aidvantage) was marked "No," indicating a response was not timely, with wait times exceeding four hours or more and no resolution.\n- Complaint ID 13062402 (Nelnet) was handled "within the response time."\n- Complaint ID 13197090 (EdFinancial) was "Handled within response time."\n- Complaint ID 13205525 (Nelnet) was "Handled within response time."\n- Other complaints similarly indicate delays or failures to respond or resolve issues promptly.\n\nIn summary, multiple complaints reflect delays or failures to handle issues in a timely manner.'

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including:\n\n1. Lack of proper information about interest accrual and repayment options, leading to unmanageable debt growth.\n2. Difficulties in affording higher payments due to stagnant wages or financial hardship.\n3. Poor communication from lenders or servicers, such as failure to notify borrowers of repayment start dates or changes in loan status.\n4. Mismanagement or mishandling of loans, including transfers without proper notification or inaccurate reporting to credit agencies.\n5. Being steered into long-term forbearances or aggressive consolidation practices without being fully informed of the consequences.\n6. Borrowers encountering financial crises, such as unemployment, homelessness, or health issues, that hindered their ability to continue payments.\n7. Challenges in navigating complex loan repayment programs or lack of access to income-driven repayment options.\n8. Errors or illegal actions by loan servicers, su

## Task 10: Semantic Chunking

While this is not a retrieval method, it *is* an effective way of improving retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which will calculate the distances between neighboring sentences and then break apart sequences of sentences wherever a distance exceeds a given percentile among all distances.
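To make the percentile rule concrete, here is a minimal pure-Python sketch that uses toy 2-D vectors in place of real sentence embeddings. The helper names (`cosine_distance`, `percentile`, `split_on_percentile`) and the example sentences are assumptions for illustration only; this is not how `SemanticChunker` is implemented internally.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def percentile(values, pct):
    """Linearly interpolated percentile (0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def split_on_percentile(sentences, vectors, pct=95.0):
    """Group consecutive sentences, breaking wherever the distance to the
    next sentence exceeds the pct-th percentile of all adjacent distances."""
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    if not distances:
        return [list(sentences)]
    threshold = percentile(distances, pct)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            # Large semantic jump: close the current chunk.
            chunks.append(current)
            current = []
        current.append(sentence)
    chunks.append(current)
    return chunks

# Hypothetical sentences and hand-made "embeddings" (two topics).
sentences = [
    "Loan A was late.",
    "Loan A accrued fees.",
    "The servicer never replied.",
    "Calls went unanswered.",
]
vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
chunks = split_on_percentile(sentences, vectors, pct=60.0)
print(chunks)
```

With these toy vectors the only large distance sits between the second and third sentences, so the sketch produces two chunks of two sentences each.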

In [42]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [43]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [44]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [45]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [46]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [47]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"The most common issue with loans, based on the provided complaints data, appears to be problems related to handling and servicing of federal student loans. Specific frequent issues include:\n\n- Struggling to repay loans due to confusing or unverified payment plans and re-amortization problems.\n- Errors in loan reporting, such as incorrect account statuses, defaults, or delinquency notices.\n- Poor communication from loan servicers regarding account changes, payment amounts, or servicer updates.\n- Disputes over loan account legitimacy and data breaches.\n- Auto-debit and payment processing issues, including automatic payments not being processed or being manipulated.\n- Problems with loan forgiveness, cancellation, discharge processes, or delays in documentation processing.\n\nOverall, many complaints revolve around poor servicing, miscommunication, and errors in loan account handling, which significantly impact borrowers' financial and credit situations."

In [48]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the information provided, it appears that many complaints were marked as "Closed with explanation" and explicitly state that responses were received in a timely manner ("Timely response?\': \'Yes\'"). However, several complaints highlight issues such as lack of response, unresolved disputes, or ongoing problems with handling complaints, payments, or errors. \n\nSpecifically, the complaint about Nelnet involving serious misconduct, errors, and violations of laws mentions that despite certified mail and acknowledgment of receipt, Nelnet never responded to the complaint or provided answers, indicating it was not handled in a timely manner. \n\nOverall, while some complaints were handled promptly, there are notable instances where complaints were not handled in a timely manner or issues remain unresolved. Therefore, **yes, there were complaints that did not get handled in a timely manner.**'

In [49]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including:\n\n- Receiving bad or incomplete information about their loans, which caused confusion and delays.\n- Difficulties in logging into loan accounts or communicating with lenders or servicers.\n- Delays or disputes in processing payments due to technical issues or mismanagement.\n- Deliberate stalling or administrative hurdles by loan servicers, which can discourage borrowers from continuing repayment.\n- Problems with loan forgiveness or discharge programs, where documentation was rejected or delayed.\n- Issues related to incorrect account status reports, such as being mistakenly marked in default or delinquent.\n- Disputed or illegitimate reporting of loan status or debt, sometimes involving legal or privacy breaches.\n- Loss of income or financial hardship, although specific cases do not always mention this explicitly.\n\nIn summary, a combination of poor communication, administrative challenges, legal disputes, and 

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

##### ✅ Answer:
With short, repetitive sentences like FAQs, semantic chunking may face several challenges:

1. Overlapping Embeddings: Similar/repetitive sentences will have very similar embeddings, making it harder to distinguish between chunks and potentially leading to redundant retrievals

2. Loss of Context: Short sentences may lack sufficient context for the embedding model to capture meaningful semantic relationships

3. Inefficient Chunking: The algorithm may create many small chunks that don't effectively group related content

To address these issues, you could:

- Increase chunk size to capture more context around each FAQ
- Use custom chunking rules that keep question-answer pairs together
- Add metadata/tags to help distinguish between similar FAQs
- Consider alternative chunking strategies like combining related FAQs into topical groups
- Use hybrid retrieval approaches that combine semantic and keyword matching
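
One of the adjustments listed above, keeping question-answer pairs together, can be sketched without any embedding model. The `Q:`/`A:` line prefixes, the `chunk_faq_pairs` helper, and the sample FAQ text are all hypothetical; a real FAQ source would need its own parsing rule.

```python
def chunk_faq_pairs(lines):
    """Group lines into one chunk per question so each answer stays with its question."""
    chunks, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank separator lines.
        # A new question closes the previous chunk.
        if line.startswith("Q:") and current:
            chunks.append(" ".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append(" ".join(current))
    return chunks

# Hypothetical FAQ content with two near-duplicate questions.
faq_lines = [
    "Q: How do I change my payment date?",
    "A: Contact your servicer.",
    "A: Changes can take one billing cycle.",
    "",
    "Q: How do I change my payment plan?",
    "A: Submit a new repayment plan request.",
]
faq_chunks = chunk_faq_pairs(faq_lines)
print(faq_chunks)
```

Because the split is driven by structure rather than embedding distance, two near-identical questions still end up in separate chunks, each carrying its own answer.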


# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
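
If you also want a rough local latency number alongside the LangSmith traces, a small timing wrapper is enough. This sketch assumes only that the retriever exposes an `invoke(query)` method, as LangChain retrievers do; `time_retriever` and the stand-in `EchoRetriever` class are hypothetical names introduced here.

```python
import statistics
import time

def time_retriever(retriever, queries, repeats=3):
    """Return mean and max wall-clock seconds per retriever.invoke(query) call."""
    durations = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            retriever.invoke(query)
            durations.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(durations),
        "max_s": max(durations),
        "calls": len(durations),
    }

class EchoRetriever:
    """Stand-in with the same invoke() shape; swap in a real retriever such as naive_retriever."""
    def invoke(self, query):
        return [query]

latency = time_retriever(
    EchoRetriever(),
    ["What is the most common issue with loans?"],
    repeats=2,
)
print(latency)
```

Local timings like this include network variance for API-backed retrievers (multi-query, rerank), so treat them as a sanity check rather than a benchmark.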

In [50]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyMuPDFLoader


path = "data/"
loader = DirectoryLoader(path, glob="*.pdf", loader_cls=PyMuPDFLoader)
docs = loader.load()

In [51]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [52]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(
    llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs[:20], testset_size=10)

Applying HeadlinesExtractor:   0%|          | 0/17 [00:00<?, ?it/s]

Applying HeadlineSplitter:   0%|          | 0/20 [00:00<?, ?it/s]

unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node


Applying SummaryExtractor:   0%|          | 0/31 [00:00<?, ?it/s]

Property 'summary' already exists in node 'cc98e0'. Skipping!
Property 'summary' already exists in node 'a02163'. Skipping!
Property 'summary' already exists in node '05f2b0'. Skipping!
Property 'summary' already exists in node '1e0e62'. Skipping!
Property 'summary' already exists in node '4e3c5d'. Skipping!
Property 'summary' already exists in node 'f7f327'. Skipping!
Property 'summary' already exists in node '84d79d'. Skipping!
Property 'summary' already exists in node '556ae6'. Skipping!
Property 'summary' already exists in node 'fa7032'. Skipping!
Property 'summary' already exists in node '8ac78d'. Skipping!
Property 'summary' already exists in node '7f3a64'. Skipping!
Property 'summary' already exists in node 'd97fbd'. Skipping!
Property 'summary' already exists in node '022d1a'. Skipping!
Property 'summary' already exists in node '77923e'. Skipping!


Applying CustomNodeFilter:   0%|          | 0/6 [00:00<?, ?it/s]

Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/41 [00:00<?, ?it/s]

Property 'summary_embedding' already exists in node '556ae6'. Skipping!
Property 'summary_embedding' already exists in node '05f2b0'. Skipping!
Property 'summary_embedding' already exists in node 'd97fbd'. Skipping!
Property 'summary_embedding' already exists in node 'fa7032'. Skipping!
Property 'summary_embedding' already exists in node '77923e'. Skipping!
Property 'summary_embedding' already exists in node '022d1a'. Skipping!
Property 'summary_embedding' already exists in node '1e0e62'. Skipping!
Property 'summary_embedding' already exists in node 'f7f327'. Skipping!
Property 'summary_embedding' already exists in node '84d79d'. Skipping!
Property 'summary_embedding' already exists in node '4e3c5d'. Skipping!
Property 'summary_embedding' already exists in node '8ac78d'. Skipping!
Property 'summary_embedding' already exists in node 'a02163'. Skipping!
Property 'summary_embedding' already exists in node '7f3a64'. Skipping!
Property 'summary_embedding' already exists in node 'cc98e0'. Sk

Applying [CosineSimilarityBuilder, OverlapScoreBuilder]:   0%|          | 0/2 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/12 [00:00<?, ?it/s]

In [57]:
dataset.to_pandas().head()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | where i find info on BBAY 3 in Volume 8, Chapt... | [non-term (includes clock-hour calendars), or ... | For more information on BBAY 3, see Volume 8, ... | single_hop_specifc_query_synthesizer |
| 1 | what Volume 8, Chapter 3 say about clinical wo... | [Inclusion of Clinical Work in a Standard Term... | Volume 8, Chapter 3 give more guidance on exce... | single_hop_specifc_query_synthesizer |
| 2 | How are subscription-based programs treated di... | [Non-Term Characteristics A program that measu... | A program that measures progress in clock hour... | single_hop_specifc_query_synthesizer |
| 3 | How is a student’s Pell Grant amount determine... | [both the credit or clock hours and the weeks ... | The Pell Grant amount a student is eligible to... | single_hop_specifc_query_synthesizer |
| 4 | how do disbursement requirements for federal s... | [<1-hop>\n\nboth the credit or clock hours and... | in clock-hour or non-term credit-hour programs... | multi_hop_abstract_query_synthesizer |


In [54]:
# Import required libraries for evaluation
from ragas.metrics import ContextPrecision, ContextRecall, ContextRelevance
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.dataset_schema import SingleTurnSample, EvaluationDataset
import pandas as pd
import time
from langsmith import traceable
from datetime import datetime
import os
import getpass

In [58]:
# Get the LangSmith API key interactively instead of hardcoding it in the notebook
os.environ["LANGSMITH_API_KEY"] = getpass.getpass("LangSmith API Key:")

# Enable LangSmith tracing
os.environ["LANGSMITH_TRACING"] = "true"
#os.environ["LANGSMITH_PROJECT"] = "retriever-evaluation-max"
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"

print("✅ LangSmith tracing enabled!")
#print(f"📊 Project: {os.environ['LANGSMITH_PROJECT']}")
print("🔗 Visit https://smith.langchain.com to view your traces")

✅ LangSmith tracing enabled!
🔗 Visit https://smith.langchain.com to view your traces


In [83]:
# Create LLM with rate limit handling
from langchain_openai import ChatOpenAI  # replaces the deprecated langchain.chat_models import
import time

class RateLimitRetryLLM(ChatOpenAI):
    def __init__(self, *args, max_retries=5, **kwargs):
        super().__init__(*args, **kwargs)
        self.max_retries = max_retries
    
    def invoke(self, *args, **kwargs):
        for attempt in range(self.max_retries):
            try:
                return super().invoke(*args, **kwargs)
            except Exception as e:
                if "rate_limit" in str(e).lower() and attempt < self.max_retries - 1:
                    wait_time = 1 + attempt * 2  # Linear backoff: 1s, 3s, 5s, ...
                    print(f"\n⏳ Rate limit hit, waiting {wait_time}s before retry...")
                    time.sleep(wait_time)
                    continue
                raise

evaluator_llm = LangchainLLMWrapper(RateLimitRetryLLM(
    model="gpt-4.1-mini",
    max_retries=5,
    request_timeout=30
))
evaluator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

# Define RAGAS metrics for retriever evaluation
ragas_metrics = [
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm), 
    ContextRelevance(llm=evaluator_llm)
]

print("RAGAS evaluator setup complete with rate limit handling!")

RAGAS evaluator setup complete with rate limit handling!
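
The retry loop above waits `1 + attempt * 2` seconds, which grows linearly (1s, 3s, 5s, ...). If you want a schedule that actually doubles on each attempt, one possible sketch is below. `backoff_delays`, the 1-second base, and the 30-second cap are assumptions introduced here, not part of the notebook.

```python
import random

def backoff_delays(max_retries, base=1.0, cap=30.0, jitter=0.0, rng=None):
    """Yield one delay per retry: base * 2**attempt, capped, plus optional random jitter."""
    rng = rng or random.Random()
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        # Jitter spreads out retries from concurrent workers; 0.0 disables it.
        yield delay + rng.uniform(0.0, jitter)

delays = list(backoff_delays(max_retries=5))
print(delays)
```

Inside a retry loop you would call `time.sleep(delay)` for each yielded value; a small non-zero `jitter` helps when several Ragas workers hit the rate limit at once.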


In [61]:
# Convert the generated dataset to RAGAS format
test_df = dataset.to_pandas()

# Create evaluation samples from the test dataset
evaluation_samples = []
for idx, row in test_df.iterrows():
    sample = {
        'user_input': row['user_input'],
        'reference_contexts': row['reference_contexts'],
        'reference': row['reference']
    }
    evaluation_samples.append(sample)

print(f"Created {len(evaluation_samples)} evaluation samples")
print("Sample structure:", evaluation_samples[0].keys())

Created 12 evaluation samples
Sample structure: dict_keys(['user_input', 'reference_contexts', 'reference'])


In [62]:
retrievers_to_evaluate = {
    "naive": naive_retriever,
    "bm25": bm25_retriever, 
    "contextual_compression": compression_retriever,
    "multi_query": multi_query_retriever,
    "parent_document": parent_document_retriever,
    "ensemble": ensemble_retriever,
    "semantic": semantic_retriever
}

print(f"Will evaluate {len(retrievers_to_evaluate)} retrieval methods")

Will evaluate 7 retrieval methods


In [63]:
evaluation_samples

[{'user_input': 'where i find info on BBAY 3 in Volume 8, Chapter 6?',
  'reference_contexts': ['non-term (includes clock-hour calendars), or subscription-based. In a standard term or nonstandard term academic calendar, a term is generally a period in which all classes are scheduled to begin and end within a set time frame, and academic progress is measured in credit hours. In a non-term academic calendar, classes do not begin and end within a set time frame, such as a term. Academic progress in a non-term program can be measured in either credit or clock hours. In some cases (as discussed below), a program with terms must be treated as a non-term program for Title IV purposes. A subscription-based academic calendar is used only by subscription-based programs. A subscription-based program is a term-based program in which the school charges a student for each term on a subscription basis with the expectation that the student will complete a specified number of credit hours (or the equiv

In [85]:
from ragas.metrics import ContextPrecision, ContextRecall, ContextRelevance
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI
import pandas as pd
from ragas import evaluate
from ragas.dataset_schema import EvaluationDataset

# Setup evaluator LLM
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-3.5-turbo"))

# Define metrics
ragas_metrics = [
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm),
    ContextRelevance(llm=evaluator_llm)
]

In [97]:
# 1. Update environment variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key:")  # never hardcode secrets
os.environ["LANGCHAIN_PROJECT"] = "retriever-evaluation"  # Set project name

# 2. Initialize the tracer
from langchain.callbacks.tracers import LangChainTracer
tracer = LangChainTracer(project_name="retriever-evaluation")

from ragas.run_config import RunConfig

# Create a RunConfig with lower concurrency to avoid rate limits
run_config = RunConfig(
    max_workers=2,  # Lower number of concurrent workers
    timeout=60  # Longer timeout to handle rate limits
)

def evaluate_retriever(retriever, name, evaluation_samples):
    results = []
    
    for sample in evaluation_samples:
        docs = retriever.invoke(sample["user_input"])  # get_relevant_documents is deprecated
        
        eval_sample = {
            "user_input": sample["user_input"],
            "retrieved_contexts": [doc.page_content for doc in docs],
            "reference": sample["reference"]
        }
        results.append(eval_sample)
    
    ragas_dataset = EvaluationDataset.from_list(results)
    
    scores = evaluate(
        dataset=ragas_dataset,
        metrics=ragas_metrics,
        run_config=run_config
    )
    
    # Return the scores directly from the scores object
    return scores.scores

# Create dictionary to store results
all_results = {}


In [98]:
retrievers = {
    "BM25": bm25_retriever,
    "Naive": naive_retriever,
    #"Parent Document": parent_document_retriever,
    #"Compression": compression_retriever,
    #"Multi Query": multi_query_retriever,
    #"Ensemble": ensemble_retriever
}

# Create a shorter version of evaluation samples with just 5 rows
evaluation_samples_short = evaluation_samples[:5]



all_results = {}
for name, retriever in retrievers.items():
    print(f"Evaluating {name} retriever...")
    scores = evaluate_retriever(retriever, name, evaluation_samples_short)
    all_results[name] = scores

Evaluating BM25 retriever...


Evaluating:   0%|          | 0/15 [00:00<?, ?it/s]

Evaluating Naive retriever...


Evaluating:   0%|          | 0/15 [00:00<?, ?it/s]

In [100]:
all_results

{'BM25': [{'context_precision': 0.9999999999,
   'context_recall': 1.0,
   'nv_context_relevance': 0.0},
  {'context_precision': 0.49999999995,
   'context_recall': 1.0,
   'nv_context_relevance': 0.0},
  {'context_precision': 0.6388888888675925,
   'context_recall': 0.6666666666666666,
   'nv_context_relevance': 1.0},
  {'context_precision': 0.0,
   'context_recall': 1.0,
   'nv_context_relevance': 0.0},
  {'context_precision': 0.999999999975,
   'context_recall': 1.0,
   'nv_context_relevance': 0.0}],
 'Naive': [{'context_precision': 0.3333333333,
   'context_recall': 1.0,
   'nv_context_relevance': 0.0},
  {'context_precision': 0.3499999999825,
   'context_recall': 1.0,
   'nv_context_relevance': 0.0},
  {'context_precision': 0.8240079364976364,
   'context_recall': 0.3333333333333333,
   'nv_context_relevance': 0.0},
  {'context_precision': 0.0,
   'context_recall': 0.5,
   'nv_context_relevance': 0.0},
  {'context_precision': 0.9999999999888889,
   'context_recall': 1.0,
   'nv_co
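
Per-sample score lists like the ones above are hard to compare at a glance. A small sketch that averages each metric per retriever follows; `mean_scores` is a hypothetical helper, and the sample data is the BM25 output shown above (the Naive output is truncated in this notebook, so it is left out).

```python
from statistics import mean

def mean_scores(all_results):
    """Average every metric across samples for each retriever."""
    summary = {}
    for name, samples in all_results.items():
        metrics = list(samples[0].keys()) if samples else []
        summary[name] = {m: mean(s[m] for s in samples) for m in metrics}
    return summary

# BM25 per-sample scores copied from the output above.
bm25_only = {
    "BM25": [
        {"context_precision": 0.9999999999, "context_recall": 1.0, "nv_context_relevance": 0.0},
        {"context_precision": 0.49999999995, "context_recall": 1.0, "nv_context_relevance": 0.0},
        {"context_precision": 0.6388888888675925, "context_recall": 0.6666666666666666, "nv_context_relevance": 1.0},
        {"context_precision": 0.0, "context_recall": 1.0, "nv_context_relevance": 0.0},
        {"context_precision": 0.999999999975, "context_recall": 1.0, "nv_context_relevance": 0.0},
    ]
}
summary = mean_scores(bm25_only)
for retriever_name, metric_means in summary.items():
    print(retriever_name, {m: round(v, 3) for m, v in metric_means.items()})
```

With only five samples per retriever these averages are noisy, so they are better read as a rough ranking than as precise measurements.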
