# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [3]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [4]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [5]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [6]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine-similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [7]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
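
To make "cosine-similarity top-k" concrete, here is a minimal pure-Python sketch of the idea. It is not how QDrant implements the search; the function names (`cosine_similarity`, `top_k`) and the toy 3-dimensional vectors are assumptions made purely for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector, doc_vectors, k):
    """Return the indices of the k vectors most similar to the query."""
    scored = [(cosine_similarity(query_vector, v), i) for i, v in enumerate(doc_vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Toy "embeddings" (hypothetical values, not real model output).
toy_doc_vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vector = [1.0, 0.0, 0.0]

nearest = top_k(toy_query_vector, toy_doc_vectors, k=2)
print(nearest)
```

In the notebook, the embedding model produces the vectors and the vector store does the ranking; `k` plays the same role as the `"k"` entry in `search_kwargs`.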

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [8]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [9]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [10]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
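
If the dictionary-piping in the chain above is hard to follow, the same data flow can be sketched with plain Python functions. This is only an approximation of what LCEL does; `fake_retriever`, `fake_llm`, and `run_chain` are hypothetical stand-ins, not LangChain APIs.

```python
from operator import itemgetter

def fake_retriever(question):
    # Hypothetical stand-in for naive_retriever: returns canned "documents".
    return [f"doc about: {question}"]

def fake_llm(prompt_text):
    # Hypothetical stand-in for `rag_prompt | chat_model`.
    return f"ANSWER based on -> {prompt_text}"

def run_chain(inputs):
    # Step 1: build {"context", "question"} from the incoming dict.
    step1 = {
        "context": fake_retriever(itemgetter("question")(inputs)),
        "question": itemgetter("question")(inputs),
    }
    # Step 2: like RunnablePassthrough.assign, keep every key and (re)set "context".
    step2 = {**step1, "context": itemgetter("context")(step1)}
    # Step 3: format the prompt, call the model, and carry the context forward.
    prompt_text = "Query:\n{question}\n\nContext:\n{context}".format(**step2)
    return {"response": fake_llm(prompt_text), "context": step2["context"]}

result = run_chain({"question": "What is the most common issue with loans?"})
print(sorted(result.keys()))
```

Returning the context alongside the response is what lets us inspect (and later evaluate) exactly which documents the retriever supplied.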

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [11]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided information, the most common issues with loans, especially student loans, include errors in loan balances and account reporting, mismanagement of loans, difficulties in making or applying payments correctly, and issues with loan transfer and communication. Many complaints also involve incorrect or outdated information on credit reports and problems related to repayment plans or loan forgiveness.\n\nIn summary, the most common issues are:\n\n- Errors or discrepancies in loan balances and account information\n- Mismanagement and mishandling of loans, including unauthorized transfers\n- Difficulties in payment application, often resulting in payments being applied improperly (e.g., primarily to interest)\n- Problems with loan reporting on credit reports, including incorrect delinquency status\n- Lack of transparency and communication from loan servicers\n- Issues related to loan forgiveness or discharge complications\n\nThese problems often lead to financial hardshi

In [12]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, yes, some complaints did not get handled in a timely manner. Specifically, the complaint submitted to MOHELA on 03/28/25 was marked as "No" for being handled promptly. Additionally, multiple complaints associated with Maximus Federal Services, Inc. indicate delays or failures in responses, such as a complaint from 04/05/25 still pending after over a year and another from 04/14/25 with ongoing issues despite efforts to resolve.\n\nThere are also several instances where consumers reported that their issues were not addressed quickly, including delays of weeks or months before receiving responses, or instances where the company failed to respond at all within expected time frames.\n\nTherefore, the answer is: Yes, some complaints did not get handled in a timely manner.'

In [13]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People often fail to pay back their loans due to a combination of factors highlighted in the complaints:\n\n1. **Accumulation of Interest and Unmanageable Payments:** Many borrowers find that their interest continues to grow, especially when loans are in forbearance or deferment, making the total amount owed larger over time. Lowering monthly payments can lead to interest accumulating faster than payments, extending the repayment period and increasing total debt.\n\n2. **Lack of Clear Communication and Notification:** Several complaints indicate that borrowers were not properly notified about when payments were to resume, changes in servicers, or transfer of loans between companies. This lack of transparency leads to unexpected delinquencies and damage to credit scores.\n\n3. **Inability to Afford Payments:** Borrowers often state that they cannot increase payments to meet the required amounts due to financial hardship, loss of employment, or stagnant wages. This prevents them from pa

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [14]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
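
For intuition about what happens under the hood, here is a minimal sketch of one common BM25 formulation in pure Python. The whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, and the exact IDF variant are assumptions of this sketch; the library backing `BM25Retriever` may differ in these details.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Term frequency saturates (k1) and is normalized by document length (b).
            norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
            score += idf * norm
        scores.append(score)
    return scores

# Toy corpus (made-up sentences, not rows from the complaints CSV).
toy_complaints = [
    "my fafsa deadline was missed because of the servicer",
    "the servicer applied my payment to interest only",
    "i never received notice that my loan was transferred",
]
toy_scores = bm25_scores("fafsa deadline", toy_complaints)
best_match = max(range(len(toy_complaints)), key=lambda i: toy_scores[i])
print(best_match, toy_scores)
```

Documents that share no terms with the query score exactly zero, which is the main contrast with embedding-based retrieval.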

We'll construct the same chain - only changing the retriever.

In [15]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [16]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, including issues with the fees charged, how payments are handled, and receiving accurate or transparent information about the loan. Specifically, complaints often involve:\n\n- Disputes over fees charged or other charges that are not clearly explained.\n- Difficulties in applying payments correctly or paying down the principal.\n- Receiving incorrect or confusing information about loan balances, interest, or repayment terms.\n- Issues with loan servicing that seem predatory or restrictive.\n\nThese issues are repeatedly highlighted across multiple complaints, indicating that the most common problem is related to the handling and management of loans by the servicers, leading to frustrations and disputes over fees and information.\n\nIf you need a concise answer: \n\nThe most common issue with loans, based on the context, is dealing with loan servicers or 

In [17]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints mentioned in the documents indicate that the companies responded to the complaints in a timely manner. The responses are noted as "Closed with explanation" and explicitly state "Timely response?": "Yes" for each complaint. Therefore, there is no evidence in these complaints to suggest that any complaints did not get handled in a timely manner.'

In [18]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including issues with their payment plans, lack of communication from the loan servicers, and problems with the handling of their account information. Some specific causes include:\n\n- Difficulty in obtaining proper forbearance or deferment, leading to continued billing despite applying for relief.\n- Loan servicers steering borrowers into incorrect types of forbearances or not responding to forbearance requests.\n- Lack of timely communication from loan companies about account status changes, leading to unawareness of missed payments or account issues.\n- Errors in billing or payment processing, such as payments being reversed or not being processed correctly.\n- Unexplained transfer of loans to new servicers without proper notification, causing confusion and billing problems.\n- Borrowers being unaware of changes or defaults, resulting in negative impacts on their credit ratings.\n- Poor customer service and inadequate resp

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, and the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

#### ✅ Answer

BM25 is likely to perform better than embeddings when the query hinges on specific keywords or phrases that appear verbatim in the complaints. For example, when searching for "FAFSA deadline", BM25 gives higher weight to documents that contain those exact terms, because it scores term frequency rather than semantic similarity. I found this video very helpful for explaining the difference: https://www.youtube.com/watch?v=3FbJOKhLv9M.



## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [19]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
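
The retrieve-then-rerank pattern itself is simple enough to sketch without any external services. In the sketch below, `word_overlap_score` is a deliberately crude, hypothetical stand-in for a real reranking model such as Cohere's, and `rerank` and `top_n` are names chosen for illustration only.

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Re-order candidates by a relevance score and keep only the best top_n."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]

def word_overlap_score(query, doc):
    # Toy relevance score: fraction of query words that appear in the document.
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)

# Pretend these came back from the base retriever (made-up sentences).
toy_candidates = [
    "payments were applied to interest only",
    "the servicer did not respond in a timely manner",
    "my loan was transferred without notice",
    "no timely response to my complaint",
]

compressed = rerank("timely response", toy_candidates, word_overlap_score, top_n=2)
print(compressed)
```

The base retriever casts a wide net (our `k` of 10), and the reranker then narrows it to a smaller, more relevant set before anything reaches the LLM.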

Let's create our chain again, and see how this does!

In [20]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [21]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to involve problems with dealing with lenders or servicers, such as receiving bad or incorrect information about the loan, errors in loan balances, misapplied payments, wrongful denials of payment plans, and mishandling or mishandling of personal information. Issues like inaccuracies in loan balances, lack of proper documentation or explanation, and improper handling of loan data are prevalent concerns.\n\nIf I had to summarize, a significant and common problem is difficulties in communication with loan servicers, inaccuracies in loan information, and mismanagement of the loan processes.'

In [22]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, at least one complaint indicates that it has not been handled in a timely manner. Specifically, the complaint from the individual regarding their loan account review has been open since approximately 18 months ago with no resolution, despite being flagged in the context as "still awaiting a response and resolution." Additionally, the complaint about the main issue not being addressed was active for over 2-3 weeks. \n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for several reasons, including:\n\n- Lack of awareness or understanding that loans needed to be repaid: Some borrowers were not informed by their financial aid officers that repayment was required.\n- Poor communication and notification from lenders or servicers: Borrowers reported not receiving timely notices about payment due dates, changes in their loan servicer, or when their loans were transferred without their knowledge.\n- Difficulties with managing repayment options: Borrowers found that options like forbearance or deferment led to accumulating interest, making it harder to pay off loans in the long run.\n- Increasing debt despite payments: Some individuals made payments over many years but still saw their balances grow due to accumulated interest and inconsistent or confusing account information.\n- Financial hardship and inability to afford payments: Many borrowers faced challenges in making payments because doing so would extend the rep

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [24]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
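
The three steps described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the `MultiQueryRetriever` source: `toy_generate_queries` is a hypothetical stand-in for the LLM call, and `TOY_INDEX` is a made-up lookup table standing in for a real retriever.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for every generated query and keep the unique documents, in order."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs

def toy_generate_queries(question):
    # Hypothetical stand-in for the LLM that rewrites the user's question.
    return [
        "problems with loan servicers",
        "complaints about loan payments",
    ]

# Made-up retrieval results keyed by query text.
TOY_INDEX = {
    "problems with loan servicers": ["doc_a", "doc_b"],
    "complaints about loan payments": ["doc_b", "doc_c"],
}

def toy_retrieve(query):
    return TOY_INDEX.get(query, [])

merged = multi_query_retrieve("loan issues", toy_generate_queries, toy_retrieve)
print(merged)
```

Note that `doc_b` is returned by both queries but appears only once in the merged context.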

In [25]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [26]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

"The most common issue with loans, based on the provided complaints and data, appears to be poor servicing and mismanagement by loan servicers. Specific recurring problems include errors in loan balances and interest calculation, lack of communication or transparency about account status, incorrect reporting or default status, and failures to process repayment plans or loan forgiveness applications properly. Many complaints highlight a pattern of inadequately informing borrowers about their loan terms, mishandling of loan status (such as default or deferment), and unprofessional customer service, which often results in damage to borrowers' credit and financial hardship."

In [27]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Yes, based on the provided complaints, some complaints were not handled in a timely manner. For example, complaint ID 12739706 received on 04/01/25 was marked as "No" under the "Timely response?" field, indicating it was not responded to promptly. Additionally, multiple complaints mention delays or lack of response, such as complaint ID 12832400 received on 04/05/25, which was also marked as "Yes" for timely response, suggesting it was handled more promptly, but other complaints like 12973003 and 12709087 explicitly state that responses were delayed or not received as expected. \n\nTherefore, several complaints did not get handled in a timely manner.'

In [28]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including:\n\n- Lack of proper communication or notice from loan servicers about payment resumption or delinquency.\n- Being misled or inadequately informed about available options such as income-driven repayment, rehabilitation, or the implications of forbearance.\n- Accumulation of high interest and compound interest, making the total debt grow significantly over time.\n- Difficulties in managing or manipulating the payment application process, with some reports of payments only being applied to interest, prolonging debt.\n- Systemic errors, such as incorrect credit reporting, misapplication of payments, and improper transfers of loans, leading to late reports and credit score drops.\n- Loan servicer misconduct, including failure to follow regulatory guidelines, lack of transparency, and inadequate support during financial hardship.\n- Lack of awareness about the true state of their loan balances and terms due to poor record

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

#### ✅ Answer
Generating multiple reformulations of a user query improves recall in a similar way to how image generators expand user prompts. Just like how an image generator might take a simple prompt like "dog" and expand it to "a golden retriever puppy sitting in a grassy field on a sunny day" to get more specific and relevant results, the MultiQueryRetriever takes a basic query and generates multiple variations that capture different aspects and phrasings of the same information need.

This improves recall because:
1. Different phrasings may match different relevant documents that use varying terminology
2. Multiple queries explore different semantic angles of the same question
3. By casting a wider net with related queries, we're more likely to catch relevant documents that might be missed by a single query
4. The LLM can add helpful context and specifications, just like how image generators flesh out scene details

For example, a query about "loan issues" might be expanded to:

- "What are common problems people face with their loan servicers?"
- "What types of complaints do borrowers report about loan payments?"
- "What difficulties do people encounter when dealing with student loan companies?"

Each variation increases the chance of matching relevant documents in the corpus.



## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
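
As a rough illustration of "search small, return big", here is a pure-Python sketch. The word-based splitter and the keyword-count `toy_score` are simplifying assumptions standing in for `RecursiveCharacterTextSplitter` and embedding similarity, and all function names are hypothetical.

```python
def split_into_chunks(text, words_per_chunk):
    """Split text into consecutive chunks of at most words_per_chunk words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]

def build_child_index(parents, words_per_chunk):
    """Return (child_chunk, parent_id) pairs so each child remembers its parent."""
    index = []
    for parent_id, text in parents.items():
        for chunk in split_into_chunks(text, words_per_chunk):
            index.append((chunk, parent_id))
    return index

def retrieve_parents(query, child_index, parents, score_fn, k=2):
    """Rank the child chunks, then return the de-duplicated parent documents."""
    ranked = sorted(child_index, key=lambda pair: score_fn(query, pair[0]), reverse=True)
    parent_ids = []
    for _, parent_id in ranked[:k]:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]

def toy_score(query, chunk):
    # Toy relevance: how often the query words occur in the chunk.
    return sum(chunk.lower().count(word) for word in query.lower().split())

# Made-up parent documents (the "docstore").
toy_parents = {
    "p1": "My payment doubled after forbearance ended. Nobody told me in advance.",
    "p2": "The servicer reported me late. I had autopay enabled the whole time.",
}
toy_child_index = build_child_index(toy_parents, words_per_chunk=6)

found = retrieve_parents("forbearance", toy_child_index, toy_parents, toy_score, k=2)
print(found)
```

Only the first chunk of `p1` mentions forbearance, yet the whole parent document comes back, including the second sentence that the matching chunk did not contain.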

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [29]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [30]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [31]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [32]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [33]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [34]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issues with loans appear to involve errors in loan balances, misapplied payments, wrongful denials of payment plans, discrepancies in interest rates, and problems with credit reporting or identity theft protection services. Among these, issues related to inaccurate or misleading information on credit reports, incorrect account status, and errors caused by loan servicer misconduct seem to be prevalent.'

In [35]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, the first two complaints related to the student loan servicing issues (rows 441 and 84) explicitly indicate that the responses were not handled in a timely manner. Specifically, both complaints were marked as "Timely response?": "No," meaning they were not responded to promptly. The third complaint (row 418) was marked as "Timely response?": "Yes," so it was handled in a timely manner. The fourth complaint (row 474) also indicates a timely response.\n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [36]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors such as financial hardship, misleading information about the value and manageability of their loans, the inability to secure employment or repay due to poor job prospects, and issues with loan servicing and notification failures. For example, some individuals experienced severe financial hardship after graduation and relied on deferment or forbearance, which increased the interest owed. Others were misled about the legitimacy or manageability of their loans, or faced administrative problems like unverified debt reporting, inadequate communication, or issues stemming from institutional misconduct.'

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents based on a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [37]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
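
To see how rank fusion combines several result lists, here is a minimal sketch of weighted Reciprocal Rank Fusion. It assumes 1-based ranks and the constant `c=60` suggested in the linked paper; the function name `weighted_rrf` and the toy rankings are illustrative, and `EnsembleRetriever` may differ in implementation details.

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several rankings: each document earns weight / (c + rank) per list."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Made-up rankings from three hypothetical retrievers.
toy_rankings = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
toy_weights = [1 / len(toy_rankings)] * len(toy_rankings)

fused = weighted_rrf(toy_rankings, toy_weights)
print(fused)
```

Because only ranks matter, retrievers with incomparable raw scores (BM25 scores vs. cosine similarities) can still be combined fairly.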

In [38]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [39]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be dealing with the lender or servicer, including problems such as receiving bad information about the loan, trouble with how payments are being handled (e.g., payments being canceled, applied incorrectly, or not processed), and issues stemming from loan transfers or servicing changes without proper notification. Many complaints also involve inaccuracies in loan balances, improper reporting to credit bureaus, and unresolved disputes regarding loan terms or legitimacy.'

In [40]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, yes, some complaints indicate that issues did not get handled in a timely manner. Specifically:\n\n- Complaint ID 12935889 (MOHELA, NJ): Response was marked as "No" for timely response, indicating it was not handled promptly.\n- Complaint ID 12668396 (MOHELA, NJ): Response was "No" for timely response.\n- Complaint ID 12739706 (MOHELA, NJ): Response was "No" for timely response, and the complaint mentions ongoing issues with overdue or inaccurate reporting.\n- Complaint IDs 12650717, 13056764, 12927952, 13160766, 12832400, 13410623, 12950199, 12973003, 13197090, 13205525: in these cases, responses were marked as "Yes" for being timely, indicating the complaints were handled within expected time frames.\n- Some complaints, such as with Maximus Federal Services (Aidvantage), indicate ongoing unresolved issues despite the complaints being filed, which may suggest delays or inadequate handling.\n\nIn conclusion, yes, there are complaints that were no

In [41]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their student loans for various reasons, often related to issues with loan servicing, lack of clear communication, financial hardship, or mismanagement. Based on the provided complaints, common reasons include:\n\n1. **Lack of Notification and Communication Failures:** Borrowers were often not properly notified about loan transfer to different servicers, the start of repayment obligations, or updates to their account status. This led to unintentional missed payments or delinquencies.\n\n2. **Incorrect or Confusing Information:** Many reports involve inaccurate account balances, misapplied payments, or inconsistent reporting to credit bureaus, which caused confusion and unintentional late payments.\n\n3. **Financial Hardship and Unmanageable Payment Plans:** Borrowers cited difficulty affording payments due to stagnant wages, increased interest accruing during forbearance or deferment, or other financial hardships, making repayment impossible without hardship.

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!
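
To make the thresholding options a little more concrete, here is a minimal sketch of how three of them could turn a list of sentence-to-sentence distances into a cutoff. The function names, the toy distances, and the `amount` values are illustrative assumptions, not LangChain's actual code, and `gradient` is left out for brevity.

```python
import statistics

def percentile(values, pct):
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)

def breakpoint_threshold(distances, method, amount):
    """Return the distance above which two neighbouring sentences are split."""
    if method == "percentile":
        return percentile(distances, amount)
    if method == "standard_deviation":
        # Assumed form: mean plus `amount` standard deviations
        return statistics.mean(distances) + amount * statistics.pstdev(distances)
    if method == "interquartile":
        # Assumed form: mean plus `amount` interquartile ranges
        iqr = percentile(distances, 75) - percentile(distances, 25)
        return statistics.mean(distances) + amount * iqr
    raise ValueError(f"unsupported method: {method}")

# Toy distances between consecutive sentences (made up for illustration)
distances = [0.10, 0.20, 0.10, 0.90, 0.15]
for method, amount in [("percentile", 80), ("standard_deviation", 1.0), ("interquartile", 1.5)]:
    print(method, round(breakpoint_threshold(distances, method, amount), 3))
```

Whichever method is used, a larger `amount` raises the cutoff, which produces fewer and longer chunks.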

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which calculates the distance between each pair of consecutive sentences and then breaks the sequence wherever a distance exceeds a given percentile of all distances.
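
Before handing this over to LangChain, here is a rough standalone sketch of the percentile idea. It assumes tiny hand-written 2-D vectors in place of real embeddings, and the helper names and the `pct=75` cutoff are hypothetical choices for the toy data rather than anything `SemanticChunker` uses.

```python
import math
import statistics

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def split_by_percentile(sentences, vectors, pct=75):
    """Start a new chunk wherever the distance to the next sentence exceeds the percentile cutoff."""
    # Distance between each sentence and the one that follows it
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    # "inclusive" interpolates linearly between the observed distances
    threshold = statistics.quantiles(distances, n=100, method="inclusive")[pct - 1]

    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Toy sentences with made-up "embeddings": the first two point one way, the last two another
sentences = [
    "My payment was applied late.",
    "The servicer lost my payment.",
    "My credit report is wrong.",
    "The bureau shows a false default.",
]
vectors = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]

for chunk in split_by_percentile(sentences, vectors):
    print(chunk)
```

With these toy vectors the only large jump sits between the second and third sentence, so the payment sentences end up in one chunk and the credit-report sentences in another.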

In [42]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [43]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [44]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [45]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [46]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [47]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be problems related to the handling and servicing of student loans, including:\n\n- Struggling to repay or difficulties with payment plans and repayment amounts.\n- Miscommunication or lack of transparency from loan servicers.\n- Errors in reporting loan status, such as incorrect default or delinquency notices.\n- Problems with loan documentation and processing of forgiveness or discharge.\n- Issues with unauthorized or illegal reporting of loan information.\n- Disputes over loan account status and verification of debt.\n\nOverall, many complaints involve mismanagement, communication failures, and legal or administrative errors by loan servicers or federal agencies.'

In [48]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that several complaints involved issues with handling in a timely manner. Specifically:\n\n- One complaint from 04/28/25 about autopay setup issues was marked as responded to "Yes" for timely response.\n- Multiple complaints from 05/04/25 and 05/05/25 involved transfer of account and dispute issues, both also marked as "Yes" for timely response.\n- The complaint from 04/13/25 regarding unauthorized access and breach issues was also responded to "Yes" for timely response.\n- Additional complaints related to disputes, payment processing problems, and legal investigations all indicate that responses were generally marked as timely.\n\nGiven that the responses to these complaints are noted as "Yes" for being timely, it suggests that most complaints were handled within an appropriate timeframe. However, the fact that many complaints involve ongoing issues, disputes, or lack of resolution could imply delays in effectively resolving the underlying

In [49]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including difficulties dealing with their lenders or servicers, problems with payment plans or re-amortization, issues with documentation and verification, alleged mismanagement or stall tactics by loan servicers, and disputes over the legitimacy or status of their loans. Some borrowers also experienced trouble due to miscommunication, lack of transparency, or improper reporting of their loan status, which in turn affected their ability to repay or have their accounts accurately reflected.'

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

##### ✅ Answer:
With short, repetitive sentences like FAQs, semantic chunking may face several challenges:

1. Overlapping Embeddings: Similar/repetitive sentences will have very similar embeddings, making it harder to distinguish between chunks and potentially leading to redundant retrievals

2. Loss of Context: Short sentences may lack sufficient context for the embedding model to capture meaningful semantic relationships

3. Inefficient Chunking: The algorithm may create many small chunks that don't effectively group related content

To address these issues, you could:

- Increase chunk size to capture more context around each FAQ
- Use custom chunking rules that keep question-answer pairs together
- Add metadata/tags to help distinguish between similar FAQs
- Consider alternative chunking strategies like combining related FAQs into topical groups
- Use hybrid retrieval approaches that combine semantic and keyword matching
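
As one illustration of a custom rule that keeps question-answer pairs together, here is a small sketch. It assumes the FAQ text marks questions with a `Q:` prefix and answers with `A:`; that format, the sample text, and the `chunk_faq` helper are all hypothetical.

```python
def chunk_faq(text):
    """Group each 'Q:' line with the answer lines that follow it."""
    chunks, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # A new question closes the previous question-answer chunk
        if line.startswith("Q:") and current:
            chunks.append(" ".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append(" ".join(current))
    return chunks

faq = """
Q: How do I reset my password?
A: Use the reset link on the login page.
Q: How do I reset my PIN?
A: Call the number on the back of your card.
"""

for chunk in chunk_faq(faq):
    print(chunk)
```

Each chunk now carries its own answer, so two near-identical questions remain distinguishable at retrieval time.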


# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.
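
For a feel of what a retriever-specific metric measures, here is a rough sketch of a rank-weighted context precision score, assuming the definition in the Ragas documentation (the mean of precision@k over the ranks that hold a relevant chunk). The relevance verdicts below are written by hand for illustration, whereas Ragas obtains them from an LLM judge, so treat this as an approximation of the idea rather than the library's implementation.

```python
def context_precision(verdicts):
    """verdicts[i] is 1 if the chunk at rank i + 1 was judged relevant, else 0."""
    total_relevant = sum(verdicts)
    if total_relevant == 0:
        return 0.0
    score, hits = 0.0, 0
    for rank, verdict in enumerate(verdicts, start=1):
        if verdict:
            hits += 1
            # precision@rank, counted only at ranks that hold a relevant chunk
            score += hits / rank
    return score / total_relevant

# Same two relevant chunks, ranked first vs. ranked last
print(context_precision([1, 1, 0]))
print(context_precision([0, 1, 1]))
```

The second ordering scores lower because the relevant chunks sit further down the list, which is the behavior a reranker is meant to improve.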

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
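
If you'd also like a quick local latency number, a minimal timing sketch could look like the following. `fake_retriever` is a hypothetical stand-in so the snippet runs on its own, and `measure_latency` is an illustrative helper, not part of LangChain or LangSmith.

```python
import statistics
import time

def measure_latency(retrieve, queries):
    """Time one retrieval call per query and summarize the wall-clock durations."""
    timings = []
    for query in queries:
        start = time.perf_counter()
        retrieve(query)
        timings.append(time.perf_counter() - start)
    return {
        "calls": len(timings),
        "mean_s": statistics.mean(timings),
        "max_s": max(timings),
    }

def fake_retriever(query):
    # Hypothetical stand-in: sleeps briefly instead of querying a vector store
    time.sleep(0.01)
    return [f"document about {query}"]

stats = measure_latency(fake_retriever, ["loans", "late fees", "credit reports"])
print(stats)
```

With the notebook's retrievers, you could pass something like `naive_retriever.invoke` in place of `fake_retriever`.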

In [50]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyMuPDFLoader


path = "data/"
loader = DirectoryLoader(path, glob="*.pdf", loader_cls=PyMuPDFLoader)
docs = loader.load()

In [51]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [52]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(
    llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs[:20], testset_size=10)

Applying HeadlinesExtractor:   0%|          | 0/17 [00:00<?, ?it/s]

Applying HeadlineSplitter:   0%|          | 0/20 [00:00<?, ?it/s]

unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node


Applying SummaryExtractor:   0%|          | 0/31 [00:00<?, ?it/s]

Property 'summary' already exists in node 'b86950'. Skipping!
Property 'summary' already exists in node 'b0c4b8'. Skipping!
Property 'summary' already exists in node 'df6eb9'. Skipping!
Property 'summary' already exists in node '3b70e6'. Skipping!
Property 'summary' already exists in node '3f88e7'. Skipping!
Property 'summary' already exists in node '2e88e2'. Skipping!
Property 'summary' already exists in node 'e2961b'. Skipping!
Property 'summary' already exists in node '6b21f5'. Skipping!
Property 'summary' already exists in node 'eb4abf'. Skipping!
Property 'summary' already exists in node '611c4a'. Skipping!
Property 'summary' already exists in node '355a54'. Skipping!
Property 'summary' already exists in node 'e16dd2'. Skipping!
Property 'summary' already exists in node '0905cd'. Skipping!
Property 'summary' already exists in node 'fc75a9'. Skipping!


Applying CustomNodeFilter:   0%|          | 0/6 [00:00<?, ?it/s]

Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/41 [00:00<?, ?it/s]

Property 'summary_embedding' already exists in node 'b0c4b8'. Skipping!
Property 'summary_embedding' already exists in node '3f88e7'. Skipping!
Property 'summary_embedding' already exists in node 'b86950'. Skipping!
Property 'summary_embedding' already exists in node '6b21f5'. Skipping!
Property 'summary_embedding' already exists in node '3b70e6'. Skipping!
Property 'summary_embedding' already exists in node 'eb4abf'. Skipping!
Property 'summary_embedding' already exists in node 'e16dd2'. Skipping!
Property 'summary_embedding' already exists in node '355a54'. Skipping!
Property 'summary_embedding' already exists in node '0905cd'. Skipping!
Property 'summary_embedding' already exists in node '611c4a'. Skipping!
Property 'summary_embedding' already exists in node 'fc75a9'. Skipping!
Property 'summary_embedding' already exists in node '2e88e2'. Skipping!
Property 'summary_embedding' already exists in node 'e2961b'. Skipping!
Property 'summary_embedding' already exists in node 'df6eb9'. Sk

Applying [CosineSimilarityBuilder, OverlapScoreBuilder]:   0%|          | 0/2 [00:00<?, ?it/s]

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/12 [00:00<?, ?it/s]

In [53]:
dataset.to_pandas().head()

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | Wht is BBAY 2 and how is it used for Direct Lo... | [non-term (includes clock-hour calendars), or ... | BBAY 2 is one of the options (along with Sched... | single_hop_specifc_query_synthesizer |
| 1 | Whaat are the requirments for includin clinica... | [Inclusion of Clinical Work in a Standard Term... | Periods of clinical work may be included in a ... | single_hop_specifc_query_synthesizer |
| 2 | what is non-term characteristics for program, ... | [Non-Term Characteristics A program that measu... | Non-term characteristics mean a program is tre... | single_hop_specifc_query_synthesizer |
| 3 | i dont get what Chapters 5 and 6 talk about fo... | [both the credit or clock hours and the weeks ... | Chapters 5 and 6 in Volume 8 give information ... | single_hop_specifc_query_synthesizer |
| 4 | how subscription-based academic calendars diff... | [<1-hop>\n\nnon-term (includes clock-hour cale... | subscription-based academic calendars use term... | multi_hop_abstract_query_synthesizer |


In [54]:
# Import required libraries for evaluation
from ragas.metrics import ContextPrecision, ContextRecall, ContextRelevance
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.dataset_schema import SingleTurnSample, EvaluationDataset
import pandas as pd
import time
from langsmith import traceable
from datetime import datetime
import os
import getpass

In [66]:
# Convert the generated test set to a DataFrame
test_df = dataset.to_pandas()

# Create evaluation samples from the test dataset
evaluation_samples = []
for idx, row in test_df.iterrows():
    sample = {
        'user_input': row['user_input'],
        'reference_contexts': row['reference_contexts'],
        'reference': row['reference']
    }
    evaluation_samples.append(sample)

print(f"Created {len(evaluation_samples)} evaluation samples")
print("Sample structure:", evaluation_samples[0].keys())

Created 12 evaluation samples
Sample structure: dict_keys(['user_input', 'reference_contexts', 'reference'])


In [58]:
retrievers_to_evaluate = {
    "naive": naive_retriever,
    "bm25": bm25_retriever, 
    "contextual_compression": compression_retriever,
    "multi_query": multi_query_retriever,
    "parent_document": parent_document_retriever,
    "ensemble": ensemble_retriever,
    "semantic": semantic_retriever
}

print(f"Will evaluate {len(retrievers_to_evaluate)} retrieval methods")

Will evaluate 7 retrieval methods


In [60]:
from ragas.metrics import ContextPrecision, ContextRecall, ContextRelevance
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI  # replaces the deprecated langchain.chat_models import
import pandas as pd
from ragas import evaluate
from ragas.dataset_schema import EvaluationDataset

# Setup evaluator LLM
evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-3.5-turbo"))

# Define metrics
ragas_metrics = [
    ContextPrecision(llm=evaluator_llm),
    ContextRecall(llm=evaluator_llm),
    ContextRelevance(llm=evaluator_llm)
]



In [69]:
# 1. Update environment variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("Enter your LangSmith API Key:")  # never hard-code API keys
os.environ["LANGCHAIN_PROJECT"] = "retriever-evaluation"  # Set project name

# 2. Initialize the tracer
from langchain.callbacks.tracers import LangChainTracer
tracer = LangChainTracer(project_name="retriever-evaluation")

from ragas.run_config import RunConfig

# Create a RunConfig with lower concurrency to avoid rate limits
run_config = RunConfig(
    max_workers=2,  # Lower number of concurrent workers
    timeout=60  # Per-operation timeout in seconds
)

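def evaluate_retriever(retriever, name, evaluation_samples):
    """Retrieve contexts for every sample, then score them with the Ragas metrics."""
    results = []
    
    for sample in evaluation_samples:
        # `invoke` replaces the deprecated `get_relevant_documents`
        docs = retriever.invoke(sample["user_input"])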
        
        eval_sample = {
            "user_input": sample["user_input"],
            "retrieved_contexts": [doc.page_content for doc in docs],
            "reference": sample["reference"]
        }
        results.append(eval_sample)
    
    ragas_dataset = EvaluationDataset.from_list(results)
    
    scores = evaluate(
        dataset=ragas_dataset,
        metrics=ragas_metrics,
        run_config=run_config
    )
    
    # Return the scores directly from the scores object
    return scores.scores

# Create dictionary to store results
all_results = {}


In [75]:
evaluation_samples

[{'user_input': 'Wht is BBAY 2 and how is it used for Direct Loan anual loan limit progresion in diferent academic calenders?',
  'reference_contexts': ['non-term (includes clock-hour calendars), or subscription-based. In a standard term or nonstandard term academic calendar, a term is generally a period in which all classes are scheduled to begin and end within a set time frame, and academic progress is measured in credit hours. In a non-term academic calendar, classes do not begin and end within a set time frame, such as a term. Academic progress in a non-term program can be measured in either credit or clock hours. In some cases (as discussed below), a program with terms must be treated as a non-term program for Title IV purposes. A subscription-based academic calendar is used only by subscription-based programs. A subscription-based program is a term-based program in which the school charges a student for each term on a subscription basis with the expectation that the student will 

In [None]:
retrievers = {
    "BM25": bm25_retriever,
    "Naive": naive_retriever,
    "Parent Document": parent_document_retriever,
    "Compression": compression_retriever,
    "Multi Query": multi_query_retriever,
    "Ensemble": ensemble_retriever
}

# Use the full set of evaluation samples (slice it, e.g. evaluation_samples[:5], for a quicker run)
evaluation_samples_short = evaluation_samples



all_results = {}
for name, retriever in retrievers.items():
    print(f"Evaluating {name} retriever...")
    scores = evaluate_retriever(retriever, name, evaluation_samples_short)
    all_results[name] = scores

Evaluating BM25 retriever...


Evaluating:   0%|          | 0/36 [00:00<?, ?it/s]
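
Once `all_results` is populated, the per-sample scores still need to be condensed into one row per retriever before they can be compared. Here is a minimal sketch of that step, assuming each entry is a list of per-sample metric dicts (the shape returned by `evaluate_retriever` above). The retriever names and numbers in `toy_results` are placeholders for illustration only, not real evaluation results.

```python
from statistics import mean

def summarize(all_results):
    """Average the per-sample metric dicts into one row per retriever."""
    summary = {}
    for name, per_sample in all_results.items():
        metric_names = per_sample[0].keys()
        summary[name] = {m: mean(row[m] for row in per_sample) for m in metric_names}
    return summary

# Placeholder scores (NOT real results) in the assumed list-of-dicts shape
toy_results = {
    "Retriever A": [
        {"context_precision": 1.0, "context_recall": 0.5},
        {"context_precision": 0.5, "context_recall": 0.5},
    ],
    "Retriever B": [
        {"context_precision": 0.5, "context_recall": 1.0},
        {"context_precision": 0.5, "context_recall": 0.5},
    ],
}

summary = summarize(toy_results)
# Rank retrievers by one metric; pick whichever metric matters most for your use case
ranked = sorted(summary, key=lambda name: summary[name]["context_recall"], reverse=True)
for name in ranked:
    print(name, summary[name])
```

The same summary dict can be passed to `pd.DataFrame.from_dict(summary, orient="index")` if a table is easier to read.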

In [None]:
all_results

{}
