# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Tasks 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain integration packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI and Cohere API keys, plus a LangChain API key so that our runs are traced.

In [1]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

In [2]:
os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

In [3]:
import os
import getpass
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangChain API Key:")

os.environ["LANGCHAIN_PROJECT"] = f"AIM - 09 - {uuid4().hex[0:8]}"

In [100]:
from ragas import EvaluationDataset

from ragas.metrics import (
    LLMContextRecall,
    Faithfulness,
    FactualCorrectness,
    ResponseRelevancy,
    ContextEntityRecall,
    NoiseSensitivity,
)
from ragas import evaluate, RunConfig

BM25 + semantic search with Qdrant fusion has delivered consistent results for Allan.

## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [4]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    # These columns are copied into each Document's metadata for use by the retrievers below
    metadata_columns=[
        "Date received",
        "Product",
        "Sub-product",
        "Issue",
        "Sub-issue",
        "Consumer complaint narrative",
        "Company public response",
        "Company",
        "State",
        "ZIP code",
        "Tags",
        "Consumer consent provided?",
        "Submitted via",
        "Date sent to company",
        "Company response to consumer",
        "Timely response?",
        "Consumer disputed?",
        "Complaint ID",
    ],
)

loan_complaint_data = loader.load()

# Use only the free-text complaint narrative as the content that gets embedded and searched
for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]
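To see what this cell produces without the real file, here is a standard-library sketch of the same idea: each CSV row becomes one record whose metadata keeps the columns and whose content is only the complaint narrative. The two rows are made up, and plain dicts stand in for LangChain `Document` objects; this is an illustration, not how `CSVLoader` is implemented.

```python
import csv
import io

# A two-row stand-in for ./data/complaints.csv (hypothetical values, subset of columns).
raw_csv = """Complaint ID,Product,Timely response?,Consumer complaint narrative
101,Student loan,Yes,My payment was applied to interest instead of principal.
102,Student loan,No,My loan was transferred without any notice.
"""

documents = []
for row in csv.DictReader(io.StringIO(raw_csv)):
    # Keep every column as metadata, e.g. "Timely response?".
    metadata = dict(row)
    # Search only the narrative, mirroring the page_content overwrite in the cell above.
    documents.append({"page_content": row["Consumer complaint narrative"], "metadata": metadata})

print(documents[1]["page_content"])  # My loan was transferred without any notice.
```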

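Let's look at an example document and count how many documents we loaded to see if everything worked as expected! (Only the last expression in a notebook cell is displayed, so the output below is the document count.)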

In [6]:
loan_complaint_data[0]
len(loan_complaint_data)

825

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [7]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later.

In [8]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
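For intuition, cosine similarity scores two embedding vectors by the cosine of the angle between them, and the retriever returns the `k` highest-scoring documents. Below is a minimal pure-Python sketch of that top-k step; the toy 3-dimensional vectors and the `top_k_by_cosine` helper are hypothetical stand-ins for real `text-embedding-3-small` vectors and for Qdrant's own search.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_cosine(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Hypothetical helper: ids of the k documents most similar to the query."""
    ranked = sorted(doc_vecs, key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]), reverse=True)
    return ranked[:k]


# Toy "embeddings"; in the notebook these come from the embedding model.
docs = {
    "late_fee_complaint": [0.9, 0.1, 0.0],
    "credit_report_complaint": [0.1, 0.9, 0.2],
    "forbearance_complaint": [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
print(top_k_by_cosine(query, docs, k=2))  # ['late_fee_complaint', 'forbearance_complaint']
```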

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here; we want this to mostly be about the Retrieval process.

In [9]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [10]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [11]:
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : RunnablePassthrough.assign keeps every key from the previous step and re-assigns
    #              "context" to its own value, so the dict passes through unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)
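To make the data flow of the three chain steps concrete, here is a plain-Python approximation of what they compute for one input. It is not LangChain's implementation: `fake_retriever` and `fake_llm` are hypothetical stand-ins for `naive_retriever` and `rag_prompt | chat_model`.

```python
from operator import itemgetter


def fake_retriever(question: str) -> list[str]:
    # Hypothetical stand-in for naive_retriever.
    return [f"complaint mentioning: {question}"]


def fake_llm(question: str, context: list[str]) -> str:
    # Hypothetical stand-in for rag_prompt | chat_model.
    return f"Answer to {question!r} using {len(context)} document(s)"


def run_chain(inputs: dict) -> dict:
    # Step 1: fan the input out into "context" (retrieved docs) and "question".
    step1 = {
        "context": fake_retriever(itemgetter("question")(inputs)),
        "question": itemgetter("question")(inputs),
    }
    # Step 2: keep every key and re-assign "context" to itself (a pass-through).
    step2 = {**step1, "context": itemgetter("context")(step1)}
    # Step 3: generate the response and carry the context forward.
    return {
        "response": fake_llm(step2["question"], step2["context"]),
        "context": itemgetter("context")(step2),
    }


result = run_chain({"question": "What is the most common issue with loans?"})
print(result["response"])
```

Returning `context` next to `response` is what lets us inspect (and later evaluate) exactly which documents each retriever supplied.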

Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry-picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [12]:
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issues with loans, based on the complaints provided, appear to be related to mismanagement and errors in servicing. This includes incorrect information on credit reports, problems applying payments properly (such as payments being misapplied to interest rather than principal), lack of transparency or communication from servicers, illegal or unfair practices like unauthorized transfers, and discrepancies in loan balances and interest calculations. Many complainants also face difficulties in resolving issues or obtaining accurate loan information, which can lead to financial hardship and credit damage.'

In [13]:
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided context, there are several complaints indicating that issues were not handled in a timely manner. Specifically:\n\n- One complaint (row 441) from a consumer who reported that their loan application status had no movement and multiple unreturned calls, with delays exceeding the expected response time, and the complaint was categorized as "Timely response? No."\n- Multiple complaints from Maximus Federal Services, Inc. (rows 67, 400, 816), where consumers reported delays of over a year, nearly 18 months, or several weeks without resolution, often with the company responding "Closed with explanation" or with no response. These indicate delays beyond reasonable or expected timeframes.\n- A complaint (row 816) explicitly states that no resolution has been reached after nearly 18 months, which is an excessively long delay.\n- Several other complaints mention prolonged waiting times (e.g., over 10 days, several days, or multiple months), and some complaints explicitly s

In [14]:
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for several reasons, including:\n\n1. **Accumulation of interest and repayment challenges:** Many borrowers found that choosing options like forbearance or deferment allowed interest to continue accruing, which increased the total amount owed over time and made repayment more difficult. For example, some reported that even after years of payments, their balances remained high due to ongoing interest.\n\n2. **Financial hardships and employment issues:** Borrowers faced financial hardships, including unemployment, stagnating wages, or unexpected expenses, making it impossible to afford repayment or increasing their reliance on forbearance, which can lead to higher accruing interest.\n\n3. **Lack of clear communication and mismanagement:** Several complaints highlighted a lack of proper notification about loan transfers, repayment start dates, or delinquency statuses. Some borrowers weren't informed when payments had resumed or when their loans were 

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on the [bag-of-words](https://en.wikipedia.org/wiki/Bag-of-words_model) model, which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they share, with rare words weighted more heavily than common ones.

This retriever is very straightforward to set up! Let's see it happen down below!


In [15]:
from langchain_community.retrievers import BM25Retriever

# BM25Retriever returns 4 documents by default (k=4), fewer than the naive retriever's 10
bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
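Under the hood, BM25 scores a document by summing, over the query terms, an IDF weight (rarer terms count more) times a term frequency that saturates and is normalized by document length. Here is a minimal sketch of the classic Okapi BM25 formula with the common defaults k1 = 1.5 and b = 0.75. The lowercase whitespace tokenizer, the smoothed IDF variant, and the `bm25_scores` helper are assumptions for illustration; `BM25Retriever` delegates to the `rank_bm25` package, which differs in details such as IDF handling.

```python
import math
from collections import Counter


def bm25_scores(query: str, corpus: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document in corpus against query with Okapi BM25."""
    docs = [doc.lower().split() for doc in corpus]  # naive whitespace tokenizer
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF: rare terms get a larger (always positive) weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated terms; b normalizes by document length.
            norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avg_len))
            score += idf * norm_tf
        scores.append(score)
    return scores


corpus = [
    "my forbearance request was denied",
    "late fee charged after payment",
    "payment applied to interest not principal",
]
scores = bm25_scores("forbearance denied", corpus)
print(scores.index(max(scores)))  # 0: the only document containing the query terms
```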

We'll construct the same chain - only changing the retriever.

In [16]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [17]:
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided data, the most common issue with student loans appears to be problems related to dealing with lenders or servicers, such as miscommunication, incorrect information, or unfair repayment practices. Specific sub-issues include disputes over fees charged, difficulty applying payments correctly, obtaining accurate loan information, and receiving bad or confusing information about loans.'

In [18]:
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, all the complaints mentioned in the context were responded to by the companies and closed with explanation, indicating that they addressed the complaints in a timely manner. Specifically, the responses were marked as "Yes" for timely response for each complaint. \n\nTherefore, no complaints in the provided context appear to have gone unhandled or unresolved within an appropriate timeframe.'

In [19]:
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including miscommunication and administrative errors by loan servicers, inability to get proper assistance or responses when facing repayment difficulties, and issues related to incorrect or outdated information about their loan status. For example, some borrowers experienced their automatic payments being unenrolled without their knowledge, leading to missed payments and negative impacts on their credit scores. Others were improperly steered into the wrong payment plans or forbearances, or did not receive critical notices about their loans or repayment statuses. Additionally, some borrowers reported being unable to get assistance despite repeated efforts, and in certain cases, loan servicers transferred or sold their loans without proper notification, making it difficult for borrowers to keep track or resolve issues. All these factors contributed to difficulties in fulfilling repayment obligations.'

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILER: We do; the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

##### ✅ Answer
An example query where BM25 outperforms embeddings is one where the user searches for an exact phrase (names, locations, etc.) or for specific terminology that appears verbatim in the documents, especially if that phrase is rare.

**Example Query:**  
*"What is the difference between 'deferment' and 'forbearance' in student loans?"*

**Justification:**  
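BM25 is a term-based retrieval method that excels at matching exact keywords and phrases. If a document contains the exact terms "deferment" and "forbearance" in the context of student loans, BM25 will rank it highly because it directly matches the query terms. A dense embedding, by contrast, compresses the whole query into a single vector, so documents that discuss repayment relief in general may score just as highly even if they never mention either term.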


## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [18]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
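Conceptually, the compression retriever over-fetches with the base retriever (here `k=10`) and then keeps only the documents the reranker scores highest. A minimal sketch of that retrieve-then-rerank pattern follows; the word-overlap `toy_relevance` function is a hypothetical stand-in for Cohere's Rerank model, and `top_n=2` is an illustrative choice.

```python
def toy_relevance(query: str, doc: str) -> float:
    # Hypothetical stand-in for a learned reranker: fraction of query words found in the doc.
    q_words = set(query.lower().split())
    return len(q_words & set(doc.lower().split())) / len(q_words)


def rerank(query: str, candidates: list[str], top_n: int = 3) -> list[str]:
    # Re-score every candidate against the query and keep only the best top_n.
    return sorted(candidates, key=lambda doc: toy_relevance(query, doc), reverse=True)[:top_n]


# Pretend these came back from a base retriever with a large k.
candidates = [
    "servicer misapplied my payment",
    "loan transferred without notice",
    "payment applied to interest",
    "question about forbearance",
]
print(rerank("misapplied payment servicer", candidates, top_n=2))
# ['servicer misapplied my payment', 'payment applied to interest']
```

The design choice is a trade-off: the cheap first-stage retriever casts a wide net, and the more expensive reranker only has to score those few candidates.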

Let's create our chain again, and see how this does!

In [19]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [20]:
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints, a common issue with loans appears to be problems related to the handling and management of the loans by servicers. This includes errors or discrepancies in loan balances, misapplied payments, wrongful denials of payment plans, incorrect or incomplete information, unauthorized transfers of loans, and poor communication or lack of transparency. Specifically, many complaints highlight issues such as receiving bad or incomplete information about the loan, discrepancies in account balances and interest, and mishandling of personal and loan data, sometimes involving violations of privacy laws like FERPA.\n\nIn summary, the most common issue with loans, especially federal student loans in these complaints, involves mishandling and miscommunication by loan servicers, leading to errors in account information, billing, and personal data privacy concerns.'

In [23]:
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, there are examples indicating that some issues took a significant amount of time to be addressed. For instance, one complaint mentions that it has been nearly 18 months with no resolution, and another indicates that an issue has been pending for over one year without response. The complaint about federal student loan servicing related to the account review and account adjustments has also been open for over a year.\n\nWhile the responses from the companies in some cases are marked as "Timely response? Yes," the lengthy durations mentioned by complainants suggest that some complaints did not get handled in a timely manner. Therefore, yes, there are complaints that did not get handled promptly.'

In [24]:
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a lack of clear communication, misunderstanding of their repayment obligations, and the growing burden of interest. Many borrowers were not informed by their financial aid officers that repayment was required, leading to confusion about their responsibilities. Additionally, issues such as incorrect account information, unnotified loan transfers, and difficulties accessing accurate online data hindered their ability to manage payments properly. \n\nFurthermore, even when payment plans were established, the accumulation of interest—especially when loans were put into forbearance or deferment—made it difficult to reduce the principal amount. Borrowers also faced financial hardships, such as stagnant wages and inability to increase monthly payments without compromising basic necessities, which extended the repayment period and increased the total amount owed. Overall, inadequate information, miscommunication, and economic constraints 

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had....more than one query!

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and generating `n` new queries with an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context.
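The three steps above can be sketched in plain Python. `generate_variants` and `search` are hypothetical stand-ins for the LLM call and the base retriever; the point is the union step, which keeps each unique document once, in first-seen order.

```python
def generate_variants(question: str) -> list[str]:
    # Hypothetical stand-in for the LLM that rewrites the question n ways.
    return ["loan servicing problems", "errors on student loan accounts", "most frequent loan complaints"]


def search(query: str) -> list[str]:
    # Hypothetical stand-in for the base retriever (returns document ids).
    index = {
        "loan servicing problems": ["doc_3", "doc_7"],
        "errors on student loan accounts": ["doc_7", "doc_9"],
        "most frequent loan complaints": ["doc_1", "doc_3"],
    }
    return index.get(query, [])


def multi_query_retrieve(question: str) -> list[str]:
    unique_docs: list[str] = []
    for variant in generate_variants(question):  # step 1: generate n queries
        for doc_id in search(variant):           # step 2: retrieve per query
            if doc_id not in unique_docs:        # step 3: keep unique documents only
                unique_docs.append(doc_id)
    return unique_docs


print(multi_query_retrieve("What is the most common issue with loans?"))
# ['doc_3', 'doc_7', 'doc_9', 'doc_1']
```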

So, how hard is it to set up? Not bad! Let's see it down below!



In [21]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [22]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [28]:
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints and data, appears to be mismanagement by loan servicers, including errors in loan balances, misapplication of payments, inaccurate or deceptive information, and failure to communicate properly with borrowers. Many complaints involve incorrect account statuses, unintended default classifications, inaccurate reporting on credit reports, and mishandling of deferments or forbearances. Additionally, a significant number of complaints highlight difficulties in obtaining clear, accurate information about loan terms and balances, often compounded by procedural disorganization and lack of transparency from service providers.'

In [29]:
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided information, it appears that several complaints did not get handled in a timely manner. For example:\n\n- One complaint (Complaint #12739706) regarding an unprocessed graduated loan application was not responded to within the expected timeframe (it was noted as not timely).\n- Another complaint (Complaint #12709087) about issues with a student loan application was also not addressed promptly, with the complainant indicating that the issue remained unresolved after multiple follow-ups.\n- Additionally, multiple complaints mention delays of several days to weeks before receiving responses, and some complaints report that the issues persist without resolution.\n\nTherefore, yes, many complaints listed did not get handled in a timely manner.'

In [30]:
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans primarily due to issues such as lack of clear information about repayment options, mismanagement by loan servicers, high interest accumulation, and miscommunication. Many borrowers were steered into forbearance or consolidation without being informed about the consequences, such as interest capitalization and loss of forgiveness eligibility. Additionally, some faced difficulty understanding when and how to make payments, leading to delinquency and negative impacts on their credit scores. Systemic practices like forbearance steering and inadequate communication by servicers contributed significantly to borrowers' inability to repay their loans successfully."

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

##### ✅ Answer
When building a chatbot-like product, it is difficult to anticipate every possible user intent or the exact way users will phrase their questions. Users may ask ambiguous or incomplete questions, or use terminology that differs from the language in your documents. By generating multiple reformulations of a user query, you increase the chances that at least one version will closely match the relevant information in your data. This approach helps surface more relevant documents that might otherwise be missed, thereby improving recall and ensuring the system retrieves a broader and more accurate set of results for the user's true intent.
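A small worked example of the recall argument: recall is the fraction of the relevant documents that were actually retrieved. The document ids and relevance labels below are made up for illustration.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    # Fraction of the relevant documents that appear in the retrieved set.
    return len(retrieved & relevant) / len(relevant)


relevant = {"doc_1", "doc_3", "doc_9"}
original_query_hits = {"doc_3", "doc_7"}   # hypothetical results for the user's phrasing
reformulation_hits = {"doc_9", "doc_1"}    # hypothetical results for a reworded query

print(recall(original_query_hits, relevant))                       # 0.333...
print(recall(original_query_hits | reformulation_hits, relevant))  # 1.0
```

Taking the union can also pull in irrelevant documents (here `doc_7`), which is why multi-query retrieval tends to trade some precision for recall.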

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
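A minimal sketch of the small-to-big idea: split each parent into child chunks that remember their parent's id, match the query against the children, then return the distinct parents. The fixed-width `split_into_children` and the word-overlap `child_score` are hypothetical simplifications of `RecursiveCharacterTextSplitter` and vector search, not how `ParentDocumentRetriever` is implemented.

```python
def split_into_children(text: str, chunk_words: int) -> list[str]:
    # Hypothetical splitter: fixed-size chunks of words (the notebook uses
    # RecursiveCharacterTextSplitter with chunk_size=750 characters instead).
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def child_score(query: str, chunk: str) -> int:
    # Hypothetical stand-in for embedding similarity: number of shared words.
    return len(set(query.lower().split()) & set(chunk.lower().split()))


parents = {
    "p1": "I applied for forbearance in March. The servicer never responded and reported me late.",
    "p2": "My payment was applied to interest only. The principal balance never went down.",
}

# Index: each small child chunk stores the id of the parent it came from.
children = [(pid, chunk) for pid, text in parents.items() for chunk in split_into_children(text, 6)]


def parent_document_search(query: str, k: int = 1) -> list[str]:
    # Rank the child chunks, then map the top-k hits back to their (deduplicated) parents.
    ranked = sorted(children, key=lambda c: child_score(query, c[1]), reverse=True)
    parent_ids: list[str] = []
    for pid, _ in ranked[:k]:
        if pid not in parent_ids:
            parent_ids.append(pid)
    return [parents[pid] for pid in parent_ids]


print(parent_document_search("principal balance"))  # the whole p2 text, not just the matching chunk
```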

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [23]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension (1536, the output size of `text-embedding-3-small`); you'll need to change this if you're using a different embedding model.

In [24]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [25]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [26]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [27]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [28]:
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided context, the most common issue with loans appears to be related to errors, misconduct, and systemic breakdowns in servicing and reporting. Specifically, common problems involve incorrect information on credit reports, disputes over loan balances and interest rates, misapplication of payments, wrongful denial of payment plans, and issues stemming from loan transfers and sale of loans which lead to confusion and unverified debt reporting. These issues reflect systemic flaws in the loan servicing process, leading to financial and credit reporting problems for borrowers.'

In [37]:
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided data, multiple complaints were marked as "No" for timely response, indicating that some complaints did not get handled in a timely manner. Specifically, the complaints with Complaint IDs 12709087 and 12935889 both received responses marked as "No" for timely response. Therefore, yes, some complaints did not get handled in a timely manner.'

In [38]:
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People often fail to pay back their loans due to a variety of reasons highlighted in the complaints. Some common causes include:\n\n1. **Lack of clear communication and proper notification:** Borrowers reported not being properly informed about payment obligations, due dates, or changes in their loan management, which led to missed payments.\n\n2. **Financial hardship and inability to afford payments:** Many borrowers experienced severe financial difficulties, including unemployment or underemployment, making it difficult to keep up with repayment.\n\n3. **Misrepresentation and lack of transparency from schools and lenders:** Several borrowers cited misleading information about the value of their education, career prospects, and the debt's manageability, which contributed to their inability to repay.\n\n4. **Problems with loan servicing and administrative issues:** Complaints indicate issues like failed or delayed notifications, buying out of loans without notice, and difficulties in 

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the question about performance more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever simply takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.
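Reciprocal Rank Fusion scores each document as the sum, over retrievers, of `weight / (c + rank)`, where `rank` is the document's 1-based position in that retriever's results and `c` is a smoothing constant (60 in the RRF paper). Below is a minimal weighted-RRF sketch in that spirit; the `weighted_rrf` helper and the document ids are hypothetical, and this is not `EnsembleRetriever`'s source.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists: list[list[str]], weights: list[float], c: int = 60) -> list[str]:
    """Fuse several ranked lists of document ids with weighted Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (c + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]
print(weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because only ranks enter the formula, RRF can fuse retrievers whose raw scores live on incomparable scales (BM25 scores vs. cosine similarities). Here `doc_b`, ranked highly by both retrievers, edges out `doc_a`, which only one retriever ranked first.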

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [29]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

With *all* of these retrievers packed together in an ensemble, we can build the same chain as before.

In [30]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [31]:
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the context provided, the most common issues with loans, particularly student loans, involve:\n\n- Errors in loan balances and misapplied payments\n- Receiving incorrect or bad information about loans\n- Difficulty dealing with lenders or servicers, including refusal to apply additional payments to principal, wrongful denials of repayment plans, or problematic payment handling\n- Problems with loan classification and mismanagement, such as incorrect loan type designation or ending in-school deferments improperly\n- Discrepancies and inaccuracies in credit reporting, including incorrect late payments, account status errors, or missing payment history\n- Challenges in obtaining accurate loan information or validation, and issues related to loan transfers and data mishandling\n- Problems with debt validation, fraud concerns, or improper collection practices\n- Difficulties with loan forgiveness, cancellation, or discharge applications\n\nIn summary, the most common issue appears

In [42]:
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints data, yes, some complaints were not handled in a timely manner. For example:\n\n- Complaint ID 12935889 about a Mohela account was marked as "No" for timely response.\n- Complaint ID 12654977 about a student loan payment not being applied was also marked as "No" for timely response.\n- Complaint ID 12744910 regarding payments showing late was marked "Yes," indicating it was handled timely.\n- Complaint ID 13056764 about inaccurate credit reporting was handled timely ("Yes").\n- Complaint ID 12823876 about delayed follow-up and unrecorded payments was handled timely ("Yes").\n\nAdditionally, some complaints explicitly state delays or unresponsiveness, such as complaint ID 12935889 and 13056764, which show that complaints did not get addressed promptly. Therefore, there are instances indicating that some complaints were not handled in a timely manner.'

In [43]:
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans primarily due to a combination of factors highlighted in the complaints:\n\n1. **Lack of Clear Information and Communication:** Many borrowers were not properly informed about their repayment obligations, when payments were due, or the status of their payments. Instances include unnotified loan transfers, failure to receive notices about delinquency, and lack of confirmation when payments were made.\n\n2. **Technical and Administrative Errors:** Complaints include payments being reversed without explanation, inaccurate reporting of delinquency or late payments, and difficulties in applying payments correctly, often leading to errors in loan balances and credit reports.\n\n3. **High and Growing Interest Due to Mismanagement:** Several borrowers mentioned that interest continues to accrue and capitalize, especially during forbearance or deferment periods, causing their debt to grow over time despite making payments.\n\n4. **Inadequate Support for Ha

## Task 10: Semantic Chunking

While this is not a retrieval method, it *is* an effective way of improving retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining consecutive sentences into the same chunk, and splitting wherever the semantic distance between neighboring sentences exceeds a threshold set by one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!
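
The thresholding options can be illustrated with a small, dependency-free sketch that turns a list of adjacent-sentence distances into a single breakpoint threshold. The formulas below (a percentile of the distances; the mean plus a multiple of the standard deviation or of the interquartile range; and, for `gradient`, a percentile of the distances' gradient) reflect our understanding of LangChain's `SemanticChunker` and should be read as assumptions. The helper names, the toy distances, and the `amount` values are illustrative only.

```python
import statistics


def percentile(values, q):
    """q-th percentile (integer 1..99), linearly interpolated between sorted values."""
    return statistics.quantiles(values, n=100, method="inclusive")[q - 1]


def gradient(values):
    """Discrete first derivative: one-sided differences at the ends, central ones inside."""
    inner = [(values[i + 1] - values[i - 1]) / 2 for i in range(1, len(values) - 1)]
    return [values[1] - values[0], *inner, values[-1] - values[-2]]


def breakpoint_threshold(distances, method, amount):
    """Return the value a score must exceed to start a new chunk (assumed formulas)."""
    if method == "percentile":
        return percentile(distances, amount)
    if method == "standard_deviation":
        return statistics.mean(distances) + amount * statistics.pstdev(distances)
    if method == "interquartile":
        q1, _, q3 = statistics.quantiles(distances, n=4, method="inclusive")
        return statistics.mean(distances) + amount * (q3 - q1)
    if method == "gradient":
        # Compared against the gradient of the distances, not the distances themselves.
        return percentile(gradient(distances), amount)
    raise ValueError(f"unknown thresholding method: {method!r}")


# Made-up distances between each sentence and the next one.
distances = [0.10, 0.12, 0.45, 0.11, 0.50, 0.09]
for method, amount in [("percentile", 95), ("standard_deviation", 3),
                       ("interquartile", 1.5), ("gradient", 95)]:
    print(f"{method:>18}: {breakpoint_threshold(distances, method, amount):.4f}")
```

With these made-up distances, the `standard_deviation` and `interquartile` settings both land above the largest distance (0.50), so they would not split at all, while the 95th percentile (0.4875) splits only at the largest jump. The right method and amount depend on the corpus.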

Let's see how to implement this!

We'll use the `percentile` thresholding method for this example, which calculates the distance between every pair of adjacent sentences and then splits the text wherever a distance exceeds a given percentile of all those distances.
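
Here is a minimal, self-contained sketch of percentile-based splitting, assuming we already have one embedding vector per sentence. The 2-D vectors are made up to stand in for real embeddings, and the helper names are hypothetical. LangChain's actual `SemanticChunker` does more (as we understand it, it embeds small windows of neighboring sentences rather than single sentences), so treat this as an illustration of the idea rather than its implementation.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def percentile(values, q):
    """q-th percentile with linear interpolation between sorted values."""
    ordered = sorted(values)
    rank = (q / 100) * (len(ordered) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def semantic_chunks(sentences, vectors, q=95):
    """Group consecutive sentences, starting a new chunk at unusually large semantic jumps."""
    if not sentences:
        return []
    # Distance between each sentence and the one that follows it.
    distances = [cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
    threshold = percentile(distances, q) if distances else 0.0
    chunks, current = [], [sentences[0]]
    for i, distance in enumerate(distances):
        if distance > threshold:  # breakpoint: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks


# Made-up 2-D "embeddings": the first three sentences point one way, the last two another.
sentences = [
    "My loan was transferred.",
    "Nobody told me about the transfer.",
    "Payments were misapplied.",
    "The weather was sunny.",
    "It was warm outside.",
]
vectors = [[1.0, 0.0], [0.95, 0.05], [0.9, 0.1], [0.0, 1.0], [0.05, 0.95]]
for chunk in semantic_chunks(sentences, vectors):
    print(chunk)
```

Because the threshold is relative to the text's own distances, only the large jump between the third and fourth sentences becomes a breakpoint; the small distances within each topic do not.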

In [32]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [33]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [34]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [35]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally, we can create our classic chain!

In [36]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [37]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'Based on the provided complaints data, the most common issues with loans tend to revolve around miscommunication, errors, or delays in servicing, including problems with payment handling, inaccurate reporting, or lack of transparency. However, the issues that stand out most frequently involve problems with repayment processes, such as incorrect payment amounts, failure to update loan status, or poor communication from loan servicers.\n\nTherefore, the most common issue appears to be **problems related to loan servicing and payment handling**, particularly errors in billing, delays, and lack of clear communication.'

In [57]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that many complaints were responded to in a timely manner, with responses marked as \'Yes\' under the \'Timely response?\' field. Notably, several complaints state "Closed with explanation," indicating that they were addressed within the required time frame. \n\nHowever, there is at least one complaint regarding a lack of response or handling—specifically, the complaint about Nelnet (row 17). The consumer\'s narrative details multiple issues with lack of responses and conduct that suggests their complaint was not handled promptly or satisfactorily.\n\nIn summary:\n\n- Multiple complaints confirm responses were handled in a timely manner.\n- One complaint (about Nelnet\'s failure to respond to Certified Mail and ongoing misconduct) indicates that the complaint was not properly handled or responded to, suggesting that some complaints did not get handled in a timely manner.\n\nTherefore, yes, some complaints did not get handled in a timely man

In [58]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

"People failed to pay back their loans for various reasons, including issues such as difficulties dealing with their loan servicers, miscommunications or inadequate information about their loan status, problems with payment processing, and disputes over the legitimacy or accuracy of their loan details. Some specific reasons noted in the complaints include receiving bad information about loan statuses, delays or errors in re-amortizing payments after forbearance ended, and inaccurate reports of default or delinquency. Additionally, instances of alleged mismanagement, lack of transparency, or improper handling of personal data have also contributed to borrowers' difficulties in repayment."

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

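If sentences are short and highly repetitive, the chunking algorithm could produce many small, near-duplicate chunks, which adds redundancy to the vector store and makes it harder for the retriever to distinguish between different questions or answers. Because the `percentile` method is relative, it will still place breakpoints at the largest adjacent distances even when all distances are small, so splits may land at fairly arbitrary points.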

To address this, we can increase the effective chunk size by combining multiple short sentences or QA pairs (e.g., a larger `buffer_size` or a `min_chunk_size` on `SemanticChunker`), deduplicate content by removing or merging highly similar chunks, and attach metadata such as question IDs or categories to help disambiguate similar content. Custom chunking logic, such as grouping by topic or intent, can also ensure that each chunk represents a unique theme, while adjusting embedding granularity or trying different embedding models can help capture subtle differences and improve retrieval quality.
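
Two of these adjustments (keeping each QA pair together, then dropping near-duplicate chunks) can be sketched as follows. This assumes FAQ text where each question line starts with `Q:` and each answer with `A:`; the `embed` function is a hypothetical bag-of-words stand-in for a real embedding model, and the 0.9 similarity cut-off is arbitrary.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(word.strip("?.,!").lower() for word in text.split())


def cosine_similarity(a, b):
    dot = sum(a[k] * b[k] for k in a)  # Counter returns 0 for missing keys
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def faq_chunks(lines):
    """Group each 'Q:' line with the 'A:' lines that follow it."""
    chunks = []
    for line in lines:
        if line.startswith("Q:") or not chunks:
            chunks.append(line)
        else:
            chunks[-1] += " " + line
    return chunks


def deduplicate(chunks, threshold=0.9):
    """Keep a chunk only if it is not too similar to any chunk already kept."""
    kept, kept_vectors = [], []
    for chunk in chunks:
        vec = embed(chunk)
        if all(cosine_similarity(vec, other) < threshold for other in kept_vectors):
            kept.append(chunk)
            kept_vectors.append(vec)
    return kept


faq = [
    "Q: How do I apply for forbearance?",
    "A: Contact your servicer and submit the request form.",
    "Q: How do I apply for forbearance?",
    "A: Contact your servicer and submit the request form.",
    "Q: When is my payment due?",
    "A: Payments are due on the first of each month.",
]
for chunk in deduplicate(faq_chunks(faq)):
    print(chunk)
```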



# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.
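
For intuition about what a retriever-specific metric measures, here is a minimal sketch of the context-recall idea: the fraction of statements in the reference answer that can be attributed to the retrieved contexts. Ragas's `LLMContextRecall` asks an LLM to make that attribution judgment; the word-overlap `is_supported` check below is a hypothetical, much cruder stand-in used only so the example runs offline, and the 0.6 overlap cut-off is arbitrary.

```python
import re


def split_statements(reference):
    """Naively treat each sentence of the reference answer as one statement."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", reference) if s.strip()]


def words(text):
    """Lower-cased alphanumeric tokens of `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_supported(statement, contexts, min_overlap=0.6):
    """Hypothetical judge: enough of the statement's words appear in a single context."""
    stmt = words(statement)
    if not stmt:
        return False
    return any(len(stmt & words(ctx)) / len(stmt) >= min_overlap for ctx in contexts)


def context_recall(reference, retrieved_contexts):
    """Fraction of reference statements supported by at least one retrieved context."""
    statements = split_statements(reference)
    if not statements:
        return 0.0
    supported = sum(is_supported(s, retrieved_contexts) for s in statements)
    return supported / len(statements)


reference = "The servicer reversed my payment. I was never notified about the transfer."
contexts = ["My servicer reversed my payment without explanation last month."]
print(context_recall(reference, contexts))  # one of two statements is supported
```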

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

In [38]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings

generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-nano"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [39]:
from ragas.testset.graph import KnowledgeGraph

kg = KnowledgeGraph()
kg

KnowledgeGraph(nodes: 0, relationships: 0)

In [40]:
from ragas.testset.graph import Node, NodeType

### NOTICE: We're using a subset of the data for this example - this is to keep costs/time down.
for doc in loan_complaint_data[:20]:
    kg.nodes.append(
        Node(
            type=NodeType.DOCUMENT,
            properties={
                "page_content": doc.page_content,
                "document_metadata": doc.metadata,
            },
        )
    )
kg

KnowledgeGraph(nodes: 20, relationships: 0)

In [41]:
from ragas.testset.transforms import default_transforms, apply_transforms

transformer_llm = generator_llm
embedding_model = generator_embeddings

# Store the result under a new name so the imported `default_transforms`
# function is not shadowed (re-running this cell would otherwise fail).
transforms = default_transforms(
    documents=loan_complaint_data, llm=transformer_llm, embedding_model=embedding_model
)
apply_transforms(kg, transforms)
kg

Applying SummaryExtractor:   0%|          | 0/14 [00:00<?, ?it/s]

Applying CustomNodeFilter:   0%|          | 0/20 [00:00<?, ?it/s]

Node ca64b5a2-9ae6-4437-aa5b-96654d11c859 does not have a summary. Skipping filtering.
Node c1c8b805-4f50-495e-9f36-067ca21ef5cb does not have a summary. Skipping filtering.
Node 21e3179c-5337-484d-b15c-c76deda51f93 does not have a summary. Skipping filtering.
Node c9ec7062-a6e1-49f1-9ed9-72abf17f9a99 does not have a summary. Skipping filtering.
Node b814c8cf-ef19-4fe5-beb7-1c2d9e0b2106 does not have a summary. Skipping filtering.
Node 3e61f1bf-d07a-4673-83e1-c90466a271a6 does not have a summary. Skipping filtering.


Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/51 [00:00<?, ?it/s]

Applying OverlapScoreBuilder:   0%|          | 0/1 [00:00<?, ?it/s]

KnowledgeGraph(nodes: 19, relationships: 19)

In [42]:
kg.save("loan_data_kg.json")
loan_data_kg = KnowledgeGraph.load("loan_data_kg.json")
loan_data_kg

KnowledgeGraph(nodes: 19, relationships: 19)

In [43]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(
    llm=generator_llm, embedding_model=embedding_model, knowledge_graph=loan_data_kg
)

In [44]:
from ragas.testset.synthesizers import (
    default_query_distribution,
    SingleHopSpecificQuerySynthesizer
)

query_distribution = [
    (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 1.0)
]

In [65]:
testset = generator.generate(testset_size=10, query_distribution=query_distribution)
testset.to_pandas()

Generating Scenarios:   0%|          | 0/1 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/10 [00:00<?, ?it/s]

| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | How did the end of the COVID-19 forbearance pr... | [The federal student loan COVID-19 forbearance... | The federal student loan COVID-19 forbearance ... | single_hop_specifc_query_synthesizer |
| 1 | How is Aidvantage handling borrower complaints... | [I submitted my annual Income-Driven Repayment... | The context describes a borrower who submitted... | single_hop_specifc_query_synthesizer |
| 2 | How does FERPA protect student data and what a... | [My personal and financial data was compromise... | My personal and financial data was compromised... | single_hop_specifc_query_synthesizer |
| 3 | How does the Fair Credit Reporting Act ensure ... | [I am writing to formally dispute inaccurate i... | The Fair Credit Reporting Act (FCRA), specific... | single_hop_specifc_query_synthesizer |
| 4 | As a Privacy and Consumer Rights Advocate, how... | [I am devastated. I would like to report a sit... | The individual reports that they have never be... | single_hop_specifc_query_synthesizer |
| 5 | What role did the Department of Government Eff... | [On XXXX XXXX XXXX, XXXX XXXX instructed his t... | On XXXX XXXX XXXX, XXXX XXXX instructed his te... | single_hop_specifc_query_synthesizer |
| 6 | How does the issue with the EdFinancial forms ... | [I have provided documentation relating to my ... | Our Human Resources department provided separa... | single_hop_specifc_query_synthesizer |
| 7 | How does FERPA protect student data privacy? | [My personal and financial data was compromise... | My personal and financial data was compromised... | single_hop_specifc_query_synthesizer |
| 8 | How does the violation of the Higer Education ... | [I am writing to formally dispute my XXXX XXXX... | The violation of the Higher Education Act demo... | single_hop_specifc_query_synthesizer |
| 9 | How does HIPAA relate to the violations of rig... | [Breach of Contract - All four branches have v... | In the context provided, HIPAA is listed among... | single_hop_specifc_query_synthesizer |


In [67]:
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-mini"))

## Naive Retrieval Evaluation Dataset

In [None]:
naive_retrieval_dataset = testset.to_pandas().copy()
naive_retrieval_dataset["response"] = ""
naive_retrieval_dataset["retrieved_contexts"] = [[] for _ in range(len(naive_retrieval_dataset))]

for k, v in naive_retrieval_dataset.iterrows():
    response = naive_retrieval_chain.invoke({"question": v.user_input})
    naive_retrieval_dataset.at[k, "response"] = response["response"].content
    naive_retrieval_dataset.at[k, "retrieved_contexts"] = [
        context.page_content for context in response["context"]
    ]
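
The four evaluation cells in this section differ only in which chain they call, so the loop could be factored into one helper. This is a sketch under two assumptions: each chain returns a dict with a `"response"` message (exposing `.content`) and a `"context"` list of documents (exposing `.page_content`), as the chains in this notebook do; and the helper returns plain dicts rather than a DataFrame. The name `build_eval_rows` and the stub classes used to exercise it are hypothetical.

```python
from dataclasses import dataclass


def build_eval_rows(chain, samples):
    """Run `chain` on every test question and collect what Ragas needs per sample.

    `samples` is a list of dicts with at least a "user_input" key,
    e.g. `testset.to_pandas().to_dict("records")`.
    """
    rows = []
    for sample in samples:
        output = chain.invoke({"question": sample["user_input"]})
        rows.append({
            **sample,
            "response": output["response"].content,
            "retrieved_contexts": [doc.page_content for doc in output["context"]],
        })
    return rows


# --- Hypothetical stubs so the helper can be exercised without an API key ---
@dataclass
class FakeMessage:
    content: str


@dataclass
class FakeDocument:
    page_content: str


class EchoChain:
    def invoke(self, inputs):
        question = inputs["question"]
        return {
            "response": FakeMessage(f"Answer to: {question}"),
            "context": [FakeDocument("complaint text A"), FakeDocument("complaint text B")],
        }


rows = build_eval_rows(EchoChain(), [{"user_input": "Why was my payment late?", "reference": "Placeholder reference."}])
print(rows[0]["response"], rows[0]["retrieved_contexts"])
```

In the notebook, each loop could then become something like `EvaluationDataset.from_list(build_eval_rows(naive_retrieval_chain, testset.to_pandas().to_dict("records")))`; we have not verified that call end to end, so treat it as a suggestion.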

In [90]:
naive_retrieval_result = evaluate(
    dataset=EvaluationDataset.from_pandas(naive_retrieval_dataset),
    metrics=[
        LLMContextRecall(),
        Faithfulness(),
        FactualCorrectness(),
        ResponseRelevancy(),
        ContextEntityRecall(),
        NoiseSensitivity(),
    ],
    llm=evaluator_llm,
    run_config=RunConfig(timeout=360),
)
naive_retrieval_result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[38]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[40]: TimeoutError()
Exception raised in Job[59]: TimeoutError()


{'context_recall': 0.9750, 'faithfulness': 0.7829, 'factual_correctness': 0.4800, 'answer_relevancy': 0.8567, 'context_entity_recall': 0.6016, 'noise_sensitivity_relevant': 0.4199}

## BM25 Retrieval Evaluation Dataset

In [96]:
bm25_retrieval_dataset = testset.to_pandas().copy()
bm25_retrieval_dataset["response"] = ""
bm25_retrieval_dataset["retrieved_contexts"] = [[] for _ in range(len(bm25_retrieval_dataset))]

for k, v in bm25_retrieval_dataset.iterrows():
    response = bm25_retrieval_chain.invoke({"question": v.user_input})
    bm25_retrieval_dataset.at[k, "response"] = response["response"].content
    bm25_retrieval_dataset.at[k, "retrieved_contexts"] = [
        context.page_content for context in response["context"]
    ]

In [101]:
bm25_retrieval_result = evaluate(
    dataset=EvaluationDataset.from_pandas(bm25_retrieval_dataset),
    metrics=[
        LLMContextRecall(),
        Faithfulness(),
        FactualCorrectness(),
        ResponseRelevancy(),
        ContextEntityRecall(),
        NoiseSensitivity(),
    ],
    llm=evaluator_llm,
    run_config=RunConfig(timeout=360),
)
bm25_retrieval_result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[47]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[40]: TimeoutError()


{'context_recall': 0.7750, 'faithfulness': 0.7864, 'factual_correctness': 0.4670, 'answer_relevancy': 0.9745, 'context_entity_recall': 0.4984, 'noise_sensitivity_relevant': 0.3671}

## Contextual Compression Retrieval Evaluation Dataset

In [None]:
context_compression_retrieval_dataset = testset.to_pandas().copy()
context_compression_retrieval_dataset["response"] = ""
context_compression_retrieval_dataset["retrieved_contexts"] = [[] for _ in range(len(context_compression_retrieval_dataset))]

for k, v in context_compression_retrieval_dataset.iterrows():
    response = contextual_compression_retrieval_chain.invoke({"question": v.user_input})
    context_compression_retrieval_dataset.at[k, "response"] = response["response"].content
    context_compression_retrieval_dataset.at[k, "retrieved_contexts"] = [
        context.page_content for context in response["context"]
    ]

In [102]:
context_compression_retrieval_result = evaluate(
    dataset=EvaluationDataset.from_pandas(context_compression_retrieval_dataset),
    metrics=[
        LLMContextRecall(),
        Faithfulness(),
        FactualCorrectness(),
        ResponseRelevancy(),
        ContextEntityRecall(),
        NoiseSensitivity(),
    ],
    llm=evaluator_llm,
    run_config=RunConfig(timeout=360),
)
context_compression_retrieval_result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[44]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[38]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[47]: AttributeError('StringIO' object has no attribute 'statements')


{'context_recall': 0.8250, 'faithfulness': 0.7676, 'factual_correctness': 0.4475, 'answer_relevancy': 0.8689, 'context_entity_recall': 0.5248, 'noise_sensitivity_relevant': 0.3479}

## Multi-Query Retrieval Evaluation Dataset

In [103]:
multi_query_retrieval_dataset = testset.to_pandas().copy()
multi_query_retrieval_dataset["response"] = ""
multi_query_retrieval_dataset["retrieved_contexts"] = [
    [] for _ in range(len(multi_query_retrieval_dataset))
]

for k, v in multi_query_retrieval_dataset.iterrows():
    response = multi_query_retrieval_chain.invoke({"question": v.user_input})
    multi_query_retrieval_dataset.at[k, "response"] = response[
        "response"
    ].content
    multi_query_retrieval_dataset.at[k, "retrieved_contexts"] = [
        context.page_content for context in response["context"]
    ]

In [105]:
multi_query_retrieval_result = evaluate(
    dataset=EvaluationDataset.from_pandas(multi_query_retrieval_dataset),
    metrics=[
        LLMContextRecall(),
        Faithfulness(),
        FactualCorrectness(),
        ResponseRelevancy(),
        ContextEntityRecall(),
        NoiseSensitivity(),
    ],
    llm=evaluator_llm,
    run_config=RunConfig(timeout=360),
)
multi_query_retrieval_result

Evaluating:   0%|          | 0/60 [00:00<?, ?it/s]

Exception raised in Job[8]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[11]: AttributeError('StringIO' object has no attribute 'statements')
Exception raised in Job[10]: TimeoutError()
Exception raised in Job[23]: TimeoutError()
Exception raised in Job[29]: TimeoutError()
Exception raised in Job[40]: TimeoutError()
Exception raised in Job[41]: TimeoutError()
Exception raised in Job[59]: TimeoutError()


{'context_recall': 0.9750, 'faithfulness': 0.8394, 'factual_correctness': 0.4678, 'answer_relevancy': 0.8692, 'context_entity_recall': 0.6042, 'noise_sensitivity_relevant': 0.4617}

## Results Analysis

| Retrieval Method             | Context Recall | Faithfulness | Factual Correctness | Answer Relevancy | Context Entity Recall | Noise Sensitivity, Relevant (lower is better) |
|-----------------------------|:--------------:|:------------:|:-------------------:|:----------------:|:---------------------:|:--------------------------:|
| **Naive Retrieval**         | 0.9750         | 0.7829       | 0.4800              | 0.8567           | 0.6016                | 0.4199                     |
| **BM25 Retrieval**          | 0.7750         | 0.7864       | 0.4670              | 0.9745           | 0.4984                | 0.3671                     |
| **Context Compression**     | 0.8250         | 0.7676       | 0.4475              | 0.8689           | 0.5248                | 0.3479                     |
| **Multi-Query Retrieval**   | 0.9750         | 0.8394       | 0.4678              | 0.8692           | 0.6042                | 0.4617                     |

| Retrieval Method          | Cost per Request | Mean Latency (s) |
|---------------------------|:----------------:|:----------------:|
| Naive Retrieval           | $0.00034036      | 3.17             |
| BM25 Retrieval            | $0.00049501      | 3.63             |
| Context Compression       | $0.00031641      | 5.30             |
| Multi-Query Retrieval     | $0.00114227      | 6.44             |
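
As a starting point for the written comparison the activity asks for, the figures from both tables can be lined up programmatically. This sketch only re-uses the numbers reported above (nothing is re-measured); the cost per 1,000 requests is simple multiplication, and for Ragas's noise sensitivity a lower score is better.

```python
# Figures copied from the two tables above.
# method: (context_recall, context_entity_recall, noise_sensitivity_relevant, cost_usd, latency_s)
results = {
    "Naive Retrieval": (0.9750, 0.6016, 0.4199, 0.00034036, 3.17),
    "BM25 Retrieval": (0.7750, 0.4984, 0.3671, 0.00049501, 3.63),
    "Context Compression": (0.8250, 0.5248, 0.3479, 0.00031641, 5.30),
    "Multi-Query Retrieval": (0.9750, 0.6042, 0.4617, 0.00114227, 6.44),
}

best_recall = max(r[0] for r in results.values())
recall_leaders = sorted(m for m, r in results.items() if r[0] == best_recall)
entity_leader = max(results, key=lambda m: results[m][1])
least_noise_sensitive = min(results, key=lambda m: results[m][2])  # lower is better
cheapest = min(results, key=lambda m: results[m][3])
fastest = min(results, key=lambda m: results[m][4])

print("Highest context recall:", recall_leaders)
print("Highest context entity recall:", entity_leader)
print("Lowest noise sensitivity:", least_noise_sensitive)
print("Cheapest per request:", cheapest)
print("Fastest on average:", fastest)

for method, (recall, _, _, cost, latency) in results.items():
    print(f"{method:<22} recall={recall:.3f}  ${cost * 1000:.2f} per 1k requests  {latency:.2f}s")
```

On these numbers, naive retrieval ties multi-query retrieval on context recall at roughly a third of the cost per request and about half the latency. With only 10 synthetic questions in the test set, small differences between methods should be read cautiously.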