# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
from dotenv import load_dotenv

load_dotenv(dotenv_path="../.env")

True

In [2]:
import getpass
import os


def check_if_env_var_is_set(env_var_name: str, human_readable_string: str = "API Key") -> None:
    """Report whether an environment variable is set; prompt for it if it is missing."""
    api_key = os.getenv(env_var_name)

    if api_key:
        print(f"{env_var_name} is present")
    else:
        print(f"{env_var_name} is NOT present, paste key at the prompt:")
        os.environ[env_var_name] = getpass.getpass(f"Please enter your {human_readable_string}: ")

In [3]:
import os
import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

check_if_env_var_is_set("OPENAI_API_KEY", "OpenAI API key")

OPENAI_API_KEY is present


In [4]:
# os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

check_if_env_var_is_set("COHERE_API_KEY", "Cohere API key")

COHERE_API_KEY is present


## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [5]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [6]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [7]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [8]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
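
As a rough illustration of what "cosine similarity, top `k`" means, here is a minimal pure-Python sketch. The toy vectors, document names, and helper functions are made up for illustration; in the notebook this work is delegated to Qdrant and the embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical 3-dimensional "embeddings" (real ones have 1536 dimensions).
toy_vectors = {
    "complaint_a": [0.9, 0.1, 0.0],
    "complaint_b": [0.1, 0.9, 0.0],
    "complaint_c": [0.7, 0.3, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

nearest = top_k(toy_query, toy_vectors, k=2)
print(nearest)
```

With `k=2`, the two vectors pointing most nearly in the query's direction are returned, most similar first.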

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [9]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [10]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [52]:
%%time
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # RunnablePassthrough.assign forwards every key from the previous step and re-assigns "context"
    #              to the value it already holds, so the data passes through this step unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

CPU times: user 406 μs, sys: 0 ns, total: 406 μs
Wall time: 440 μs
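
To make the data flow of the chain easier to follow, here is a plain-Python sketch of the same steps with no LangChain involved. The function `run_rag_chain` and the three stand-in callables are hypothetical; they only mimic the shape of the dictionaries that move through the chain.

```python
def run_rag_chain(inputs, retrieve, format_prompt, generate):
    """Mimic the chain: retrieve context, format the prompt, generate a response."""
    question = inputs["question"]
    # Step 1: build {"context": ..., "question": ...} from the incoming question.
    step = {"context": retrieve(question), "question": question}
    # Step 2: format the prompt from both keys and hand it to the model.
    prompt = format_prompt(question=step["question"], context=step["context"])
    # Step 3: return the response alongside the context that produced it.
    return {"response": generate(prompt), "context": step["context"]}


# Stand-ins for the retriever, prompt template, and chat model (all made up).
result = run_rag_chain(
    {"question": "What is the most common issue with loans?"},
    retrieve=lambda q: ["complaint about misapplied payments"],
    format_prompt=lambda question, context: f"Query:\n{question}\n\nContext:\n{context}",
    generate=lambda prompt: f"(model output for a {len(prompt)}-character prompt)",
)
print(sorted(result))
```

Returning the context next to the response is what lets us inspect, and later evaluate, which documents each retriever actually supplied.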


Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [53]:
%%time
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 475 ms, sys: 38 ms, total: 513 ms
Wall time: 2.22 s


'Based on the information provided, the most common issues with loans appear to be related to mismanagement and misleading practices, including errors in loan balances, incorrect reporting, difficulty applying payments correctly, and problems with loan transfers and information accuracy. \n\nIn particular, issues such as errors in loan balances, misapplied payments, wrongful denials of payment plans, and incorrect information reported to credit bureaus are prevalent. Many complaints also involve unsatisfactory handling of loan transfers, lack of communication, and unfair repayment practices.\n\nSo, the most common issue with loans, as reflected in the complaints, seems to be **mismanagement and errors in loan handling**, leading to increasing balances, credit report discrepancies, and confusion for borrowers.'

In [54]:
%%time
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 400 ms, sys: 3.54 ms, total: 404 ms
Wall time: 3.12 s


'Based on the provided information, yes, there were complaints that did not get handled in a timely manner. Specifically, at least two complaints were marked as "Not in a timely response" or had delays:\n\n- One complaint to MOHELA received on 03/28/25 was marked as "Timely response: No."\n- Another complaint to Maximus Federal Services, Inc. received on 04/05/25 was marked as "Timely response: Yes," but the context indicates ongoing issues with response times in some cases.\n\nAdditionally, several complaints mention delays and not receiving timely responses, such as ongoing unresolved issues for over a year or more, indicating some complaints were not addressed within expected time frames.'

In [55]:
%%time
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 388 ms, sys: 7.29 ms, total: 395 ms
Wall time: 5.6 s


'People failed to pay back their loans primarily due to several interconnected reasons highlighted in the complaints:\n\n1. **Lack of Clear Communication and Transparency:** Many borrowers were not adequately informed about changes in loan servicing, repayment resumption dates, or transfer of loan ownership. For example, some borrowers did not receive notifications about when repayments were to begin or when their loans were transferred to different servicers, leading to unexpected delinquencies and credit issues.\n\n2. **Compounding and Growing Interest:** Borrowers reported that interest continued to accrue even during forbearance or deferment periods, which made it difficult or impossible to pay down the actual loan amount. This effectively extended the repayment period and increased total debt, causing frustration and financial hardship.\n\n3. **Inadequate or Restrictive Payment Options:** Several complaints pointed out that the available options, such as forbearance or deferment, 

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [15]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
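
For intuition about what a BM25 score rewards, here is a small self-contained sketch of one common Okapi BM25 formulation. The whitespace tokenizer, the `k1` and `b` values, the IDF variant, and the toy documents are all assumptions for illustration; the library behind `BM25Retriever` may differ in these details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a basic Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates and is normalized by document length.
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


toy_docs = [
    "nelnet reversed my autopay payment twice",
    "my servicer never answered the phone",
    "the payment amount doubled after forbearance",
]
toy_scores = bm25_scores("nelnet payment", toy_docs)
best = toy_docs[toy_scores.index(max(toy_scores))]
print(best)
```

Only exact word matches contribute, which is why a document sharing no terms with the query scores zero no matter how related it is in meaning.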

We'll construct the same chain - only changing the retriever.

In [16]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [56]:
%%time
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 52.3 ms, sys: 5.97 ms, total: 58.3 ms
Wall time: 1.33 s


'Based on the provided context, the most common issue with student loan complaints appears to be problems related to dealing with lenders or servicers. Specific sub-issues include disagreements over fees charged, difficulties in making or applying payments (such as being unable to pay down principal or pay off loans more quickly), and receiving incorrect or bad information about the loan details.'

In [57]:
%%time
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 36.3 ms, sys: 8.66 ms, total: 44.9 ms
Wall time: 926 ms


'Based on the provided information, all the complaints listed were responded to in a timely manner, with the responses marked as "Yes" for timely response. Therefore, there do not appear to be any complaints that did not get handled in a timely manner.'

In [59]:
%%time
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 35.5 ms, sys: 4.42 ms, total: 39.9 ms
Wall time: 4.54 s


'People failed to pay back their loans for several reasons, including:\n\n1. **Problems with payment plans or forbearances:** Some borrowers were steered into incorrect types of forbearances or experienced issues with their repayment plans, leading to missed or deferred payments that were not properly communicated or handled.\n\n2. **Errors and mismanagement by lenders or servicers:** Complaints include instances where loans were transferred or sold without proper notice, resulting in borrowers being unaware of changes, automatic payments being unenrolled, or incorrect billing and overdue statuses being reported.\n\n3. **Lack of communication:** Many borrowers were not properly informed about their loan status, repayment restart dates, or changes in service providers, causing unintentional missed payments and impacts on credit scores.\n\n4. **Technical problems:** Payment reversals and billing errors, such as payments being repeatedly reversed or not processed correctly, contributed to

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, and the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [20]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
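
The retrieve-then-rerank pattern can be sketched without any external services. In this toy version, a cheap word-overlap count plays the role of the base retriever and a Jaccard score plays the role of the reranker; both scoring functions and the sample documents are hypothetical stand-ins, not what Cohere's model computes.

```python
def word_overlap(query, doc):
    """Cheap first-stage score: number of distinct words shared with the query."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def jaccard(query, doc):
    """Second-stage score: shared words divided by all distinct words."""
    q_words, d_words = set(query.lower().split()), set(doc.lower().split())
    return len(q_words & d_words) / len(q_words | d_words)


def retrieve_then_rerank(query, docs, first_stage, reranker, k, top_n):
    """Fetch k candidates with the cheap scorer, then keep the reranker's top_n."""
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k]
    reranked = sorted(candidates, key=lambda d: reranker(query, d), reverse=True)
    return reranked[:top_n]


toy_docs = [
    "late fee charged twice",
    "fee schedule for late payments and other servicing charges on federal loans",
    "phone support was rude",
    "late payment reported",
]
compressed = retrieve_then_rerank("late fee", toy_docs, word_overlap, jaccard, k=3, top_n=2)
print(compressed)
```

The second stage only ever sees the `k` candidates from the first stage, which is why the base retriever's `k` needs to be generous enough to include the documents worth promoting.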

Let's create our chain again, and see how this does!

In [21]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [60]:
%%time
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 518 ms, sys: 53.4 ms, total: 572 ms
Wall time: 3.27 s


'Based on the provided context, a common issue with loans, specifically student loans, appears to be problems related to the mismanagement of information by servicers. This includes errors in loan balances, misapplied payments, wrongful denials of payment plans, and inaccurate or incomplete information about the loan status. Additionally, issues such as lack of communication, incorrect information, unauthorized transfer of loans, and violations of privacy laws are frequently reported. Therefore, a most common issue is the mishandling and miscommunication regarding loan information by lenders or servicers.'

In [61]:
%%time
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 393 ms, sys: 16.2 ms, total: 409 ms
Wall time: 22.1 s


'Based on the information provided, at least one complaint indicates that it was not handled in a timely manner. Specifically, the complaint from the consumer regarding a student loan issue (Complaint ID: 12975634) was open for nearly 18 months without resolution, and the consumer reported that they had not received a response despite waiting over a year for a reply. Although the company\'s response was marked as "Closed with explanation" and responded "Yes" to promptness, the consumer\'s experience suggests that some complaints, like this one, were not addressed promptly.'

In [62]:
%%time
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 414 ms, sys: 19.6 ms, total: 434 ms
Wall time: 3.98 s


'People failed to pay back their loans primarily due to a combination of factors such as lack of clear communication and understanding about loan terms, accumulating interest, and financial hardships. Specifically, some borrowers were not aware that they would need to repay their loans, especially when they were not properly informed by financial aid officers. Others faced difficulties because loan servicers offered limited options like forbearance or deferment, which led to interest continuing to grow and increasing the total debt over time. Additionally, borrowers described challenges in managing monthly payments and avoiding interest buildup, which made repayment seem unmanageable. These issues, along with inadequate information and support from loan servicers, contributed to the failure to pay back loans.'

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [25]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
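
The three steps listed above can be sketched in plain Python as follows. The query rewrites and retrieval results are canned dictionaries standing in for the LLM and the vector store, and `multi_query_retrieve` is a hypothetical helper, not the library's implementation.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Retrieve for every generated query and keep each document once, in first-seen order."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs


# Canned stand-ins for the LLM rewrites and the retriever results.
canned_rewrites = {
    "Why did people fail to pay back their loans?": [
        "What caused borrowers to default?",
        "What made repayment difficult?",
    ]
}
canned_results = {
    "What caused borrowers to default?": ["doc_1", "doc_2"],
    "What made repayment difficult?": ["doc_2", "doc_3"],
}

merged = multi_query_retrieve(
    "Why did people fail to pay back their loans?",
    canned_rewrites.__getitem__,
    canned_results.__getitem__,
)
print(merged)
```

Note that the merged context can be larger than a single query's `k`, since each rewrite may surface documents the others missed.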

In [26]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [63]:
%%time
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 1.14 s, sys: 7.76 ms, total: 1.15 s
Wall time: 4.5 s


"The most common issue with loans, based on the complaints data provided, appears to be problems related to dealing with lenders or servicers, including errors in loan balances, misapplied payments, incorrect reporting on credit reports, mishandling of deferments or forbearances, and poor communication or customer service. Many complaints highlight errors, lack of transparency, mismanagement, and improper reporting that negatively impact borrowers' credit and financial stability."

In [64]:
%%time
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 1.17 s, sys: 11.8 ms, total: 1.18 s
Wall time: 4.83 s


'Based on the provided complaints, it appears that some complaints were not handled in a timely manner. For example, one complaint submitted to MOHELA on 03/25/25 was marked as "No" for timely response, indicating it was not responded to promptly. Additionally, several other complaints mention delays, lack of response, or processing times exceeding expected periods, such as over a year without resolution or needing multiple follow-ups.\n\nTherefore, the answer is: Yes, some complaints did not get handled in a timely manner.'

In [65]:
%%time
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 1.13 s, sys: 20.9 ms, total: 1.16 s
Wall time: 7.49 s


'People failed to pay back their loans for various reasons, including:\n\n- Errors and misconduct by student loan servicers, such as misapplied payments, errors in loan balances, and wrongful denials of payment plans.\n- Administrative issues like transfers of loans without proper notification, incorrect reporting to credit bureaus, and lack of proper communication about delinquency status.\n- Being steered into long-term forbearances or consolidations without being informed of better legal options like income-driven repayment plans or rehabilitation programs, leading to increased balances due to interest capitalization.\n- Systemic problems such as improper handling of loan data, failure to follow regulations, and inadequate communication or misinformation from loan servicers.\n- Financial hardships, including unemployment, health issues, or unexpected life events, which made timely repayment difficult.\n- Borrowers often felt misled or falsely informed about their repayment obligatio

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

## Task 8: Parent Document Retriever

The Parent Document Retriever uses a "small-to-big" strategy that works as follows:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
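
Here is a minimal sketch of the "search small, return big" idea using plain dictionaries. The naive sentence split stands in for a real text splitter, word overlap stands in for embedding similarity, and the sample complaints are invented for illustration.

```python
def build_child_index(parents):
    """Split each parent into child chunks and remember which parent each came from."""
    child_index = []
    for parent_id, text in parents.items():
        for chunk in text.split(". "):  # naive sentence split, for illustration only
            child_index.append((chunk, parent_id))
    return child_index


def word_overlap(query, text):
    """Stand-in for vector similarity: number of distinct shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_parents(query, child_index, parents, k=2):
    """Rank child chunks, then return their de-duplicated parent documents."""
    ranked = sorted(child_index, key=lambda item: word_overlap(query, item[0]), reverse=True)[:k]
    parent_ids = []
    for _, parent_id in ranked:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
    return [parents[parent_id] for parent_id in parent_ids]


toy_parents = {
    "p1": "My autopay was cancelled without notice. Later the servicer reported me as delinquent.",
    "p2": "I asked for an income driven plan. The servicer denied it without explanation.",
}
toy_index = build_child_index(toy_parents)
results = retrieve_parents("autopay cancelled", toy_index, toy_parents, k=2)
print(results)
```

The match is found on a single short sentence, but the caller receives the whole complaint, including the follow-on sentence about delinquency that the small chunk alone would have dropped.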

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [30]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension (1536 for `text-embedding-3-small`); you'll need to change this if you're using a different embedding model.

In [31]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [32]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [33]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [34]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [66]:
%%time
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 452 ms, sys: 14.6 ms, total: 467 ms
Wall time: 1.52 s


'Based on the provided context, the most common issue with loans appears to be problems related to the servicing and reporting of loans, such as errors in loan balances, misapplied payments, wrongful denials of payment plans, and incorrect or misleading credit reporting. Many complaints highlight systemic breakdowns, errors, and lack of transparency within loan servicing systems.\n\nIf you have a specific aspect in mind, such as repayment issues, interest rate problems, or reporting errors, please let me know!'

In [67]:
%%time
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 393 ms, sys: 23.7 ms, total: 417 ms
Wall time: 4.8 s


'Based on the provided complaints, the first two complaints submitted to MOHELA on 03/28/25 and 04/11/25 indicate that the complaints were not handled in a timely manner. Specifically, they were marked as "No" under the "Timely response?" field, suggesting that these complaints were not addressed promptly. The third complaint regarding Aidvantage was responded to "Yes" for timely response, but it involved systemic errors and ongoing issues.\n\nTherefore, yes, some complaints did not get handled in a timely manner.'

In [68]:
%%time
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 417 ms, sys: 12 ms, total: 429 ms
Wall time: 2.76 s


"People often failed to pay back their loans due to several factors highlighted in the complaints. These include lack of clear communication from loan servicers about when payments were expected to begin, especially prior to the end of grace periods. Additionally, some borrowers experienced financial hardship, unemployment, or unanticipated expenses that made it difficult to make payments. In cases involving institutions that closed or misrepresented the value of their education, borrowers faced long-term financial consequences and difficulties in repaying their loans. Overall, issues such as inadequate notification, mismanagement, and unforeseen personal or institutional difficulties contributed to borrowers' inability to repay their loans."

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to more effectively answer the question about the performance.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [38]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)
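
For reference, weighted Reciprocal Rank Fusion is short enough to sketch directly. This version assumes the commonly used smoothing constant `c=60` and made-up rankings; it illustrates the technique rather than reproducing the library's code.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several rankings: each document earns weight / (rank + c) per list."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings from a sparse and a dense retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Because only ranks are used, the fusion works even when the underlying retrievers produce scores on incomparable scales, and documents that appear in several lists rise to the top.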

We'll pack *all* of these retrievers together in an ensemble.

In [39]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [69]:
%%time
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 2.31 s, sys: 51.4 ms, total: 2.36 s
Wall time: 4.96 s


'Based on the provided context, the most common issues with loans, particularly student loans, include:\n\n- Errors in loan balances and interest calculations\n- Misapplication of payments and wrongful denials of repayment plans\n- Problems with loan transfer, mismanagement, or lack of proper communication about account status\n- Receiving bad or incorrect information about loan terms, fees, or repayment options\n- Difficulty applying extra payments to principal\n- Problems with loan reporting on credit reports, including incorrect delinquency status or increased balances during pauses or forbearance\n- Unauthorized or confusing loan transfers and lack of transparency\n- Issues with loan forgiveness, discharge, or settlement mismanagement\n\nFrom these observations, the most prevalent or most frequently reported issue appears to be **errors in loan balances, interest calculations, or account information**, leading to discrepancies, credit report problems, and financial hardship. Many c

In [70]:
%%time
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 2.26 s, sys: 16.3 ms, total: 2.27 s
Wall time: 7.62 s


'Based on the provided complaints, yes, some complaints were not handled in a timely manner. Specifically, there are instances where the responses from companies were marked as "No" or "Delayed," such as:\n\n- Complaint ID 12935889 (Mohela): Response was "No" indicating response was not timely.\n- Complaint ID 12668396 (Mohela): Response was "No," indicating the company did not respond in a timely manner.\n- Complaint ID 13056764 (Nelnet): Response was "Yes," but the complaint mentions a dispute over the handling of the dispute, implying possible delays or inadequate handling.\n- Complaint ID 13062402 (Nelnet): Response was "Yes," but the complaint discusses ongoing issues with timely responses.\n- Several other complaints indicate delays or failures to respond on time, as noted by the "Timely response?" field and the narrative comments.\n\nTherefore, the answer is: Yes, some complaints did not get handled in a timely manner.'

In [71]:
%%time
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 2.25 s, sys: 8.83 ms, total: 2.26 s
Wall time: 9.82 s


"People failed to pay back their loans for various reasons, including:\n\n- Lack of clear or timely communication from lenders or servicers, leading to unawareness of payment requirements or status.\n- Accumulation of interest during forbearance or deferment, which increased the total amount owed and prolonged repayment.\n- Financial hardships such as unemployment, low income, or health issues making payments unaffordable.\n- Mismanagement or errors by loan servicers, such as incorrect reporting, unauthorized account access, or failure to provide proper documentation.\n- Limited or no access to income-driven repayment options or assistance programs, leading borrowers to default despite intentions to pay.\n- Lack of transparency about loan terms, interest accrual, and repayment options, causing confusion and unintended defaults.\n- Transfers of loan servicing without proper notice, resulting in missed communications and overdue payments.\n- Misleading or deceptive practices by loan serv

## Task 10: Semantic Chunking

While this is not a retrieval method, it *is* an effective way to improve retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

The `breakpoint_threshold_type` parameter controls where the semantic chunker places chunk boundaries, based on the embedding distance between consecutive sentences:

**Four Threshold Types:**

1. _"percentile" (default)_
  - Splits when the distance between consecutive sentence embeddings exceeds the 95th percentile of all distances
  - Effect: Creates chunks at the most semantically distinct boundaries
  - Behavior: More conservative splitting, larger chunks

2. _"standard_deviation"_
  - Splits when the distance exceeds the mean by more than 3 standard deviations
  - Effect: More predictable behavior, especially for normally distributed content
  - Behavior: More consistent chunk sizes

3. _"interquartile"_
  - Splits when the distance exceeds the mean by more than 1.5 times the interquartile range (IQR)
  - Effect: Middle-ground approach, robust to outliers
  - Behavior: Balanced chunk distribution

4. _"gradient"_
  - Detects anomalies in the gradient of the embedding distances rather than in the distances themselves
  - Effect: Best for domain-specific/highly correlated content
  - Behavior: Finds subtle semantic transitions

**Impact:** _The threshold type determines sensitivity to semantic changes - more sensitive types create smaller, more focused chunks while less sensitive types create larger, more comprehensive chunks._
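To make the four threshold types more concrete, here is a minimal sketch of how each one could turn a list of sentence-to-sentence distances into breakpoints. It is not LangChain's code: `percentile`, `gradient`, `find_breakpoints`, the toy distances, and the `assumed_amounts` values are all illustrative assumptions chosen to mirror the descriptions above.

```python
import statistics


def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def gradient(values):
    """Finite-difference slope (needs at least two values):
    central difference inside, one-sided difference at both ends."""
    last = len(values) - 1
    return [
        (values[min(i + 1, last)] - values[max(i - 1, 0)])
        / (min(i + 1, last) - max(i - 1, 0))
        for i in range(len(values))
    ]


def find_breakpoints(distances, kind, amount):
    """Return the indices whose value exceeds the threshold for `kind`."""
    series = distances
    if kind == "percentile":
        threshold = percentile(distances, amount)
    elif kind == "standard_deviation":
        threshold = statistics.fmean(distances) + amount * statistics.pstdev(distances)
    elif kind == "interquartile":
        iqr = percentile(distances, 75) - percentile(distances, 25)
        threshold = statistics.fmean(distances) + amount * iqr
    elif kind == "gradient":
        # The gradient variant thresholds the slope of the distances instead.
        series = gradient(distances)
        threshold = percentile(series, amount)
    else:
        raise ValueError(f"unknown threshold type: {kind}")
    return [i for i, value in enumerate(series) if value > threshold]


# Hypothetical distances between consecutive sentences (two clear spikes).
toy_distances = [0.10, 0.12, 0.11, 0.60, 0.09, 0.13, 0.55, 0.10]

# Assumed amounts, mirroring the values quoted in the descriptions above.
assumed_amounts = {
    "percentile": 95,
    "standard_deviation": 3,
    "interquartile": 1.5,
    "gradient": 95,
}

results = {
    kind: find_breakpoints(toy_distances, kind, amount)
    for kind, amount in assumed_amounts.items()
}
for kind, indices in results.items():
    print(kind, indices)
```

On this toy data the types disagree about how many boundaries exist, which is the sensitivity difference described above. With real embeddings the distances are far noisier, so treat the output as an illustration only.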

We'll use the `percentile` thresholding method for this example, which will calculate the distances between consecutive sentences and then split the sequence of sentences wherever a distance exceeds a given percentile of all distances.
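Before handing the work to the library, the percentile idea can be sketched end to end on a handful of sentences. This is a simplified illustration, not `SemanticChunker` itself: the sentences and the two-dimensional `toy_vectors` are made-up stand-ins for real embeddings, and `cosine_distance`, `percentile`, and `split_by_percentile` are hypothetical helper names.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def percentile(values, pct):
    """Linearly interpolated percentile of a non-empty list (pct in 0..100)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def split_by_percentile(sentences, vectors, pct=95):
    """Group sentences, starting a new chunk after any unusually large jump."""
    distances = [
        cosine_distance(vectors[i], vectors[i + 1])
        for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, pct)

    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:
            # The jump from the previous sentence is an outlier: close the chunk.
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


toy_sentences = [
    "My loan servicer never sent a statement.",
    "I only learned about the balance from a credit report.",
    "The servicer also misapplied two payments.",
    "Separately, my forbearance request was denied.",
    "No reason for the denial was given.",
]
# Made-up 2-D "embeddings": the first three point one way, the last two another.
toy_vectors = [(1.0, 0.0), (0.9, 0.1), (1.0, 0.1), (0.1, 1.0), (0.2, 1.0)]

toy_chunks = split_by_percentile(toy_sentences, toy_vectors)
for chunk in toy_chunks:
    print(repr(chunk))
```

The only large distance sits between the third and fourth sentences, so the sketch produces one chunk about servicing and one about forbearance. The real chunker does more than this (for example, it embeds each sentence together with its neighbours), so expect its boundaries to differ.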

In [43]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [44]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [45]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [46]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [47]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [48]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints data, appears to be related to problems with loan servicing such as miscommunications, incorrect account information, improper reporting, or issues with payment handling. Specifically, many complaints mention difficulties with loan repayment plans, erroneous reporting of delinquency or default, problems with account management, and disputes over how payments and interest are processed.'

In [49]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that several complaints were marked as "Closed with explanation" and indicated responses such as "None" or "Closed with explanation," which suggests that these issues were addressed, but not necessarily in a timely manner or to the complainant\'s satisfaction. Specifically, all complaints listed have responses stating the issue was closed, and there is no explicit mention of delays or failures to handle complaints promptly.\n\nHowever, there is no clear evidence in this data snippet showing complaints that were explicitly *not* handled in a timely manner. The responses indicate that responses were generally marked as "Yes" for being timely, but the fact that the complaints were ultimately closed could imply unresolved issues or delays in fully resolving the concerns.\n\nTherefore, I do not have concrete evidence from this data to definitively say that any complaints did not get handled in a timely manner.'

In [50]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People may have failed to pay back their loans for various reasons, including issues related to poor communication and transparency from lenders, difficulties with payment plans, mismanagement or errors by loan servicers, and legal or administrative disputes over the legitimacy or status of their loans. For example, some individuals faced challenges due to lack of clear information about their loan status, delays or problems with payments processing, or complications arising from legal issues such as the illegitimacy of certain debts or data breaches. Additionally, some borrowers experienced difficulties because of administrative hurdles or suspected misconduct by servicers, which hindered their ability to make or understand their payments.'

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
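
If you also want a rough local latency number to sit alongside what LangSmith reports, one option is a small timing helper like the sketch below. `time_calls` and `stand_in_chain` are hypothetical names. In the notebook you would pass something like a chain's `invoke` method instead of the stand-in, and the timings would then include network and API time.

```python
import statistics
import time


def time_calls(fn, inputs, repeats=1):
    """Call fn(item) for every input and summarize wall-clock latency in seconds."""
    timings = []
    for item in inputs:
        for _ in range(repeats):
            start = time.perf_counter()
            fn(item)
            timings.append(time.perf_counter() - start)
    return {
        "calls": len(timings),
        "mean_s": statistics.fmean(timings),
        "max_s": max(timings),
    }


def stand_in_chain(payload):
    """Placeholder for a real chain call; it only echoes the question."""
    return {"response": payload["question"].upper()}


questions = [
    {"question": "What is the most common issue with loans?"},
    {"question": "Why did people fail to pay back their loans?"},
]

summary = time_calls(stand_in_chain, questions, repeats=3)
print(summary)
```

Running the same questions through each retriever's chain gives comparable numbers, though a handful of calls is a small sample and API latency varies from run to run.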

In [51]:
### YOUR CODE HERE