# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up QDrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
from dotenv import load_dotenv

load_dotenv(dotenv_path="../.env")

True

In [2]:
import getpass
import os


def check_if_env_var_is_set(env_var_name: str, human_readable_string: str = "API Key"):
    # Prompt for the value only when the variable is missing from the environment.
    api_key = os.getenv(env_var_name)

    if api_key:
        print(f"{env_var_name} is present")
    else:
        print(f"{env_var_name} is NOT present, paste key at the prompt:")
        os.environ[env_var_name] = getpass.getpass(f"Please enter your {human_readable_string}: ")

In [3]:
import os
import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

check_if_env_var_is_set("OPENAI_API_KEY", "OpenAI API key")

OPENAI_API_KEY is present


In [4]:
# os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

check_if_env_var_is_set("COHERE_API_KEY", "Cohere API key")

COHERE_API_KEY is present


## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [5]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [6]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up QDrant!

Now that we have our documents, let's create a QDrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [7]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later
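To make "cosine similarity, top `k`" concrete, here is a minimal pure-Python sketch of the idea. The function names (`cosine_similarity`, `top_k`) and the tiny 3-dimensional vectors are made up for illustration; a real vector store works on much larger embeddings and usually uses an approximate index rather than scoring every document.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" (hypothetical values, for illustration only).
toy_doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
toy_query_vec = [0.8, 0.2, 0.0]

nearest = top_k(toy_query_vec, toy_doc_vecs, k=2)
print(nearest)  # [1, 0]
```

Document 1 wins because it points in nearly the same direction as the query, even though document 0 has the larger first component; cosine similarity ignores vector length and only compares direction.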

In [8]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [9]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [10]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [11]:
%%time
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # Passes the dict through unchanged; "context" is re-assigned to its own value from the
    # previous step, so this step is effectively a no-op for the data
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

CPU times: user 9.07 ms, sys: 4.06 ms, total: 13.1 ms
Wall time: 158 ms


Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [12]:
%%time
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 425 ms, sys: 52.1 ms, total: 478 ms
Wall time: 2.04 s


'The most common issue with loans, based on the complaints in the provided data, appears to be problems related to the handling and management of student loans. This includes issues such as errors in loan balances, misapplied payments, incorrect reporting on credit reports, confusing or disputed loan transfers without proper notification, trouble with repayment plans, and improper handling of loans including bad information or mismanagement by servicers. Many complaints also involve difficulty in applying payments correctly, inaccurate or misleading loan information, and issues with loan forgiveness or discharge processes.'

In [13]:
%%time
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 392 ms, sys: 39.3 ms, total: 431 ms
Wall time: 2.87 s


'Based on the provided data, yes, some complaints were not handled in a timely manner. Specifically:\n\n- The complaint received on 03/28/25 from MOHELA was marked as "Not timely" in response to the question about handling complaints promptly.\n- Others, such as the complaint on 04/24/25 from Maximus Federal Services, were responded to "on time" or "early," indicating timely handling.\n\nTherefore, the complaint from MOHELA did not get handled in a timely manner.'

In [14]:
%%time
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 395 ms, sys: 0 ns, total: 395 ms
Wall time: 4.47 s


'People failed to pay back their loans primarily because of difficulties understanding and managing the complexities of student loan repayment. Many faced issues such as:\n\n- Lack of clear communication about when and how payments would resume, especially after forbearance or deferment periods.\n- Accumulation of interest during forbearance or deferment, which increased the total amount owed and extended the repayment period.\n- Uncertainty and confusion caused by inconsistent or incorrect information from loan servicers, including transfer of loans without proper notification.\n- Inability to afford increased or scheduled payments due to financial hardship, stagnant wages, or unexpected expenses.\n- Problems with loan management systems, such as difficulty applying additional payments to the principal, which prolongs debt and increases total cost.\n- Insufficient transparency and lack of information from servicers about loan balances, interest accrual, and repayment options, leading 

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.
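For intuition, here is a minimal sketch of one common textbook form of the BM25 score (Okapi BM25 with the `k1` and `b` parameters). The function name `bm25_scores`, the whitespace tokenizer, the parameter defaults, and the toy documents are all assumptions for illustration; the library retriever may tokenize and weight terms differently.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a textbook Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates (k1) and is normalized by document length (b).
            numerator = term_freq[term] * (k1 + 1)
            denominator = term_freq[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * numerator / denominator
        scores.append(score)
    return scores


# Hypothetical mini-corpus, for illustration only.
toy_docs = [
    "my payment was applied late",
    "the servicer reported wrong balance",
    "late payment fee charged after late payment",
]
scores = bm25_scores("late payment", toy_docs)
print(scores)
```

The second document shares no words with the query, so it scores exactly zero; that is the "lexical match only" behaviour that separates BM25 from embedding-based retrieval.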

This retriever is very straightforward to set up! Let's see it happen down below!


In [15]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)

We'll construct the same chain - only changing the retriever.

In [16]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [17]:
%%time
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 46.6 ms, sys: 7.13 ms, total: 53.7 ms
Wall time: 2.43 s


'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers. Specific sub-issues include disagreements over fees charged, trouble with how payments are handled (such as inability to apply additional funds to the principal), and receiving incorrect or bad information about the loan. These issues highlight ongoing challenges consumers face with loan management and servicing.'

In [18]:
%%time
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 46.3 ms, sys: 12.2 ms, total: 58.5 ms
Wall time: 1.39 s


'Based on the provided information, all the complaints mentioned were responded to by the companies, and each response was marked as "Closed with explanation" and indicated that the response was "Timely" (Yes). Therefore, no complaints appear to have gone unhandled in a timely manner.'

In [19]:
%%time
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 54.2 ms, sys: 16.5 ms, total: 70.7 ms
Wall time: 6.2 s


"People failed to pay back their loans for several reasons, including:\n\n1. **Problems with Payment Plans:** Some borrowers experienced issues with their payment plans, such as being steered into the wrong types of forbearances or not receiving the appropriate assistance, which led to difficulties in making payments.\n\n2. **Lack of Communication:** Some borrowers were not properly informed about changes or their loan status. For example, automatic payments were discontinued without notice, or borrowers did not receive emails or mail informing them about their loan transfer, billing, or overdue status.\n\n3. **Errors and Delays from Loan Servicers:** Errors like payments being reversed, unpaid bills accumulating, or incorrect account information contributed to borrowers' inability to pay.\n\n4. **Transfer or Shuttering of Loan Servicers:** When loans were transferred between servicers, some borrowers were unaware or did not receive proper communication, leading to missed payments and 

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do, and the second half of the notebook will cover it.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

#### ✅ Answer:

BM25, a traditional full-text search ranking function, is particularly effective when dealing with queries that rely heavily on exact term matching, term frequency, and inverse document frequency (TF-IDF) principles.

BM25 is generally better suited for scenarios where exact keyword matching is essential, such as in e-commerce search engines, document retrieval systems, and legal e-discovery.

Additionally, BM25 is often used in hybrid search systems alongside vector search to create a more comprehensive understanding of both semantic meaning and keyword importance.

Here are a couple of queries where exact term matches in the document are essential, because near-miss terms would otherwise fill the results with noise:

- "Find documents about COVID-19 vaccine side effects in patients with diabetes"
  - The key terms here, "COVID-19 vaccine" and "diabetes", are where the focus of the query lies.
- "Best practices for data backup in 2025"
  - It includes specific terms like "data backup" and "2025" that are likely to appear verbatim in relevant documents.
  - BM25 can effectively leverage term frequency (e.g., how often "data backup" appears in a document) and document length normalization to rank documents accurately. The query does not heavily rely on semantic similarity but rather on the presence and frequency of exact keywords.
  - In contrast, dense embeddings might struggle if the training data does not include similar phrasing or if the semantic model does not strongly associate "best practices" with "data backup" in the context of 2025.

Embeddings, on the other hand, are better suited to capturing semantic relationships between words and documents. If embeddings were used in the scenarios above, the results would likely be less precise than with BM25.


### Addendum

_**Sparse Embeddings** are high-dimensional vectors where most values are zero, with only a few non-zero values representing specific features or tokens that are present, making them memory-efficient and interpretable but limited to explicit feature representation._

_**Dense Embeddings** are vectors where most or all dimensions have non-zero values, creating rich, continuous representations that capture complex semantic relationships and contextual meaning, but require more storage and are less interpretable._

_**Key Difference:** Sparse embeddings work like "on/off switches" for specific features (like one-hot encoding or TF-IDF), while dense embeddings work like "semantic fingerprints" where every dimension contributes to the overall meaning representation - sparse focuses on explicit presence/absence, dense captures nuanced relationships._
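As a small illustration of the "sparse" side, here is a sketch that builds a bag-of-words count vector over a tiny vocabulary. The vocabulary, the helper name `sparse_bow_vector`, and the example text are hypothetical; a real vocabulary would have many thousands of entries, which is why most positions end up zero.

```python
from collections import Counter


def sparse_bow_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text (one dimension per term)."""
    counts = Counter(text.lower().split())
    return [counts.get(term, 0) for term in vocabulary]


# Hypothetical toy vocabulary, for illustration only.
toy_vocabulary = ["balance", "fee", "forbearance", "interest", "late", "loan", "payment", "servicer"]

sparse_vec = sparse_bow_vector("late payment fee", toy_vocabulary)
print(sparse_vec)  # [0, 1, 0, 0, 1, 0, 1, 0]

# Because most entries are zero, only the non-zero positions need to be stored.
sparse_as_dict = {index: value for index, value in enumerate(sparse_vec) if value}
print(sparse_as_dict)  # {1: 1, 4: 1, 6: 1}
```

Each non-zero position maps back to a specific word, which is what makes sparse vectors easy to interpret. A dense embedding of the same text would have a non-zero value in nearly every dimension, with no single dimension tied to a single word.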

___

_**Sparse Retrieval** uses exact keyword matching with algorithms like BM25, where documents are represented as sparse vectors containing only the specific terms that appear in them, making it excellent for precise term-based searches but limited to lexical matches._

_**Dense Retrieval** uses semantic embeddings where documents and queries are converted into dense vector representations that capture meaning and context, allowing it to find semantically similar content even when different words are used, but potentially missing exact keyword matches._

_**Key Difference:** Sparse retrieval excels at "what you search is what you get" with exact terms, while dense retrieval excels at "what you mean is what you get" through semantic understanding - which is why hybrid approaches combining both often work best._


## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.
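The two steps above can be sketched in plain Python. Everything here is an assumption for illustration: `retrieve_then_rerank` is a made-up helper, and `toy_overlap_score` is a hypothetical stand-in for a reranking model (it just counts shared words, which is not how a learned reranker scores documents).

```python
def retrieve_then_rerank(query, candidates, score_fn, top_n):
    """Keep only the top_n candidates according to a second, more careful relevance scorer."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


def toy_overlap_score(query, doc):
    """Hypothetical stand-in for a reranking model: fraction of query words found in the doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)


# Pretend these came back from a first-pass retriever (hypothetical data).
toy_candidates = [
    "loan transferred without notice",
    "payment not applied to my loan",
    "interest rate question",
    "late payment fee on my loan",
]

compressed = retrieve_then_rerank("late payment loan", toy_candidates, toy_overlap_score, top_n=2)
print(compressed)
```

The first pass is allowed to be generous (large `k`), because the second pass trims the list down to the handful of documents that actually reach the prompt.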

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [20]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)

Let's create our chain again, and see how this does!

In [21]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [22]:
%%time
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 430 ms, sys: 14.5 ms, total: 445 ms
Wall time: 1.82 s


'Based on the provided context, the most common issue with loans appears to be problems related to dealing with lenders or servicers, such as receiving incorrect or misleading information, errors in loan balances, misapplied payments, wrongful denials of payment plans, and mishandling of loan data. Many complaints involve difficulties in communication, lack of proper documentation, and disputes over account information.'

In [23]:
%%time
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 404 ms, sys: 850 μs, total: 404 ms
Wall time: 2.59 s


'Based on the provided information, at least two complaints indicate delays in handling:\n\n1. The complaint regarding the student loan account review and related issues has been ongoing for nearly 18 months with no resolution, and the individual reports waiting over a year for responses, indicating it was not handled in a timely manner.\n2. The complaint about payments not being applied to the loan account was addressed with a response from the company, and response times are marked as "Yes" for timely response, suggesting it was handled within the expected timeframe.\n\nTherefore, yes, some complaints did not get handled in a timely manner, specifically the one about the account review which has remained unresolved for over a year and a half.'

In [24]:
%%time
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 411 ms, sys: 8.52 ms, total: 419 ms
Wall time: 4.8 s


'People failed to pay back their loans for several reasons, including:\n\n1. Lack of Awareness and Communication: Many borrowers were unaware they needed to repay their loans or were not properly informed by financial aid officers. Some were not notified when their loans were transferred between servicers, and they did not receive adequate information about repayment requirements or options.\n\n2. Compounding Interest and Loan Terms: Borrowers often faced accumulating interest, especially when loans were placed into forbearance or deferment. During these periods, interest continued to grow, making the total amount owed larger over time and creating a cycle where payments could be insufficient to offset the interest, prolonging repayment.\n\n3. Financial Hardships: Many borrowers experienced difficulties in making payments due to stagnant wages, inflation, and unmanageable debt levels. Some could not increase payments to reduce the principal without sacrificing basic necessities, and ot

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context
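
The three steps above can be sketched without any LLM at all. In this sketch, `multi_query_retrieve`, `toy_generate_variants`, and `toy_retrieve` are hypothetical names: the variant generator stands in for the LLM call, and the lookup table stands in for a real retriever.

```python
def multi_query_retrieve(question, generate_variants, retrieve):
    """Retrieve for every generated query and keep each document once, in first-seen order."""
    seen = set()
    merged = []
    for query in generate_variants(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hypothetical retriever results per query, for illustration only.
toy_index = {
    "why do borrowers default": ["doc_a", "doc_b"],
    "reasons for missed loan payments": ["doc_b", "doc_c"],
    "what causes loan delinquency": ["doc_c", "doc_d"],
}


def toy_generate_variants(question):
    """Stand-in for the LLM that rewrites the user question into several queries."""
    return list(toy_index.keys())


def toy_retrieve(query):
    """Stand-in for a retriever call."""
    return toy_index.get(query, [])


merged_docs = multi_query_retrieve(
    "Why did people fail to pay back their loans?", toy_generate_variants, toy_retrieve
)
print(merged_docs)  # ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

No single toy query returns more than two documents, but the merged set contains four, and the overlapping documents appear only once.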

So, how hard is it to set up? Not bad! Let's see it down below!



In [25]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)

In [26]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [27]:
%%time
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 1.16 s, sys: 6.95 ms, total: 1.17 s
Wall time: 3.99 s


'The most common issue with loans, based on the provided complaints, appears to be mismanagement and errors by loan servicers. This includes inaccuracies in loan balances, improper classification of loans, mishandling of deferments and repayment plans, problematic communication, unauthorized interest capitalization, and difficulty obtaining accurate information or documentation. Many complaints highlight issues such as improper default reporting, incorrect loan status classification, problems with applying payments, and lack of transparency or support from servicers.\n\nIn summary, the most prevalent issue is **mismanagement and mishandling of loan accounts by servicers, leading to inaccuracies, confusion, and financial hardship for borrowers**.'

In [28]:
%%time
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 1.14 s, sys: 3.85 ms, total: 1.14 s
Wall time: 3.76 s


'Yes, according to the provided complaints, some complaints did not get handled in a timely manner. For example, one complaint received on 03/28/25 was marked as "No" in response to "Timely response?" and indicates that the company acted late or delayed (e.g., "Company believes it acted appropriately as authorized by contract or law" despite the delay). Additionally, several complaints mention delays in response or resolution, such as responses taking over 30 days or multiple follow-ups without resolution.\n\nTherefore, the answer is: Yes, some complaints did not get handled in a timely manner.'

In [29]:
%%time
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 1.14 s, sys: 0 ns, total: 1.14 s
Wall time: 7.43 s


'People failed to pay back their loans primarily due to a combination of systemic issues, mismanagement, and lack of transparent information. The common reasons include:\n\n1. **Accumulation of interest during forbearance and deferment:** Borrowers were often offered options like forbearance or deferment, which allowed them to pause payments but did not stop interest from accruing and compounding, leading to inflated balances that became unmanageable.\n\n2. **Lack of awareness about repayment options:** Many borrowers were not adequately informed about alternative repayment plans such as income-driven repayment (IDR), loan forgiveness, or rehabilitation programs. Instead, they were steered into long-term forbearance or consolidation, which increased their debt burdens.\n\n3. **Poor communication and misrepresentation by servicers:** Several complaints highlight that servicers failed to give proper notice of delinquency, provided incorrect or confusing information about loan balances an

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

#### ✅ Answer:

Multiple reformulations improve recall because relevant documents may use different terminology than the original query, and each reformulation can surface documents the others miss.

In other words, multiple reformulations approach the same query from different angles, so the retrieved documents cover those angles. The results still cluster around the common theme, but they also capture variations in terminology and perspective, which widens the retrieval scope.

A retriever that uses multiple reformulations follows these steps:

  1. Generates multiple query variations from the original query using an LLM
  2. Retrieves documents for each variation (each gets k results)
  3. Deduplicates and merges the results from all queries
  4. Returns the final deduplicated set

Because the final set is the union of the per-query results, it contains at least as many relevant documents as any single variation retrieves on its own, and typically more.

For example, "machine learning algorithms" and "AI models" retrieve different relevant documents on the same or a similar theme.
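
A small worked example makes the recall argument concrete. The document IDs and the sets below are entirely hypothetical; recall here is the fraction of relevant documents that were retrieved.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents that appear in the retrieved list."""
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Hypothetical ground truth and retrieval results, for illustration only.
relevant_docs = {"doc_a", "doc_b", "doc_c", "doc_d"}
original_query_hits = ["doc_a", "doc_b", "doc_x"]
reformulation_hits = ["doc_b", "doc_c", "doc_y"]

# Merge and deduplicate while keeping first-seen order.
combined_hits = list(dict.fromkeys(original_query_hits + reformulation_hits))

print(recall(original_query_hits, relevant_docs))  # 0.5
print(recall(combined_hits, relevant_docs))        # 0.75
```

The merged list also picked up a second irrelevant document (`doc_y`), so the gain in recall can come at some cost in precision.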

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent chunks".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
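
Here is a minimal sketch of the "search small, return big" idea using plain dictionaries and lists. All names (`split_into_children`, `toy_child_score`, `retrieve_parents`) and the two toy complaints are hypothetical, and the word-overlap score is only a stand-in for the embedding similarity search a real setup would use.

```python
def split_into_children(parent_id, text):
    """Split a parent document into sentence-sized child chunks that remember their parent."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [{"parent_id": parent_id, "text": sentence} for sentence in sentences]


def toy_child_score(query, chunk_text):
    """Stand-in for vector similarity: number of query words present in the chunk."""
    return len(set(query.lower().split()) & set(chunk_text.lower().split()))


# Hypothetical parent documents, for illustration only.
toy_parents = {
    "p1": "My servicer misapplied my payment. Then they charged a late fee.",
    "p2": "My loan was transferred without notice. I never got a new statement.",
}

docstore = dict(toy_parents)  # parents live in a plain key-value store
child_chunks = []             # children are what actually gets searched
for parent_id, text in toy_parents.items():
    child_chunks.extend(split_into_children(parent_id, text))


def retrieve_parents(query, k):
    """Search the child chunks, then return each matching chunk's full parent once."""
    ranked = sorted(child_chunks, key=lambda c: toy_child_score(query, c["text"]), reverse=True)
    parent_ids = []
    for chunk in ranked[:k]:
        if chunk["parent_id"] not in parent_ids:
            parent_ids.append(chunk["parent_id"])
    return [docstore[parent_id] for parent_id in parent_ids]


parents = retrieve_parents("late fee", k=1)
print(parents)
```

The query only matches the second sentence of `p1`, yet the returned document also includes the sentence about the misapplied payment, which is the surrounding context a child chunk alone would have lost.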

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [30]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new QDrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension (`size=1536`, which matches `text-embedding-3-small`). You'll need to change this if you're using a different embedding model.

In [31]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [32]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [33]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [34]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [35]:
%%time
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 526 ms, sys: 247 ms, total: 773 ms
Wall time: 1.79 s


'The most common issue with loans, based on the context provided, appears to be problems related to federal student loan servicing. These include errors in loan balances, misapplied payments, wrongful denials of payment plans, and ongoing misconduct by loan servicers such as errors, disputes over balances and interest rates, illegal credit reporting, and failure to verify the legitimacy of debts. Many complaints highlight systemic issues like inaccurate reporting, unfair practices, and complications arising from the transfer or sale of student loans.'

In [36]:
%%time
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 394 ms, sys: 9.37 ms, total: 403 ms
Wall time: 2.56 s


'Based on the provided information, yes, some complaints did not get handled in a timely manner. Specifically, the complaints related to student loan servicing by MOHELA (Complaint IDs 12709087 and 12935889) indicate that the responses were "No" for timeliness, meaning they were not handled promptly. Additionally, the complaint about dispute settlement with Nelnet (Complaint ID 13205525) was handled "Yes" for timely response, suggesting it was addressed on time. \n\nTherefore, the complaints about MOHELA were not handled in a timely manner, while the Nelnet complaint was handled promptly.'

In [37]:
%%time
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 393 ms, sys: 158 ms, total: 552 ms
Wall time: 3.18 s


"People often fail to pay back their loans due to a variety of reasons highlighted in the complaints. These include:\n\n- Lack of proper communication or notification from loan servicers about payment requirements or due dates.\n- Financial hardship or severe economic difficulties that make it difficult to afford payments.\n- Problems with understanding or managing the loan terms, such as not being aware of when payments should start or being misled about the manageability of the loans.\n- Issues related to the institution from which the loans were taken, such as school closures, misrepresentations about the value of education, or institutional financial instability, which can impact graduates' ability to find employment and repay loans.\n- Disputes over the legitimacy of the debt or improper reporting and collection practices that hinder repayment.\n\nIn summary, failure to pay back loans can stem from poor communication, financial hardship, mismanagement, or systemic issues within th

Overall, the performance *seems* largely the same. We can leverage a tool like Ragas to answer the performance question more rigorously.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [38]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

We'll pack *all* of these retrievers together in an ensemble.
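
To make the rank-fusion step less of a black box, here is a minimal sketch of weighted Reciprocal Rank Fusion in plain Python. It is not the library's code: the function name `weighted_rrf`, the toy document IDs, and the smoothing constant `c=60` (the value suggested in the RRF paper) are assumptions for illustration.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of document IDs; each hit adds weight / (rank + c)."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Three hypothetical retrievers, each returning document IDs best-first.
rankings = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
equal_weights = [1 / len(rankings)] * len(rankings)
fused = weighted_rrf(rankings, equal_weights)
print(fused)
```

A document that several retrievers rank highly rises to the top, while a document seen by only one retriever (`"d"` here) sinks, which is why the ensemble tends to favor results the retrievers agree on.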

In [39]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [40]:
%%time
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 2.27 s, sys: 0 ns, total: 2.27 s
Wall time: 5.25 s


'Based on the provided complaints data, the most common issues with student loans tend to revolve around:\n\n- Dealing with lenders or servicers (e.g., misapplication of payments, mismanagement, improper transfers, or lack of communication)\n- Incorrect or bad information reported about the loan status or balances\n- Problems with how payments are being handled (e.g., inability to apply extra funds to principal, repayment plan issues)\n- Challenges in obtaining proper documentation or verification of loans\n- Disputes over loan balances, interest calculations, or reported delinquencies and defaults\n- Problems related to loan transfers and servicing changes without proper communication\n- Issues concerning loan forgiveness, discharge, or application processing\n\nIn summary, a recurring theme is misconduct or errors by loan servicers and mismanagement of loan information, causing confusion, inaccurate credit reporting, or difficulty in repayment.\n\nTherefore, the most common issue wit

In [41]:
%%time
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 2.25 s, sys: 0 ns, total: 2.25 s
Wall time: 4.56 s


'Based on the provided complaints, yes, there are instances where complaints were not handled in a timely manner. For example:\n\n- Complaint ID 12935889 regarding incorrect information on a report was marked as "Timely response?": No, indicating it was not handled in time.\n- Complaint ID 12935889 about account and credit report issues was marked as "No" for timely response.\n- Complaint ID 12935889 about account information inaccuracies was also marked as "No" for timely response.\n\nHowever, most other complaints show "Yes" for timely response, indicating those were handled promptly.\n\nIn summary, yes, some complaints did not get handled in a timely manner.'

In [42]:
%%time
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 2.25 s, sys: 9.44 ms, total: 2.26 s
Wall time: 5.85 s


"People failed to pay back their loans mainly due to a combination of factors including:\n\n1. Lack of clear or timely communication from loan servicers about payment resumption, delinquency status, or changes in loan transfer details.\n2. Being misled or inadequately informed about repayment options, such as income-driven repayment plans, rehabilitation, or consequences of forbearance, leading them to default or fall behind.\n3. Compounding interest and high interest rates during forbearance or deferment periods, making the debt grow faster than borrowers could pay off.\n4. Systematic issues such as improper handling of loan transfers, errors in account information, or reporting inaccuracies that negatively impacted credit scores.\n5. Difficulty in managing payments due to financial hardship, unemployment, or unexpected life events, combined with limited support or guidance from loan servicers.\n6. Instances of servicer misconduct, like unfair practices, misapplication of payments, or

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
  - `percentile`
  - `standard_deviation`
  - `interquartile`
  - `gradient`
3. Each sequence of related sentences is kept as a document!
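
The three steps above can be sketched in a few lines of plain Python. This is a simplified illustration, not the `SemanticChunker` implementation: the hand-made 2-D vectors stand in for real embeddings, and `cosine_distance`, `chunk_by_distance`, and the fixed `threshold=0.5` are hypothetical names and values chosen for the example.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def chunk_by_distance(sentences, vectors, threshold):
    """Start a new chunk whenever consecutive sentences are farther apart than threshold."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine_distance(vectors[i - 1], vectors[i]) > threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks


# Hand-made 2-D "embeddings" standing in for a real embedding model.
sentences = [
    "My payment was misapplied.",
    "The servicer applied it to the wrong loan.",
    "I also never got a response.",
    "Nobody answered my letters.",
]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
chunks = chunk_by_distance(sentences, vectors, threshold=0.5)
print(chunks)
```

In practice the threshold is not fixed by hand; it is derived from the distribution of distances, which is what the thresholding methods listed above control.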

Let's see how to implement this!

The `breakpoint_threshold_type` parameter controls when the semantic chunker creates chunk boundaries based on embedding similarity between sentences:

**Four Threshold Types:**

1. _"percentile" (default)_
- Splits when sentence embedding distance exceeds the 95th percentile of all distances
- Effect: Creates chunks at the most semantically distinct boundaries
- Behavior: More conservative splitting, larger chunks

2. _"standard_deviation"_
- Splits when the distance exceeds the mean by more than 3 standard deviations
- Effect: More predictable performance, especially for normally distributed content
- Behavior: More consistent chunk sizes

3. _"interquartile"_
- Uses the interquartile range (IQR) with a 1.5 scaling factor to determine breakpoints
- Effect: Middle-ground approach, robust to outliers
- Behavior: Balanced chunk distribution

4. _"gradient"_
- Detects anomalies in embedding distance gradients
- Effect: Best for domain-specific/highly correlated content
- Behavior: Finds subtle semantic transitions

**Impact:** _The threshold type determines sensitivity to semantic changes - more sensitive types create smaller, more focused chunks while less sensitive types create larger, more comprehensive chunks._
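
As a rough illustration of how the threshold type changes where breaks land, the sketch below computes a cut-off for three of the four types on a toy list of distances. The formulas (mean plus a multiple of the standard deviation, mean plus a multiple of the IQR) are our reading of the descriptions above and should be checked against the library source; `breakpoint_threshold` and `percentile` are hypothetical helpers, and `gradient` is left out.

```python
import statistics


def percentile(values, pct):
    """Percentile with linear interpolation between closest ranks."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def breakpoint_threshold(distances, kind, amount):
    """Return the distance above which a chunk boundary would be created."""
    if kind == "percentile":
        return percentile(distances, amount)
    if kind == "standard_deviation":
        # Assumed rule: mean + amount * population standard deviation.
        return statistics.mean(distances) + amount * statistics.pstdev(distances)
    if kind == "interquartile":
        # Assumed rule: mean + amount * (Q3 - Q1).
        iqr = percentile(distances, 75) - percentile(distances, 25)
        return statistics.mean(distances) + amount * iqr
    raise ValueError(f"unsupported threshold type: {kind}")


# Toy distances between consecutive sentences; the last one is a clear outlier.
distances = [0.1, 0.2, 0.3, 0.4, 1.0]
for kind, amount in [("percentile", 95), ("standard_deviation", 3), ("interquartile", 1.5)]:
    threshold = breakpoint_threshold(distances, kind, amount)
    breaks = [i for i, d in enumerate(distances) if d > threshold]
    print(f"{kind}: threshold={threshold:.3f}, breakpoints at {breaks}")
```

On this toy list the 3-standard-deviation rule is strict enough that even the outlier does not trigger a split, while the other two rules break at it. With so few distances this is only suggestive, but it shows why the choice of type and amount matters.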

We'll use the `percentile` thresholding method for this example, which will:

Calculate the distances between consecutive sentences, and then split the sequence wherever a distance exceeds a given percentile of all distances.

In [43]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [44]:
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

Let's create a new vector store.

In [45]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [46]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [47]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [48]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to the handling and servicing of federal student loans. These issues include:\n\n- Difficulty with loan forgiveness, cancellation, or discharge processes.\n- Improper or illegal reporting and collection of debts, especially post-abolishment of the Department of Education.\n- Errors in loan account status, such as loans being incorrectly reported as in default or delinquent.\n- Problems with payments, auto-debit setups, and reconciling payment plans.\n- Discrepancies or lack of transparency regarding loan servicing, issuer changes, or account information.\n- Data breaches and violations of privacy laws related to borrower information.\n- Issues with communication and responsiveness from loan servicers.\n\nWhile multiple issues are noted, a recurring theme is the mishandling, inaccurate reporting, and servicing problems associated with federal student loans.'

In [49]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, it appears that several complaints involving different companies and issues were responded to with responses such as "Closed with explanation," indicating that the complaints were addressed, albeit potentially not satisfactorily. Additionally, the responses explicitly state that they were handled in a timely manner ("Yes" under the "Timely response?" field).\n\nHowever, there is at least one complaint (the first one about Nelnet in Indiana) where the consumer reported that despite multiple letters and acknowledgments, the company "never responded to the CM, nor provided any answers to the questions raised in the CM." The company response was "Closed with explanation" but there is no indication that the response was timely or adequate.\n\nIn summary:\n\n- Some complaints were handled in a timely manner, as indicated by "Yes" under "Timely response."\n- Others, such as the complaint against Nelnet regarding unanswered correspondence and alleged miscondu

In [50]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including legal disputes over the legitimacy of the debt, issues with the reporting and transfer of their loans, difficulties with payment plans or account management, and problems arising from data breaches or improper handling of personal information. Some borrowers also faced delays or errors in re-amortization after forbearance periods, or encountered obstacles in verifying their forgiveness or discharge claims, which hindered their ability to comply with or resolve their loan obligations.'

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

#### ✅ Answer:

Short and highly repetitive sentences create _minimal embedding distance_ variations, making it difficult to detect _meaningful semantic_ boundaries.

Threshold Type Behaviors:

1. "percentile" (95th percentile)

- Behavior: Creates very few chunks since most distances are similar
- Issue: May group unrelated FAQ topics together
- Adjustment: Lower to 75-85th percentile to increase sensitivity

2. "standard_deviation" (3σ)

- Behavior: Performs poorly due to low variance in short, similar sentences
- Issue: Creates massive chunks with no meaningful breaks
- Adjustment: Reduce to 1-2 standard deviations for more splitting

3. "interquartile" (IQR × 1.5)

- Behavior: Most robust for FAQs due to outlier resistance
- Issue: Still may miss subtle topic transitions
- Adjustment: Reduce scaling factor to 0.8-1.0

4. "gradient" (anomaly detection)

- Behavior: Best performer - detects subtle topic shifts in repetitive content
- Issue: May be overly sensitive to minor variations
- Adjustment: Fine-tune threshold to 85-90th percentile

Conclusion: Use "gradient" with _85th percentile_ + minimum chunk size constraints + keyword-based post-processing to ensure FAQ topics remain grouped appropriately despite repetitive language patterns.
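
One way to approximate the "minimum chunk size constraint" mentioned in the conclusion is a small post-processing pass over the chunks. The helper below is a hypothetical sketch, not part of LangChain: `merge_small_chunks`, the `min_chars` cut-off, and the sample FAQ strings are all made up for illustration.

```python
def merge_small_chunks(chunks, min_chars):
    """Fold any chunk shorter than min_chars into the chunk before it."""
    merged = []
    for chunk in chunks:
        if merged and len(chunk) < min_chars:
            merged[-1] = merged[-1] + " " + chunk
        else:
            # The first chunk is always kept, even if it is short.
            merged.append(chunk)
    return merged


faq_chunks = [
    "How do I reset my password? Use the reset link on the login page.",
    "Yes.",
    "How do I change my email? Update it under account settings.",
    "Contact support.",
]
merged = merge_small_chunks(faq_chunks, min_chars=20)
print(merged)
```

Merging backwards keeps a short answer attached to the question that precedes it, which is usually the desired grouping for FAQ-style text.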

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
 - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
 - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison between them
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.

In [51]:
### YOUR CODE HERE