# Advanced Retrieval with LangChain

In the following notebook, we'll explore various methods of advanced retrieval using LangChain!

We'll touch on:

- Naive Retrieval
- Best-Matching 25 (BM25)
- Multi-Query Retrieval
- Parent-Document Retrieval
- Contextual Compression (a.k.a. Rerank)
- Ensemble Retrieval
- Semantic chunking

We'll also discuss how these methods impact performance on our set of documents with a simple RAG chain.

There will be two breakout rooms:

- 🤝 Breakout Room Part #1
  - Task 1: Getting Dependencies!
  - Task 2: Data Collection and Preparation
  - Task 3: Setting Up Qdrant!
  - Task 4-10: Retrieval Strategies
- 🤝 Breakout Room Part #2
  - Activity: Evaluate with Ragas

# 🤝 Breakout Room Part #1

## Task 1: Getting Dependencies!

We're going to need a few specific LangChain community packages, like OpenAI (for our [LLM](https://platform.openai.com/docs/models) and [Embedding Model](https://platform.openai.com/docs/guides/embeddings)) and Cohere (for our [Reranker](https://cohere.com/rerank)).

We'll also provide our OpenAI key, as well as our Cohere API key.

In [1]:
from dotenv import load_dotenv

load_dotenv(dotenv_path="../.env")

True

In [2]:
import getpass
import os


def check_if_env_var_is_set(env_var_name: str, human_readable_string: str = "API Key") -> None:
    """Report whether an environment variable is set; prompt for it if it is missing."""
    api_key = os.getenv(env_var_name)

    if api_key:
        print(f"{env_var_name} is present")
    else:
        print(f"{env_var_name} is NOT present, paste key at the prompt:")
        os.environ[env_var_name] = getpass.getpass(f"Please enter your {human_readable_string}: ")

In [3]:
import os
import getpass

# os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key:")

check_if_env_var_is_set("OPENAI_API_KEY", "OpenAI API key")

OPENAI_API_KEY is present


In [4]:
# os.environ["COHERE_API_KEY"] = getpass.getpass("Cohere API Key:")

check_if_env_var_is_set("COHERE_API_KEY", "Cohere API key")

COHERE_API_KEY is present


## Task 2: Data Collection and Preparation

We'll be using our Loan Data once again - this time the structured data available through the CSV!

### Data Preparation

We want to make sure all our documents have the relevant metadata for the various retrieval strategies we're going to be applying today.

In [5]:
from langchain_community.document_loaders.csv_loader import CSVLoader
from datetime import datetime, timedelta

loader = CSVLoader(
    file_path="./data/complaints.csv",
    metadata_columns=[
      "Date received", 
      "Product", 
      "Sub-product", 
      "Issue", 
      "Sub-issue", 
      "Consumer complaint narrative", 
      "Company public response", 
      "Company", 
      "State", 
      "ZIP code", 
      "Tags", 
      "Consumer consent provided?", 
      "Submitted via", 
      "Date sent to company", 
      "Company response to consumer", 
      "Timely response?", 
      "Consumer disputed?", 
      "Complaint ID"
    ]
)

loan_complaint_data = loader.load()

for doc in loan_complaint_data:
    doc.page_content = doc.metadata["Consumer complaint narrative"]

Let's look at an example document to see if everything worked as expected!

In [6]:
loan_complaint_data[0]

Document(metadata={'source': './data/complaints.csv', 'row': 0, 'Date received': '03/27/25', 'Product': 'Student loan', 'Sub-product': 'Federal student loan servicing', 'Issue': 'Dealing with your lender or servicer', 'Sub-issue': 'Trouble with how payments are being handled', 'Consumer complaint narrative': "The federal student loan COVID-19 forbearance program ended in XX/XX/XXXX. However, payments were not re-amortized on my federal student loans currently serviced by Nelnet until very recently. The new payment amount that is effective starting with the XX/XX/XXXX payment will nearly double my payment from {$180.00} per month to {$360.00} per month. I'm fortunate that my current financial position allows me to be able to handle the increased payment amount, but I am sure there are likely many borrowers who are not in the same position. The re-amortization should have occurred once the forbearance ended to reduce the impact to borrowers.", 'Company public response': 'None', 'Company'

## Task 3: Setting up Qdrant!

Now that we have our documents, let's create a Qdrant VectorStore with the collection name "LoanComplaints".

We'll leverage OpenAI's [`text-embedding-3-small`](https://openai.com/blog/new-embedding-models-and-api-updates) because it's a very powerful (and low-cost) embedding model.

> NOTE: We'll be creating additional vectorstores where necessary, but this pattern is still extremely useful.

In [7]:
from langchain_community.vectorstores import Qdrant
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectorstore = Qdrant.from_documents(
    loan_complaint_data,
    embeddings,
    location=":memory:",
    collection_name="LoanComplaints"
)

## Task 4: Naive RAG Chain

Since we're focusing on the "R" in RAG today - we'll create our Retriever first.

### R - Retrieval

This naive retriever will simply treat each complaint as a document, and use cosine similarity to fetch the 10 most relevant documents.

> NOTE: We're choosing `10` as our `k` here to provide enough documents for our reranking process later

In [8]:
naive_retriever = vectorstore.as_retriever(search_kwargs={"k" : 10})
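
To make "cosine similarity plus top `k`" concrete, here is a minimal pure-Python sketch of the idea. It is not what the vector store runs internally; the helper names (`cosine_similarity`, `top_k`) and the 3-dimensional toy vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=10):
    """Return the indices of the k document vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" (hypothetical values, far smaller than real embedding vectors).
toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_query = [0.9, 0.2, 0.0]

nearest = top_k(toy_query, toy_docs, k=2)
```

Note that the winner is the vector pointing in the most similar *direction* to the query, not the one with the largest values, which is why cosine similarity is a common choice for comparing embeddings.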

### A - Augmented

We're going to go with a standard prompt for our simple RAG chain today! Nothing fancy here, we want this to mostly be about the Retrieval process.

In [9]:
from langchain_core.prompts import ChatPromptTemplate

RAG_TEMPLATE = """\
You are a helpful and kind assistant. Use the context provided below to answer the question.

If you do not know the answer, or are unsure, say you don't know.

Query:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_TEMPLATE)

### G - Generation

We're going to leverage `gpt-4.1-nano` as our LLM today, as - again - we want this to largely be about the Retrieval process.

In [10]:
from langchain_openai import ChatOpenAI

chat_model = ChatOpenAI(model="gpt-4.1-nano")

### LCEL RAG Chain

We're going to use LCEL to construct our chain.

> NOTE: This chain will be exactly the same across the various examples with the exception of our Retriever!

In [11]:
%%time
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

naive_retrieval_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and chaining it into the naive_retriever
    {"context": itemgetter("question") | naive_retriever, "question": itemgetter("question")}
    # "context"  : is assigned to a RunnablePassthrough object (will not be called or considered in the next step)
    #              by getting the value of the "context" key from the previous step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

CPU times: user 13.8 ms, sys: 3.14 ms, total: 16.9 ms
Wall time: 26.7 ms


Let's see how this simple chain does on a few different prompts.

> NOTE: You might think that we've cherry picked prompts that showcase the individual skill of each of the retrieval strategies - you'd be correct!

In [12]:
%%time
naive_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 430 ms, sys: 36.2 ms, total: 466 ms
Wall time: 2min 12s


'The most common issue with loans, based on the provided complaints, appears to be problems related to the handling and management of loans by servicers. Specifically, many complaints involve errors in loan balances, misapplied payments, incorrect or misleading information reported on credit reports, and difficulties in communicating or resolving issues with loan servicers. Additionally, issues such as deceptive repayment practices, improper transfer of loans without notice, and disputes over loan terms or balances are prevalent.\n\nTherefore, the most common issue can be summarized as: **Problems with loan management and servicing, including errors, miscommunication, and unfair practices by loan servicers.**'

In [13]:
%%time
naive_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 400 ms, sys: 22.1 ms, total: 422 ms
Wall time: 2min 18s


'Based on the provided context, yes, some complaints were not handled in a timely manner. Specifically, there are complaints where the response status indicates "No" or where delays are mentioned:\n\n- Complaint from 03/28/25 (Complaint ID: 12709087, Mohela) was marked as "Timely response?": No.\n- Complaint from 04/24/25 (Complaint ID: 13160766, Maximus Federal Services) was marked as "Timely response?": Yes, but the narrative indicates ongoing issues with responses and unresolved problems.\n- Complaint from 04/18/25 (Complaint ID: 13062402, Nelnet) was marked as "Timely response?": Yes, but the complaint details show unresolved or ongoing issues.\n- Others, such as the complaint from 04/14/25 (Complaint ID: 12973003, EdFinancial Services) were handled within the timeline, but multiple complaints reflect delays and lack of resolution over long periods.\n\nOverall, at least one complaint (the Mohela one) was explicitly not handled in a timely manner, and several others suggest delays o

In [14]:
%%time
naive_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 368 ms, sys: 29.9 ms, total: 398 ms
Wall time: 9.75 s


"People failed to pay back their loans for several reasons, including:\n\n1. **Accumulation of interest during forbearance or deferment:** Borrowers often found that interest continued to accrue even when payments were paused, making the total debt grow and prolonging repayment.\n\n2. **Lack of clear communication and notification:** Many borrowers were not properly informed about loan transfer dates, repayment start dates, or changes in payment requirements, leading to unintentional delinquency.\n\n3. **Financial hardships and economic conditions:** Many borrowers experienced job loss, reduced income, or economic recessions that made it difficult to keep up with payments.\n\n4. **Inability to afford increased payments:** When attempting to lower payments or extend repayment, interest continued to grow, and increasing monthly payments was not feasible for some borrowers due to their financial situation.\n\n5. **Problems with loan management and servicing:** Some borrowers faced issues 

Overall, this is not bad! Let's see if we can make it better!

## Task 5: Best-Matching 25 (BM25) Retriever

Taking a step back in time - [BM25](https://www.nowpublishers.com/article/Details/INR-019) is based on [Bag-Of-Words](https://en.wikipedia.org/wiki/Bag-of-words_model) which is a sparse representation of text.

In essence, it's a way to compare how similar two pieces of text are based on the words they both contain.

This retriever is very straightforward to set up! Let's see it happen down below!


In [15]:
from langchain_community.retrievers import BM25Retriever

bm25_retriever = BM25Retriever.from_documents(loan_complaint_data)
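
If you're curious what a BM25 score looks like under the hood, here is a rough pure-Python sketch of the Okapi BM25 formula. Treat it as an illustration under assumptions: whitespace tokenization, the `k1` and `b` values, and the smoothed IDF variant are choices made for this sketch, and libraries differ in these details.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_counts[term]
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Longer-than-average documents are penalized via b.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Toy complaint snippets (made up for illustration).
toy_docs = [
    "my payment was applied to the wrong loan",
    "the servicer never answered my forbearance request",
    "late fee charged after payment was made on time",
]
toy_scores = bm25_scores("forbearance request", toy_docs)
best_doc = toy_docs[toy_scores.index(max(toy_scores))]
```

Documents that share no terms with the query score exactly zero, which is the "lexical match only" behavior that separates BM25 from embedding-based retrieval.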

We'll construct the same chain - only changing the retriever.

In [16]:
bm25_retrieval_chain = (
    {"context": itemgetter("question") | bm25_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at the responses!

In [17]:
%%time
bm25_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 53.3 ms, sys: 10.5 ms, total: 63.8 ms
Wall time: 1.28 s


'The most common issue with loans, based on the provided context, appears to be problems related to dealing with lenders or servicers, including issues like misinformation about loan balances, difficulty applying payments correctly, and disputes over charges or fees. Specifically, complaints often involve challenges in understanding or managing loan payments, incorrect or misleading information about loan details, and concerns about the fairness of payment application or fees charged.'

In [18]:
%%time
bm25_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 34.6 ms, sys: 6.29 ms, total: 40.9 ms
Wall time: 1.07 s


'Based on the provided information, all the complaints listed indicate that the organizations responded to the complaints in a timely manner. The responses are marked as "Yes" under the "Timely response?" field for each complaint. Therefore, there are no complaints in the provided data that were left unhandled or not responded to in a timely manner.'

In [19]:
%%time
bm25_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 36.3 ms, sys: 8.77 ms, total: 45.1 ms
Wall time: 3.49 s


'People failed to pay back their loans for various reasons, including issues with payment plans, miscommunication or lack of communication from the loan servicers, and problems related to forbearance or deferment. For example, some borrowers experienced trouble with their payment plans or were subjected to incorrect handling of their forbearance requests, leading to continued billing despite attempts to defer payments. Others faced complications when loans were transferred between companies without proper notification, resulting in missed updates about payment status, automatic withdrawal issues, and negative impacts on credit scores. Additionally, some borrowers mentioned that they were not properly informed about account statuses or changes, which contributed to paying bills they were unaware of or unable to address in time.\n\nIn summary, failure to repay loans often stemmed from administrative errors, poor communication from loan servicers, and billing or account management problem

It's not clear whether this is better or worse. If only we had a way to test this! (SPOILERS: We do; the second half of the notebook will cover this.)

#### ❓ Question #1:

Give an example query where BM25 is better than embeddings and justify your answer.

#### ✅ Answer:

BM25, a traditional full-text search ranking function, is particularly effective when dealing with queries that rely heavily on exact term matching, term frequency, and inverse document frequency (TF-IDF) principles.

BM25 is generally better suited for scenarios where exact keyword matching is essential, such as in e-commerce search engines, document retrieval systems, and legal e-discovery.

Additionally, BM25 is often used in hybrid search systems alongside vector search to create a more comprehensive understanding of both semantic meaning and keyword importance.

Here are a couple of queries where exact term matches in the documents are essential to avoid noisy results that are close to, but not quite, what was asked for:

- "Find documents about COVID-19 vaccine side effects in patients with diabetes"
  - The key terms "COVID-19 vaccine" and "diabetes" are where the focus of the query lies.
- "Best practices for data backup in 2025"
  - It includes specific terms like "data backup" and "2025" that are likely to appear verbatim in relevant documents.
  - BM25 can effectively leverage term frequency (e.g., how often "data backup" appears in a document) and document length normalization to rank documents accurately. The query does not heavily rely on semantic similarity but rather on the presence and frequency of exact keywords.
  - In contrast, dense embeddings might struggle if the training data does not include similar phrasing or if the semantic model does not strongly associate "best practices" with "data backup" in the context of 2025.

Embeddings, on the other hand, are better suited for capturing semantic relationships between words and documents. If embeddings alone were used for the queries above, the results would likely be less precise than with BM25.


### Addendum

_**Sparse Embeddings** are high-dimensional vectors where most values are zero, with only a few non-zero values representing specific features or tokens that are present, making them memory-efficient and interpretable but limited to explicit feature representation._

_**Dense Embeddings** are vectors where most or all dimensions have non-zero values, creating rich, continuous representations that capture complex semantic relationships and contextual meaning, but require more storage and are less interpretable._

_**Key Difference:** Sparse embeddings work like "on/off switches" for specific features (like one-hot encoding or TF-IDF), while dense embeddings work like "semantic fingerprints" where every dimension contributes to the overall meaning representation - sparse focuses on explicit presence/absence, dense captures nuanced relationships._

___

_**Sparse Retrieval** uses exact keyword matching with algorithms like BM25, where documents are represented as sparse vectors containing only the specific terms that appear in them, making it excellent for precise term-based searches but limited to lexical matches._

_**Dense Retrieval** uses semantic embeddings where documents and queries are converted into dense vector representations that capture meaning and context, allowing it to find semantically similar content even when different words are used, but potentially missing exact keyword matches._

_**Key Difference:** Sparse retrieval excels at "what you search is what you get" with exact terms, while dense retrieval excels at "what you mean is what you get" through semantic understanding - which is why hybrid approaches combining both often work best._
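
The lexical limitation of sparse representations is easy to see in a tiny sketch. The helpers below (`sparse_vector`, `sparse_dot`) are hypothetical names for a plain bag-of-words model, assuming whitespace tokenization; real sparse retrievers add weighting on top of this.

```python
from collections import Counter


def sparse_vector(text):
    """Bag-of-words vector stored sparsely: only terms that occur get an entry."""
    return Counter(text.lower().split())


def sparse_dot(a, b):
    """Dot product computed over the terms that both sparse vectors contain."""
    return sum(count * b[term] for term, count in a.items() if term in b)


query_vec = sparse_vector("loan payment late")
exact_match = sparse_vector("my loan payment was late")
paraphrase = sparse_vector("i fell behind on what i owe")

exact_overlap = sparse_dot(query_vec, exact_match)
paraphrase_overlap = sparse_dot(query_vec, paraphrase)
```

The paraphrase says roughly the same thing as the query, yet it shares no terms with it, so its sparse overlap is zero. A dense embedding model is designed to bridge exactly that kind of gap.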


## Task 6: Contextual Compression (Using Reranking)

Contextual Compression is a fairly straightforward idea: We want to "compress" our retrieved context into just the most useful bits.

There are a few ways we can achieve this - but we're going to look at a specific example called reranking.

The basic idea here is this:

- We retrieve lots of documents that are very likely related to our query vector
- We "compress" those documents into a smaller set of *more* related documents using a reranking algorithm.

We'll be leveraging Cohere's Rerank model for our reranker today!

All we need to do is the following:

- Create a basic retriever
- Create a compressor (reranker, in this case)

That's it!

Let's see it in the code below!

In [20]:
from langchain.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_cohere import CohereRerank

compressor = CohereRerank(model="rerank-v3.5")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=naive_retriever
)
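
Conceptually, the two-stage "retrieve a lot, then keep the best" flow looks like the sketch below. The scoring function is a deliberately crude stand-in (word overlap), not Cohere's model, and the names `rerank` and `toy_overlap_score` as well as the `top_n` value are assumptions made for illustration.

```python
def rerank(query, candidates, score_fn, top_n=3):
    """Keep only the top_n candidates according to score_fn(query, doc)."""
    ranked = sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


def toy_overlap_score(query, doc):
    """Hypothetical stand-in for a reranking model: fraction of query words found in doc."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)


# Pretend these came back from a first-stage retriever with a generous k.
first_stage = [
    "loan balance is wrong on my statement",
    "the weather was nice when I called",
    "servicer reported a wrong loan balance to the credit bureau",
    "I moved to a new state last year",
]

compressed = rerank("wrong loan balance", first_stage, toy_overlap_score, top_n=2)
```

The design point is that the second-stage scorer sees the query and each document *together*, so it can afford to be slower and more precise because it only runs on a handful of candidates.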

Let's create our chain again, and see how this does!

In [21]:
contextual_compression_retrieval_chain = (
    {"context": itemgetter("question") | compression_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [22]:
%%time
contextual_compression_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 419 ms, sys: 11.8 ms, total: 431 ms
Wall time: 6.42 s


'Based on the provided data, the most common issue with loans appears to be problems related to student loan servicing, including errors in loan balances, misapplied payments, wrongful denials of payment plans, and mishandling of loan information. Many complaints involve inaccurate or inconsistent data, lack of communication, and improper handling or transfer of loans.'

In [23]:
%%time
contextual_compression_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 398 ms, sys: 5.84 ms, total: 404 ms
Wall time: 4.15 s


'Based on the provided data, at least one complaint was not handled in a timely manner. Specifically, the complaint regarding the student loan issue with Maximus Federal Services, Inc. mentions that it has been nearly 18 months with no resolution, indicating it was not addressed in a timely manner. However, the complaint from EdFinancial Services about payment issues was responded to within the expected timeframe, as indicated by "Timely response? Yes."\n\nSo, yes, there was a complaint that did not get handled in a timely manner.'

In [24]:
%%time
contextual_compression_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 407 ms, sys: 9.57 ms, total: 417 ms
Wall time: 7.06 s


'People failed to pay back their loans for several reasons, including:\n\n1. Lack of awareness: Borrowers were often not informed by their financial aid officers or loan servicers that they would need to repay the loans, leading to confusion about repayment obligations.\n\n2. Poor communication and notification issues: Some borrowers did not receive proper notifications when their loans were transferred between servicers or when repayment was due. This included failure to notify them about payment requirements, updates, or changes in loan ownership.\n\n3. Difficulties with loan management: Borrowers experienced issues accessing their online accounts, incorrect or inconsistent account information, and lack of clear documentation explaining loan balances and interest calculations.\n\n4. Accumulating interest: Borrowers faced situations where interest continued to grow during forbearance, deferment, or missed payments, making it harder to pay off the original loan amount and increasing ov

We'll need to rely on something like Ragas to help us get a better sense of how this is performing overall - but it "feels" better!

## Task 7: Multi-Query Retriever

Typically in RAG we have a single query - the one provided by the user.

What if we had... more than one query?

In essence, a Multi-Query Retriever works by:

1. Taking the original user query and creating `n` number of new user queries using an LLM.
2. Retrieving documents for each query.
3. Using all unique retrieved documents as context

So, how hard is it to set up? Not bad! Let's see it down below!



In [25]:
from langchain.retrievers.multi_query import MultiQueryRetriever

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=naive_retriever, llm=chat_model
)
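
The generate, retrieve, and deduplicate flow can be sketched without any LLM at all. Everything prefixed with `fake_` below is a hypothetical stand-in: a real setup would ask an LLM for the query variations and a vector store for the documents.

```python
def multi_query_retrieve(question, generate_queries, retrieve):
    """Run retrieval for each generated query and return the unique union of results."""
    seen = set()
    unique_docs = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:  # skip documents already returned by an earlier query
                seen.add(doc)
                unique_docs.append(doc)
    return unique_docs


# Hypothetical stand-in for the LLM that rewrites the user's question.
def fake_generate_queries(question):
    return [
        "problems borrowers report about loans",
        "frequent loan servicing complaints",
        "typical student loan issues",
    ]


# Hypothetical stand-in for the retriever: a fixed lookup table of results.
fake_index = {
    "problems borrowers report about loans": ["doc_a", "doc_b"],
    "frequent loan servicing complaints": ["doc_b", "doc_c"],
    "typical student loan issues": ["doc_c", "doc_d"],
}


def fake_retrieve(query):
    return fake_index.get(query, [])


merged = multi_query_retrieve(
    "What is the most common issue with loans?", fake_generate_queries, fake_retrieve
)
```

Each variation alone returns two documents, but the merged set contains four, which is where the recall gain comes from. It also means the context passed to the LLM can grow well beyond `k`.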

In [26]:
multi_query_retrieval_chain = (
    {"context": itemgetter("question") | multi_query_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

In [27]:
%%time
multi_query_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 1.12 s, sys: 7.88 ms, total: 1.13 s
Wall time: 6.89 s


'Based on the provided context, the most common issues with loans appear to be:\n\n- Errors in loan balances, misapplied payments, and incorrect reporting of loan status.\n- Problems with payment handling, such as inability to apply extra funds to the principal or pay off loans more quickly.\n- Discrepancies and errors in interest calculations and loan terms.\n- Mismanagement of loan classification, such as incorrect loan type or status.\n- Unauthorized or improper ending of deferments or forbearances.\n- Lack of proper communication about loan transfers, account status, or changes.\n- Reports of bad or misleading information affecting credit reports.\n- Issues with loan consolidation and improper handling of consolidation processes.\n- Problems with repayment plans, including wrongful denials and inability to access suitable plans.\n- Erroneous late payments or delinquencies reported, damaging credit scores.\n\nWhile these issues vary, the most predominant pattern involves errors and 

In [28]:
%%time
multi_query_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 1.13 s, sys: 6.19 ms, total: 1.14 s
Wall time: 6.91 s


'Based on the provided complaints, yes, some complaints were not handled in a timely manner. For example:\n\n- Complaint ID 12709087 (row 441) about the application process was marked "No" for timely response.\n- Complaint ID 12739706 (row 716) regarding account disputes was handled "Yes" for timely response, but the complainant describes ongoing delays, indicating some issues with timeliness.\n- Complaint ID 12698650 (row 67) about recertification was marked "Yes" for timely response, but the user reports stress and delays.\n- Complaint ID 12654977 (row 66) about payment status was marked "No" for timely response.\n\nOverall, multiple complaints explicitly note that the issues were not resolved promptly or responses were delayed, so some complaints were indeed not handled in a timely manner.'

In [29]:
%%time
multi_query_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 1.13 s, sys: 6.35 ms, total: 1.14 s
Wall time: 11.3 s


'People often fail to pay back their loans due to a combination of systemic issues and individual circumstances, including:\n\n1. **Accumulation of Interest During Forbearance or Deferment:** Borrowers may be placed into forbearance or deferment, which temporarily pauses payments but allows interest to continue accruing. This leads to higher balances over time, making repayment more difficult.\n\n2. **Lack of Clear Information and Guidance:** Borrowers frequently report not being properly informed about options such as income-driven repayment plans, loan rehabilitation, or the long-term effects of forbearance and consolidation. This can result in unintentional default or inflated balances.\n\n3. **Inadequate Communication from Servicers:** Many complaints highlight that borrowers are not adequately notified about payment due dates, loan status changes, or the transfer of their loans between servicers, leading to missed payments and negative credit reports.\n\n4. **Mismanagement and Err

#### ❓ Question #2:

Explain how generating multiple reformulations of a user query can improve recall.

#### ✅ Answer:

Multiple reformulations improve recall because relevant documents may use different terminology than the original query, and each reformulation can surface documents the others miss (different phrasings in multiple reformulations of a query can match different relevant documents).

In other words, multiple reformulations approach the same query from different angles, so retrieval returns documents covering each of those angles. This broadens retrieval coverage around the common theme while capturing variations in terminology and perspective.

A retriever that uses multiple reformulations follows these steps:

  1. Generate multiple query variations from the original query using an LLM
  2. Retrieve documents for each variation (each gets k results)
  3. Deduplicate and merge the results from all queries
  4. Return the final deduplicated set

Because the merged set draws on every variation, it covers more of the relevant documents than a single query would.

For example, "machine learning algorithms" and "AI models" retrieve different relevant documents that share the same or a similar theme.

## Task 8: Parent Document Retriever

A "small-to-big" strategy - the Parent Document Retriever works based on a simple strategy:

1. Each un-split "document" will be designated as a "parent document" (You could use larger chunks of document as well, but our data format allows us to consider the overall document as the parent chunk)
2. Store those "parent documents" in a memory store (not a VectorStore)
3. We will chunk each of those documents into smaller documents, and associate them with their respective parents, and store those in a VectorStore. We'll call those "child chunks".
4. When we query our Retriever, we will do a similarity search comparing our query vector to the "child chunks".
5. Instead of returning the "child chunks", we'll return their associated "parent documents".

Okay, maybe that was a few steps - but the basic idea is this:

- Search for small documents
- Return big documents

The intuition is that we're likely to find the most relevant information by limiting the amount of semantic information that is encoded in each embedding vector - but we're likely to miss relevant surrounding context if we only use that information.
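
Here is a toy, dependency-free sketch of that small-to-big idea. It assumes fixed-size character chunks and word-overlap matching in place of a real text splitter and embedding search, and the function names are made up for illustration.

```python
def build_parent_child_index(documents, chunk_size=40):
    """Split each parent into fixed-size child chunks, remembering each child's parent id."""
    parents = {}   # parent_id -> full text (plays the role of the docstore)
    children = []  # (chunk_text, parent_id) pairs (play the role of the vector store)
    for parent_id, text in enumerate(documents):
        parents[parent_id] = text
        for start in range(0, len(text), chunk_size):
            children.append((text[start:start + chunk_size], parent_id))
    return parents, children


def small_to_big_search(query, parents, children, k=1):
    """Match the query against child chunks, then return the distinct parent documents."""
    query_words = set(query.lower().split())
    # Toy relevance: number of shared words (a real retriever compares embeddings).
    ranked = sorted(
        children,
        key=lambda child: len(query_words & set(child[0].lower().split())),
        reverse=True,
    )
    parent_ids = []
    for _, parent_id in ranked:
        if parent_id not in parent_ids:
            parent_ids.append(parent_id)
        if len(parent_ids) == k:
            break
    return [parents[parent_id] for parent_id in parent_ids]


# Toy complaints (made up for illustration).
toy_complaints = [
    "My servicer misapplied an extra payment. After three calls the balance "
    "is still wrong and interest keeps growing.",
    "The school closed before I graduated. I asked for a discharge and never "
    "heard back from anyone.",
]

parents, children = build_parent_child_index(toy_complaints, chunk_size=40)
results = small_to_big_search("extra payment misapplied", parents, children, k=1)
```

The query only matches the first 40-character chunk, but the caller gets back the whole complaint, including the follow-up calls and the growing interest that the chunk alone would have hidden.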

Let's start by creating our "parent documents" and defining a `RecursiveCharacterTextSplitter`.

In [30]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient, models

parent_docs = loan_complaint_data
child_splitter = RecursiveCharacterTextSplitter(chunk_size=750)

We'll need to set up a new Qdrant vectorstore - and we'll use another useful pattern to do so!

> NOTE: We are manually defining our embedding dimension, you'll need to change this if you're using a different embedding model.

In [31]:
from langchain_qdrant import QdrantVectorStore

client = QdrantClient(location=":memory:")

client.create_collection(
    collection_name="full_documents",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE)
)

parent_document_vectorstore = QdrantVectorStore(
    collection_name="full_documents", embedding=OpenAIEmbeddings(model="text-embedding-3-small"), client=client
)

Now we can create our `InMemoryStore` that will hold our "parent documents" - and build our retriever!

In [32]:
store = InMemoryStore()

parent_document_retriever = ParentDocumentRetriever(
    vectorstore = parent_document_vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

By default, this is empty as we haven't added any documents - let's add some now!

In [33]:
parent_document_retriever.add_documents(parent_docs, ids=None)

We'll create the same chain we did before - but substitute our new `parent_document_retriever`.

In [34]:
parent_document_retrieval_chain = (
    {"context": itemgetter("question") | parent_document_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's give it a whirl!

In [35]:
%%time
parent_document_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 403 ms, sys: 23.3 ms, total: 426 ms
Wall time: 1.14 s


'Based on the provided context, the most common issues with loans appear to be related to errors and problems in loan servicing, such as incorrect information on credit reports, misapplied payments, wrongful denials of payment plans, discrepancies in loan balances and interest rates, and issues arising from the sale or transfer of loans. These problems often result from systemic breakdowns within the loan management and reporting processes.'

In [36]:
%%time
parent_document_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 387 ms, sys: 7.36 ms, total: 394 ms
Wall time: 3.46 s


'Based on the provided information, several complaints indicate that they were not handled in a timely manner. Specifically, the complaints with Complaint IDs 12709087 and 12935889 explicitly state that responses were "No" to being timely. These complaints involved issues with student loan servicing and had been pending for significant periods without timely resolution. \n\nAdditionally, the complaint with Complaint ID 13205525, although marked as "timely response?", was actually reported to be over 30 days old without a response, suggesting a delay in handling.\n\nIn summary, yes, some complaints did not get handled in a timely manner.'

In [37]:
%%time
parent_document_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 398 ms, sys: 5.87 ms, total: 404 ms
Wall time: 1.66 s


'People often failed to pay back their loans due to various reasons, including financial hardships, mismanagement, lack of proper information, and issues related to loan servicing. Specifically, some borrowers experienced severe financial difficulties after graduation and relied on deferment or forbearance, which increased the overall debt due to accruing interest. Others faced challenges arising from misrepresentations by educational institutions about the value and stability of their degrees and schools, which impacted their employment prospects and ability to repay. Additionally, issues with loan servicing, such as failure to notify borrowers of payment requirements or changes in loan ownership, also contributed to missed payments.'

Overall, the performance *seems* largely the same. A tool like Ragas lets us answer that question with measurements rather than impressions.

## Task 9: Ensemble Retriever

In brief, an Ensemble Retriever takes two or more retrievers and combines their retrieved documents using a rank-fusion algorithm.

In this case - we're using the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.
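
As a rough illustration of how rank fusion works, here is a minimal sketch of a weighted Reciprocal Rank Fusion in plain Python. The function name `weighted_rrf`, the toy document IDs, and the smoothing constant `c=60` (the value used in the linked paper) are assumptions for this example; it is a sketch of the technique, not the library's code. Each document earns `weight / (rank + c)` from every ranking it appears in, and documents are sorted by their summed score.

```python
from collections import defaultdict

def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of document IDs into one ranked list."""
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        # Ranks start at 1, so the top document contributes weight / (1 + c).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Two hypothetical retrievers that disagree about the best document.
keyword_ranking = ["a", "b", "c"]
vector_ranking = ["b", "c", "a"]
fused = weighted_rrf([keyword_ranking, vector_ranking], weights=[0.5, 0.5])
print(fused)
```

Document `b` ends up first because it ranks highly in both lists, while `a` is first in one list but last in the other. Rewarding agreement between retrievers is the main reason to fuse on ranks rather than on raw scores, which are not comparable across retrievers.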

Setting it up is as easy as providing a list of our desired retrievers - and the weights for each retriever.

In [38]:
from langchain.retrievers import EnsembleRetriever

retriever_list = [bm25_retriever, naive_retriever, parent_document_retriever, compression_retriever, multi_query_retriever]
equal_weighting = [1/len(retriever_list)] * len(retriever_list)

ensemble_retriever = EnsembleRetriever(
    retrievers=retriever_list, weights=equal_weighting
)

Note that we've packed *all* of our retrievers into this ensemble, each with equal weight.

In [39]:
ensemble_retrieval_chain = (
    {"context": itemgetter("question") | ensemble_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

Let's look at our results!

In [40]:
%%time
ensemble_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

CPU times: user 2.23 s, sys: 22 ms, total: 2.25 s
Wall time: 8.25 s


'The most common issue with loans, particularly as reflected in the complaints provided, appears to be errors and issues related to the management and handling of student loans. These include mistakes in loan balances, misapplied payments, incorrect or misleading information about loan terms, improper classification of loans, unauthorized transfers of loans, problems with repayment plans, and inaccurate reporting to credit bureaus. \n\nMany complaints also highlight issues such as lack of transparency, poor communication from servicers, wrongful denials of payment or forgiveness programs, and improper collection or reporting practices. A significant subset focuses on mismanagement stemming from servicing errors, misclassification of loan types, or failure to comply with legal obligations under acts like FERPA and the Higher Education Act.\n\nOverall, a predominant issue seems to be **mismanagement and misreporting of student loan information**, leading to financial hardship, damage to 

In [41]:
%%time
ensemble_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

CPU times: user 2.22 s, sys: 26.5 ms, total: 2.25 s
Wall time: 6.09 s


'Yes, based on the provided complaints, several complaints indicate that complaints did not get handled in a timely manner. For example:\n\n- Complaint ID 12709087 from 03/28/25 shows a response labeled "No" for timely response.\n- Complaint ID 12935889 from 04/11/25 was marked as "No" for timely response.\n- Complaint ID 12739706 from 04/01/25 was also marked "No" for timely response.\n\nTherefore, some complaints were not handled promptly.'

In [42]:
%%time
ensemble_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

CPU times: user 2.2 s, sys: 36.5 ms, total: 2.24 s
Wall time: 6.33 s


'People failed to pay back their loans primarily due to a combination of systemic issues, miscommunication, and financial hardships. Many borrowers experienced:\n\n- Lack of clear information about repayment obligations, interest accumulation, and available options.\n- Being steered into long-term forbearances or repayment plans that caused interest to compound, increasing total debt.\n- Difficulties in navigating multiple loan transfers and inconsistent communication from servicers.\n- Challenges in accessing flexible repayment options like income-driven plans or loan forgiveness programs.\n- Unexpected reporting of delinquencies or late payments without proper notice, impacting credit scores.\n- Financial hardships such as unemployment, medical issues, or low income, which made maintaining payments difficult.\n- Perceived unjust practices, such as being unaware of loan transfers, incorrect credit reporting, or improper handling of accounts.\n\nThese issues are compounded by claims of

## Task 10: Semantic Chunking

While this is not a retrieval method - it *is* an effective way of increasing retrieval performance on corpora that have clean semantic breaks in them.

Essentially, Semantic Chunking is implemented by:

1. Embedding all sentences in the corpus.
2. Combining or splitting sequences of sentences according to their semantic similarity, using one of several [possible thresholding methods](https://python.langchain.com/docs/how_to/semantic-chunker/):
   - `percentile`
   - `standard_deviation`
   - `interquartile`
   - `gradient`
3. Each sequence of related sentences is kept as a document!

Let's see how to implement this!

The `breakpoint_threshold_type` parameter controls when the semantic chunker creates chunk boundaries based on embedding similarity between sentences:

**Four Threshold Types:**

1. _"percentile" (default)_
   - Splits when the distance between consecutive sentence embeddings exceeds the 95th percentile of all distances (by default)
   - Effect: Creates chunks at the most semantically distinct boundaries
   - Behavior: More conservative splitting, larger chunks

2. _"standard_deviation"_
   - Splits when the distance exceeds the mean by more than 3 standard deviations (by default)
   - Effect: More predictable behavior, especially when distances are roughly normally distributed
   - Behavior: More consistent chunk sizes

3. _"interquartile"_
   - Uses the interquartile range (IQR), scaled by 1.5 by default, to determine breakpoints
   - Effect: Middle-ground approach, robust to outliers
   - Behavior: Balanced chunk distribution

4. _"gradient"_
   - Detects anomalies in the gradient of the embedding distances
   - Effect: Best for domain-specific/highly correlated content
   - Behavior: Finds subtle semantic transitions

**Impact:** _The threshold type determines sensitivity to semantic changes - more sensitive types create smaller, more focused chunks while less sensitive types create larger, more comprehensive chunks._

We'll use the `percentile` thresholding method for this example, which calculates the distance between each pair of consecutive sentences and then splits the sequence wherever a distance exceeds a given percentile of all distances.
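
The percentile method described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the `SemanticChunker` source: the helper names, the hand-written 2-D "embeddings", and the linear-interpolated percentile are all hypothetical stand-ins for real embedding calls and library internals.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def percentile(values, q):
    """Linear-interpolated percentile, with q in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower, upper = math.floor(position), math.ceil(position)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)

def percentile_chunks(sentences, vectors, q=95):
    """Split wherever the distance to the next sentence exceeds the q-th percentile."""
    if len(sentences) < 2:
        return list(sentences)
    distances = [
        cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)
    ]
    threshold = percentile(distances, q)
    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance > threshold:  # semantic break before this sentence
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks

# Toy vectors: three sentences on one topic, then two on another.
toy_sentences = ["A1.", "A2.", "A3.", "B1.", "B2."]
toy_vectors = [(1, 0), (0.99, 0.1), (0.98, 0.15), (0, 1), (0.1, 0.99)]
print(percentile_chunks(toy_sentences, toy_vectors))
```

Only the jump between `A3.` and `B1.` exceeds the 95th-percentile threshold, so the five sentences collapse into two chunks, one per topic.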

In [43]:
from langchain_experimental.text_splitter import SemanticChunker

semantic_chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile"
)

Now we can split our documents.

In [44]:
%%time
semantic_documents = semantic_chunker.split_documents(loan_complaint_data[:20])

CPU times: user 264 ms, sys: 24.3 ms, total: 288 ms
Wall time: 8.8 s


Let's create a new vector store.

In [45]:
semantic_vectorstore = Qdrant.from_documents(
    semantic_documents,
    embeddings,
    location=":memory:",
    collection_name="Loan_Complaint_Data_Semantic_Chunks"
)

We'll use naive retrieval for this example.

In [46]:
semantic_retriever = semantic_vectorstore.as_retriever(search_kwargs={"k" : 10})

Finally we can create our classic chain!

In [47]:
semantic_retrieval_chain = (
    {"context": itemgetter("question") | semantic_retriever, "question": itemgetter("question")}
    | RunnablePassthrough.assign(context=itemgetter("context"))
    | {"response": rag_prompt | chat_model, "context": itemgetter("context")}
)

And view the results!

In [48]:
semantic_retrieval_chain.invoke({"question" : "What is the most common issue with loans?"})["response"].content

'The most common issue with loans, based on the provided complaints, appears to be problems related to loan servicing and management, including:\n\n- Difficulty in handling repayment and payment plans (e.g., issues with auto-debit, payment recalculations, or increased payments)\n- Incorrect or improper reporting of loan status, such as default or delinquency notices\n- Lack of transparency and communication from loan servicers\n- Problems with loan forgiveness, cancellation, or discharge processes\n- Disputes over the legitimacy of debt or data breaches\n\nMany complaints highlight issues with loan servicers like Nelnet, Maximus, and EdFinancial Services, often involving miscommunication, errors in account status, or problems with loan data management.\n\nWhile there isn\'t a single "most common" issue explicitly stated, the recurring theme suggests that mismanagement and errors by loan servicers are among the most frequent problems faced by borrowers.'

In [49]:
semantic_retrieval_chain.invoke({"question" : "Did any complaints not get handled in a timely manner?"})["response"].content

'Based on the provided complaints, yes, some complaints were explicitly noted as "Closed with explanation," which indicates that the issue was not handled in a timely manner or was unresolved at the time of closure. Specifically, multiple complaints about Nelnet and EdFinancial Services mention that the company\'s responses were "Closed with explanation," suggesting that the complaints were not fully addressed or resolved promptly.\n\nAdditionally, the complaint about Nelnet regarding the transfer of accounts and serious misconduct highlights ongoing issues where the company did not respond to certified mail or questions raised, indicating failures in handling the complaint in a timely manner.\n\nIn summary, yes, there were complaints that did not get handled in a timely manner, as evidenced by the "Closed with explanation" responses and ongoing issues described by consumers.'

In [50]:
semantic_retrieval_chain.invoke({"question" : "Why did people fail to pay back their loans?"})["response"].content

'People failed to pay back their loans for various reasons, including issues with loan information accuracy, alleged illegal or improper reporting of their loans, difficulties with loan recovery or transfer, complications with documentation or proper verification, and disputes over the legitimacy or legality of their debt due to changes in regulations or government actions. Additionally, some borrowers experienced technical issues, lack of transparency, or administrative delays that hindered their ability to make or verify payments.'

#### ❓ Question #3:

If sentences are short and highly repetitive (e.g., FAQs), how might semantic chunking behave, and how would you adjust the algorithm?

#### ✅ Answer:

Short and highly repetitive sentences produce _minimal variation in embedding distance_, which makes _meaningful semantic_ boundaries hard to detect.

Threshold Type Behaviors:

1. "percentile" (95th percentile)

- Behavior: Creates very few chunks since most distances are similar
- Issue: May group unrelated FAQ topics together
- Adjustment: Lower to 75-85th percentile to increase sensitivity

2. "standard_deviation" (3σ)

- Behavior: Performs poorly due to low variance in short, similar sentences
- Issue: Creates massive chunks with no meaningful breaks
- Adjustment: Reduce to 1-2 standard deviations for more splitting

3. "interquartile" (IQR × 1.5)

- Behavior: Most robust for FAQs due to outlier resistance
- Issue: Still may miss subtle topic transitions
- Adjustment: Reduce scaling factor to 0.8-1.0

4. "gradient" (anomaly detection)

- Behavior: Best performer - detects subtle topic shifts in repetitive content
- Issue: May be overly sensitive to minor variations
- Adjustment: Fine-tune threshold to 85-90th percentile

Conclusion: Use "gradient" with _85th percentile_ + minimum chunk size constraints + keyword-based post-processing to ensure FAQ topics remain grouped appropriately despite repetitive language patterns.
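
To see why lowering the threshold matters on low-variance text, here is a small sketch of the standard-deviation rule applied to made-up distances. The distance values and the helper name `std_breakpoints` are assumptions for illustration; the rule itself (split where distance > mean + k·σ) follows the description above.

```python
import statistics

def std_breakpoints(distances, k):
    """Indices where the distance exceeds mean + k * population std."""
    threshold = statistics.fmean(distances) + k * statistics.pstdev(distances)
    return [i for i, distance in enumerate(distances) if distance > threshold]

# Hypothetical FAQ-like distances: nearly uniform, with one mild topic shift.
faq_distances = [0.10, 0.11, 0.09, 0.10, 0.16, 0.10, 0.11, 0.09, 0.10, 0.10]

print(std_breakpoints(faq_distances, k=3))  # default-style threshold
print(std_breakpoints(faq_distances, k=2))  # more sensitive threshold
```

With `k=3` the topic shift at index 4 stays below the threshold and everything lands in one chunk; dropping to `k=2` is enough to surface it. The same reasoning applies to lowering the percentile for the `percentile` and `gradient` methods.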

# 🤝 Breakout Room Part #2

#### 🏗️ Activity #1

Your task is to evaluate the various Retriever methods against each other.

You are expected to:

1. Create a "golden dataset"
   - Use Synthetic Data Generation (powered by Ragas, or otherwise) to create this dataset
2. Evaluate each retriever with *retriever-specific* Ragas metrics
   - Semantic Chunking is not considered a retriever method and will not be required for marks, but you may find it useful to do a "semantic chunking on" vs. "semantic chunking off" comparison
3. Compile these in a list and write a small paragraph about which is best for this particular data and why.
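
As a reminder of what retriever-specific metrics measure, here is a simplified, ID-based sketch of context precision and recall. Ragas' own context precision and context recall metrics are computed differently (typically with an LLM judging relevance), so treat this as an assumption-laden illustration of the underlying idea only; the function name and chunk IDs are hypothetical.

```python
def context_precision_recall(retrieved_ids, relevant_ids):
    """Set-based precision and recall of retrieved chunks against a reference."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical chunk IDs for one question in a golden dataset.
precision, recall = context_precision_recall(
    retrieved_ids=["c1", "c2", "c3", "c4"],
    relevant_ids=["c2", "c4", "c9"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision penalizes a retriever for returning noise, while recall penalizes it for missing reference context, so a retriever with a large `k` tends to trade the former for the latter.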

Your analysis should factor in:
  - Cost
  - Latency
  - Performance

> NOTE: This is **NOT** required to be completed in class. Please spend time in your breakout rooms creating a plan before moving on to writing code.

##### HINTS:

- LangSmith provides detailed information about latency and cost.
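
If you also want a quick local latency number to sanity-check against LangSmith, a small timing helper is one option. This is a sketch: `median_latency` is a hypothetical helper, and the stand-in callable below takes the place of a real chain's `invoke` method.

```python
import statistics
import time

def median_latency(call, payloads, repeats=3):
    """Median wall-clock seconds per call across all payloads and repeats."""
    timings = []
    for payload in payloads:
        for _ in range(repeats):
            start = time.perf_counter()
            call(payload)
            timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Stand-in for something like `naive_retrieval_chain.invoke`.
call_log = []
def fake_chain_invoke(payload):
    call_log.append(payload["question"])
    return {"response": "stub"}

questions = [{"question": "What is the most common issue with loans?"}]
latency_seconds = median_latency(fake_chain_invoke, questions, repeats=3)
print(f"median latency: {latency_seconds:.6f}s")
```

Using the median rather than the mean keeps a single slow call (for example, a cold connection) from dominating the comparison between retrievers.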

In [51]:
### YOUR CODE HERE

In [52]:
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [53]:
from dotenv import load_dotenv

load_dotenv(dotenv_path="../.env")

True

In [54]:
def check_if_env_var_is_set(env_var_name: str, human_readable_string: str = "API Key"):
    api_key = os.getenv(env_var_name)

    if api_key:
        print(f"{env_var_name} is present")
    else:
        print(f"{env_var_name} is NOT present, paste key at the prompt:")
        os.environ[env_var_name] = getpass.getpass(f"Please enter your {human_readable_string}: ")

In [55]:
import os
import getpass

os.environ["LANGCHAIN_TRACING_V2"] = "true"
check_if_env_var_is_set("LANGCHAIN_API_KEY", "LangChain API key")
check_if_env_var_is_set("OPENAI_API_KEY", "OpenAI API key")

LANGCHAIN_API_KEY is present
OPENAI_API_KEY is present


In [56]:
from uuid import uuid4

os.environ["LANGCHAIN_PROJECT"] = f"AIM - ADwLC - {uuid4().hex[0:8]}"

In [57]:
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyMuPDFLoader


path = "data/"
loader = DirectoryLoader(path, glob="*.pdf", loader_cls=PyMuPDFLoader)
docs = loader.load()

In [58]:
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4.1-nano"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

In [59]:
from ragas.testset.graph import KnowledgeGraph

kg = KnowledgeGraph()
kg

KnowledgeGraph(nodes: 0, relationships: 0)

In [60]:
from ragas.testset.graph import Node, NodeType

### NOTICE: We're using a subset of the data for this example - this is to keep costs/time down.
for doc in docs[:20]:
    kg.nodes.append(
        Node(
            type=NodeType.DOCUMENT,
            properties={"page_content": doc.page_content, "document_metadata": doc.metadata}
        )
    )
kg

KnowledgeGraph(nodes: 20, relationships: 0)

In [61]:
%%time
from ragas.testset.transforms import default_transforms, apply_transforms

transformer_llm = generator_llm
embedding_model = generator_embeddings

# Use a distinct name so the imported `default_transforms` function is not shadowed on re-runs.
transforms = default_transforms(documents=docs, llm=transformer_llm, embedding_model=embedding_model)
apply_transforms(kg, transforms)
kg

Applying HeadlinesExtractor:   0%|          | 0/17 [00:00<?, ?it/s]

Applying HeadlineSplitter:   0%|          | 0/20 [00:00<?, ?it/s]

unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node
unable to apply transformation: 'headlines' property not found in this node


Applying SummaryExtractor:   0%|          | 0/31 [00:00<?, ?it/s]

Property 'summary' already exists in node '08e466'. Skipping!
Property 'summary' already exists in node '2fcdaf'. Skipping!
Property 'summary' already exists in node 'c5cce0'. Skipping!
Property 'summary' already exists in node 'f726ed'. Skipping!
Property 'summary' already exists in node 'ff53fd'. Skipping!
Property 'summary' already exists in node '3455a6'. Skipping!
Property 'summary' already exists in node 'a973f8'. Skipping!
Property 'summary' already exists in node '22aa9a'. Skipping!
Property 'summary' already exists in node 'e686e9'. Skipping!
Property 'summary' already exists in node '14b369'. Skipping!
Property 'summary' already exists in node '48b9ef'. Skipping!
Property 'summary' already exists in node '50771b'. Skipping!
Property 'summary' already exists in node '9672d9'. Skipping!
Property 'summary' already exists in node '890943'. Skipping!


Applying CustomNodeFilter:   0%|          | 0/6 [00:00<?, ?it/s]

Applying [EmbeddingExtractor, ThemesExtractor, NERExtractor]:   0%|          | 0/43 [00:00<?, ?it/s]

Property 'summary_embedding' already exists in node '08e466'. Skipping!
Property 'summary_embedding' already exists in node 'c5cce0'. Skipping!
Property 'summary_embedding' already exists in node '22aa9a'. Skipping!
Property 'summary_embedding' already exists in node '14b369'. Skipping!
Property 'summary_embedding' already exists in node '890943'. Skipping!
Property 'summary_embedding' already exists in node '48b9ef'. Skipping!
Property 'summary_embedding' already exists in node 'a973f8'. Skipping!
Property 'summary_embedding' already exists in node '50771b'. Skipping!
Property 'summary_embedding' already exists in node 'ff53fd'. Skipping!
Property 'summary_embedding' already exists in node 'f726ed'. Skipping!
Property 'summary_embedding' already exists in node '3455a6'. Skipping!
Property 'summary_embedding' already exists in node '2fcdaf'. Skipping!
Property 'summary_embedding' already exists in node '9672d9'. Skipping!
Property 'summary_embedding' already exists in node 'e686e9'. Sk

Applying [CosineSimilarityBuilder, OverlapScoreBuilder]:   0%|          | 0/2 [00:00<?, ?it/s]

CPU times: user 1.73 s, sys: 197 ms, total: 1.92 s
Wall time: 2min 47s


KnowledgeGraph(nodes: 40, relationships: 480)

In [62]:
%%time
kg.save("loan_data_kg.json")
loan_data_kg = KnowledgeGraph.load("loan_data_kg.json")
loan_data_kg

CPU times: user 5.84 s, sys: 752 ms, total: 6.59 s
Wall time: 19.7 s


KnowledgeGraph(nodes: 40, relationships: 480)

In [63]:
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(llm=generator_llm, embedding_model=embedding_model, knowledge_graph=loan_data_kg)

In [64]:
from ragas.testset.synthesizers import (
    SingleHopSpecificQuerySynthesizer,
    MultiHopAbstractQuerySynthesizer,
    MultiHopSpecificQuerySynthesizer,
)

query_distribution = [
    (SingleHopSpecificQuerySynthesizer(llm=generator_llm), 0.5),
    (MultiHopAbstractQuerySynthesizer(llm=generator_llm), 0.25),
    (MultiHopSpecificQuerySynthesizer(llm=generator_llm), 0.25),
]

In [65]:
%%time
testset = generator.generate(testset_size=10, query_distribution=query_distribution)
testset.to_pandas()

Generating personas:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Scenarios:   0%|          | 0/3 [00:00<?, ?it/s]

Generating Samples:   0%|          | 0/11 [00:00<?, ?it/s]

CPU times: user 3.65 s, sys: 57.4 ms, total: 3.7 s
Wall time: 42.9 s


| | user_input | reference_contexts | reference | synthesizer_name |
|---|---|---|---|---|
| 0 | What is the significance of Title IV in relati... | `[Chapter 1 Academic Years, Academic Calendars,...` | For Title IV purposes, the academic year is de... | single_hop_specifc_query_synthesizer |
| 1 | What does 34 CFR 668.3(a) refer to in the cont... | `[Regulatory Citations Academic year minimums: ...` | Regulatory Citations Academic year minimums: 3... | single_hop_specifc_query_synthesizer |
| 2 | Chapter 3 include clinical work? | `[Inclusion of Clinical Work in a Standard Term...` | Inclusion of Clinical Work in a Standard Term ... | single_hop_specifc_query_synthesizer |
| 3 | What are Non-Term Characteristics and how do t... | `[Non-Term Characteristics A program that measu...` | Non-Term Characteristics refer to programs tha... | single_hop_specifc_query_synthesizer |
| 4 | What information does Volume 7 provide regardi... | `[both the credit or clock hours and the weeks ...` | Volume 7 explains that the amount of Pell Gran... | single_hop_specifc_query_synthesizer |
| 5 | so like if the academic year minimums are 34 C... | `[<1-hop>\n\nRegulatory Citations Academic year...` | The regulatory citations specify that the acad... | multi_hop_abstract_query_synthesizer |
| 6 | How inclusion of clinical work in standard ter... | `[<1-hop>\n\nInclusion of Clinical Work in a St...` | Inclusion of clinical work in standard term pe... | multi_hop_abstract_query_synthesizer |
| 7 | How do disbursement timing requirements differ... | `[<1-hop>\n\nboth the credit or clock hours and...` | Disbursement timing for Pell Grant funds depen... | multi_hop_abstract_query_synthesizer |
| 8 | Considering the detailed requirements for disb... | `[<1-hop>\n\nboth the credit or clock hours and...` | The disbursement timing for Pell Grants and Di... | multi_hop_specific_query_synthesizer |
| 9 | How do the guidelines in Volume 8 regarding cl... | `[<1-hop>\n\nInclusion of Clinical Work in a St...` | Volume 8 provides guidance on including clinic... | multi_hop_specific_query_synthesizer |


In [66]:
from langsmith import Client

langsmith_client = Client()

dataset_name = "Loan Synthetic Data (s09)"

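# Reuse the dataset if it already exists: calling create_dataset with an
# existing name raises LangSmithConflictError (HTTP 409), which would leave
# `langsmith_dataset` undefined for the next cell.
# Note: the next cell appends examples, so skip it when reusing a populated dataset.
if langsmith_client.has_dataset(dataset_name=dataset_name):
    langsmith_dataset = langsmith_client.read_dataset(dataset_name=dataset_name)
else:
    langsmith_dataset = langsmith_client.create_dataset(
        dataset_name=dataset_name,
        description="Loan Synthetic Data (for s09 exercise)"
    )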

In [None]:
# Upload each synthetic test row to the LangSmith dataset as one example.
for _, row in testset.to_pandas().iterrows():
    langsmith_client.create_example(
        inputs={"question": row["user_input"]},
        outputs={"answer": row["reference"]},
        metadata={"context": row["reference_contexts"]},
        dataset_id=langsmith_dataset.id,
    )

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 50
)

rag_documents = text_splitter.split_documents(docs)

In [None]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

In [None]:
from langchain_community.vectorstores import Qdrant

vectorstore = Qdrant.from_documents(
    documents=rag_documents,
    embedding=embeddings,
    location=":memory:",
    collection_name="Loan RAG"
)

In [None]:
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

In [None]:
from langchain.prompts import ChatPromptTemplate

RAG_PROMPT = """\
Given a provided context and question, you must answer the question based only on context.

If you cannot answer the question based on the context - you must say "I don't know".

Context: {context}
Question: {question}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4.1-mini")

In [None]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain.schema import StrOutputParser

rag_chain = (
    {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
    | rag_prompt | llm | StrOutputParser()
)

In [None]:
rag_chain.invoke({"question" : "What kinds of loans are available?"})

## LangSmith Evaluation Set-up

In [None]:
eval_llm = ChatOpenAI(model="gpt-4.1")

In [None]:
from langsmith.evaluation import LangChainStringEvaluator, evaluate

qa_evaluator = LangChainStringEvaluator("qa", config={"llm" : eval_llm})

labeled_helpfulness_evaluator = LangChainStringEvaluator(
    "labeled_criteria",
    config={
        "criteria": {
            "helpfulness": (
                "Is this submission helpful to the user,"
                " taking into account the correct reference answer?"
            )
        },
        "llm" : eval_llm
    },
    prepare_data=lambda run, example: {
        "prediction": run.outputs["output"],
        "reference": example.outputs["answer"],
        "input": example.inputs["question"],
    }
)

empathy_evaluator = LangChainStringEvaluator(
    "criteria",
    config={
        "criteria": {
            "empathy": "Is this response empathetic? Does it make the user feel like they are being heard?",
        },
        "llm" : eval_llm
    }
)

## LangSmith Evaluation

In [None]:
# evaluate(
#     rag_chain.invoke,
#     data=dataset_name,
#     evaluators=[
#         qa_evaluator,
#         labeled_helpfulness_evaluator,
#         empathy_evaluator
#     ],
#     metadata={"revision_id": "default_chain_init"},
# )

## Dope-ifying Our Application

In [None]:
EMPATHY_RAG_PROMPT = """\
Given a provided context and question, you must answer the question based only on context.

If you cannot answer the question based on the context - you must say "I don't know".

You must answer the question using empathy and kindness, and make sure the user feels heard.

Context: {context}
Question: {question}
"""

empathy_rag_prompt = ChatPromptTemplate.from_template(EMPATHY_RAG_PROMPT)

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 50
)

rag_documents = text_splitter.split_documents(docs)

In [None]:
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

In [None]:
vectorstore = Qdrant.from_documents(
    documents=rag_documents,
    embedding=embeddings,
    location=":memory:",
    collection_name="Loan Data for RAG"
)

In [None]:
retriever = vectorstore.as_retriever()

In [None]:
empathy_rag_chain = (
    {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
    | empathy_rag_prompt | llm | StrOutputParser()
)

In [None]:
# empathy_rag_chain.invoke({"question" : "What kinds of loans are available?"})

In [None]:
# evaluate(
#     empathy_rag_chain.invoke,
#     data=dataset_name,
#     evaluators=[
#         qa_evaluator,
#         labeled_helpfulness_evaluator,
#         empathy_evaluator
#     ],
#     metadata={"revision_id": "empathy_rag_chain"},
# )

### Retriever Evaluation

#### Naive Retrieval Chain

In [None]:
from tqdm.notebook import tqdm

In [None]:
retriever_chains_list = {
    "naive_retrieval_chain" : naive_retrieval_chain,
    "bm25_retrieval_chain": bm25_retrieval_chain,
    "contextual_compression_retrieval_chain": contextual_compression_retrieval_chain,
    "multi_query_retrieval_chain": multi_query_retrieval_chain,
    "parent_document_retrieval_chain": parent_document_retrieval_chain,
    "ensemble_retrieval_chain": ensemble_retrieval_chain,
    "semantic_retrieval_chain": semantic_retrieval_chain
}

retriever_eval_progress_bar = tqdm(retriever_chains_list)
for retriever_chain in retriever_eval_progress_bar:
    retriever_eval_progress_bar.set_description(retriever_chain, refresh=True)
    chain_to_invoke = retriever_chains_list[retriever_chain]
    try:
        evaluate(
            chain_to_invoke.invoke,
            data=dataset_name,
            evaluators=[qa_evaluator, labeled_helpfulness_evaluator, empathy_evaluator],
            metadata={"revision_id": retriever_chain},
        )
    except Exception as ex:
        # Report the failure and move on to the next chain.
        print(f"Failed to run evaluation on {retriever_chain} due to {ex}; skipping to the next one...")

In [None]:
# try:
# # Try to list projects to see current usage
#   projects = langsmith_client.list_projects()
#   print(f"Current projects: {len(list(projects))}")

#   # Try to list datasets
#   datasets = langsmith_client.list_datasets()
#   print(f"Current datasets: {len(list(datasets))}")

# except Exception as e:
#   print(f"Error details: {e}")
#   print(f"Error type: {type(e)}")

In [None]:
import logging
import requests

# Configure logging for requests (switch INFO to DEBUG for verbose output)
logging.basicConfig(level=logging.INFO)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.INFO)
requests_log.propagate = True

In [None]:
# import requests
# import os

# headers = {
#   "X-API-Key": os.environ["LANGCHAIN_API_KEY"],
#   "Content-Type": "application/json"
# }

# from pprint import pprint
# try:
#   response = requests.get(
#       "https://api.smith.langchain.com/datasets",
#       headers=headers,
#       timeout=30
#   )
#   print(f"Status code: {response.status_code}")
#   pprint(f"Response: {response.text}")
# except Exception as e:
#   print(f"Direct API error: {e}")

In [None]:
# from langsmith import Client

# client = Client()
# try:
#   # Make any API call and check the response
#   datasets = list(client.list_datasets(limit=1))
# except Exception as e:
#   # Look for rate limit headers in the error
#   if hasattr(e, 'response') and e.response:
#       print(f"Headers: {e.response.headers}")
#       print(f"Status: {e.response.status_code}")
#       print(f"Body: {e.response.text}")