# Testing Different Retrievers in RAG System

This notebook demonstrates how different retrievers perform on the same knowledge base (rag_test.txt) and compares their strengths and weaknesses.

## What We'll Compare

1. **Similarity Search** - Basic vector similarity
2. **Maximum Marginal Relevance (MMR)** - Diverse results
3. **Similarity Score Threshold** - Quality filtering
4. **BM25 Retriever** - Keyword-based search
5. **Ensemble Retriever** - Hybrid approach
6. **Multi-Query Retriever** - LLM-based query expansion
7. **Contextual Compression** - Token-optimized retrieval

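Each retriever is run on the same queries from rag_test.txt to see which performs best in which scenario. Contextual Compression is created in Step 4 but left out of the test runs by default, because it makes an extra LLM call for every retrieved chunk.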

---

## Step 1: Setup and Imports

In [1]:
# ============================================================================
# STEP 1: IMPORTS & SETUP
# ============================================================================

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_community.retrievers import BM25Retriever
# LangChain v1: the legacy retrievers and chains are imported from langchain_classic
from langchain_classic.retrievers import (
    EnsembleRetriever,
    MultiQueryRetriever,
)
from langchain_classic.retrievers.contextual_compression import ContextualCompressionRetriever
from langchain_classic.retrievers.document_compressors import LLMChainExtractor
from langchain_core.prompts import ChatPromptTemplate
from langchain_classic.chains.combine_documents import create_stuff_documents_chain
from langchain_classic.chains import create_retrieval_chain
import os
from dotenv import load_dotenv
import time
from typing import List
from langchain_core.documents import Document

# Load environment variables
load_dotenv()

print("✓ All imports successful")

✓ All imports successful


---

## Step 2: Load and Prepare Documents

In [2]:
# ============================================================================
# STEP 2: LOAD DOCUMENTS FROM rag_test.txt
# ============================================================================

print("\n" + "="*70)
print("LOADING DOCUMENTS FROM rag_test.txt")
print("="*70)

# Load the knowledge base
file_path = "rag_test.txt"
loader = TextLoader(file_path)
documents = loader.load()

print(f"✓ Loaded {len(documents)} document(s)")
print(f"✓ Total content length: {len(documents[0].page_content)} characters")

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""]
)

chunks = text_splitter.split_documents(documents)

print(f"✓ Split into {len(chunks)} chunks")
print(f"✓ Average chunk size: {sum(len(c.page_content) for c in chunks) / len(chunks):.0f} characters")


LOADING DOCUMENTS FROM rag_test.txt
✓ Loaded 1 document(s)
✓ Total content length: 9061 characters
✓ Split into 12 chunks
✓ Average chunk size: 835 characters
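`RecursiveCharacterTextSplitter` tries the separators in order (`\n\n`, then `\n`, then a space, then single characters), packs the pieces into chunks of at most `chunk_size` characters, and carries roughly `chunk_overlap` characters from the end of one chunk into the start of the next, so text near a boundary appears in both neighboring chunks. Because it cuts at separators, chunks usually come out shorter than the limit (835 characters on average here). The sketch below is a deliberately simplified fixed-width splitter (a hypothetical `sliding_window_chunks` that ignores separators) that isolates the overlap arithmetic; it is not LangChain's algorithm.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Fixed-size character windows; each starts chunk_size - chunk_overlap after the previous one.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter, it never
    looks for paragraph, line, or word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return windows


demo_text = "".join(str(i % 10) for i in range(2500))  # 2,500 dummy characters
demo_chunks = sliding_window_chunks(demo_text)
print([len(c) for c in demo_chunks])  # [1000, 1000, 900]
print(demo_chunks[0][-200:] == demo_chunks[1][:200])  # True: 200 shared characters
```

The names `demo_text` and `demo_chunks` are chosen so that running the sketch does not overwrite the notebook's real `chunks` list.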


---

## Step 3: Create Embeddings and Vector Store

In [3]:
# ============================================================================
# STEP 3: CREATE EMBEDDINGS & VECTOR STORE
# ============================================================================

print("\n" + "="*70)
print("CREATING EMBEDDINGS & VECTOR STORE")
print("="*70)

# Create embeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY")
)

print("✓ Creating vector store (this may take a moment)...")

# Create FAISS vector store
vector_store = FAISS.from_documents(
    documents=chunks,
    embedding=embeddings
)

print(f"✓ Vector store created with {len(chunks)} embedded chunks")
print(f"✓ Embedding dimension: {vector_store.index.d} (text-embedding-3-small)")  # read from the FAISS index


CREATING EMBEDDINGS & VECTOR STORE
✓ Creating vector store (this may take a moment)...
✓ Vector store created with 12 embedded chunks
✓ Embedding dimension: 1536 (text-embedding-3-small)


---

## Step 4: Define All Retrievers

In [4]:
# ============================================================================
# STEP 4: CREATE ALL RETRIEVERS
# ============================================================================

print("\n" + "="*70)
print("CREATING DIFFERENT RETRIEVERS")
print("="*70)

# ─────────────────────────────────────────────────────────────────
# 1. SIMILARITY SEARCH RETRIEVER
# ─────────────────────────────────────────────────────────────────

retriever_similarity = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4}
)

print("\n[1] Similarity Search Retriever")
print("    ├─ Method: Vector cosine similarity")
print("    ├─ Returns: Top 4 most similar chunks")
print("    └─ Use case: General RAG, prototyping")

# ─────────────────────────────────────────────────────────────────
# 2. MAXIMUM MARGINAL RELEVANCE (MMR) RETRIEVER
# ─────────────────────────────────────────────────────────────────

retriever_mmr = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20}
)

print("\n[2] Maximum Marginal Relevance (MMR) Retriever")
print("    ├─ Method: Balances relevance + diversity")
print("    ├─ Fetches: 20 candidates, returns 4 most diverse")
print("    └─ Use case: Research, diverse results")

# ─────────────────────────────────────────────────────────────────
# 3. SIMILARITY SCORE THRESHOLD RETRIEVER
# ─────────────────────────────────────────────────────────────────

retriever_threshold = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 4, "score_threshold": 0.5}
)

print("\n[3] Similarity Score Threshold Retriever")
print("    ├─ Method: Vector similarity + quality filter")
print("    ├─ Threshold: 0.5 (only high-quality matches)")
print("    └─ Use case: Compliance, safety-critical systems")

# ─────────────────────────────────────────────────────────────────
# 4. BM25 RETRIEVER (Keyword-based)
# ─────────────────────────────────────────────────────────────────

retriever_bm25 = BM25Retriever.from_documents(chunks)
retriever_bm25.k = 4

print("\n[4] BM25 Retriever (Keyword-based)")
print("    ├─ Method: TF-IDF based keyword matching")
print("    ├─ Returns: Top 4 keyword matches")
print("    └─ Use case: Technical docs, acronyms, exact phrases")

# ─────────────────────────────────────────────────────────────────
# 5. ENSEMBLE RETRIEVER (Hybrid)
# ─────────────────────────────────────────────────────────────────

retriever_ensemble = EnsembleRetriever(
    retrievers=[retriever_bm25, retriever_similarity],
    weights=[0.3, 0.7]  # relative weights on each retriever's reciprocal-rank scores (BM25, vector)
)

print("\n[5] Ensemble Retriever (Hybrid)")
print("    ├─ Method: Combines BM25 + Vector search")
print("    ├─ Weights: 30% keyword, 70% semantic")
print("    └─ Use case: Enterprise applications, hybrid search")

# ─────────────────────────────────────────────────────────────────
# 6. MULTI-QUERY RETRIEVER (LLM-based query expansion)
# ─────────────────────────────────────────────────────────────────

llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    api_key=os.getenv("OPENAI_API_KEY")
)

retriever_multiquery = MultiQueryRetriever.from_llm(
    retriever=retriever_similarity,
    llm=llm
)

print("\n[6] Multi-Query Retriever (LLM-based)")
print("    ├─ Method: LLM generates multiple query variations")
print("    ├─ Improves: Recall by capturing different angles")
print("    └─ Use case: Complex questions, Q&A systems")

# ─────────────────────────────────────────────────────────────────
# 7. CONTEXTUAL COMPRESSION RETRIEVER (Token-optimized)
# ─────────────────────────────────────────────────────────────────

compressor = LLMChainExtractor.from_llm(llm)
retriever_compression = ContextualCompressionRetriever(
    base_retriever=retriever_similarity,
    base_compressor=compressor
)

print("\n[7] Contextual Compression Retriever")
print("    ├─ Method: LLM extracts the query-relevant parts of each chunk")
print("    ├─ Benefit: Smaller context for the answering LLM (costs one extra LLM call per chunk)")
print("    └─ Use case: Token-limited context windows")

print("\n✓ All 7 retrievers created successfully!")


CREATING DIFFERENT RETRIEVERS

[1] Similarity Search Retriever
    ├─ Method: Vector cosine similarity
    ├─ Returns: Top 4 most similar chunks
    └─ Use case: General RAG, prototyping

[2] Maximum Marginal Relevance (MMR) Retriever
    ├─ Method: Balances relevance + diversity
    ├─ Fetches: 20 candidates, returns 4 most diverse
    └─ Use case: Research, diverse results

[3] Similarity Score Threshold Retriever
    ├─ Method: Vector similarity + quality filter
    ├─ Threshold: 0.5 (only high-quality matches)
    └─ Use case: Compliance, safety-critical systems

[4] BM25 Retriever (Keyword-based)
    ├─ Method: TF-IDF based keyword matching
    ├─ Returns: Top 4 keyword matches
    └─ Use case: Technical docs, acronyms, exact phrases

[5] Ensemble Retriever (Hybrid)
    ├─ Method: Combines BM25 + Vector search
    ├─ Weights: 30% keyword, 70% semantic
    └─ Use case: Enterprise applications, hybrid search

[6] Multi-Query Retriever (LLM-based)
    ├─ Method: LLM generates multip
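
The Ensemble setting `weights=[0.3, 0.7]` does not mean that 30% of the returned chunks come from BM25. As far as we understand LangChain's `EnsembleRetriever`, it fuses the ranked lists with weighted Reciprocal Rank Fusion: every list a document appears in adds `weight / (rank + c)` to its score (ranks start at 1, `c = 60` by default), duplicates are merged, and the union is re-sorted by fused score. This is also why the Ensemble chain in Step 10 received 7 chunks instead of 4. The pure-Python sketch below illustrates the idea on made-up chunk IDs; the function name and the IDs are hypothetical, and it is not LangChain's own code.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse ranked lists of doc IDs with weighted Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            # Each appearance contributes weight / (rank + c); repeats accumulate.
            scores[doc_id] += weight / (rank + c)
    # Union of all IDs, highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical top-4 results from each retriever for one query.
bm25_hits = ["B1", "B17", "A5", "B5"]
vector_hits = ["A5", "A2", "A13", "B17"]

fused = weighted_rrf([bm25_hits, vector_hits], weights=[0.3, 0.7])
print(fused)       # ['A5', 'B17', 'A2', 'A13', 'B1', 'B5']
print(len(fused))  # 6 unique chunks from 4 + 4 results
```

Chunks that both retrievers agree on (A5 and B17 here) rise to the top, which is the main payoff of hybrid search; the weights only tilt how much each list's ranks count.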

---

## Step 5: Create Helper Function to Test Retrievers

In [5]:
# ============================================================================
# STEP 5: HELPER FUNCTION TO TEST RETRIEVERS
# ============================================================================

def test_retriever(name: str, retriever, query: str) -> dict:
    """
    Test a retriever with a query and return results with metrics.
    
    Args:
        name: Name of the retriever
        retriever: The retriever instance
        query: The query to test
    
    Returns:
        Dictionary with results and metrics
    """
    start_time = time.perf_counter()  # monotonic, high-resolution timer for latency
    
    try:
        results = retriever.invoke(query)
        elapsed_time = time.perf_counter() - start_time
        
        return {
            "name": name,
            "success": True,
            "num_results": len(results),
            "time_ms": elapsed_time * 1000,
            "results": results,
            "error": None
        }
    except Exception as e:
        elapsed_time = time.perf_counter() - start_time
        return {
            "name": name,
            "success": False,
            "num_results": 0,
            "time_ms": elapsed_time * 1000,
            "results": [],
            "error": str(e)
        }


def display_retriever_results(test_result: dict):
    """
    Display results from a single retriever test.
    """
    print(f"\n{'─'*70}")
    print(f"Retriever: {test_result['name']}")
    print(f"{'─'*70}")
    
    if test_result['success']:
        print(f"✓ Status: SUCCESS")
        print(f"✓ Results found: {test_result['num_results']}")
        print(f"✓ Response time: {test_result['time_ms']:.2f}ms")
        print(f"\nTop Retrieved Chunks:")
        
        for i, doc in enumerate(test_result['results'], 1):
            print(f"\n  [{i}] Content Preview:")
            content = doc.page_content[:200]
            print(f"      {content}...")
    else:
        print(f"✗ Status: FAILED")
        print(f"✗ Error: {test_result['error']}")


def compare_retrievers_table(test_results: List[dict]):
    """
    Display comparison table of all retrievers.
    """
    from tabulate import tabulate
    
    table_data = []
    for result in test_results:
        status = "✓ Success" if result['success'] else "✗ Failed"
        table_data.append([
            result['name'],
            status,
            result['num_results'],
            f"{result['time_ms']:.2f}ms"
        ])
    
    print("\n" + "═"*70)
    print("RETRIEVER COMPARISON TABLE")
    print("═"*70)
    print(tabulate(
        table_data,
        headers=["Retriever Name", "Status", "Results Found", "Response Time"],
        tablefmt="grid",
        stralign="center"
    ))
    print("═"*70)

print("✓ Helper functions defined successfully")

✓ Helper functions defined successfully
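`test_retriever` records how many chunks came back and how fast, but not whether the right chunk was among them. A cheap proxy is to check how many key terms from each test query's `expected` field (defined in Step 6) appear in the retrieved text. The helper below is a hypothetical addition: `expected_terms_found` and its naive tokenization are our own assumptions, not part of LangChain. It works on plain strings, so the retrieved documents' `page_content` values can be passed in directly.

```python
import re


def expected_terms_found(chunk_texts, expected, min_len=3):
    """Fraction of key terms from `expected` that occur in the retrieved text.

    A rough lexical proxy for retrieval quality, not a replacement for
    human or LLM grading.
    """
    haystack = " ".join(chunk_texts).lower()
    # Naive tokenization: runs of letters, digits, '@', '.', '-'; trim edge punctuation.
    tokens = (t.strip(".-") for t in re.findall(r"[a-z0-9@.\-]+", expected.lower()))
    terms = [t for t in tokens if len(t) >= min_len]
    if not terms:
        return 0.0
    hits = sum(1 for t in terms if t in haystack)
    return hits / len(terms)


# Hand-written example chunks (not real retriever output).
toy_chunks = [
    "KYC Tier 2 requires PAN, Aadhaar and an address proof document.",
    "Refunds for card transactions take 5-7 business days.",
]
print(expected_terms_found(toy_chunks, "PAN + Aadhaar + address proof"))  # 1.0
print(expected_terms_found(toy_chunks, "support@lemobank.example"))       # 0.0
```

Inside the Step 6 loop, a call such as `expected_terms_found([d.page_content for d in result["results"]], query_info["expected"])` would add a per-retriever score. Because matching is by substring, short terms like "pan" can also match inside unrelated words, so treat the number as a hint rather than a verdict.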


---

## Step 6: Test All Retrievers with Sample Queries

In [6]:
# ============================================================================
# STEP 6: TEST ALL RETRIEVERS WITH MULTIPLE QUERIES
# ============================================================================

# Define all retrievers
all_retrievers = [
    ("Similarity Search", retriever_similarity),
    ("MMR (Diverse Results)", retriever_mmr),
    ("Similarity with Threshold", retriever_threshold),
    ("BM25 (Keyword-based)", retriever_bm25),
    ("Ensemble (Hybrid)", retriever_ensemble),
    ("Multi-Query (LLM-based)", retriever_multiquery),
    # ("Contextual Compression", retriever_compression),  # Uncomment to include it (slower: one extra LLM call per retrieved chunk)
]

# Define test queries from rag_test.txt
test_queries = [
    {
        "query": "What is LemoBank's support email?",
        "type": "Simple Lookup",
        "expected": "support@lemobank.example"
    },
    {
        "query": "What is the latest LemoCard annual fee?",
        "type": "Policy Comparison",
        "expected": "₹1,499 (effective 2025-11-15)"
    },
    {
        "query": "What is the KYC Tier 2 requirement?",
        "type": "Policy Details",
        "expected": "PAN + Aadhaar + address proof"
    },
    {
        "query": "Explain how monthly average balance is computed",
        "type": "Complex Explanation",
        "expected": "sum of end-of-day balances / number of days"
    },
    {
        "query": "What bonus interest rate applies?",
        "type": "Conditional Logic",
        "expected": "+1% bonus if threshold met"
    },
]

print("\n" + "#"*70)
print("# TESTING ALL RETRIEVERS WITH MULTIPLE QUERIES")
print("#"*70)

# Test each query with all retrievers
for query_info in test_queries:
    query = query_info["query"]
    query_type = query_info["type"]
    
    print(f"\n\n" + "#"*70)
    print(f"# QUERY: {query}")
    print(f"# Type: {query_type}")
    print(f"#"*70)
    
    results = []
    
    for retriever_name, retriever in all_retrievers:
        result = test_retriever(retriever_name, retriever, query)
        results.append(result)
        display_retriever_results(result)
    
    # Show comparison table
    compare_retrievers_table(results)


######################################################################
# TESTING ALL RETRIEVERS WITH MULTIPLE QUERIES
######################################################################


######################################################################
# QUERY: What is LemoBank's support email?
# Type: Simple Lookup
######################################################################

──────────────────────────────────────────────────────────────────────
Retriever: Similarity Search
──────────────────────────────────────────────────────────────────────
✓ Status: SUCCESS
✓ Results found: 4
✓ Response time: 457.20ms

Top Retrieved Chunks:

  [1] Content Preview:
      [A5] Refund Policy
- Card transaction refunds:
  - “Initiated” to “Completed” typically takes 5–7 business days.
  - If amount > ₹25,000, manual review may extend by 2 business days.
- Wallet transfer...

  [2] Content Preview:
      [A2] Product Overview
LemoBank has three products:
1) LemoCard (credit card)
  

No relevant docs were retrieved using the relevance score threshold 0.5



──────────────────────────────────────────────────────────────────────
Retriever: Similarity with Threshold
──────────────────────────────────────────────────────────────────────
✓ Status: SUCCESS
✓ Results found: 0
✓ Response time: 445.59ms

Top Retrieved Chunks:

──────────────────────────────────────────────────────────────────────
Retriever: BM25 (Keyword-based)
──────────────────────────────────────────────────────────────────────
✓ Status: SUCCESS
✓ Results found: 4
✓ Response time: 1.22ms

Top Retrieved Chunks:

  [1] Content Preview:
      ----------------------------------------------------------------------
SECTION B — TEST QUERIES (run these against your RAG)
----------------------------------------------------------------------

[B1...

  [2] Content Preview:
      [B17] Math sanity check
Q17: If base interest is 4.5% p.a., what is the annual rate as a decimal?
Expected: 0.045.

[B18] “Cite your sources”
Q18: What is the latest LemoVault base interest? Please qu...

  [3] 

No relevant docs were retrieved using the relevance score threshold 0.5



──────────────────────────────────────────────────────────────────────
Retriever: MMR (Diverse Results)
──────────────────────────────────────────────────────────────────────
✓ Status: SUCCESS
✓ Results found: 4
✓ Response time: 414.06ms

Top Retrieved Chunks:

  [1] Content Preview:
      [B17] Math sanity check
Q17: If base interest is 4.5% p.a., what is the annual rate as a decimal?
Expected: 0.045.

[B18] “Cite your sources”
Q18: What is the latest LemoVault base interest? Please qu...

  [2] Content Preview:
      [A5] Refund Policy
- Card transaction refunds:
  - “Initiated” to “Completed” typically takes 5–7 business days.
  - If amount > ₹25,000, manual review may extend by 2 business days.
- Wallet transfer...

  [3] Content Preview:
      [A15] Contradiction Pair (to test conflict handling)
Doc X: “CEO is Asha Raman.”
Doc Y: “CEO is Asha Raman.” (duplicate)
Doc Z: “CEO is Ashwin Raman.” (this is incorrect, keep it as noise)

[A16] Prom...

  [4] Content Preview:
      RAG TE

No relevant docs were retrieved using the relevance score threshold 0.5



──────────────────────────────────────────────────────────────────────
Retriever: Similarity with Threshold
──────────────────────────────────────────────────────────────────────
✓ Status: SUCCESS
✓ Results found: 0
✓ Response time: 284.32ms

Top Retrieved Chunks:

──────────────────────────────────────────────────────────────────────
Retriever: BM25 (Keyword-based)
──────────────────────────────────────────────────────────────────────
✓ Status: SUCCESS
✓ Results found: 4
✓ Response time: 0.23ms

Top Retrieved Chunks:

  [1] Content Preview:
      ----------------------------------------------------------------------
SECTION B — TEST QUERIES (run these against your RAG)
----------------------------------------------------------------------

[B1...

  [2] Content Preview:
      [B5] Multi-hop (definition + threshold)
Q5: Explain how monthly average balance is computed and why it matters for bonus interest.
Expected: formula from [A13] + tie to bonus threshold.

[B6] Table lo...

  [3] 

No relevant docs were retrieved using the relevance score threshold 0.5



──────────────────────────────────────────────────────────────────────
Retriever: Similarity with Threshold
──────────────────────────────────────────────────────────────────────
✓ Status: SUCCESS
✓ Results found: 0
✓ Response time: 430.06ms

Top Retrieved Chunks:

──────────────────────────────────────────────────────────────────────
Retriever: BM25 (Keyword-based)
──────────────────────────────────────────────────────────────────────
✓ Status: SUCCESS
✓ Results found: 4
✓ Response time: 0.23ms

Top Retrieved Chunks:

  [1] Content Preview:
      ----------------------------------------------------------------------
SECTION B — TEST QUERIES (run these against your RAG)
----------------------------------------------------------------------

[B1...

  [2] Content Preview:
      [B5] Multi-hop (definition + threshold)
Q5: Explain how monthly average balance is computed and why it matters for bonus interest.
Expected: formula from [A13] + tie to bonus threshold.

[B6] Table lo...

  [3] 
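
Every visible run of the Similarity with Threshold retriever above returned 0 chunks, even for questions that rag_test.txt clearly answers. A plausible explanation, assuming LangChain's FAISS defaults: the index ranks by squared Euclidean (L2) distance rather than cosine similarity, and LangChain converts that distance into a relevance score with `1 - distance / sqrt(2)`. For unit-length vectors such as OpenAI embeddings, squared L2 distance equals `2 - 2·cos`, so a 0.5 relevance threshold demands a cosine similarity of roughly 0.65, which many genuinely relevant chunks do not reach. The arithmetic below sketches that mapping; the function names are hypothetical, and the score formula is our assumption about the library's defaults rather than something this notebook verifies.

```python
import math


def relevance_from_cosine(cos_sim):
    """Relevance score for unit vectors, ASSUMING squared-L2 distance and 1 - d / sqrt(2)."""
    squared_l2 = 2.0 - 2.0 * cos_sim          # ||a - b||^2 when ||a|| = ||b|| = 1
    return 1.0 - squared_l2 / math.sqrt(2.0)  # assumed FAISS Euclidean relevance function


def cosine_needed(threshold):
    """Invert the mapping: the minimum cosine similarity that clears `threshold`."""
    return 1.0 - (1.0 - threshold) * math.sqrt(2.0) / 2.0


for cos in (0.3, 0.5, 0.65, 0.8):
    print(f"cos={cos:.2f} -> relevance={relevance_from_cosine(cos):.3f}")
print(f"cosine needed for a 0.5 threshold: {cosine_needed(0.5):.3f}")
```

To check this against the real index, `vector_store.similarity_search_with_relevance_scores(query, k=4)` (the call behind the warning above) returns `(document, score)` pairs that you can inspect before choosing a `score_threshold`.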

---

## Step 7: Detailed Analysis and Recommendations

In [7]:
# ============================================================================
# STEP 7: ANALYSIS AND RECOMMENDATIONS
# ============================================================================

print("\n" + "="*70)
print("RETRIEVER ANALYSIS & RECOMMENDATIONS")
print("="*70)

analysis = """
╔══════════════════════════════════════════════════════════════════════╗
║                   RETRIEVER SELECTION GUIDE                         ║
╚══════════════════════════════════════════════════════════════════════╝

1. SIMILARITY SEARCH
   ✓ Best for: MVP, prototyping, general use
   ✓ Pros: Fast, simple, works well in most cases
   ✗ Cons: May miss synonym variations
   → Recommendation: Use this as your baseline/default

2. MMR (Maximum Marginal Relevance)
   ✓ Best for: Research, exploring multiple perspectives
   ✓ Pros: Returns diverse, non-redundant results
   ✗ Cons: Slightly slower, may not prioritize relevance
   → Recommendation: Use when you need varied viewpoints

3. SIMILARITY WITH SCORE THRESHOLD
   ✓ Best for: Safety-critical, compliance systems
   ✓ Pros: Quality guarantee, prevents low-relevance matches
   ✗ Cons: May return 0 results if threshold too high
   → Recommendation: Use in regulated industries (finance, healthcare)

4. BM25 (Keyword-based)
   ✓ Best for: Technical docs, acronyms, exact phrases
   ✓ Pros: Fast (no embeddings needed), great for keywords
   ✗ Cons: Doesn't understand semantics, misses synonyms
   → Recommendation: Use for code search, technical documentation

5. ENSEMBLE (Hybrid)
   ✓ Best for: Enterprise applications, advanced search
   ✓ Pros: Best of both worlds - keywords + semantics
   ✗ Cons: Slightly slower (calls both retrievers)
   → Recommendation: Use for production enterprise systems

6. MULTI-QUERY (LLM-based)
   ✓ Best for: Complex questions, Q&A systems
   ✓ Pros: Better understanding of intent, improved recall
   ✗ Cons: Slower (LLM calls), higher cost
   → Recommendation: Use when recall is more important than speed

7. CONTEXTUAL COMPRESSION
   ✓ Best for: Token-limited applications
   ✓ Pros: Shrinks the context passed to the answering LLM
   ✗ Cons: Slowest and most expensive (one extra LLM call per retrieved chunk)
   → Recommendation: Use when context-window size, not API cost, is the constraint

╔══════════════════════════════════════════════════════════════════════╗
║                   PERFORMANCE SUMMARY                               ║
╚══════════════════════════════════════════════════════════════════════╝

Speed (fastest → slowest):
  1. BM25 (no embeddings needed)
  2. Similarity Search
  3. Threshold Search (same search, plus a score filter)
  4. MMR
  5. Ensemble
  6. Multi-Query (LLM calls)
  7. Contextual Compression (slowest)

Accuracy (best recall → worst):
  1. Ensemble (combines keyword + semantic)
  2. Multi-Query (LLM understands intent)
  3. Similarity Search
  4. MMR
  5. Contextual Compression
  6. BM25 (keyword only)
  7. Threshold (may miss valid results)

Cost (lowest → highest):
  1. BM25 (free, no embeddings)
  2. Similarity Search, MMR, Threshold Search, Ensemble (one query embedding each)
  3. Multi-Query (one LLM call, plus an embedding per generated query)
  4. Contextual Compression (one extra LLM call per retrieved chunk; most expensive)

╔══════════════════════════════════════════════════════════════════════╗
║                   QUICK DECISION MATRIX                             ║
╚══════════════════════════════════════════════════════════════════════╝

Question: "What should I use?"

IF speed is critical
  → Use: Similarity Search or BM25

IF cost is critical
  → Use: Similarity Search (cheapest with embeddings)

IF accuracy is critical
  → Use: Ensemble or Multi-Query

IF safety/compliance is critical
  → Use: Similarity with Score Threshold

IF you need diverse results
  → Use: MMR

IF searching technical documents
  → Use: Ensemble (BM25 + Similarity)

IF unsure
  → Start with: Similarity Search, then optimize
"""

print(analysis)


RETRIEVER ANALYSIS & RECOMMENDATIONS

╔══════════════════════════════════════════════════════════════════════╗
║                   RETRIEVER SELECTION GUIDE                         ║
╚══════════════════════════════════════════════════════════════════════╝

1. SIMILARITY SEARCH
   ✓ Best for: MVP, prototyping, general use
   ✓ Pros: Fast, simple, works well in most cases
   ✗ Cons: May miss synonym variations
   → Recommendation: Use this as your baseline/default

2. MMR (Maximum Marginal Relevance)
   ✓ Best for: Research, exploring multiple perspectives
   ✓ Pros: Returns diverse, non-redundant results
   ✗ Cons: Slightly slower, may not prioritize relevance
   → Recommendation: Use when you need varied viewpoints

3. SIMILARITY WITH SCORE THRESHOLD
   ✓ Best for: Safety-critical, compliance systems
   ✓ Pros: Quality guarantee, prevents low-relevance matches
   ✗ Cons: May return 0 results if threshold too high
   → Recommendation: Use in regulated industries (finance, healthcare)
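
The relevance-versus-diversity trade-off behind MMR (item 2 above) works greedily: among the `fetch_k` candidates, repeatedly pick the document that maximizes `λ·sim(query, doc) − (1 − λ)·max sim(doc, already picked)`. The sketch below implements that loop with cosine similarity on invented 2-D vectors; `lambda_mult=0.5` mirrors what we believe is LangChain's default, and the helper names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Greedy Maximal Marginal Relevance; returns indices of the chosen documents."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy = similarity to the closest document already selected.
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected


toy_query = (1.0, 1.0)
toy_docs = [
    (1.0, 0.80),   # 0: most relevant
    (1.0, 0.78),   # 1: near-duplicate of 0, second most relevant
    (0.75, 1.0),   # 2: slightly less relevant, but points in a different direction
]

plain_top2 = sorted(range(len(toy_docs)), key=lambda i: cosine(toy_query, toy_docs[i]), reverse=True)[:2]
mmr_top2 = mmr(toy_query, toy_docs, k=2)
print("similarity top-2:", plain_top2)  # [0, 1]
print("MMR top-2:       ", mmr_top2)    # [0, 2]
```

Plain similarity returns the near-duplicate pair, while MMR swaps the duplicate for the slightly less relevant but different document; lowering `lambda_mult` pushes harder toward diversity.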



---

## Step 8: Compare Retrievers Side-by-Side (Custom Query)

In [8]:
# ============================================================================
# STEP 8: CUSTOM QUERY TEST
# ============================================================================
# Modify this query to test with your own question

custom_query = "Can I reverse a completed wallet transfer?"

print("\n" + "#"*70)
print(f"# CUSTOM QUERY TEST")
print(f"# Query: {custom_query}")
print("#"*70)

custom_results = []

for retriever_name, retriever in all_retrievers:
    result = test_retriever(retriever_name, retriever, custom_query)
    custom_results.append(result)
    display_retriever_results(result)

# Show comparison table
compare_retrievers_table(custom_results)


######################################################################
# CUSTOM QUERY TEST
# Query: Can I reverse a completed wallet transfer?
######################################################################

──────────────────────────────────────────────────────────────────────
Retriever: Similarity Search
──────────────────────────────────────────────────────────────────────
✓ Status: SUCCESS
✓ Results found: 4
✓ Response time: 120.76ms

Top Retrieved Chunks:

  [1] Content Preview:
      [A5] Refund Policy
- Card transaction refunds:
  - “Initiated” to “Completed” typically takes 5–7 business days.
  - If amount > ₹25,000, manual review may extend by 2 business days.
- Wallet transfer...

  [2] Content Preview:
      [B5] Multi-hop (definition + threshold)
Q5: Explain how monthly average balance is computed and why it matters for bonus interest.
Expected: formula from [A13] + tie to bonus threshold.

[B6] Table lo...

  [3] Content Preview:
      [B11] Non-reversible action
Q

  self.vectorstore.similarity_search_with_relevance_scores(
No relevant docs were retrieved using the relevance score threshold 0.5



──────────────────────────────────────────────────────────────────────
Retriever: Similarity with Threshold
──────────────────────────────────────────────────────────────────────
✓ Status: SUCCESS
✓ Results found: 0
✓ Response time: 306.69ms

Top Retrieved Chunks:

──────────────────────────────────────────────────────────────────────
Retriever: BM25 (Keyword-based)
──────────────────────────────────────────────────────────────────────
✓ Status: SUCCESS
✓ Results found: 4
✓ Response time: 0.24ms

Top Retrieved Chunks:

  [1] Content Preview:
      [B5] Multi-hop (definition + threshold)
Q5: Explain how monthly average balance is computed and why it matters for bonus interest.
Expected: formula from [A13] + tie to bonus threshold.

[B6] Table lo...

  [2] Content Preview:
      [B11] Non-reversible action
Q11: Can a completed wallet transfer be reversed?
Expected: No, wallet transfers final once completed [A5].

[B12] Contradictory docs (noise handling)
Q12: Who is the CEO?
...

  [3] 
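
For a question like the custom query above, `MultiQueryRetriever` asks the LLM for alternative phrasings (as far as we know, three by default), runs the base retriever once per phrasing, and returns the de-duplicated union of everything found. A rephrasing such as "Are wallet transfers final once completed?" can surface a chunk the literal wording misses. The sketch below shows only the merge step, with hard-coded, hypothetical phrasings and chunk IDs; in the real retriever the phrasings come from the LLM and duplicates are detected on the returned documents.

```python
def unique_union(result_lists):
    """Merge per-query result lists, keeping first-seen order and dropping duplicates."""
    seen = set()
    merged = []
    for results in result_lists:
        for chunk_id in results:
            if chunk_id not in seen:
                seen.add(chunk_id)
                merged.append(chunk_id)
    return merged


# Hypothetical top-4 chunk IDs for three phrasings of the same question.
per_query_results = {
    "Can I reverse a completed wallet transfer?": ["A5", "B5", "B11", "A2"],
    "Are wallet transfers final once completed?": ["A5", "B11", "A7", "B5"],
    "Is there a way to undo a wallet payment?": ["A5", "A7", "A9", "B11"],
}
merged = unique_union(per_query_results.values())
print(merged)  # ['A5', 'B5', 'B11', 'A2', 'A7', 'A9']
```

The union is why Multi-Query tends to improve recall, and also why it can hand the answering LLM more chunks (and tokens) than a single similarity search.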

---

## Step 9: Build a RAG Chain for Each Retriever

In [9]:
# ============================================================================
# STEP 9: BUILD A COMPLETE RAG CHAIN FOR EACH RETRIEVER
# ============================================================================

print("\n" + "="*70)
print("BUILDING COMPLETE RAG CHAINS WITH DIFFERENT RETRIEVERS")
print("="*70)

# Create LLM and prompt template
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,
    api_key=os.getenv("OPENAI_API_KEY")
)

system_prompt = """You are a helpful assistant for LemoBank customer support.
Use the provided context to answer questions accurately and cite your sources.

If the answer is not in the provided documents, say "I don't have this information in the knowledge base."
Do NOT make up information.

Context:
{context}

Question: {input}
"""

prompt = ChatPromptTemplate.from_template(system_prompt)

# Dictionary to store RAG chains
rag_chains = {}

print("\nBuilding RAG chains with different retrievers...")

for retriever_name, retriever in all_retrievers:
    try:
        document_chain = create_stuff_documents_chain(llm, prompt)
        rag_chain = create_retrieval_chain(retriever, document_chain)
        rag_chains[retriever_name] = rag_chain
        print(f"  ✓ {retriever_name}")
    except Exception as e:
        print(f"  ✗ {retriever_name}: {str(e)[:50]}")

print(f"\n✓ Created {len(rag_chains)} RAG chains successfully")


BUILDING COMPLETE RAG CHAINS WITH DIFFERENT RETRIEVERS

Building RAG chains with different retrievers...
  ✓ Similarity Search
  ✓ MMR (Diverse Results)
  ✓ Similarity with Threshold
  ✓ BM25 (Keyword-based)
  ✓ Ensemble (Hybrid)
  ✓ Multi-Query (LLM-based)

✓ Created 6 RAG chains successfully
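
`create_stuff_documents_chain` "stuffs" every retrieved chunk into the prompt's `{context}` slot. As far as we know, its default formatting renders each document as its `page_content` and joins them with a blank line, which is why chunk labels such as `[A5]` reach the model and can be cited. The pure-Python sketch below mimics that formatting with a hypothetical `stuff_context` helper and a shortened version of the notebook's prompt; it illustrates the idea and is not LangChain's code.

```python
PROMPT = (
    "You are a helpful assistant for LemoBank customer support.\n"
    "Context:\n{context}\n\nQuestion: {input}\n"
)


def stuff_context(page_contents, separator="\n\n"):
    """Concatenate retrieved chunk texts into one context string."""
    return separator.join(page_contents)


toy_chunks = [
    "[A5] Refund Policy\n- Card transaction refunds take 5-7 business days.",
    "[A2] Product Overview\nLemoBank has three products.",
]
filled = PROMPT.format(context=stuff_context(toy_chunks), input="What is the refund policy?")
print(filled)
```

With zero chunks, as with the Threshold retriever in Step 10, the context is simply an empty string, which is why that chain fell back to "I don't have this information in the knowledge base."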


---

## Step 10: Compare RAG Responses from Different Retrievers

In [10]:
# ============================================================================
# STEP 10: COMPARE RAG RESPONSES FROM DIFFERENT RETRIEVERS
# ============================================================================

test_question = "What is the refund policy for card transactions?"

print("\n" + "="*70)
print(f"COMPARING RAG RESPONSES")
print(f"Question: {test_question}")
print("="*70)

for retriever_name, rag_chain in rag_chains.items():
    start_time = time.perf_counter()  # monotonic, high-resolution timer for latency
    
    try:
        response = rag_chain.invoke({"input": test_question})
        elapsed_time = time.perf_counter() - start_time
        
        print(f"\n{'─'*70}")
        print(f"Retriever: {retriever_name}")
        print(f"Time: {elapsed_time*1000:.2f}ms")
        print(f"Sources: {len(response['context'])} chunks")
        print(f"{'─'*70}")
        print(f"\nAnswer:")
        print(f"{response['answer']}")
        
    except Exception as e:
        print(f"\n{'─'*70}")
        print(f"Retriever: {retriever_name}")
        print(f"{'─'*70}")
        print(f"✗ Error: {str(e)[:100]}")

print("\n" + "="*70)


COMPARING RAG RESPONSES
Question: What is the refund policy for card transactions?

──────────────────────────────────────────────────────────────────────
Retriever: Similarity Search
Time: 1868.90ms
Sources: 4 chunks
──────────────────────────────────────────────────────────────────────

Answer:
The refund policy for card transactions states that the time from "Initiated" to "Completed" typically takes 5–7 business days. If the amount is greater than ₹25,000, a manual review may extend this period by an additional 2 business days. [A5]

──────────────────────────────────────────────────────────────────────
Retriever: MMR (Diverse Results)
Time: 2150.04ms
Sources: 4 chunks
──────────────────────────────────────────────────────────────────────

Answer:
The refund policy for card transactions at LemoBank states that the process from "Initiated" to "Completed" typically takes 5–7 business days. If the refund amount is greater than ₹25,000, a manual review may extend the processing time b

No relevant docs were retrieved using the relevance score threshold 0.5



──────────────────────────────────────────────────────────────────────
Retriever: Similarity with Threshold
Time: 920.57ms
Sources: 0 chunks
──────────────────────────────────────────────────────────────────────

Answer:
I don't have this information in the knowledge base.

──────────────────────────────────────────────────────────────────────
Retriever: BM25 (Keyword-based)
Time: 459.23ms
Sources: 4 chunks
──────────────────────────────────────────────────────────────────────

Answer:
I don't have this information in the knowledge base.

──────────────────────────────────────────────────────────────────────
Retriever: Ensemble (Hybrid)
Time: 2494.93ms
Sources: 7 chunks
──────────────────────────────────────────────────────────────────────

Answer:
The refund policy for card transactions states that refunds are typically initiated to completed within 5–7 business days. If the amount is greater than ₹25,000, a manual review may extend this by an additional 2 business days. [A5]

──────
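
The BM25 chain could not answer the refund question, and in Step 6 the BM25 retriever kept ranking the Section B header chunk first. One likely cause: as far as we know, `BM25Retriever` tokenizes with a plain `str.split()` by default, so query tokens like `transactions?` or `LemoBank's` keep their punctuation and case and rarely match the document's words, while common words such as `What`, `is` and `the` still match the many test questions in Section B. The sketch below scores a two-document toy corpus (invented to resemble the chunks above) with a textbook Okapi BM25 formula (k1 = 1.5, b = 0.75, smoothed IDF; the `rank_bm25` package's exact IDF differs slightly) under both tokenizers. The `normalize` tokenizer is hypothetical; something like it could be tried as the `preprocess_func` argument of `BM25Retriever.from_documents`.

```python
import math
import re
from collections import Counter


def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(corpus_tokens)
    avgdl = sum(len(doc) for doc in corpus_tokens) / n_docs
    doc_freq = Counter(term for doc in corpus_tokens for term in set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log((n_docs - n + 0.5) / (n + 0.5) + 1.0)  # smoothed, always > 0
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def whitespace(text):
    return text.split()  # what we believe BM25Retriever does by default


STOP_WORDS = {"a", "an", "the", "is", "what", "for", "of", "to", "can", "i"}


def normalize(text):
    """Hypothetical tokenizer: lowercase, keep word characters, drop a few stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


toy_docs = [
    "[A5] Refund Policy: card transaction refunds take 5-7 business days.",
    "What is the base interest? What is the support email? What is the CEO?",
]
toy_query = "What is the refund policy for card transactions?"

for tokenizer in (whitespace, normalize):
    scores = bm25_scores(tokenizer(toy_query), [tokenizer(d) for d in toy_docs])
    best = max(range(len(toy_docs)), key=scores.__getitem__)
    print(tokenizer.__name__, [round(s, 3) for s in scores], "-> best doc", best)
```

With whitespace tokens the question-list document wins on "What", "is" and "the"; with normalized tokens the refund chunk wins. Real chunks are messier, so treat this as a direction to test rather than a guaranteed fix.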

---

## Summary: Key Takeaways

### Retriever Comparison Summary

| Retriever | Speed | Accuracy | Cost | Best For |
|-----------|-------|----------|------|----------|
| **Similarity Search** | ⚡⚡⚡⚡ | ⭐⭐⭐⭐ | $ | General RAG, MVP |
| **MMR** | ⚡⚡⚡ | ⭐⭐⭐⭐ | $ | Diverse results |
| **Score Threshold** | ⚡⚡⚡⚡ | ⭐⭐⭐ | $ | Safety-critical |
| **BM25** | ⚡⚡⚡⚡⚡ | ⭐⭐⭐ | Free | Technical docs |
| **Ensemble** | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | $ | Enterprise, hybrid |
| **Multi-Query** | ⚡⚡ | ⭐⭐⭐⭐⭐ | $$ | Complex questions |
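| **Contextual Compression** | ⚡ | ⭐⭐⭐ | $$$ | Token-limited context |

These ratings are rules of thumb rather than measurements from this notebook. In the runs above, the 0.5 score threshold returned no chunks for any query, and the BM25-based chain could not answer the card refund question.
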
### Recommendations by Scenario

- **Prototyping**: Use Similarity Search
- **General Production Use**: Use Ensemble or Multi-Query
- **Cost-Critical**: Use BM25 or Similarity Search
- **Safety-Critical**: Use Score Threshold
- **Token-Limited**: Use Contextual Compression
- **Complex Questions**: Use Ensemble or Multi-Query