# LAB 4: LANGCHAIN RAG SYSTEMS - COMPLETE GUIDE

**Course:** Advanced Prompt Engineering Training  
**Session:** Session 3 - RAG & Advanced Retrieval (Day 2)  
**Duration:** 150 minutes (2.5 hours)  
**Type:** LangChain-Based RAG Workshop

## LAB OVERVIEW

This comprehensive lab teaches you to build **production-grade RAG (Retrieval-Augmented Generation) systems using LangChain**. You'll progress through five interconnected modules:

1. **Document Loading & Chunking** - LangChain text splitters and document loaders
2. **Embeddings & Search** - OpenAIEmbeddings and similarity search
3. **Vector Store** - LangChain Chroma integration
4. **Complete RAG Pipeline** - LCEL-based RAG chains
5. **Advanced Patterns** - MultiQuery and Contextual Compression retrievers

**Scenario:** You're building an AI-powered knowledge base for a bank's internal documentation - policies, procedures, compliance guidelines, and product information.

### Step 1: Import Libraries

In [None]:
# Lab 4: LangChain RAG Systems
# Advanced Prompt Engineering Training - Session 3

import os
import json
import time
from datetime import datetime
from typing import List, Dict, Any, Optional

# LangChain core
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.documents import Document

# LangChain OpenAI
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# LangChain text splitters
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter,
    CharacterTextSplitter,
    TokenTextSplitter
)

# LangChain vector stores
from langchain_chroma import Chroma

# LangChain retrievers
from langchain_classic.retrievers import (
    MultiQueryRetriever,
    ContextualCompressionRetriever
)
from langchain_classic.retrievers.document_compressors import LLMChainExtractor

# Standard libraries
import pandas as pd
import numpy as np
from dotenv import load_dotenv

load_dotenv(override=True)

print("✓ All libraries imported successfully")
print("✓ LangChain RAG components ready")

### Step 2: Configure LangChain Models and Embeddings

In [None]:
# Model configurations
GPT4_MODEL = os.environ.get("MODEL_NAME", "gpt-4o")
GPT35_MODEL = os.environ.get("FAST_MODEL_NAME", "gpt-3.5-turbo")
EMBEDDING_MODEL = "text-embedding-3-small"

# Initialize LangChain models
llm_gpt4 = ChatOpenAI(
    model=GPT4_MODEL,
    temperature=0,
    max_tokens=2000
)

llm_gpt35 = ChatOpenAI(
    model=GPT35_MODEL,
    temperature=0,
    max_tokens=2000
)

# Initialize embeddings
embeddings = OpenAIEmbeddings(
    model=EMBEDDING_MODEL
)

print(f"✓ LLM models configured: {GPT4_MODEL}, {GPT35_MODEL}")
print(f"✓ Embeddings model: {EMBEDDING_MODEL}")
print(f"✓ LangChain RAG stack ready")

### Step 3: Load Sample Knowledge Base

In [None]:
# Sample bank knowledge base documents
BANK_KNOWLEDGE_BASE = {
    "hr_policies": """
HUMAN RESOURCES POLICIES - CONSOLIDATED HANDBOOK

Section 1: Paid Time Off (PTO) Policy

Employees are entitled to paid time off for rest and personal needs. Our PTO policy is designed to promote work-life balance while ensuring business continuity.

PTO Accrual:
- Full-time employees accrue 20 days (160 hours) of PTO annually
- PTO accrual begins after successful completion of 90-day probationary period
- PTO accrues at a rate of 1.67 days per month
- Part-time employees accrue PTO on a pro-rated basis
- Maximum PTO bank: 30 days (240 hours); excess is forfeited

Usage Guidelines:
- PTO requests must be submitted at least 2 weeks in advance for periods longer than 3 days
- Manager approval required for all PTO requests
- Holiday periods (Thanksgiving, Christmas, New Year) require 30-day advance notice
- Unused PTO does not roll over to the following year
- PTO cannot be cashed out except upon termination

Section 2: Health Insurance Coverage

We offer comprehensive health insurance plans to eligible employees and their dependents.

Eligibility:
- Full-time employees (30+ hours/week) are eligible
- Coverage begins on the first day of the month following 30 days of employment
- Eligible dependents include spouse and children under 26

Plan Options:
- PPO Plan: $250 deductible, 80/20 coinsurance, $3,000 out-of-pocket max
- HMO Plan: $100 deductible, 90/10 coinsurance, $2,000 out-of-pocket max
- HSA-eligible HDHP: $1,500 deductible, 100% after deductible, $4,000 max

Section 3: Remote Work Policy

Our hybrid work model allows flexibility while maintaining collaboration and culture.

Eligibility:
- Employees must complete 6 months of employment before becoming eligible
- Manager approval required
- Position must be suitable for remote work
- Performance must meet or exceed expectations

Requirements:
- Employees must work from primary residence within the United States
- Dedicated workspace with reliable internet (minimum 25 Mbps)
- Available during core hours (10 AM - 3 PM local time)
- Attend all required in-person meetings and events
""",
    
    "loan_products": """
LOAN PRODUCTS GUIDE - CONSUMER LENDING

Product 1: Personal Loans

Personal loans provide flexible funding for debt consolidation, home improvements, major purchases, or other personal needs.

Loan Amounts: $5,000 to $50,000
Terms: 12, 24, 36, 48, or 60 months
Interest Rates: 7.99% - 18.99% APR (based on creditworthiness)

Eligibility Requirements:
- Minimum credit score: 650
- Minimum annual income: $35,000
- Debt-to-income ratio: Maximum 43%
- Employment: Minimum 2 years current job or 3 years in same field
- U.S. citizen or permanent resident

Fees:
- Origination fee: 1-5% of loan amount
- Late payment fee: $35 or 5% of payment (whichever is greater)
- Returned payment fee: $35
- No prepayment penalty

Product 2: Home Mortgage Loans

We offer competitive mortgage products for home purchase and refinancing.

Conventional Mortgages:
- Loan amounts up to $726,200 (conforming limits)
- Down payment: Minimum 5% (20% to avoid PMI)
- Terms: 15, 20, or 30 years
- Interest rates: 6.50% - 8.25% (based on credit, LTV, term)

Qualification Guidelines:
- Credit score: Minimum 620 (conventional), 580 (FHA)
- DTI ratio: Maximum 43% (some exceptions to 50%)
- Employment verification: 2 years history
- Down payment must be from acceptable sources
- Property must appraise at or above purchase price

Product 3: Auto Loans

Competitive financing for new and used vehicle purchases.

New Vehicle Loans:
- Loan amounts: $5,000 to $100,000
- Terms: 24, 36, 48, 60, or 72 months
- Rates: 4.99% - 9.99% APR
- Maximum LTV: 125%

Used Vehicle Loans:
- Vehicle age: Up to 8 years old
- Mileage: Maximum 100,000 miles
- Terms: 24, 36, 48, or 60 months
- Rates: 5.99% - 11.99% APR
""",

    "compliance_guidelines": """
COMPLIANCE AND REGULATORY GUIDELINES

Anti-Money Laundering (AML) Requirements

Our AML program ensures compliance with the Bank Secrecy Act and USA PATRIOT Act.

Customer Due Diligence (CDD):
- All new customers must be verified using government-issued ID
- Collect: Full legal name, date of birth, physical address, SSN/TIN
- Document type and ID number must be recorded
- Verification must occur before account opening

Transaction Monitoring:
- All cash transactions over $10,000 must be reported (CTR)
- Structured transactions must be identified and reported (SAR)
- Wire transfers require additional screening
- Daily monitoring of all accounts for unusual patterns

Suspicious Activity Reporting (SAR):
- File within 30 days of detecting suspicious activity
- Threshold: $5,000 for most violations
- Customer must not be notified of SAR filing
- Maintain SAR confidentiality

Know Your Customer (KYC) Program

Ongoing customer monitoring to assess risk and detect changes.

Risk Categories:
- Low Risk: Employed individuals, small balances, low transaction volume
- Medium Risk: Self-employed, moderate balances, regular transactions
- High Risk: Cash businesses, high balances, frequent international activity

Review Frequency:
- Low risk: Annual review
- Medium risk: Semi-annual review
- High risk: Quarterly review or continuous monitoring

Privacy and Data Protection

Safeguarding customer information is paramount.

Gramm-Leach-Bliley Act (GLBA) Compliance:
- Provide privacy notice at account opening
- Annual privacy notice required
- Opt-out must be offered for information sharing

Information Security:
- Encrypt all sensitive data at rest and in transit
- Access controls: minimum necessary principle
- Password requirements: 12+ characters, MFA enabled
- Log and monitor all access to customer data
"""
}

# Convert to LangChain Document objects
documents = []
for doc_type, content in BANK_KNOWLEDGE_BASE.items():
    doc = Document(
        page_content=content,
        metadata={
            "type": doc_type,
            "source": "bank_knowledge_base",
            "created_at": datetime.now().isoformat()
        }
    )
    documents.append(doc)

print(f"✓ Loaded {len(documents)} knowledge base documents")
print(f"✓ Total characters: {sum(len(doc.page_content) for doc in documents):,}")
print(f"✓ Document types: {[doc.metadata['type'] for doc in documents]}")

---

## PART 1: DOCUMENT CHUNKING WITH LANGCHAIN (Lab 4.1)

**Duration:** 30 minutes  
**Objective:** Use LangChain text splitters for optimal chunking

### Theory: LangChain Text Splitters

LangChain provides multiple text splitter strategies:

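- **RecursiveCharacterTextSplitter**: Tries a list of separators from coarsest to finest, so chunks tend to break at paragraph or sentence boundaries
- **CharacterTextSplitter**: Splits on a single separator, then merges the pieces up to the chunk size
- **TokenTextSplitter**: Splits by token count rather than character count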

**Advantages:**
- Pre-built, tested implementations
- Consistent with LangChain ecosystem
- Metadata preservation
- Easy to configure and compare
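
To make the "coarsest separator first" idea concrete, here is a minimal pure-Python sketch of recursive splitting. It is a teaching toy, not LangChain's implementation: the function name `recursive_split` is hypothetical, it drops separators at chunk boundaries, and it ignores overlap.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", ". ", " ")):
    """Split text into chunks of at most chunk_size characters,
    trying the coarsest separator first (toy sketch, no overlap)."""
    text = text.strip()
    if not text:
        return []
    if len(text) <= chunk_size:
        return [text]

    # Pick the first (coarsest) separator that occurs in the text
    for idx, sep in enumerate(separators):
        if sep in text:
            pieces = text.split(sep)
            remaining = separators[idx + 1:]
            break
    else:
        # No separator left: fall back to a hard cut
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    chunks, current = [], ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # piece still fits in the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece is too long on its own: retry with finer separators
            chunks.extend(recursive_split(piece, chunk_size, remaining))
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


sample = "Section A\n\nFirst sentence here. Second sentence here.\n\nSection B"
chunks = recursive_split(sample, chunk_size=30)
for c in chunks:
    print(len(c), repr(c))
```

The middle paragraph is too long for one chunk, so only that paragraph falls through to the sentence separator; the short sections stay intact.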

### Challenge 1.1: RecursiveCharacterTextSplitter (10 minutes)

**Strategy:** Split on multiple separators in order (paragraph → line → sentence → word → character)

In [None]:
# Create RecursiveCharacterTextSplitter
recursive_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    length_function=len,
    separators=["\n\n", "\n", ". ", " ", ""],
    is_separator_regex=False
)

print("="*80)
print("RECURSIVE CHARACTER TEXT SPLITTER")
print("="*80 + "\n")

# Split all documents
recursive_chunks = recursive_splitter.split_documents(documents)

print(f"Total chunks created: {len(recursive_chunks)}")
print(f"\nChunks per document:")
for doc_type in BANK_KNOWLEDGE_BASE.keys():
    doc_chunks = [c for c in recursive_chunks if c.metadata['type'] == doc_type]
    print(f"  {doc_type}: {len(doc_chunks)} chunks")

# Show sample chunk
print(f"\nSample chunk:")
print("-" * 80)
print(f"Metadata: {recursive_chunks[0].metadata}")
print(f"Content ({len(recursive_chunks[0].page_content)} chars):")
print(recursive_chunks[0].page_content[:300] + "...")
print("-" * 80)

### Challenge 1.2: CharacterTextSplitter (5 minutes)

**Strategy:** Split on a single separator (e.g., double newline)

In [None]:
# Create CharacterTextSplitter
character_splitter = CharacterTextSplitter(
    separator="\n\n",
    chunk_size=800,
    chunk_overlap=80,
    length_function=len
)

print("\n" + "="*80)
print("CHARACTER TEXT SPLITTER (Paragraph-based)")
print("="*80 + "\n")

# Split documents
character_chunks = character_splitter.split_documents(documents)

print(f"Total chunks created: {len(character_chunks)}")
print(f"Avg chunk size: {np.mean([len(c.page_content) for c in character_chunks]):.0f} chars")
print(f"Min size: {min(len(c.page_content) for c in character_chunks)}")
print(f"Max size: {max(len(c.page_content) for c in character_chunks)}")

### Challenge 1.3: TokenTextSplitter (5 minutes)

**Strategy:** Split by exact token count for precise control
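
The interaction between `chunk_size` and `chunk_overlap` can be sketched as a sliding window over a token list. This is an illustrative sketch only: whitespace-separated words stand in for real tokenizer output, and `window_chunks` is a hypothetical helper, not a LangChain function.

```python
def window_chunks(tokens, chunk_size, chunk_overlap):
    """Slide a window of chunk_size tokens, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
    return windows


# Assumption: words stand in for tokens purely for illustration
toy_tokens = "a b c d e f g h i j".split()
windows = window_chunks(toy_tokens, chunk_size=4, chunk_overlap=1)
for w in windows:
    print(w)
```

Each window repeats the last token of the previous one, which is how overlap preserves context across chunk boundaries.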

In [None]:
# Create TokenTextSplitter
token_splitter = TokenTextSplitter(
    chunk_size=300,
    chunk_overlap=30
)

print("\n" + "="*80)
print("TOKEN TEXT SPLITTER")
print("="*80 + "\n")

# Split documents
token_chunks = token_splitter.split_documents(documents)

print(f"Total chunks created: {len(token_chunks)}")
print(f"Target tokens per chunk: 300")
print(f"\nNote: TokenTextSplitter ensures precise token counts for API limits")

### Challenge 1.4: Chunking Strategy Comparison (10 minutes)

In [None]:
# Compare all strategies
print("\n" + "="*80)
print("CHUNKING STRATEGY COMPARISON")
print("="*80 + "\n")

comparison_data = [
    {
        "Strategy": "RecursiveCharacter",
        "Chunks": len(recursive_chunks),
        "Avg Size": int(np.mean([len(c.page_content) for c in recursive_chunks])),
        "Min": min(len(c.page_content) for c in recursive_chunks),
        "Max": max(len(c.page_content) for c in recursive_chunks),
        "Std Dev": int(np.std([len(c.page_content) for c in recursive_chunks]))
    },
    {
        "Strategy": "Character",
        "Chunks": len(character_chunks),
        "Avg Size": int(np.mean([len(c.page_content) for c in character_chunks])),
        "Min": min(len(c.page_content) for c in character_chunks),
        "Max": max(len(c.page_content) for c in character_chunks),
        "Std Dev": int(np.std([len(c.page_content) for c in character_chunks]))
    },
    {
        "Strategy": "Token",
        "Chunks": len(token_chunks),
        "Avg Size": int(np.mean([len(c.page_content) for c in token_chunks])),
        "Min": min(len(c.page_content) for c in token_chunks),
        "Max": max(len(c.page_content) for c in token_chunks),
        "Std Dev": int(np.std([len(c.page_content) for c in token_chunks]))
    }
]

comparison_df = pd.DataFrame(comparison_data)
print(comparison_df.to_string(index=False))

print(f"\n✓ RecursiveCharacterTextSplitter: Best for general use (smart boundaries)")
print(f"✓ CharacterTextSplitter: Good for paragraph-based content")
print(f"✓ TokenTextSplitter: Best for precise token control")

# Use recursive chunks for rest of lab
chunks_to_use = recursive_chunks
print(f"\n✓ Using RecursiveCharacterTextSplitter chunks ({len(chunks_to_use)} chunks) for the rest of the lab")

---

## PART 2: EMBEDDINGS WITH LANGCHAIN (Lab 4.2)

**Duration:** 20 minutes  
**Objective:** Use OpenAIEmbeddings for vector representations
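
Before calling the embeddings API, it may help to see how two vectors are compared. The sketch below uses made-up 2-dimensional vectors (real embeddings have many more dimensions) and shows cosine similarity next to squared L2 distance; the vector values and names are assumptions for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def squared_l2(a, b):
    """Squared Euclidean distance (0.0 = identical vectors)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical toy "embeddings"
query_vec = [1.0, 0.0]
toy_docs = {"pto_policy": [0.9, 0.1], "loan_rates": [0.1, 0.9]}

by_similarity = sorted(toy_docs, key=lambda n: cosine_similarity(query_vec, toy_docs[n]), reverse=True)
by_distance = sorted(toy_docs, key=lambda n: squared_l2(query_vec, toy_docs[n]))
print(by_similarity, by_distance)
```

Higher similarity and lower distance point to the same nearest document here; keep the direction of each metric in mind when reading scores later in the lab.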


### Challenge 2.1: Generate Embeddings (10 minutes)

In [None]:
print("="*80)
print("GENERATING EMBEDDINGS WITH OPENAIEMBEDDINGS")
print("="*80 + "\n")

# Test single embedding
test_text = "What is the PTO policy for employees?"
test_embedding = embeddings.embed_query(test_text)

print(f"Test query: '{test_text}'")
print(f"Embedding dimensions: {len(test_embedding)}")
print(f"First 10 values: {test_embedding[:10]}")
print(f"\n✓ OpenAIEmbeddings working correctly")

# Generate embeddings for sample chunks
print(f"\nGenerating embeddings for {len(chunks_to_use[:5])} sample chunks...")
sample_texts = [chunk.page_content for chunk in chunks_to_use[:5]]
sample_embeddings = embeddings.embed_documents(sample_texts)

print(f"✓ Generated {len(sample_embeddings)} embeddings")
print(f"✓ Each embedding has {len(sample_embeddings[0])} dimensions")

print(f"\nNote: Full embedding generation happens during vector store creation")

---

## PART 3: VECTOR STORE WITH LANGCHAIN CHROMA (Lab 4.3)

**Duration:** 30 minutes  
**Objective:** Build scalable vector store using LangChain's Chroma integration

### Theory: LangChain Chroma Integration

**Advantages of LangChain's Chroma:**
- Automatic embedding generation
- Metadata filtering support
- Similarity and MMR search
- Easy persistence and loading
- Seamless integration with retrievers
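
MMR (maximal marginal relevance) search trades relevance against redundancy among the results already picked. The following is a rough pure-Python sketch of that idea on toy vectors; `mmr_select` and the vectors are hypothetical, and the library's actual scoring details may differ.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr_select(query_vec, doc_vecs, k, lambda_mult=0.5):
    """Greedily pick k document indices, balancing relevance to the query
    (weight lambda_mult) against similarity to already selected documents."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


toy_query = [1.0, 0.2]
toy_vecs = [
    [1.0, 0.0],   # index 0: near-duplicate of index 1
    [0.99, 0.1],  # index 1: closest to the query
    [0.6, 0.8],   # index 2: less relevant but different
]
diverse = mmr_select(toy_query, toy_vecs, k=2, lambda_mult=0.5)
pure_relevance = mmr_select(toy_query, toy_vecs, k=2, lambda_mult=1.0)
print(diverse, pure_relevance)
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking; lowering it swaps the near-duplicate for a more distinct document.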

### Challenge 3.1: Create Vector Store (15 minutes)

In [None]:
print("="*80)
print("CREATING VECTOR STORE WITH LANGCHAIN CHROMA")
print("="*80 + "\n")

# Create Chroma vector store from documents
print(f"Creating vector store with {len(chunks_to_use)} chunks...")
print(f"This includes automatic embedding generation...\n")

vectorstore = Chroma.from_documents(
    documents=chunks_to_use,
    embedding=embeddings,
    collection_name="bank_knowledge_base"
)

print(f"✓ Vector store created successfully")
print(f"✓ Collection: bank_knowledge_base")
print(f"✓ Total chunks indexed: {len(chunks_to_use)}")
print(f"✓ Embedding model: {EMBEDDING_MODEL}")

### Challenge 3.2: Similarity Search (15 minutes)

In [None]:
print("\n" + "="*80)
print("SIMILARITY SEARCH TESTING")
print("="*80 + "\n")

test_queries = [
    "How many vacation days do employees get?",
    "What is the minimum credit score for a personal loan?",
    "What are the AML requirements for new customers?"
]

for query in test_queries:
    print(f"Query: '{query}'")
    print("-" * 80)
    
    # Similarity search
    results = vectorstore.similarity_search(query, k=3)
    
    for i, doc in enumerate(results, 1):
        print(f"\nResult {i}:")
        print(f"Source: {doc.metadata['type']}")
        print(f"Content preview: {doc.page_content[:200]}...")
    
    print("\n" + "="*80 + "\n")

### Challenge 3.3: Similarity Search with Scores

In [None]:
print("\n" + "="*80)
print("SIMILARITY SEARCH WITH DISTANCE SCORES")
print("="*80 + "\n")

query = "What documentation is needed to open an account?"
print(f"Query: '{query}'")
print("-" * 80)

# Similarity search with scores
results_with_scores = vectorstore.similarity_search_with_score(query, k=5)

for i, (doc, score) in enumerate(results_with_scores, 1):
    print(f"\nResult {i} (distance: {score:.3f}):")
    print(f"Source: {doc.metadata['type']}")
    print(f"Content: {doc.page_content[:150]}...")

print("\n" + "="*80)
print("Note: Lower distance = more similar (L2 distance)")
print("="*80)

### Challenge 3.4: Metadata Filtering

In [None]:
print("\n" + "="*80)
print("METADATA FILTERING")
print("="*80 + "\n")

query = "What are the requirements?"
print(f"Query: '{query}'")
print(f"Filter: Only 'compliance_guidelines' documents\n")
print("-" * 80)

# Search with metadata filter
filtered_results = vectorstore.similarity_search(
    query,
    k=3,
    filter={"type": "compliance_guidelines"}
)

for i, doc in enumerate(filtered_results, 1):
    print(f"\nResult {i}:")
    print(f"Source: {doc.metadata['type']}")
    print(f"Content: {doc.page_content[:200]}...")

print("\n" + "="*80)

---

## PART 4: RAG PIPELINE WITH LCEL (Lab 4.4)

**Duration:** 40 minutes  
**Objective:** Build end-to-end RAG using LangChain Expression Language
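
LCEL composes steps with the `|` operator, and a dict of steps fans the same input out to several branches. The toy below imitates that composition in plain Python so the data flow is visible without any API calls. `Step` and `parallel` are hypothetical stand-ins, not LangChain classes.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Output of this step becomes the input of the next one
        return Step(lambda value: other.invoke(self.invoke(value)))


def parallel(**branches):
    """Run every branch on the same input and collect results in a dict."""
    return Step(lambda value: {name: step.invoke(value) for name, step in branches.items()})


# Hypothetical stand-ins for retriever, formatter, and prompt
toy_retrieve = Step(lambda question: ["PTO: 20 days per year"])
toy_format = Step(lambda docs: "\n".join(docs))
toy_passthrough = Step(lambda value: value)
toy_prompt = Step(lambda d: f"Context: {d['context']}\nQuestion: {d['question']}")

toy_chain = parallel(context=toy_retrieve | toy_format, question=toy_passthrough) | toy_prompt
rendered = toy_chain.invoke("How many PTO days?")
print(rendered)
```

The question reaches both branches: one turns it into context, the other passes it through unchanged, and the prompt step receives both keys.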


### Challenge 4.1: Basic RAG Chain (20 minutes)

In [None]:
# Create retriever from vector store
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)

# Create RAG prompt template
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant answering questions about bank policies and products. Use ONLY the provided context to answer questions. If the context doesn't contain enough information, say so."),
    ("user", """Context:
{context}

Question: {question}

Provide a concise, accurate answer based ONLY on the context above. Cite which source (HR policies, loan products, or compliance) you used.""")
])

# Helper function to format documents
def format_docs(docs):
    """Format documents for context"""
    formatted = []
    for i, doc in enumerate(docs, 1):
        formatted.append(f"[Source {i} - {doc.metadata['type']}]\n{doc.page_content}")
    return "\n\n".join(formatted)

# Build RAG chain using LCEL
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm_gpt4
    | StrOutputParser()
)

print("✓ RAG chain created with LCEL")
print("✓ Components: Retriever → Format → Prompt → LLM → Parse")

In [None]:
# Test RAG chain
print("\n" + "="*80)
print("TESTING BASIC RAG CHAIN")
print("="*80 + "\n")

test_questions = [
    "How many vacation days do full-time employees get per year?",
    "What is the minimum credit score needed for a personal loan?",
    "What are the customer verification requirements for new accounts?"
]

for question in test_questions:
    print(f"Question: {question}")
    print("="*80)
    
    start_time = time.time()
    answer = rag_chain.invoke(question)
    elapsed = time.time() - start_time
    
    print(f"\nAnswer:\n{answer}")
    print(f"\nTime: {elapsed:.2f}s")
    print("\n" + "="*80 + "\n")

### Challenge 4.2: RAG with Source Citations (10 minutes)

In [None]:
# Enhanced RAG with source tracking
def format_docs_with_sources(docs):
    """Format documents and track sources"""
    formatted = []
    sources = []
    
    for i, doc in enumerate(docs, 1):
        formatted.append(f"[Source {i}]\n{doc.page_content}")
        sources.append({
            "number": i,
            "type": doc.metadata['type'],
            "preview": doc.page_content[:100]
        })
    
    return {"context": "\n\n".join(formatted), "sources": sources}

# RAG chain that returns both answer and sources
rag_with_sources_chain = (
    {"retrieved_docs": retriever, "question": RunnablePassthrough()}
    | RunnableLambda(lambda x: {
        "formatted": format_docs_with_sources(x["retrieved_docs"]),
        "question": x["question"]
    })
    | RunnableLambda(lambda x: {
        "answer": (rag_prompt | llm_gpt4 | StrOutputParser()).invoke({
            "context": x["formatted"]["context"],
            "question": x["question"]
        }),
        "sources": x["formatted"]["sources"]
    })
)

print("✓ Enhanced RAG chain with source tracking created")

In [None]:
# Test enhanced RAG
print("\n" + "="*80)
print("TESTING RAG WITH SOURCE CITATIONS")
print("="*80 + "\n")

question = "Can employees work from home and what are the requirements?"
print(f"Question: {question}")
print("="*80)

result = rag_with_sources_chain.invoke(question)

print(f"\nAnswer:\n{result['answer']}")

print(f"\n\nSources Used:")
for source in result['sources']:
    print(f"  [{source['number']}] {source['type']}: {source['preview']}...")

print("\n" + "="*80)

### Challenge 4.3: Streaming RAG (10 minutes)

In [None]:
# Test streaming
print("\n" + "="*80)
print("STREAMING RAG DEMO")
print("="*80 + "\n")

question = "What fees are associated with personal loans?"
print(f"Question: {question}")
print("\nAnswer (streaming):")
print("-" * 80)

# Stream the answer
for chunk in rag_chain.stream(question):
    print(chunk, end="", flush=True)

print("\n" + "-" * 80)
print("\n✓ Streaming complete")
print("\nNote: In production web apps, each chunk is sent to client immediately")
print("="*80)

---

## PART 5: ADVANCED RAG PATTERNS (Lab 4.5)

**Duration:** 30 minutes  
**Objective:** Implement production-grade RAG enhancements with LangChain

### Advanced Retrievers:

- **MultiQueryRetriever**: Generates multiple queries to improve recall
- **ContextualCompressionRetriever**: Filters retrieved content for precision
- **EnsembleRetriever**: Combines multiple retrieval methods
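
One common way to combine ranked results from several retrieval methods is reciprocal rank fusion (RRF). The sketch below shows the idea on toy document IDs; treating it as what an ensemble retriever does internally is an assumption, and `reciprocal_rank_fusion` with the constant `k=60` is illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each appearance adds 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical rankings from two retrieval methods
keyword_ranking = ["a", "b", "c"]
vector_ranking = ["b", "c", "d"]
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Documents that appear in both lists rise above documents that rank first in only one.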

### Challenge 5.1: MultiQueryRetriever (15 minutes)

**Objective:** Improve recall by generating multiple query variations
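
The core of the multi-query idea is: run several phrasings of the question, then keep the unique union of the results. A minimal sketch, assuming a toy keyword search in place of a vector store and hand-written query variations in place of LLM-generated ones:

```python
TOY_CORPUS = [
    "Mortgage credit score minimum 620",
    "Mortgage down payment minimum 5%",
    "Auto loan rates 4.99% APR",
]


def toy_search(query):
    """Return corpus entries sharing at least one word with the query."""
    words = set(query.lower().split())
    return [doc for doc in TOY_CORPUS if words & set(doc.lower().split())]


def multi_query_retrieve(queries, search):
    """Run every query and return the unique union, keeping first-seen order."""
    seen, merged = set(), []
    for query in queries:
        for doc in search(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# Hand-written variations; in the lab an LLM generates these
variations = ["credit score needed", "down payment required"]
merged_docs = multi_query_retrieve(variations, toy_search)
print(merged_docs)
```

Neither variation alone finds both mortgage entries; the union does, which is the recall gain this retriever targets.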

In [None]:
# Create MultiQueryRetriever
multiquery_retriever = MultiQueryRetriever.from_llm(
    retriever=retriever,
    llm=llm_gpt35  # Use faster model for query generation
)

print("="*80)
print("MULTIQUERY RETRIEVER")
print("="*80 + "\n")
print("✓ MultiQueryRetriever created")
print("✓ Automatically generates multiple query variations")
print("✓ Combines results for better recall")

In [None]:
# Test MultiQueryRetriever
import logging

# Enable logging to see generated queries
logging.basicConfig()
# Logger name follows the module the retriever was imported from (langchain_classic)
logging.getLogger("langchain_classic.retrievers.multi_query").setLevel(logging.INFO)

print("\n" + "="*80)
print("TESTING MULTIQUERY RETRIEVER")
print("="*80 + "\n")

question = "What do I need to qualify for a home loan?"
print(f"Original question: {question}\n")
print("-" * 80)
print("Generated query variations and retrieval:")
print("-" * 80 + "\n")

# This will print the generated query variations
multiquery_docs = multiquery_retriever.invoke(question)

print(f"\n✓ Retrieved {len(multiquery_docs)} unique documents")
print(f"\nTop 3 results:")
for i, doc in enumerate(multiquery_docs[:3], 1):
    print(f"\n[{i}] Source: {doc.metadata['type']}")
    print(f"Content: {doc.page_content[:200]}...")

print("\n" + "="*80)

### Challenge 5.2: Contextual Compression (15 minutes)

**Objective:** Filter retrieved content to only relevant parts
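
Contextual compression keeps only the parts of each retrieved chunk that bear on the question. The lab uses an LLM for this; the sketch below substitutes a crude keyword filter so the effect on context size can be seen offline. `keyword_compress` and the stopword list are hypothetical.

```python
TOY_STOPWORDS = {"what", "is", "the", "for", "a", "of"}


def keyword_compress(chunk, question):
    """Keep only lines of the chunk that mention a non-stopword from the question."""
    terms = {w.strip("?.,").lower() for w in question.split()} - TOY_STOPWORDS
    kept = [line for line in chunk.splitlines() if any(t in line.lower() for t in terms)]
    return "\n".join(kept)


toy_chunk = (
    "Loan Amounts: $5,000 to $50,000\n"
    "Origination fee: 1-5% of loan amount\n"
    "No prepayment penalty\n"
    "Terms: 12 to 60 months"
)
compressed = keyword_compress(toy_chunk, "What is the origination fee?")
ratio = len(compressed) / len(toy_chunk)
print(compressed)
print(f"Kept {ratio:.0%} of the original characters")
```

An LLM-based extractor can judge relevance far better than keyword matching, but the payoff is the same: a shorter, more focused context for the final prompt.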

In [None]:
# Create compressor
compressor = LLMChainExtractor.from_llm(llm_gpt35)

# Create ContextualCompressionRetriever
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever
)

print("\n" + "="*80)
print("CONTEXTUAL COMPRESSION RETRIEVER")
print("="*80 + "\n")
print("✓ ContextualCompressionRetriever created")
print("✓ Uses LLM to extract only relevant parts")
print("✓ Improves precision and shortens the final prompt (at the cost of one extra LLM call per retrieved document)")

In [None]:
# Compare regular vs compressed retrieval
print("\n" + "="*80)
print("COMPARING REGULAR VS COMPRESSED RETRIEVAL")
print("="*80 + "\n")

question = "What is the origination fee for personal loans?"
print(f"Question: {question}\n")

# Regular retrieval
print("Regular Retrieval:")
print("-" * 80)
regular_docs = retriever.invoke(question)
print(f"Documents: {len(regular_docs)}")
print(f"Total characters: {sum(len(doc.page_content) for doc in regular_docs)}")
print(f"Sample: {regular_docs[0].page_content[:300]}...\n")

# Compressed retrieval
print("Compressed Retrieval:")
print("-" * 80)
compressed_docs = compression_retriever.invoke(question)
print(f"Documents: {len(compressed_docs)}")
print(f"Total characters: {sum(len(doc.page_content) for doc in compressed_docs)}")
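if compressed_docs:
    print(f"Sample: {compressed_docs[0].page_content}\n")
else:
    # The extractor drops documents it judges irrelevant, so the list can be empty
    print("Sample: (no passages survived compression)\n")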

# Calculate compression ratio
original_chars = sum(len(doc.page_content) for doc in regular_docs)
compressed_chars = sum(len(doc.page_content) for doc in compressed_docs)
compression_ratio = compressed_chars / original_chars if original_chars > 0 else 0

print("="*80)
print(f"Compression ratio: {compression_ratio:.1%}")
print(f"Token savings: ~{(1-compression_ratio)*100:.0f}%")
print("="*80)

### Challenge 5.3: Enhanced RAG with Advanced Retrievers

In [None]:
# Build enhanced RAG with compression
enhanced_rag_chain = (
    {"context": compression_retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | llm_gpt4
    | StrOutputParser()
)

print("\n" + "="*80)
print("ENHANCED RAG WITH COMPRESSION")
print("="*80 + "\n")

question = "What documentation is required for a mortgage application?"
print(f"Question: {question}")
print("="*80)

# Compare basic vs enhanced
print("\nBasic RAG Answer:")
print("-" * 80)
basic_answer = rag_chain.invoke(question)
print(basic_answer)

print("\n" + "="*80)
print("\nEnhanced RAG Answer (with compression):")
print("-" * 80)
enhanced_answer = enhanced_rag_chain.invoke(question)
print(enhanced_answer)

print("\n" + "="*80)
print("✓ Enhanced RAG sends a smaller, more focused context to the final LLM call")
print("✓ Compare the two answers above to check that accuracy is maintained")
print("="*80)

---

## CAPSTONE: PRODUCTION RAG SYSTEM

**Objective:** Combine the retrievers, source tracking, and timing metrics from Parts 3-5 into one reusable class

In [None]:
class ProductionRAGSystem:
    """
    Production-grade RAG system using LangChain:
    - Multiple retrieval strategies
    - Contextual compression
    - Source tracking
    - Metrics and logging
    """
    
    def __init__(self, vectorstore: Chroma, llm: ChatOpenAI):
        self.vectorstore = vectorstore
        self.llm = llm
        
        # Create retrievers
        self.base_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
        
        # MultiQuery for better recall
        self.multiquery_retriever = MultiQueryRetriever.from_llm(
            retriever=self.base_retriever,
            llm=llm_gpt35
        )
        
        # Compression for better precision
        compressor = LLMChainExtractor.from_llm(llm_gpt35)
        self.compression_retriever = ContextualCompressionRetriever(
            base_compressor=compressor,
            base_retriever=self.base_retriever
        )
        
        # Build RAG chain
        self.rag_chain = self._build_rag_chain()
        
        self.query_history = []
    
    def _build_rag_chain(self):
        """Build the LCEL generation chain (prompt → LLM → parser).

        Retrieval happens in query(), so the retriever selected there is the
        one that actually feeds the answer.
        """
        prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a helpful assistant for bank employees. Answer questions accurately using the provided context. Always cite your sources."),
            ("user", "Context:\n{context}\n\nQuestion: {question}\n\nProvide a clear, concise answer with source citations.")
        ])
        
        return prompt | self.llm | StrOutputParser()
    
    def query(self, question: str, use_multiquery: bool = False) -> Dict[str, Any]:
        """
        Query the RAG system
        
        Args:
            question: User question
            use_multiquery: Whether to use MultiQueryRetriever
        
        Returns:
            Dictionary with answer, sources, and metrics
        """
        start_time = time.time()
        
        # Select retriever
        retriever = self.multiquery_retriever if use_multiquery else self.compression_retriever
        
        # Retrieve documents
        docs = retriever.invoke(question)
        retrieval_time = time.time() - start_time
        
        # Generate the answer from the documents retrieved above, so the
        # reported sources and timings describe the context the LLM saw
        gen_start = time.time()
        answer = self.rag_chain.invoke({
            "context": format_docs(docs),
            "question": question
        })
        generation_time = time.time() - gen_start
        
        total_time = time.time() - start_time
        
        # Track query
        result = {
            "question": question,
            "answer": answer,
            "sources": [
                {"type": doc.metadata['type'], "preview": doc.page_content[:100]}
                for doc in docs
            ],
            "metrics": {
                "retrieval_time": retrieval_time,
                "generation_time": generation_time,
                "total_time": total_time,
                "docs_retrieved": len(docs),
                "retriever_type": "multiquery" if use_multiquery else "compression"
            }
        }
        
        self.query_history.append(result)
        
        return result
    
    def get_statistics(self) -> Dict[str, Any]:
        """Get system statistics"""
        if not self.query_history:
            return {}
        
        return {
            "total_queries": len(self.query_history),
            "avg_retrieval_time": np.mean([q["metrics"]["retrieval_time"] for q in self.query_history]),
            "avg_generation_time": np.mean([q["metrics"]["generation_time"] for q in self.query_history]),
            "avg_total_time": np.mean([q["metrics"]["total_time"] for q in self.query_history]),
            "avg_docs_retrieved": np.mean([q["metrics"]["docs_retrieved"] for q in self.query_history])
        }

print("✓ ProductionRAGSystem class created")

In [None]:
# Initialize production system
print("\n" + "="*80)
print("PRODUCTION RAG SYSTEM TESTING")
print("="*80 + "\n")

prod_rag = ProductionRAGSystem(vectorstore, llm_gpt4)

# Test queries
test_questions = [
    "How many PTO days do full-time employees get?",
    "What credit score is needed for a personal loan?",
    "What are the AML customer verification requirements?"
]

for question in test_questions:
    print(f"Question: {question}")
    print("="*80)
    
    result = prod_rag.query(question)
    
    print(f"\nAnswer:\n{result['answer']}")
    
    print(f"\nMetrics:")
    for key, value in result['metrics'].items():
        if 'time' in key:
            print(f"  {key}: {value:.3f}s")
        else:
            print(f"  {key}: {value}")
    
    print(f"\nSources: {len(result['sources'])} documents")
    print("\n" + "="*80 + "\n")

# Display statistics
stats = prod_rag.get_statistics()
print("SYSTEM STATISTICS")
print("="*80)
for key, value in stats.items():
    if 'time' in key:
        print(f"{key}: {value:.3f}s")
    else:
        print(f"{key}: {value:.2f}")
print("="*80)