# LAB 4: RAG SYSTEMS - COMPLETE IMPLEMENTATION GUIDE

**Course:** Advanced Prompt Engineering Training  
**Session:** Session 3 - RAG & Advanced Retrieval (Day 2)  
**Duration:** 150 minutes (2.5 hours)  
**Type:** Comprehensive RAG Workshop

## LAB OVERVIEW

This comprehensive lab teaches you to build **production-grade RAG (Retrieval-Augmented Generation) systems**. You'll progress through five interconnected modules:

1. **Document Chunking** - Optimal text segmentation strategies
2. **Embeddings & Search** - Vector representations and similarity
3. **Vector Database** - Indexing and retrieval at scale
4. **Complete RAG Pipeline** - End-to-end question answering
5. **Advanced Patterns** - Hybrid search, re-ranking, query expansion

**Scenario:** You're building an AI-powered knowledge base for a bank's internal documentation - policies, procedures, compliance guidelines, and product information. Starting with raw documents, you'll build a complete RAG system that employees can query in natural language.

### Step 1: Import Libraries

In [None]:
# Lab 4: RAG Systems Implementation
# Advanced Prompt Engineering Training - Session 3

import os
import json
import re
import time
from datetime import datetime
from typing import List, Dict, Tuple, Optional, Any
from dataclasses import dataclass, field
from collections import defaultdict

import numpy as np
import pandas as pd
from openai import OpenAI
import tiktoken
from sklearn.metrics.pairwise import cosine_similarity

# Vector database
import chromadb
from chromadb.config import Settings
from dotenv import load_dotenv

load_dotenv(override=True)

print("✓ All libraries imported successfully")

### Step 2: Configure OpenAI Client

In [None]:
# Initialize OpenAI client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Model configurations
GPT4 = os.environ.get("MODEL_NAME", "gpt-4o")
GPT35 = os.environ.get("FAST_MODEL_NAME", "gpt-3.5-turbo")

EMBEDDING_MODEL = "text-embedding-3-small"
EMBEDDING_DIMENSIONS = 1536

print("✓ OpenAI client configured")
print(f"✓ LLM models: {GPT4}, {GPT35}")
print(f"✓ Embedding model: {EMBEDDING_MODEL} ({EMBEDDING_DIMENSIONS}d)")
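
Because `os.environ.get` returns `None` for a missing key, a misconfigured `.env` file tends to surface only at the first API call. A minimal sketch of a fail-fast check follows; `require_env` is a hypothetical helper introduced here for illustration, not part of the lab code or the OpenAI SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper: the name and behavior are assumptions for this sketch.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


# Possible usage, run before constructing the client:
# api_key = require_env("OPENAI_API_KEY")
```

Failing at configuration time keeps the error message close to its cause instead of buried in a later embedding or chat call.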

### Step 3: Core Helper Functions

In [None]:
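import hashlib

@dataclass
class Document:
    """Document with metadata"""
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    doc_id: Optional[str] = None
    
    def __post_init__(self):
        if self.doc_id is None:
            # Built-in hash() is salted per process for strings, so IDs would
            # change between runs; a content digest keeps doc_id stable.
            digest = hashlib.sha256(self.content.encode("utf-8")).hexdigest()
            self.doc_id = f"doc_{digest[:8]}"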

@dataclass
class Chunk:
    """Text chunk with metadata and embedding"""
    content: str
    chunk_id: str
    doc_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    embedding: Optional[List[float]] = None
    start_index: int = 0
    end_index: int = 0
    
    def __len__(self):
        return len(self.content)

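def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Count tokens in text using the tokenizer tiktoken maps to `model`"""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to the encoding used by gpt-4
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))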

def get_embedding(text: str, model: str = EMBEDDING_MODEL) -> List[float]:
    """Get embedding vector for text"""
    try:
        response = client.embeddings.create(
            model=model,
            input=text
        )
        return response.data[0].embedding
    except Exception as e:
        # Re-raise instead of returning a zero vector: a silent all-zero
        # embedding would be indexed and distort every later search.
        print(f"Embedding error: {e}")
        raise

def call_llm(
    prompt: str,
    system_prompt: str = "You are a helpful AI assistant.",
    model: str = GPT4,
    temperature: float = 0
) -> str:
    """Call LLM and return response"""
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": prompt}
            ],
            temperature=temperature
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {str(e)}"

print("✓ Helper functions created")
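
API calls such as the ones wrapped above can fail transiently (rate limits, network hiccups). One common way to handle this is retrying with exponential backoff. The sketch below is a generic wrapper, assuming any exception is worth retrying; `with_retries` and its parameters are hypothetical names, not part of the OpenAI SDK.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn(), retrying with exponential backoff on any exception.

    Hypothetical helper: waits base_delay, then 2x, 4x, ... between attempts.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            time.sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")


# Possible usage with the client configured earlier:
# response = with_retries(
#     lambda: client.embeddings.create(model=EMBEDDING_MODEL, input="some text")
# )
```

In real use you would likely narrow the `except` clause to the error types you consider transient, so that permanent failures such as an invalid API key are not retried.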

### Step 4: Load Sample Knowledge Base

In [None]:
# Sample bank knowledge base documents
BANK_KNOWLEDGE_BASE = {
    "hr_policies": """
HUMAN RESOURCES POLICIES - CONSOLIDATED HANDBOOK

Section 1: Paid Time Off (PTO) Policy

Employees are entitled to paid time off for rest and personal needs. Our PTO policy is designed to promote work-life balance while ensuring business continuity.

PTO Accrual:
- Full-time employees accrue 20 days (160 hours) of PTO annually
- PTO accrual begins after successful completion of 90-day probationary period
- PTO accrues at a rate of 1.67 days per month
- Part-time employees accrue PTO on a pro-rated basis
- Maximum PTO bank: 30 days (240 hours); excess is forfeited

Usage Guidelines:
- PTO requests must be submitted at least 2 weeks in advance for periods longer than 3 days
- Manager approval required for all PTO requests
- Holiday periods (Thanksgiving, Christmas, New Year) require 30-day advance notice
- Unused PTO does not roll over to the following year
- PTO cannot be cashed out except upon termination

Blackout Periods:
- First week of each quarter (financial close)
- Annual audit period (March 1-31)
- Department-specific blackout periods may apply

Section 2: Health Insurance Coverage

We offer comprehensive health insurance plans to eligible employees and their dependents.

Eligibility:
- Full-time employees (30+ hours/week) are eligible
- Coverage begins on the first day of the month following 30 days of employment
- Eligible dependents include spouse and children under 26

Plan Options:
- PPO Plan: $250 deductible, 80/20 coinsurance, $3,000 out-of-pocket max
- HMO Plan: $100 deductible, 90/10 coinsurance, $2,000 out-of-pocket max
- HSA-eligible HDHP: $1,500 deductible, 100% after deductible, $4,000 max

Employee Contributions (per month):
- Employee only: $150 (PPO), $100 (HMO), $75 (HDHP)
- Employee + Spouse: $350 (PPO), $280 (HMO), $220 (HDHP)
- Employee + Family: $500 (PPO), $420 (HMO), $340 (HDHP)

Coverage Includes:
- Preventive care at 100% (no cost sharing)
- Emergency room visits (subject to deductible)
- Prescription drugs (3-tier copay structure)
- Mental health services (same as medical)
- Dental and vision available as separate add-ons

Section 3: Remote Work Policy

Our hybrid work model allows flexibility while maintaining collaboration and culture.

Eligibility:
- Employees must complete 6 months of employment before becoming eligible
- Manager approval required
- Position must be suitable for remote work
- Performance must meet or exceed expectations

Requirements:
- Employees must work from primary residence within the United States
- Dedicated workspace with reliable internet (minimum 25 Mbps)
- Available during core hours (10 AM - 3 PM local time)
- Attend all required in-person meetings and events

Schedule Options:
- Hybrid: 2-3 days in office per week (Monday and Thursday required)
- Fully Remote: Available for specific roles only
- Flexible: Varies by department and business need

Equipment:
- Company provides laptop, monitor, keyboard, mouse
- $500 home office stipend for furniture/accessories
- IT support available during business hours
- Security requirements must be met (VPN, encrypted drives)

Performance Expectations:
- Same productivity and quality standards as in-office work
- Regular check-ins with manager (minimum weekly)
- Response time: within 2 hours during core hours
- Remote work privilege may be revoked for performance issues
""",
    
    "loan_products": """
LOAN PRODUCTS GUIDE - CONSUMER LENDING

Product 1: Personal Loans

Personal loans provide flexible funding for debt consolidation, home improvements, major purchases, or other personal needs.

Loan Amounts: $5,000 to $50,000
Terms: 12, 24, 36, 48, or 60 months
Interest Rates: 7.99% - 18.99% APR (based on creditworthiness)

Eligibility Requirements:
- Minimum credit score: 650
- Minimum annual income: $35,000
- Debt-to-income ratio: Maximum 43%
- Employment: Minimum 2 years current job or 3 years in same field
- U.S. citizen or permanent resident

Application Process:
- Online application (15 minutes)
- Instant pre-qualification decision
- Document submission: 2 pay stubs, 2 bank statements, tax returns
- Final approval: 1-3 business days
- Funding: 1-2 business days after approval

Fees:
- Origination fee: 1-5% of loan amount
- Late payment fee: $35 or 5% of payment (whichever is greater)
- Returned payment fee: $35
- No prepayment penalty

Product 2: Home Mortgage Loans

We offer competitive mortgage products for home purchase and refinancing.

Conventional Mortgages:
- Loan amounts up to $726,200 (conforming limits)
- Down payment: Minimum 5% (20% to avoid PMI)
- Terms: 15, 20, or 30 years
- Interest rates: 6.50% - 8.25% (based on credit, LTV, term)

FHA Loans:
- Down payment: Minimum 3.5%
- Credit score: Minimum 580
- Mortgage insurance required
- Loan limits vary by county

VA Loans (for eligible veterans):
- No down payment required
- No mortgage insurance
- Competitive interest rates
- Funding fee: 2.3% (can be financed)

Jumbo Mortgages (over $726,200):
- Down payment: Minimum 20%
- Credit score: Minimum 700
- Cash reserves: 6-12 months PITI
- Rates typically 0.25-0.50% higher than conforming

Qualification Guidelines:
- Credit score: Minimum 620 (conventional), 580 (FHA)
- DTI ratio: Maximum 43% (some exceptions to 50%)
- Employment verification: 2 years history
- Down payment must be from acceptable sources
- Property must appraise at or above purchase price

Closing Costs:
- Typically 2-5% of loan amount
- Includes: appraisal ($500-800), title insurance, origination fee (0.5-1%), inspections, recording fees
- Seller may contribute up to 3-6% toward closing costs

Product 3: Auto Loans

Competitive financing for new and used vehicle purchases.

New Vehicle Loans:
- Loan amounts: $5,000 to $100,000
- Terms: 24, 36, 48, 60, or 72 months
- Rates: 4.99% - 9.99% APR
- Maximum LTV: 125%

Used Vehicle Loans:
- Vehicle age: Up to 8 years old
- Mileage: Maximum 100,000 miles
- Terms: 24, 36, 48, or 60 months
- Rates: 5.99% - 11.99% APR
- Maximum LTV: 120%

Requirements:
- Credit score: Minimum 600
- Down payment: Minimum 10% for used, 5% for new
- Income verification required
- Full coverage insurance required
- Vehicle must meet bank's age and mileage criteria

Special Programs:
- Recent graduate program: 0.25% rate discount
- Multi-car household: 0.15% discount
- Auto-pay discount: 0.25% rate reduction
- Trade-in assistance available

Application Process:
- Pre-approval available (does not affect credit score)
- Online application (10 minutes)
- Decision: Same day for most applicants
- Funding: 24-48 hours after approval
- Dealership direct financing available
""",

    "compliance_guidelines": """
COMPLIANCE AND REGULATORY GUIDELINES

Anti-Money Laundering (AML) Requirements

Our AML program ensures compliance with the Bank Secrecy Act and USA PATRIOT Act.

Customer Due Diligence (CDD):
- All new customers must be verified using government-issued ID
- Collect: Full legal name, date of birth, physical address, SSN/TIN
- Document type and ID number must be recorded
- Verification must occur before account opening

Enhanced Due Diligence (EDD) Required For:
- Cash-intensive businesses
- Non-resident aliens
- Politically exposed persons (PEPs)
- High-risk jurisdictions
- Customers with prior suspicious activity

Transaction Monitoring:
- All transactions over $10,000 must be reported (CTR)
- Structured transactions must be identified and reported (SAR)
- Wire transfers require additional screening
- Daily monitoring of all accounts for unusual patterns

Suspicious Activity Reporting (SAR):
- File within 30 days of detecting suspicious activity
- Threshold: $5,000 for most violations
- Threshold: $25,000 for securities violations
- Customer must not be notified of SAR filing
- Maintain SAR confidentiality

Red Flags to Report:
- Customer reluctant to provide information
- Multiple accounts used to avoid reporting thresholds
- Large cash deposits inconsistent with business
- Sudden increase in transaction volume
- Transactions with high-risk countries
- Frequent wire transfers with no clear purpose

Know Your Customer (KYC) Program

Ongoing customer monitoring to assess risk and detect changes.

Risk Categories:
- Low Risk: Employed individuals, small balances, low transaction volume
- Medium Risk: Self-employed, moderate balances, regular transactions
- High Risk: Cash businesses, high balances, frequent international activity

Review Frequency:
- Low risk: Annual review
- Medium risk: Semi-annual review
- High risk: Quarterly review or continuous monitoring

Information to Update:
- Current address and contact information
- Employment status and income sources
- Business activities and ownership
- Expected account activity and transaction patterns
- Source of funds for large deposits

OFAC Screening Requirements

Screen against Office of Foreign Assets Control lists.

When to Screen:
- All new customer accounts
- All wire transfers (domestic and international)
- Large cash transactions
- When customer information changes
- Daily batch screening of all customers

Blocked Person Lists:
- Specially Designated Nationals (SDN) list
- Sectoral Sanctions Identifications List
- Foreign Sanctions Evaders List
- Country-based sanctions programs

Screening Procedures:
- Use automated screening software
- Check names, addresses, ID numbers
- Screen beneficial owners and related parties
- Document all screening results
- Escalate potential matches immediately

Response to Matches:
- Block transaction immediately
- Do not notify customer
- Contact Compliance Department within 1 hour
- File report with OFAC within 10 days
- Freeze account if required

Privacy and Data Protection

Safeguarding customer information is paramount.

Gramm-Leach-Bliley Act (GLBA) Compliance:
- Provide privacy notice at account opening
- Annual privacy notice required
- Opt-out must be offered for information sharing
- Notice of adverse action when required

Information Security:
- Encrypt all sensitive data at rest and in transit
- Access controls: minimum necessary principle
- Password requirements: 12+ characters, MFA enabled
- Log and monitor all access to customer data
- Report security incidents within 24 hours

Permissible Data Sharing:
- With customer consent
- To service providers (with contract protections)
- To comply with legal requirements
- To prevent fraud or unauthorized transactions
- To report to credit bureaus

Prohibited Activities:
- Sharing account numbers for marketing
- Selling customer information to third parties
- Accessing accounts without business need
- Discussing customer information in public areas
- Removing customer data from secure systems

Fair Lending Requirements

We are committed to fair and equal treatment of all customers.

Equal Credit Opportunity Act (ECOA):
- Cannot discriminate based on protected classes
- Protected classes: race, color, religion, national origin, sex, marital status, age (if of legal age), income from public assistance
- Adverse action notice required within 30 days
- Specific reasons for denial must be provided

Fair Housing Act:
- Applies to residential real estate loans
- Cannot redline or steer applicants
- Marketing must reach diverse communities
- Underwriting must be consistent and objective

Documentation Requirements:
- All credit decisions must be documented
- Exceptions to policy must be justified in writing
- Maintain evidence of fair lending compliance
- Monitor for disparate impact
- Regular fair lending training required
"""
}

# Create Document objects
documents = []
for doc_type, content in BANK_KNOWLEDGE_BASE.items():
    doc = Document(
        content=content,
        metadata={
            "type": doc_type,
            "source": "bank_knowledge_base",
            "created_at": datetime.now().isoformat()
        }
    )
    documents.append(doc)

print(f"✓ Loaded {len(documents)} knowledge base documents")
print(f"✓ Total characters: {sum(len(doc.content) for doc in documents):,}")
print(f"✓ Total tokens (approx): {sum(count_tokens(doc.content) for doc in documents):,}")

---

## PART 1: DOCUMENT CHUNKING STRATEGIES (Lab 4.1)

**Duration:** 30 minutes  
**Objective:** Implement and compare different chunking strategies


### Challenge 1.1: Fixed-Size Chunking (10 minutes)

**Strategy:** Split text into chunks of fixed token count

In [None]:
class FixedSizeChunker:
    """Split text into fixed-size chunks with optional overlap"""
    
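    def __init__(self, chunk_size: int = 500, overlap: int = 50):
        # overlap >= chunk_size would stop the window from advancing (infinite loop)
        if not 0 <= overlap < chunk_size:
            raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
        self.chunk_size = chunk_size
        self.overlap = overlap
        self.encoding = tiktoken.encoding_for_model("gpt-4")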
    
    def chunk_document(self, document: Document) -> List[Chunk]:
        """Split document into fixed-size chunks"""
        
        # Tokenize entire document
        tokens = self.encoding.encode(document.content)
        total_tokens = len(tokens)
        
        chunks = []
        start = 0
        chunk_num = 0
        
        while start < total_tokens:
            # Calculate end position
            end = min(start + self.chunk_size, total_tokens)
            
            # Extract chunk tokens
            chunk_tokens = tokens[start:end]
            
            # Decode back to text
            chunk_text = self.encoding.decode(chunk_tokens)
            
            # Create Chunk object
            chunk = Chunk(
                content=chunk_text,
                chunk_id=f"{document.doc_id}_chunk_{chunk_num}",
                doc_id=document.doc_id,
                metadata={
                    **document.metadata,
                    "chunk_num": chunk_num,
                    "total_chunks": None,  # Will update later
                    "chunking_strategy": "fixed_size",
                    "chunk_size": self.chunk_size
                },
                start_index=start,  # token offset, not character offset
                end_index=end       # token offset (exclusive)
            )
            chunks.append(chunk)
            
            # If we've reached the end, break out of the loop
            if end >= total_tokens:
                break
            
            # Move to next chunk (with overlap)
            start = end - self.overlap
            chunk_num += 1
        
        # Update total_chunks metadata
        for chunk in chunks:
            chunk.metadata["total_chunks"] = len(chunks)
        
        return chunks


# Test fixed-size chunking
print("="*80)
print("TESTING FIXED-SIZE CHUNKING")
print("="*80 + "\n")

chunker = FixedSizeChunker(chunk_size=500, overlap=50)

all_chunks = []
for doc in documents:
    chunks = chunker.chunk_document(doc)
    all_chunks.extend(chunks)
    print(f"Document '{doc.metadata['type']}':")
    print(f"  Original tokens: {count_tokens(doc.content)}")
    print(f"  Created {len(chunks)} chunks")
    print(f"  Avg chunk size: {np.mean([count_tokens(c.content) for c in chunks]):.0f} tokens")
    print()

print(f"Total chunks created: {len(all_chunks)}")
print("\nSample chunk:")
print("-" * 80)
print(all_chunks[0].content[:500] + "...")
print("-" * 80)
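
To see how overlap affects the number of chunks, it can help to look at the window arithmetic on its own, without a tokenizer. The sketch below mirrors the loop in `FixedSizeChunker` using plain integer offsets; `window_bounds` is a hypothetical helper written for this illustration.

```python
import math
from typing import List, Tuple


def window_bounds(total: int, chunk_size: int, overlap: int) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets for overlapping fixed-size windows."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    bounds = []
    start = 0
    while start < total:
        end = min(start + chunk_size, total)
        bounds.append((start, end))
        if end >= total:
            break
        start = end - overlap  # each step advances by chunk_size - overlap
    return bounds


bounds = window_bounds(total=1200, chunk_size=500, overlap=50)
print(bounds)  # [(0, 500), (450, 950), (900, 1200)]

# Closed form, valid when total > overlap:
# ceil((total - overlap) / (chunk_size - overlap))
expected = math.ceil((1200 - 50) / (500 - 50))
print(len(bounds), expected)  # 3 3
```

Each window after the first contributes only `chunk_size - overlap` new tokens, which is why a larger overlap produces more chunks (and more embedding calls) for the same document.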

### Challenge 1.2: Semantic Chunking (15 minutes)

**Strategy:** Split by semantic boundaries (paragraphs, sections)

In [None]:
class SemanticChunker:
    """Split text by semantic boundaries (sections, paragraphs)"""
    
    def __init__(self, max_chunk_size: int = 800):
        self.max_chunk_size = max_chunk_size
        self.encoding = tiktoken.encoding_for_model("gpt-4")
    
    def split_by_sections(self, text: str) -> List[str]:
        """Split text by section headers"""
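        # Look for section patterns like "Section 1:", "Product 1:", etc.
        # The lookahead splits on the newline only, so each header stays
        # attached to its section instead of being consumed by re.split.
        section_pattern = r'\n(?=(?:Section|Product|Chapter|Part)\s+\d+[:.])'
        sections = re.split(section_pattern, text)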
        
        # Filter empty sections
        sections = [s.strip() for s in sections if s.strip()]
        
        return sections
    
    def split_by_paragraphs(self, text: str) -> List[str]:
        """Split text by paragraph breaks"""
        # Split on double newlines
        paragraphs = re.split(r'\n\s*\n', text)
        paragraphs = [p.strip() for p in paragraphs if p.strip()]
        
        return paragraphs
    
    def merge_small_chunks(self, chunks: List[str]) -> List[str]:
        """Greedily merge adjacent chunks while the result fits in max_chunk_size"""
        merged = []
        current = ""
        
        for chunk in chunks:
            chunk_tokens = count_tokens(chunk)
            current_tokens = count_tokens(current)
            
            if current_tokens + chunk_tokens <= self.max_chunk_size:
                current = (current + "\n\n" + chunk).strip()
            else:
                if current:
                    merged.append(current)
                current = chunk
        
        if current:
            merged.append(current)
        
        return merged
    
    def chunk_document(self, document: Document) -> List[Chunk]:
        """Split document by semantic boundaries"""
        
        # Try section-based splitting first
        sections = self.split_by_sections(document.content)
        
        if len(sections) <= 1:
            # Fall back to paragraph splitting
            sections = self.split_by_paragraphs(document.content)
        
        # Merge small chunks
        merged_chunks = self.merge_small_chunks(sections)
        
        # Split any chunks that are too large
        final_text_chunks = []
        for chunk_text in merged_chunks:
            if count_tokens(chunk_text) > self.max_chunk_size:
                # Use fixed-size chunking for oversized chunks
                temp_doc = Document(content=chunk_text, metadata=document.metadata)
                fixed_chunker = FixedSizeChunker(chunk_size=self.max_chunk_size, overlap=0)
                sub_chunks = fixed_chunker.chunk_document(temp_doc)
                final_text_chunks.extend([c.content for c in sub_chunks])
            else:
                final_text_chunks.append(chunk_text)
        
        # Create Chunk objects
        chunks = []
        for i, chunk_text in enumerate(final_text_chunks):
            chunk = Chunk(
                content=chunk_text,
                chunk_id=f"{document.doc_id}_semantic_{i}",
                doc_id=document.doc_id,
                metadata={
                    **document.metadata,
                    "chunk_num": i,
                    "total_chunks": len(final_text_chunks),
                    "chunking_strategy": "semantic",
                    "max_chunk_size": self.max_chunk_size
                }
            )
            chunks.append(chunk)
        
        return chunks


# Test semantic chunking
print("\n" + "="*80)
print("TESTING SEMANTIC CHUNKING")
print("="*80 + "\n")

semantic_chunker = SemanticChunker(max_chunk_size=800)

semantic_chunks = []
for doc in documents:
    chunks = semantic_chunker.chunk_document(doc)
    semantic_chunks.extend(chunks)
    print(f"Document '{doc.metadata['type']}':")
    print(f"  Original tokens: {count_tokens(doc.content)}")
    print(f"  Created {len(chunks)} semantic chunks")
    print(f"  Avg chunk size: {np.mean([count_tokens(c.content) for c in chunks]):.0f} tokens")
    print(f"  Chunk sizes: {[count_tokens(c.content) for c in chunks]}")
    print()

print(f"Total semantic chunks: {len(semantic_chunks)}")
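
One detail of regex-based section splitting is worth seeing in isolation: `re.split` discards whatever text the pattern matches, so whether a header label such as "Section 1:" survives in the chunk depends on whether the pattern consumes it or only looks ahead at it. The sketch below compares the two on a toy string; the sample text and variable names are made up for illustration.

```python
import re

text = "Intro\nSection 1: PTO\nbody a\nSection 2: Health\nbody b"

# Pattern that consumes the header label: the label is dropped from the output
consuming = r'\n(?:Section|Product)\s+\d+[:.]'
# Pattern that matches only the newline and looks ahead at the header
lookahead = r'\n(?=(?:Section|Product)\s+\d+[:.])'

dropped = [s.strip() for s in re.split(consuming, text) if s.strip()]
kept = [s.strip() for s in re.split(lookahead, text) if s.strip()]

print(dropped)  # ['Intro', 'PTO\nbody a', 'Health\nbody b']
print(kept)     # ['Intro', 'Section 1: PTO\nbody a', 'Section 2: Health\nbody b']
```

Keeping the header inside the chunk usually helps retrieval, since the header text often carries the terms a user will search for.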

### Challenge 1.3: Chunking Comparison (5 minutes)

**Objective:** Compare different strategies

In [None]:
def compare_chunking_strategies(document: Document) -> pd.DataFrame:
    """Compare different chunking strategies on same document"""
    
    strategies = {
        "Fixed 300": FixedSizeChunker(chunk_size=300, overlap=30),
        "Fixed 500": FixedSizeChunker(chunk_size=500, overlap=50),
        "Fixed 800": FixedSizeChunker(chunk_size=800, overlap=80),
        "Semantic 600": SemanticChunker(max_chunk_size=600),
        "Semantic 800": SemanticChunker(max_chunk_size=800),
    }
    
    results = []
    
    for name, chunker in strategies.items():
        chunks = chunker.chunk_document(document)
        
        chunk_sizes = [count_tokens(c.content) for c in chunks]
        
        results.append({
            "Strategy": name,
            "Num Chunks": len(chunks),
            "Min Size": int(np.min(chunk_sizes)),
            "Avg Size": int(np.mean(chunk_sizes)),
            "Max Size": int(np.max(chunk_sizes)),
            "Std Dev": int(np.std(chunk_sizes))
        })
    
    return pd.DataFrame(results)


# Compare strategies
print("\n" + "="*80)
print("CHUNKING STRATEGY COMPARISON")
print("="*80 + "\n")

for doc in documents[:2]:  # Compare on first 2 documents
    print(f"\nDocument: {doc.metadata['type']}")
    print(f"Original size: {count_tokens(doc.content)} tokens\n")
    
    comparison = compare_chunking_strategies(doc)
    print(comparison.to_string(index=False))
    print()

---

## PART 2: EMBEDDINGS & SEMANTIC SEARCH (Lab 4.2)

**Duration:** 30 minutes  
**Objective:** Generate embeddings and implement semantic search

### Challenge 2.1: Generate Embeddings (10 minutes)

In [None]:
class EmbeddingGenerator:
    """Generate and manage embeddings for chunks"""
    
    def __init__(self, model: str = EMBEDDING_MODEL):
        self.model = model
        self.cache = {}  # Simple cache to avoid re-embedding
    
    def embed_text(self, text: str) -> List[float]:
        """Generate embedding for single text"""
        # Check cache (keyed on the text itself so distinct texts cannot collide)
        cache_key = text
        if cache_key in self.cache:
            return self.cache[cache_key]
        
        # Generate embedding
        embedding = get_embedding(text, self.model)
        
        # Cache it
        self.cache[cache_key] = embedding
        
        return embedding
    
    def embed_chunks(self, chunks: List[Chunk], show_progress: bool = True) -> List[Chunk]:
        """Generate embeddings for all chunks"""
        total = len(chunks)
        
        if show_progress:
            print(f"Generating embeddings for {total} chunks...")
        
        for i, chunk in enumerate(chunks):
            if chunk.embedding is None:
                chunk.embedding = self.embed_text(chunk.content)
            
            if show_progress and (i + 1) % 5 == 0:
                print(f"  Progress: {i + 1}/{total} chunks embedded")
        
        if show_progress:
            print(f"✓ All {total} chunks embedded\n")
        
        return chunks
    
    def get_cache_stats(self) -> Dict[str, int]:
        """Return cache statistics"""
        return {
            "cached_embeddings": len(self.cache),
            "total_dimensions": EMBEDDING_DIMENSIONS
        }


# Generate embeddings for all chunks
print("="*80)
print("GENERATING EMBEDDINGS")
print("="*80 + "\n")

# Use fixed-size chunks from earlier
embedder = EmbeddingGenerator()
embedded_chunks = embedder.embed_chunks(all_chunks, show_progress=True)

# Verify embeddings
sample_chunk = embedded_chunks[0]
print(f"Sample chunk: {sample_chunk.chunk_id}")
print(f"Embedding dimensions: {len(sample_chunk.embedding)}")
print(f"First 10 values: {sample_chunk.embedding[:10]}")
print(f"\nCache stats: {embedder.get_cache_stats()}")
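
Embedding one chunk per request is simple but slow for larger corpora. The OpenAI embeddings endpoint also accepts a list of inputs, so chunks can be sent in batches. The sketch below shows only the batching logic, with the actual API call injected as a callable; `batched`, `embed_in_batches`, and `embed_batch` are hypothetical names, and the batch size of 64 is an arbitrary assumption.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for i in range(0, len(items), size):
        yield items[i:i + size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 64,
) -> List[List[float]]:
    """Embed texts batch by batch, preserving input order."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        result = embed_batch(batch)
        if len(result) != len(batch):
            raise RuntimeError("embedding count does not match batch size")
        vectors.extend(result)
    return vectors


# Possible embed_batch using the client configured earlier (assumes the
# response items come back in input order):
# embed_batch = lambda b: [
#     d.embedding
#     for d in client.embeddings.create(model=EMBEDDING_MODEL, input=list(b)).data
# ]
```

Injecting the embedding function keeps the batching logic testable without network access and makes it easy to swap in a different provider.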

### Challenge 2.2: Semantic Similarity Search (15 minutes)

In [None]:
class SemanticSearchEngine:
    """Perform semantic search across embedded chunks"""
    
    def __init__(self, chunks: List[Chunk]):
        self.chunks = chunks
        self.embeddings = np.array([c.embedding for c in chunks])
        self.embedder = EmbeddingGenerator()
    
    def search(
        self,
        query: str,
        top_k: int = 5,
        min_similarity: float = 0.0
    ) -> List[Tuple[Chunk, float]]:
        """
        Search for chunks most similar to query
        
        Args:
            query: Search query
            top_k: Number of results to return
            min_similarity: Minimum similarity threshold
        
        Returns:
            List of (chunk, similarity_score) tuples
        """
        # Generate query embedding
        query_embedding = np.array(self.embedder.embed_text(query)).reshape(1, -1)
        
        # Calculate cosine similarities
        similarities = cosine_similarity(query_embedding, self.embeddings)[0]
        
        # Get top-k indices
        top_indices = np.argsort(similarities)[::-1][:top_k]
        
        # Filter by minimum similarity
        results = []
        for idx in top_indices:
            similarity = similarities[idx]
            if similarity >= min_similarity:
                results.append((self.chunks[idx], float(similarity)))
        
        return results
    
    def search_with_metadata_filter(
        self,
        query: str,
        metadata_filter: Dict[str, Any],
        top_k: int = 5
    ) -> List[Tuple[Chunk, float]]:
        """Search with metadata filtering"""
        # Filter chunks by metadata
        filtered_chunks = [
            c for c in self.chunks
            if all(c.metadata.get(k) == v for k, v in metadata_filter.items())
        ]
        
        if not filtered_chunks:
            return []
        
        # Create temporary search engine with filtered chunks
        temp_engine = SemanticSearchEngine(filtered_chunks)
        return temp_engine.search(query, top_k)


# Test semantic search
print("\n" + "="*80)
print("SEMANTIC SEARCH TESTING")
print("="*80 + "\n")

search_engine = SemanticSearchEngine(embedded_chunks)

# Test queries
test_queries = [
    "How many vacation days do employees get?",
    "What is the minimum credit score for a personal loan?",
    "What are the AML requirements for new customers?",
    "Can I work from home?",
    "How much is the mortgage origination fee?"
]

for query in test_queries:
    print(f"Query: '{query}'")
    print("-" * 80)
    
    results = search_engine.search(query, top_k=3)
    
    for i, (chunk, similarity) in enumerate(results, 1):
        print(f"\nResult {i} (similarity: {similarity:.3f}):")
        print(f"Source: {chunk.metadata['type']}")
        print(f"Content preview: {chunk.content[:200]}...")
    
    print("\n" + "="*80 + "\n")
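
The search above delegates the math to scikit-learn. For intuition, cosine similarity is the dot product of two vectors divided by the product of their lengths, and top-k retrieval is a sort on that score. The sketch below works through this in plain Python on toy 2-d vectors; the function names and sample vectors are illustrative assumptions, not part of the lab code.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: List[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the k most similar vectors."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranked = top_k([1.0, 0.0], vectors, k=2)
print([i for i, _ in ranked])  # [0, 2]
```

The identical vector ranks first, the diagonal vector second (similarity of about 0.707), and the orthogonal vector is excluded with a similarity of 0.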

---

## PART 3: VECTOR DATABASE & RETRIEVAL (Lab 4.3)

**Duration:** 30 minutes  
**Objective:** Build a scalable vector database for production

### Challenge 3.1: ChromaDB Integration (20 minutes)

In [None]:
class VectorDatabase:
    """Production-grade vector database using ChromaDB"""
    
    def __init__(self, collection_name: str = "bank_knowledge"):
        # Initialize an in-memory ChromaDB client (data is lost when the process exits)
        self.client = chromadb.Client(Settings(
            anonymized_telemetry=False,
            allow_reset=True
        ))
        
        # Reset if exists (for development)
        try:
            self.client.delete_collection(collection_name)
        except Exception:
            # Collection did not exist yet; nothing to reset
            pass
        
        # Create collection
        self.collection = self.client.create_collection(
            name=collection_name,
            metadata={"description": "Bank knowledge base"}
        )
        
        self.embedder = EmbeddingGenerator()
    
    def add_chunks(self, chunks: List[Chunk], show_progress: bool = True):
        """Add chunks to vector database"""
        if show_progress:
            print(f"Adding {len(chunks)} chunks to vector database...")
        
        # Prepare data for batch insertion
        ids = []
        embeddings = []
        documents = []
        metadatas = []
        
        for chunk in chunks:
            ids.append(chunk.chunk_id)
            
            # Ensure chunk has embedding
            if chunk.embedding is None:
                chunk.embedding = self.embedder.embed_text(chunk.content)
            
            embeddings.append(chunk.embedding)
            documents.append(chunk.content)
            metadatas.append(chunk.metadata)
        
        # Batch insert
        self.collection.add(
            ids=ids,
            embeddings=embeddings,
            documents=documents,
            metadatas=metadatas
        )
        
        if show_progress:
            print(f"✓ Added {len(chunks)} chunks to database\n")
    
    def search(
        self,
        query: str,
        n_results: int = 5,
        where: Optional[Dict] = None
    ) -> Dict[str, Any]:
        """
        Search vector database
        
        Args:
            query: Search query
            n_results: Number of results to return
            where: Metadata filter (e.g., {"type": "hr_policies"})
        
        Returns:
            Dictionary with ids, documents, distances, metadatas
        """
        # Generate query embedding
        query_embedding = self.embedder.embed_text(query)
        
        # Search
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=n_results,
            where=where
        )
        
        return results
    
    def hybrid_search(
        self,
        query: str,
        keyword: Optional[str] = None,
        n_results: int = 5
    ) -> Dict[str, Any]:
        """
        Hybrid search combining vector similarity and keyword filtering
        """
        # Semantic search
        semantic_results = self.search(query, n_results=n_results*2)
        
        if keyword is None:
            return semantic_results
        
        # Filter by keyword
        keyword_lower = keyword.lower()
        filtered_ids = []
        filtered_docs = []
        filtered_distances = []
        filtered_metas = []
        
        for i, doc in enumerate(semantic_results['documents'][0]):
            if keyword_lower in doc.lower():
                filtered_ids.append(semantic_results['ids'][0][i])
                filtered_docs.append(doc)
                filtered_distances.append(semantic_results['distances'][0][i])
                filtered_metas.append(semantic_results['metadatas'][0][i])
                
                if len(filtered_ids) >= n_results:
                    break
        
        return {
            'ids': [filtered_ids],
            'documents': [filtered_docs],
            'distances': [filtered_distances],
            'metadatas': [filtered_metas]
        }
    
    def get_stats(self) -> Dict[str, Any]:
        """Get database statistics"""
        return {
            "total_chunks": self.collection.count(),
            "collection_name": self.collection.name
        }


# Initialize vector database
print("="*80)
print("VECTOR DATABASE SETUP")
print("="*80 + "\n")

vector_db = VectorDatabase(collection_name="bank_knowledge_base")
vector_db.add_chunks(embedded_chunks, show_progress=True)

print(f"Database stats: {vector_db.get_stats()}\n")

# Test vector database search
print("="*80)
print("VECTOR DATABASE SEARCH TESTING")
print("="*80 + "\n")

test_queries = [
    ("How many PTO days do I get?", None),
    ("What credit score do I need for a mortgage?", None),
    ("Tell me about AML requirements", "suspicious"),  # Hybrid search
]

for query, keyword in test_queries:
    print(f"Query: '{query}'")
    if keyword:
        print(f"Keyword filter: '{keyword}'")
    print("-" * 80)
    
    if keyword:
        results = vector_db.hybrid_search(query, keyword=keyword, n_results=3)
    else:
        results = vector_db.search(query, n_results=3)
    
    for i, (doc, distance, meta) in enumerate(zip(
        results['documents'][0],
        results['distances'][0],
        results['metadatas'][0]
    ), 1):
        # Similarity-like score; with Chroma's default L2 distance this can be negative
        similarity = 1 - distance
        print(f"\nResult {i} (similarity: {similarity:.3f}):")
        print(f"Source: {meta['type']}")
        print(f"Preview: {doc[:200]}...")
    
    print("\n" + "="*80 + "\n")
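
The `1 - distance` conversion used in the search test is a true cosine similarity only when the collection reports cosine distance. As a rough sketch, assuming unit-length embeddings and a collection that returns squared L2 distance, cosine similarity can be recovered as `1 - d/2`, because for unit vectors `||a - b||^2 = 2 - 2*cos(a, b)`. The helper names below (`normalize`, `squared_l2`, `l2sq_to_cosine`) are hypothetical and use only the standard library.

```python
import math
from typing import List

def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dot(a: List[float], b: List[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(x * y for x, y in zip(a, b))

def l2sq_to_cosine(d: float) -> float:
    """Assumes unit vectors: ||a - b||^2 = 2 - 2*cos, so cos = 1 - d/2."""
    return 1.0 - d / 2.0

a = normalize([1.0, 2.0, 3.0])
b = normalize([3.0, 1.0, -2.0])

d = squared_l2(a, b)
print(f"squared L2 distance:  {d:.4f}")
print(f"1 - d   (lab formula): {1 - d:.4f}")
print(f"1 - d/2 (recovered):   {l2sq_to_cosine(d):.4f}")
print(f"true cosine:           {dot(a, b):.4f}")
```

With these two nearly orthogonal vectors, `1 - d` falls below -1 while the recovered value matches the true cosine, which is why the lab treats its score as "similarity-like". Chroma also appears to support creating a collection with a cosine space (in the versions we have seen, via `metadata={"hnsw:space": "cosine"}`); check the documentation for your installed version before relying on it.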

---

## PART 4: COMPLETE RAG PIPELINE (Lab 4.4)

**Duration:** 40 minutes  
**Objective:** Build end-to-end RAG system from query to answer

### Challenge 4.1: Basic RAG Implementation (20 minutes)

In [None]:
class RAGSystem:
    """Complete Retrieval-Augmented Generation system"""
    
    def __init__(self, vector_db: VectorDatabase, model: str = GPT4):
        self.vector_db = vector_db
        self.model = model
        self.query_log = []
    
    def retrieve(
        self,
        query: str,
        n_results: int = 5,
        min_similarity: float = -1.0
    ) -> List[Dict[str, Any]]:
        """
        Retrieve relevant chunks for query
        
        Returns list of chunks with content and metadata
        
        Note: ChromaDB uses squared L2 distance by default, so `1 - distance` is a
        similarity-like score (higher is better), not a true cosine similarity.
        Lower distance = higher similarity. Default min_similarity of -1.0 keeps any
        result whose distance is at most 2.0.
        """
        # Search vector database
        results = self.vector_db.search(query, n_results=n_results)
        
        # Convert to list of dicts and filter by similarity
        retrieved = []
        for i, (doc, distance, meta) in enumerate(zip(
            results['documents'][0],
            results['distances'][0],
            results['metadatas'][0]
        )):
            # Convert distance to similarity-like score (lower distance = better match)
            # Note: With L2 distance, this can be negative for distant matches
            similarity = 1 - distance
            if similarity >= min_similarity:
                retrieved.append({
                    'content': doc,
                    'metadata': meta,
                    'similarity': similarity,
                    'distance': distance  # Keep original distance for reference
                })
        
        return retrieved
    
    def generate_answer(
        self,
        query: str,
        context_chunks: List[Dict[str, Any]],
        include_sources: bool = True
    ) -> str:
        """
        Generate answer using retrieved context
        """
        # Build context from chunks
        context_parts = []
        for i, chunk in enumerate(context_chunks, 1):
            context_parts.append(f"[Source {i}] {chunk['content']}")
        
        context = "\n\n".join(context_parts)
        
        # Create prompt
        prompt = f"""
Answer the following question using ONLY the provided context. Be specific and cite sources.

Context:
{context}

Question: {query}

Instructions:
- Answer based ONLY on the context provided
- If the context doesn't contain enough information, say so
- Cite which source(s) you used (e.g., "According to Source 1...")
- Be concise but complete
- If multiple sources provide relevant info, synthesize them

Answer:
"""
        
        # Generate answer
        answer = call_llm(
            prompt,
            system_prompt="You are a helpful assistant answering questions based on provided context. Always cite your sources.",
            model=self.model
        )
        
        # Optionally append source citations
        if include_sources:
            sources_text = "\n\nSources:\n"
            for i, chunk in enumerate(context_chunks, 1):
                sources_text += f"[{i}] {chunk['metadata']['type']} (similarity: {chunk['similarity']:.2f})\n"
            answer += sources_text
        
        return answer
    
    def query(
        self,
        question: str,
        n_results: int = 5,
        min_similarity: float = -1.0,
        include_sources: bool = True
    ) -> Dict[str, Any]:
        """
        Complete RAG query: retrieve + generate
        
        Returns dictionary with question, answer, retrieved chunks, metrics
        """
        start_time = time.time()
        
        # Retrieve relevant chunks
        retrieved_chunks = self.retrieve(question, n_results, min_similarity)
        retrieval_time = time.time() - start_time
        
        if not retrieved_chunks:
            return {
                'question': question,
                'answer': "I don't have enough information to answer that question.",
                'retrieved_chunks': [],
                'metrics': {
                    'retrieval_time': retrieval_time,
                    'generation_time': 0,
                    'total_time': retrieval_time,
                    'chunks_retrieved': 0,
                    'avg_similarity': 0
                }
            }
        
        # Generate answer
        gen_start = time.time()
        answer = self.generate_answer(question, retrieved_chunks, include_sources)
        generation_time = time.time() - gen_start
        
        total_time = time.time() - start_time
        
        # Log query
        self.query_log.append({
            'timestamp': datetime.now().isoformat(),
            'question': question,
            'chunks_retrieved': len(retrieved_chunks),
            'total_time': total_time
        })
        
        return {
            'question': question,
            'answer': answer,
            'retrieved_chunks': retrieved_chunks,
            'metrics': {
                'retrieval_time': retrieval_time,
                'generation_time': generation_time,
                'total_time': total_time,
                'chunks_retrieved': len(retrieved_chunks),
                'avg_similarity': np.mean([c['similarity'] for c in retrieved_chunks])
            }
        }
    
    def get_query_stats(self) -> Dict[str, Any]:
        """Get statistics about queries"""
        if not self.query_log:
            return {}
        
        return {
            'total_queries': len(self.query_log),
            'avg_time': np.mean([q['total_time'] for q in self.query_log]),
            'avg_chunks_retrieved': np.mean([q['chunks_retrieved'] for q in self.query_log])
        }


# Initialize RAG system
print("="*80)
print("RAG SYSTEM INITIALIZATION")
print("="*80 + "\n")

rag_system = RAGSystem(vector_db, model=GPT4)

print("✓ RAG system ready\n")

# Test RAG system
print("="*80)
print("RAG SYSTEM TESTING")
print("="*80 + "\n")

test_questions = [
    "How many vacation days do full-time employees get per year?",
    "What is the minimum credit score needed for a personal loan?",
    "What are the key AML requirements when opening a new account?",
    "Can employees work remotely and what are the requirements?",
    "What fees are associated with personal loans?"
]

for question in test_questions:
    print(f"Question: {question}")
    print("="*80)
    
    result = rag_system.query(question, n_results=3, include_sources=True)
    
    print(f"\nAnswer:\n{result['answer']}")
    
    print(f"\nMetrics:")
    print(f"  Retrieval time: {result['metrics']['retrieval_time']:.3f}s")
    print(f"  Generation time: {result['metrics']['generation_time']:.3f}s")
    print(f"  Total time: {result['metrics']['total_time']:.3f}s")
    print(f"  Chunks retrieved: {result['metrics']['chunks_retrieved']}")
    print(f"  Avg similarity: {result['metrics']['avg_similarity']:.3f}")
    
    print("\n" + "="*80 + "\n")
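
`generate_answer` concatenates every retrieved chunk into the prompt, so prompt size grows with `n_results`. The following is a minimal sketch of how the context could be capped, assuming chunks are dicts with a `content` key as returned by `retrieve`. `build_context` and `max_chars` are hypothetical names, and a character budget stands in for a real token count (which `tiktoken` could supply).

```python
from typing import Any, Dict, List, Tuple

def build_context(chunks: List[Dict[str, Any]], max_chars: int = 2000) -> Tuple[str, int]:
    """Join chunks as '[Source i] ...' blocks until the character budget is used up.

    Returns the context string and the number of chunks actually included.
    """
    parts: List[str] = []
    used = 0
    for i, chunk in enumerate(chunks, 1):
        block = f"[Source {i}] {chunk['content']}"
        separator = 2 if parts else 0  # length of the "\n\n" joiner
        if used + separator + len(block) > max_chars:
            break  # stop at the first chunk that does not fit, preserving rank order
        parts.append(block)
        used += separator + len(block)
    return "\n\n".join(parts), len(parts)

sample_chunks = [
    {"content": "A" * 50},
    {"content": "B" * 50},
    {"content": "C" * 50},
]
context, n_used = build_context(sample_chunks, max_chars=130)
print(f"chunks used: {n_used}, context length: {len(context)}")
```

Stopping at the first chunk that does not fit keeps the source numbering contiguous, so citations such as "Source 2" still line up with the retrieved list.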

### Challenge 4.2: RAG with Query Enhancement (15 minutes)

**Objective:** Improve retrieval with query expansion and reformulation

In [None]:
class EnhancedRAGSystem(RAGSystem):
    """RAG system with query enhancement"""
    
    def expand_query(self, query: str) -> List[str]:
        """
        Generate query variations to improve retrieval
        """
        expansion_prompt = f"""
Generate 2 alternative phrasings of this question that preserve the meaning but use different words:

Original: {query}

Return as JSON array: ["variation 1", "variation 2"]
"""
        
        response = call_llm(expansion_prompt, model=GPT35)
        
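        try:
            variations = json.loads(response)
        except (json.JSONDecodeError, TypeError):
            return [query]  # Not valid JSON: fall back to the original query
        if not isinstance(variations, list):
            return [query]
        # Keep only non-empty string variations
        return [query] + [v for v in variations if isinstance(v, str) and v.strip()]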
    
    def query_with_expansion(
        self,
        question: str,
        n_results: int = 5
    ) -> Dict[str, Any]:
        """
        Query with automatic query expansion
        """
        # Expand query
        query_variations = self.expand_query(question)
        
        # Retrieve for each variation
        all_chunks = {}  # Deduplicate by chunk text, keeping the best score per chunk
        
        for variant in query_variations:
            chunks = self.retrieve(variant, n_results=n_results)
            for chunk in chunks:
                # Key on the content: chunk_num may repeat across documents, and an
                # id(chunk) fallback never matches because every result is a new dict
                chunk_key = chunk['content']
                if chunk_key not in all_chunks or chunk['similarity'] > all_chunks[chunk_key]['similarity']:
                    all_chunks[chunk_key] = chunk
        
        # Get top chunks by similarity
        top_chunks = sorted(
            all_chunks.values(),
            key=lambda x: x['similarity'],
            reverse=True
        )[:n_results]
        
        # Generate answer
        answer = self.generate_answer(question, top_chunks, include_sources=True)
        
        return {
            'question': question,
            'query_variations': query_variations,
            'answer': answer,
            'retrieved_chunks': top_chunks
        }


# Test enhanced RAG
print("\n" + "="*80)
print("ENHANCED RAG WITH QUERY EXPANSION")
print("="*80 + "\n")

enhanced_rag = EnhancedRAGSystem(vector_db, model=GPT4)

test_question = "What do I need to qualify for a home loan?"

print(f"Question: {test_question}\n")
result = enhanced_rag.query_with_expansion(test_question, n_results=3)

print(f"Query variations generated:")
for i, var in enumerate(result['query_variations'], 1):
    print(f"  {i}. {var}")

print(f"\nAnswer:\n{result['answer']}")
print("="*80)
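
Query expansion depends on the model returning a bare JSON array, but chat models often wrap the array in extra prose. The sketch below shows one tolerant way to parse such replies; `parse_json_array` is a hypothetical helper, not part of the lab's code, and it simply returns an empty list when nothing usable is found.

```python
import json
import re
from typing import Any, List, Optional

def parse_json_array(text: Optional[str]) -> List[Any]:
    """Return the JSON array contained in an LLM reply, or [] if none parses."""
    # First attempt: the whole reply is valid JSON
    try:
        parsed = json.loads(text)
        if isinstance(parsed, list):
            return parsed
    except (json.JSONDecodeError, TypeError):
        pass

    # Second attempt: take the span from the first '[' to the last ']',
    # which covers replies that put prose before or after the array
    match = re.search(r"\[.*\]", text or "", re.DOTALL)
    if match:
        try:
            parsed = json.loads(match.group(0))
            if isinstance(parsed, list):
                return parsed
        except json.JSONDecodeError:
            pass
    return []

clean_reply = '["What is required for a mortgage?", "Home loan eligibility criteria"]'
chatty_reply = 'Here are two variations:\n["What is required for a mortgage?", "Home loan eligibility criteria"]'

print(parse_json_array(clean_reply))
print(parse_json_array(chatty_reply))
print(parse_json_array("Sorry, I cannot help with that."))
```

The same helper could serve the re-ranking step, where the reply is expected to be an array of numeric scores.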

---

## PART 5: ADVANCED RAG PATTERNS (Lab 4.5)

**Duration:** 20 minutes  
**Objective:** Implement production-grade RAG enhancements

### Challenge 5.1: Re-ranking for Precision (10 minutes)

In [None]:
class ReRankingRAG(RAGSystem):
    """RAG with two-stage retrieval: broad recall + precise reranking"""
    
    def rerank_with_llm(
        self,
        query: str,
        chunks: List[Dict[str, Any]],
        top_k: int = 5
    ) -> List[Dict[str, Any]]:
        """
        Re-rank chunks using LLM for better precision
        """
        # Create prompt for reranking
        chunks_text = "\n\n".join([
            f"[{i}] {chunk['content'][:300]}..."
            for i, chunk in enumerate(chunks, 1)
        ])
        
        rerank_prompt = f"""
Rate how relevant each text chunk is to answering this question.

Question: {query}

Text Chunks:
{chunks_text}

For each chunk, provide a relevance score from 0-10 where:
- 10 = Perfectly answers the question
- 7-9 = Highly relevant, contains key information
- 4-6 = Somewhat relevant, provides context
- 1-3 = Tangentially related
- 0 = Not relevant

Return JSON array: [score1, score2, score3, ...]
"""
        
        response = call_llm(rerank_prompt, model=GPT35)
        
        try:
            scores = json.loads(response)
            
            # Combine scores with chunks
            for i, chunk in enumerate(chunks):
                if i < len(scores):
                    chunk['rerank_score'] = scores[i]
                else:
                    chunk['rerank_score'] = 0
            
            # Sort by rerank score
            reranked = sorted(chunks, key=lambda x: x['rerank_score'], reverse=True)
            
            return reranked[:top_k]
            
        except Exception:
            # Unparseable or non-numeric scores: fall back to original ranking
            return chunks[:top_k]
    
    def query_with_reranking(
        self,
        question: str,
        initial_results: int = 20,
        final_results: int = 5
    ) -> Dict[str, Any]:
        """
        Two-stage retrieval: broad recall + precise reranking
        """
        start_time = time.time()
        
        # Stage 1: Broad retrieval (use permissive threshold to get more candidates)
        initial_chunks = self.retrieve(question, n_results=initial_results, min_similarity=-1.0)
        
        if not initial_chunks:
            # Fallback to basic query with consistent metrics structure
            result = self.query(question, n_results=final_results)
            result['metrics']['initial_retrieval'] = 0
            result['metrics']['final_results'] = len(result['retrieved_chunks'])
            return result
        
        # Stage 2: Rerank
        reranked_chunks = self.rerank_with_llm(question, initial_chunks, top_k=final_results)
        
        # Generate answer
        answer = self.generate_answer(question, reranked_chunks, include_sources=True)
        
        return {
            'question': question,
            'answer': answer,
            'retrieved_chunks': reranked_chunks,
            'metrics': {
                'initial_retrieval': len(initial_chunks),  # candidates actually retrieved
                'final_results': len(reranked_chunks),
                'total_time': time.time() - start_time
            }
        }


# Test re-ranking
print("\n" + "="*80)
print("RE-RANKING RAG TESTING")
print("="*80 + "\n")

reranking_rag = ReRankingRAG(vector_db, model=GPT4)

test_question = "What documentation is required for a mortgage application?"

print(f"Question: {test_question}\n")

# Compare basic vs reranking
print("Basic RAG:")
basic_result = rag_system.query(test_question, n_results=3)
print(f"Chunks: {len(basic_result['retrieved_chunks'])}")
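# Build the list first: reusing the same quote type inside nested f-strings fails before Python 3.12
similarities = [f"{c['similarity']:.2f}" for c in basic_result['retrieved_chunks']]
print(f"Similarities: {similarities}\n")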

print("Re-ranking RAG:")
rerank_result = reranking_rag.query_with_reranking(test_question, initial_results=10, final_results=3)
print(f"Initial retrieval: {rerank_result['metrics']['initial_retrieval']}")
print(f"Final results: {rerank_result['metrics']['final_results']}")
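# Build the list first: reusing the same quote type inside nested f-strings fails before Python 3.12
rerank_scores = [f"{c.get('rerank_score', 0)}/10" for c in rerank_result['retrieved_chunks']]
print(f"Re-rank scores: {rerank_scores}\n")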

print(f"Answer:\n{rerank_result['answer']}")
print("="*80)
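
The two-stage pattern (broad recall, then precise re-ranking) can be exercised offline by swapping the LLM scorer for any deterministic function. In this sketch, `rerank` and `keyword_overlap` are hypothetical names. The keyword-overlap scorer is only a stand-in for the LLM relevance rating, chosen so the example runs without an API key.

```python
from typing import Any, Callable, Dict, List

def rerank(
    query: str,
    chunks: List[Dict[str, Any]],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Dict[str, Any]]:
    """Score every candidate, then keep the top_k by score (ties keep input order)."""
    scored = [
        # Copy each chunk so the caller's list is not mutated
        {**chunk, "rerank_score": score_fn(query, chunk["content"])}
        for chunk in chunks
    ]
    return sorted(scored, key=lambda c: c["rerank_score"], reverse=True)[:top_k]

def keyword_overlap(query: str, text: str) -> float:
    """Stand-in scorer: fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0

# Stage 1 output (toy data): candidates ordered by vector similarity
candidates = [
    {"content": "Branch opening hours and holiday schedule", "similarity": 0.41},
    {"content": "Mortgage application requires pay stubs and tax returns", "similarity": 0.39},
    {"content": "Mortgage rates are reviewed quarterly", "similarity": 0.38},
]

# Stage 2: re-rank and keep the best two
top = rerank("mortgage application documents", candidates, keyword_overlap, top_k=2)
for c in top:
    print(f"{c['rerank_score']:.2f}  {c['content']}")
```

The candidate with the highest vector similarity drops out after re-ranking, which is the behavior the second stage is meant to provide. Returning copies rather than writing `rerank_score` into the input dicts also keeps repeated runs over the same candidates independent.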

### Challenge 5.2: Evaluation Metrics (10 minutes)

In [None]:
class RAGEvaluator:
    """Evaluate RAG system quality"""
    
    def __init__(self):
        self.test_cases = []
    
    def add_test_case(
        self,
        question: str,
        expected_answer_contains: List[str],
        relevant_doc_types: List[str]
    ):
        """Add test case for evaluation"""
        self.test_cases.append({
            'question': question,
            'expected_answer_contains': expected_answer_contains,
            'relevant_doc_types': relevant_doc_types
        })
    
    def evaluate_retrieval(
        self,
        rag_system: RAGSystem,
        test_case: Dict
    ) -> Dict[str, float]:
        """Evaluate retrieval quality"""
        result = rag_system.query(test_case['question'], n_results=5)
        
        retrieved_types = [
            chunk['metadata']['type']
            for chunk in result['retrieved_chunks']
        ]
        
        # Precision: How many retrieved are relevant?
        relevant_retrieved = sum(
            1 for t in retrieved_types
            if t in test_case['relevant_doc_types']
        )
        precision = relevant_retrieved / len(retrieved_types) if retrieved_types else 0
        
        # Recall: fraction of relevant doc types that appear in the results
        relevant_types = set(test_case['relevant_doc_types'])
        recall = len(set(retrieved_types) & relevant_types) / len(relevant_types) if relevant_types else 0
        
        # F1 score
        f1 = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        
        return {
            'precision': precision,
            'recall': recall,
            'f1': f1,
            'avg_similarity': result['metrics']['avg_similarity']
        }
    
    def evaluate_answer_quality(
        self,
        rag_system: RAGSystem,
        test_case: Dict
    ) -> Dict[str, Any]:
        """Evaluate generated answer quality"""
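        # Disable the appended "Sources:" footer so only the model's own text is scored
        result = rag_system.query(test_case['question'], include_sources=False)
        answer = result['answer'].lower()
        
        # Check if expected content is in answer
        expected_items = test_case['expected_answer_contains']
        contains_count = sum(1 for expected in expected_items if expected.lower() in answer)
        
        completeness = contains_count / len(expected_items) if expected_items else 0
        
        return {
            'completeness': completeness,
            'answer_length': len(result['answer']),
            # True only if the model itself cited a numbered source, e.g. "Source 1"
            'has_sources': bool(re.search(r'source \d+', answer))
        }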
    
    def run_evaluation(self, rag_system: RAGSystem) -> pd.DataFrame:
        """Run full evaluation suite"""
        results = []
        
        for test_case in self.test_cases:
            retrieval_metrics = self.evaluate_retrieval(rag_system, test_case)
            answer_metrics = self.evaluate_answer_quality(rag_system, test_case)
            
            results.append({
                'question': test_case['question'][:50] + '...',
                **retrieval_metrics,
                **answer_metrics
            })
        
        return pd.DataFrame(results)


# Create evaluation suite
print("\n" + "="*80)
print("RAG EVALUATION")
print("="*80 + "\n")

evaluator = RAGEvaluator()

# Add test cases
evaluator.add_test_case(
    question="How much PTO do employees get?",
    expected_answer_contains=["20 days", "160 hours", "annual"],
    relevant_doc_types=["hr_policies"]
)

evaluator.add_test_case(
    question="What credit score is needed for a personal loan?",
    expected_answer_contains=["650", "credit score", "minimum"],
    relevant_doc_types=["loan_products"]
)

evaluator.add_test_case(
    question="What are AML customer verification requirements?",
    expected_answer_contains=["government-issued ID", "SSN", "address"],
    relevant_doc_types=["compliance_guidelines"]
)

# Run evaluation
eval_results = evaluator.run_evaluation(rag_system)

print("Evaluation Results:")
print(eval_results.to_string(index=False))

print(f"\n\nOverall Metrics:")
print(f"  Avg Precision: {eval_results['precision'].mean():.2f}")
print(f"  Avg Recall: {eval_results['recall'].mean():.2f}")
print(f"  Avg F1 Score: {eval_results['f1'].mean():.2f}")
print(f"  Avg Completeness: {eval_results['completeness'].mean():.2f}")
print(f"  Avg Similarity: {eval_results['avg_similarity'].mean():.3f}")
print("="*80)
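
To make the retrieval metrics concrete, here is a small worked example on toy data, assuming the same definitions as `evaluate_retrieval`: precision is counted per retrieved chunk, while recall is counted per distinct document type. The function name `retrieval_metrics` is hypothetical.

```python
from typing import Dict, List

def retrieval_metrics(retrieved_types: List[str], relevant_types: List[str]) -> Dict[str, float]:
    """Chunk-level precision, type-level recall, and their harmonic mean (F1)."""
    relevant = set(relevant_types)
    hits = sum(1 for t in retrieved_types if t in relevant)
    precision = hits / len(retrieved_types) if retrieved_types else 0.0
    recall = len(set(retrieved_types) & relevant) / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) > 0 else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Five retrieved chunks, three of which come from the one relevant doc type
metrics = retrieval_metrics(
    retrieved_types=[
        "hr_policies", "hr_policies", "loan_products",
        "hr_policies", "compliance_guidelines",
    ],
    relevant_types=["hr_policies"],
)
print(metrics)
```

Here precision is 3/5 = 0.6, recall is 1/1 = 1.0, and F1 is 2 × 0.6 × 1.0 / 1.6 = 0.75. Because recall is measured over document types rather than individual chunks, it reaches 1.0 as soon as one chunk of each relevant type is retrieved, so it can overstate coverage for long documents.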