# Day 6 Lab 5: RAG & Vector Databases for Banking

## 🎯 Learning Objectives
- Understand RAG architecture
- Implement vector embeddings
- Use FAISS vector database
- Build banking knowledge base
- Compare RAG vs Fine-tuning

## 🏦 Banking Use Case
Build a **Banking Policy Q&A System** using RAG to answer customer questions about loans, credit cards, and accounts.

## ⏱️ Duration: 45 minutes
## 💰 Cost: ~$0.15 (Bedrock API calls)

## Setup

In [None]:
# Install required packages
!pip install -q faiss-cpu boto3 numpy pandas

In [None]:
import boto3
import json
import numpy as np
import faiss
from typing import List, Dict
import time

# Initialize AWS clients
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
s3 = boto3.client('s3')

print("✅ Libraries imported successfully")

## Part 1: Create Banking Knowledge Base

In [None]:
# Banking policy documents
banking_documents = [
    {
        "id": "loan_policy_1",
        "title": "Personal Loan Requirements",
        "content": """SecureBank offers personal loans from $1,000 to $50,000. 
        Minimum credit score required is 650. Interest rates range from 6.5% to 12.5% APR 
        based on creditworthiness. Loan terms available: 12, 24, 36, 48, or 60 months. 
        Required documents: government-issued ID, proof of income (pay stubs or tax returns), 
        and recent bank statements."""
    },
    {
        "id": "loan_policy_2",
        "title": "Loan Approval Process",
        "content": """The loan approval process takes 24-48 hours. Step 1: Submit online application. 
        Step 2: Credit check and income verification. Step 3: Loan officer review. 
        Step 4: Final approval and fund disbursement. Applicants must be 21-65 years old, 
        have minimum annual income of $30,000, and debt-to-income ratio below 40%."""
    },
    {
        "id": "credit_card_1",
        "title": "Credit Card Types",
        "content": """SecureBank offers three credit card tiers: Classic Card ($500-$5,000 limit, 
        18.99% APR, no annual fee), Gold Card ($5,000-$15,000 limit, 15.99% APR, $95 annual fee), 
        and Platinum Card ($15,000-$50,000 limit, 12.99% APR, $195 annual fee). 
        All cards include fraud protection and 24/7 customer support."""
    },
    {
        "id": "credit_card_2",
        "title": "Credit Card Rewards",
        "content": """Earn rewards on every purchase: 1% cashback on all purchases, 
        2% on groceries and gas, 3% on travel bookings. Gold and Platinum cards include 
        travel insurance, purchase protection, and extended warranty. No foreign transaction fees 
        on Platinum cards."""
    },
    {
        "id": "savings_1",
        "title": "Savings Account Types",
        "content": """Three savings account options: Basic Savings (0.5% APY, $100 minimum balance), 
        High-Yield Savings (2.5% APY, $10,000 minimum), Premium Savings (3.5% APY, $50,000 minimum). 
        All accounts are FDIC insured up to $250,000. No monthly maintenance fees. 
        Interest compounded daily and credited monthly."""
    },
    {
        "id": "savings_2",
        "title": "Savings Account Features",
        "content": """Free online and mobile banking with all savings accounts. 
        Nationwide ATM access. Automatic savings plans available. Federal regulation allows 
        up to 6 withdrawals per month. Unlimited deposits. No penalties for maintaining 
        minimum balance. Link to checking account for overdraft protection."""
    }
]

print(f"üìö Created knowledge base with {len(banking_documents)} documents")
for doc in banking_documents:
    print(f"  - {doc['title']}")

## Part 2: Generate Vector Embeddings

In [None]:
def get_embedding(text: str) -> List[float]:
    """
    Generate vector embedding using Amazon Titan Embeddings
    """
    body = json.dumps({
        "inputText": text
    })
    
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=body
    )
    
    response_body = json.loads(response['body'].read())
    return response_body['embedding']

# Generate embeddings for all documents
print("üîÑ Generating embeddings...")
embeddings = []
for doc in banking_documents:
    embedding = get_embedding(doc['content'])
    embeddings.append(embedding)
    print(f"  ‚úÖ {doc['title']}: {len(embedding)} dimensions")

# Convert to numpy array
embeddings_array = np.array(embeddings).astype('float32')
print(f"\nüìä Embeddings shape: {embeddings_array.shape}")

## Part 3: Create FAISS Vector Database

In [None]:
# Create FAISS index
dimension = embeddings_array.shape[1]
index = faiss.IndexFlatL2(dimension)  # exact search; reports squared L2 distances

# Add embeddings to index
index.add(embeddings_array)

print("✅ FAISS index created")
print(f"  Dimension: {dimension}")
print(f"  Total vectors: {index.ntotal}")
print("  Index type: IndexFlatL2 (exact search, squared Euclidean distance)")
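`IndexFlatL2` performs exact brute-force search and, per the FAISS documentation, reports *squared* Euclidean distances. For unit-length vectors, squared L2 distance and cosine similarity produce the same ranking, because ‖q − d‖² = 2 − 2·cos(q, d). The NumPy sketch below checks that identity, using random vectors as stand-ins for embeddings. This identity is why a common way to get cosine similarity in FAISS is to normalise the vectors and use `faiss.IndexFlatIP`.

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(6, 8)).astype("float32")   # 6 stand-in document vectors, dim 8
query = rng.normal(size=8).astype("float32")

# Scale every vector to unit length (faiss.normalize_L2 does this in place)
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query /= np.linalg.norm(query)

sq_l2 = ((docs - query) ** 2).sum(axis=1)  # squared L2, as IndexFlatL2 reports
cosine = docs @ query                      # inner product = cosine for unit vectors

print(np.argsort(sq_l2))    # smallest distance first
print(np.argsort(-cosine))  # largest similarity first
print(np.allclose(sq_l2, 2 - 2 * cosine, atol=1e-5))
```

Applied to this lab, that would mean calling `faiss.normalize_L2(embeddings_array)` before `index.add(...)` on an `IndexFlatIP`, and normalising each query vector the same way before searching.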

## Part 4: Implement Retrieval Function

In [None]:
def retrieve_documents(query: str, top_k: int = 3) -> List[Dict]:
    """
    Retrieve the top-k most relevant documents for a query.
    Lower distance = more similar.
    """
    # Generate query embedding
    query_embedding = get_embedding(query)
    query_vector = np.array([query_embedding]).astype('float32')
    
    # FAISS pads results with index -1 when top_k exceeds the number of vectors
    top_k = min(top_k, index.ntotal)
    
    # Search FAISS index
    distances, indices = index.search(query_vector, top_k)
    
    # Map FAISS row positions back to documents
    results = []
    for i, idx in enumerate(indices[0]):
        if idx < 0:  # defensive: skip any padding entries
            continue
        results.append({
            'document': banking_documents[idx],
            'distance': float(distances[0][i]),
            'rank': i + 1
        })
    
    return results

# Test retrieval
test_query = "What is the interest rate for personal loans?"
print(f"üîç Query: {test_query}\n")

results = retrieve_documents(test_query, top_k=3)
for result in results:
    print(f"Rank {result['rank']}: {result['document']['title']}")
    print(f"  Distance: {result['distance']:.4f}")
    print(f"  Content: {result['document']['content'][:100]}...\n")
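`retrieve_documents` always returns `top_k` hits, even when none of them is relevant (for example, a question about mortgages, which this knowledge base does not cover). A common guard is to drop hits whose distance exceeds a cutoff, so the prompt can honestly say the information is unavailable. The hypothetical `filter_by_distance` helper below sketches this. The sample distances are made up and the cutoff is a placeholder, because distance scales depend on the embedding model and need tuning on real queries.

```python
from typing import Dict, List


def filter_by_distance(results: List[Dict], max_distance: float) -> List[Dict]:
    """Keep only hits at or below max_distance (smaller = more similar)."""
    return [r for r in results if r["distance"] <= max_distance]


# Made-up results shaped like retrieve_documents() output
sample = [
    {"rank": 1, "distance": 0.35, "document": {"title": "Personal Loan Requirements"}},
    {"rank": 2, "distance": 0.80, "document": {"title": "Loan Approval Process"}},
    {"rank": 3, "distance": 1.60, "document": {"title": "Savings Account Types"}},
]

MAX_DISTANCE = 1.0  # hypothetical cutoff; tune on real queries for your embedding model
kept = filter_by_distance(sample, MAX_DISTANCE)
print([r["document"]["title"] for r in kept])
```

If nothing survives the filter, a pipeline could skip the LLM call entirely and return a fixed "I don't have that information" reply. That saves tokens and avoids answers built on unrelated policies.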

## Part 5: Build RAG Pipeline

In [None]:
def rag_query(question: str, top_k: int = 3) -> Dict:
    """
    Complete RAG pipeline: Retrieve + Generate
    """
    # Step 1: Retrieve relevant documents
    retrieved_docs = retrieve_documents(question, top_k)
    
    # Step 2: Create context from retrieved documents
    context = "\n\n".join([
        f"Document {r['rank']}: {r['document']['title']}\n{r['document']['content']}"
        for r in retrieved_docs
    ])
    
    # Step 3: Create prompt with context
    prompt = f"""You are a helpful SecureBank customer service assistant. 
Use the following banking policy documents to answer the customer's question accurately.

Context:
{context}

Customer Question: {question}

Provide a clear, accurate answer based ONLY on the information in the documents. 
If the information is not available, say so. Include relevant details like rates, fees, or requirements.

Answer:"""
    
    # Step 4: Generate answer with Claude
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 500,
        "messages": [{
            "role": "user",
            "content": prompt
        }]
    })
    
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=body
    )
    
    response_body = json.loads(response['body'].read())
    answer = response_body['content'][0]['text']
    
    return {
        'question': question,
        'answer': answer,
        'sources': [r['document']['title'] for r in retrieved_docs],
        'retrieved_docs': retrieved_docs
    }

print("✅ RAG pipeline ready")
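The augmentation step in `rag_query` concatenates every retrieved document. As the knowledge base grows, a size budget keeps the prompt bounded, and numbered tags give the model something to cite. The sketch below is a hypothetical `build_context` helper. The `[n]` tag format and the 2,000-character default are assumptions. Documents arrive in rank order, so once the budget would be exceeded the helper stops, and the lowest-ranked documents are the ones dropped.

```python
from typing import Dict, List


def build_context(docs: List[Dict], max_chars: int = 2000) -> str:
    """Join ranked documents into a prompt context with [n] source tags.

    Assumes docs are in rank order (best first). Stops at the first document
    that would push the context past max_chars, so lower-ranked ones drop first.
    """
    parts: List[str] = []
    used = 0
    for n, doc in enumerate(docs, 1):
        content = " ".join(doc["content"].split())  # collapse stray whitespace
        block = f"[{n}] {doc['title']}\n{content}"
        sep = 2 if parts else 0  # length of the "\n\n" joiner
        if used + sep + len(block) > max_chars:
            break
        parts.append(block)
        used += sep + len(block)
    return "\n\n".join(parts)


docs = [
    {"title": "Personal Loan Requirements", "content": "Minimum credit score required is 650."},
    {"title": "Loan Approval Process", "content": "The loan approval process takes 24-48 hours."},
]
print(build_context(docs))
print("---")
print(build_context(docs, max_chars=120))
```

Inside `rag_query`, this could replace the list comprehension that builds `context`, called as `build_context([r['document'] for r in retrieved_docs])`. The prompt could then ask the model to cite the `[n]` tags it relied on.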

## Part 6: Test Banking Q&A System

In [None]:
# Test questions
test_questions = [
    "What is the minimum credit score needed for a personal loan?",
    "What are the benefits of the Platinum credit card?",
    "How much interest do I earn on a High-Yield Savings account?",
    "What documents do I need to apply for a loan?",
    "Are there any monthly fees for savings accounts?"
]

print("ü§ñ Testing Banking Q&A System\n")
print("="*80)

for i, question in enumerate(test_questions, 1):
    print(f"\n‚ùì Question {i}: {question}")
    print("-"*80)
    
    result = rag_query(question)
    
    print(f"\nüí¨ Answer:\n{result['answer']}")
    print(f"\nüìö Sources: {', '.join(result['sources'])}")
    print("="*80)
    
    time.sleep(1)  # Rate limiting
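A fixed `time.sleep(1)` slows every call but does not recover from throttling. A more robust pattern retries failed calls with exponential backoff plus random jitter. The sketch below is a generic, hypothetical `call_with_backoff` helper. With boto3, the retryable type would be `botocore.exceptions.ClientError`, ideally checking that the error code is `ThrottlingException` before retrying. Alternatively, botocore's built-in retries can be configured with `botocore.config.Config(retries=...)`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff.

    The wait before retry k (k = 1, 2, ...) is base_delay * 2**(k - 1) plus up
    to 50% random jitter. After max_attempts failures the last exception is
    re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable: the loop always returns or raises")


# Demo: a stand-in call that fails twice, then succeeds
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated throttling")
    return "ok"


waits = []  # record delays instead of actually sleeping
result = call_with_backoff(flaky, (TimeoutError,), sleep=waits.append)
print(result, calls["n"], [round(w, 2) for w in waits])
```

Under those assumptions, the test loop would call `call_with_backoff(lambda: rag_query(question), (ClientError,))` in place of the fixed sleep.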

## Part 7: RAG vs Fine-tuning Comparison

In [None]:
comparison = {
    "Aspect": [
        "Knowledge Updates",
        "Training Required",
        "Cost",
        "Latency",
        "Source Attribution",
        "Accuracy",
        "Maintenance"
    ],
    "RAG": [
        "Real-time (just update docs)",
        "No training needed",
        "Low ($0.15 for this lab)",
        "Higher (retrieval + generation)",
        "Yes (can cite sources)",
        "High (grounded in current docs, if retrieval finds them)",
        "Easy (update documents)"
    ],
    "Fine-tuning": [
        "Requires retraining",
        "Yes (hours/days)",
        "High ($100-$1000+)",
        "Lower (direct inference)",
        "No (knowledge baked in)",
        "High (for specific tasks)",
        "Complex (retrain for updates)"
    ]
}

import pandas as pd
df = pd.DataFrame(comparison)
print("\n⚖️ RAG vs Fine-tuning Comparison:\n")
print(df.to_string(index=False))

print("\n💡 Decision Framework:")
print("  ✅ Use RAG when:")
print("     - Knowledge changes frequently (policies, rates)")
print("     - Need source attribution")
print("     - Want low-cost solution")
print("     - Quick deployment needed")
print("\n  ✅ Use Fine-tuning when:")
print("     - Need specific output format")
print("     - Knowledge is stable")
print("     - Lower latency critical")
print("     - Domain-specific behavior needed")

## Part 8: RAG Optimization Techniques

In [None]:
# Compare how different chunk sizes would split the knowledge base
def test_chunk_sizes():
    print("🔬 Testing Chunk Size Impact:\n")
    
    chunk_sizes = [100, 250, 500]
    
    for size in chunk_sizes:
        # Count fixed-size, non-overlapping character chunks across all documents
        n_chunks = sum(
            -(-len(" ".join(doc['content'].split())) // size)  # ceiling division
            for doc in banking_documents
        )
        print(f"Chunk size: {size} characters -> {n_chunks} chunks")
        print(f"  Pros: {'More context' if size > 300 else 'More precise'}")
        print(f"  Cons: {'Less precise' if size > 300 else 'Less context'}")
        print()

test_chunk_sizes()

print("\nüí° Optimization Tips:")
print("  1. Chunk size: 500-1000 characters (balance context vs precision)")
print("  2. Overlap: 10-20% between chunks")
print("  3. Top-k: 3-5 documents (more = more context but slower)")
print("  4. Embedding model: Titan (fast) vs Cohere (more accurate)")
print("  5. Reranking: Use cross-encoder for better relevance")
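Tips 1 and 2 can be made concrete with a sliding-window chunker. The sketch below is a hypothetical `chunk_words` helper. It splits on word boundaries so no word is cut in half, measures sizes in characters as the tips do, and repeats the trailing words of each chunk at the start of the next. The 15% default overlap is an assumed value inside the 10-20% range above.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 500, overlap: float = 0.15) -> List[str]:
    """Split text into chunks of at most chunk_size characters on word boundaries.

    Each new chunk repeats the trailing words of the previous one, up to
    overlap * chunk_size characters. A single word longer than chunk_size
    becomes its own (oversized) chunk.
    """
    words = text.split()
    chunks: List[str] = []
    start = 0
    while start < len(words):
        # Greedily take words until the next one would exceed chunk_size
        end, length = start, 0
        while end < len(words):
            extra = len(words[end]) + (1 if end > start else 0)  # +1 for the space
            if length + extra > chunk_size:
                break
            length += extra
            end += 1
        end = max(end, start + 1)  # always make progress on an oversized word
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break
        # Step back over trailing words that fit in the overlap budget,
        # but never all the way to `start`, so the loop always advances
        back, kept = end, 0
        while back - 1 > start and kept + len(words[back - 1]) + 1 <= overlap * chunk_size:
            back -= 1
            kept += len(words[back]) + 1
        start = back
    return chunks


policy = ("SecureBank offers personal loans from $1,000 to $50,000. "
          "Minimum credit score required is 650. Interest rates range from 6.5% to 12.5% APR "
          "based on creditworthiness.")
chunks = chunk_words(policy, chunk_size=60, overlap=0.2)
for c in chunks:
    print(len(c), repr(c))
```

In the lab, each chunk would then be embedded and indexed separately. The parent document's `id` and `title` would be stored alongside each chunk so that answers can still cite the source policy.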

## Part 9: Production Considerations

In [None]:
print("üè≠ Production RAG System Checklist:\n")

checklist = {
    "Component": [
        "Vector Database",
        "Embedding Model",
        "LLM",
        "Document Store",
        "Caching",
        "Monitoring",
        "Security"
    ],
    "Development": [
        "FAISS (local)",
        "Titan Embeddings",
        "Claude 3 Sonnet",
        "Local files",
        "None",
        "Basic logging",
        "IAM roles"
    ],
    "Production": [
        "OpenSearch/Pinecone",
        "Titan/Cohere",
        "Claude Sonnet 4.5",
        "S3 + versioning",
        "ElastiCache",
        "CloudWatch + X-Ray",
        "VPC + encryption"
    ]
}

df_prod = pd.DataFrame(checklist)
print(df_prod.to_string(index=False))

print("\nüí∞ Cost Estimates (1M queries/month):")
print("  Embeddings: $0.10 per 1M tokens")
print("  LLM (Claude): $3 per 1M input tokens")
print("  Vector DB: $50-200/month (OpenSearch)")
print("  Total: ~$300-500/month")
print("\n  vs Fine-tuning: $1000+ one-time + retraining costs")
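The monthly total depends mainly on how many tokens each query consumes. A back-of-envelope calculator makes that dependence explicit. The token counts below are assumptions: ~700 input tokens for the prompt plus three retrieved documents, ~200 output tokens, and ~20 tokens per query embedding. The prices are Titan Embeddings' $0.10 and Claude 3 Sonnet's $3 input / $15 output per 1M tokens. Check current Bedrock pricing before relying on the result.

```python
def monthly_rag_cost(
    queries: int,
    input_tokens: int,
    output_tokens: int,
    embed_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
    embed_price_per_m: float,
    vector_db_monthly: float,
) -> dict:
    """Estimate monthly RAG spend in USD; token prices are per 1M tokens."""
    llm_in = queries * input_tokens / 1e6 * input_price_per_m
    llm_out = queries * output_tokens / 1e6 * output_price_per_m
    embed = queries * embed_tokens / 1e6 * embed_price_per_m
    total = llm_in + llm_out + embed + vector_db_monthly
    return {"llm_input": llm_in, "llm_output": llm_out, "embeddings": embed,
            "vector_db": vector_db_monthly, "total": total}


# All token counts are assumptions; adjust them to your measured prompts
est = monthly_rag_cost(
    queries=1_000_000, input_tokens=700, output_tokens=200, embed_tokens=20,
    input_price_per_m=3.0, output_price_per_m=15.0, embed_price_per_m=0.10,
    vector_db_monthly=200.0,  # upper end of the OpenSearch range above
)
for name, usd in est.items():
    print(f"  {name:<11} ${usd:,.2f}")
```

Under these assumptions, LLM tokens rather than the vector database dominate the bill. The main cost levers are therefore trimming context (fewer or shorter chunks) and caching repeat questions.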

## Cleanup

In [None]:
# No cleanup needed - FAISS is in-memory
print("✅ No cleanup required (FAISS is in-memory)")
print("💡 In production, remember to:")
print("  - Delete OpenSearch domains when not in use")
print("  - Clean up S3 buckets")
print("  - Remove unused embeddings")

## 🎓 Key Takeaways

### RAG Architecture:
1. **Retrieval**: Find relevant documents using vector similarity
2. **Augmentation**: Add context to prompt
3. **Generation**: LLM generates answer with context

### Vector Databases:
- **FAISS**: Fast, local, good for prototyping
- **OpenSearch**: Production-ready, scalable, hybrid search
- **Pinecone**: Fully managed, easy to use
- **pgvector**: Good if already using PostgreSQL

### When to Use RAG:
- ✅ Knowledge changes frequently
- ✅ Need source attribution
- ✅ Want low-cost solution
- ✅ Quick deployment needed
- ✅ Multiple knowledge sources

### RAG vs Fine-tuning:
- **RAG**: Real-time updates, lower cost, source attribution
- **Fine-tuning**: Specific format, lower latency, frozen knowledge
- **Hybrid**: Best of both (fine-tune for format, RAG for knowledge)

### Production Best Practices:
1. Use managed vector database (OpenSearch/Pinecone)
2. Implement caching for common queries
3. Monitor latency and accuracy
4. Version your documents
5. Implement fallback strategies
6. Use reranking for better relevance
7. Optimize chunk size and overlap
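Best practice 2 (caching common queries) can start as an in-process cache before moving to a shared store such as ElastiCache. The sketch below is a hypothetical `CachedQA` wrapper. It normalises questions (lowercase, collapsed whitespace, trailing `?`, `!` or `.` stripped) so that trivially different phrasings share one entry, and it evicts the least recently used entry via `collections.OrderedDict`. It does not match paraphrases, and it should be cleared whenever the policy documents change.

```python
import re
from collections import OrderedDict
from typing import Callable


class CachedQA:
    """Hypothetical LRU cache in front of an answer function such as a RAG pipeline."""

    def __init__(self, answer_fn: Callable[[str], str], max_entries: int = 1000):
        self.answer_fn = answer_fn
        self.max_entries = max_entries
        self._cache: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(question: str) -> str:
        # Lowercase, collapse runs of whitespace, drop trailing ? ! . and spaces
        return re.sub(r"\s+", " ", question.strip().lower()).rstrip("?!. ")

    def ask(self, question: str) -> str:
        key = self.normalize(question)
        if key in self._cache:
            self.hits += 1
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        self.misses += 1
        answer = self.answer_fn(question)
        self._cache[key] = answer
        if len(self._cache) > self.max_entries:
            self._cache.popitem(last=False)  # evict least recently used
        return answer

    def clear(self) -> None:
        """Call after updating the knowledge base so stale answers are dropped."""
        self._cache.clear()


# Demo with a stand-in answer function instead of a real RAG call
qa = CachedQA(lambda q: f"answer to: {q}", max_entries=2)
qa.ask("What is the minimum credit score?")
qa.ask("what is the minimum   credit score")   # same key after normalisation
qa.ask("Are there monthly fees?")
qa.ask("Do Platinum cards have foreign transaction fees?")  # evicts the oldest entry
print(qa.hits, qa.misses, list(qa._cache))
```

With the lab's pipeline, `answer_fn` could be `lambda q: rag_query(q)['answer']`.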

### Banking Use Cases:
- ✅ Policy Q&A (this lab)
- ✅ Compliance document search
- ✅ Product recommendations
- ✅ Customer support automation
- ✅ Internal knowledge base