# 🔍 Module 4: RAG Fundamentals & Vector Databases

**AI Agent Architectures Workshop - Day 1**

This notebook covers:
- Tokenization and embeddings fundamentals
- Azure AI Search for semantic search
- Building vector databases with Azure-native tools
- Hybrid search (keyword + semantic)

**Prerequisites:** Run `00_setup.ipynb` first to configure Azure OpenAI credentials.

## 1. Setup and Dependencies

In [None]:
# Install required packages
!pip install openai azure-search-documents azure-identity python-dotenv faiss-cpu numpy --quiet

In [None]:
import os
import json
import numpy as np

# =============================================================================
# GOOGLE COLAB SETUP - Add these secrets (click 🔑 icon):
#   - AZURE_OPENAI_KEY: Your API key for chat model
#   - AZURE_OPENAI_ENDPOINT: https://xxx.openai.azure.com/ (chat model resource)
#   - AZURE_OPENAI_DEPLOYMENT: Your chat model deployment name (e.g., gpt-4o)
#
# For embeddings (can be same or different resource):
#   - AZURE_OPENAI_EMBEDDING: Your embedding deployment name (e.g., text-embedding-3-small)
#   - AZURE_OPENAI_EMBEDDING_ENDPOINT: (optional) If embedding is in different resource
#   - AZURE_OPENAI_EMBEDDING_KEY: (optional) If embedding is in different resource
# =============================================================================

DEMO_MODE = False
client = None
embedding_client = None
MODEL_NAME = "gpt-4o"
EMBEDDING_MODEL = "text-embedding-3-small"

try:
    from google.colab import userdata
    AZURE_OPENAI_KEY = userdata.get('AZURE_OPENAI_KEY')
    AZURE_OPENAI_ENDPOINT = userdata.get('AZURE_OPENAI_ENDPOINT')
    
    # Chat model settings
    try:
        MODEL_NAME = userdata.get('AZURE_OPENAI_DEPLOYMENT') or MODEL_NAME
    except Exception:
        pass  # Secret not set: keep the default deployment name

    # Embedding settings - check for separate endpoint/key
    try:
        EMBEDDING_MODEL = userdata.get('AZURE_OPENAI_EMBEDDING') or EMBEDDING_MODEL
    except Exception:
        pass  # Secret not set: keep the default embedding deployment name

    # Check if embedding uses a different resource
    try:
        EMBEDDING_ENDPOINT = userdata.get('AZURE_OPENAI_EMBEDDING_ENDPOINT')
        EMBEDDING_KEY = userdata.get('AZURE_OPENAI_EMBEDDING_KEY')
    except Exception:
        EMBEDDING_ENDPOINT = None
        EMBEDDING_KEY = None

    if AZURE_OPENAI_KEY and AZURE_OPENAI_ENDPOINT:
        if not AZURE_OPENAI_ENDPOINT.startswith('http'):
            AZURE_OPENAI_ENDPOINT = 'https://' + AZURE_OPENAI_ENDPOINT
        print(f"✅ Chat credentials loaded. Model: {MODEL_NAME}")
        print(f"✅ Embedding model: {EMBEDDING_MODEL}")
        if EMBEDDING_ENDPOINT:
            print("   (Using separate embedding endpoint)")
    else:
        raise ValueError("Missing AZURE_OPENAI_KEY or AZURE_OPENAI_ENDPOINT")
except Exception as e:
    print(f"⚠️ Running in DEMO MODE: {e}")
    DEMO_MODE = True

if not DEMO_MODE:
    from openai import AzureOpenAI
    
    # Chat client
    client = AzureOpenAI(
        api_key=AZURE_OPENAI_KEY,
        api_version="2024-06-01",
        azure_endpoint=AZURE_OPENAI_ENDPOINT
    )
    
    # Embedding client - use separate endpoint if provided
    if EMBEDDING_ENDPOINT and EMBEDDING_KEY:
        if not EMBEDDING_ENDPOINT.startswith('http'):
            EMBEDDING_ENDPOINT = 'https://' + EMBEDDING_ENDPOINT
        embedding_client = AzureOpenAI(
            api_key=EMBEDDING_KEY,
            api_version="2024-06-01",
            azure_endpoint=EMBEDDING_ENDPOINT
        )
    else:
        embedding_client = client  # Use same client for both
    
    print("✅ Clients ready")

## 2. Understanding Tokenization

In [None]:
# Install tiktoken for tokenization analysis
!pip install tiktoken --quiet
import tiktoken

# Get the tokenizer for GPT-4 (cl100k_base); gpt-4o models use o200k_base, so their token counts may differ
encoding = tiktoken.encoding_for_model("gpt-4")

# Banking text examples
texts = [
    "What is my account balance?",
    "I want to dispute a $500 charge from Amazon on my credit card.",
    "The customer's SSN is 123-45-6789 and their account number is 9876543210."
]

print("=== Tokenization Analysis ===")
for text in texts:
    tokens = encoding.encode(text)
    print(f"\nText: '{text}'")
    print(f"Token count: {len(tokens)}")
    print(f"Tokens: {tokens[:10]}..." if len(tokens) > 10 else f"Tokens: {tokens}")
    print(f"Decoded: {[encoding.decode([t]) for t in tokens[:5]]}...")

## 3. Creating Embeddings with Azure OpenAI

In [None]:
# =============================================================================
# EMBEDDING SETUP
# Uses Azure OpenAI embedding model (text-embedding-3-small recommended)
# =============================================================================

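def get_embedding(text: str) -> list:
    """Get embedding vector using Azure OpenAI"""
    if embedding_client is None:
        # Happens in DEMO MODE, when no Azure OpenAI credentials were loaded
        raise RuntimeError("Embedding client is not configured: add the Azure OpenAI secrets and rerun the setup cell")
    response = embedding_client.embeddings.create(
        model=EMBEDDING_MODEL,
        input=text
    )
    return response.data[0].embedding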

# Test embedding
test_text = "How do I check my account balance?"
embedding = get_embedding(test_text)

print(f"Text: '{test_text}'")
print(f"Embedding model: {EMBEDDING_MODEL}")
print(f"Embedding dimensions: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")
print(f"Vector magnitude: {np.linalg.norm(embedding):.4f}")

In [None]:
# Demonstrate semantic similarity
from numpy import dot
from numpy.linalg import norm

def cosine_similarity(a, b):
    """Calculate cosine similarity between two vectors"""
    return dot(a, b) / (norm(a) * norm(b))

# Banking queries - some similar, some different
queries = [
    "How do I check my account balance?",
    "What is my current balance?",
    "Show me how much money I have",
    "I want to transfer money to another account",
    "What are your mortgage rates?"
]

# Get embeddings for all queries
embeddings = {q: get_embedding(q) for q in queries}

# Compare first query to all others
base_query = queries[0]
print(f"Base query: '{base_query}'\n")
print("Similarity scores:")
for query in queries[1:]:
    similarity = cosine_similarity(embeddings[base_query], embeddings[query])
    print(f"  {similarity:.4f} - '{query}'")

## 4. Building a Simple Vector Database with FAISS

In [None]:
import faiss

# Sample banking policy documents
banking_docs = [
    {
        "id": "policy_001",
        "title": "Transaction Dispute Policy",
        "content": "Customers can dispute unauthorized transactions within 60 days of the statement date. To initiate a dispute, contact customer service or use the mobile app. Provisional credit may be issued within 10 business days while the investigation is ongoing.",
        "category": "disputes"
    },
    {
        "id": "policy_002",
        "title": "Wire Transfer Limits",
        "content": "Daily wire transfer limits are $50,000 for personal accounts and $250,000 for business accounts. International transfers may have additional fees of $25-45. Same-day transfers must be initiated before 4 PM EST.",
        "category": "transfers"
    },
    {
        "id": "policy_003",
        "title": "Fraud Protection Policy",
        "content": "Zero liability protection covers unauthorized transactions reported within 2 business days. After 2 days, liability may increase up to $500. We use AI-powered fraud detection to monitor suspicious activity 24/7.",
        "category": "security"
    },
    {
        "id": "policy_004",
        "title": "Account Balance Inquiry",
        "content": "Check your account balance anytime through online banking, mobile app, ATM, or by calling customer service. Real-time balance updates are available for all checking and savings accounts.",
        "category": "accounts"
    },
    {
        "id": "policy_005",
        "title": "Mortgage Rate Information",
        "content": "Current mortgage rates: 30-year fixed at 6.5%, 15-year fixed at 5.9%, 5/1 ARM at 5.5%. Rates are subject to change daily. Pre-approval is valid for 90 days. Minimum credit score of 620 required.",
        "category": "loans"
    }
]

print(f"Loaded {len(banking_docs)} banking policy documents")

In [None]:
# Create embeddings for all documents
print("Creating embeddings for documents...")
for doc in banking_docs:
    doc["embedding"] = get_embedding(doc["content"])
    print(f"  ✓ {doc['title']}")

# Build FAISS index
dimension = len(banking_docs[0]["embedding"])
index = faiss.IndexFlatIP(dimension)  # Inner product (cosine similarity for normalized vectors)

# Normalize and add vectors to index
vectors = np.array([doc["embedding"] for doc in banking_docs]).astype('float32')
faiss.normalize_L2(vectors)  # Normalize for cosine similarity
index.add(vectors)

print(f"\n✅ FAISS index built with {index.ntotal} vectors")
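
The comment on `IndexFlatIP` above says that inner product equals cosine similarity for normalized vectors. The sketch below illustrates that claim with plain Python and toy 3-dimensional vectors; the helper names (`l2_normalize`, `inner_product`, `cosine`) and the vector values are illustrative assumptions, not part of the workshop code or of FAISS.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (the same idea as faiss.normalize_L2, applied to one row)."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0:
        raise ValueError("Cannot normalize a zero vector")
    return [x / length for x in vec]

def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity computed directly from the definition."""
    return inner_product(a, b) / (math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b)))

# Toy vectors (made-up values, not real embeddings)
a = [1.0, 2.0, 3.0]
b = [2.0, 0.5, 1.0]

raw_ip = inner_product(a, b)
normalized_ip = inner_product(l2_normalize(a), l2_normalize(b))

print(f"Raw inner product:        {raw_ip:.4f}")
print(f"Normalized inner product: {normalized_ip:.4f}")
print(f"Cosine similarity:        {cosine(a, b):.4f}")
```

The last two printed values should agree, while the raw inner product depends on vector length. This is why both the document vectors and the query vector are normalized before they reach the index.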

In [None]:
def search_documents(query: str, k: int = 3) -> list:
    """Search for similar documents using FAISS"""
    # Get query embedding
    query_embedding = np.array([get_embedding(query)]).astype('float32')
    faiss.normalize_L2(query_embedding)
    
    # Search
    scores, indices = index.search(query_embedding, k)
    
    # Return results with scores
    results = []
    for i, (score, idx) in enumerate(zip(scores[0], indices[0])):
        doc = banking_docs[idx]
        results.append({
            "rank": i + 1,
            "score": float(score),
            "title": doc["title"],
            "content": doc["content"],
            "category": doc["category"]
        })
    return results

# Test search
test_queries = [
    "How do I report a fraudulent transaction?",
    "What are the current mortgage rates?",
    "Can I send money internationally?"
]

for query in test_queries:
    print(f"\n🔍 Query: '{query}'")
    results = search_documents(query, k=2)
    for r in results:
        print(f"   [{r['score']:.4f}] {r['title']}")

## 5. Complete RAG Pipeline

In [None]:
def rag_query(question: str, k: int = 3) -> dict:
    """Complete RAG pipeline: Retrieve relevant docs and generate answer"""
    
    # Step 1: Retrieve relevant documents
    retrieved_docs = search_documents(question, k=k)
    
    # Step 2: Build context from retrieved documents
    context_parts = []
    for doc in retrieved_docs:
        context_parts.append(f"[{doc['title']}]: {doc['content']}")
    context = "\n\n".join(context_parts)
    
    # Step 3: Generate response with context
    system_prompt = f"""You are a helpful banking assistant. Answer the customer's question using ONLY the information provided in the context below. If the answer is not in the context, say "I don't have information about that in our policies."

Context:
{context}

Always cite which policy document you're referencing in your answer."""

    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question}
        ],
        temperature=0.3
    )
    
    return {
        "question": question,
        "answer": response.choices[0].message.content,
        "sources": [doc["title"] for doc in retrieved_docs],
        "retrieval_scores": [doc["score"] for doc in retrieved_docs]
    }

# Test RAG pipeline
result = rag_query("How long do I have to report fraud on my account?")

print(f"❓ Question: {result['question']}")
print(f"\n💬 Answer: {result['answer']}")
print(f"\n📚 Sources: {', '.join(result['sources'])}")

In [None]:
# Test with multiple banking questions
banking_questions = [
    "What is the daily limit for wire transfers?",
    "How can I check my account balance?",
    "What credit score do I need for a mortgage?",
    "How do I dispute a charge on my credit card?"
]

print("=== Banking RAG Q&A ===")
for question in banking_questions:
    result = rag_query(question)
    print(f"\n❓ {question}")
    print(f"💬 {result['answer'][:200]}..." if len(result['answer']) > 200 else f"💬 {result['answer']}")
    print(f"📚 Sources: {', '.join(result['sources'][:2])}")
    print("-" * 50)

## 6. Chunking Strategies for Banking Documents

In [None]:
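def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> list:
    """Split document into overlapping fixed-size character chunks"""
    if not 0 <= overlap < chunk_size:
        # overlap >= chunk_size would keep `start` from advancing (infinite loop)
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append({
            "text": text[start:end],
            "start": start,
            "end": end
        })
        if end == len(text):
            break  # Last chunk reached; skip a redundant tail made only of overlap
        start = end - overlap  # Step back so consecutive chunks share `overlap` characters
    return chunks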

# Example: Long banking document
long_document = """
MORTGAGE APPLICATION REQUIREMENTS

Section 1: Income Verification
Applicants must provide proof of income for the past two years. Acceptable documents include W-2 forms, tax returns, and pay stubs from the last 30 days. Self-employed applicants must provide business tax returns and profit/loss statements.

Section 2: Credit Requirements
A minimum credit score of 620 is required for conventional loans. FHA loans may accept scores as low as 580 with a 3.5% down payment. Higher credit scores qualify for better interest rates.

Section 3: Down Payment
Conventional loans require a minimum 3% down payment for first-time buyers. A 20% down payment eliminates the need for private mortgage insurance (PMI). Gift funds are acceptable with proper documentation.

Section 4: Property Requirements
The property must be appraised by a licensed appraiser. The appraisal must meet or exceed the purchase price. Properties must meet minimum safety and habitability standards.
"""

chunks = chunk_document(long_document, chunk_size=300, overlap=50)
print(f"Document split into {len(chunks)} chunks:\n")
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1} (chars {chunk['start']}-{chunk['end']}):")
    print(f"  '{chunk['text'][:100]}...'\n")
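
Fixed-size character chunks can cut a sentence in half, which tends to hurt retrieval on policy text. As one alternative strategy, the sketch below packs whole sentences into chunks up to a character budget. The function name `chunk_by_sentences`, the `max_chars` parameter, and the regex-based sentence split are assumptions made for illustration; a production system would likely use a proper sentence tokenizer and token counts.

```python
import re

def chunk_by_sentences(text: str, max_chars: int = 300) -> list:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Naive sentence split: break after '.', '!' or '?' when followed by whitespace
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # Budget exceeded: close the current chunk
            current = sentence       # Start the next chunk with this sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Short sample in the style of the mortgage document above
sample_policy = (
    "Applicants must provide proof of income. "
    "A minimum credit score of 620 is required. "
    "Gift funds are acceptable."
)

sentence_chunks = chunk_by_sentences(sample_policy, max_chars=80)
for i, chunk in enumerate(sentence_chunks, 1):
    print(f"Chunk {i} ({len(chunk)} chars): {chunk}")
```

Unlike `chunk_document`, this variant has no overlap; repeating the last sentence of each chunk at the start of the next is one way to add it.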

## 7. Exercise: Build Your Own Banking RAG

**Task:** Extend the RAG system to:
1. Add metadata filtering (e.g., only search "security" category documents)
2. Implement hybrid search (combine keyword and semantic search)
3. Add source citations to the response

In [None]:
# Exercise: Implement filtered RAG search
def filtered_rag_query(question: str, category_filter: str = None, k: int = 3) -> dict:
    """
    TODO: Implement RAG with category filtering
    
    1. Filter documents by category before search
    2. Search only filtered documents
    3. Generate response with filtered context
    """
    # Your implementation here
    pass

# Test your implementation
# result = filtered_rag_query("How do I report fraud?", category_filter="security")
# print(result)
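
For task 2 (hybrid search), one common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF), which Azure AI Search also documents as its hybrid merge method. The sketch below is a hint rather than a reference solution: `keyword_rank` is a deliberately simple term-overlap scorer, the toy documents are shortened policy texts, and `vector_ranking` is a hard-coded, hypothetical stand-in for the ids your FAISS search would return.

```python
import re

def keyword_rank(query: str, docs: list) -> list:
    """Rank doc ids by how many distinct query terms appear in the content (toy scorer)."""
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    scored = []
    for doc in docs:
        words = set(re.findall(r"[a-z0-9]+", doc["content"].lower()))
        scored.append((len(terms & words), doc["id"]))
    # Highest overlap first; Python's sort is stable, so ties keep input order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for score, doc_id in scored if score > 0]

def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Fuse ranked id lists: each list adds 1 / (k + rank) to a document's score."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

# Shortened policy texts, for illustration only
toy_docs = [
    {"id": "policy_002", "content": "Daily wire transfer limits are $50,000 for personal accounts."},
    {"id": "policy_003", "content": "Zero liability protection covers unauthorized transactions."},
    {"id": "policy_001", "content": "Customers can dispute unauthorized transactions within 60 days."},
]

keyword_ranking = keyword_rank("dispute unauthorized wire transactions", toy_docs)
# Hypothetical vector-search result order (replace with ids from search_documents)
vector_ranking = ["policy_003", "policy_002", "policy_001"]

fused_results = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
for doc_id, score in fused_results:
    print(f"{score:.5f}  {doc_id}")
```

RRF uses only rank positions, so keyword and vector scores do not need to be put on a common scale before they are combined. The constant `k=60` is a conventional default and can be tuned.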

---
## 🎁 BONUS: Production Azure AI Search Integration

This section shows how to use **Azure AI Search** for production-grade vector search with:
- Hybrid search (keyword + semantic + vector)
- Scalable indexing
- Built-in security and compliance

**Prerequisites:** Azure subscription and Azure AI Search service

In [None]:
# =============================================================================
# AZURE CLI: Create Azure AI Search Service
# =============================================================================
# Run these commands in Azure Cloud Shell or local terminal with Azure CLI
#
# # Set variables
# RESOURCE_GROUP="rg-ai-workshop"
# LOCATION="eastus"
# SEARCH_SERVICE="search-banking-workshop"  # Must be globally unique
#
# # Create resource group (if not exists)
# az group create --name $RESOURCE_GROUP --location $LOCATION
#
# # Create Azure AI Search (Free tier for testing)
# az search service create \
#     --name $SEARCH_SERVICE \
#     --resource-group $RESOURCE_GROUP \
#     --location $LOCATION \
#     --sku free
#
# # Get the admin key
# az search admin-key show \
#     --service-name $SEARCH_SERVICE \
#     --resource-group $RESOURCE_GROUP
#
# # Get the endpoint
# echo "Endpoint: https://${SEARCH_SERVICE}.search.windows.net"
#
# =============================================================================
# COST ESTIMATES:
# - Free tier: 50 MB storage, 3 indexes (good for testing)
# - Basic tier: ~$75/month, 2 GB storage, 15 indexes
# - Standard S1: ~$250/month, 25 GB storage, 50 indexes
# =============================================================================

print("Azure CLI commands ready - run in terminal to create Azure AI Search")

In [None]:
# =============================================================================
# Azure AI Search Client Setup
# Add these secrets to Google Colab (click 🔑 icon):
#   - AZURE_SEARCH_ENDPOINT: https://your-search.search.windows.net
#   - AZURE_SEARCH_KEY: Your admin key
# =============================================================================

SEARCH_ENABLED = False
search_client = None
index_client = None

try:
    from google.colab import userdata
    AZURE_SEARCH_ENDPOINT = userdata.get('AZURE_SEARCH_ENDPOINT')
    AZURE_SEARCH_KEY = userdata.get('AZURE_SEARCH_KEY')
    
    if AZURE_SEARCH_ENDPOINT and AZURE_SEARCH_KEY:
        from azure.search.documents import SearchClient
        from azure.search.documents.indexes import SearchIndexClient
        from azure.core.credentials import AzureKeyCredential
        
        credential = AzureKeyCredential(AZURE_SEARCH_KEY)
        index_client = SearchIndexClient(endpoint=AZURE_SEARCH_ENDPOINT, credential=credential)
        SEARCH_ENABLED = True
        print(f"✅ Azure AI Search connected: {AZURE_SEARCH_ENDPOINT}")
    else:
        print("⚠️ Azure AI Search credentials not found - skipping")
except Exception as e:
    print(f"⚠️ Azure AI Search not configured: {e}")

In [None]:
# =============================================================================
# Create Search Index with Vector Field
# =============================================================================

if SEARCH_ENABLED:
    from azure.search.documents.indexes.models import (
        SearchIndex,
        SearchField,
        SearchFieldDataType,
        VectorSearch,
        HnswAlgorithmConfiguration,
        VectorSearchProfile,
        SemanticConfiguration,
        SemanticField,
        SemanticPrioritizedFields,
        SemanticSearch
    )
    
    INDEX_NAME = "banking-policies-workshop"
    
    # Define fields
    fields = [
        SearchField(name="id", type=SearchFieldDataType.String, key=True, filterable=True),
        SearchField(name="title", type=SearchFieldDataType.String, searchable=True),
        SearchField(name="content", type=SearchFieldDataType.String, searchable=True),
        SearchField(name="category", type=SearchFieldDataType.String, filterable=True, facetable=True),
        SearchField(
            name="embedding",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=len(banking_docs[0]["embedding"]),  # Must match the embedding model (1536 for text-embedding-3-small)
            vector_search_profile_name="vector-profile"
        )
    ]
    
    # Vector search configuration
    vector_search = VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw-config")],
        profiles=[VectorSearchProfile(name="vector-profile", algorithm_configuration_name="hnsw-config")]
    )
    
    # Semantic search configuration
    semantic_config = SemanticConfiguration(
        name="semantic-config",
        prioritized_fields=SemanticPrioritizedFields(
            title_field=SemanticField(field_name="title"),
            content_fields=[SemanticField(field_name="content")]
        )
    )
    semantic_search = SemanticSearch(configurations=[semantic_config])
    
    # Create index
    # Named search_index so it does not overwrite the FAISS `index` used by search_documents()
    search_index = SearchIndex(
        name=INDEX_NAME,
        fields=fields,
        vector_search=vector_search,
        semantic_search=semantic_search
    )

    result = index_client.create_or_update_index(search_index)
    print(f"✅ Created index: {result.name}")
else:
    print("⏭️ Skipping - Azure AI Search not configured")

In [None]:
# =============================================================================
# Upload Documents to Azure AI Search
# =============================================================================

if SEARCH_ENABLED:
    from azure.search.documents import SearchClient
    
    search_client = SearchClient(
        endpoint=AZURE_SEARCH_ENDPOINT,
        index_name=INDEX_NAME,
        credential=credential
    )
    
    # Prepare documents with embeddings (reuse from earlier)
    docs_to_upload = []
    for doc in banking_docs:
        docs_to_upload.append({
            "id": doc["id"],
            "title": doc["title"],
            "content": doc["content"],
            "category": doc["category"],
            "embedding": doc["embedding"]
        })
    
    # Upload
    result = search_client.upload_documents(documents=docs_to_upload)
    print(f"✅ Uploaded {len(docs_to_upload)} documents")
    print(f"   Succeeded: {sum(1 for r in result if r.succeeded)}")
else:
    print("⏭️ Skipping - Azure AI Search not configured")

In [None]:
# =============================================================================
# Hybrid Search: Vector + Keyword + Semantic
# =============================================================================

if SEARCH_ENABLED:
    from azure.search.documents.models import VectorizedQuery
    
    def azure_hybrid_search(query: str, k: int = 3) -> list:
        """Perform hybrid search combining vector, keyword, and semantic ranking"""
        
        # Get query embedding
        query_embedding = get_embedding(query)
        
        # Vector query
        vector_query = VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=k,
            fields="embedding"
        )
        
        # Hybrid search with semantic ranking
        results = search_client.search(
            search_text=query,  # Keyword search
            vector_queries=[vector_query],  # Vector search
            query_type="semantic",  # Semantic ranking
            semantic_configuration_name="semantic-config",
            top=k,
            select=["id", "title", "content", "category"]
        )
        
        return [{"title": r["title"], "content": r["content"], "score": r["@search.score"]} for r in results]
    
    # Test hybrid search
    test_query = "How do I report fraudulent activity on my account?"
    print(f"🔍 Query: '{test_query}'\n")
    
    results = azure_hybrid_search(test_query)
    for i, r in enumerate(results, 1):
        print(f"{i}. [{r['score']:.4f}] {r['title']}")
        print(f"   {r['content'][:100]}...\n")
else:
    print("⏭️ Skipping - Azure AI Search not configured")

In [None]:
# =============================================================================
# Production RAG with Azure AI Search
# =============================================================================

if SEARCH_ENABLED and not DEMO_MODE:
    def azure_rag_query(question: str, k: int = 3) -> dict:
        """Production RAG using Azure AI Search"""
        
        # Retrieve with hybrid search
        retrieved = azure_hybrid_search(question, k=k)
        
        # Build context
        context = "\n\n".join([f"[{r['title']}]: {r['content']}" for r in retrieved])
        
        # Generate response
        system_prompt = f"""You are a banking assistant. Answer using ONLY the context below.
If the answer isn't in the context, say so. Cite your sources.

Context:
{context}"""
        
        response = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question}
            ],
            temperature=0.3
        )
        
        return {
            "question": question,
            "answer": response.choices[0].message.content,
            "sources": [r["title"] for r in retrieved]
        }
    
    # Test production RAG
    result = azure_rag_query("What are the wire transfer limits?")
    print(f"❓ {result['question']}")
    print(f"\n💬 {result['answer']}")
    print(f"\n📚 Sources: {', '.join(result['sources'])}")
else:
    print("⏭️ Skipping - requires both Azure AI Search and Azure OpenAI")

In [None]:
# =============================================================================
# Cleanup (Optional)
# =============================================================================

# Uncomment to delete the index when done
# if SEARCH_ENABLED:
#     index_client.delete_index(INDEX_NAME)
#     print(f"🗑️ Deleted index: {INDEX_NAME}")

# Azure CLI to delete the search service:
# az search service delete --name $SEARCH_SERVICE --resource-group $RESOURCE_GROUP --yes

print("Cleanup commands ready - uncomment to delete resources")

### FAISS vs Azure AI Search Comparison

| Feature | FAISS (In-Memory) | Azure AI Search |
|---------|-------------------|------------------|
| **Setup** | `pip install faiss-cpu` | Azure subscription required |
| **Cost** | Free | Free tier available, ~$75+/month for production |
| **Persistence** | In-memory by default; save/load manually with `faiss.write_index` / `faiss.read_index` | Fully managed, durable |
| **Scale** | Single machine | Distributed; scale by adding replicas and partitions |
| **Search Types** | Vector only | Hybrid (vector + keyword + semantic) |
| **Security** | None built-in | RBAC, private endpoints, encryption |
| **Best For** | Prototyping, small datasets | Production, enterprise apps |

---
## Summary

In this module, you learned:

1. **Tokenization**: How text is split into tokens for LLM processing
2. **Embeddings**: Converting text to dense vectors for semantic similarity
3. **Vector Databases**: Using FAISS for efficient similarity search
4. **RAG Pipeline**: Retrieve → Context → Generate workflow
5. **Azure AI Search**: Production-ready vector search with hybrid capabilities
6. **Chunking**: Strategies for splitting long documents

**Key Takeaways for Banking:**
- RAG keeps knowledge current without retraining
- Source citations are essential for compliance
- Hybrid search (keyword + semantic) works best for banking queries
- Use metadata filtering for account-type-specific policies