# Demo 3: Hybrid Search RAG (Vector + Keyword)
Pattern: Advanced Retrieval with Dual Search

**Components:**
- Vector similarity search (semantic)
- Keyword/BM25 search (lexical)
- Reciprocal Rank Fusion (RRF) merging
- Query transformation
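
For reference, the two scoring rules behind the keyword and fusion components can be written as below. This is a sketch in conventional notation, not a transcription of the code: $f(t, d)$ is the count of term $t$ in document $d$, $|d|$ is the document length in tokens, $\mathrm{avgdl}$ is the average document length, $\mathrm{idf}(t)$ is whichever inverse document frequency weighting is chosen, and $\operatorname{rank}_r(d)$ is the 1-based position of $d$ in ranking $r$. The defaults used later in this notebook are $k_1 = 1.5$, $b = 0.75$, and $k = 60$.

```latex
\mathrm{BM25}(q, d) = \sum_{t \in q} \mathrm{idf}(t)\,
  \frac{f(t, d)\,(k_1 + 1)}{f(t, d) + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)}

\mathrm{RRF}(d) = \sum_{r \in \{\text{vector},\ \text{keyword}\}} \frac{1}{k + \operatorname{rank}_r(d)}
```

Because RRF depends only on ranks, the cosine similarities and BM25 scores never need to be normalized onto a common scale.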

In [None]:
import boto3
import json
import numpy as np
import time
import re
from typing import List, Dict, Tuple
from collections import Counter
import math

In [None]:
# Initialize clients
bedrock_runtime = boto3.client('bedrock-runtime')
s3 = boto3.client('s3')

# Configuration
HYBRID_BUCKET = f"hybrid-search-demo-{int(time.time())}"
EMBEDDING_MODEL = "amazon.titan-embed-text-v1"
GENERATION_MODEL = "amazon.nova-pro-v1:0"

In [None]:
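# Create S3 bucket (outside us-east-1, S3 requires an explicit LocationConstraint)
region = s3.meta.region_name
bucket_args = {} if region == "us-east-1" else {"CreateBucketConfiguration": {"LocationConstraint": region}}
s3.create_bucket(Bucket=HYBRID_BUCKET, **bucket_args)
print(f"Created hybrid search bucket: {HYBRID_BUCKET}")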

In [None]:
# Extended document collection for hybrid search
documents = [
    {
        "id": "lambda_pricing",
        "title": "AWS Lambda Pricing Model",
        "content": "AWS Lambda pricing is based on requests and compute time. You pay $0.20 per 1M requests and $0.0000166667 per GB-second. Free tier includes 1M requests monthly.",
        "keywords": ["pricing", "cost", "billing", "free tier", "requests", "compute"]
    },
    {
        "id": "lambda_memory",
        "title": "Lambda Memory Configuration",
        "content": "Configure Lambda memory from 128 MB to 10,240 MB. CPU power scales with memory allocation. Higher memory improves performance but increases cost per execution.",
        "keywords": ["memory", "configuration", "CPU", "performance", "scaling", "allocation"]
    },
    {
        "id": "lambda_timeout",
        "title": "Lambda Timeout Settings",
        "content": "Lambda maximum execution time is 15 minutes (900 seconds). Default timeout is 3 seconds. Configure timeout based on function requirements plus buffer time.",
        "keywords": ["timeout", "execution", "duration", "limits", "configuration", "seconds"]
    },
    {
        "id": "lambda_coldstart",
        "title": "Lambda Cold Start Optimization",
        "content": "Cold starts add latency when Lambda initializes new execution environments. Use provisioned concurrency, optimize package size, and minimize initialization code to reduce cold starts.",
        "keywords": ["cold start", "latency", "initialization", "provisioned concurrency", "optimization", "performance"]
    },
    {
        "id": "lambda_vpc",
        "title": "Lambda VPC Configuration",
        "content": "Lambda functions can access VPC resources like RDS databases and private subnets. VPC configuration adds cold start latency. Use VPC endpoints for AWS services.",
        "keywords": ["VPC", "networking", "RDS", "subnets", "endpoints", "private", "security"]
    },
    {
        "id": "lambda_monitoring",
        "title": "Lambda Monitoring and Logging",
        "content": "Monitor Lambda with CloudWatch metrics: Duration, Invocations, Errors, Throttles. Enable X-Ray tracing for distributed systems. Use structured logging for better analysis.",
        "keywords": ["monitoring", "CloudWatch", "metrics", "X-Ray", "tracing", "logging", "errors"]
    },
    {
        "id": "lambda_security",
        "title": "Lambda Security Best Practices",
        "content": "Use IAM roles with least privilege. Store secrets in AWS Secrets Manager. Enable encryption at rest and in transit. Validate input data to prevent injection attacks.",
        "keywords": ["security", "IAM", "secrets", "encryption", "validation", "least privilege", "injection"]
    },
    {
        "id": "lambda_deployment",
        "title": "Lambda Deployment Strategies",
        "content": "Deploy Lambda using blue/green, canary, or all-at-once strategies. Use AWS CodeDeploy for automated deployments. Implement proper testing and rollback procedures.",
        "keywords": ["deployment", "blue/green", "canary", "CodeDeploy", "testing", "rollback", "automation"]
    }
]

print(f"Loaded {len(documents)} documents for hybrid search")

In [None]:
def get_embedding(text: str) -> List[float]:
    """Get embedding using Titan model"""
    response = bedrock_runtime.invoke_model(
        modelId=EMBEDDING_MODEL,
        body=json.dumps({"inputText": text})
    )
    return json.loads(response['body'].read())['embedding']

def preprocess_text(text: str) -> List[str]:
    """Preprocess text for keyword search"""
    # Convert to lowercase and split into tokens
    tokens = re.findall(r'\b\w+\b', text.lower())
    # Remove common stop words
    stop_words = {'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by', 'is', 'are', 'was', 'were'}
    return [token for token in tokens if token not in stop_words and len(token) > 2]
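
It can help to see what this tokenizer actually keeps before relying on it for keyword search. The snippet below is a small sketch that repeats the same logic so it runs on its own; the sample sentence is made up for illustration. Note that the `len(token) > 2` filter drops short tokens such as `mb` and the `x` in `X-Ray`, which may or may not be what you want for technical text.

```python
import re
from typing import List

# Same stop-word list as the notebook's preprocess_text
STOP_WORDS = {'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to',
              'for', 'of', 'with', 'by', 'is', 'are', 'was', 'were'}

def preprocess_text(text: str) -> List[str]:
    """Lowercase, split on word boundaries, drop stop words and tokens of 1-2 chars."""
    tokens = re.findall(r'\b\w+\b', text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

# Illustrative input: the hyphen splits "X-Ray" into "x" and "ray"
sample = "Enable X-Ray tracing on 128 MB"
tokens = preprocess_text(sample)
print(tokens)  # ['enable', 'ray', 'tracing', '128']
```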

In [None]:
# Create hybrid index (vector + keyword)
print("Creating hybrid index...")

hybrid_index = {
    "documents": {},
    "vectors": {},
    "keyword_index": {},
    "doc_frequencies": {},
    "total_docs": len(documents)
}

# Index each document: store it, embed it, and add its tokens to the inverted index
for doc in documents:
    doc_id = doc["id"]
    content = doc["content"]
    
    # Store document
    hybrid_index["documents"][doc_id] = doc
    
    # Create vector embedding
    embedding = get_embedding(content)
    hybrid_index["vectors"][doc_id] = embedding
    
    # Process text for keyword search
    tokens = preprocess_text(content + " " + " ".join(doc["keywords"]))
    token_counts = Counter(tokens)
    
    # Build inverted index
    for token, count in token_counts.items():
        if token not in hybrid_index["keyword_index"]:
            hybrid_index["keyword_index"][token] = {}
        hybrid_index["keyword_index"][token][doc_id] = count
        
        # Track document frequency
        if token not in hybrid_index["doc_frequencies"]:
            hybrid_index["doc_frequencies"][token] = 0
        hybrid_index["doc_frequencies"][token] += 1
    
    print(f"Indexed {doc_id}")
    time.sleep(0.1)

# Store index in S3
s3.put_object(
    Bucket=HYBRID_BUCKET,
    Key="hybrid_index.json",
    Body=json.dumps(hybrid_index)
)

print("Hybrid index created and stored in S3")

In [None]:
def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Calculate cosine similarity"""
    a_np = np.array(a)
    b_np = np.array(b)
    return np.dot(a_np, b_np) / (np.linalg.norm(a_np) * np.linalg.norm(b_np))

def vector_search(query: str, top_k: int = 5) -> List[Tuple[str, float]]:
    """Perform vector similarity search"""
    query_embedding = get_embedding(query)
    
    similarities = []
    for doc_id, doc_embedding in hybrid_index["vectors"].items():
        similarity = cosine_similarity(query_embedding, doc_embedding)
        similarities.append((doc_id, similarity))
    
    similarities.sort(key=lambda x: x[1], reverse=True)
    return similarities[:top_k]

def bm25_score(query_tokens: List[str], doc_id: str, k1: float = 1.5, b: float = 0.75) -> float:
    """Calculate BM25 score for document"""
    doc = hybrid_index["documents"][doc_id]
    doc_tokens = preprocess_text(doc["content"] + " " + " ".join(doc["keywords"]))
    doc_length = len(doc_tokens)
    
    # Average document length, tokenized the same way as doc_length (content + keywords)
    avg_doc_length = sum(
        len(preprocess_text(d["content"] + " " + " ".join(d["keywords"])))
        for d in hybrid_index["documents"].values()
    ) / len(hybrid_index["documents"])
    
    score = 0.0
    doc_token_counts = Counter(doc_tokens)
    
    for token in query_tokens:
        if token in hybrid_index["keyword_index"] and doc_id in hybrid_index["keyword_index"][token]:
            tf = doc_token_counts[token]  # Term frequency
            df = hybrid_index["doc_frequencies"][token]  # Document frequency
            # The "+1" inside the log keeps IDF non-negative for tokens found in most documents
            idf = math.log(1 + (hybrid_index["total_docs"] - df + 0.5) / (df + 0.5))
            
            # BM25 formula
            numerator = tf * (k1 + 1)
            denominator = tf + k1 * (1 - b + b * (doc_length / avg_doc_length))
            score += idf * (numerator / denominator)
    
    return score

def keyword_search(query: str, top_k: int = 5) -> List[Tuple[str, float]]:
    """Perform BM25 keyword search"""
    query_tokens = preprocess_text(query)
    
    scores = []
    for doc_id in hybrid_index["documents"].keys():
        score = bm25_score(query_tokens, doc_id)
        if score != 0:  # Drop documents with no lexical evidence for this query
            scores.append((doc_id, score))
    
    scores.sort(key=lambda x: x[1], reverse=True)
    return scores[:top_k]
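
The choice of IDF formula matters for a corpus this small. The token `lambda` occurs in seven of the eight documents here, and the classic BM25 IDF, `log((N - df + 0.5) / (df + 0.5))`, goes negative once a token appears in more than half of the documents, so a match on a common token would lower a document's score. The sketch below compares the classic form with the widely used variant that adds 1 inside the logarithm; the function names are illustrative and not part of the notebook.

```python
import math

def idf_classic(n_docs: int, df: int) -> float:
    """Classic BM25 IDF; negative when df > n_docs / 2."""
    return math.log((n_docs - df + 0.5) / (df + 0.5))

def idf_plus_one(n_docs: int, df: int) -> float:
    """Variant with 1 added inside the log; always positive."""
    return math.log(1 + (n_docs - df + 0.5) / (df + 0.5))

N_DOCS = 8  # size of the demo corpus
for df in (1, 4, 7):
    print(f"df={df}: classic={idf_classic(N_DOCS, df):+.3f}, "
          f"plus_one={idf_plus_one(N_DOCS, df):+.3f}")
```

With `df = 7` the classic form evaluates to `log(0.2)`, roughly -1.609, while the variant gives `log(1.2)`, roughly 0.182, so rare tokens still outweigh common ones without any term being penalized.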

In [None]:
def reciprocal_rank_fusion(vector_results: List[Tuple[str, float]], 
                          keyword_results: List[Tuple[str, float]], 
                          k: int = 60) -> List[Tuple[str, float]]:
    """Combine results using Reciprocal Rank Fusion"""
    
    # Create rank mappings
    vector_ranks = {doc_id: rank + 1 for rank, (doc_id, _) in enumerate(vector_results)}
    keyword_ranks = {doc_id: rank + 1 for rank, (doc_id, _) in enumerate(keyword_results)}
    
    # Get all unique documents
    all_docs = set(vector_ranks.keys()) | set(keyword_ranks.keys())
    
    # Calculate RRF scores
    rrf_scores = {}
    for doc_id in all_docs:
        vector_score = 1 / (k + vector_ranks.get(doc_id, len(vector_results) + 1))
        keyword_score = 1 / (k + keyword_ranks.get(doc_id, len(keyword_results) + 1))
        rrf_scores[doc_id] = vector_score + keyword_score
    
    # Sort by RRF score
    sorted_results = sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)
    return sorted_results

def transform_query(query: str) -> str:
    """Transform query for better retrieval"""
    # Simple query expansion - add related terms
    expansions = {
        "cost": "pricing billing charges",
        "performance": "speed latency optimization",
        "security": "IAM encryption permissions",
        "monitoring": "CloudWatch metrics logging"
    }
    
    expanded_query = query
    for term, expansion in expansions.items():
        if term in query.lower():
            expanded_query += f" {expansion}"
    
    return expanded_query

In [None]:
def hybrid_search(query: str, top_k: int = 3) -> List[Dict]:
    """Perform hybrid search combining vector and keyword search"""
    
    # Transform query
    expanded_query = transform_query(query)
    
    print(f"Original query: {query}")
    if expanded_query != query:
        print(f"Expanded query: {expanded_query}")
    
    # Perform both searches
    vector_results = vector_search(expanded_query, top_k=5)
    keyword_results = keyword_search(expanded_query, top_k=5)
    
    print(f"\nVector search results:")
    for i, (doc_id, score) in enumerate(vector_results[:3], 1):
        print(f"  {i}. {doc_id} (similarity: {score:.3f})")
    
    print(f"\nKeyword search results:")
    for i, (doc_id, score) in enumerate(keyword_results[:3], 1):
        print(f"  {i}. {doc_id} (BM25: {score:.3f})")
    
    # Combine using RRF
    fused_results = reciprocal_rank_fusion(vector_results, keyword_results)
    
    print(f"\nHybrid (RRF) results:")
    final_results = []
    for i, (doc_id, rrf_score) in enumerate(fused_results[:top_k], 1):
        doc = hybrid_index["documents"][doc_id]
        print(f"  {i}. {doc_id} (RRF: {rrf_score:.4f})")
        final_results.append({
            'id': doc_id,
            'title': doc['title'],
            'content': doc['content'],
            'rrf_score': rrf_score
        })
    
    return final_results

In [None]:
def generate_answer(query: str, context_docs: List[Dict]) -> str:
    """Generate answer using Nova Pro with hybrid search context"""
    
    context_parts = []
    for doc in context_docs:
        context_parts.append(f"Title: {doc['title']}\nContent: {doc['content']}")
    
    context = "\n\n".join(context_parts)
    
    prompt = f"""Based on the following context retrieved through hybrid search (vector + keyword), answer the question accurately.

Context:
{context}

Question: {query}

Answer:"""
    
    response = bedrock_runtime.invoke_model(
        modelId=GENERATION_MODEL,
        body=json.dumps({
            "messages": [{
                "role": "user",
                "content": [{"text": prompt}]
            }],
            "inferenceConfig": {
                "maxTokens": 300,
                "temperature": 0.1
            }
        })
    )
    
    result = json.loads(response['body'].read())
    return result['output']['message']['content'][0]['text']

In [None]:
def query_hybrid_rag(question: str) -> Dict:
    """Complete hybrid RAG pipeline"""
    
    print(f"\n{'='*60}")
    print(f"HYBRID SEARCH RAG QUERY")
    print(f"{'='*60}")
    
    # Perform hybrid search
    retrieved_docs = hybrid_search(question, top_k=3)
    
    # Generate answer
    answer = generate_answer(question, retrieved_docs)
    
    print(f"\nFinal Answer: {answer}")
    print(f"\n{'='*60}\n")
    
    return {
        'question': question,
        'answer': answer,
        'sources': [doc['title'] for doc in retrieved_docs],
        'rrf_scores': [doc['rrf_score'] for doc in retrieved_docs]
    }

In [None]:
# Test hybrid search RAG
test_questions = [
    "What are the Lambda pricing costs?",
    "How to optimize Lambda performance?",
    "Lambda security best practices?",
    "How to monitor Lambda functions?",
    "Lambda deployment strategies?"
]

results = []
for question in test_questions:
    result = query_hybrid_rag(question)
    results.append(result)

In [None]:
# Compare with vector-only search
print("COMPARISON: HYBRID vs VECTOR-ONLY SEARCH")
print("="*50)

sample_query = "Lambda cost optimization"

print(f"Query: {sample_query}\n")

# Vector-only results
vector_only = vector_search(sample_query, top_k=3)
print("Vector-only search:")
for i, (doc_id, score) in enumerate(vector_only, 1):
    title = hybrid_index["documents"][doc_id]["title"]
    print(f"  {i}. {title} (similarity: {score:.3f})")

# Keyword-only results
keyword_only = keyword_search(sample_query, top_k=3)
print("\nKeyword-only search:")
for i, (doc_id, score) in enumerate(keyword_only, 1):
    title = hybrid_index["documents"][doc_id]["title"]
    print(f"  {i}. {title} (BM25: {score:.3f})")

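# Hybrid results (note: hybrid_search also expands the query via transform_query,
# so this is not a strictly like-for-like comparison with the two runs above)
hybrid_results = hybrid_search(sample_query, top_k=3)
print("\nHybrid search fuses both rankings with RRF; compare the three lists above.")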

## Hybrid Search Benefits

### Vector Search (Semantic)
✅ **Understands meaning and context**  
✅ **Handles synonyms and paraphrasing**  
✅ **Good for conceptual queries**  
❌ **May miss exact keyword matches**  
❌ **Less precise for specific terms**  

### Keyword Search (Lexical)
✅ **Exact term matching**  
✅ **Fast and interpretable**  
✅ **Good for specific terminology**  
❌ **Misses semantic relationships**  
❌ **Struggles with synonyms**  
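
To make the synonym limitation concrete, here is a toy sketch. The tokenizer and the query are illustrative assumptions, not the notebook's `preprocess_text`; the document text is the first sentence of the pricing entry. The query says "cost" while the document says "pricing", so the only shared token is the uninformative "lambda". Appending the expansion terms that `transform_query` maps to "cost" restores a meaningful overlap.

```python
import re
from typing import Set

def token_set(text: str) -> Set[str]:
    """Lowercase word tokens as a set (toy tokenizer for illustration)."""
    return set(re.findall(r"\w+", text.lower()))

doc = "AWS Lambda pricing is based on requests and compute time."
query = "How much does Lambda cost?"
expanded_query = query + " pricing billing charges"  # expansion terms for "cost"

overlap_plain = token_set(query) & token_set(doc)
overlap_expanded = token_set(expanded_query) & token_set(doc)

print(sorted(overlap_plain))     # ['lambda']
print(sorted(overlap_expanded))  # ['lambda', 'pricing']
```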

### Hybrid Search (Best of Both)
✅ **Combines semantic understanding with exact matching**  
✅ **Reciprocal Rank Fusion balances both approaches**  
✅ **Query transformation expands search terms**  
✅ **Can surface relevant documents that either method alone would miss**  
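
A worked example may make the fusion step easier to follow. The sketch below uses hypothetical document IDs `A` to `D` and follows this notebook's convention of giving a document that is absent from a list the rank `len(list) + 1`; the commonly cited formulation of RRF simply omits the term for a list that does not contain the document.

```python
from typing import Dict, List

def rrf(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Fuse ranked ID lists; a doc absent from a list gets rank len(list) + 1."""
    all_ids = {doc_id for ranking in rankings for doc_id in ranking}
    scores: Dict[str, float] = {}
    for doc_id in all_ids:
        total = 0.0
        for ranking in rankings:
            if doc_id in ranking:
                rank = ranking.index(doc_id) + 1  # 1-based rank
            else:
                rank = len(ranking) + 1           # penalty rank for a missing doc
            total += 1 / (k + rank)
        scores[doc_id] = total
    return scores

vector_ranking = ["A", "B", "C"]   # hypothetical vector search order
keyword_ranking = ["C", "A", "D"]  # hypothetical BM25 order

scores = rrf([vector_ranking, keyword_ranking])
fused = sorted(scores, key=scores.get, reverse=True)
print(fused)  # ['A', 'C', 'B', 'D']
```

`A` scores 1/61 + 1/62 and edges out `C` at 1/63 + 1/61: being near the top of both lists beats being first in one and third in the other. `B` and `D`, each found by only one method, land at the bottom.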

### When to Use Hybrid Search:
- **Technical documentation** with specific terminology
- **Mixed query types** (conceptual + specific)
- **High-quality retrieval requirements**
- **Domain-specific knowledge bases**

In [None]:
print(f"\nDemo complete! Hybrid search bucket: {HYBRID_BUCKET}")
print("Hybrid search fuses vector and keyword rankings so that each covers the other's blind spots.")