# Lab 2: Amazon Bedrock Knowledge Bases & Advanced RAG

**Duration:** 60-75 minutes  
**Cost:** < $0.50 (using Claude Haiku)

## Learning Objectives
1. Build a custom knowledge base with overlapping text chunks
2. Implement advanced RAG patterns
3. Compare retrieval strategies
4. Decompose complex questions into sub-queries
5. Build multi-turn conversational RAG

## Prerequisites
- Completion of Lab 1
- Basic understanding of embeddings and vector search

## 1. Setup and Configuration

In [None]:
# Install required packages
!pip install -q boto3 pandas numpy matplotlib seaborn scikit-learn

In [None]:
import boto3
import json
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics.pairwise import cosine_similarity
from typing import List, Dict, Any

# Initialize Bedrock client
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

print("✓ Bedrock client initialized")

## 2. Sample Knowledge Base Documents

In [None]:
# Expanded AWS documentation for knowledge base
sample_documents = {
    'compute_services.txt': """AWS Compute Services Overview

Amazon EC2 provides resizable virtual servers in the cloud. You can launch instances 
with various configurations of CPU, memory, storage, and networking. EC2 supports 
multiple operating systems and offers pay-as-you-go pricing. Instance types range 
from t3.micro for small workloads to c5.24xlarge for compute-intensive applications.

AWS Lambda is a serverless compute service that runs code in response to events. 
Lambda automatically manages compute resources, scaling from a few requests per day 
to thousands per second. You only pay for compute time consumed. Maximum execution 
time is 15 minutes per invocation.

Amazon ECS is a container orchestration service supporting Docker containers. 
It allows you to run applications on a managed cluster. ECS eliminates the need 
to install and operate your own container orchestration software.""",
    
    'storage_services.txt': """AWS Storage Services Guide

Amazon S3 is object storage built to store and retrieve any amount of data. 
S3 offers 99.999999999% durability and stores data across multiple facilities. 
Common use cases include backup, archiving, data lakes, and website hosting. 
Storage classes range from S3 Standard to S3 Glacier Deep Archive.

Amazon EBS provides persistent block storage for EC2 instances. EBS volumes 
are automatically replicated within their Availability Zone. You can create 
point-in-time snapshots stored in S3. Volume types include gp3, io2, and st1.

Amazon EFS is a scalable file storage for EC2 instances. EFS automatically 
grows and shrinks as you add and remove files. It can be accessed concurrently 
from multiple EC2 instances across Availability Zones.""",
    
    'database_services.txt': """AWS Database Services Portfolio

Amazon RDS makes it easy to set up and operate relational databases in the cloud. 
RDS supports MySQL, PostgreSQL, Oracle, SQL Server, and MariaDB. Features include 
automated backups, software patching, and monitoring. Multi-AZ deployments provide 
high availability.

Amazon DynamoDB is a fast and flexible NoSQL database service. DynamoDB delivers 
single-digit millisecond performance at any scale. It is fully managed with 
built-in security, backup, and in-memory caching. Supports both key-value and 
document data models.

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database. Aurora 
is up to 5x faster than MySQL and 3x faster than PostgreSQL. It provides 
commercial-grade performance at one-tenth the cost. Features automatic scaling 
from 10GB to 128TB.""",
    
    'ml_services.txt': """AWS Machine Learning and AI Services

Amazon SageMaker is a fully managed machine learning platform. It provides tools 
to build, train, and deploy ML models quickly. SageMaker supports popular frameworks 
like TensorFlow, PyTorch, and scikit-learn. Features include SageMaker Studio, 
Autopilot, and Model Monitor.

Amazon Bedrock provides foundation models from leading AI companies. Access models 
from Anthropic, AI21 Labs, Stability AI, and Amazon through one API. Bedrock enables 
building and scaling generative AI applications securely. Supports RAG and fine-tuning.

Amazon Rekognition provides image and video analysis using deep learning. It can 
detect objects, people, text, scenes, and activities. Rekognition also supports 
facial analysis and face comparison with high accuracy.""",
    
    'security_services.txt': """AWS Security and Identity Services

AWS IAM controls access to AWS resources. IAM lets you create users, groups, and 
roles with specific permissions. Multi-factor authentication adds extra security. 
IAM policies are written in JSON format.

AWS KMS manages encryption keys for your applications. KMS integrates with most 
AWS services to encrypt data. You control who can use keys and how they are used. 
Supports both symmetric and asymmetric keys.

AWS Security Hub provides a comprehensive view of security alerts and compliance. 
It aggregates findings from multiple AWS services and partner solutions. Security 
Hub automatically checks against security best practices and standards."""
}

print(f"Created {len(sample_documents)} sample documents")

## 3. Custom Knowledge Base Implementation

In [None]:
class AdvancedKnowledgeBase:
    def __init__(self, bedrock_client):
        self.client = bedrock_client
        self.documents = []
        self.embeddings = []
        self.metadata = []
    
    def get_embedding(self, text):
        """Generate embedding using Titan"""
        body = json.dumps({"inputText": text})
        response = self.client.invoke_model(
            modelId='amazon.titan-embed-text-v1',
            body=body
        )
        response_body = json.loads(response['body'].read())
        return np.array(response_body['embedding'])
    
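    def chunk_text(self, text, chunk_size=300, overlap=50):
        """Split text into overlapping chunks of up to `chunk_size` words"""
        if not 0 <= overlap < chunk_size:
            raise ValueError("overlap must be >= 0 and smaller than chunk_size")
        
        words = text.split()
        chunks = []
        
        for i in range(0, len(words), chunk_size - overlap):
            chunks.append(' '.join(words[i:i + chunk_size]))
            # Stop once a chunk reaches the end of the text; a further window
            # would only repeat words already covered by this chunk
            if i + chunk_size >= len(words):
                break
        
        return chunks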
    
    def add_document(self, text, metadata=None):
        """Add document with chunking"""
        chunks = self.chunk_text(text)
        
        for i, chunk in enumerate(chunks):
            embedding = self.get_embedding(chunk)
            self.documents.append(chunk)
            self.embeddings.append(embedding)
            
            chunk_metadata = metadata.copy() if metadata else {}
            chunk_metadata['chunk_id'] = i
            chunk_metadata['total_chunks'] = len(chunks)
            self.metadata.append(chunk_metadata)
        
        return len(chunks)
    
    def search(self, query, top_k=5):
        """Semantic search"""
        if not self.embeddings:
            return []  # Nothing indexed yet, so there is nothing to rank
        
        query_embedding = self.get_embedding(query).reshape(1, -1)
        embeddings_array = np.array(self.embeddings)
        
        similarities = cosine_similarity(query_embedding, embeddings_array)[0]
        top_indices = np.argsort(similarities)[::-1][:top_k]
        
        results = []
        for idx in top_indices:
            results.append({
                'text': self.documents[idx],
                'score': float(similarities[idx]),
                'metadata': self.metadata[idx]
            })
        
        return results
    
    def invoke_claude(self, prompt, max_tokens=512):
        """Invoke Claude Haiku"""
        body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}]
        })
        
        response = self.client.invoke_model(
            modelId='anthropic.claude-3-haiku-20240307-v1:0',
            body=body
        )
        
        response_body = json.loads(response['body'].read())
        return response_body['content'][0]['text']

kb = AdvancedKnowledgeBase(bedrock_runtime)
print("✓ Knowledge Base initialized")
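
The effect of `chunk_size` and `overlap` is easier to see on a tiny input. The sketch below is a standalone sliding-window chunker in the spirit of `chunk_text`; the name `sliding_chunks` and the small default values are illustrative assumptions, and it stops as soon as a window reaches the end of the text so that no tail-only chunk is emitted.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int = 5, overlap: int = 2) -> List[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


demo = sliding_chunks("a b c d e f g h i j", chunk_size=5, overlap=2)
for chunk in demo:
    print(chunk)
```

Each window starts `chunk_size - overlap` words after the previous one, so neighbouring chunks share `overlap` words and a sentence that straddles a boundary still appears whole in at least one chunk when it is short enough.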

In [None]:
# Load documents
print("Loading documents into Knowledge Base...\n")

for filename, content in sample_documents.items():
    chunks_added = kb.add_document(
        text=content,
        metadata={'source': filename, 'type': 'aws_documentation'}
    )
    print(f"✓ {filename}: {chunks_added} chunks")

print(f"\nTotal chunks: {len(kb.documents)}")
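
Once the chunks are embedded, `kb.search` ranks them by cosine similarity to the query embedding. The following sketch shows that ranking step on hand-written vectors, using only the standard library so it runs without Bedrock access. The helper names `cosine` and `top_k` and the three-dimensional vectors are assumptions made for illustration; real Titan embeddings are much longer.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], vectors: List[Sequence[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar vectors, best first."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy "embeddings", one per chunk.
toy_vectors = [
    [1.0, 0.0, 0.0],  # chunk 0: same direction as the query
    [0.9, 0.1, 0.0],  # chunk 1: close to the query
    [0.0, 1.0, 0.0],  # chunk 2: orthogonal to the query
]
ranking = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(ranking)
```

Because cosine similarity depends only on direction, chunk 1 still scores close to 1 even though its components differ slightly from the query's.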

## 4. Basic RAG Implementation

In [None]:
def rag_query(kb, question, top_k=3):
    """Simple RAG implementation"""
    # Retrieve
    search_results = kb.search(question, top_k=top_k)
    
    # Build context
    context_parts = []
    for i, result in enumerate(search_results):
        source = result['metadata']['source']
        context_parts.append(f"[Source {i+1}: {source}]\n{result['text']}")
    
    context = "\n\n".join(context_parts)
    
    # Generate answer
    prompt = f"""Based on the context below, answer the question.

Context:
{context}

Question: {question}

Answer (cite sources using [Source N]):"""
    
    answer = kb.invoke_claude(prompt, max_tokens=300)
    
    return {
        'question': question,
        'answer': answer,
        'sources': search_results
    }

# Test queries
test_queries = [
    "What storage service should I use for a data lake?",
    "How does Lambda pricing work?",
    "What is the difference between RDS and DynamoDB?"
]

print("Basic RAG Results:\n")
for query in test_queries:
    print(f"{'='*80}")
    result = rag_query(kb, query)
    print(f"Q: {result['question']}\n")
    print(f"A: {result['answer']}\n")
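
The prompt asks the model to cite `[Source N]`, so it can be useful to check which retrieved chunks an answer actually cites. Below is a minimal sketch, assuming the answer uses exactly the `[Source N]` format and the sources are shaped like the `sources` list returned by `rag_query`. The function name `cited_sources` and the sample answer are hypothetical.

```python
import re
from typing import Dict, List


def cited_sources(answer: str, sources: List[Dict]) -> List[str]:
    """Map [Source N] markers in an answer to the source names they refer to."""
    names: List[str] = []
    for match in re.finditer(r"\[Source (\d+)\]", answer):
        index = int(match.group(1)) - 1  # markers are 1-based
        if 0 <= index < len(sources):    # ignore citations with no matching source
            name = sources[index]["metadata"]["source"]
            if name not in names:
                names.append(name)
    return names


# Hypothetical retrieval results and answer, for illustration only.
fake_sources = [
    {"text": "placeholder", "score": 0.82, "metadata": {"source": "storage_services.txt"}},
    {"text": "placeholder", "score": 0.64, "metadata": {"source": "database_services.txt"}},
]
fake_answer = "Use Amazon S3 for a data lake [Source 1]. S3 is object storage [Source 1][Source 5]."
print(cited_sources(fake_answer, fake_sources))
```

A citation number with no matching retrieved chunk, such as `[Source 5]` here, is silently dropped; logging those cases is one way to spot answers that cite sources the model was never given.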

## 5. Query Decomposition

In [None]:
import re

def decompose_query(kb, complex_query):
    """Break complex queries into sub-queries"""
    decompose_prompt = f"""Break this complex question into 2-3 simpler sub-questions.

Complex Question: {complex_query}

List the sub-questions, one per line:"""
    
    sub_questions_text = kb.invoke_claude(decompose_prompt, max_tokens=150)
    
    sub_questions = []
    for line in sub_questions_text.strip().split('\n'):
        # Strip leading list markers such as "1.", "2)", "-" or "*"
        line = re.sub(r'^\s*(?:\d+[.)]|[-*•])\s+', '', line).strip()
        # Skip blank lines, headings, and preamble lines ending in a colon
        if line and not line.startswith('#') and not line.endswith(':'):
            sub_questions.append(line)
    
    return sub_questions[:3]  # Limit to 3

# Test
complex_query = "How should I design a scalable web application on AWS with storage and compute?"
sub_questions = decompose_query(kb, complex_query)

print(f"Complex Query: {complex_query}\n")
print("Sub-questions:")
for i, sq in enumerate(sub_questions):
    print(f"  {i+1}. {sq}")
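
Decomposition pays off when each sub-question gets its own retrieval pass and the results are merged before generation. The sketch below shows one possible merge, keeping the best score for each distinct chunk. `retrieve_for_subquestions` and `fake_search` are hypothetical names; `fake_search` is a canned stand-in for `kb.search` so the example runs without Bedrock.

```python
from typing import Callable, Dict, List


def retrieve_for_subquestions(
    search_fn: Callable[[str, int], List[Dict]],
    sub_questions: List[str],
    top_k: int = 2,
) -> List[Dict]:
    """Run one search per sub-question and keep each chunk's best score."""
    best: Dict[str, Dict] = {}
    for sub_question in sub_questions:
        for hit in search_fn(sub_question, top_k):
            seen = best.get(hit["text"])
            if seen is None or hit["score"] > seen["score"]:
                best[hit["text"]] = hit
    return sorted(best.values(), key=lambda hit: hit["score"], reverse=True)


def fake_search(query: str, top_k: int) -> List[Dict]:
    """Canned results standing in for kb.search (scores are made up)."""
    canned = {
        "compute": [{"text": "EC2 chunk", "score": 0.8}, {"text": "Lambda chunk", "score": 0.7}],
        "storage": [{"text": "S3 chunk", "score": 0.9}, {"text": "EC2 chunk", "score": 0.5}],
    }
    key = "compute" if "compute" in query.lower() else "storage"
    return canned[key][:top_k]


merged = retrieve_for_subquestions(
    fake_search,
    ["Which compute service scales automatically?", "Which storage service fits static assets?"],
)
print([hit["text"] for hit in merged])
```

With the knowledge base from this lab, the same function could be called with `lambda q, k: kb.search(q, top_k=k)` as `search_fn`, and the merged chunks joined into a single context for one final `invoke_claude` call.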

## 6. Multi-Turn Conversational RAG

In [None]:
class ConversationalRAG:
    def __init__(self, knowledge_base):
        self.kb = knowledge_base
        self.history = []
    
    def query(self, question):
        # Get history context
        history_context = "\n".join([
            f"Q: {h['question']}\nA: {h['answer']}"
            for h in self.history[-2:]  # Last 2 turns
        ])
        
        # Search
        search_results = self.kb.search(question, top_k=2)
        doc_context = "\n\n".join([r['text'] for r in search_results])
        
        # Build prompt
        prompt = f"""Answer based on context and conversation history.

Previous Conversation:
{history_context if history_context else 'None'}

Context:
{doc_context}

Question: {question}

Answer:"""
        
        answer = self.kb.invoke_claude(prompt, max_tokens=250)
        
        self.history.append({
            'question': question,
            'answer': answer
        })
        
        return answer

# Test conversation
conv_rag = ConversationalRAG(kb)

conversation = [
    "What is Amazon S3?",
    "What are its durability guarantees?",
    "What are common use cases for it?"
]

print("Conversational RAG Demo:\n")
for question in conversation:
    print(f"User: {question}")
    answer = conv_rag.query(question)
    print(f"Assistant: {answer}\n")
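
In the class above, the history reaches the generation prompt, but retrieval still runs on the raw follow-up question. A follow-up such as "What are its durability guarantees?" does not name S3, so its embedding may match the wrong chunks. One simple heuristic, sketched here under the assumption that recent user questions carry the missing subject, is to prefix the search query with the previous question. The function name `contextual_query` is hypothetical.

```python
from typing import Dict, List


def contextual_query(history: List[Dict[str, str]], question: str, turns: int = 1) -> str:
    """Prefix the question with the last `turns` user questions for retrieval."""
    if turns <= 0:
        return question  # history[-0:] would return the whole list, so guard it
    previous = [turn["question"] for turn in history[-turns:]]
    return " ".join(previous + [question])


history = [{"question": "What is Amazon S3?", "answer": "An object storage service."}]
print(contextual_query(history, "What are its durability guarantees?"))
```

Inside `ConversationalRAG.query`, this would amount to searching with `contextual_query(self.history, question)` instead of `question`. Asking the model to rewrite the follow-up as a standalone question is an alternative, at the cost of one extra model call per turn.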

## 7. Retrieval Strategy Comparison

In [None]:
def compare_retrieval_strategies(kb, query):
    """Compare different retrieval approaches"""
    print(f"Query: {query}\n")
    
    # Strategy 1: Top-K only
    results_k3 = kb.search(query, top_k=3)
    print("Top-3 Results:")
    for i, r in enumerate(results_k3):
        print(f"  {i+1}. Score: {r['score']:.4f} - {r['metadata']['source']}")
    
    print()
    
    # Strategy 2: Top-K with threshold
    threshold = 0.7
    results_threshold = [r for r in kb.search(query, top_k=5) if r['score'] > threshold]
    print(f"Results with score > {threshold}:")
    for i, r in enumerate(results_threshold):
        print(f"  {i+1}. Score: {r['score']:.4f} - {r['metadata']['source']}")

# Test
compare_retrieval_strategies(kb, "What database should I use for high performance?")
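
A fixed threshold can leave the prompt with no context at all when every score falls below it. The sketch below adds a fallback to the threshold strategy so that at least `min_results` chunks are always returned. The function name `filter_with_fallback` and the sample scores are assumptions for illustration.

```python
from typing import Dict, List


def filter_with_fallback(results: List[Dict], threshold: float, min_results: int = 1) -> List[Dict]:
    """Keep results scoring above threshold, but return at least min_results."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    kept = [r for r in ranked if r["score"] > threshold]
    if len(kept) < min_results:
        kept = ranked[:min_results]  # fall back to the best available results
    return kept


# Made-up scores in the shape returned by kb.search.
sample = [
    {"text": "chunk A", "score": 0.65},
    {"text": "chunk B", "score": 0.58},
    {"text": "chunk C", "score": 0.41},
]
print([r["text"] for r in filter_with_fallback(sample, threshold=0.7)])
print([r["text"] for r in filter_with_fallback(sample, threshold=0.5)])
```

With the threshold at 0.7 nothing passes, so the single best chunk is returned; at 0.5 the two chunks above the threshold are kept and the fallback is not needed.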

## 8. Performance Benchmarking

In [None]:
benchmark_queries = [
    "What is EC2?",
    "Tell me about Lambda",
    "Which database for analytics?",
    "What ML services are available?"
]

results = []
print("Benchmarking RAG performance...\n")

for query in benchmark_queries:
    # Time retrieval alone
    start = time.perf_counter()
    search_results = kb.search(query, top_k=3)
    retrieval_time = time.perf_counter() - start
    
    # Time the full RAG pipeline (rag_query repeats the retrieval, then generates)
    start = time.perf_counter()
    result = rag_query(kb, query, top_k=3)
    total_time = time.perf_counter() - start
    
    results.append({
        'query': query,
        'retrieval_time': retrieval_time,
        'total_time': total_time,
        'avg_score': np.mean([r['score'] for r in search_results])
    })
    print(f"✓ {query}")

metrics_df = pd.DataFrame(results)
print("\nPerformance Metrics:")
print(metrics_df.to_string(index=False))
print(f"\nAvg retrieval: {metrics_df['retrieval_time'].mean():.3f}s")
print(f"Avg total: {metrics_df['total_time'].mean():.3f}s")
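
To attribute latency to individual stages without repeating the retrieval, each stage can be wrapped in a small timing helper. This is a minimal sketch: the context manager `timed` is a hypothetical helper, and the `time.sleep` calls stand in for `kb.search` and `kb.invoke_claude` so the example runs offline.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings: Dict[str, float] = {}
with timed("retrieval", timings):
    time.sleep(0.01)  # stand-in for kb.search(...)
with timed("generation", timings):
    time.sleep(0.02)  # stand-in for kb.invoke_claude(...)

timings["total"] = timings["retrieval"] + timings["generation"]
print({label: round(seconds, 3) for label, seconds in timings.items()})
```

`time.perf_counter` is a monotonic, high-resolution clock, which makes it a better fit for measuring short intervals than `time.time`.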

## 9. Visualization

In [None]:
# Plot performance metrics
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Response times
axes[0].bar(range(len(metrics_df)), metrics_df['total_time'])
axes[0].set_xlabel('Query')
axes[0].set_ylabel('Time (seconds)')
axes[0].set_title('RAG Response Times', fontweight='bold')
axes[0].set_xticks(range(len(metrics_df)))
axes[0].set_xticklabels([f'Q{i+1}' for i in range(len(metrics_df))])
axes[0].grid(axis='y', alpha=0.3)

# Relevance scores
axes[1].bar(range(len(metrics_df)), metrics_df['avg_score'])
axes[1].set_xlabel('Query')
axes[1].set_ylabel('Average Relevance Score')
axes[1].set_title('Average Retrieval Scores', fontweight='bold')
axes[1].set_xticks(range(len(metrics_df)))
axes[1].set_xticklabels([f'Q{i+1}' for i in range(len(metrics_df))])
axes[1].set_ylim([0, 1])
axes[1].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

## 10. Cost Analysis

In [None]:
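# Estimate costs from assumed call counts and average tokens per call
usage = {
    # cost_per_1K is in USD per 1,000 tokens
    'Embeddings': {'calls': 30, 'tokens': 50, 'cost_per_1K': 0.0001},
    # input_cost and output_cost are in USD per 1,000,000 tokens
    'Claude Haiku': {'calls': 20, 'input_tokens': 200, 'output_tokens': 250, 
                     'input_cost': 0.25, 'output_cost': 1.25}
}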

embed_cost = usage['Embeddings']['calls'] * usage['Embeddings']['tokens'] / 1000 * usage['Embeddings']['cost_per_1K']
haiku_cost = (
    usage['Claude Haiku']['calls'] * usage['Claude Haiku']['input_tokens'] / 1_000_000 * usage['Claude Haiku']['input_cost'] +
    usage['Claude Haiku']['calls'] * usage['Claude Haiku']['output_tokens'] / 1_000_000 * usage['Claude Haiku']['output_cost']
)

total = embed_cost + haiku_cost

print("Lab 2 Cost Breakdown:")
print(f"  Embeddings: ${embed_cost:.4f}")
print(f"  Claude Haiku: ${haiku_cost:.4f}")
print(f"  Total: ${total:.4f}")
print("\n✓ Well under budget!")
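
The same arithmetic can be turned into a per-query estimate to see how cost grows with traffic. The sketch below assumes one embedding call and one Haiku call per query and reuses the token counts and prices from the cell above as defaults; the function name `cost_per_query` is hypothetical, and the prices should be checked against current Bedrock pricing before being relied on.

```python
def cost_per_query(
    embed_tokens: int = 50,
    input_tokens: int = 200,
    output_tokens: int = 250,
    embed_price_per_1k: float = 0.0001,   # USD per 1,000 embedding tokens
    input_price_per_1m: float = 0.25,     # USD per 1,000,000 input tokens
    output_price_per_1m: float = 1.25,    # USD per 1,000,000 output tokens
) -> float:
    """USD cost of one RAG query: one embedding call plus one Haiku call."""
    embed = embed_tokens / 1_000 * embed_price_per_1k
    generate = (
        input_tokens / 1_000_000 * input_price_per_1m
        + output_tokens / 1_000_000 * output_price_per_1m
    )
    return embed + generate


per_query = cost_per_query()
for volume in (100, 10_000, 1_000_000):
    print(f"{volume:>9,} queries: ${per_query * volume:,.2f}")
```

With these defaults the output tokens dominate the estimate, so capping `max_tokens` is the most direct lever on cost. In a real RAG prompt the retrieved context is part of the input, so `input_tokens` will usually be larger than the 200 assumed here.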

## Summary

In this lab, you learned:
- ✅ Building custom knowledge bases with chunking
- ✅ Implementing advanced RAG patterns
- ✅ Query decomposition for complex questions
- ✅ Conversational RAG with memory
- ✅ Comparing retrieval strategies
- ✅ Performance benchmarking

**Key Takeaways:**
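1. Chunk size and overlap determine what each embedding covers, so the chunking strategy directly affects retrieval quality
2. Query decomposition splits a complex question into simpler sub-questions that are easier to retrieve for
3. Including recent turns in the prompt lets the model resolve follow-up references such as "it"
4. Top-k retrieval always returns k results, while a score threshold drops weak matches at the risk of returning too few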

**Next Steps:**
- Lab 3: LLM Evaluation & Agentic AI