# Chunking Strategies Demo: Impact on RAG Performance
Exploring different chunking mechanisms and their effect on retrieval quality

In [None]:
import boto3
import json
import time
from typing import List, Dict

In [None]:
# Initialize clients
bedrock_agent = boto3.client('bedrock-agent')
s3 = boto3.client('s3')
iam = boto3.client('iam')
sts = boto3.client('sts')

In [None]:
# Configuration
BASE_BUCKET = f"chunking-demo-{int(time.time())}"
ROLE_NAME = "ChunkingDemoRole"
EMBEDDING_MODEL = "amazon.titan-embed-text-v1"
GENERATION_MODEL = "amazon.nova-pro-v1:0"

## Sample Document: Technical Manual
We'll use a structured technical document to demonstrate the impact of chunking.

In [None]:
# Create comprehensive technical document
technical_manual = """
AWS LAMBDA DEPLOYMENT GUIDE

CHAPTER 1: INTRODUCTION
AWS Lambda is a serverless compute service that runs code without provisioning servers. Lambda automatically scales applications by running code in response to triggers. The service charges only for compute time consumed.

Key benefits include:
- No server management required
- Automatic scaling from zero to thousands of concurrent executions
- Pay-per-request pricing model
- Built-in fault tolerance and security

CHAPTER 2: FUNCTION CONFIGURATION
Lambda functions require specific configuration parameters for optimal performance.

Memory Configuration:
Memory allocation ranges from 128 MB to 10,240 MB in 1 MB increments. CPU power scales proportionally with memory. For CPU-intensive tasks, allocate more memory to get additional CPU power.

Timeout Settings:
Maximum execution time is 15 minutes (900 seconds). Default timeout is 3 seconds. Set timeout based on expected execution duration plus buffer time.

Environment Variables:
Use environment variables for configuration values that change between environments. Maximum size is 4 KB for all variables combined. Avoid storing sensitive data in plain text.

CHAPTER 3: DEPLOYMENT STRATEGIES
Lambda supports multiple deployment approaches for different use cases.

Blue/Green Deployment:
AWS CodeDeploy automates blue/green deployments for Lambda. Traffic shifts gradually from old version to new version. Rollback is automatic if CloudWatch alarms trigger.

Canary Deployment:
Route small percentage of traffic to new version initially. Monitor metrics and gradually increase traffic. Typical canary percentages are 5%, 10%, or 25%.

All-at-Once Deployment:
Immediate switch to new version for all traffic. Fastest deployment but highest risk. Use only for non-critical applications or during maintenance windows.

CHAPTER 4: MONITORING AND TROUBLESHOOTING
Effective monitoring is crucial for Lambda function reliability.

CloudWatch Metrics:
Key metrics include Duration, Invocations, Errors, Throttles, and ConcurrentExecutions. Set up alarms for error rates above 1% and duration exceeding 80% of timeout.

X-Ray Tracing:
Enable X-Ray for distributed tracing across services. Trace requests through Lambda, API Gateway, DynamoDB, and other AWS services. Identify performance bottlenecks and errors.

Log Analysis:
Lambda automatically sends logs to CloudWatch Logs. Use structured logging with JSON format. Include correlation IDs for request tracking across services.

Common Issues:
Cold start latency affects first invocation after idle period. Provisioned concurrency eliminates cold starts for critical functions. Memory errors occur when function exceeds allocated memory.

CHAPTER 5: SECURITY BEST PRACTICES
Security considerations are paramount for serverless applications.

IAM Roles and Policies:
Each Lambda function requires an execution role with minimum necessary permissions. Use AWS managed policies when possible. Create custom policies for specific resource access.

VPC Configuration:
Lambda functions can run inside VPC for private resource access. VPC configuration adds cold start latency. Use VPC endpoints for AWS service access without internet gateway.

Secrets Management:
Store sensitive data in AWS Secrets Manager or Systems Manager Parameter Store. Never hardcode credentials in function code. Use IAM roles for service-to-service authentication.

Input Validation:
Validate all input data to prevent injection attacks. Sanitize user input before processing. Use AWS WAF for API Gateway protection against common web exploits.
"""

## Chunking Strategy 1: Fixed Size (Default Bedrock)
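Fixed-size chunking splits text by token count, not character count. The helper below creates one Knowledge Base per chunking configuration so the strategies can be compared side by side.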

In [None]:
def create_kb_with_chunking(strategy_name: str, chunking_config: Dict) -> str:
    """Create Knowledge Base with specific chunking configuration"""
    
    bucket_name = f"{BASE_BUCKET}-{strategy_name.lower()}"
    kb_name = f"lambda-guide-{strategy_name.lower()}"
    
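    # Create S3 bucket (regions other than us-east-1 need an explicit LocationConstraint)
    region = s3.meta.region_name
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket_name)
    else:
        s3.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={"LocationConstraint": region}
        )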
    
    # Upload document
    s3.put_object(
        Bucket=bucket_name,
        Key="lambda-guide.txt",
        Body=technical_manual.encode('utf-8')
    )
    
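    # Build the ARN of the IAM role; the role itself is created in a separate cell
    # and must exist before this function is called
    account_id = sts.get_caller_identity()['Account']
    role_arn = f"arn:aws:iam::{account_id}:role/{ROLE_NAME}"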
    
    # Create Knowledge Base with chunking config
    kb_config = {
        "name": kb_name,
        "description": f"Lambda guide with {strategy_name} chunking",
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": f"arn:aws:bedrock:us-east-1::foundation-model/{EMBEDDING_MODEL}"
            }
        },
        "storageConfiguration": {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                # Placeholder: set this to the ARN of an existing OpenSearch Serverless
                # collection before running; an empty string will be rejected
                "collectionArn": "",
                "vectorIndexName": "bedrock-knowledge-base-default-index",
                "fieldMapping": {
                    "vectorField": "bedrock-knowledge-base-default-vector",
                    "textField": "AMAZON_BEDROCK_TEXT_CHUNK",
                    "metadataField": "AMAZON_BEDROCK_METADATA"
                }
            }
        }
    }
    
    kb_response = bedrock_agent.create_knowledge_base(**kb_config)
    kb_id = kb_response['knowledgeBase']['knowledgeBaseId']
    
    # Create Data Source with chunking configuration
    ds_config = {
        "knowledgeBaseId": kb_id,
        "name": f"{strategy_name.lower()}-datasource",
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {
                "bucketArn": f"arn:aws:s3:::{bucket_name}"
            }
        },
        "vectorIngestionConfiguration": {
            "chunkingConfiguration": chunking_config
        }
    }
    
    ds_response = bedrock_agent.create_data_source(**ds_config)
    ds_id = ds_response['dataSource']['dataSourceId']
    
    # Start ingestion
    bedrock_agent.start_ingestion_job(
        knowledgeBaseId=kb_id,
        dataSourceId=ds_id
    )
    
    print(f"Created {strategy_name} KB: {kb_id}")
    return kb_id

In [None]:
# Create IAM role first
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "bedrock.amazonaws.com"},
        "Action": "sts:AssumeRole"
    }]
}

role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BASE_BUCKET}*", f"arn:aws:s3:::{BASE_BUCKET}*/*"]
        },
        {
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel"],
            "Resource": "*"
        }
    ]
}

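try:
    iam.create_role(
        RoleName=ROLE_NAME,
        AssumeRolePolicyDocument=json.dumps(trust_policy)
    )
    print("Created IAM role")
except iam.exceptions.EntityAlreadyExistsException:
    print("IAM role already exists; refreshing its inline policy")

# BASE_BUCKET changes on every run, so always (re)write the inline policy.
# put_role_policy overwrites an existing policy with the same name.
iam.put_role_policy(
    RoleName=ROLE_NAME,
    PolicyName="ChunkingDemoPolicy",
    PolicyDocument=json.dumps(role_policy)
)
time.sleep(10)  # Give IAM time to propagate the role and policy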

## Strategy 1: Default Fixed Size Chunking

In [None]:
# Default chunking (300 tokens, 20% overlap)
default_chunking = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": 300,
        "overlapPercentage": 20
    }
}

kb_default = create_kb_with_chunking("Default", default_chunking)
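
To make the `maxTokens` and `overlapPercentage` settings concrete, here is a minimal local sketch of fixed-size chunking with overlap. It is an illustration only, not Bedrock's implementation: the helper `fixed_size_chunks` is hypothetical, and whitespace-separated words stand in for model tokens.

```python
from typing import List


def fixed_size_chunks(text: str, max_tokens: int, overlap_percentage: int) -> List[str]:
    """Split text into windows of at most max_tokens whitespace tokens.

    Assumption: whitespace-separated words approximate model tokens.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_percentage < 100:
        raise ValueError("overlap_percentage must be in [0, 100)")

    tokens = text.split()
    overlap = max_tokens * overlap_percentage // 100
    step = max(1, max_tokens - overlap)  # how far the window advances each time

    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


# Ten toy tokens, windows of 5 tokens with 20% (1 token) overlap
sample = " ".join(f"w{i}" for i in range(10))
demo_chunks = fixed_size_chunks(sample, max_tokens=5, overlap_percentage=20)
for chunk in demo_chunks:
    print(chunk)
```

Each window shares its first token with the end of the previous one, which is how overlap keeps a sentence that straddles a boundary retrievable from either side.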

## Strategy 2: Small Chunks (High Precision)

In [None]:
# Small chunks for precise retrieval
small_chunking = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": 150,
        "overlapPercentage": 30
    }
}

kb_small = create_kb_with_chunking("Small", small_chunking)

## Strategy 3: Large Chunks (High Context)

In [None]:
# Large chunks for more context
large_chunking = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": 500,
        "overlapPercentage": 10
    }
}

kb_large = create_kb_with_chunking("Large", large_chunking)
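
The three fixed-size configurations trade chunk count against chunk size. The sketch below estimates how many chunks each produces for a document of a given length. The function `estimate_chunk_count` and the 1,000-token document length are assumptions chosen for illustration; actual counts depend on the service's tokenizer and boundary handling.

```python
import math


def estimate_chunk_count(total_tokens: int, max_tokens: int, overlap_percentage: int) -> int:
    """Estimate the number of sliding windows needed to cover total_tokens."""
    if not 0 <= overlap_percentage < 100:
        raise ValueError("overlap_percentage must be in [0, 100)")
    if total_tokens <= 0:
        return 0
    if total_tokens <= max_tokens:
        return 1
    overlap = max_tokens * overlap_percentage // 100
    step = max_tokens - overlap  # new tokens covered by each additional chunk
    return 1 + math.ceil((total_tokens - max_tokens) / step)


# Hypothetical 1,000-token document, using the settings from this notebook
configs = {"Small": (150, 30), "Default": (300, 20), "Large": (500, 10)}
for name, (max_tokens, overlap) in configs.items():
    print(f"{name:8}: {estimate_chunk_count(1000, max_tokens, overlap)} chunks")
```

Under these assumptions the small configuration yields 10 chunks, the default 4, and the large 3, so small chunks give the retriever more, narrower candidates to rank.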

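## Strategy 4: Semantic Chunking (Meaning-Aware)

In [None]:
# Semantic chunking splits where the meaning shifts between neighboring sentences,
# based on embedding similarity rather than headings or document hierarchy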
semantic_chunking = {
    "chunkingStrategy": "SEMANTIC",
    "semanticChunkingConfiguration": {
        "maxTokens": 300,
        "bufferSize": 0,
        "breakpointPercentileThreshold": 95
    }
}

kb_semantic = create_kb_with_chunking("Semantic", semantic_chunking)
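
The idea behind `breakpointPercentileThreshold` can be sketched locally: measure how dissimilar each pair of neighboring sentences is, then split at the gaps at or above the chosen percentile. The code below is a toy approximation under stated assumptions, not the service's algorithm: bag-of-words vectors replace real embeddings, the percentile uses the nearest-rank method, and `maxTokens` and `bufferSize` are ignored. The function `semantic_chunks` is hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def _cosine_distance(a: Counter, b: Counter) -> float:
    """1 - cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)


def semantic_chunks(text: str, breakpoint_percentile: int = 95) -> List[str]:
    """Group sentences, splitting at the largest adjacent-sentence distances."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) < 2:
        return sentences

    # Assumption: word counts stand in for embedding vectors
    vectors = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    distances = [_cosine_distance(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]

    # Nearest-rank percentile of the adjacent-sentence distances
    ranked = sorted(distances)
    rank = max(1, math.ceil(breakpoint_percentile / 100 * len(ranked)))
    threshold = ranked[rank - 1]

    chunks, current = [], [sentences[0]]
    for sentence, distance in zip(sentences[1:], distances):
        if distance >= threshold and distance > 0:
            chunks.append(" ".join(current))  # meaning shifted: close the chunk
            current = [sentence]
        else:
            current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


toy_text = (
    "Lambda memory ranges up to 10240 MB. Lambda memory also scales CPU power. "
    "Canary deployment shifts traffic gradually. Canary deployment monitors traffic metrics."
)
semantic_demo = semantic_chunks(toy_text)
for chunk in semantic_demo:
    print(chunk)
```

In this toy input the only gap with no shared vocabulary is between the memory sentences and the canary sentences, so that is where the split lands. A lower percentile would split more often.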

## Wait for Ingestion Completion

In [None]:
def wait_for_ingestion(kb_id: str, strategy_name: str):
    """Wait for ingestion to complete"""
    print(f"Waiting for {strategy_name} ingestion...")
    
    # Get data source ID
    ds_list = bedrock_agent.list_data_sources(knowledgeBaseId=kb_id)
    ds_id = ds_list['dataSourceSummaries'][0]['dataSourceId']
    
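    # Get latest job (sort explicitly so the first result is the most recent)
    jobs = bedrock_agent.list_ingestion_jobs(
        knowledgeBaseId=kb_id,
        dataSourceId=ds_id,
        sortBy={'attribute': 'STARTED_AT', 'order': 'DESCENDING'}
    )
    job_id = jobs['ingestionJobSummaries'][0]['ingestionJobId']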
    
    while True:
        job_status = bedrock_agent.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=ds_id,
            ingestionJobId=job_id
        )
        status = job_status['ingestionJob']['status']
        
        # Stop polling on any terminal state, including a manually stopped job
        if status in ['COMPLETE', 'FAILED', 'STOPPED']:
            print(f"{strategy_name}: {status}")
            break
        time.sleep(15)

# Wait for all ingestions
strategies = [
    (kb_default, "Default"),
    (kb_small, "Small"),
    (kb_large, "Large"),
    (kb_semantic, "Semantic")
]

for kb_id, name in strategies:
    wait_for_ingestion(kb_id, name)
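
The polling loop above runs until the job reaches a terminal state, which can hang indefinitely if a job stalls. One possible variation is a generic poller with a deadline. The helper `poll_until` and its parameters are hypothetical; the usage below simulates job statuses with an iterator instead of calling AWS.

```python
import time
from typing import Callable, Iterable


def poll_until(
    get_status: Callable[[], str],
    terminal: Iterable[str],
    interval_s: float = 15.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it returns a terminal state or the deadline passes."""
    terminal_states = set(terminal)
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal_states:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Simulated status sequence (no AWS calls); sleep is stubbed out for the demo
fake_statuses = iter(["STARTING", "IN_PROGRESS", "COMPLETE"])
final_status = poll_until(
    lambda: next(fake_statuses),
    terminal={"COMPLETE", "FAILED"},
    interval_s=0,
    sleep=lambda _: None,
)
print(final_status)
```

In the notebook, `get_status` could wrap the `get_ingestion_job` call and return its `status` field.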

## Performance Comparison: Test Queries

In [None]:
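# retrieve_and_generate is a runtime operation, so it needs the
# 'bedrock-agent-runtime' client rather than the 'bedrock-agent' client
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')

def query_kb(kb_id: str, question: str) -> Dict:
    """Query knowledge base and return response with metadata"""
    response = bedrock_agent_runtime.retrieve_and_generate(
        input={'text': question},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': f'arn:aws:bedrock:us-east-1::foundation-model/{GENERATION_MODEL}'
            }
        }
    )

    citations = response.get('citations', [])
    return {
        'answer': response['output']['text'],
        'citations': citations,
        # Each citation can reference several retrieved chunks, so count those
        'source_count': sum(len(c.get('retrievedReferences', [])) for c in citations)
    }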

# Test questions targeting different aspects
test_questions = [
    "What is the maximum memory allocation for Lambda functions?",
    "Explain the difference between blue/green and canary deployment strategies",
    "What are the key CloudWatch metrics to monitor for Lambda?",
    "How should sensitive data be handled in Lambda functions?"
]
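
To see which chunks actually backed an answer, the `citations` list can be flattened into source snippets. The sketch below assumes the response nests retrieved chunks under `citations[].retrievedReferences[]` with `content.text` and `location.s3Location.uri` fields. The helper `extract_references` is hypothetical, and `mock_response` is hand-written sample data, not real service output.

```python
from typing import Dict, List


def extract_references(response: Dict, snippet_chars: int = 120) -> List[Dict]:
    """Flatten citations into a list of {uri, snippet} entries."""
    refs = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            refs.append({
                "uri": ref.get("location", {}).get("s3Location", {}).get("uri"),
                "snippet": ref.get("content", {}).get("text", "")[:snippet_chars],
            })
    return refs


# Hand-written stand-in for a retrieve_and_generate response (illustrative only)
mock_response = {
    "output": {"text": "Memory can be set up to 10,240 MB."},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "Memory allocation ranges from 128 MB to 10,240 MB."},
                    "location": {"s3Location": {"uri": "s3://example-bucket/lambda-guide.txt"}},
                },
                {
                    "content": {"text": "CPU power scales proportionally with memory."},
                    "location": {"s3Location": {"uri": "s3://example-bucket/lambda-guide.txt"}},
                },
            ]
        }
    ],
}

references = extract_references(mock_response)
for ref in references:
    print(ref["uri"], "|", ref["snippet"])
```

Comparing these snippets across Knowledge Bases shows directly how chunk size changes what the generator sees.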

In [None]:
# Compare chunking strategies
results = {}

for question in test_questions:
    print(f"\n{'='*60}")
    print(f"QUESTION: {question}")
    print(f"{'='*60}")
    
    question_results = {}
    
    for kb_id, strategy_name in strategies:
        print(f"\n--- {strategy_name.upper()} CHUNKING ---")
        
        try:
            result = query_kb(kb_id, question)
            question_results[strategy_name] = result
            
            print(f"Sources used: {result['source_count']}")
            print(f"Answer: {result['answer'][:200]}...")
            
        except Exception as e:
            print(f"Error: {e}")
            question_results[strategy_name] = {'error': str(e)}
    
    results[question] = question_results

## Analysis: Chunking Strategy Impact

In [None]:
# Analyze results
print("\n" + "="*80)
print("CHUNKING STRATEGY ANALYSIS")
print("="*80)

strategy_stats = {name: {'total_sources': 0, 'questions': 0} for _, name in strategies}

for question, question_results in results.items():
    print(f"\nQuestion: {question}")
    print("-" * 50)
    
    for strategy_name in strategy_stats:
        if strategy_name in question_results and 'source_count' in question_results[strategy_name]:
            sources = question_results[strategy_name]['source_count']
            strategy_stats[strategy_name]['total_sources'] += sources
            strategy_stats[strategy_name]['questions'] += 1
            print(f"{strategy_name:10}: {sources} sources")

print("\n" + "="*50)
print("AVERAGE SOURCES PER STRATEGY")
print("="*50)

for strategy_name, stats in strategy_stats.items():
    if stats['questions'] > 0:
        avg_sources = stats['total_sources'] / stats['questions']
        print(f"{strategy_name:10}: {avg_sources:.1f} sources on average")
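
Source counts alone say little about answer quality. As a rough complement, one could check how many expected facts from the manual appear in each answer. The helper `keyword_coverage` is a hypothetical, simple substring check, and `sample_answer` is an invented string used only to demonstrate it; the keywords come from the CloudWatch Metrics section of the sample document.

```python
from typing import Iterable


def keyword_coverage(answer: str, expected_keywords: Iterable[str]) -> float:
    """Fraction of expected keywords found in the answer (case-insensitive)."""
    keywords = [k.lower() for k in expected_keywords]
    if not keywords:
        return 0.0
    text = answer.lower()
    hits = sum(1 for k in keywords if k in text)
    return hits / len(keywords)


# Keywords taken from the CloudWatch Metrics section of the manual
metric_keywords = ["Duration", "Invocations", "Errors", "Throttles", "ConcurrentExecutions"]

# Invented answer text, used only to exercise the helper
sample_answer = "Monitor Duration, Errors, and Throttles, and alarm on high error rates."
coverage = keyword_coverage(sample_answer, metric_keywords)
print(f"Keyword coverage: {coverage:.0%}")
```

Applied to `results`, a score like this could be averaged per strategy alongside the source counts. Substring matching is crude, so treat it as a sanity check rather than an evaluation metric.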

## Key Insights: Why Chunking Matters

### 1. **Small Chunks (150 tokens)**
- **Pros**: High precision, specific information retrieval
- **Cons**: May lose context, require more sources for complete answers
- **Best for**: Factual queries, specific data points

### 2. **Large Chunks (500 tokens)**
- **Pros**: Rich context, comprehensive information
- **Cons**: May include irrelevant information, less precise
- **Best for**: Complex explanations, conceptual questions

### 3. **Default Chunks (300 tokens)**
- **Pros**: Balanced approach, good for most use cases
- **Cons**: May not be optimal for specific document types
- **Best for**: General-purpose RAG applications

### 4. **Semantic Chunks**
- **Pros**: Keeps semantically related sentences together, maintains logical boundaries
- **Cons**: Variable chunk sizes; finding breakpoints requires extra embedding work at ingestion time, which adds cost and latency
- **Best for**: Structured documents, technical manuals

### Chunking Strategy Selection Guidelines:
1. **Document Type**: Technical docs → Semantic, Articles → Fixed
2. **Query Type**: Factual → Small chunks, Explanatory → Large chunks
3. **Context Requirements**: High context → Large chunks, Precision → Small chunks
4. **Performance**: Ingestion speed and cost → Fixed size, Retrieval quality → Semantic
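
The query-type guideline above can be expressed as a small lookup that returns one of the fixed-size configurations used in this notebook. The function `suggest_chunking` and the query-type labels are hypothetical; this is a starting point for experimentation under those assumptions, not a rule.

```python
from typing import Dict


def suggest_chunking(query_type: str) -> Dict:
    """Map a query type to one of the fixed-size configurations from this demo."""
    presets = {
        "factual": {"maxTokens": 150, "overlapPercentage": 30},      # small chunks
        "explanatory": {"maxTokens": 500, "overlapPercentage": 10},  # large chunks
    }
    # Fall back to the default configuration for anything else
    fixed = presets.get(query_type, {"maxTokens": 300, "overlapPercentage": 20})
    return {
        "chunkingStrategy": "FIXED_SIZE",
        "fixedSizeChunkingConfiguration": fixed,
    }


suggested = suggest_chunking("factual")
print(suggested)
```

The returned dictionary has the same shape as the `chunking_config` argument passed to `create_kb_with_chunking`.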

In [None]:
print("\nDemo complete! Knowledge Base IDs:")
for kb_id, strategy_name in strategies:
    print(f"{strategy_name}: {kb_id}")