
# Performance Optimization Techniques

## Why Performance Optimization Matters

**The Problem:** Slow agents frustrate users and waste resources.

**Real-World Impact:**
```
Slow Response (5+ seconds):
• 40% of users abandon the conversation
• Poor user experience
• Higher server costs

Fast Response (<2 seconds):
• Users stay engaged
• Better satisfaction scores
• Lower infrastructure costs
```

**Why This Matters:**
- ⚡ **User Experience**: Fast responses keep users engaged
- 💰 **Cost Efficiency**: Faster = fewer resources needed
- 📈 **Scalability**: Optimized systems handle more users
- 🎯 **Competitive Advantage**: Speed is a feature

## Learning Objectives

You'll learn simple techniques to:
1. **Measure performance** - Track response times
2. **Cache intelligently** - Avoid repeated work
3. **Optimize queries** - Faster database operations
4. **Batch operations** - Process multiple requests efficiently

## Setup: Simple Performance Measurement

Let's build simple tools to measure and track performance.

In [None]:
# Simple performance measurement - no classes needed
import time
import os
from datetime import datetime
from collections import defaultdict
from dotenv import load_dotenv
load_dotenv()

# Global performance tracking
performance_stats = {
    'response_times': [],
    'operation_times': defaultdict(list),
    'cache_hits': 0,
    'cache_misses': 0,
    'total_requests': 0
}

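def measure_time(operation_name: str = "operation"):
    """Decorator that records a function's execution time under operation_name"""
    from functools import wraps  # local import keeps this helper self-contained

    def decorator(func):
        @wraps(func)  # preserve the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            # perf_counter is monotonic, so it is safer than time.time() for intervals
            start_time = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if func raises
                execution_time = time.perf_counter() - start_time
                performance_stats['operation_times'][operation_name].append(execution_time)
        return wrapper
    return decorator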

def track_response_time(start_time: float, end_time: float):
    """Track overall response time"""
    response_time = end_time - start_time
    performance_stats['response_times'].append(response_time)
    performance_stats['total_requests'] += 1
    return response_time

def get_performance_summary():
    """Get performance statistics summary"""
    if not performance_stats['response_times']:
        return "No performance data available"
    
    response_times = performance_stats['response_times']
    avg_response = sum(response_times) / len(response_times)
    min_response = min(response_times)
    max_response = max(response_times)
    
    # Calculate the 95th percentile with the nearest-rank method:
    # rank = ceil(0.95 * n), converted to a zero-based index using integer math
    sorted_times = sorted(response_times)
    p95_index = (95 * len(sorted_times) + 99) // 100 - 1
    p95_response = sorted_times[p95_index]
    
    cache_total = performance_stats['cache_hits'] + performance_stats['cache_misses']
    cache_hit_rate = (performance_stats['cache_hits'] / cache_total * 100) if cache_total > 0 else 0
    
    return {
        'total_requests': performance_stats['total_requests'],
        'avg_response_time': avg_response,
        'min_response_time': min_response,
        'max_response_time': max_response,
        'p95_response_time': p95_response,
        'cache_hit_rate': cache_hit_rate,
        'cache_hits': performance_stats['cache_hits'],
        'cache_misses': performance_stats['cache_misses']
    }

# Test performance measurement
@measure_time("database_query")
def simulate_database_query(delay: float = 0.1):
    """Simulate a database query with artificial delay"""
    time.sleep(delay)
    return "Query result"

@measure_time("llm_call")
def simulate_llm_call(delay: float = 0.5):
    """Simulate an LLM API call with artificial delay"""
    time.sleep(delay)
    return "LLM response"

# Test the measurement system
print("⚡ Performance Measurement System")
print("=" * 40)

# Simulate some operations
for i in range(3):
    start = time.time()
    
    # Simulate agent operations
    db_result = simulate_database_query(0.05)  # Fast query
    llm_result = simulate_llm_call(0.3)        # Slower LLM call
    
    end = time.time()
    response_time = track_response_time(start, end)
    
    print(f"Request {i+1}: {response_time:.3f}s")

# Show performance summary
summary = get_performance_summary()
print(f"\n📊 Performance Summary:")
print(f"   Total requests: {summary['total_requests']}")
print(f"   Average response: {summary['avg_response_time']:.3f}s")
print(f"   Min response: {summary['min_response_time']:.3f}s")
print(f"   Max response: {summary['max_response_time']:.3f}s")
print(f"   95th percentile: {summary['p95_response_time']:.3f}s")

# Show operation breakdown
print(f"\n🔍 Operation Breakdown:")
for operation, times in performance_stats['operation_times'].items():
    avg_time = sum(times) / len(times)
    print(f"   {operation}: {avg_time:.3f}s average")
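The decorator above times whole functions. As a complementary sketch, a context manager can time an arbitrary block of code, and a small helper can compute a nearest-rank percentile. The names `timed_block`, `nearest_rank_percentile`, and `operation_times` are hypothetical and not part of the notebook; `operation_times` stands in for `performance_stats['operation_times']`.

```python
import math
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical stand-in for performance_stats['operation_times']
operation_times = defaultdict(list)

@contextmanager
def timed_block(operation_name: str, stats=operation_times):
    """Record how long the enclosed block takes, even if it raises."""
    start = time.perf_counter()  # monotonic clock, suited to measuring intervals
    try:
        yield
    finally:
        stats[operation_name].append(time.perf_counter() - start)

def nearest_rank_percentile(samples, pct: int):
    """Return the pct-th percentile (1 <= pct <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Time a block of code without wrapping it in a function
with timed_block("vector_search"):
    time.sleep(0.01)  # placeholder for real work

print(f"vector_search: {operation_times['vector_search'][0]:.3f}s")
print(nearest_rank_percentile(list(range(1, 21)), 95))  # 19 of 20 samples are <= this value
```

With 20 samples, the nearest-rank 95th percentile is the 19th smallest value, so a single outlier at the top does not become the reported p95.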

## Concept 1: Simple Caching

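Caching stores the result of an expensive operation so that a repeated request can be answered from memory instead of redoing the work. Let's implement a simple version.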

In [None]:
# Simple caching implementation
import hashlib
import json

# Simple in-memory cache (in production, use Redis)
simple_cache = {}

def create_cache_key(data) -> str:
    """Create a cache key from data"""
    # Convert data to string and hash it
    data_str = json.dumps(data, sort_keys=True) if isinstance(data, dict) else str(data)
    return hashlib.md5(data_str.encode()).hexdigest()[:16]

def cache_get(key: str):
    """Get value from cache"""
    if key in simple_cache:
        performance_stats['cache_hits'] += 1
        return simple_cache[key]
    else:
        performance_stats['cache_misses'] += 1
        return None

def cache_set(key: str, value, ttl: int = 300):
    """Set value in cache with TTL (simplified - no actual expiration)"""
    simple_cache[key] = {
        'value': value,
        'timestamp': time.time(),
        'ttl': ttl
    }

def cached_course_search(query: str, limit: int = 5):
    """Course search with caching"""
    # Create cache key
    cache_key = create_cache_key({'query': query, 'limit': limit})
    
    # Check cache first
    cached_result = cache_get(cache_key)
    if cached_result:
        return cached_result['value']
    
    # Simulate expensive course search
    time.sleep(0.2)  # Simulate database query time
    
    # Mock course results
    if 'machine learning' in query.lower():
        results = [
            {'code': 'CS301', 'title': 'Machine Learning', 'description': 'Intro to ML algorithms'},
            {'code': 'CS302', 'title': 'Deep Learning', 'description': 'Neural networks and deep learning'}
        ]
    elif 'redis' in query.lower():
        results = [
            {'code': 'RU301', 'title': 'Vector Search', 'description': 'Advanced Redis vector operations'}
        ]
    else:
        results = [{'code': 'GEN101', 'title': 'General Course', 'description': 'General course description'}]
    
    # Cache the result
    cache_set(cache_key, results)
    
    return results

def cached_llm_response(prompt: str):
    """LLM response with caching"""
    cache_key = create_cache_key(prompt)
    
    # Check cache
    cached_result = cache_get(cache_key)
    if cached_result:
        return cached_result['value']
    
    # Simulate expensive LLM call
    time.sleep(0.5)  # Simulate API call time
    
    # Mock LLM response
    response = f"This is a response to: {prompt[:50]}..."
    
    # Cache the result
    cache_set(cache_key, response)
    
    return response

# Test caching performance
print("🚀 Caching Performance Test")
print("=" * 40)

# Test course search caching
queries = ['machine learning courses', 'redis courses', 'machine learning courses']  # Repeat first query

for i, query in enumerate(queries, 1):
    start = time.time()
    results = cached_course_search(query)
    end = time.time()
    
    print(f"Query {i}: '{query}'")
    print(f"   Time: {end - start:.3f}s")
    print(f"   Results: {len(results)} courses")
    print(f"   Cache status: {'HIT' if end - start < 0.1 else 'MISS'}")
    print()

# Test LLM response caching
prompts = [
    "What are the best machine learning courses?",
    "Explain neural networks",
    "What are the best machine learning courses?"  # Repeat first prompt
]

print("🤖 LLM Response Caching Test:")
for i, prompt in enumerate(prompts, 1):
    start = time.time()
    response = cached_llm_response(prompt)
    end = time.time()
    
    print(f"Prompt {i}: Time {end - start:.3f}s, Cache: {'HIT' if end - start < 0.1 else 'MISS'}")

# Show cache statistics
cache_total = performance_stats['cache_hits'] + performance_stats['cache_misses']
hit_rate = (performance_stats['cache_hits'] / cache_total * 100) if cache_total > 0 else 0

print(f"\n📊 Cache Statistics:")
print(f"   Cache hits: {performance_stats['cache_hits']}")
print(f"   Cache misses: {performance_stats['cache_misses']}")
print(f"   Hit rate: {hit_rate:.1f}%")
print(f"   Cache size: {len(simple_cache)} entries")

print(f"\n💡 Caching Benefits:")
if hit_rate > 0:
    print(f"   • {hit_rate:.1f}% of requests served from cache")
    print(f"   • Estimated time saved: {performance_stats['cache_hits'] * 0.3:.1f}s")
    print(f"   • Reduced API costs and server load")
else:
    print("   • No cache hits yet - benefits will show with repeated queries")
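The `cache_set` helper above stores a TTL but never enforces it. The following is a minimal sketch of lazy expiration, assuming entries are checked only when they are read. The names `ttl_cache`, `ttl_cache_set`, and `ttl_cache_get` are hypothetical, and the optional `now` argument exists only so that expiry can be demonstrated without sleeping.

```python
import time

# Hypothetical in-memory store: key -> (value, expiry timestamp)
ttl_cache = {}

def ttl_cache_set(key: str, value, ttl: float = 300.0, now: float = None) -> None:
    """Store value and remember when it stops being valid."""
    now = time.monotonic() if now is None else now
    ttl_cache[key] = (value, now + ttl)

def ttl_cache_get(key: str, now: float = None):
    """Return the cached value, or None if it is missing or expired."""
    now = time.monotonic() if now is None else now
    entry = ttl_cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if now >= expires_at:
        del ttl_cache[key]  # lazily evict the stale entry on read
        return None
    return value

# Demonstrate expiry with explicit timestamps instead of real waiting
ttl_cache_set("greeting", "hello", ttl=10, now=100.0)
print(ttl_cache_get("greeting", now=105.0))  # hello (still fresh)
print(ttl_cache_get("greeting", now=110.0))  # None (expired and evicted)
```

Lazy eviction keeps the code short but leaves unread stale entries in memory. With Redis, setting an expiry on the key (for example, the `EX` option of `SET`) lets the server remove expired entries instead.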

## Concept 2: Batch Processing and Async Operations

Let's process queries in batches and run independent I/O-bound operations concurrently with `asyncio`, so their waiting times overlap instead of adding up.

In [None]:
# Simple batch processing and async operations
import asyncio
from typing import List, Dict, Any

def batch_process_queries(queries: List[str], batch_size: int = 3):
    """Process multiple queries in batches"""
    results = []
    
    print(f"🔄 Processing {len(queries)} queries in batches of {batch_size}")
    
    for i in range(0, len(queries), batch_size):
        batch = queries[i:i + batch_size]
        batch_start = time.time()
        
        print(f"   Batch {i//batch_size + 1}: {len(batch)} queries")
        
        # Process the batch. The loop itself is sequential; the shorter sleep
        # stands in for the per-query savings a real batched call would give.
        batch_results = []
        for query in batch:
            time.sleep(0.05)  # Simulated per-query cost inside a batch (vs. 0.15s individually)
            batch_results.append(f"Result for: {query}")
        
        batch_end = time.time()
        print(f"   Batch completed in {batch_end - batch_start:.3f}s")
        
        results.extend(batch_results)
    
    return results

async def async_course_search(query: str) -> Dict[str, Any]:
    """Async course search simulation"""
    # Simulate async database query
    await asyncio.sleep(0.1)
    
    return {
        'query': query,
        'results': [f"Course result for {query}"],
        'count': 1
    }

async def async_llm_call(prompt: str) -> str:
    """Async LLM call simulation"""
    # Simulate async API call
    await asyncio.sleep(0.2)
    
    return f"LLM response to: {prompt[:30]}..."

async def process_student_query_async(student_query: str) -> Dict[str, Any]:
    """Process student query with async operations"""
    start_time = time.time()
    
    # Run course search and LLM call concurrently
    course_task = async_course_search(student_query)
    llm_task = async_llm_call(f"Help student with: {student_query}")
    
    # Wait for both to complete
    course_results, llm_response = await asyncio.gather(course_task, llm_task)
    
    end_time = time.time()
    
    return {
        'query': student_query,
        'course_results': course_results,
        'llm_response': llm_response,
        'processing_time': end_time - start_time
    }

# Test batch processing
print("⚡ Batch Processing Performance Test")
print("=" * 50)

test_queries = [
    "machine learning courses",
    "data science programs",
    "python programming",
    "redis database",
    "web development",
    "artificial intelligence",
    "computer vision"
]

# Compare individual vs batch processing
print("🐌 Individual Processing:")
individual_start = time.time()
individual_results = []
for query in test_queries[:3]:  # Test with first 3 queries
    time.sleep(0.15)  # Simulate individual processing time
    individual_results.append(f"Individual result for: {query}")
individual_end = time.time()
individual_time = individual_end - individual_start

print(f"   Processed {len(individual_results)} queries in {individual_time:.3f}s")
print(f"   Average: {individual_time/len(individual_results):.3f}s per query")

print("\n🚀 Batch Processing:")
batch_start = time.time()
batch_results = batch_process_queries(test_queries[:3], batch_size=3)
batch_end = time.time()
batch_time = batch_end - batch_start

print(f"   Processed {len(batch_results)} queries in {batch_time:.3f}s")
print(f"   Average: {batch_time/len(batch_results):.3f}s per query")
print(f"   Speedup: {individual_time/batch_time:.1f}x faster")

# Test async operations
print("\n🔄 Async Operations Test:")

async def test_async_performance():
    student_queries = [
        "What machine learning courses are available?",
        "I need help with data science prerequisites",
        "Recommend courses for AI specialization"
    ]
    
    # Process queries concurrently and time the whole run directly
    wall_start = time.perf_counter()
    tasks = [process_student_query_async(query) for query in student_queries]
    results = await asyncio.gather(*tasks)
    wall_clock_time = time.perf_counter() - wall_start
    
    # The sum of per-query times approximates what sequential execution would take
    total_processing_time = sum(result['processing_time'] for result in results)
    
    print(f"   Processed {len(results)} queries concurrently")
    print(f"   Total processing time: {total_processing_time:.3f}s")
    print(f"   Wall clock time: {wall_clock_time:.3f}s")
    print(f"   Concurrency benefit: {total_processing_time/wall_clock_time:.1f}x speedup")
    
    return results

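# Run async test
# Note: inside Jupyter an event loop is already running, so asyncio.run() raises
# RuntimeError there; use `async_results = await test_async_performance()` instead.
async_results = asyncio.run(test_async_performance())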

print(f"\n💡 Performance Optimization Summary:")
print(f"   • Batch processing: {individual_time/batch_time:.1f}x speedup (simulated delays)")
print(f"   • Async operations: {sum(r['processing_time'] for r in async_results)/max(r['processing_time'] for r in async_results):.1f}x speedup")
print(f"   • Caching: repeated queries skip the simulated delay entirely")
print(f"   • Combined: actual gains depend on the workload, so measure before and after")
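The batch example above simulates its speedup with shorter sleeps. As a sketch of real concurrency with a cap on parallel work, the snippet below combines `asyncio.gather` with an `asyncio.Semaphore`. The names `fake_lookup` and `run_bounded` are hypothetical, and `asyncio.sleep` stands in for network or database latency.

```python
import asyncio
import time

async def fake_lookup(query: str, delay: float = 0.05) -> str:
    """Hypothetical I/O-bound call; asyncio.sleep stands in for real latency."""
    await asyncio.sleep(delay)
    return f"result for {query}"

async def run_bounded(queries, max_concurrent: int = 2):
    """Run fake_lookup for every query, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(query: str) -> str:
        async with semaphore:  # wait here if the limit is already reached
            return await fake_lookup(query)

    start = time.perf_counter()
    # gather returns results in the same order as the inputs
    results = await asyncio.gather(*(guarded(q) for q in queries))
    elapsed = time.perf_counter() - start
    return results, elapsed

def main():
    queries = ["machine learning", "data science", "python", "redis"]
    results, elapsed = asyncio.run(run_bounded(queries, max_concurrent=2))
    print(f"{len(results)} results in {elapsed:.3f}s")

if __name__ == "__main__":
    main()
```

Capping concurrency keeps a burst of requests from overwhelming a downstream service or hitting API rate limits. In a notebook, call `await run_bounded(...)` directly rather than `asyncio.run`.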

## Concept 3: Performance Monitoring Dashboard

Let's create a simple performance monitoring dashboard.

In [None]:
# Simple performance monitoring dashboard
def create_performance_dashboard():
    """Create a simple text-based performance dashboard"""
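    summary = get_performance_summary()
    if isinstance(summary, str):
        # No requests tracked yet: get_performance_summary() returned a message
        print(summary)
        return
    
    print("📊 PERFORMANCE DASHBOARD")
    print("=" * 50)
    print(f"📅 Report Time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")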
    print()
    
    # Response Time Metrics
    print("⚡ RESPONSE TIME METRICS:")
    print(f"   Total Requests: {summary['total_requests']:,}")
    print(f"   Average Response: {summary['avg_response_time']:.3f}s")
    print(f"   95th Percentile: {summary['p95_response_time']:.3f}s")
    print(f"   Min Response: {summary['min_response_time']:.3f}s")
    print(f"   Max Response: {summary['max_response_time']:.3f}s")
    
    # Performance Status
    avg_time = summary['avg_response_time']
    if avg_time < 1.0:
        status = "🟢 EXCELLENT"
    elif avg_time < 2.0:
        status = "🟡 GOOD"
    elif avg_time < 5.0:
        status = "🟠 NEEDS IMPROVEMENT"
    else:
        status = "🔴 POOR"
    
    print(f"   Status: {status}")
    print()
    
    # Cache Performance
    print("🚀 CACHE PERFORMANCE:")
    print(f"   Hit Rate: {summary['cache_hit_rate']:.1f}%")
    print(f"   Cache Hits: {summary['cache_hits']:,}")
    print(f"   Cache Misses: {summary['cache_misses']:,}")
    
    hit_rate = summary['cache_hit_rate']
    if hit_rate > 70:
        cache_status = "🟢 EXCELLENT"
    elif hit_rate > 40:
        cache_status = "🟡 GOOD"
    else:
        cache_status = "🔴 POOR"
    print(f"   Cache Status: {cache_status}")
    print()
    
    # Operation Breakdown
    print("🔍 OPERATION BREAKDOWN:")
    for operation, times in performance_stats['operation_times'].items():
        if times:
            avg_time = sum(times) / len(times)
            total_time = sum(times)
            print(f"   {operation}: {avg_time:.3f}s avg, {total_time:.3f}s total ({len(times)} calls)")
    print()
    
    # Recommendations
    print("💡 OPTIMIZATION RECOMMENDATIONS:")
    recommendations = []
    
    if summary['avg_response_time'] > 2.0:
        recommendations.append("• Implement response caching")
        recommendations.append("• Optimize database queries")
        recommendations.append("• Use async operations")
    
    if summary['cache_hit_rate'] < 50:
        recommendations.append("• Increase cache TTL")
        recommendations.append("• Cache more operations")
        recommendations.append("• Implement smarter cache keys")
    
    if summary['p95_response_time'] > summary['avg_response_time'] * 2:
        recommendations.append("• Investigate slow queries")
        recommendations.append("• Add request timeouts")
        recommendations.append("• Implement circuit breakers")
    
    if not recommendations:
        recommendations.append("• Performance looks good!")
        recommendations.append("• Monitor for scaling issues")
        recommendations.append("• Consider load testing")
    
    for rec in recommendations:
        print(f"   {rec}")
    
    print()
    print("=" * 50)

def performance_health_check():
    """Quick performance health check"""
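    summary = get_performance_summary()
    if isinstance(summary, str):
        # No requests tracked yet: get_performance_summary() returned a message
        print(summary)
        return
    
    print("🏥 PERFORMANCE HEALTH CHECK")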
    print("=" * 30)
    
    checks = [
        ("Average response time < 2s", summary['avg_response_time'] < 2.0),
        ("95th percentile < 5s", summary['p95_response_time'] < 5.0),
        ("Cache hit rate > 30%", summary['cache_hit_rate'] > 30),
        ("No responses > 10s", summary['max_response_time'] < 10.0)
    ]
    
    passed = 0
    for check_name, passed_check in checks:
        status = "✅" if passed_check else "❌"
        print(f"{status} {check_name}")
        if passed_check:
            passed += 1
    
    health_score = (passed / len(checks)) * 100
    print(f"\n🎯 Health Score: {health_score:.0f}%")
    
    if health_score >= 80:
        print("🟢 System performance is healthy")
    elif health_score >= 60:
        print("🟡 System performance needs attention")
    else:
        print("🔴 System performance requires immediate action")

# Generate performance dashboard
create_performance_dashboard()
print()
performance_health_check()
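The dashboard above summarizes every response time recorded since the notebook started, so old samples keep influencing the averages and the lists grow without bound. One possible refinement, sketched here under the assumption that only recent behavior matters, is a fixed-size rolling window built on `collections.deque`. The names `WINDOW_SIZE`, `recent_response_times`, `record_response_time`, and `rolling_summary` are hypothetical.

```python
from collections import deque

WINDOW_SIZE = 5  # hypothetical; kept small so the example is easy to follow
recent_response_times = deque(maxlen=WINDOW_SIZE)

def record_response_time(seconds: float) -> None:
    """Add a sample; the deque drops the oldest one once it is full."""
    recent_response_times.append(seconds)

def rolling_summary() -> dict:
    """Summarize only the samples currently inside the window."""
    if not recent_response_times:
        return {'count': 0, 'avg': 0.0, 'max': 0.0}
    samples = list(recent_response_times)
    return {
        'count': len(samples),
        'avg': sum(samples) / len(samples),
        'max': max(samples),
    }

# The first sample is a slow outlier that falls out of the window
for seconds in [9.0, 0.2, 0.3, 0.2, 0.3, 0.2]:
    record_response_time(seconds)

print(rolling_summary())
```

After six samples with a window of five, the 9.0s outlier is no longer counted, so the summary reflects only the most recent requests.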