# Decorator Pattern: LLM Response Caching

**Decorator Pattern** = Wrap functions to add new capabilities without changing core logic!

## The Problem
- LLM API calls are expensive (on the order of $0.03 per 1K tokens for some models; pricing varies by model)
- Slow response times (2-5 seconds)
- Same queries asked repeatedly
- No built-in caching mechanism

## The Solution
Use the Decorator Pattern to add transparent caching to any LLM function.
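
Before the full build-up, the idea can be illustrated with a very small sketch. The names `simple_cache` and `fake_llm` are hypothetical stand-ins (not part of this notebook's code), and the dictionary cache here has no expiry or size limit:

```python
import functools

def simple_cache(func):
    """Hypothetical minimal caching decorator: no TTL, no size limit."""
    store = {}  # maps prompt -> previously computed response

    @functools.wraps(func)
    def wrapper(prompt: str) -> str:
        if prompt not in store:
            store[prompt] = func(prompt)  # only runs on a cache miss
        return store[prompt]

    return wrapper

call_log = []  # records every time the wrapped function really executes

@simple_cache
def fake_llm(prompt: str) -> str:
    """Stand-in for an expensive LLM call."""
    call_log.append(prompt)
    return f"echo: {prompt}"

first = fake_llm("What is AI?")
second = fake_llm("What is AI?")  # served from the dictionary, no second call
print(first == second, len(call_log))
```

The caller uses `fake_llm` exactly as before; only the decorator line was added.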

In [2]:
import sys
sys.path.append('../..')
from config import API_PROVIDERS

# API configuration: the OpenAI API key is read from config.API_PROVIDERS
OPENAI_API_KEY = API_PROVIDERS['openai']['api_key']

# If no API key is provided, we'll use mock responses for demonstration.
# The `or ""` guard keeps this line from failing when the key is None.
USE_REAL_API = bool((OPENAI_API_KEY or "").strip())

print(f"🔑 API Mode: {'Real OpenAI API' if USE_REAL_API else 'Mock Demo Mode'}")
if not USE_REAL_API:
    print("💡 Add your OpenAI API key above to test with real API calls")

🔑 API Mode: Real OpenAI API


In [3]:
# Import required modules
import time
import hashlib
import json
import functools
import random
from typing import Dict, Any, Optional

# Import OpenAI if using real API
if USE_REAL_API:
    try:
        from openai import OpenAI
        client = OpenAI(api_key=OPENAI_API_KEY)
        print("✅ OpenAI client initialized")
    except ImportError:
        print("❌ OpenAI package not found. Install with: pip install openai")
        USE_REAL_API = False
    except Exception as e:
        print(f"❌ OpenAI client failed: {e}")
        USE_REAL_API = False

print("📦 Decorator Pattern Demo - LLM Response Caching")
print("=" * 55)

✅ OpenAI client initialized
📦 Decorator Pattern Demo - LLM Response Caching


## Step 1: Build Smart Cache System

First, let's create an intelligent cache for LLM responses:

In [4]:
class LLMCache:
    """Smart cache for LLM responses with TTL and size management"""
    
    def __init__(self, max_size: int = 100, ttl: int = 3600):
        self.cache: Dict[str, Dict[str, Any]] = {}
        self.max_size = max_size
        self.ttl = ttl  # Time to live in seconds
        self.stats = {'hits': 0, 'misses': 0, 'evictions': 0}
    
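    def _generate_key(self, *args, **kwargs) -> str:
        """Generate a cache key from function arguments"""
        key_data = {'args': str(args), 'kwargs': kwargs}
        # default=str keeps non-JSON-serializable kwarg values from raising TypeError
        key_string = json.dumps(key_data, sort_keys=True, default=str)
        # MD5 serves only as a fast fingerprint here (not for security);
        # the 10-character prefix keeps log lines short
        return hashlib.md5(key_string.encode()).hexdigest()[:10]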
    
    def get(self, key: str) -> Optional[Any]:
        """Get cached value if exists and not expired"""
        if key in self.cache:
            entry = self.cache[key]
            if time.time() - entry['timestamp'] < self.ttl:
                self.stats['hits'] += 1
                return entry['value']
            else:
                del self.cache[key]  # Remove expired entry
        
        self.stats['misses'] += 1
        return None
    
    def set(self, key: str, value: Any):
        """Set cache value with timestamp"""
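        # Evict the oldest entry only when adding a new key to a full cache
        # (overwriting an existing key does not need extra room)
        if key not in self.cache and len(self.cache) >= self.max_size: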
            oldest_key = min(self.cache.keys(), 
                           key=lambda k: self.cache[k]['timestamp'])
            del self.cache[oldest_key]
            self.stats['evictions'] += 1
        
        self.cache[key] = {'value': value, 'timestamp': time.time()}
    
    def get_stats(self) -> Dict[str, Any]:
        """Get cache performance statistics"""
        total = self.stats['hits'] + self.stats['misses']
        hit_rate = (self.stats['hits'] / total * 100) if total > 0 else 0
        
        return {
            'size': len(self.cache),
            'hit_rate': round(hit_rate, 1),
            'hits': self.stats['hits'],
            'misses': self.stats['misses'],
            'evictions': self.stats['evictions']
        }

# Create global cache instance
llm_cache = LLMCache(max_size=50, ttl=300)  # 5 minute TTL

print("✅ LLMCache created!")
print("🔑 Features: TTL expiration, size limits, performance stats")

✅ LLMCache created!
🔑 Features: TTL expiration, size limits, performance stats
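
To see TTL expiration and oldest-entry eviction without waiting for real time to pass, the sketch below uses a stripped-down, hypothetical `TinyTTLCache` with an injectable clock. It follows the same rules as the cache above (an entry is valid while its age is below the TTL, and the entry with the oldest timestamp is evicted when the cache is full), but it is an illustration under those assumptions, not a replacement for `LLMCache`:

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

class TinyTTLCache:
    """Hypothetical mini cache with an injectable clock for deterministic demos."""

    def __init__(self, max_size: int, ttl: float,
                 clock: Callable[[], float] = time.time):
        self.max_size = max_size
        self.ttl = ttl
        self.clock = clock
        self.entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, stored_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            del self.entries[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        if key not in self.entries and len(self.entries) >= self.max_size:
            oldest = min(self.entries, key=lambda k: self.entries[k][1])
            del self.entries[oldest]  # evict the entry with the oldest timestamp
        self.entries[key] = (value, self.clock())

now = [0.0]  # fake clock, advanced by hand
cache = TinyTTLCache(max_size=2, ttl=10, clock=lambda: now[0])

cache.set("a", "first")       # stored at t=0
now[0] = 5.0
cache.set("b", "second")      # stored at t=5
fresh = cache.get("a")        # age 5s is below the 10s TTL

now[0] = 11.0
expired = cache.get("a")      # age 11s: expired and removed

cache.set("c", "third")       # fits, because "a" was removed
now[0] = 12.0
cache.set("d", "fourth")      # cache is full: "b" (oldest) is evicted
evicted = cache.get("b")
print(fresh, expired, evicted, sorted(cache.entries))
```

Passing the clock as a parameter is a design choice for testability; production code can keep the `time.time` default.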


## Step 2: Create Caching Decorator

This is the core of the Decorator Pattern - wrapping functions to add caching:

In [5]:
def cache_llm_response(cache_instance: LLMCache = llm_cache):
    """Decorator to cache LLM responses
    
    This is the Decorator Pattern in action:
    - Wraps original function transparently
    - Adds caching behavior without code changes
    - Preserves original function interface
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Generate cache key from arguments
            cache_key = cache_instance._generate_key(*args, **kwargs)
            
            # Try cache first
            cached_result = cache_instance.get(cache_key)
            if cached_result is not None:
                print(f"🎯 Cache HIT! Key: {cache_key}")
                return cached_result
            
            # Cache miss - call original function
            print(f"📡 Cache MISS - Calling function... Key: {cache_key}")
            start_time = time.time()
            
            result = func(*args, **kwargs)
            
            # Store in cache
            cache_instance.set(cache_key, result)
            duration = time.time() - start_time
            print(f"💾 Response cached (took {duration:.2f}s)")
            
            return result
        
        return wrapper
    return decorator

print("✅ Caching Decorator created!")
print("🎯 Ready to wrap any function with intelligent caching")

✅ Caching Decorator created!
🎯 Ready to wrap any function with intelligent caching
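
One caveat with the decorator above: the cache key is built from the arguments only, so two different functions that share one cache and receive the same arguments produce the same key. A possible refinement, sketched here under the assumption that keys may be longer than ten characters, is to include the function's qualified name and to normalize positional and keyword arguments with `inspect.signature`. The helper name `qualified_key` is hypothetical:

```python
import hashlib
import inspect
import json

def qualified_key(func, args, kwargs) -> str:
    """Hypothetical key builder: function identity plus normalized arguments."""
    bound = inspect.signature(func).bind(*args, **kwargs)
    bound.apply_defaults()  # f("x") and f(text="x", max_words=50) now look identical
    payload = json.dumps(
        {
            "func": f"{func.__module__}.{func.__qualname__}",
            "arguments": dict(bound.arguments),
        },
        sort_keys=True,
        default=str,  # fall back to str() for values JSON cannot encode
    )
    return hashlib.sha256(payload.encode()).hexdigest()

# Two stand-in functions with the same shapes as the notebook's helpers
def ask_question(question: str) -> str:
    return question

def summarize_text(text: str, max_words: int = 50) -> str:
    return text

key_positional = qualified_key(summarize_text, ("abc",), {})
key_keyword = qualified_key(summarize_text, (), {"text": "abc", "max_words": 50})
key_other_func = qualified_key(ask_question, ("abc",), {})
print(key_positional == key_keyword, key_positional == key_other_func)
```

Inside a caching decorator, `func` would be the wrapped function, so each decorated function gets its own key space even when the cache instance is shared.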


## Step 3: Create LLM Functions

Let's create both real API and mock versions of LLM functions:

In [6]:
def mock_llm_call(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Mock LLM function for demonstration"""
    # Simulate API delay
    time.sleep(random.uniform(1.0, 3.0))
    
    # Generate realistic mock responses
    responses = {
        "artificial intelligence": "Artificial Intelligence (AI) refers to computer systems that can perform tasks typically requiring human intelligence, such as learning, reasoning, and problem-solving.",
        "machine learning": "Machine learning is a subset of AI that enables computers to learn and improve from data without being explicitly programmed for every task.",
        "cloud computing": "Cloud computing delivers computing services over the internet, offering scalability, cost-effectiveness, and accessibility from anywhere.",
        "blockchain": "Blockchain is a distributed ledger technology that maintains a secure, transparent record of transactions across multiple computers."
    }
    
    # Find best match for prompt
    for key, response in responses.items():
        if key.lower() in prompt.lower():
            return response
    
    return f"This is a mock response to your query about: {prompt[:50]}..."

def real_llm_call(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """Real OpenAI API call"""
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=150,
            temperature=0.7
        )
        result = response.choices[0].message.content
        # Handle None response
        if result is None:
            return f"[API returned empty response for: {prompt[:50]}...]"
        return result
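    except Exception as e:
        # Re-raise instead of returning an error string: a returned string
        # would be stored by the caching decorator and served as a cache hit.
        print(f"❌ API Error: {e}")
        raise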

# Select the appropriate function based on API availability
base_llm_call = real_llm_call if USE_REAL_API else mock_llm_call

print(f"✅ LLM function ready: {'Real OpenAI API' if USE_REAL_API else 'Mock Demo'}")

✅ LLM function ready: Real OpenAI API


## Step 4: Apply Decorator Pattern

Now let's enhance our LLM function with caching using the decorator:

In [7]:
# Apply the decorator to our LLM function
@cache_llm_response()
def cached_llm_call(prompt: str, model: str = "gpt-3.5-turbo") -> str:
    """LLM call enhanced with caching via Decorator Pattern"""
    return base_llm_call(prompt, model)

# Create multiple cached functions for different use cases
@cache_llm_response()
def ask_question(question: str) -> str:
    """Ask a general question"""
    return base_llm_call(f"Please answer this question: {question}")

@cache_llm_response()
def summarize_text(text: str, max_words: int = 50) -> str:
    """Summarize text with caching"""
    prompt = f"Summarize this text in {max_words} words: {text}"
    return base_llm_call(prompt)

print("✅ Decorator Pattern Applied!")
print("🎭 Functions enhanced with caching behavior")
print("🔄 Same interface, new capabilities")

✅ Decorator Pattern Applied!
🎭 Functions enhanced with caching behavior
🔄 Same interface, new capabilities
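
The `@decorator` syntax is shorthand for passing the function through the decorator and rebinding the name. The short sketch below makes that explicit and shows what `functools.wraps` preserves; `shout` and `greet` are hypothetical names used only for this illustration:

```python
import functools

def shout(func):
    """Hypothetical decorator that upper-cases the wrapped function's result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

def greet(name: str) -> str:
    """Return a greeting."""
    return f"hello {name}"

# Equivalent to writing @shout above `def greet`, but keeps the original visible
decorated = shout(greet)

print(decorated("ada"))             # wrapper runs, then upper-cases the result
print(decorated.__name__)           # metadata copied from greet by functools.wraps
print(decorated.__wrapped__ is greet)
```

Because `functools.wraps` also sets `__wrapped__`, the undecorated function stays reachable, which helps when testing the core logic without the added behavior.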


## Step 5: Demonstrate Cache Performance

Let's see the performance benefits of our caching decorator:

In [8]:
print("🚀 Decorator Pattern Performance Demo")
print("=" * 45)

# Test queries
queries = [
    "What is artificial intelligence?",
    "Explain machine learning",
    "What are benefits of cloud computing?"
]

print("\n📝 Round 1: First calls (cache misses)")
print("-" * 40)

round1_times = []
for i, query in enumerate(queries, 1):
    print(f"\n{i}. {query}")
    
    start = time.time()
    response = cached_llm_call(query)
    duration = time.time() - start
    round1_times.append(duration)
    
    print(f"   ⏱️ Time: {duration:.2f}s")
    if response:
        print(f"   📤 Response: {response[:80]}...")
    else:
        print(f"   📤 Response: [No response received]")

# Show cache stats
stats = llm_cache.get_stats()
print(f"\n📊 Cache Stats: {stats['hits']} hits, {stats['misses']} misses")

print("\n📝 Round 2: Repeat calls (cache hits expected)")
print("-" * 40)

round2_times = []
for i, query in enumerate(queries, 1):
    print(f"\n{i}. {query}")
    
    start = time.time()
    response = cached_llm_call(query)
    duration = time.time() - start
    round2_times.append(duration)
    
    print(f"   ⚡ Time: {duration:.3f}s")
    if response:
        print(f"   📤 Response: {response[:80]}...")
    else:
        print(f"   📤 Response: [No response received]")

# Final stats
final_stats = llm_cache.get_stats()
print(f"\n📊 Final Stats: {final_stats['hits']} hits, {final_stats['misses']} misses")
print(f"🎯 Hit Rate: {final_stats['hit_rate']}%")

🚀 Decorator Pattern Performance Demo

📝 Round 1: First calls (cache misses)
----------------------------------------

1. What is artificial intelligence?
📡 Cache MISS - Calling function... Key: 9bfe3c17c3
💾 Response cached (took 1.77s)
   ⏱️ Time: 1.77s
   📤 Response: Artificial intelligence, or AI, refers to the simulation of human intelligence i...

2. Explain machine learning
📡 Cache MISS - Calling function... Key: 592ae03d83
💾 Response cached (took 1.63s)
   ⏱️ Time: 1.63s
   📤 Response: Machine learning is a subset of artificial intelligence that involves the develo...

3. What are benefits of cloud computing?
📡 Cache MISS - Calling function... Key: 4611cbbf62
💾 Response cached (took 1.69s)
   ⏱️ Time: 1.69s
   📤 Response: 1. Cost savings: Cloud computing eliminates the need for upfront investments in ...

📊 Cache Stats: 0 hits, 3 misses

📝 Round 2: Repeat calls (cache hits expected)
----------------------------------------

1. What is artificial intelligence?
🎯 Cache HIT! Key: 9b

## Step 6: Performance Analysis

Let's analyze the performance benefits:

In [9]:
print("\n📈 Performance Analysis")
print("=" * 30)

# Calculate improvements
avg_round1 = sum(round1_times) / len(round1_times)
avg_round2 = sum(round2_times) / len(round2_times)
speed_improvement = ((avg_round1 - avg_round2) / avg_round1) * 100

print(f"\n⚡ Speed Comparison:")
print(f"   First calls:  {avg_round1:.2f}s average")
print(f"   Cached calls: {avg_round2:.3f}s average")
print(f"   Improvement:  {speed_improvement:.1f}% faster")

# Cost analysis
api_cost_per_call = 0.002  # Example cost per call in USD (illustrative only)
calls_without_cache = len(round1_times) + len(round2_times)  # 3 queries × 2 rounds
actual_api_calls = final_stats['misses']

cost_without_cache = calls_without_cache * api_cost_per_call
cost_with_cache = actual_api_calls * api_cost_per_call
savings = cost_without_cache - cost_with_cache
savings_percent = (savings / cost_without_cache) * 100

print(f"\n💰 Cost Analysis:")
print(f"   Without cache: ${cost_without_cache:.3f}")
print(f"   With cache:    ${cost_with_cache:.3f}")
print(f"   Savings:       ${savings:.3f} ({savings_percent:.1f}%)")

print(f"\n🎯 Decorator Pattern Benefits:")
benefits = [
    "Zero code changes to original function",
    "Transparent caching behavior",
    f"{speed_improvement:.1f}% performance improvement",
    f"{savings_percent:.1f}% cost reduction",
    "Built-in cache management",
    "Easy to enable/disable"
]

for i, benefit in enumerate(benefits, 1):
    print(f"   {i}. ✅ {benefit}")


📈 Performance Analysis

⚡ Speed Comparison:
   First calls:  1.69s average
   Cached calls: 0.000s average
   Improvement:  100.0% faster

💰 Cost Analysis:
   Without cache: $0.012
   With cache:    $0.006
   Savings:       $0.006 (50.0%)

🎯 Decorator Pattern Benefits:
   1. ✅ Zero code changes to original function
   2. ✅ Transparent caching behavior
   3. ✅ 100.0% performance improvement
   4. ✅ 50.0% cost reduction
   5. ✅ Built-in cache management
   6. ✅ Easy to enable/disable
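
The cost figures above follow directly from the hit and miss counts: only misses reach the API, so the fraction saved equals the hit rate. A small sketch of that arithmetic, with `cache_savings` as a hypothetical helper name and a flat per-call cost as a simplifying assumption:

```python
def cache_savings(hits: int, misses: int, cost_per_call: float):
    """Return (cost without cache, cost with cache, fraction saved)."""
    total_calls = hits + misses
    cost_without_cache = total_calls * cost_per_call  # every call hits the API
    cost_with_cache = misses * cost_per_call          # only misses hit the API
    saved_fraction = hits / total_calls if total_calls else 0.0
    return cost_without_cache, cost_with_cache, saved_fraction

# The demo above: 3 misses in round 1, 3 hits in round 2
without, with_cache, fraction = cache_savings(hits=3, misses=3, cost_per_call=0.002)
print(f"${without:.3f} -> ${with_cache:.3f} ({fraction:.0%} saved)")
# prints: $0.012 -> $0.006 (50% saved)
```

The 50% figure is a property of this demo, where every query is asked exactly twice; a workload with more repetition saves more, and one with no repetition saves nothing.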


## Step 7: Advanced - Decorator Composition

Multiple decorators can be stacked on one function, each adding a single behavior:

In [10]:
def timing_decorator(func):
    """Decorator to measure execution time"""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        duration = time.time() - start
        print(f"⏱️ Execution: {duration:.3f}s")
        return result
    return wrapper

def cost_tracker(cost_per_call: float = 0.002):
    """Decorator to track costs"""
    total_cost = {'value': 0.0}
    
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            
            # Every call that reaches this wrapper is a real API call, because
            # the cache decorator is stacked outside this one (see below).
            total_cost['value'] += cost_per_call
            print(f"💰 Cost: ${cost_per_call:.3f} | Total: ${total_cost['value']:.3f}")
            
            return result
        
        wrapper.get_total_cost = lambda: total_cost['value']
        return wrapper
    return decorator

# Create function with multiple decorators.
# Decorators apply bottom-up: cost_tracker wraps the raw call, the cache wraps
# cost_tracker, and timing wraps everything. On a cache hit the cost tracker
# never runs, so cached calls are not billed.
@timing_decorator
@cache_llm_response()
@cost_tracker(0.002)
def enhanced_llm_call(prompt: str) -> str:
    """LLM call with timing, cost tracking, and caching"""
    return base_llm_call(prompt)

print("🔗 Decorator Composition Demo")
print("=" * 32)
print("Stack: @timing → @cache → @cost_tracker → function")

test_prompt = "What is the future of AI?"

print(f"\n📝 First call: {test_prompt}")
result1 = enhanced_llm_call(test_prompt)

print(f"\n📝 Second call (cached): {test_prompt}")
result2 = enhanced_llm_call(test_prompt)

print(f"\n📊 Total tracked cost: ${enhanced_llm_call.get_total_cost():.3f}")

print("\n🎯 Composition Benefits:")
print("   1. ✨ Modular design - each decorator handles one concern")
print("   2. ✨ Flexible stacking - mix and match as needed")
print("   3. ✨ Order matters - cache first, then track what executes")
print("   4. ✨ Reusable - same decorators work on any function")

🔗 Decorator Composition Demo
Stack: @timing → @cache → @cost_tracker → function

📝 First call: What is the future of AI?
📡 Cache MISS - Calling function... Key: fc9d9c24c4
💰 Cost: $0.002 | Total: $0.002
💾 Response cached (took 2.75s)
⏱️ Execution: 2.751s

📝 Second call (cached): What is the future of AI?
🎯 Cache HIT! Key: fc9d9c24c4
⏱️ Execution: 0.001s

📊 Total tracked cost: $0.002

🎯 Composition Benefits:
   1. ✨ Modular design - each decorator handles one concern
   2. ✨ Flexible stacking - mix and match as needed
   3. ✨ Order matters - cache first, then track what executes
   4. ✨ Reusable - same decorators work on any function
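
Since stacking order decides which wrapper runs first, it can help to trace it explicitly. In the sketch below, `tag` is a hypothetical decorator that only records when it is entered and exited; the decorator written closest to the function wraps it first, and the topmost decorator runs first at call time:

```python
import functools

def tag(label: str, log: list):
    """Hypothetical tracing decorator: records entry and exit under `label`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            log.append(f"enter {label}")
            result = func(*args, **kwargs)
            log.append(f"exit {label}")
            return result
        return wrapper
    return decorator

log = []

@tag("outer", log)   # applied last, runs first
@tag("inner", log)   # applied first, runs closest to the function
def work() -> int:
    log.append("work")
    return 42

answer = work()
print(log)
# prints: ['enter outer', 'enter inner', 'work', 'exit inner', 'exit outer']
```

Applied to caching, this means a decorator placed above the cache runs on every call, while one placed below the cache runs only on cache misses.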


**You've successfully learned the Decorator Pattern for LLM applications!**

## 🎓 Decorator Pattern Learning Summary

The Decorator Pattern allows you to add new behavior to existing functions without altering their core logic.
It preserves the original interface, supports flexible configuration, and enables multiple enhancements through stacking.

**Core Concepts**

* Add functionality without changing original code
* Keep the same function signature (transparency)
* Use `functools.wraps` to preserve metadata
* Allow parameterized decorators
* Support stacking multiple decorators

In practice, decorators improve flexibility, maintain cleaner code, and make enhancements reusable.

**Key Advantages**

* **Zero Code Changes** – Apply the decorator directly
* **Separation of Concerns** – Business logic remains isolated
* **Reusability** – Works with different functions
* **Composability** – Combine multiple decorators easily
* **Testability** – Test each decorator in isolation
* **Performance Gains** – Faster execution and reduced costs

When applied to LLM systems, decorators can significantly improve efficiency and monitoring.

**LLM-Specific Applications**

* Response caching (savings scale with the cache hit rate: 50% in the demo above, more when queries repeat often)
* Cost tracking across all calls
* Performance monitoring (latency, success rates)
* Retry logic with exponential backoff
* Rate limiting to avoid hitting quotas
* Graceful error handling with fallback options
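
As one example from the list above, retry logic with exponential backoff fits the same decorator shape. This is a minimal sketch, not a production implementation: the `retry` name, its parameters, and the injectable `sleep` argument are all assumptions made for illustration, and it retries on any exception, which real code would usually narrow:

```python
import functools
import time

def retry(max_attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Hypothetical retry decorator: waits base_delay * 2**n between attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

attempts = {"count": 0}
delays = []  # stands in for real sleeping so the demo runs instantly

@retry(max_attempts=3, base_delay=0.5, sleep=delays.append)
def flaky_call(prompt: str) -> str:
    """Stand-in for an API call that fails twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return f"ok: {prompt}"

result = flaky_call("ping")
print(result, delays)
# prints: ok: ping [0.5, 1.0]
```

When combined with caching, placing the retry decorator below the cache means retries happen only on cache misses, and only a successful result is stored.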

To ensure production readiness, follow best practices for caching and scalability.

**Best Practices**

* Set TTL for cache expiration
* Limit cache size to avoid memory overload
* Maintain cache statistics for optimization
* Provide cache bypass for real-time scenarios
* Use consistent cache key generation
* Design with composition in mind from the start
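
The cache-bypass practice can be sketched as a keyword-only flag that the wrapper consumes before calling the original function. The names `cached_with_bypass` and `bypass_cache` are hypothetical, and the sketch assumes all arguments are hashable:

```python
import functools

def cached_with_bypass(func):
    """Hypothetical caching decorator with a per-call bypass flag."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args, bypass_cache: bool = False, **kwargs):
        key = (args, tuple(sorted(kwargs.items())))  # assumes hashable arguments
        if not bypass_cache and key in store:
            return store[key]
        result = func(*args, **kwargs)  # bypass_cache is not forwarded
        store[key] = result             # a forced call also refreshes the cache
        return result

    return wrapper

counter = {"calls": 0}

@cached_with_bypass
def lookup(prompt: str) -> str:
    """Stand-in for a call whose answer changes over time."""
    counter["calls"] += 1
    return f"{prompt} #{counter['calls']}"

a = lookup("latest status")                     # miss: real call
b = lookup("latest status")                     # hit: same value as a
c = lookup("latest status", bypass_cache=True)  # forced real call
d = lookup("latest status")                     # hit: refreshed value
print(a, b, c, d, sep=" | ")
```

Refreshing the cache on a bypassed call is a design choice; an alternative is to leave the stored entry untouched so the bypass has no side effects.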