![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)

# Context Compression Concepts: Managing Context Size

## Why Context Compression Matters

**The Problem:** As agent conversations grow, the context sent with each request grows with them, and every extra token adds cost and latency.

**Real-World Example:**
```
Initial query: "What courses should I take?" (50 tokens)
After 10 exchanges: 5,000 tokens
After 50 exchanges: 25,000 tokens (exceeds the limit of many models!)
```

**Why This Matters:**
- 💰 **Cost**: At GPT-4's ~$0.03 per 1K input tokens, a 25K-token context costs about $0.75 per query
- ⏱️ **Latency**: Larger contexts mean slower responses
- 🚫 **Limits**: Every model has a finite context window; GPT-3.5 and the original GPT-4 ranged from 4K to 32K tokens
- 🧠 **Quality**: Too much context can bury the relevant details and confuse the model

## Learning Objectives

You'll learn simple, practical techniques to:
1. **Measure context size** - Count tokens accurately
2. **Compress intelligently** - Keep important info, remove fluff
3. **Prioritize content** - Most relevant information first
4. **Monitor effectiveness** - Track compression impact

## Setup: Simple Token Counting

First, let's build a simple token counter to understand our context size.

In [1]:
# Simple setup - no classes, just functions
import os
from dotenv import load_dotenv
load_dotenv()

# Simple token counting (approximation)
def count_tokens_simple(text: str) -> int:
    """Simple token counting - roughly 4 characters per token"""
    return len(text) // 4

def count_tokens_accurate(text: str) -> int:
    """More accurate token counting using tiktoken"""
    try:
        import tiktoken
        encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
        return len(encoding.encode(text))
    except ImportError:
        # Fallback to simple counting
        return count_tokens_simple(text)

# Test our token counting
sample_text = "Hello, I'm looking for machine learning courses that would be suitable for my background."

simple_count = count_tokens_simple(sample_text)
accurate_count = count_tokens_accurate(sample_text)

print("🔢 Token Counting Comparison:")
print(f"   Text: '{sample_text}'")
print(f"   Characters: {len(sample_text)}")
print(f"   Simple count (chars/4): {simple_count} tokens")
print(f"   Accurate count: {accurate_count} tokens")
print(f"   Difference: {abs(simple_count - accurate_count)} tokens")

print("\n💡 Why This Matters:")
print("   • Accurate counting helps predict costs")
print("   • Simple counting is fast for approximations")
print("   • Production systems need accurate counting")

🔢 Token Counting Comparison:
   Text: 'Hello, I'm looking for machine learning courses that would be suitable for my background.'
   Characters: 89
   Simple count (chars/4): 22 tokens
   Accurate count: 17 tokens
   Difference: 5 tokens

💡 Why This Matters:
   • Accurate counting helps predict costs
   • Simple counting is fast for approximations
   • Production systems need accurate counting

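Once a token count is available, a natural next step is to compare it against the model's context window while leaving room for the reply. The sketch below is a minimal illustration and not part of the original notebook: `fits_in_budget`, `context_limit`, and `reserved_for_output` are hypothetical names, and the 4,096 and 500 values are example numbers rather than limits of any particular model. It uses the chars/4 approximation so that it runs without tiktoken.

```python
def count_tokens_simple(text: str) -> int:
    """Approximate token count: roughly 4 characters per token."""
    return len(text) // 4


def fits_in_budget(text: str, context_limit: int = 4096,
                   reserved_for_output: int = 500) -> bool:
    """Return True if the text still leaves room for the model's reply.

    context_limit and reserved_for_output are illustrative values,
    not limits taken from a specific model.
    """
    available = context_limit - reserved_for_output
    return count_tokens_simple(text) <= available


short_prompt = "What machine learning courses are available?"
long_prompt = "course description " * 2000  # 38,000 characters

print(f"Short prompt fits: {fits_in_budget(short_prompt)}")
print(f"Long prompt fits:  {fits_in_budget(long_prompt)}")
```

Swapping in an accurate counter only requires replacing the call to `count_tokens_simple`; the budget logic stays the same.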

## Concept 1: Context Size Analysis

Let's analyze how context grows in a typical conversation.

In [2]:
# Simulate a growing conversation context
def simulate_conversation_growth():
    """Show how context grows over time"""
    
    # Simulate conversation turns
    conversation = []
    
    # Base context (student profile, course info, etc.)
    base_context = """
STUDENT PROFILE:
Name: Sarah Chen
Major: Computer Science, Year 3
Completed: RU101, RU201, CS101, CS201
Interests: machine learning, data science, python
Preferred Format: online

AVAILABLE COURSES:
1. RU301: Vector Search - Advanced Redis vector operations
2. CS301: Machine Learning - Introduction to ML algorithms
3. CS302: Deep Learning - Neural networks and deep learning
4. CS401: Advanced ML - Advanced machine learning techniques
"""
    
    # Conversation turns
    turns = [
        ("What machine learning courses are available?", "I found several ML courses: CS301, CS302, and CS401. CS301 is perfect for beginners..."),
        ("What are the prerequisites for CS301?", "CS301 requires CS101 and CS201, which you've completed. You're eligible to enroll!"),
        ("How about CS302?", "CS302 (Deep Learning) requires CS301 as a prerequisite. You'd need to take CS301 first."),
        ("Can you recommend a learning path?", "I recommend: 1) CS301 (Machine Learning) this semester, 2) CS302 (Deep Learning) next semester..."),
        ("What about RU301?", "RU301 (Vector Search) is excellent for ML applications. It teaches vector databases used in AI systems...")
    ]
    
    print("📈 Context Growth Analysis:")
    print("=" * 50)
    
    # Start with base context
    current_context = base_context
    base_tokens = count_tokens_accurate(current_context)
    print(f"Base context: {base_tokens} tokens")
    
    # Add each conversation turn
    for i, (user_msg, assistant_msg) in enumerate(turns, 1):
        # Add to conversation history
        current_context += f"\nUser: {user_msg}\nAssistant: {assistant_msg}"
        
        # Count tokens
        total_tokens = count_tokens_accurate(current_context)
        turn_tokens = count_tokens_accurate(f"User: {user_msg}\nAssistant: {assistant_msg}")
        
        print(f"Turn {i}: +{turn_tokens} tokens → {total_tokens} total")
        
        # Show cost implications once, after the final turn
        if i == len(turns):
            # Illustrative input-token prices in USD per 1K tokens; check current pricing
            cost_gpt35 = total_tokens * 0.0015 / 1000  # $0.0015 per 1K tokens
            cost_gpt4 = total_tokens * 0.03 / 1000     # $0.03 per 1K tokens
            
            print("\n💰 Cost Impact:")
            print(f"   GPT-3.5: ${cost_gpt35:.4f} per query")
            print(f"   GPT-4: ${cost_gpt4:.4f} per query")
            print(f"   At 1000 queries/day: GPT-4 = ${cost_gpt4 * 1000:.2f}/day")

simulate_conversation_growth()

📈 Context Growth Analysis:
Base context: 89 tokens
Turn 1: +25 tokens → 114 total
Turn 2: +22 tokens → 136 total
Turn 3: +28 tokens → 164 total
Turn 4: +35 tokens → 199 total
Turn 5: +32 tokens → 231 total

💰 Cost Impact:
   GPT-3.5: $0.0003 per query
   GPT-4: $0.0069 per query
   At 1000 queries/day: GPT-4 = $6.93/day

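Because the full history is resent with every query, the tokens billed over a whole conversation are the sum of the per-turn totals, not just the size of the final context. The sketch below adds up the totals printed above at the same illustrative $0.03 per 1K rate. `cumulative_input_tokens` is a hypothetical helper, and the calculation assumes input tokens only, ignoring the cost of generated output.

```python
from itertools import accumulate

# Per-turn context sizes copied from the output above
turn_totals = [114, 136, 164, 199, 231]

# Assumed flat input price: the same illustrative GPT-4 figure used in the cell
PRICE_PER_1K_TOKENS = 0.03


def cumulative_input_tokens(per_turn_totals: list[int]) -> list[int]:
    """Running sum of input tokens sent across all turns so far."""
    return list(accumulate(per_turn_totals))


running_totals = cumulative_input_tokens(turn_totals)
conversation_tokens = running_totals[-1]
conversation_cost = conversation_tokens * PRICE_PER_1K_TOKENS / 1000

for turn, (sent, running) in enumerate(zip(turn_totals, running_totals), 1):
    print(f"Turn {turn}: sent {sent} tokens, {running} cumulative")

print(f"Whole conversation: {conversation_tokens} input tokens, "
      f"${conversation_cost:.4f}")
```

With these numbers the five-turn conversation sends 844 input tokens in total, compared with 231 for the final query alone, which is why compression pays off more the longer a conversation runs.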

## Concept 2: Simple Context Compression

Now let's implement simple compression techniques.

In [3]:
# Simple compression techniques
def compress_by_truncation(text: str, max_tokens: int) -> str:
    """Simplest compression: just cut off the end"""
    current_tokens = count_tokens_accurate(text)
    
    if current_tokens <= max_tokens:
        return text
    
    # Rough truncation: estimate characters per token, then cut to that length.
    # The result is approximate, and the appended marker adds a few tokens.
    chars_per_token = len(text) / current_tokens
    target_chars = int(max_tokens * chars_per_token)
    
    return text[:target_chars] + "...[truncated]"

def compress_by_summarization(conversation_history: str) -> str:
    """Rule-based stand-in for summarization: keep lines that mention key terms"""
    # Keyword filter rather than a true summary: lines are kept or dropped, never rewritten
    lines = conversation_history.split('\n')
    
    # Keep important lines (questions, course codes, recommendations)
    important_lines = []
    for line in lines:
        if any(keyword in line.lower() for keyword in 
               ['?', 'recommend', 'cs301', 'cs302', 'ru301', 'prerequisite']):
            important_lines.append(line)
    
    return '\n'.join(important_lines)

def compress_by_priority(context_parts: dict, max_tokens: int) -> str:
    """Compress by keeping most important parts first"""
    # Priority order (most important first)
    priority_order = ['student_profile', 'current_query', 'recent_conversation', 'course_info', 'old_conversation']
    
    compressed_context = ""
    used_tokens = 0
    
    for part_name in priority_order:
        if part_name in context_parts:
            part_text = context_parts[part_name]
            part_tokens = count_tokens_accurate(part_text)
            
            if used_tokens + part_tokens <= max_tokens:
                compressed_context += part_text + "\n\n"
                used_tokens += part_tokens
            else:
                # Partial inclusion if space allows
                remaining_tokens = max_tokens - used_tokens
                if remaining_tokens > 50:  # Only if meaningful space left
                    partial_text = compress_by_truncation(part_text, remaining_tokens)
                    compressed_context += partial_text
                break
    
    return compressed_context.strip()

# Test compression techniques
sample_context = """
STUDENT PROFILE:
Name: Sarah Chen, Major: Computer Science, Year 3
Completed: RU101, RU201, CS101, CS201
Interests: machine learning, data science, python

CONVERSATION:
User: What machine learning courses are available?
Assistant: I found several ML courses: CS301 (Machine Learning), CS302 (Deep Learning), and CS401 (Advanced ML). CS301 is perfect for beginners and covers supervised learning, unsupervised learning, and basic neural networks. It requires CS101 and CS201 as prerequisites.

User: What are the prerequisites for CS301?
Assistant: CS301 requires CS101 (Introduction to Programming) and CS201 (Data Structures), which you've already completed. You're eligible to enroll!

User: How about CS302?
Assistant: CS302 (Deep Learning) is more advanced and requires CS301 as a prerequisite. It covers neural networks, CNNs, RNNs, and modern architectures like transformers.
"""

original_tokens = count_tokens_accurate(sample_context)
print("🔍 Compression Techniques Comparison:")
print(f"Original context: {original_tokens} tokens")
print("=" * 50)

# Test truncation
truncated = compress_by_truncation(sample_context, 200)
truncated_tokens = count_tokens_accurate(truncated)
print(f"1. Truncation (200 token limit):")
print(f"   Result: {truncated_tokens} tokens ({truncated_tokens/original_tokens:.1%} of original)")
print(f"   Preview: {truncated[:100]}...")

# Test summarization
summarized = compress_by_summarization(sample_context)
summarized_tokens = count_tokens_accurate(summarized)
print(f"\n2. Summarization (keep important lines):")
print(f"   Result: {summarized_tokens} tokens ({summarized_tokens/original_tokens:.1%} of original)")
print(f"   Preview: {summarized[:100]}...")

print("\n💡 Key Insights:")
print("   • Truncation is fast but loses recent context")
print("   • Summarization preserves key information")
print("   • Priority-based keeps most important parts")
print("   • Choose technique based on your use case")

🔍 Compression Techniques Comparison:
Original context: 231 tokens

1. Truncation (200 token limit):
   Result: 180 tokens (77.9% of original)
   Preview: STUDENT PROFILE: Name: Sarah Chen Major: Computer Science, Year 3 Completed: RU101, RU201, CS101...

2. Summarization (keep important lines):
   Result: 156 tokens (67.5% of original)
   Preview: STUDENT PROFILE: Name: Sarah Chen Major: Computer Science, Year 3 What machine learning courses...

💡 Key Insights:
   • Truncation is fast but loses recent context
   • Summarization preserves key information
   • Priority-based keeps most important parts
   • Choose technique based on your use case

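The priority-based approach is defined above but not exercised in the comparison. The following is a minimal, self-contained sketch of the same idea under simplifying assumptions: it uses the chars/4 approximation instead of tiktoken, and it skips any part that does not fit rather than truncating it. `pack_by_priority` and the sample `parts` dictionary are hypothetical names and data made up for illustration.

```python
def count_tokens_simple(text: str) -> int:
    """Approximate token count: roughly 4 characters per token."""
    return len(text) // 4


# Same priority order as compress_by_priority (most important first)
PRIORITY_ORDER = ["student_profile", "current_query", "recent_conversation",
                  "course_info", "old_conversation"]


def pack_by_priority(parts: dict[str, str], max_tokens: int) -> tuple[str, list[str]]:
    """Keep whole parts in priority order while they fit in the budget.

    Simplification: a part that does not fit is skipped, not truncated.
    Separators between parts are not counted against the budget.
    """
    kept, used = [], 0
    for name in PRIORITY_ORDER:
        text = parts.get(name)
        if text is None:
            continue
        part_tokens = count_tokens_simple(text)
        if used + part_tokens <= max_tokens:
            kept.append(name)
            used += part_tokens
    packed = "\n\n".join(parts[name] for name in kept)
    return packed, kept


# Hypothetical sample data
parts = {
    "student_profile": "Name: Sarah Chen. Completed: RU101, RU201, CS101, CS201.",
    "current_query": "User: What about RU301?",
    "recent_conversation": "User: How about CS302?\nAssistant: CS302 requires CS301 first.",
    "course_info": "RU301: Vector Search. CS301: Machine Learning. CS302: Deep Learning.",
    "old_conversation": "User: What machine learning courses are available?\n" * 10,
}

packed, kept = pack_by_priority(parts, max_tokens=60)

# Track compression impact: tokens before and after, plus the ratio
original_tokens = sum(count_tokens_simple(text) for text in parts.values())
packed_tokens = count_tokens_simple(packed)

print(f"Kept parts: {kept}")
print(f"Tokens: {original_tokens} -> {packed_tokens} "
      f"({packed_tokens / original_tokens:.1%} of original)")
```

Logging the before and after token counts for each request, as the last lines do, is one simple way to monitor how much a compression step actually saves.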