# Context layering

Context windows are precious and finite, yet not all information deserves equal priority. Some content is absolutely critical - system instructions that define agent behavior, current task objectives, user authentication state. Other content is highly valuable but not strictly essential - recent conversation history, relevant memories, helpful background knowledge. Still other content is nice-to-have but expendable - older conversation turns, tangentially related documentation, supplementary examples. The challenge is organizing this hierarchy of importance so that when context space runs low, we preserve what matters most.

Context layering addresses this challenge by organizing information into priority-based tiers, each with different treatment under space constraints. Layer 0 contains immutable critical content that must always be present. Layer 1 holds important information that should be included when possible but can be summarized if needed. Layer 2 contains supplementary content that is included opportunistically when space permits but dropped entirely under tight constraints. This stratification ensures graceful degradation - as context budgets tighten, we shed lower-priority layers while preserving the foundation.

In this notebook, we explore how to implement context layering as a sophisticated select strategy for context engineering. We will examine how to define layer priorities and assign content appropriately, how to implement layer-aware context building with space constraints, how to compress or drop lower layers when budgets are tight, how to ensure critical information always survives, and how to build production-ready layered context systems.

In [1]:
import os
from langchain_openai import ChatOpenAI
from typing import List, Dict, Optional, Tuple
from dataclasses import dataclass
from enum import Enum

We begin by initializing the language model that will work with our layered context. Fixing the model configuration (here, temperature 0) keeps behavior as reproducible as possible across different layering strategies.

In [2]:
# Initialize the language model for generating responses
llm = ChatOpenAI(
    model="gpt-4o-mini",
    api_key=os.getenv("OPENAI_API_KEY", "").strip(),
    temperature=0  # Set to 0 for more deterministic outputs
)

print("Language model initialized successfully!")

Language model initialized successfully!


## Part 1: Defining context layers and priorities

Effective context layering begins with a clear taxonomy of layer priorities and guidelines for what content belongs in each tier. The layer structure should reflect our system's non-negotiable requirements versus nice-to-have enhancements. Layer 0 typically contains system instructions and current task context that define the agent's behavior and purpose. Layer 1 holds important supporting information like recent conversation history and key user preferences. Layer 2 contains supplementary material like background knowledge and older interactions.

We will define a LayerPriority enumeration and a ContextLayer class that encapsulates content with its assigned priority, token budget, and compression options. This structure enables systematic layer-aware context management.

In [3]:
class LayerPriority(Enum):
    """Priority levels for context layers."""
    CRITICAL = 0  # Must always be included, never compressed
    IMPORTANT = 1  # Should be included, can be summarized if needed
    SUPPLEMENTARY = 2  # Nice to have, dropped first under constraints

@dataclass
class ContextLayer:
    """A layer of context with priority and content."""
    
    priority: LayerPriority  # Layer's priority level (CRITICAL, IMPORTANT, or SUPPLEMENTARY)
    name: str  # Name for this layer (e.g., "System Instructions")
    content: str  # The actual text content stored in this layer
    compressible: bool = False  # Can this layer be summarized?
    
    def estimate_tokens(self) -> int:
        """Estimate token count for this layer's content.
        
        Returns:
            Estimated tokens (rough approximation)
        """
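        # Split by whitespace to count words
        word_count = len(self.content.split())
        # Crude heuristic: 0.75 x word count. English text usually tokenizes to
        # more tokens than words (roughly 1.3 tokens per word), so this undercounts;
        # use a real tokenizer when budgets are tight.
        return int(word_count * 0.75)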
    
    def compress(self, target_length: int) -> str:
        """Create compressed version of content.
        
        Args:
            target_length: Target character length
            
        Returns:
            Compressed content string
        """
        # Don't compress if layer is marked as non-compressible or already shorter than target
        if not self.compressible or len(self.content) <= target_length:
            return self.content
        
        # Simple truncation with ellipsis (production would use smarter summarization)
        return self.content[:target_length-3] + "..."

# Define example context layers for a customer service agent
def create_sample_layers() -> List[ContextLayer]:
    """Create sample context layers demonstrating the hierarchy.
    
    Returns:
        List of context layers with different priorities
    """
    # This creates a realistic example of how a customer service chatbot would organize its context into different priority layers
    return [
        # Layer 0: Critical - System instructions
        ContextLayer(
            priority=LayerPriority.CRITICAL,
            name="System Instructions",
            content="""You are a helpful customer service agent for TechStore.

Core responsibilities:
- Answer product questions accurately
- Help with orders and returns
- Provide technical support
- Maintain professional, friendly tone

Critical rules:
- Never share customer data with unauthorized parties
- Always verify identity before account changes
- Escalate refunds over $500 to manager""",
            compressible=False
        ),
        
        # Layer 0: Critical - Current task
        ContextLayer(
            priority=LayerPriority.CRITICAL,
            name="Current Task",
            content="""Current customer request: Help with laptop overheating issue
Customer: Premium member, ID #12345
Product: Laptop Pro X1 purchased 45 days ago""",
            compressible=False
        ),
        
        # Layer 1: Important - Recent conversation
        ContextLayer(
            priority=LayerPriority.IMPORTANT,
            name="Recent Conversation",
            content="""Last 3 exchanges:

Customer: My laptop gets really hot when I'm gaming.
Agent: I understand that's frustrating. Let me help you troubleshoot this.

Customer: It's been happening for about a week now.
Agent: Thank you for that detail. Has anything changed recently—new software, updates, or room temperature?

Customer: I did install some new games last week.
Agent: That could be related. Let's check a few things.""",
            compressible=True
        ),
        
        # Layer 1: Important - Key user preferences
        ContextLayer(
            priority=LayerPriority.IMPORTANT,
            name="Key Preferences",
            content="""User preferences:
- Prefers detailed technical explanations
- Wants to try DIY solutions before returns
- Previous positive experience with cooling pad recommendation
- Values quick resolution""",
            compressible=True
        ),
        
        # Layer 2: Supplementary - Product documentation
        ContextLayer(
            priority=LayerPriority.SUPPLEMENTARY,
            name="Product Documentation",
            content="""Laptop Pro X1 Technical Specs:
- Intel Core i7-12700H processor (14 cores, up to 4.7 GHz)
- NVIDIA RTX 3060 graphics (6GB GDDR6)
- 16GB DDR5 RAM
- Dual fan cooling system with heat pipes
- Recommended operating temperature: 0-35°C ambient
- Maximum thermal design power: 140W

Common thermal management tips:
- Ensure ventilation openings are clear
- Use on hard, flat surfaces (not soft bedding)
- Update thermal drivers to latest version
- Consider cooling pad for intensive workloads""",
            compressible=True
        ),
        
        # Layer 2: Supplementary - Policy information
        ContextLayer(
            priority=LayerPriority.SUPPLEMENTARY,
            name="Return Policy",
            content="""TechStore Return Policy:
- 30-day return window from purchase date
- Products must be in original packaging
- Refunds processed in 5-7 business days
- Free return shipping for defective items
- Customer pays return shipping for buyer's remorse
- 2-year manufacturer warranty covers defects
- Extended warranty available for purchase""",
            compressible=True
        ),
        
        # Layer 2: Supplementary - Older conversation history
        ContextLayer(
            priority=LayerPriority.SUPPLEMENTARY,
            name="Earlier Conversation",
            content="""Earlier in conversation:

Customer: Hi, I need some help.
Agent: Hello! I'm happy to help. What can I assist you with today?

Customer: I'm having an issue with my laptop.
Agent: I'm sorry to hear that. Can you tell me more about what's happening?""",
            compressible=True
        ),
    ]

# Create sample layers and analyze them
layers = create_sample_layers()

print("Context Layer Structure")
print("=" * 70)

# Group layers by priority level and calculate statistics
# This helps us understand the token distribution across priority tiers
for priority in LayerPriority:
    # Filter to get only layers matching this priority level
    priority_layers = [l for l in layers if l.priority == priority]
    # Calculate total tokens for this priority level
    total_tokens = sum(l.estimate_tokens() for l in priority_layers)

    # Display summary for this priority level
    print(f"\n{priority.name} (Priority {priority.value}):")
    print(f"  Layers: {len(priority_layers)}")
    print(f"  Total tokens: ~{total_tokens}")
    print(f"  Content:")

    # Show details for each layer in this priority
    for layer in priority_layers:
        compressible = "compressible" if layer.compressible else "fixed"
        print(f"    - {layer.name} (~{layer.estimate_tokens()} tokens, {compressible})")

# Calculate and display overall statistics
total_tokens = sum(l.estimate_tokens() for l in layers)
print(f"\nTotal context size: ~{total_tokens} tokens")

Context Layer Structure

CRITICAL (Priority 0):
  Layers: 2
  Total tokens: ~56
  Content:
    - System Instructions (~41 tokens, fixed)
    - Current Task (~15 tokens, fixed)

IMPORTANT (Priority 1):
  Layers: 2
  Total tokens: ~69
  Content:
    - Recent Conversation (~49 tokens, compressible)
    - Key Preferences (~20 tokens, compressible)

SUPPLEMENTARY (Priority 2):
  Layers: 3
  Total tokens: ~131
  Content:
    - Product Documentation (~60 tokens, compressible)
    - Return Policy (~38 tokens, compressible)
    - Earlier Conversation (~33 tokens, compressible)

Total context size: ~256 tokens


The layer structure demonstrates a clear priority hierarchy:
1. Defines three priority tiers (Critical, Important, Supplementary) that encode different treatment under space constraints.
2. Creates a ContextLayer class with token estimation, compression capabilities and priority assignment.
3. Builds realistic example layers showing system instructions and current task as critical, recent conversation and preferences as important and documentation as supplementary.
4. Analyzes token distribution across priorities, revealing that critical layers consume modest space (~56 of ~256 estimated tokens) while supplementary layers hold the largest share (~131 tokens) of optional content.

This foundation enables intelligent context management under varying token budgets.
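
The word-count estimate in ContextLayer is deliberately rough. As a minimal sketch, assuming the common rule of thumb of about 4 characters per token for English text, the following compares it with a character-based estimate and takes the larger of the two as a conservative figure. The function names are illustrative and not part of the notebook; for real budgets a model-specific tokenizer would be more accurate.

```python
import math

def estimate_tokens_by_words(text: str) -> int:
    """Word-based heuristic, same formula as ContextLayer.estimate_tokens."""
    return int(len(text.split()) * 0.75)

def estimate_tokens_by_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Character-based heuristic (assumption: ~4 characters per token)."""
    return math.ceil(len(text) / chars_per_token)

def conservative_estimate(text: str) -> int:
    """Take the larger estimate so the budget check errs on the safe side."""
    return max(estimate_tokens_by_words(text), estimate_tokens_by_chars(text))

sample = "Escalate refunds over $500 to manager"
print(estimate_tokens_by_words(sample))
print(estimate_tokens_by_chars(sample))
print(conservative_estimate(sample))
```

Overestimating is usually the safer failure mode here: an inflated count drops a layer that would have fit, while an undercount can overflow the real context window.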

## Part 2: Layer-aware context assembly

With layers defined and prioritized, we can now implement context assembly that respects priority ordering and token budgets. The assembly algorithm should always include critical layers in full, include important layers when space permits with compression as a fallback, and include supplementary layers only when space remains after the higher tiers. This ensures graceful degradation as budgets tighten.

The algorithm operates in priority order, allocating budget to higher-priority layers first. Critical layers consume their required space unconditionally. Important layers attempt to fit in full, compressing if needed. Supplementary layers fill remaining space opportunistically. This systematic approach guarantees that essential information always survives resource constraints.

In [4]:
def assemble_layered_context(layers: List[ContextLayer],
                            token_budget: int,
                            compression_ratio: float = 0.5) -> Dict:
    """Assemble context from layers respecting priorities and budget.
    
    Args:
        layers: All available context layers
        token_budget: Maximum tokens to use
        compression_ratio: Target size ratio when compressing (0-1, default 0.5)
        
    Returns:
        Dict containing assembled context and metadata
    """
    # Sort layers by priority (critical first). This ensures we process higher priority layers before lower priority ones
    sorted_layers = sorted(layers, key=lambda l: l.priority.value)
    
    # Initialize tracking variables
    included_layers = []  # Store the actual content to include
    tokens_used = 0  # Track how many tokens we have consumed so far
    compression_applied = []  # Track which layers got compressed
    dropped_layers = []  # Track which layers we could not include

    # Process each layer in priority order
    for layer in sorted_layers:
        # Calculate how many tokens this layer would use
        layer_tokens = layer.estimate_tokens()
        
        # Critical layers: Always include in full
        if layer.priority == LayerPriority.CRITICAL:
            # Add to context without checking budget
            included_layers.append(layer.content)  # Critical layers are non-negotiable - system can't function without them
            tokens_used += layer_tokens
            continue  # Move to next layer
        
        # Check remaining budget
        remaining_budget = token_budget - tokens_used
        
        # Important layers: Include if possible, compress if needed
        if layer.priority == LayerPriority.IMPORTANT:
            # Check if layer fits in full within remaining budget
            if layer_tokens <= remaining_budget:
                # Fits in full. Add to context
                included_layers.append(layer.content)
                tokens_used += layer_tokens
            # If it does not fit but is compressible and we have some space
            elif layer.compressible and remaining_budget > 0:
                # Target character length: a fixed fraction of the original
                target_chars = int(len(layer.content) * compression_ratio)
                # Truncate to that fraction (not to the exact remaining budget)
                compressed = layer.compress(target_chars)
                # Estimated token cost of the compressed text
                compressed_tokens = int(layer_tokens * compression_ratio)

                # Check if compressed version fits
                if compressed_tokens <= remaining_budget:
                    # Include the compressed version
                    included_layers.append(compressed)
                    tokens_used += compressed_tokens
                    compression_applied.append(layer.name)
                else:
                    # Even compressed, it doesn't fit - have to drop it
                    dropped_layers.append(layer.name)
            else:
                # Can't compress or no space left - drop this layer
                dropped_layers.append(layer.name)
            continue # Move to next layer
        
        # Supplementary layers: Include only if budget allows
        if layer.priority == LayerPriority.SUPPLEMENTARY:
            # Check if layer fits in full
            if layer_tokens <= remaining_budget:
                # We have enough space - include it
                included_layers.append(layer.content)
                tokens_used += layer_tokens
            # Otherwise try compression, but only if the remaining budget exceeds 30% of the layer's full size
            elif layer.compressible and remaining_budget > layer_tokens * 0.3:
                # Truncate to a fixed fraction of the original length
                target_chars = int(len(layer.content) * compression_ratio)
                compressed = layer.compress(target_chars)
                compressed_tokens = int(layer_tokens * compression_ratio)

                # Check if compressed version fits
                if compressed_tokens <= remaining_budget:
                    included_layers.append(compressed)
                    tokens_used += compressed_tokens
                    compression_applied.append(layer.name)
                else:
                    # Doesn't fit even compressed
                    dropped_layers.append(layer.name)
            else:
                # Can't compress or no space left - drop this layer
                dropped_layers.append(layer.name)
    
    # Assemble final context by joining all included layers with double newlines
    context = "\n\n".join(included_layers)

    # Return comprehensive metadata about the assembly process
    return {
        'context': context,  # The assembled context string
        'tokens_used': tokens_used,  # Estimated tokens consumed (can exceed the budget if critical layers alone are too large)
        'tokens_budget': token_budget,  # Budget we were given
        'budget_utilization': (tokens_used / token_budget * 100) if token_budget > 0 else 0,  # Percentage used
        'layers_included': len(included_layers),  # How many layers made it in
        'layers_total': len(layers),  # Total layers we started with
        'compressed_layers': compression_applied,  # Which layers were compressed
        'dropped_layers': dropped_layers  # Which layers were dropped
    }

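This function implements the core layering algorithm:
1. CRITICAL layers are ALWAYS included in full (no compression, no dropping), even if they alone exceed the budget, in which case tokens_used is larger than token_budget.
2. IMPORTANT layers are included in full if they fit, otherwise truncated to compression_ratio of their length if that fits, otherwise dropped.
3. SUPPLEMENTARY layers are included only if space remains after the higher tiers; compression is attempted only when the remaining budget exceeds 30% of the layer's full size.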
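
Because critical layers bypass the budget check, a budget smaller than the critical tier silently overflows. One possible guard, sketched here under the assumption that each layer is reduced to a hypothetical (name, priority, estimated_tokens) tuple, is to verify up front that the critical tier fits and to fail loudly when it does not.

```python
from typing import List, Tuple

def check_critical_budget(layer_sizes: List[Tuple[str, int, int]],
                          token_budget: int) -> int:
    """Return the budget left after critical layers, or raise if they do not fit.

    layer_sizes holds (name, priority, estimated_tokens) tuples, where
    priority 0 means critical. This tuple shape is a simplification for the sketch.
    """
    critical_tokens = sum(tokens for _, priority, tokens in layer_sizes
                          if priority == 0)
    if critical_tokens > token_budget:
        raise ValueError(
            f"Critical layers need ~{critical_tokens} tokens, "
            f"but the budget is only {token_budget}"
        )
    return token_budget - critical_tokens

# Token estimates taken from the Part 1 output
sizes = [
    ("System Instructions", 0, 41),
    ("Current Task", 0, 15),
    ("Recent Conversation", 1, 49),
]
print(check_critical_budget(sizes, 100))  # 44 tokens remain for lower tiers
```

Whether to raise, log a warning, or proceed anyway is a design choice; raising makes a misconfigured budget visible during development.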

In [5]:
# Test with different token budgets
budgets = [100, 200, 400]

print("Layer-Aware Context Assembly")
print("=" * 70)

# Try assembling context under different budget constraints
for budget in budgets:
    # Call the assembly function with this budget
    result = assemble_layered_context(layers, token_budget=budget)
    
    print(f"\nToken Budget: {budget}")
    print("-" * 70)
    # Show how much of the budget was used
    print(f"Used: {result['tokens_used']}/{result['tokens_budget']} tokens ({result['budget_utilization']:.1f}%)")
    # Show how many layers fit
    print(f"Included: {result['layers_included']}/{result['layers_total']} layers")

    # Show which layers were compressed (if any)
    if result['compressed_layers']:
        print(f"Compressed: {', '.join(result['compressed_layers'])}")

    # Show which layers were dropped (if any)
    if result['dropped_layers']:
        print(f"Dropped: {', '.join(result['dropped_layers'])}")

    # Show a preview of the assembled context
    print(f"\nContext preview (first 200 chars):")
    print(f"{result['context'][:200]}...")
    print()

Layer-Aware Context Assembly

Token Budget: 100
----------------------------------------------------------------------
Used: 100/100 tokens (100.0%)
Included: 4/7 layers
Compressed: Recent Conversation
Dropped: Product Documentation, Return Policy, Earlier Conversation

Context preview (first 200 chars):
You are a helpful customer service agent for TechStore.

Core responsibilities:
- Answer product questions accurately
- Help with orders and returns
- Provide technical support
- Maintain professional...


Token Budget: 200
----------------------------------------------------------------------
Used: 185/200 tokens (92.5%)
Included: 5/7 layers
Dropped: Return Policy, Earlier Conversation

Context preview (first 200 chars):
You are a helpful customer service agent for TechStore.

Core responsibilities:
- Answer product questions accurately
- Help with orders and returns
- Provide technical support
- Maintain professional...


Token Budget: 400
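----------------------------------------------------------------------
Used: 256/400 tokens (64.0%)
Included: 7/7 layers

Context preview (first 200 chars):
You are a helpful customer service agent for TechStore.

Core responsibilities:
- Answer product questions accurately
- Help with orders and returns
- Provide technical support
- Maintain professional...
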

Layer-aware assembly demonstrates graceful degradation:
1. Implements priority-ordered assembly that processes critical layers first, so they are always included regardless of budget.
2. Applies a different strategy to each priority tier: critical layers are included in full, important layers are compressed if needed, and supplementary layers are added opportunistically.
3. Tests multiple budget levels, showing how context adapts from tight (critical layers plus compressed important layers at 100 tokens) to generous (all layers included at 400 tokens).
4. Tracks compression and dropping decisions providing visibility into which layers were affected by budget constraints.

This ensures agents maintain essential capabilities even under severe token limitations.
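
The reported totals can be checked by hand. The sketch below replays the same decisions using only the token estimates printed in Part 1, so no layer text is needed. It assumes every non-critical layer is compressible, as in the sample data; the names LAYER_SIZES and trace_assembly are illustrative.

```python
# (name, priority, estimated_tokens) taken from the Part 1 output
LAYER_SIZES = [
    ("System Instructions", 0, 41),
    ("Current Task", 0, 15),
    ("Recent Conversation", 1, 49),
    ("Key Preferences", 1, 20),
    ("Product Documentation", 2, 60),
    ("Return Policy", 2, 38),
    ("Earlier Conversation", 2, 33),
]

def trace_assembly(budget: int, ratio: float = 0.5):
    """Replay the assembly decisions on token counts alone."""
    used = 0
    decisions = []
    for name, priority, tokens in LAYER_SIZES:
        remaining = budget - used
        if priority == 0 or tokens <= remaining:
            # Critical layers are unconditional; others are kept in full if they fit
            cost, action = tokens, "full"
        else:
            # Important layers need any space left; supplementary layers
            # need more than 30% of their full size before compression is tried
            gate = remaining > 0 if priority == 1 else remaining > tokens * 0.3
            compressed = int(tokens * ratio)
            if gate and compressed <= remaining:
                cost, action = compressed, "compressed"
            else:
                cost, action = 0, "dropped"
        used += cost
        decisions.append((name, action, cost))
    return used, decisions

for budget in (100, 200, 400):
    used, decisions = trace_assembly(budget)
    print(budget, used, [(name, action) for name, action, _ in decisions])
```

At a budget of 100, the critical tier takes 41 + 15 = 56 tokens, the compressed conversation is charged int(49 × 0.5) = 24, and the preferences layer takes the last 20, which accounts for the 100/100 utilization.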

## Part 3: Intelligent compression strategies

Simple truncation is crude and lossy. Production systems should employ smarter compression strategies that preserve the most important information while reducing token count. This might involve extractive summarization that selects key sentences, abstractive summarization using language models to paraphrase concisely, or structured reduction that removes examples while keeping core principles.

We will implement a more sophisticated compression approach that analyzes content structure and selectively reduces less critical portions while preserving essential information. For conversation history, this means keeping the most recent exchanges. For documentation, it means extracting key facts and dropping verbose examples. For policies, it means retaining rules while removing explanatory text.

In [6]:
def smart_compress_conversation(content: str, target_length: int) -> str:
    """Intelligently compress conversation history.
    
    Keeps most recent exchanges, drops older ones.
    
    Args:
        content: Conversation history
        target_length: Target character count
        
    Returns:
        Compressed conversation
    """
    # Split content into individual lines
    lines = content.split('\n')
    
    # Check if first line is a header (e.g., "Last 5 exchanges:")
    header = lines[0] if lines[0].endswith(':') else ""
    # Get the actual exchange lines (excluding header)
    exchanges = lines[1:] if header else lines
    
    # Take most recent exchanges until we hit target
    result_lines = [header] if header else []  # Start building result with header if present
    current_length = len(header)
    
    # Work backwards from most recent exchange. This ensures we keep the most recent conversation turns
    for line in reversed(exchanges):
        # Check if adding this line would exceed target
        if current_length + len(line) + 1 <= target_length:
            # Insert at beginning (after header) since we are working backwards
            result_lines.insert(1 if header else 0, line)
            current_length += len(line) + 1
        else:
            # No more space - stop adding older exchanges
            break
    
    return '\n'.join(result_lines)

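This function keeps the most recent exchanges and drops older ones, which preserves the most relevant context while reducing token count. Because it works line by line, it can cut an exchange in half, and a header such as "Last 5 exchanges:" is kept verbatim even when fewer exchanges survive.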
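
Line-based trimming can keep an agent reply while dropping the customer message it answers. A minimal sketch of an alternative, assuming exchanges are separated by blank lines as in the sample data, keeps or drops whole exchanges instead. The function name compress_by_exchange is illustrative and not part of the notebook.

```python
def compress_by_exchange(content: str, target_length: int) -> str:
    """Keep the most recent whole exchanges (blank-line-separated blocks)."""
    blocks = [b.strip("\n") for b in content.split("\n\n") if b.strip()]
    # Treat a single-line first block ending in ":" as a header
    has_header = bool(blocks) and "\n" not in blocks[0] and blocks[0].endswith(":")
    header = blocks[0] if has_header else ""
    exchanges = blocks[1:] if has_header else blocks

    kept = []
    length = len(header)
    # Walk backwards so the newest exchanges are considered first
    for block in reversed(exchanges):
        cost = len(block) + 2  # block plus the "\n\n" separator
        if length + cost > target_length:
            break
        kept.insert(0, block)
        length += cost
    return "\n\n".join(([header] if header else []) + kept)

history = (
    "Last 3 exchanges:\n\n"
    "Customer: A\nAgent: B\n\n"
    "Customer: C\nAgent: D\n\n"
    "Customer: E\nAgent: F"
)
compressed = compress_by_exchange(history, target_length=65)
print(compressed)
```

With a 65-character target, the two most recent exchanges fit and the oldest is dropped as a unit. The header is always kept, so a target shorter than the header itself is still exceeded.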

In [7]:
def smart_compress_documentation(content: str, target_length: int) -> str:
    """Intelligently compress technical documentation.
    
    Keeps specs and key points, drops verbose explanations.
    
    Args:
        content: Documentation text
        target_length: Target character count
        
    Returns:
        Compressed documentation
    """
    # Split content into individual lines
    lines = content.split('\n')
    
    # Categorize lines by importance
    priority_lines = []  # High-value technical information
    other_lines = []  # General explanatory text
    
    for line in lines:
        # High-priority indicators: bullet dashes, colons (key-value pairs), and technical units.
        # This is a loose substring test: any hyphen or capital "W" in a line also matches, and digits are not checked.
        if any(c in line for c in ['-', '•', ':', '°C', 'GB', 'GHz', 'W']):
            priority_lines.append(line)
        else:
            # Everything else (headers, explanations) is lower priority
            other_lines.append(line)
    
    # Build result with priority content first
    result = '\n'.join(priority_lines)  # Technical specs are most important, so include them first
    
    # Add other content if space permits. This ensures we keep specs even if we have to drop explanations
    for line in other_lines:
        # Check if we have room for this line
        if len(result) + len(line) + 1 <= target_length:
            result += '\n' + line
        else:
            # Out of space - stop adding
            break
    
    return result

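This function keeps specs and key points and drops verbose explanations first. Technical specs (numbers, measurements) are most valuable, while explanatory prose can often be safely removed. Note that spec-like lines are always kept in full and moved ahead of other lines, so the result can exceed target_length when those lines alone are longer than the target.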
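
Since smart_compress_documentation keeps every spec-like line regardless of the target, its output is not guaranteed to fit. The following sketch shows one way to enforce the limit while still favoring specs and preserving the original line order. SPEC_PATTERN and compress_documentation_strict are hypothetical names, and treating any digit or unit as a spec marker is an assumption, not the notebook's rule.

```python
import re

# Assumption: a digit or a technical unit marks a spec line
SPEC_PATTERN = re.compile(r"\d|°C|GB|GHz")

def compress_documentation_strict(content: str, target_length: int) -> str:
    """Favor spec-like lines, never exceed target_length, keep document order."""
    lines = [line for line in content.split("\n") if line.strip()]
    # Spec lines sort first; ties keep their original position
    ranked = sorted(range(len(lines)),
                    key=lambda i: (SPEC_PATTERN.search(lines[i]) is None, i))
    chosen = []
    used = 0
    for i in ranked:
        extra = len(lines[i]) + (1 if chosen else 0)  # newline only between lines
        if used + extra <= target_length:
            chosen.append(i)
            used += extra
    # Emit the surviving lines in their original order
    return "\n".join(lines[i] for i in sorted(chosen))

doc = (
    "Laptop Pro X1 Technical Specs:\n"
    "- 16GB DDR5 RAM\n"
    "- Dual fan cooling system with heat pipes\n"
    "- Maximum thermal design power: 140W"
)
out = compress_documentation_strict(doc, target_length=100)
print(out)
```

Here the line without any number or unit is the one that gets dropped, and the result stays within the 100-character target.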

In [8]:
def smart_compress_policy(content: str, target_length: int) -> str:
    """Intelligently compress policy text.
    
    Keeps rules and key terms, drops explanations.
    
    Args:
        content: Policy text
        target_length: Target character count
        
    Returns:
        Compressed policy
    """
    # Split content into lines
    lines = content.split('\n')
    
    # Build result focusing on structured content
    result_lines = []
    current_length = 0

    # Keep title and bullet points
    for line in lines:
        # Prioritize title and bullets
        # Lines with colons (typically "Policy Name:" or "Key: Value")
        # Lines starting with dashes (bullet points with rules)
        if ':' in line or line.strip().startswith('-'):
            # Check if we have room
            if current_length + len(line) + 1 <= target_length:
                result_lines.append(line)
                current_length += len(line) + 1
        # Else: skip explanatory prose that doesn't match the pattern

    # Join preserved lines
    return '\n'.join(result_lines)

In this function, we keep rules and key terms and drop explanations. Policy documents have structure - titles and bullet points contain the essential rules, while surrounding text is often explanatory.
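
To see the rule in action, here is a condensed restatement of the same selection logic applied to a short policy. The function name keep_rule_lines and the explanatory sentence in the sample text are hypothetical, added only so there is some prose to drop.

```python
def keep_rule_lines(content: str, target_length: int) -> str:
    """Keep title and bullet lines that fit; skip everything else."""
    kept, used = [], 0
    for line in content.split("\n"):
        is_rule = ":" in line or line.strip().startswith("-")
        # A line that does not fit is skipped, but later shorter lines may still fit
        if is_rule and used + len(line) + 1 <= target_length:
            kept.append(line)
            used += len(line) + 1
    return "\n".join(kept)

policy = """TechStore Return Policy:
Our goal is to make returns painless for every customer.
- 30-day return window from purchase date
- Products must be in original packaging
- Refunds processed in 5-7 business days"""

roomy = keep_rule_lines(policy, target_length=200)
tight = keep_rule_lines(policy, target_length=80)
print(roomy)
print("---")
print(tight)
```

With a roomy target only the explanatory sentence is removed; with a tight target the title and the first rule survive. Unlike the conversation compressor, this loop does not stop at the first line that fails to fit.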

In [9]:
# Enhanced layer class with smart compression
# Extends ContextLayer with content-aware compression that applies different strategies based on the type of content (conversation, documentation, policy, etc.)
@dataclass
class SmartContextLayer(ContextLayer):
    """Context layer with intelligent compression.
    
    Attributes:
        layer_type: Type of content for smart compression ("conversation", "documentation", "policy" or "general")
    """
    
    # Additional attribute beyond base ContextLayer
    layer_type: str = "general"  # conversation, documentation, policy, general. Default to general compression
    
    def smart_compress(self, target_length: int) -> str:
        """Apply content-aware compression.
        
        Args:
            target_length: Target character length
            
        Returns:
            Intelligently compressed content
        """
        # Don't compress if not marked as compressible or already short enough
        if not self.compressible or len(self.content) <= target_length:
            return self.content
        
        # Apply type-specific compression
        if self.layer_type == "conversation":
            # Use conversation-aware compression
            return smart_compress_conversation(self.content, target_length)
        elif self.layer_type == "documentation":
            # Use documentation-aware compression
            return smart_compress_documentation(self.content, target_length)
        elif self.layer_type == "policy":
            # Use policy-aware compression
            return smart_compress_policy(self.content, target_length)
        else:
            # Fallback to simple compression from base class
            return self.compress(target_length)

# Test smart compression on sample layer
conversation_layer = SmartContextLayer(
    priority=LayerPriority.IMPORTANT,
    name="Recent Conversation",
    layer_type="conversation",
    content="""Last 5 exchanges:

Customer: Hi, I need help.
Agent: Hello! How can I assist you?

Customer: My laptop overheats.
Agent: I understand. When did this start?

Customer: About a week ago.
Agent: Has anything changed recently?

Customer: I installed new games.
Agent: That could be related. Let's check some things.

Customer: The fans are very loud too.
Agent: That suggests the cooling system is working hard.""",
    compressible=True
)

print("Smart Compression Demonstration")
print("=" * 70)

print(f"\nOriginal ({len(conversation_layer.content)} chars):")
print(conversation_layer.content)

# Test different compression levels
for ratio in [0.7, 0.5, 0.3]:
    # Calculate target character length
    target = int(len(conversation_layer.content) * ratio)
    # Apply smart compression
    compressed = conversation_layer.smart_compress(target)
    
    print(f"\nCompressed to {ratio*100:.0f}% ({len(compressed)} chars):")
    print(compressed)

Smart Compression Demonstration

Original (407 chars):
Last 5 exchanges:

Customer: Hi, I need help.
Agent: Hello! How can I assist you?

Customer: My laptop overheats.
Agent: I understand. When did this start?

Customer: About a week ago.
Agent: Has anything changed recently?

Customer: I installed new games.
Agent: That could be related. Let's check some things.

Customer: The fans are very loud too.
Agent: That suggests the cooling system is working hard.

Compressed to 70% (269 chars):
Last 5 exchanges:

Customer: About a week ago.
Agent: Has anything changed recently?

Customer: I installed new games.
Agent: That could be related. Let's check some things.

Customer: The fans are very loud too.
Agent: That suggests the cooling system is working hard.

Compressed to 50% (202 chars):
Last 5 exchanges:

Customer: I installed new games.
Agent: That could be related. Let's check some things.

Customer: The fans are very loud too.
Agent: That suggests the cooling system is working hard.

Compressed to 30% (113 chars):
Last 5 exchanges:

Customer: The fans are very loud too.
Agent: That suggests the cooling system is working hard.


Smart compression preserves essential information:
1. Implements content-aware compression strategies tailored to different information types (conversation, documentation, policy).
2. Prioritizes recent exchanges in conversation history, ensuring the most relevant context is retained even under compression.
3. Extracts key specifications and facts from documentation while dropping verbose explanations and examples.
4. Retains policy rules and structure while removing explanatory prose that adds length without critical information.

These intelligent strategies ensure compression maintains usefulness rather than just reducing size arbitrarily.
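
Part 3 opened by naming extractive summarization as another option. As a rough sketch of that idea, assuming relevance can be approximated by counting caller-supplied keywords, the function below scores sentences and keeps the top few in their original order. The function name and the keyword list are illustrative; a production system would likely use embeddings or a language model instead.

```python
import re
from typing import List

def extractive_summary(text: str, keywords: List[str], max_sentences: int = 2) -> str:
    """Pick the sentences that mention the most keywords, kept in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for index, sentence in enumerate(sentences):
        lowered = sentence.lower()
        score = sum(1 for kw in keywords if kw.lower() in lowered)
        scored.append((score, index))
    # Highest score first; earlier sentences win ties
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    # Restore document order for readability
    return " ".join(sentences[i] for _, i in sorted(top, key=lambda pair: pair[1]))

text = (
    "The laptop ships with a dual fan cooling system. "
    "It comes in three colors. "
    "Keep the ventilation openings clear to avoid overheating. "
    "The box includes a charger."
)
summary = extractive_summary(text, ["fan", "cooling", "overheating", "ventilation"])
print(summary)
```

The two sentences about cooling are selected and the unrelated ones are dropped, which is the behavior truncation cannot provide.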

## Part 4: Dynamic layer prioritization

While layers have default priorities, context-aware systems should adjust priorities dynamically based on the current task. When handling a refund request, policy layers become more important than product documentation. When troubleshooting technical issues, documentation rises in priority while general conversation history becomes less critical. Dynamic prioritization ensures that layer selection adapts to immediate needs.

We implement this through priority boosting that temporarily elevates specific layers based on query analysis or task classification. The boosting is contextual and reversible, ensuring that priorities remain appropriate for each unique interaction while maintaining the underlying hierarchy.
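
The query analysis step can be as simple as keyword matching. The sketch below is one hedged possibility: the enumeration mirrors the task types used in this part, while TASK_KEYWORDS and classify_task are hypothetical and the keyword lists are placeholders. A real system might use an intent classifier or a language model call instead.

```python
from enum import Enum

class TaskType(Enum):
    """Mirror of the task types used in this part."""
    TECHNICAL_SUPPORT = "technical_support"
    REFUND_REQUEST = "refund_request"
    PRODUCT_INQUIRY = "product_inquiry"
    GENERAL_CHAT = "general_chat"

# Hypothetical keyword lists; tune these for the actual domain
TASK_KEYWORDS = {
    TaskType.REFUND_REQUEST: ["refund", "return", "money back"],
    TaskType.TECHNICAL_SUPPORT: ["overheat", "crash", "error", "not working"],
    TaskType.PRODUCT_INQUIRY: ["specs", "compare", "price", "feature"],
}

def classify_task(query: str) -> TaskType:
    """Return the task type with the most keyword hits, or GENERAL_CHAT if none."""
    lowered = query.lower()
    best, best_hits = TaskType.GENERAL_CHAT, 0
    for task, words in TASK_KEYWORDS.items():
        hits = sum(1 for word in words if word in lowered)
        if hits > best_hits:
            best, best_hits = task, hits
    return best

print(classify_task("My laptop keeps overheating and shows an error"))
print(classify_task("I want to return it for a refund"))
print(classify_task("Hello there"))
```

The resulting task type can then be passed to the priority adjustment step, so that layer boosting follows the customer's actual request.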

In [11]:
class TaskType(Enum):
    """Different task types that influence layer priorities."""
    TECHNICAL_SUPPORT = "technical_support"  # Troubleshooting hardware/software
    REFUND_REQUEST = "refund_request"  # Processing returns/refunds
    PRODUCT_INQUIRY = "product_inquiry"  # Questions about product features
    GENERAL_CHAT = "general_chat"  # Generic conversation

import copy

# This function creates a NEW list of layers with adjusted priorities, leaving the original layers unchanged
def adjust_layer_priorities(layers: List[ContextLayer],
                           task_type: TaskType) -> List[ContextLayer]:
    """Dynamically adjust layer priorities based on task type.
    
    Args:
        layers: Original context layers
        task_type: Current task being performed
        
    Returns:
        Layers with adjusted priorities (new list, original unchanged)
    """
    adjusted = []

    # Process each layer
    for layer in layers:
        # Shallow-copy the layer so the original can be reused for other tasks.
        # copy.copy keeps the layer's class (e.g. SmartContextLayer) and any extra
        # fields, which rebuilding a plain ContextLayer would lose.
        new_layer = copy.copy(layer)
        
        # Adjust priorities based on task type
        # TECHNICAL SUPPORT: Boost documentation, keep policy supplementary
        if task_type == TaskType.TECHNICAL_SUPPORT:
            # Product documentation becomes very important for troubleshooting
            if "Documentation" in layer.name:
                # Elevate from SUPPLEMENTARY to IMPORTANT
                new_layer.priority = LayerPriority.IMPORTANT
            # Policy information is less relevant for technical issues
            elif "Policy" in layer.name:
                # Keep at (or return to) SUPPLEMENTARY
                new_layer.priority = LayerPriority.SUPPLEMENTARY

        # REFUND REQUEST: Boost policy information
        elif task_type == TaskType.REFUND_REQUEST:
            # Policy information (return windows, refund rules) is important
            if "Policy" in layer.name:
                # Elevate from SUPPLEMENTARY to IMPORTANT
                new_layer.priority = LayerPriority.IMPORTANT
            # Technical specs are less relevant for refund discussions
            elif "Documentation" in layer.name:
                # Keep as SUPPLEMENTARY - not critical for refunds
                new_layer.priority = LayerPriority.SUPPLEMENTARY

        # PRODUCT INQUIRY: Boost product documentation
        elif task_type == TaskType.PRODUCT_INQUIRY:
            # Product specs and features are essential for answering questions
            if "Documentation" in layer.name:
                # Elevate from SUPPLEMENTARY to IMPORTANT
                new_layer.priority = LayerPriority.IMPORTANT
            # Older conversation is less relevant for product questions
            elif "Earlier Conversation" in layer.name:
                # Keep as SUPPLEMENTARY - current task is more important
                new_layer.priority = LayerPriority.SUPPLEMENTARY

        # For GENERAL_CHAT or other task types, keep original priorities

        # Add the adjusted layer to our result list
        adjusted.append(new_layer)
    
    return adjusted

# Test dynamic prioritization with different scenarios
print("Dynamic Layer Prioritization")
print("=" * 70)

# Define test scenarios with different task types
test_tasks = [
    (TaskType.TECHNICAL_SUPPORT, "Customer has overheating issue"),
    (TaskType.REFUND_REQUEST, "Customer wants to return product"),
    (TaskType.PRODUCT_INQUIRY, "Customer asking about specifications"),
]

# Use a moderate token budget for all tests to see prioritization effects
token_budget = 200

# Test each scenario
for task_type, description in test_tasks:
    print(f"\nTask: {task_type.value}")
    print(f"Context: {description}")
    print("-" * 70)
    
    # Adjust priorities for this task
    adjusted_layers = adjust_layer_priorities(layers, task_type)
    
    # Assemble context with adjusted priorities
    result = assemble_layered_context(adjusted_layers, token_budget=token_budget)
    
    print(f"Budget: {result['tokens_budget']} tokens")
    print(f"Used: {result['tokens_used']} tokens ({result['budget_utilization']:.1f}%)")
    print(f"Included: {result['layers_included']}/{result['layers_total']} layers")
    
    # Show priority adjustments
    print(f"\nPriority adjustments for this task:")
    for orig, adj in zip(layers, adjusted_layers):
        if orig.priority != adj.priority:
            print(f"  • {adj.name}: {orig.priority.name} → {adj.priority.name}")
    
    if result['dropped_layers']:
        print(f"\nDropped layers: {', '.join(result['dropped_layers'])}")
    
    print()

Dynamic Layer Prioritization
======================================================================

Task: technical_support
Context: Customer has overheating issue
----------------------------------------------------------------------
Budget: 200 tokens
Used: 185 tokens (92.5%)
Included: 5/7 layers

Priority adjustments for this task:
  • Product Documentation: SUPPLEMENTARY → IMPORTANT

Dropped layers: Return Policy, Earlier Conversation


Task: refund_request
Context: Customer wants to return product
----------------------------------------------------------------------
Budget: 200 tokens
Used: 193 tokens (96.5%)
Included: 6/7 layers

Priority adjustments for this task:
  • Return Policy: SUPPLEMENTARY → IMPORTANT

Dropped layers: Earlier Conversation


Task: product_inquiry
Context: Customer asking about specifications
----------------------------------------------------------------------
Budget: 200 tokens
Used: 185 tokens (92.5%)
Included: 5/7 layers

Priority adjustments for this task:
  • Product Documentation: SUPPLEMENTARY → IMPORTANT

Dropped l

Dynamic prioritization adapts context to task requirements:
1. Implements task-aware priority adjustment that boosts layers relevant to the current operation.
2. Demonstrates how technical support tasks elevate documentation priority while refund requests elevate policy information.
3. Shows that under the same token budget, different tasks select different layer combinations based on adjusted priorities.
4. Maintains the critical layer foundation while flexibly reallocating important and supplementary designations.

This ensures context composition matches the specific needs of each interaction.
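As the number of task types grows, the if/elif chain can be expressed as a rule table instead. The following is a minimal sketch under stated assumptions: `PRIORITY_RULES` and `resolve_priority` are hypothetical names, and the function works on plain strings (task values and priority names) so that it runs on its own. Inside the notebook, the result could be mapped back with `LayerPriority[...]`.

```python
from typing import Dict, List, Tuple

# Rule table mirroring the if/elif chain above:
# task value -> ordered (layer-name substring, priority name) pairs.
PRIORITY_RULES: Dict[str, List[Tuple[str, str]]] = {
    "technical_support": [("Documentation", "IMPORTANT"), ("Policy", "SUPPLEMENTARY")],
    "refund_request": [("Policy", "IMPORTANT"), ("Documentation", "SUPPLEMENTARY")],
    "product_inquiry": [("Documentation", "IMPORTANT"), ("Earlier Conversation", "SUPPLEMENTARY")],
}

def resolve_priority(layer_name: str, default_priority: str, task: str) -> str:
    """Return the priority name for a layer under a task; the first matching rule wins."""
    for substring, priority in PRIORITY_RULES.get(task, []):
        if substring in layer_name:
            return priority
    # No rule applies (or the task has no rules): keep the layer's default
    return default_priority

print(resolve_priority("Product Documentation", "SUPPLEMENTARY", "technical_support"))  # IMPORTANT
print(resolve_priority("System Instructions", "CRITICAL", "refund_request"))  # CRITICAL
```

Keeping the rules in data makes it possible to add a task type without touching the adjustment logic, at the cost of the same name-substring matching the original function relies on.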

## Part 5: Production layered context system

Integrating all the techniques we have explored, we can now build a production-ready layered context system. This system should provide a clean interface for context assembly, automatically adjust priorities based on task classification, apply intelligent compression when needed, respect token budgets while maximizing information value, and provide comprehensive metadata for monitoring and debugging.

The production system combines layer definition, priority management, smart compression, token budgeting, and assembly orchestration behind a single class, so an agent only has to register its layers and request a context for a given token budget and task type. The same layer pool then yields a relevant context whether the budget is tight or generous.

In [12]:
class LayeredContextManager:
    """Production layered context management system."""
    
    def __init__(self, compression_ratio: float = 0.5):
        """Initialize context manager.
        
        Args:
            compression_ratio: Default compression ratio for layers (0-1)
        """
        self.compression_ratio = compression_ratio
        self.layers = []  # Storage for all available layers

    # Layers are stored and can be used in future context assembly calls.
    # The same pool of layers can serve different tasks with different priority adjustments.
    def add_layer(self, layer: ContextLayer) -> None:
        """Add a layer to the context.
        
        Args:
            layer: Context layer to add
        """
        self.layers.append(layer)
    
    def build_context(self,
                     token_budget: int,
                     task_type: Optional[TaskType] = None,
                     use_smart_compression: bool = True) -> Dict:
        """Build context from layers with budget constraints.
        
        Args:
            token_budget: Maximum tokens to use
            task_type: Optional task type for priority adjustment
            use_smart_compression: Whether to use intelligent compression
            
        Returns:
            Dict containing context and detailed metadata
        """
        # Apply task-based priority adjustment if a task type is specified
        working_layers = self.layers
        if task_type:
            # Get adjusted priorities for this task
            working_layers = adjust_layer_priorities(self.layers, task_type)
        
        # Sort layers by priority (CRITICAL=0 first, SUPPLEMENTARY=2 last)
        sorted_layers = sorted(working_layers, key=lambda l: l.priority.value)
        
        # Initialize tracking for assembly process
        included_content = []  # Actual text content to include
        tokens_used = 0  # Running total of tokens consumed
        layer_stats = []  # Detailed statistics for each layer

        # Process each layer in priority order
        for layer in sorted_layers:
            # Estimate tokens for this layer
            layer_tokens = layer.estimate_tokens()
            # Calculate remaining budget
            remaining = token_budget - tokens_used
            
            # Initialize statistics tracking for this layer
            layer_stat = {
                'name': layer.name,
                'priority': layer.priority.name,
                'original_tokens': layer_tokens,
                'status': 'pending'  # Will be updated below
            }
            
            # Critical: always include
            if layer.priority == LayerPriority.CRITICAL:
                # Include entire content without checking budget
                included_content.append(layer.content)
                tokens_used += layer_tokens
                # Update status tracking
                layer_stat['status'] = 'included_full'
                layer_stat['tokens_used'] = layer_tokens
            
            # Important/Supplementary: fit or compress
            # Check if layer fits in full within remaining budget
            elif layer_tokens <= remaining:
                # Include the whole layer
                included_content.append(layer.content)
                tokens_used += layer_tokens
                layer_stat['status'] = 'included_full'
                layer_stat['tokens_used'] = layer_tokens

            # Layer doesn't fit - try compression if enabled
            elif layer.compressible and remaining > 0:
                # Calculate target character length for compression
                target_chars = int(len(layer.content) * self.compression_ratio)

                # Apply appropriate compression method
                if use_smart_compression and isinstance(layer, SmartContextLayer):
                    # Use intelligent content-aware compression
                    compressed = layer.smart_compress(target_chars)
                else:
                    # Use simple truncation compression
                    compressed = layer.compress(target_chars)

                # Approximate the compressed token count by scaling the original
                # estimate; the compressed text itself is not re-measured
                compressed_tokens = int(layer_tokens * self.compression_ratio)

                # Check if compressed version fits
                if compressed_tokens <= remaining:
                    # Include compressed version
                    included_content.append(compressed)
                    tokens_used += compressed_tokens
                    layer_stat['status'] = 'included_compressed'
                    layer_stat['tokens_used'] = compressed_tokens
                    layer_stat['compression_ratio'] = compressed_tokens / layer_tokens
                else:
                    # Even compressed, doesn't fit
                    layer_stat['status'] = 'dropped_no_space'
            else:
                # Can't compress or no space left
                layer_stat['status'] = 'dropped_no_space'

            # Add this layer's statistics to our tracking
            layer_stats.append(layer_stat)
        
        # Assemble final context
        context = "\n\n".join(included_content)

        # Return comprehensive results including context and metadata
        return {
            'context': context,
            'tokens_used': tokens_used,  # Estimated tokens consumed
            'token_budget': token_budget,
            'budget_utilization': (tokens_used / token_budget * 100) if token_budget else 0.0,  # Percentage; 0.0 for a zero budget
            'layers_total': len(self.layers),  # Total layers available
            'layers_included': sum(1 for s in layer_stats if 'included' in s['status']),  # Count of included layers (full or compressed)
            'layers_compressed': sum(1 for s in layer_stats if 'compressed' in s['status']),  # Count of compressed layers
            'layers_dropped': sum(1 for s in layer_stats if 'dropped' in s['status']),  # Count of dropped layers
            'layer_details': layer_stats,  # Detailed per-layer statistics
            'task_type': task_type.value if task_type else None  # Task type used
        }

# Create and test production system. Initialize manager with 50% compression ratio
manager = LayeredContextManager(compression_ratio=0.5)

# Add all sample layers to the manager
for layer in layers:
    manager.add_layer(layer)

print("Production Layered Context System")
print("=" * 70)

# Define test scenarios covering different budgets and task types
scenarios = [
    {
        'name': 'Tight budget, technical support',
        'token_budget': 150,
        'task_type': TaskType.TECHNICAL_SUPPORT
    },
    {
        'name': 'Moderate budget, refund request',
        'token_budget': 250,
        'task_type': TaskType.REFUND_REQUEST
    },
    {
        'name': 'Generous budget, general chat',
        'token_budget': 400,
        'task_type': TaskType.GENERAL_CHAT
    },
]

# Run each test scenario
for scenario in scenarios:
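    name = scenario['name']  # Scenario name for display
    # Pass everything except the name to build_context, without mutating the scenario dict
    params = {key: value for key, value in scenario.items() if key != 'name'}
    result = manager.build_context(**params)  # Build context for this scenario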
    
    print(f"\nScenario: {name}")
    print("-" * 70)
    print(f"Task: {result['task_type']}")
    print(f"Budget: {result['token_budget']} tokens")
    print(f"Used: {result['tokens_used']} tokens ({result['budget_utilization']:.1f}%)")

    # Show summary statistics
    print(f"\nLayers: {result['layers_included']} included, "
          f"{result['layers_compressed']} compressed, "
          f"{result['layers_dropped']} dropped")

    # Show detailed breakdown for each layer
    print(f"\nLayer breakdown:")
    for detail in result['layer_details']:
        status_icon = "✓" if "included" in detail['status'] else "✗"  # Use checkmark for included, X for dropped
        compression = f" ({detail.get('compression_ratio', 0)*100:.0f}%)" if 'compressed' in detail['status'] else ""
        print(f"  {status_icon} {detail['name']}: {detail['status']}{compression}")
    
    print()

Production Layered Context System
======================================================================

Scenario: Tight budget, technical support
----------------------------------------------------------------------
Task: technical_support
Budget: 150 tokens
Used: 144 tokens (96.0%)

Layers: 5 included, 1 compressed, 2 dropped

Layer breakdown:
  ✓ System Instructions: included_full
  ✓ Current Task: included_full
  ✓ Recent Conversation: included_full
  ✓ Key Preferences: included_full
  ✗ Product Documentation: dropped_no_space
  ✓ Return Policy: included_compressed (50%)
  ✗ Earlier Conversation: dropped_no_space


Scenario: Moderate budget, refund request
----------------------------------------------------------------------
Task: refund_request
Budget: 250 tokens
Used: 239 tokens (95.6%)

Layers: 7 included, 1 compressed, 0 dropped

Layer breakdown:
  ✓ System Instructions: included_full
  ✓ Current Task: included_full
  ✓ Recent Conversation: included_full
  ✓ Key Preferences: included_full
  ✓ Return Policy: included_full
  ✓ Product Documentati

The production system demonstrates complete layered context management:
1. Implements a unified manager class that orchestrates layer addition, priority adjustment, compression and assembly.
2. Supports multiple assembly strategies including task-aware priority boosting and intelligent compression selection.
3. Provides detailed metadata about every layer's treatment (included full, compressed, or dropped) enabling observability and debugging.
4. Tests realistic scenarios showing how the same layer collection adapts to different token budgets and task types.
5. Demonstrates graceful degradation where critical information is always preserved, even when the critical layers alone exceed the budget, while supplementary content scales with available resources.

This architecture supports sophisticated production agents with complex context requirements.
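Because critical layers are included without a budget check, the returned metadata is the natural place to detect overruns and important layers that were dropped. The helper below is a hedged sketch of such a check: `summarize_build` is a hypothetical function, and the `sample` dictionary is hand-written in the same shape as the metadata returned by `build_context`, with illustrative values rather than real results.

```python
from typing import Dict, List

def summarize_build(result: Dict) -> List[str]:
    """Turn build_context-style metadata into human-readable warnings."""
    warnings = []
    # Critical layers bypass the budget check, so usage can exceed the budget
    if result["tokens_used"] > result["token_budget"]:
        over = result["tokens_used"] - result["token_budget"]
        warnings.append(f"over budget by {over} tokens")
    # Flag IMPORTANT layers that did not make it into the context
    for detail in result["layer_details"]:
        if detail["status"].startswith("dropped") and detail["priority"] == "IMPORTANT":
            warnings.append(f"important layer dropped: {detail['name']}")
    return warnings

# Hand-written sample in the shape of the metadata above (values are illustrative)
sample = {
    "tokens_used": 120,
    "token_budget": 100,
    "layer_details": [
        {"name": "System Instructions", "priority": "CRITICAL", "status": "included_full"},
        {"name": "Product Documentation", "priority": "IMPORTANT", "status": "dropped_no_space"},
        {"name": "Earlier Conversation", "priority": "SUPPLEMENTARY", "status": "dropped_no_space"},
    ],
}

for warning in summarize_build(sample):
    print(warning)
```

In a deployed agent, warnings like these would more likely go to a logger or metrics system than to standard output.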