# Salience Ordering

Language models do not weigh all parts of their input context equally. Studies of long-context behavior report both primacy effects, where information near the beginning of the prompt receives stronger weighting, and recency effects, where content near the end is more heavily considered. Information buried in the middle of long prompts often receives less attention, which degrades performance when critical instructions or facts sit in these attention valleys. The problem grows in production systems whose complex prompts run to hundreds or thousands of tokens.

Salience ordering addresses this challenge by positioning content according to its importance and the model's attention patterns. Rather than organizing context chronologically or haphazardly, the technique places the most critical instructions and information where models tend to focus: early in the prompt and near the end. Supporting details, background context, and less critical information occupy the middle regions, where reduced attention does less harm. This deliberate positioning makes it more likely that essential guidance actually shapes the model's output.

This notebook demonstrates how to implement salience-based context ordering, from basic approaches to production-ready systems. We will explore attention pattern analysis, simple critical-first reordering, multi-zone optimization that uses both primacy and recency effects, dynamic salience scoring for automatic prioritization, and a complete salience manager with configurable attention strategies. These techniques help build robust prompts in which critical information continues to influence model behavior as total context length grows.

## Primacy and recency effects
The primacy effect means information presented early in the context receives stronger weighting in the model's processing, while the recency effect means information presented late is more readily accessed during generation. Information in the middle of long contexts suffers from both diminished attention and potential interference from surrounding content. This creates a U-shaped attention curve where the beginning and end are privileged positions.
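
To make the U-shaped curve concrete, here is a minimal sketch of a toy positional weight. The function name `u_shaped_weight`, the quadratic shape, and the `floor` value of 0.3 are all assumptions chosen for illustration; they are not measurements of any model's attention.

```python
def u_shaped_weight(position: int, total: int, floor: float = 0.3) -> float:
    """Toy positional weight: 1.0 at both ends, `floor` at the exact middle.

    Illustrative only. The quadratic shape and the floor value are assumptions,
    not measured attention values.
    """
    if total <= 1:
        return 1.0
    # Map positions 0..total-1 onto the interval [-1, 1]
    x = 2 * position / (total - 1) - 1
    # x squared is 1 at both ends and 0 in the middle, giving a bowl shape
    return floor + (1 - floor) * x * x


# Weights for a prompt with seven context items (rounded for readability)
weights = [round(u_shaped_weight(i, 7), 2) for i in range(7)]
print(weights)
```

The first and last items get the highest weight and the fourth item the lowest, which is the pattern the rest of this notebook tries to exploit when deciding where to place content.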

In [1]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from pydantic import BaseModel, Field
from typing import List, Optional, Dict, Any, Tuple
from enum import Enum
import os

### Initialize the language model for testing salience ordering

In [2]:
# Using gpt-4o-mini for cost-effective experimentation
llm = ChatOpenAI(model="gpt-4o-mini", api_key=os.getenv("OPENAI_API_KEY", "").strip(), temperature=0)

## Simple salience-based reordering
The most straightforward application of salience ordering is to manually tag context items by importance and place critical items at the beginning where primacy effects are strongest. This simple approach requires explicit salience annotations but provides immediate improvements for prompts where certain instructions or facts are clearly more important than others. By ensuring critical content appears early, we maximize the probability that the model will attend to and follow these essential elements.

We will implement a basic reordering system that accepts context items with importance levels and automatically sorts them to place high-salience items first, medium-salience items in the middle, and low-salience supporting content last. This creates a simple but effective attention-aware prompt structure where the most important guidance consistently receives strong model attention.

In [3]:
class SalienceLevel(Enum):
    """Enumeration of importance levels for context items."""
    CRITICAL = 3  # Must be followed, placed in primacy position
    IMPORTANT = 2  # Should influence output, placed early-middle
    SUPPORTING = 1  # Background context, can be placed in middle

class ContextItem(BaseModel):
    """A single item of context with salience annotation."""
    content: str = Field(description="The actual content text")
    salience: SalienceLevel = Field(description="Importance level")
    category: Optional[str] = Field(default=None, description="Category tag")

def reorder_by_salience(items: List[ContextItem]) -> List[ContextItem]:
    """
    Reorder context items placing critical items first.
    
    Args:
        items: List of context items with salience annotations
        
    Returns:
        Reordered list with critical items at the start
    """
    # Sort by salience level in descending order (CRITICAL -> IMPORTANT -> SUPPORTING)
    # The lambda extracts the numeric value: CRITICAL=3, IMPORTANT=2, SUPPORTING=1
    # reverse=True ensures highest values (most critical) appear first
    sorted_items = sorted(items, key=lambda x: x.salience.value, reverse=True)
    
    return sorted_items

def build_prompt_from_items(items: List[ContextItem], query: str) -> str:
    """
    Build a complete prompt from ordered context items.
    
    Args:
        items: Ordered context items
        query: The actual query or task
        
    Returns:
        Complete prompt string
    """
    # Extract just the content text from each ContextItem object - This strips away the salience metadata for the final prompt
    context_parts = [item.content for item in items]
    
    # Join all context items with double newlines for clear separation - This helps the model parse individual pieces of guidance
    context_section = "\n\n".join(context_parts)
    
    # Build final prompt with context followed by query - The query appears last so it's in the recency position
    prompt = f"{context_section}\n\n{query}"
    
    return prompt

# Example: Create context items with different salience levels
# In production, these would come from various sources (RAG, tools, memory, etc.)
context_items = [
    ContextItem(
        content="Background: The company was founded in 1995 and has grown steadily.",
        salience=SalienceLevel.SUPPORTING,  # Background info - can go in middle
        category="background"
    ),
    ContextItem(
        content="CRITICAL CONSTRAINT: All financial figures must be in USD with 2 decimal places.",
        salience=SalienceLevel.CRITICAL,  # Hard constraint - must be followed
        category="constraint"
    ),
    ContextItem(
        content="Important: The analysis should focus on the last fiscal year (2023).",
        salience=SalienceLevel.IMPORTANT,  # Scoping guidance - should influence output
        category="scope"
    ),
    ContextItem(
        content="Context: The industry has seen significant technological disruption.",
        salience=SalienceLevel.SUPPORTING,  # Industry context - helpful but not critical
        category="background"
    ),
    ContextItem(
        content="CRITICAL REQUIREMENT: Do not include speculative predictions.",
        salience=SalienceLevel.CRITICAL,  # Hard constraint - must be followed
        category="constraint"
    ),
    ContextItem(
        content="Important: Compare year-over-year performance where possible.",
        salience=SalienceLevel.IMPORTANT,  # Methodology guidance - should influence output
        category="guidance"
    )
]

# Reorder by salience to prioritize critical items
ordered_items = reorder_by_salience(context_items)

# Display the reordering to visualize the transformation
print("Original Order:")
print("=" * 80)
for i, item in enumerate(context_items, 1):
    print(f"{i}. [{item.salience.name}] {item.content[:60]}...")

print("\nSalience-Ordered (Critical First):")
print("=" * 80)
for i, item in enumerate(ordered_items, 1):
    print(f"{i}. [{item.salience.name}] {item.content[:60]}...")

# Build and show the final prompt
query = "Task: Analyze the company's financial performance and provide key insights."
final_prompt = build_prompt_from_items(ordered_items, query)

print("\nFinal Prompt Structure:")
print("=" * 80)
print(final_prompt[:500] + "\n...")

Original Order:
1. [SUPPORTING] Background: The company was founded in 1995 and has grown st...
2. [CRITICAL] CRITICAL CONSTRAINT: All financial figures must be in USD wi...
3. [IMPORTANT] Important: The analysis should focus on the last fiscal year...
4. [SUPPORTING] Context: The industry has seen significant technological dis...
5. [CRITICAL] CRITICAL REQUIREMENT: Do not include speculative predictions...
6. [IMPORTANT] Important: Compare year-over-year performance where possible...

Salience-Ordered (Critical First):
1. [CRITICAL] CRITICAL CONSTRAINT: All financial figures must be in USD wi...
2. [CRITICAL] CRITICAL REQUIREMENT: Do not include speculative predictions...
3. [IMPORTANT] Important: The analysis should focus on the last fiscal year...
4. [IMPORTANT] Important: Compare year-over-year performance where possible...
5. [SUPPORTING] Background: The company was founded in 1995 and has grown st...
6. [SUPPORTING] Context: The industry has seen significant technological dis...


The salience-based reordering uses an enumeration to define three importance levels with numeric values that enable simple sorting.
- The `ContextItem` model combines content with salience metadata, allowing each piece of context to be annotated with its importance.
- The `reorder_by_salience` function uses Python's built-in `sorted` with a key function that extracts the salience value, placing items with higher values first. Because `sorted` is stable, items that share a salience level keep their original relative order. The result is a prompt where critical constraints and requirements appear in the primacy position, important scope and guidance items appear early-middle, and supporting background context appears later.
- In the example, two critical constraints move from positions 2 and 5 to positions 1 and 2, ensuring they receive maximum model attention. This simple reordering can improve constraint adherence compared to random or chronological ordering, particularly in prompts with multiple competing instructions.
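
The ordering within a salience level is worth checking, since it decides which critical item lands in the very first position. The sketch below uses small stand-in types (`Level` and `Item` are hypothetical names that mirror `SalienceLevel` and `ContextItem` without the pydantic dependency) to show that `sorted` with `reverse=True` keeps equal-salience items in their original relative order.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Hypothetical stand-in for SalienceLevel."""
    CRITICAL = 3
    IMPORTANT = 2
    SUPPORTING = 1


@dataclass
class Item:
    """Hypothetical stand-in for ContextItem."""
    content: str
    salience: Level


items = [
    Item("background A", Level.SUPPORTING),
    Item("constraint A", Level.CRITICAL),
    Item("scope A", Level.IMPORTANT),
    Item("background B", Level.SUPPORTING),
    Item("constraint B", Level.CRITICAL),
]

# Same sort call as reorder_by_salience: highest salience value first
ordered = sorted(items, key=lambda x: x.salience.value, reverse=True)
order = [item.content for item in ordered]
print(order)
# ['constraint A', 'constraint B', 'scope A', 'background A', 'background B']
```

Because the sort is stable, the order in which items are added acts as a tie-breaker. If one critical constraint matters more than another, add it first.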

## Attention zone optimization
Simple primacy-based ordering uses only the beginning of the prompt and leaves the recency effect unused. A more sophisticated approach divides the prompt into three attention zones: the opening primacy zone, the middle supporting zone, and the closing recency zone. By placing critical information in both the opening and closing positions, we take advantage of the full U-shaped attention curve. With this dual-positioning strategy, essential instructions bracket the prompt, so some of them sit in each of the two high-attention positions.

We will implement a three-zone placement system that allocates the most critical items to both start and end positions, places important items in the early-middle region, and reserves the true middle for supporting content. This creates a sandwich structure where critical guidance surrounds less important context, maximizing the probability that essential instructions influence the model's behavior.

In [4]:
class AttentionZone(Enum):
    """Zones in the prompt with different attention characteristics."""
    PRIMACY = "primacy"  # Start of prompt, strong attention
    RECENCY = "recency"  # End of prompt, strong attention
    MIDDLE = "middle"    # Middle section, weaker attention

def optimize_attention_zones(
    items: List[ContextItem],
    primacy_ratio: float = 0.5,
    recency_ratio: float = 0.5
) -> List[ContextItem]:
    """
    Distribute items across attention zones for optimal placement.
    Places critical items in primacy and recency zones.
    
    Args:
        items: Context items to distribute
        primacy_ratio: Proportion of critical items for start (0.0 to 1.0)
        recency_ratio: Currently unused; critical items not placed at the start
            all go to the end. Kept so existing calls remain valid.
        
    Returns:
        Optimally ordered items leveraging both primacy and recency effects
    """
    # Separate items by salience level - This allows us to treat each importance tier differently
    critical_items = [item for item in items if item.salience == SalienceLevel.CRITICAL]
    important_items = [item for item in items if item.salience == SalienceLevel.IMPORTANT]
    supporting_items = [item for item in items if item.salience == SalienceLevel.SUPPORTING]
    
    # Split critical items between primacy and recency zones
    # Using the primacy_ratio to determine how many go to the start
    critical_count = len(critical_items)
    if critical_count == 0:
        # No critical items, so we will work with what we have
        primacy_count = 0
    elif critical_count == 1:
        # Single critical item goes to primacy (strongest position)
        primacy_count = 1
    else:
        # Multiple critical items - split according to ratio
        # At least 1 item goes to primacy, remainder goes to recency
        primacy_count = max(1, int(critical_count * primacy_ratio))
    
    # Divide critical items into two groups
    primacy_critical = critical_items[:primacy_count]
    recency_critical = critical_items[primacy_count:]
    
    # Split important items between early-middle and late-middle positions
    # Important items get moderate attention, so they surround the middle zone
    important_count = len(important_items)
    important_split_point = important_count // 2  # Split roughly in half
    
    important_for_primacy = important_items[:important_split_point]
    important_for_recency = important_items[important_split_point:]
    
    # Build the three-zone structure:
    # Zone 1 (PRIMACY): Critical items first, then some important items
    #   - Critical items get the absolute strongest attention (positions 1-N)
    #   - Important items fill out the primacy zone (positions N+1 to ~30% of prompt)
    primacy_zone = primacy_critical + important_for_primacy
    
    # Zone 2 (MIDDLE): All supporting content
    #   - Supporting items experience reduced attention
    #   - This is acceptable because they provide context, not constraints
    middle_zone = supporting_items
    
    # Zone 3 (RECENCY): Remaining important items, then critical items at the very end
    #   - Important items get moderate-strong attention as model nears the task
    #   - Critical items at the very end benefit from recency effect
    recency_zone = important_for_recency + recency_critical
    
    # Combine zones in order: primacy -> middle -> recency
    # This creates the "sandwich" structure where critical info brackets the prompt
    optimized_order = primacy_zone + middle_zone + recency_zone
    
    return optimized_order

def visualize_attention_zones(items: List[ContextItem]) -> None:
    """
    Visualize how items are distributed across attention zones.
    
    Args:
        items: Ordered context items
    """
    total = len(items)
    
    # Estimate zone boundaries (approximate heuristic, based on item count only)
    # Primacy zone: positions 1..primacy_end, where primacy_end = min(3, total // 3)
    # Recency zone: positions recency_start..total; as written, this zone holds
    #   one more item than the primacy zone (e.g. positions 4-6 when total is 6)
    # Middle zone: everything in between
    primacy_end = min(3, total // 3)
    recency_start = max(primacy_end + 1, total - min(3, total // 3))
    
    print("Attention Zone Distribution:")
    print("=" * 80)
    
    for i, item in enumerate(items, 1):
        # Determine which zone this item falls into
        if i <= primacy_end:
            zone = "PRIMACY (High Attention)"
            marker = ">>>"  # Right arrows indicate strong forward attention
        elif i >= recency_start:
            zone = "RECENCY (High Attention)"
            marker = "<<<"  # Left arrows indicate recent memory
        else:
            zone = "MIDDLE (Lower Attention)"
            marker = "---"  # Dashes indicate attention valley
        
        # Display item with zone information
        print(f"{marker} Position {i}/{total} [{zone}]")
        print(f"    [{item.salience.name}] {item.content[:65]}...")
        print()

# Example: Optimize the same context items using attention zones - This demonstrates the difference between simple ordering and zone optimization
zone_optimized = optimize_attention_zones(context_items, primacy_ratio=0.5, recency_ratio=0.5)

print("Simple Salience Ordering (Critical First Only):")
print("=" * 80)
for i, item in enumerate(ordered_items[:5], 1):
    print(f"{i}. [{item.salience.name}] {item.content[:60]}...")

print("\n" + "="*80)
print("\nAttention Zone Optimization (Critical at Start AND End):")
print("=" * 80)
visualize_attention_zones(zone_optimized)

Simple Salience Ordering (Critical First Only):
1. [CRITICAL] CRITICAL CONSTRAINT: All financial figures must be in USD wi...
2. [CRITICAL] CRITICAL REQUIREMENT: Do not include speculative predictions...
3. [IMPORTANT] Important: The analysis should focus on the last fiscal year...
4. [IMPORTANT] Important: Compare year-over-year performance where possible...
5. [SUPPORTING] Background: The company was founded in 1995 and has grown st...


Attention Zone Optimization (Critical at Start AND End):
Attention Zone Distribution:
>>> Position 1/6 [PRIMACY (High Attention)]
    [CRITICAL] CRITICAL CONSTRAINT: All financial figures must be in USD with 2 ...

>>> Position 2/6 [PRIMACY (High Attention)]
    [IMPORTANT] Important: The analysis should focus on the last fiscal year (202...

--- Position 3/6 [MIDDLE (Lower Attention)]
    [SUPPORTING] Background: The company was founded in 1995 and has grown steadil...

<<< Position 4/6 [RECENCY (High Attention)]
    [SUPPORTING] Context: The indust

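The attention zone optimization divides the prompt into three regions that follow the U-shaped attention curve described earlier.
1. Critical items are split between the primacy and recency zones: `primacy_ratio` sets how many go to the start (at least one), and the remainder go to the end. The `recency_ratio` argument is accepted but not used by this implementation. With two or more critical items, both high-attention positions hold essential constraints.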
2. Important items are distributed between early-middle and late-middle positions where they still receive moderate attention.
3. Supporting content occupies the true middle where reduced attention has minimal impact.

- The algorithm uses list slicing and concatenation to construct the final ordering, creating a structure where critical information brackets the prompt. 
- The visualization function estimates zone boundaries from the item count alone, so its labels are approximate. In the example it marks position 4, which holds a supporting item, as RECENCY.
- In the example, critical constraints appear at both position 1 (primacy) and position 6 (recency), so they are positioned to influence the model's processing at both ends of the prompt. This sandwich structure can improve constraint adherence beyond simple primacy-only ordering, particularly for complex prompts where multiple critical instructions must all be followed.
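
The split rule for critical items is easier to see in isolation. The sketch below copies the branch from `optimize_attention_zones` into a hypothetical helper, `split_critical`, and tabulates how many critical items land at the start and at the end for different counts.

```python
from typing import Dict, Tuple


def split_critical(critical_count: int, primacy_ratio: float = 0.5) -> Tuple[int, int]:
    """Return (primacy_count, recency_count) using the same rule as optimize_attention_zones."""
    if critical_count == 0:
        primacy_count = 0
    elif critical_count == 1:
        # A single critical item always goes to the start
        primacy_count = 1
    else:
        # int() truncates, so odd counts put the extra item at the end
        primacy_count = max(1, int(critical_count * primacy_ratio))
    return primacy_count, critical_count - primacy_count


splits: Dict[int, Tuple[int, int]] = {n: split_critical(n) for n in range(6)}
for n, (start, end) in splits.items():
    print(f"{n} critical -> {start} at start, {end} at end")
```

With the default ratio of 0.5, three critical items split as one at the start and two at the end, and five split as two and three, because `int()` rounds toward zero. Passing `primacy_ratio=1.0` sends every critical item to the start, which reduces the function to primacy-only ordering for that tier.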

## Dynamic salience scoring
Manual salience annotation becomes impractical for large-scale systems where context items are generated dynamically or come from multiple sources. A more scalable approach uses heuristic rules or lightweight models to automatically compute salience scores based on content characteristics. Items containing constraint keywords, formatting requirements, or explicit instructions receive higher scores, while background information and examples receive lower scores. This automatic scoring enables salience ordering without requiring manual annotation for every context item.

We will implement a dynamic salience scorer that analyzes content to identify high-salience indicators such as constraint language, requirement keywords, critical markers, and formatting specifications. The scorer assigns numeric salience values that can be used for automatic reordering, making salience optimization practical for production systems that handle diverse and dynamically generated context.

In [5]:
class SalienceScorer:
    """
    Automatically compute salience scores based on content analysis. Uses heuristic rules to identify high-importance content.
    """
    
    def __init__(self):
        """Initialize with keyword sets for different importance indicators."""
        # Keywords that indicate critical constraints or requirements - These typically represent hard rules that must be followed
        self.critical_keywords = [
            'must', 'required', 'critical', 'never', 'always', 'do not',
            'constraint', 'requirement', 'mandatory', 'essential', 'forbidden'
        ]
        
        # Keywords that indicate important guidance - These typically represent soft guidance that should influence behavior
        self.important_keywords = [
            'should', 'important', 'prefer', 'recommend', 'focus on',
            'consider', 'ensure', 'verify', 'primary', 'key'
        ]
        
        # Keywords that indicate supporting context - These typically represent background information or examples
        self.supporting_keywords = [
            'background', 'context', 'note', 'example', 'for instance',
            'historically', 'typically', 'generally'
        ]
    
    def compute_score(self, content: str) -> float:
        """
        Compute a numeric salience score for content.
        
        Args:
            content: Text to score
            
        Returns:
            Salience score (0.0 to 10.0, where 10.0 is maximum salience)
        """
        # Convert to lowercase for case-insensitive matching
        content_lower = content.lower()
        
        # Start with a neutral base score
        score = 5.0
        
        # Boost score for critical indicators - Each critical keyword adds 2.0 points (strong signal)
        for keyword in self.critical_keywords:
            if keyword in content_lower:
                score += 2.0  # Strong boost for critical markers
        
        # Moderate boost for important indicators - Each important keyword adds 1.0 point (moderate signal)
        for keyword in self.important_keywords:
            if keyword in content_lower:
                score += 1.0  # Moderate boost for important markers
        
        # Reduce score for supporting indicators - Each supporting keyword subtracts 1.0 point (weak signal)
        for keyword in self.supporting_keywords:
            if keyword in content_lower:
                score -= 1.0  # Reduction for background content
        
        # Additional heuristics for emphasis patterns:
        
        # Heuristic 1: ALL CAPS words often indicate emphasis or importance
        # Count fully capitalized tokens longer than 2 characters; this also counts acronyms such as "USD"
        caps_words = [word for word in content.split() if word.isupper() and len(word) > 2]
        if caps_words:
            # Each ALL CAPS word adds 0.5 points
            score += len(caps_words) * 0.5
        
        # Heuristic 2: Explicit markers like "CRITICAL:" or "REQUIRED:" are strong signals - These often appear at the start of critical instructions
        explicit_markers = ['CRITICAL:', 'REQUIRED:', 'MUST:', 'FORBIDDEN:', 'NEVER:']
        for marker in explicit_markers:
            if marker in content.upper():
                score += 3.0  # Very strong signal
        
        # Heuristic 3: Negation patterns ("do not", "never") often indicate constraints - These are critical because they define what NOT to do
        # Note: "do not" and "never" also appear in critical_keywords, so they are counted by both rules
        negation_patterns = ['do not', 'does not', 'never', 'cannot', 'must not']
        for pattern in negation_patterns:
            if pattern in content_lower:
                score += 1.5  # Strong signal for constraints
        
        # Clamp score to valid range [0, 10] - This ensures scores don't become negative or exceed maximum
        score = max(0.0, min(10.0, score))
        
        return score
    
    def categorize_by_score(self, score: float) -> SalienceLevel:
        """
        Convert numeric score to categorical salience level.
        
        Args:
            score: Numeric salience score (0.0 to 10.0)
            
        Returns:
            Categorical salience level (CRITICAL, IMPORTANT, or SUPPORTING)
        """
        # Map score ranges to categories using thresholds
        # High scores (7.5+) -> CRITICAL: Hard constraints, must be followed
        # Medium scores (5.0 up to, but not including, 7.5) -> IMPORTANT: Guidance that should influence behavior
        # Low scores (<5.0) -> SUPPORTING: Background context
        if score >= 7.5:
            return SalienceLevel.CRITICAL
        elif score >= 5.0:
            return SalienceLevel.IMPORTANT
        else:
            return SalienceLevel.SUPPORTING

def auto_score_and_reorder(content_strings: List[str]) -> List[Tuple[str, float, SalienceLevel]]:
    """
    Automatically score and reorder content by salience.
    
    Args:
        content_strings: Raw content items without manual annotations
        
    Returns:
        List of tuples (content, score, level) in descending salience order
    """
    # Initialize the scorer
    scorer = SalienceScorer()
    
    # Score each item
    scored_items = []
    for content in content_strings:
        # Compute numeric score based on content analysis
        score = scorer.compute_score(content)
        
        # Convert score to categorical level
        level = scorer.categorize_by_score(score)
        
        # Store as tuple for sorting
        scored_items.append((content, score, level))
    
    # Sort by score descending (highest salience first) - This places critical items before important items before supporting items
    scored_items.sort(key=lambda x: x[1], reverse=True)
    
    return scored_items

# Example: Automatically score and reorder mixed content - This simulates receiving unstructured content from various sources
mixed_content = [
    "The company was founded in 1995 and has grown steadily over the years.",
    "CRITICAL: All financial figures MUST be presented in USD with exactly 2 decimal places.",
    "For context, the industry has experienced significant technological disruption.",
    "Important: You should focus your analysis primarily on fiscal year 2023 data.",
    "REQUIRED: Do not include any speculative predictions or forward-looking statements.",
    "Note that historical trends may not be reliable predictors of future performance.",
    "You must verify all data points against the source financial statements.",
    "Background: The company operates in three main market segments."
]

# Auto-score and reorder
scored_and_ordered = auto_score_and_reorder(mixed_content)

print("Automatic Salience Scoring and Reordering:")
print("=" * 80)
print("\nOriginal Order:")
for i, content in enumerate(mixed_content, 1):
    print(f"{i}. {content[:70]}...")

print("\n" + "="*80)
print("\nAuto-Scored and Reordered (by salience):")
for i, (content, score, level) in enumerate(scored_and_ordered, 1):
    print(f"\n{i}. Score: {score:.1f} | Level: {level.name}")
    print(f"   {content}")

Automatic Salience Scoring and Reordering:

Original Order:
1. The company was founded in 1995 and has grown steadily over the years....
2. CRITICAL: All financial figures MUST be presented in USD with exactly ...
3. For context, the industry has experienced significant technological di...
4. Important: You should focus your analysis primarily on fiscal year 202...
5. REQUIRED: Do not include any speculative predictions or forward-lookin...
6. Note that historical trends may not be reliable predictors of future p...
7. You must verify all data points against the source financial statement...
8. Background: The company operates in three main market segments....


Auto-Scored and Reordered (by salience):

1. Score: 10.0 | Level: CRITICAL
   CRITICAL: All financial figures MUST be presented in USD with exactly 2 decimal places.

2. Score: 10.0 | Level: CRITICAL
   REQUIRED: Do not include any speculative predictions or forward-looking statements.

3. Score: 8.0 | Level: CRITICAL
   You mu

The `SalienceScorer` implements a multi-heuristic scoring system that analyzes content for importance indicators. It maintains lists of keywords for critical, important, and supporting content and applies a different score adjustment for each list.
- The base score starts at 5.0 and is adjusted upward by 2.0 for each critical keyword present (must, required, forbidden), upward by 1.0 for each important keyword (should, ensure, verify), and downward by 1.0 for each supporting keyword (background, context, example).
- Additional heuristics detect emphasis patterns like ALL CAPS words, explicit markers like "CRITICAL:", and negation patterns like "do not" which often indicate constraints.
- The signals are additive and the total is clamped to the range 0.0 to 10.0, so strongly marked items saturate at 10.0. Keywords are matched as substrings, and some phrases (such as "do not") are counted by more than one rule.
- The `categorize_by_score` method maps numeric scores to categorical levels using thresholds: 7.5 and above is CRITICAL, 5.0 and above is IMPORTANT, and anything lower is SUPPORTING.
- In the example, items containing "CRITICAL:", "REQUIRED:", or "must" receive scores of 8-10 and are placed first, while background context receives scores of 4-5 and is placed last. This automatic scoring removes the manual annotation overhead, which makes salience optimization practical for production systems processing thousands of context items. Keyword heuristics can misclassify content, so the keyword lists and thresholds should be tuned against real data.
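
`compute_score` tests each keyword with Python's `in` operator, which matches substrings, so short keywords such as `key`, `note`, and `must` also fire inside unrelated words. One possible refinement, sketched below, is to match on word boundaries with the standard `re` module. The helper names `contains_substring` and `contains_word` are hypothetical and are not part of the scorer above.

```python
import re


def contains_substring(keyword: str, text: str) -> bool:
    """Substring test, equivalent to the check used in compute_score."""
    return keyword in text.lower()


def contains_word(keyword: str, text: str) -> bool:
    """Whole-word (or whole-phrase) test using regex word boundaries."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


text = "The monkey footnote lists mustard suppliers."

for kw in ["key", "note", "must"]:
    print(kw, contains_substring(kw, text), contains_word(kw, text))

# Genuine matches are still detected, including multi-word phrases
print(contains_word("must", "You must verify all data points."))
print(contains_word("do not", "Please do not guess."))
```

The trade-off is that word-boundary matching no longer catches inflected forms: `require` would not match "required" unless each form is listed explicitly. Whether that is acceptable depends on the keyword lists in use.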

## Production salience manager - Part 1: Core structure
For production systems that handle dynamically generated content from multiple sources, such as RAG retrievals, tool outputs, and memory systems, we need a unified manager that handles the entire salience optimization workflow. Rather than manually scoring and ordering content each time, a production manager provides a clean interface where we simply add content items and the system automatically handles scoring, zone distribution, and prompt assembly.

We will build this system incrementally, starting with the core data structure and item management. The manager maintains a collection of context items and automatically scores them if no manual salience is provided. It also tracks metrics about salience distribution, which becomes valuable for debugging and optimization in production systems where understanding why certain content is prioritized helps tune the overall system behavior.

In [6]:
class ProductionSalienceManager:
    """
    Production-ready system for attention-optimized prompt construction.
    Handles automatic scoring, ordering, and zone distribution.
    
    Part 1: Core initialization and item management
    """
    
    def __init__(
        self,
        primacy_zone_size: int = 3,
        recency_zone_size: int = 3,
        auto_score: bool = True,
        zone_markers: bool = True
    ):
        """
        Initialize production salience manager.
        
        Args:
            primacy_zone_size: Number of items in opening high-attention zone
            recency_zone_size: Number of items in closing high-attention zone
            auto_score: Whether to automatically score items without manual salience
            zone_markers: Whether to include zone marker comments in output prompts
        """
        # Configuration for zone sizes - These define how many items can fit in high-attention positions
        self.primacy_zone_size = primacy_zone_size
        self.recency_zone_size = recency_zone_size
        
        # Configuration for behavior
        self.auto_score = auto_score  # Automatic scoring vs manual annotation
        self.zone_markers = zone_markers  # Include visual zone separators in prompts
        
        # Initialize automatic scorer for when auto_score is enabled - This scorer will analyze content to determine salience
        self.scorer = SalienceScorer()
        
        # Storage for context items - As items are added, they accumulate here
        self.items: List[ContextItem] = []
        
        # Metrics tracking for debugging and optimization - Helps understand the distribution of salience levels
        self.total_items_added = 0
        self.salience_distribution: Dict[str, int] = {
            'CRITICAL': 0,
            'IMPORTANT': 0,
            'SUPPORTING': 0
        }
    
    def add_item(
        self,
        content: str,
        salience: Optional[SalienceLevel] = None,
        category: Optional[str] = None
    ) -> None:
        """
        Add a context item with optional manual salience override.
        
        Args:
            content: The content text
            salience: Optional manual salience level (overrides auto-scoring)
            category: Optional category tag for organization
        """
        # Determine salience level for this item
        if salience is None and self.auto_score:
            # No manual salience provided and auto-scoring is enabled
            # Use the automatic scorer to analyze content
            score = self.scorer.compute_score(content)
            salience = self.scorer.categorize_by_score(score)
        elif salience is None:
            # No manual salience and auto-scoring is disabled
            # Default to SUPPORTING to avoid errors
            salience = SalienceLevel.SUPPORTING
        # If salience was provided manually, use it as-is
        
        # Create context item with determined salience
        item = ContextItem(content=content, salience=salience, category=category)
        
        # Add to storage
        self.items.append(item)
        
        # Update metrics for tracking
        self.total_items_added += 1
        self.salience_distribution[salience.name] += 1
    
    def get_stats(self) -> Dict[str, Any]:
        """
        Get statistics about current state of the manager.
        
        Returns:
            Dictionary with metrics about items and their distribution
        """
        return {
            'total_items': len(self.items),
            'critical': self.salience_distribution['CRITICAL'],
            'important': self.salience_distribution['IMPORTANT'],
            'supporting': self.salience_distribution['SUPPORTING'],
            'auto_score_enabled': self.auto_score,
            'primacy_zone_size': self.primacy_zone_size,
            'recency_zone_size': self.recency_zone_size
        }

# Example: Initialize a manager and add items
manager = ProductionSalienceManager(
    primacy_zone_size=3,  # Top 3 positions are high-attention primacy zone
    recency_zone_size=2,  # Bottom 2 positions are high-attention recency zone
    auto_score=True,      # Automatically score items based on content
    zone_markers=True     # Include visual separators in final prompts
)

# Add various content items (salience automatically determined)
test_content = [
    "CRITICAL: All monetary amounts MUST be in USD with exactly 2 decimal places.",
    "The company was founded in 1995 and has grown to 500 employees.",
    "Important: Focus your analysis on fiscal year 2023 performance."
]

for content in test_content:
    manager.add_item(content)

# Check statistics
stats = manager.get_stats()
print("Manager Statistics After Adding 3 Items:")
print("=" * 80)
for key, value in stats.items():
    print(f"  {key.replace('_', ' ').title()}: {value}")
print("\nItems are being automatically scored and tracked.")

Manager Statistics After Adding 3 Items:
  Total Items: 3
  Critical: 1
  Important: 2
  Supporting: 0
  Auto Score Enabled: True
  Primacy Zone Size: 3
  Recency Zone Size: 2

Items are being automatically scored and tracked.


The manager's initialization sets up configurable zone sizes and behavior flags that control how context is processed.
- The `add_item` method provides flexibility by accepting both manual and automatic salience annotation, making it suitable for hybrid systems where some content has known importance while other content needs analysis.
- When no manual salience is supplied and auto-scoring is enabled, the scorer analyzes content patterns to determine salience. With auto-scoring disabled, unannotated items default to SUPPORTING, so every stored item always has a salience level.
- The statistics tracking provides visibility into salience distribution, which helps identify potential issues such as having more critical items than can fit in the high-attention zones, or too little important content to fill the primacy zone.
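
The capacity problems mentioned in the last bullet can be checked mechanically. The following is a minimal sketch, assuming a dictionary shaped like the `get_stats()` output shown above; `check_zone_capacity` is a hypothetical helper, not part of `ProductionSalienceManager`.

```python
from typing import Any, Dict, List


def check_zone_capacity(stats: Dict[str, Any]) -> List[str]:
    """Return warnings for a stats dict shaped like get_stats() output.

    Hypothetical helper for illustration; not part of the manager class.
    """
    warnings: List[str] = []
    high_attention_slots = stats["primacy_zone_size"] + stats["recency_zone_size"]

    # More critical items than high-attention slots: some must overflow.
    if stats["critical"] > high_attention_slots:
        warnings.append(
            f"{stats['critical']} critical items exceed "
            f"{high_attention_slots} high-attention slots"
        )

    # Too few critical + important items to fill the primacy zone.
    if stats["critical"] + stats["important"] < stats["primacy_zone_size"]:
        warnings.append("not enough critical/important items to fill the primacy zone")

    return warnings


# Stats copied from the three-item example above.
example_stats = {
    "total_items": 3, "critical": 1, "important": 2, "supporting": 0,
    "auto_score_enabled": True, "primacy_zone_size": 3, "recency_zone_size": 2,
}
print(check_zone_capacity(example_stats))  # → []

# A deliberately overloaded variant to trigger the first warning.
overloaded_stats = dict(example_stats, critical=7)
print(check_zone_capacity(overloaded_stats))
```

Running a check like this before building a prompt gives an early signal that zone sizes or salience annotations may need tuning.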

## Production salience manager - Part 2: Zone distribution
With items collected and scored, the next step is distributing them across attention zones to leverage both primacy and recency effects. The zone distribution algorithm must handle various edge cases that occur in production, such as having more critical items than available zone slots, having fewer items than zone capacity, or having an imbalanced distribution across salience levels. The algorithm fills the primacy zone with critical items first and sends any overflow to the recency zone, then fills the remaining high-attention slots with important items, and finally places everything left over in the middle where reduced attention is acceptable.

In [7]:
# Add the zone distribution method to ProductionSalienceManager - In practice, this would be part of the class definition above

def _distribute_items_to_zones(self) -> Dict[str, List[ContextItem]]:
    """
    Distribute items across attention zones based on salience.
    This is a method of ProductionSalienceManager.
    
    Returns:
        Dictionary mapping zone names ('primacy', 'middle', 'recency') to item lists
    """
    # Separate items by salience level for zone-specific handling - Each salience tier has different placement priorities
    critical = [item for item in self.items if item.salience == SalienceLevel.CRITICAL]
    important = [item for item in self.items if item.salience == SalienceLevel.IMPORTANT]
    supporting = [item for item in self.items if item.salience == SalienceLevel.SUPPORTING]
    
    # Split critical items between primacy and recency zones
    # Critical items fill the primacy zone first; any overflow goes to the recency zone
    primacy_critical_count = min(len(critical), self.primacy_zone_size)
    primacy_critical = critical[:primacy_critical_count]
    recency_critical = critical[primacy_critical_count:]
    
    # Build primacy zone: critical items + important items to fill capacity
    # Fill primacy zone to capacity, prioritizing critical items first
    primacy_zone = primacy_critical[:]
    remaining_primacy_slots = self.primacy_zone_size - len(primacy_zone)
    if remaining_primacy_slots > 0 and important:
        # Add important items to fill remaining primacy slots
        primacy_zone.extend(important[:remaining_primacy_slots])
        # Remove used important items from the pool
        important = important[remaining_primacy_slots:]
    
    # Build recency zone: important items + remaining critical items at the very end
    # Strategy: Important items lead into the zone, then critical items for final reinforcement
    recency_zone = []
    # Calculate how many important items can fit before critical items
    recency_important_count = min(len(important), self.recency_zone_size - len(recency_critical))
    if recency_important_count > 0:
        # Take important items from the end of the list for recency zone
        recency_zone.extend(important[-recency_important_count:])
        # Remove used important items from the pool
        important = important[:-recency_important_count] if recency_important_count < len(important) else []
    # Add critical items at the very end for maximum recency effect
    recency_zone.extend(recency_critical)
    
    # Middle zone: all remaining important and supporting items
    # These items don't need high-attention positions
    middle_zone = important + supporting
    
    return {
        'primacy': primacy_zone,
        'middle': middle_zone,
        'recency': recency_zone
    }

# Attach the method to the class
ProductionSalienceManager._distribute_items_to_zones = _distribute_items_to_zones

# Test zone distribution with the manager we created earlier
print("Zone Distribution Test:")
print("=" * 80)

# Add more items to demonstrate zone distribution
additional_content = [
    "REQUIRED: Do not include any speculative predictions or forward-looking statements.",
    "The industry has seen 15% annual growth over the past decade.",
    "You must verify all data points against the source financial statements.",
    "For context, the economic environment included high inflation in 2023.",
    "Important: Compare performance against the top 3 industry competitors.",
]

for content in additional_content:
    manager.add_item(content)

# Get zone distribution
zones = manager._distribute_items_to_zones()

print(f"\nPRIMACY ZONE ({len(zones['primacy'])} items):")
for i, item in enumerate(zones['primacy'], 1):
    print(f"  {i}. [{item.salience.name}] {item.content[:60]}...")

print(f"\nMIDDLE ZONE ({len(zones['middle'])} items):")
for i, item in enumerate(zones['middle'], 1):
    print(f"  {i}. [{item.salience.name}] {item.content[:60]}...")

print(f"\nRECENCY ZONE ({len(zones['recency'])} items):")
for i, item in enumerate(zones['recency'], 1):
    print(f"  {i}. [{item.salience.name}] {item.content[:60]}...")

Zone Distribution Test:

PRIMACY ZONE (3 items):
  1. [CRITICAL] CRITICAL: All monetary amounts MUST be in USD with exactly 2...
  2. [CRITICAL] REQUIRED: Do not include any speculative predictions or forw...
  3. [CRITICAL] You must verify all data points against the source financial...

MIDDLE ZONE (3 items):
  1. [IMPORTANT] The company was founded in 1995 and has grown to 500 employe...
  2. [IMPORTANT] Important: Focus your analysis on fiscal year 2023 performan...
  3. [SUPPORTING] For context, the economic environment included high inflatio...

RECENCY ZONE (2 items):
  1. [IMPORTANT] The industry has seen 15% annual growth over the past decade...
  2. [IMPORTANT] Important: Compare performance against the top 3 industry co...


The zone distribution method separates items by salience level and strategically allocates them across attention zones. 
- Critical items are split between primacy and recency, with the split point determined by `primacy_zone_size`.
- The method fills primacy slots with critical items first, then important items to capacity. For the recency zone, it reserves space for remaining critical items and fills preceding slots with important items. All remaining items go to the middle zone where reduced attention has minimal impact on system behavior.
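- This approach also covers the edge cases. With fewer items than zone capacity, the zones are simply left partly filled. When there are more critical items than total zone capacity, the overflow is appended to the recency zone, which then grows beyond `recency_zone_size`, so no critical item is demoted to the middle zone.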
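
To make the slot arithmetic concrete, here is a small sketch that mirrors only the counting logic of `_distribute_items_to_zones`, working on item counts rather than `ContextItem` objects. `zone_counts` is a hypothetical helper introduced for illustration, and it assumes non-negative zone sizes.

```python
from typing import Dict


def zone_counts(n_critical: int, n_important: int, n_supporting: int,
                primacy_size: int, recency_size: int) -> Dict[str, int]:
    """Mirror the count arithmetic of _distribute_items_to_zones (counts only)."""
    # Critical items fill the primacy zone first; the overflow goes to recency.
    primacy_critical = min(n_critical, primacy_size)
    recency_critical = n_critical - primacy_critical

    # Important items top up whatever primacy capacity is left.
    primacy_important = min(n_important, primacy_size - primacy_critical)
    n_important -= primacy_important

    # Important items take recency slots not reserved for overflow critical items.
    # max(..., 0) covers the case where the overflow alone exceeds recency_size.
    recency_important = max(min(n_important, recency_size - recency_critical), 0)
    n_important -= recency_important

    return {
        "primacy": primacy_critical + primacy_important,
        "middle": n_important + n_supporting,
        "recency": recency_important + recency_critical,
    }


# The eight-item example above: 3 critical, 4 important, 1 supporting.
print(zone_counts(3, 4, 1, primacy_size=3, recency_size=2))
# → {'primacy': 3, 'middle': 3, 'recency': 2}

# Seven critical items: the recency zone grows past its configured size of 2.
print(zone_counts(7, 2, 1, primacy_size=3, recency_size=2))
# → {'primacy': 3, 'middle': 3, 'recency': 4}
```

The second call shows the overflow behavior: critical items are never pushed to the middle, at the cost of a recency zone larger than configured.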

## Production salience manager - Part 3: Prompt assembly
The final component of the production manager handles prompt construction, transforming the distributed zones into a formatted prompt ready for model consumption. This involves assembling items in zone order, optionally adding visual zone markers that help structure the context, and providing visualization tools for debugging. The prompt building process also needs to handle the actual query placement, ensuring it appears at the very end to benefit from recency effects when the model begins generation.

In [8]:
# Add prompt building and visualization methods to ProductionSalienceManager

def build_prompt(self, query: str, include_stats: bool = False) -> str:
    """
    Build optimized prompt with attention zone distribution.
    This is a method of ProductionSalienceManager.
    
    Args:
        query: The actual task or question for the model
        include_stats: Whether to append salience statistics (useful for debugging)
        
    Returns:
        Complete attention-optimized prompt string
    """
    # Get zone-distributed items
    zones = self._distribute_items_to_zones()
    
    # Build prompt sections, assembling zones in order
    prompt_parts = []
    
    # PRIMACY ZONE: High-attention opening section
    if self.zone_markers and zones['primacy']:
        # Add visual marker to delineate the primacy zone
        prompt_parts.append("# HIGH PRIORITY - PRIMACY ZONE")
    for item in zones['primacy']:
        # Add each primacy item's content
        prompt_parts.append(item.content)
    
    # MIDDLE ZONE: Supporting content with lower attention
    if zones['middle']:
        if self.zone_markers:
            # Add visual marker for supporting context section
            prompt_parts.append("\n# SUPPORTING CONTEXT")
        for item in zones['middle']:
            # Add each middle zone item's content
            prompt_parts.append(item.content)
    
    # RECENCY ZONE: High-attention closing section
    if zones['recency']:
        if self.zone_markers:
            # Add visual marker for recency zone
            prompt_parts.append("\n# HIGH PRIORITY - RECENCY ZONE")
        for item in zones['recency']:
            # Add each recency item's content
            prompt_parts.append(item.content)
    
    # TASK/QUERY: Placed at the very end for maximum recency
    if self.zone_markers:
        prompt_parts.append("\n# TASK")
    prompt_parts.append(query)
    
    # Combine all parts with double newlines for clear separation
    prompt = "\n\n".join(prompt_parts)
    
    # Optionally append statistics for debugging
    if include_stats:
        stats = self.get_stats()
        stats_text = (
            f"\n\n[Salience Stats: {stats['critical']} critical, "
            f"{stats['important']} important, {stats['supporting']} supporting]"
        )
        prompt += stats_text
    
    return prompt

def visualize_zones(self) -> None:
    """
    Print a visualization of how items are distributed across zones.
    This is a method of ProductionSalienceManager.
    Useful for debugging and understanding salience distribution.
    """
    # Get the zone distribution
    zones = self._distribute_items_to_zones()
    
    print("Attention Zone Distribution:")
    print("=" * 80)
    
    # PRIMACY ZONE visualization
    print("\n>>> PRIMACY ZONE (High Attention):")
    if not zones['primacy']:
        print("  (empty)")
    for i, item in enumerate(zones['primacy'], 1):
        # Show position, salience level, and truncated content
        print(f"  {i}. [{item.salience.name}] {item.content[:60]}...")
    
    # MIDDLE ZONE visualization
    print("\n--- MIDDLE ZONE (Lower Attention):")
    if not zones['middle']:
        print("  (empty)")
    for i, item in enumerate(zones['middle'], 1):
        print(f"  {i}. [{item.salience.name}] {item.content[:60]}...")
    
    # RECENCY ZONE visualization
    print("\n<<< RECENCY ZONE (High Attention):")
    if not zones['recency']:
        print("  (empty)")
    for i, item in enumerate(zones['recency'], 1):
        print(f"  {i}. [{item.salience.name}] {item.content[:60]}...")

# Attach methods to the class
ProductionSalienceManager.build_prompt = build_prompt
ProductionSalienceManager.visualize_zones = visualize_zones

# Test prompt building with our manager
print("Testing Prompt Building:")
print("=" * 80)

# Visualize the zone distribution
manager.visualize_zones()

# Build an optimized prompt
query = "Analyze the company's 2023 financial performance and provide key insights."
optimized_prompt = manager.build_prompt(query, include_stats=False)

print("\n" + "="*80)
print("\nGenerated Prompt (first 600 characters):")
print("="*80)
print(optimized_prompt[:600] + "\n...")

Testing Prompt Building:
Attention Zone Distribution:

>>> PRIMACY ZONE (High Attention):
  1. [CRITICAL] CRITICAL: All monetary amounts MUST be in USD with exactly 2...
  2. [CRITICAL] REQUIRED: Do not include any speculative predictions or forw...
  3. [CRITICAL] You must verify all data points against the source financial...

--- MIDDLE ZONE (Lower Attention):
  1. [IMPORTANT] The company was founded in 1995 and has grown to 500 employe...
  2. [IMPORTANT] Important: Focus your analysis on fiscal year 2023 performan...
  3. [SUPPORTING] For context, the economic environment included high inflatio...

<<< RECENCY ZONE (High Attention):
  1. [IMPORTANT] The industry has seen 15% annual growth over the past decade...
  2. [IMPORTANT] Important: Compare performance against the top 3 industry co...


Generated Prompt (first 600 characters):
# HIGH PRIORITY - PRIMACY ZONE

CRITICAL: All monetary amounts MUST be in USD with exactly 2 decimal places.

REQUIRED: Do not include any speculativ

The prompt building method orchestrates the final assembly by retrieving zone-distributed items and constructing a properly ordered prompt string.
- Zone markers provide visual structure that helps both developers debugging prompts and potentially the model itself in parsing distinct sections.
- The method skips the markers for empty zones and places the query after all context items to maximize recency benefits when generation begins. With `include_stats=True`, a one-line statistics summary is appended after the query, so that flag is best reserved for debugging.
- The `visualize_zones` method provides debugging capability by showing exactly which items landed in which zones, making it easy to verify that salience ordering is working as intended. This is particularly valuable in production when tuning zone sizes or investigating why certain constraints are not being followed, which often reveals that too many critical items are competing for limited high-attention slots.
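
When investigating a constraint that is not being followed, it can also help to check where its text actually sits in the assembled prompt. The sketch below is one possible way to do that on a plain prompt string. `relative_position` and `in_middle` are hypothetical helpers, the 0.25 and 0.75 thresholds are arbitrary assumptions, and character offsets are only a rough proxy for token positions.

```python
from typing import Optional


def relative_position(prompt: str, snippet: str) -> Optional[float]:
    """Return where `snippet` starts in `prompt` as a fraction in [0, 1), or None if absent."""
    index = prompt.find(snippet)
    if index == -1 or not prompt:
        return None
    return index / len(prompt)


def in_middle(prompt: str, snippet: str, low: float = 0.25, high: float = 0.75) -> bool:
    """True if `snippet` starts inside the assumed low-attention band of the prompt."""
    position = relative_position(prompt, snippet)
    return position is not None and low <= position <= high


# Toy prompt: a rule at the start, filler context, then the task at the end.
toy_prompt = "RULE: use USD.\n\n" + ("background sentence. " * 20) + "\n\nTASK: summarize."

print(in_middle(toy_prompt, "RULE: use USD."))        # starts at offset 0
print(in_middle(toy_prompt, "TASK: summarize."))      # starts near the end
print(relative_position(toy_prompt, "not present"))   # snippet missing
```

Applied to the output of `build_prompt`, a check like this would flag any critical item that ended up in the middle band despite its salience level.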

## Complete production system
The `ProductionSalienceManager` now provides a complete, production-ready solution for attention-optimized prompt construction. By combining automatic salience scoring, zone-based distribution, and flexible prompt assembly, it handles the entire workflow from raw content to optimized prompts. This system is lean and focused on the core problem: ensuring critical information reaches the model with maximum effectiveness by leveraging attention patterns.

In production deployments, this manager integrates seamlessly with RAG systems, tool-calling frameworks and memory architectures. Content from diverse sources (retrieved documents, tool outputs, conversation history, system instructions) can all be added through the simple `add_item` interface. The manager automatically determines importance, distributes items across attention zones, and constructs prompts where critical constraints consistently influence model behavior. This eliminates the manual prompt engineering overhead while significantly improving reliability for production AI systems where instruction adherence is critical.
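
As a rough illustration of that integration, the sketch below routes content from several sources through an `add_item`-style interface, pinning system rules manually and leaving everything else to auto-scoring. Everything here is an assumption made for the example: `RecordingManager` is a stand-in that only records calls, the `SalienceLevel` enum values are invented (only the member names come from the notebook), and `load_context` and the category strings are hypothetical.

```python
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class SalienceLevel(Enum):
    """Stand-in for the notebook's enum; the numeric values are assumptions."""
    CRITICAL = 3
    IMPORTANT = 2
    SUPPORTING = 1


class RecordingManager:
    """Hypothetical stand-in with the same add_item signature as the manager."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Optional[SalienceLevel], Optional[str]]] = []

    def add_item(self, content: str, salience: Optional[SalienceLevel] = None,
                 category: Optional[str] = None) -> None:
        # Record the call instead of scoring, so the routing is easy to inspect.
        self.calls.append((content, salience, category))


def load_context(manager: RecordingManager, system_rules: Iterable[str],
                 retrieved_docs: Iterable[str], tool_outputs: Iterable[str]) -> None:
    """Route content from several sources through add_item."""
    # System rules have known importance, so pin them manually.
    for rule in system_rules:
        manager.add_item(rule, salience=SalienceLevel.CRITICAL, category="system")
    # Retrieved documents and tool outputs are left unannotated for auto-scoring.
    for doc in retrieved_docs:
        manager.add_item(doc, category="retrieval")
    for output in tool_outputs:
        manager.add_item(output, category="tool")


recorder = RecordingManager()
load_context(
    recorder,
    system_rules=["All monetary amounts MUST be in USD."],
    retrieved_docs=["The company was founded in 1995."],
    tool_outputs=["revenue_2023=1200000.00"],
)
for content, salience, category in recorder.calls:
    print(category, salience.name if salience else "auto", content)
```

Swapping `RecordingManager` for the real manager should leave the routing function unchanged, since it relies only on the `add_item` signature shown earlier.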