[FEATURE] PruningConversationManager for Selective Message Compression #556

@roeetal

Description

Problem Statement

The current strands-agents SDK provides a SummarizingConversationManager that compresses conversation history into a single summary message. While effective for general context reduction, this approach has significant limitations:

  1. Loss of Message Structure: Summarization collapses multiple messages into one, losing the conversational flow and individual message context that may be important for the model's understanding.

  2. Inefficient for Large Tool Results: When agents process large API responses, database queries, or file contents as intermediate steps, the entire large response gets summarized even though only the final conclusion may be relevant.

  3. Poor Granular Control: Users cannot selectively preserve important messages while aggressively compressing less relevant ones.

  4. Tool Result Bloat: Large tool results that are no longer needed (e.g., raw data that has been processed) continue to consume context space unnecessarily.

  5. Conversation Flow Disruption: Summarization can break the natural question-answer flow that models rely on for context understanding.

Consider this scenario: An agent processes a 50KB JSON API response, extracts key insights, and provides a summary to the user. The raw JSON is no longer needed, but the current system would either keep the entire response or summarize the entire conversation, potentially losing the structured interaction pattern.

Proposed Solution

Implement a PruningConversationManager that selectively compresses or removes individual messages while preserving the overall conversation structure and flow. Unlike summarization, pruning returns a list of messages where some have been compressed, removed, or truncated while others remain intact.
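To illustrate the intended behavior, here is a hypothetical before/after conversation (message shapes follow the Bedrock-style content-block format the SDK uses; the field values and the placeholder text are invented for this sketch):

```python
# Hypothetical conversation before pruning: four messages, one carrying
# a large raw tool result that is no longer needed.
before = [
    {"role": "user", "content": [{"text": "Fetch the data"}]},
    {"role": "assistant", "content": [{"toolUse": {"toolUseId": "t1", "name": "api_client", "input": {}}}]},
    {"role": "user", "content": [{"toolResult": {"toolUseId": "t1", "content": [{"text": "<50KB of raw JSON>"}]}}]},
    {"role": "assistant", "content": [{"text": "Key insight: weekly active users doubled."}]},
]

# After pruning: same number of messages, same roles, same order; only
# the oversized tool result is replaced with a compact placeholder.
after = [dict(m) for m in before]
after[2] = {
    "role": "user",
    "content": [{"toolResult": {"toolUseId": "t1", "content": [{"text": "[pruned: raw API response]"}]}}],
}

assert len(after) == len(before)  # message count preserved
assert [m["role"] for m in after] == [m["role"] for m in before]  # flow preserved
assert after[3] == before[3]  # the conclusion survives intact
```

Summarization would instead collapse all four messages into one summary, losing the tool-use/tool-result alternation.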

Core Components

1. Pruning Strategy Interface

Define strategies for selective message compression:

```python
from abc import ABC, abstractmethod
from copy import deepcopy
from typing import Any, Dict, List, Optional

from ...types.content import Message, Messages

class PruningStrategy(ABC):
    """Abstract interface for message pruning strategies."""

    @abstractmethod
    def should_prune_message(self, message: Message, context: Dict[str, Any]) -> bool:
        """Determine whether a message should be pruned."""
        ...

    @abstractmethod
    def prune_message(self, message: Message, agent: "Agent") -> Optional[Message]:
        """Prune a message, returning the compressed version or None to remove it."""
        ...

class ToolResultPruningStrategy(PruningStrategy):
    """Prune large tool results while preserving tool use context."""

    def __init__(self, max_tool_result_tokens: int = 500):
        self.max_tool_result_tokens = max_tool_result_tokens

    def should_prune_message(self, message: Message, context: Dict[str, Any]) -> bool:
        # Prune only when a tool result exceeds the configured token budget.
        for content in message.get("content", []):
            if "toolResult" in content:
                result_size = self._estimate_tool_result_tokens(content["toolResult"])
                if result_size > self.max_tool_result_tokens:
                    return True
        return False

    def prune_message(self, message: Message, agent: "Agent") -> Optional[Message]:
        # Deep-copy so the caller's message list is never mutated in place.
        pruned_message = deepcopy(message)
        for content in pruned_message["content"]:
            if "toolResult" in content:
                content["toolResult"] = self._compress_tool_result(
                    content["toolResult"], agent
                )
        return pruned_message
```
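The helpers referenced above (`_estimate_tool_result_tokens`, `_compress_tool_result`) are left to the implementation. As a starting point, a minimal character-based token estimate could look like the sketch below; the 4-characters-per-token ratio is a rough assumption, not the model's real tokenizer:

```python
import json

def estimate_tool_result_tokens(tool_result: dict) -> int:
    """Crude token estimate: roughly 4 characters per token.

    A production implementation would use the provider's tokenizer; this
    heuristic only needs to be good enough to rank tool results by size.
    """
    serialized = json.dumps(tool_result, default=str)
    return max(1, len(serialized) // 4)
```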

2. PruningConversationManager Implementation

Extend the conversation manager pattern:

```python
class PruningConversationManager(ConversationManager):
    """Conversation manager that selectively prunes messages."""

    def __init__(
        self,
        pruning_strategies: List[PruningStrategy],
        preserve_recent_messages: int = 10,
        max_pruning_ratio: float = 0.6,
        enable_proactive_pruning: bool = True,
        pruning_threshold: float = 0.7,
    ):
        """Initialize the pruning conversation manager.

        Args:
            pruning_strategies: List of strategies to apply for message pruning.
            preserve_recent_messages: Number of recent messages to never prune.
            max_pruning_ratio: Maximum fraction of messages that may be pruned.
            enable_proactive_pruning: Whether to prune proactively based on threshold.
            pruning_threshold: Context usage threshold that triggers proactive pruning.
        """
        super().__init__()
        self.pruning_strategies = pruning_strategies
        self.preserve_recent_messages = preserve_recent_messages
        self.max_pruning_ratio = max_pruning_ratio
        self.enable_proactive_pruning = enable_proactive_pruning
        self.pruning_threshold = pruning_threshold

    def apply_management(self, agent: "Agent", **kwargs: Any) -> None:
        """Apply the pruning management strategy."""
        if self.enable_proactive_pruning and self._should_prune_proactively(agent):
            self.reduce_context(agent, **kwargs)

    def reduce_context(self, agent: "Agent", e: Optional[Exception] = None, **kwargs: Any) -> None:
        """Reduce context through selective message pruning."""
        original_messages = agent.messages.copy()
        pruned_messages = self._prune_messages(agent.messages, agent)

        # Validate that pruning actually reduced token usage
        if self._validate_pruning_effectiveness(original_messages, pruned_messages, agent):
            agent.messages[:] = pruned_messages
            self.removed_message_count += len(original_messages) - len(pruned_messages)
        else:
            # Fall back to more aggressive pruning or raise an exception
            self._handle_pruning_failure(agent, e)
```
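`_should_prune_proactively` could reduce to a simple threshold check against estimated context usage. How the manager estimates usage is left open; the standalone helper below is a hypothetical sketch of the decision itself:

```python
def should_prune_proactively(
    estimated_tokens: int,
    context_window: int,
    pruning_threshold: float = 0.7,
) -> bool:
    """True once estimated usage crosses the threshold fraction of the window.

    estimated_tokens would come from summing per-message token estimates;
    context_window from the configured model's limit.
    """
    return estimated_tokens >= context_window * pruning_threshold
```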

3. Built-in Pruning Strategies

Provide common pruning strategies out of the box:

```python
class LargeToolResultPruningStrategy(PruningStrategy):
    """Compress large tool results into short summaries."""

    def __init__(self, max_tool_result_tokens: int = 500):
        self.max_tool_result_tokens = max_tool_result_tokens

    def prune_message(self, message: Message, agent: "Agent") -> Optional[Message]:
        # Use an LLM call to summarize oversized tool results
        return self._llm_compress_tool_result(message, agent)

class OldMessageRemovalStrategy(PruningStrategy):
    """Remove very old messages that are likely irrelevant."""

    def __init__(self, max_message_age: int = 50):
        self.max_message_age = max_message_age

    def should_prune_message(self, message: Message, context: Dict[str, Any]) -> bool:
        message_age = context.get("message_age", 0)
        return message_age > self.max_message_age

class DuplicateContentPruningStrategy(PruningStrategy):
    """Remove or compress messages with duplicate or near-duplicate content."""

    def __init__(self, similarity_threshold: float = 0.8):
        self.similarity_threshold = similarity_threshold

    def should_prune_message(self, message: Message, context: Dict[str, Any]) -> bool:
        # Use similarity detection to identify duplicate content
        return self._detect_content_similarity(message, context)

class IntermediateStepPruningStrategy(PruningStrategy):
    """Compress intermediate reasoning steps while preserving conclusions."""

    def prune_message(self, message: Message, agent: "Agent") -> Optional[Message]:
        # Identify and compress intermediate reasoning
        return self._compress_intermediate_steps(message, agent)
```
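For DuplicateContentPruningStrategy, `_detect_content_similarity` could start as a plain string-similarity check. The sketch below uses `difflib.SequenceMatcher` purely as a dependency-free stand-in; an embedding-based comparison would scale better:

```python
from difflib import SequenceMatcher

def detect_content_similarity(
    text: str,
    previous_texts: list,
    threshold: float = 0.8,
) -> bool:
    """Return True if text closely matches any earlier message text.

    SequenceMatcher.ratio() is O(n*m) per pair, so a real implementation
    would want to cap the comparison window or precompute embeddings.
    """
    return any(
        SequenceMatcher(None, text, prev).ratio() >= threshold
        for prev in previous_texts
    )
```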

4. Pruning Context and Metadata

Track pruning decisions and provide transparency:

```python
class PruningContext:
    """Context information for pruning decisions."""

    def __init__(self, messages: Messages, agent: "Agent", preserve_recent_messages: int = 10):
        self.messages = messages
        self.agent = agent
        self.preserve_recent_messages = preserve_recent_messages
        self.message_ages = self._calculate_message_ages()
        self.token_counts = self._calculate_token_counts()
        self.tool_usage_map = self._build_tool_usage_map()

    def get_message_context(self, index: int) -> Dict[str, Any]:
        """Get context information for a specific message."""
        return {
            "message_age": self.message_ages[index],
            "token_count": self.token_counts[index],
            "has_tool_use": self._has_tool_use(self.messages[index]),
            "has_tool_result": self._has_tool_result(self.messages[index]),
            "is_recent": index >= len(self.messages) - self.preserve_recent_messages,
        }
```
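The bookkeeping in `_calculate_message_ages` can follow a simple convention, e.g. a message's age is the number of messages that arrived after it (a convention chosen for this sketch, not mandated by the SDK):

```python
def calculate_message_ages(num_messages: int) -> list:
    """Newest message has age 0; the oldest has age num_messages - 1."""
    return list(range(num_messages - 1, -1, -1))
```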

Use Cases

1. Data Processing Workflows

```python
# Agent that processes large datasets
agent = Agent(
    model=model,
    tools=[data_processor, api_client, file_reader],
    conversation_manager=PruningConversationManager(
        pruning_strategies=[
            LargeToolResultPruningStrategy(max_tool_result_tokens=1000),
            IntermediateStepPruningStrategy(),
        ],
        preserve_recent_messages=5,
        pruning_threshold=0.6,
    ),
)

# Large API response gets pruned after processing
result = agent("Fetch user data from API and analyze patterns")
# Raw API response is compressed; the analysis remains intact
```

2. Long-Running Research Sessions

```python
# Research agent with selective memory
research_agent = Agent(
    conversation_manager=PruningConversationManager(
        pruning_strategies=[
            OldMessageRemovalStrategy(max_message_age=50),
            DuplicateContentPruningStrategy(similarity_threshold=0.8),
            LargeToolResultPruningStrategy(max_tool_result_tokens=500),
        ],
        preserve_recent_messages=15,
        max_pruning_ratio=0.7,
    ),
)

# Maintains research flow while pruning redundant information
for topic in research_topics:
    result = research_agent(f"Research {topic} and provide key insights")
```

3. Multi-Step Problem Solving

```python
# Problem-solving agent that preserves solution structure
solver_agent = Agent(
    conversation_manager=PruningConversationManager(
        pruning_strategies=[
            IntermediateStepPruningStrategy(),
            LargeToolResultPruningStrategy(max_tool_result_tokens=800),
        ],
        preserve_recent_messages=8,
    ),
)

# Keeps the problem-solution structure while compressing intermediate work
result = solver_agent("Solve this complex optimization problem step by step")
```

4. Custom Pruning Strategy

```python
class BusinessLogicPruningStrategy(PruningStrategy):
    """Custom strategy for business-specific content pruning."""

    def should_prune_message(self, message: Message, context: Dict[str, Any]) -> bool:
        # Custom business logic for identifying pruneable content
        return self._contains_temporary_data(message)

    def prune_message(self, message: Message, agent: "Agent") -> Optional[Message]:
        # Custom compression logic
        return self._compress_business_data(message)

# Use the custom strategy
agent = Agent(
    conversation_manager=PruningConversationManager(
        pruning_strategies=[
            BusinessLogicPruningStrategy(),
            LargeToolResultPruningStrategy(),
        ]
    )
)
```

Alternative Solutions

1. Enhanced Summarization

  • Extend SummarizingConversationManager with selective summarization
  • Pros: Builds on existing architecture, familiar pattern
  • Cons: Still loses message structure, limited granular control

2. Hierarchical Compression

  • Implement multi-level compression with different strategies per level
  • Pros: Very flexible, can optimize for different content types
  • Cons: Complex configuration, potential over-engineering

3. Content-Aware Sliding Window

  • Enhance SlidingWindowConversationManager with content-aware trimming
  • Pros: Simple conceptual model, predictable behavior
  • Cons: Less flexible than pruning, may remove important content

4. Hybrid Pruning-Summarization

  • Combine pruning and summarization in a single manager
  • Pros: Best of both approaches, maximum flexibility
  • Cons: Increased complexity, potential conflicts between strategies

