# Importance-based memory selection

AI agents with memory capabilities accumulate vast amounts of interaction history over time - past conversations, user preferences, task outcomes, feedback signals and contextual observations. While this rich memory enables personalization and continuity, it creates a fundamental challenge: context windows are finite, and not all memories are equally valuable. Loading every stored memory into each request wastes tokens on trivial details while potentially crowding out critical information that should influence the agent's behavior.

Importance-based memory selection addresses this challenge by assigning significance scores to memories and using these scores to prioritize what gets loaded into context. Rather than treating all memories equally or simply using recency as a proxy for relevance, we explicitly model importance through signals like user feedback, task success rates, business impact, and contextual relevance. This allows agents to maintain extensive memory stores while selectively surfacing only the most impactful information for each interaction.

In this notebook, we explore how to implement importance-based memory selection as a select strategy for context engineering. We examine the different dimensions of memory importance (user feedback signals, task outcome tracking, business impact scoring and contextual relevance), how to combine multiple importance signals into a composite score, how to select memory subsets under token budget constraints, and how to build production-ready memory ranking systems.

In [1]:
import os
from datetime import datetime, timedelta
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from typing import List, Dict, Optional
from dataclasses import dataclass, field
from enum import Enum
import json
import numpy as np

We begin by initializing the language model that will use our selected memories to generate personalized responses. Setting the temperature to 0 makes outputs more deterministic, which makes it easier to compare how different memory selection strategies affect the responses.

In [2]:
# Initialize the language model for generating responses
llm = ChatOpenAI(
    model="gpt-4o-mini",
    api_key=os.getenv("OPENAI_API_KEY", "").strip(),
    temperature=0  # Set to 0 for more deterministic outputs
)

# Initialize embedding model for semantic similarity
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002", api_key=os.getenv("OPENAI_API_KEY", "").strip())

print("Models initialized successfully!")

Models initialized successfully!


## Part 1: Modeling memory with importance scores
The foundation of importance-based memory selection lies in how we model memories themselves. Unlike simple memory systems that store just content and timestamps, an importance-based system needs to capture several signals that indicate value. Think about how people remember things: we do not just remember facts, we remember how we felt about them, whether they proved useful, and how often we have needed them. Our memory system should work similarly.

When an AI agent interacts with users over time, different memories accumulate different kinds of value. A user preference that led to a successful purchase is more valuable than a casual question from months ago. A fact that the user explicitly saved deserves more weight than something mentioned in passing. A memory that gets referenced frequently in conversations signals ongoing relevance. To capture these nuances, we need to track multiple dimensions:
1. User feedback signals that indicate explicit or implicit approval.
2. Task outcome tracking that shows whether information led to successful completions.
3. Business value metrics that tie memories to real-world impact like revenue or strategic goals.
4. Temporal factors that account for both recency and decay over time.
5. Usage patterns that reveal which memories prove consistently useful.

The challenge is combining these disparate signals into a single, actionable importance score. We cannot simply add them up: a very old but highly successful memory might be less relevant than a recent memory with moderate success, depending on the context. We need a flexible scoring system that can weight different factors appropriately for different use cases, while providing clear explanations for why certain memories rank higher than others. Let's build this step by step, starting with defining the structure of our memories and the types they can represent.
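The temporal factor can be modeled in more than one way. As a sketch, the two functions below compare a linear decay that reaches zero at one year (the form `calculate_importance_score` uses later in this notebook) with an exponential half-life decay. The function names and the 90-day half-life are illustrative assumptions, not tuned values.

```python
def linear_recency(age_days: float, horizon_days: float = 365.0) -> float:
    """Linear decay: 1.0 for a brand-new memory, 0.0 at or beyond the horizon."""
    return max(0.0, 1.0 - age_days / horizon_days)


def exponential_recency(age_days: float, half_life_days: float = 90.0) -> float:
    """Exponential decay: the score halves every `half_life_days` (assumed value)."""
    return 0.5 ** (age_days / half_life_days)


# Compare the two curves at a few ages
for age in (10, 90, 200, 400):
    print(f"{age:>3} days  linear={linear_recency(age):.3f}  "
          f"exponential={exponential_recency(age):.3f}")
```

Linear decay drops every memory older than the horizon to exactly zero, while exponential decay only approaches zero, so a very old memory with strong feedback can still earn a small recency contribution.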

First, we need to categorize the different kinds of information an agent might remember. Not all memories serve the same purpose - some capture user preferences like product choices or communication styles, others record task outcomes showing what worked or failed, some preserve conversation history for context continuity, and still others store factual information or explicit feedback. By classifying memories into types, we can apply type-specific importance weighting and selection strategies later on.

In [3]:
class MemoryType(Enum):
    """Types of memories an agent might store."""
    USER_PREFERENCE = "user_preference"  # Preferences and choices the user has expressed
    TASK_OUTCOME = "task_outcome"  # Results from completed tasks or actions
    CONVERSATION = "conversation"  # Past conversation exchanges
    FACTUAL_INFO = "factual_info"  # Facts learned about user or their context
    FEEDBACK = "feedback"  # Explicit user feedback
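Type-specific weighting is mentioned above but is not part of the scoring code in this notebook. One hypothetical way to add it is a per-type multiplier applied on top of the composite score and clamped back into [0, 1]. The multiplier values below are illustrative assumptions, and the keys are the `MemoryType` string values so the sketch runs on its own.

```python
# Hypothetical per-type multipliers (illustrative values, not tuned)
TYPE_MULTIPLIERS = {
    "user_preference": 1.2,
    "task_outcome": 1.1,
    "feedback": 1.0,
    "factual_info": 0.9,
    "conversation": 0.7,
}


def apply_type_multiplier(score: float, memory_type_value: str) -> float:
    """Scale an importance score by its memory type, clamped to [0, 1].

    Unknown types fall back to a neutral multiplier of 1.0.
    """
    multiplier = TYPE_MULTIPLIERS.get(memory_type_value, 1.0)
    return min(1.0, max(0.0, score * multiplier))


print(apply_type_multiplier(0.5, "conversation"))     # conversation memories are down-weighted
print(apply_type_multiplier(0.9, "user_preference"))  # boosted, then clamped to 1.0
```

With the classes defined below, a call would look like `apply_type_multiplier(mem.calculate_importance_score(), mem.memory_type.value)`.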

With memory types defined, we can now build the complete `Memory` data structure that captures the dimensions of importance discussed above. Rather than storing only content and a timestamp, it tracks a set of signals that collectively indicate value: the content and its type; temporal information such as creation time and access patterns; user feedback, both as an explicit save action and as a feedback score from -1 to 1; task success indicators showing whether this information led to successful outcomes when used previously; business metrics that tie memories to organizational goals or revenue; and metadata such as tags for additional categorization.

Beyond storing these fields, the `Memory` class provides a method that synthesizes the signals into a single importance score. `calculate_importance_score` computes a weighted sum in which each dimension contributes in proportion to its weight. The weights are configurable because different applications have different priorities: a customer support agent might weight user feedback heavily, while a sales agent might prioritize business value. Each component is mapped to the range 0 to 1, so with weights that sum to 1 the score also lies between 0 and 1 (the method clamps it as a safeguard), which makes memories easy to compare and threshold. Note that `revenue_impact` and `tags` are stored for reference but do not enter the score. Let's implement this memory model.

In [4]:
@dataclass
class Memory:
    """Memory with comprehensive importance tracking."""
    
    # Core content
    content: str  # The actual information being stored
    memory_type: MemoryType  # What category this memory falls into (preference, conversation, etc.)
    
    # Temporal information
    created_at: datetime  # When this memory was first created
    last_accessed: Optional[datetime] = None  # Last time this memory was retrieved and used (None if never accessed)
    access_count: int = 0  # How many times this memory has been selected and used
    
    # User feedback signals
    user_feedback_score: float = 0.0  # -1 (negative) to 1 (positive)
    user_explicitly_saved: bool = False  # Whether user explicitly clicked "save this" or similar action
    
    # Task success signals
    led_to_task_success: Optional[bool] = None  # Did this memory lead to successful task completion when last used?
    task_success_rate: float = 0.0  # For memories used multiple times, what is the success rate?
    
    # Business impact
    business_value: float = 0.0  # 0-1 score for business importance
    revenue_impact: float = 0.0  # Estimated revenue impact
    
    # Metadata - Tags for categorization and filtering
    tags: List[str] = field(default_factory=list)
    
    def calculate_importance_score(self, 
                                   current_context: Optional[str] = None,
                                   weights: Optional[Dict[str, float]] = None) -> float:
        """Calculate a composite importance score from multiple signals.

        Combines user feedback, task success, business value, recency, and
        access frequency into a single 0-1 score representing memory importance.

        Args:
            current_context: Current user query. Accepted for future contextual
                relevance scoring but not used by this implementation.
            weights: Weight factors for the five importance dimensions; must
                contain all five keys and should sum to 1.

        Returns:
            Importance score (0-1)
        """
        # Default weights if not provided - These weights determine how much each dimension contributes to final score
        if weights is None:
            weights = {
                'feedback': 0.3,
                'task_success': 0.25,
                'business_value': 0.2,
                'recency': 0.15,
                'access_frequency': 0.1
            }
        
        # Component 1: User feedback signal (normalized to 0-1)
        feedback_component = (self.user_feedback_score + 1) / 2  # Convert feedback score from [-1, 1] range to [0, 1] range
        # Boost score if user explicitly saved this memory (shows strong intent)
        if self.user_explicitly_saved:
            feedback_component = min(1.0, feedback_component + 0.3)
        
        # Component 2: Task success signal
        task_component = 0.5  # Start with neutral score if we don't know success/failure
        # If we have explicit success/failure data, use it
        if self.led_to_task_success is not None:
            # Success = 1.0, failure = 0.2 (we still learn from failures)
            task_component = 1.0 if self.led_to_task_success else 0.2
        # If we have success rate from multiple uses, prefer that (more reliable)
        if self.task_success_rate > 0:
            task_component = self.task_success_rate
        
        # Component 3: Business value
        business_component = self.business_value  # Direct business value score
        
        # Component 4: Recency (decay over time)
        age_days = (datetime.now() - self.created_at).days  # Calculate how old this memory is in days
        recency_component = max(0, 1 - (age_days / 365))  # Apply linear decay over one year (recent = 1.0, year+ old = 0.0)
        
        # Component 5: Access frequency (normalized by log to prevent dominance)
        import math
        frequency_component = min(1.0, math.log(self.access_count + 1) / 5)
        
        # Combine weighted components
        importance = (
            weights['feedback'] * feedback_component +
            weights['task_success'] * task_component +
            weights['business_value'] * business_component +
            weights['recency'] * recency_component +
            weights['access_frequency'] * frequency_component
        )

        # Ensure final score stays in valid [0, 1] range
        return min(1.0, max(0.0, importance))
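`calculate_importance_score` indexes the weights dictionary directly, so a custom dictionary that omits a key raises `KeyError`, and weights that do not sum to 1 can push scores past 1 (where they are clamped) or compress them toward 0. A small helper along these lines fills in missing keys from the defaults and renormalizes before the weights are passed in. The name `resolve_weights` and the two domain profiles are hypothetical, chosen to mirror the support and sales examples above.

```python
from typing import Dict, Optional

DEFAULT_WEIGHTS: Dict[str, float] = {
    'feedback': 0.3,
    'task_success': 0.25,
    'business_value': 0.2,
    'recency': 0.15,
    'access_frequency': 0.1,
}


def resolve_weights(overrides: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Merge overrides into the defaults and rescale so the weights sum to 1."""
    merged = {**DEFAULT_WEIGHTS, **(overrides or {})}
    unknown = set(merged) - set(DEFAULT_WEIGHTS)
    if unknown:
        raise ValueError(f"Unknown weight keys: {sorted(unknown)}")
    total = sum(merged.values())
    if total <= 0:
        raise ValueError("Weights must sum to a positive value")
    return {key: value / total for key, value in merged.items()}


# Hypothetical domain profiles: support leans on feedback, sales on business value
SUPPORT_WEIGHTS = resolve_weights({'feedback': 0.5, 'business_value': 0.1})
SALES_WEIGHTS = resolve_weights({'business_value': 0.4, 'feedback': 0.2})
print(SALES_WEIGHTS)
```

A profile is then passed straight through, for example `mem.calculate_importance_score(weights=SALES_WEIGHTS)`.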

Now that we have our `Memory` class defined, we need to create a diverse set of sample memories that demonstrate different importance profiles. This is crucial for testing our selection strategies - we want memories that vary across all the dimensions we track. Some memories should have high user feedback but low business value, others should be recent but have neutral feedback, and still others should be old but frequently accessed. By creating this variety, we can observe how our importance scoring algorithm differentiates between memories and ensures the right ones surface in different contexts.

Our sample memories will represent a realistic scenario of an e-commerce agent helping a user shop for electronics. We will include a high-value user preference that was explicitly saved and has led to successful outcomes, an old conversation fragment that has not proven particularly useful, a budget preference that is business-critical and frequently referenced, a recent successful recommendation that drove revenue, contextual information about the user's lifestyle that's moderately useful, and a failed recommendation that teaches us what not to do. This mix will let us see importance-based selection in action.

In [5]:
# Capture current time for calculating relative dates
now = datetime.now()

# Create a diverse set of memories with varying importance signals
sample_memories = [
    # Memory 1: High-value user preference. Explicitly saved, positive feedback, high success rate. Should score very high on importance
    Memory(
        content="User prefers eco-friendly packaging and sustainable products",
        memory_type=MemoryType.USER_PREFERENCE,
        created_at=now - timedelta(days=30),  # One month old
        access_count=5,  # Accessed multiple times
        user_feedback_score=1.0,  # Maximum positive feedback
        user_explicitly_saved=True,  # User clicked "remember this"
        led_to_task_success=True,  # Led to successful purchases
        task_success_rate=0.9,  # 90% success rate when used
        business_value=0.8,  # High strategic value (brand alignment)
        tags=["preference", "sustainability"]
    ),
    # Memory 2: Old, low-value conversation. Old, rarely accessed, no clear value signals. Should score low on importance
    Memory(
        content="User asked about laptop specifications on 2024-01-15",
        memory_type=MemoryType.CONVERSATION,
        created_at=now - timedelta(days=200),  # Over 6 months old
        access_count=1,  # Only accessed once (when created)
        user_feedback_score=0.0,  # No feedback
        business_value=0.2,  # Low business value
        tags=["conversation", "product_inquiry"]
    ),
    # Memory 3: Critical budget preference. Business-critical, frequently used, high success. Should score high
    Memory(
        content="User's budget range is $800-$1200 for electronics",
        memory_type=MemoryType.USER_PREFERENCE,
        created_at=now - timedelta(days=60),  # Two months old
        access_count=8,  # Heavily accessed
        user_feedback_score=0.5,  # Moderate positive feedback
        led_to_task_success=True,  # Leads to purchases
        task_success_rate=0.85,  # 85% success rate
        business_value=0.9,  # Critical for conversions
        revenue_impact=1000.0,  # Estimated $1000 revenue driven
        tags=["preference", "budget"]
    ),
    # Memory 4: Recent successful recommendation. Very recent, drove revenue, positive outcome. Should score high (recency + success)
    Memory(
        content="Recommended cooling pad, user purchased it and left positive review",
        memory_type=MemoryType.TASK_OUTCOME,
        created_at=now - timedelta(days=10),  # Very recent (10 days)
        access_count=2,  # Accessed a couple times
        user_feedback_score=1.0,  # Positive review
        led_to_task_success=True,  # Purchase completed
        business_value=0.7,  # Good business value (conversion)
        revenue_impact=49.99,  # $49.99 revenue
        tags=["recommendation", "success"]
    ),
    # Memory 5: Contextual information (moderate value). Useful context but not critical. Should score medium importance
    Memory(
        content="User mentioned they work from home",
        memory_type=MemoryType.FACTUAL_INFO,
        created_at=now - timedelta(days=45),  # Month and a half old
        access_count=3,  # Occasionally referenced
        user_feedback_score=0.0,  # No explicit feedback
        business_value=0.4,  # Moderate value (helps personalization)
        tags=["context", "lifestyle"]
    ),
    # Memory 6: Failed recommendation. Negative feedback, didn't lead to success. Should score low (but not zero - we learn from failures)
    Memory(
        content="Previous recommendation for expensive laptop was rejected",
        memory_type=MemoryType.TASK_OUTCOME,
        created_at=now - timedelta(days=70),  # Over two months old
        access_count=1,  # Not frequently referenced
        user_feedback_score=-0.5,  # Negative feedback
        led_to_task_success=False,  # User rejected recommendation
        business_value=0.3,  # Some value (teaches us what not to do)
        tags=["recommendation", "failure"]
    ),
]

print(f"Created {len(sample_memories)} sample memories\n")

Created 6 sample memories



With our sample memories created, let's calculate and display their importance scores to see how the multi-dimensional scoring algorithm differentiates between them. This is where the tracked signals (feedback, task success, business value, recency and access frequency) come together into actionable rankings. We should see the sustainability preference and the recent successful recommendation scoring highest because they combine maximum positive feedback with task success and high business value, the budget preference close behind on the strength of its business value and frequent access, the contextual information and the old conversation scoring lower due to weaker signals, and the failed recommendation scoring lowest despite having some learning value.

By displaying not just the scores but also the underlying signals that contributed to each score, we can understand and validate how the importance calculation works. This transparency is crucial for production systems where we need to explain why certain memories were selected or debug unexpected selection behavior.

In [6]:
# Display importance scores for all memories
print("Memory importance scores:")
print("=" * 70)

# Iterate through each memory and show its importance calculation
for i, mem in enumerate(sample_memories, 1):
    # Calculate the composite importance score for this memory
    score = mem.calculate_importance_score()
    print(f"\n{i}. [Importance: {score:.3f}]")  # Display memory number and importance score
    print(f"   {mem.content}")  # Show the actual memory content
    print(f"   Type: {mem.memory_type.value} | Age: {(now - mem.created_at).days} days")  # Display key metadata that influenced the score
    print(f"   Feedback: {mem.user_feedback_score:+.1f} | Task success: {mem.led_to_task_success}")  # Show feedback and success signals (these are major contributors)

Memory importance scores:

1. [Importance: 0.859]
   User prefers eco-friendly packaging and sustainable products
   Type: user_preference | Age: 30 days
   Feedback: +1.0 | Task success: True

2. [Importance: 0.397]
   User asked about laptop specifications on 2024-01-15
   Type: conversation | Age: 200 days
   Feedback: +0.0 | Task success: None

3. [Importance: 0.787]
   User's budget range is $800-$1200 for electronics
   Type: user_preference | Age: 60 days
   Feedback: +0.5 | Task success: True

4. [Importance: 0.858]
   Recommended cooling pad, user purchased it and left positive review
   Type: task_outcome | Age: 10 days
   Feedback: +1.0 | Task success: True

5. [Importance: 0.514]
   User mentioned they work from home
   Type: factual_info | Age: 45 days
   Feedback: +0.0 | Task success: None

6. [Importance: 0.320]
   Previous recommendation for expensive laptop was rejected
   Type: task_outcome | Age: 70 days
   Feedback: -0.5 | Task success: False


The `Memory` class uses Python dataclasses for clean field definitions with default values and provides a comprehensive importance modeling framework:
1. Defines a rich memory structure capturing content, type, temporal information, feedback signals, success tracking, and business metrics.
2. Implements a composite importance calculation combining multiple signals with configurable weights for different domains.
3. Creates diverse sample memories demonstrating varying importance profiles from highly valuable preferences to low-value conversation snippets.
4. Calculates importance scores showing clear differentiation between critical information (user preferences with high feedback) and trivial details (old conversation fragments).

This foundation enables memory selection based on tracked value signals rather than recency alone. The weights themselves remain a design choice that should be tuned for each application.

## Part 2: Importance-based ranking and selection

With importance scores calculated, we can now implement selection strategies that determine which memories get loaded into the agent's context. The simplest approach is top-k selection, where we rank all memories by importance and take the k highest-scoring items. This ensures the most valuable information is always included, though it may miss contextually relevant but lower-scoring memories.

More sophisticated strategies might combine importance with other factors like semantic relevance to the current query, or implement threshold-based filtering where only memories exceeding a minimum importance score are considered. The right approach depends on whether we prioritize surfacing the globally most important information or balancing importance with contextual relevance.
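Top-k does not address the token budget mentioned in the introduction: three long memories may not fit where five short ones would. A common approach is a greedy pass over memories in descending importance that adds each one if it still fits. The sketch below assumes memories are given as `(content, score)` pairs and approximates token counts by whitespace-separated words; a real system would count with the model's tokenizer. Both function names are hypothetical.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    """Rough token estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def select_within_budget(scored: List[Tuple[str, float]],
                         token_budget: int) -> List[Tuple[str, float]]:
    """Greedily add memories in descending importance while they fit the budget.

    Greedy selection is a heuristic; it does not guarantee the maximum total
    importance (that is a 0/1 knapsack problem).
    """
    selected, used = [], 0
    for content, score in sorted(scored, key=lambda item: item[1], reverse=True):
        cost = estimate_tokens(content)
        if used + cost <= token_budget:
            selected.append((content, score))
            used += cost
    return selected


demo_memories = [
    ("User prefers eco-friendly packaging and sustainable products", 0.859),
    ("Recommended cooling pad, user purchased it and left positive review", 0.858),
    ("User's budget range is $800-$1200 for electronics", 0.787),
    ("User mentioned they work from home", 0.514),
]
for content, score in select_within_budget(demo_memories, token_budget=23):
    print(f"[{score:.3f}] {content}")
```

With a 23-word budget, the first two memories use 17 words, the budget preference (7 words) no longer fits and is skipped, and the shorter work-from-home memory (6 words) fills the remaining space.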

In [7]:
def select_by_importance(memories: List[Memory],
                        top_k: int = 3,
                        min_importance: float = 0.0,
                        weights: Optional[Dict[str, float]] = None) -> List[Memory]:
    """Select memories based on importance scores using top-k ranking.
    
    Args:
        memories: List of all available memories
        top_k: Number of memories to select (default: 3)
        min_importance: Minimum importance threshold; memories below this are excluded (default: 0.0)
        weights: Optional custom weights for importance calculation (default: None uses standard weights)
        
    Returns:
        List of selected memories, sorted by importance (highest first)
    """
    # Step 1: Calculate importance score for each memory
    scored_memories = []  # Store as tuples of (memory, score) for sorting
    for mem in memories:
        # Calculate score using either custom or default weights
        score = mem.calculate_importance_score(weights=weights)
        scored_memories.append((mem, score))
    
    # Step 2: Filter by minimum importance threshold - This removes low-value memories even if they would fit in top-k
    filtered = [(mem, score) for mem, score in scored_memories if score >= min_importance]
    
    # Step 3: Sort by importance score in descending order (highest first)
    filtered.sort(key=lambda x: x[1], reverse=True)  # The key function extracts the score (second element of tuple) for comparison
    
    # Step 4: Take only the top k memories
    selected = [mem for mem, score in filtered[:top_k]]  # Extract just the memory objects (first element of tuple)
    
    return selected
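Sorting the full list costs O(n log n). When the memory store is large and k is small, `heapq.nlargest` from the standard library returns the same top-k in O(n log k); its documentation describes it as equivalent to `sorted(iterable, key=key, reverse=True)[:n]`. A minimal sketch over `(memory_id, score)` pairs, where the IDs are placeholders and the scores are the ones printed above:

```python
import heapq
from typing import List, Tuple


def top_k_by_score(scored: List[Tuple[str, float]], k: int,
                   min_importance: float = 0.0) -> List[Tuple[str, float]]:
    """Return the k highest-scoring items at or above min_importance."""
    eligible = (item for item in scored if item[1] >= min_importance)
    return heapq.nlargest(k, eligible, key=lambda item: item[1])


demo_scores = [("m1", 0.859), ("m2", 0.397), ("m3", 0.787),
               ("m4", 0.858), ("m5", 0.514), ("m6", 0.320)]
print(top_k_by_score(demo_scores, k=3))
```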

Let's test our importance-based selection function with our sample memories to see it in action. We will select the top 3 most important memories from our collection of 6, which should surface the memories with the strongest combination of signals across feedback, task success, business value, recency, and access frequency. Beyond just showing which memories were selected, we will also display the specific reasons why each memory was deemed important - this explainability is crucial for debugging selection behavior and building trust in production systems where stakeholders need to understand why certain information is being prioritized.

In [8]:
# Test importance-based selection with our sample memories
print("Importance-Based Memory Selection")
print("=" * 70)

# Select top 3 memories based on importance scores
selected = select_by_importance(sample_memories, top_k=3)  # Using default weights and no minimum threshold

print(f"\nSelected top 3 memories from {len(sample_memories)} total:\n")

# Display each selected memory with its importance score and contributing factors
for i, mem in enumerate(selected, 1):
    # Recalculate score for display (already calculated during selection)
    score = mem.calculate_importance_score()
    # Show rank, score, and content
    print(f"{i}. [Importance: {score:.3f}]")
    print(f"   {mem.content}")
    print(f"   Why important:")

    # Build a list of reasons this memory scored highly - These are the signals that contributed to its high importance
    reasons = []
    # Check for positive user feedback
    if mem.user_feedback_score > 0.5:
        reasons.append(f"Positive user feedback ({mem.user_feedback_score:+.1f})")
    # Check if user explicitly saved this memory
    if mem.user_explicitly_saved:
        reasons.append("Explicitly saved by user")
    # Check for task success
    if mem.led_to_task_success:
        reasons.append("Led to successful outcome")
    # Check for high business value
    if mem.business_value > 0.6:
        reasons.append(f"High business value ({mem.business_value:.1f})")
    # Check for frequent access (indicates ongoing relevance)
    if mem.access_count > 4:
        reasons.append(f"Frequently accessed ({mem.access_count}x)")

    # Display all contributing reasons
    for reason in reasons:
        print(f"     • {reason}")
    print()

Importance-Based Memory Selection

Selected top 3 memories from 6 total:

1. [Importance: 0.859]
   User prefers eco-friendly packaging and sustainable products
   Why important:
     • Positive user feedback (+1.0)
     • Explicitly saved by user
     • Led to successful outcome
     • High business value (0.8)
     • Frequently accessed (5x)

2. [Importance: 0.858]
   Recommended cooling pad, user purchased it and left positive review
   Why important:
     • Positive user feedback (+1.0)
     • Led to successful outcome
     • High business value (0.7)

3. [Importance: 0.787]
   User's budget range is $800-$1200 for electronics
   Why important:
     • Led to successful outcome
     • High business value (0.9)
     • Frequently accessed (8x)



Importance-based selection surfaces the most valuable memories:
1. Implements top-k selection that ranks memories by composite importance scores and selects the highest-scoring subset.
2. Applies optional minimum importance thresholds to exclude low-value memories even if they would fit in top-k.
3. Demonstrates that memories with strong signals across multiple dimensions (user feedback, task success, business value) rise to the top.
4. Provides explainability by showing which importance factors contributed to each memory's selection.

This ensures agents prioritize information that has proven valuable through actual usage and outcomes.
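The reasons printed above come from fixed thresholds, which can disagree with what actually drove the score. For memory 3, the +0.5 feedback is not listed because the check is `> 0.5`, yet it is that memory's largest weighted contribution. An alternative, sketched here with a hypothetical `explain_contributions` helper, is to rank the weighted components themselves:

```python
import math
from typing import Dict, List, Tuple


def explain_contributions(components: Dict[str, float],
                          weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (signal, weighted contribution) pairs, largest first."""
    contributions = [(name, weights[name] * value) for name, value in components.items()]
    return sorted(contributions, key=lambda item: item[1], reverse=True)


contribution_weights = {'feedback': 0.3, 'task_success': 0.25, 'business_value': 0.2,
                        'recency': 0.15, 'access_frequency': 0.1}

# Memory 3 (budget preference): components as calculate_importance_score computes them
budget_components = {
    'feedback': (0.5 + 1) / 2,                        # feedback +0.5, not explicitly saved
    'task_success': 0.85,                             # task_success_rate
    'business_value': 0.9,
    'recency': 1 - 60 / 365,                          # 60 days old
    'access_frequency': min(1.0, math.log(8 + 1) / 5),  # accessed 8 times
}

for signal, value in explain_contributions(budget_components, contribution_weights):
    print(f"{signal:<17} {value:.3f}")
```

The contributions sum to the memory's importance score, so this breakdown explains the ranking exactly rather than approximately.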

## Part 3: Balancing importance with contextual relevance

Pure importance-based selection has a limitation: it may surface globally important memories that are irrelevant to the current query. A user preference for sustainable products is highly important overall, but if the current query asks about shipping times, that preference adds no value to the response. We need to balance importance with contextual relevance.

The solution is hybrid selection that balances importance with semantic similarity to the current query. We calculate two scores for each memory: the importance score we have already implemented, and a relevance score based on how semantically similar the memory content is to the user's query. We then combine these scores using configurable weights, allowing us to tune whether we prioritize surfacing globally important information or contextually precise information. This is implemented through embedding-based similarity where we convert both the query and memory content into vector representations and calculate cosine similarity.
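The relevance calculation can be illustrated with toy 2-D vectors standing in for embeddings. The sketch computes cosine similarity with NumPy and applies the same `(similarity + 1) / 2` mapping used in the selection function. Note that orthogonal (unrelated) vectors map to 0.5, not 0, so even an unrelated memory receives half of the relevance weight.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def to_relevance(similarity: float) -> float:
    """Map cosine similarity from [-1, 1] to [0, 1]."""
    return (similarity + 1) / 2


query_vec = [1.0, 0.0]
for name, vec in [("same", [2.0, 0.0]), ("orthogonal", [0.0, 3.0]), ("opposite", [-1.0, 0.0])]:
    print(name, to_relevance(cosine_similarity(query_vec, vec)))
# same 1.0, orthogonal 0.5, opposite 0.0
```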

The appeal of this approach is its flexibility. For a customer support agent where contextual precision matters greatly, we might weight relevance at 70% and importance at 30%. For a sales agent where certain user preferences should always be considered regardless of context, we might weight importance at 80% and relevance at 20%. Because the combined score is a weighted average, a memory that is strong on one dimension can partly compensate for weakness on the other; the weights control that trade-off, which avoids the extremes of pure importance-based or pure similarity-based selection. Let's implement this selection strategy.
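The compensatory nature of a weighted average is easy to see with made-up numbers. If both conditions should be mandatory, one option (an assumption, not part of this notebook's implementation) is to gate on a minimum relevance before combining. The `gated_score` helper and its 0.6 floor are hypothetical; given the `(similarity + 1) / 2` mapping, the floor would need tuning against real embedding scores.

```python
def combined_score(importance: float, relevance: float,
                   importance_weight: float = 0.6, relevance_weight: float = 0.4) -> float:
    """Plain weighted average, as in the hybrid selection function."""
    return importance_weight * importance + relevance_weight * relevance


def gated_score(importance: float, relevance: float, min_relevance: float = 0.6,
                importance_weight: float = 0.6, relevance_weight: float = 0.4) -> float:
    """Hypothetical variant: exclude memories below a relevance floor, then combine."""
    if relevance < min_relevance:
        return 0.0
    return combined_score(importance, relevance, importance_weight, relevance_weight)


important_but_off_topic = (0.9, 0.5)  # (importance, relevance)
modest_but_on_topic = (0.5, 0.9)

print(combined_score(*important_but_off_topic), combined_score(*modest_but_on_topic))
print(gated_score(*important_but_off_topic), gated_score(*modest_but_on_topic))
```

Under plain weighting the off-topic memory wins (about 0.74 versus 0.66); with the relevance floor it is excluded and the on-topic memory ranks first.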

In [9]:
def select_by_importance_and_relevance(memories: List[Memory],
                                       query: str,
                                       embeddings: OpenAIEmbeddings,
                                       top_k: int = 3,
                                       importance_weight: float = 0.6,
                                       relevance_weight: float = 0.4) -> List[tuple]:
    """Select memories balancing importance and contextual relevance.
    
    Args:
        memories: List of all available memories
        query: Current user query for relevance calculation
        embeddings: Embedding model for semantic similarity
        top_k: Number of memories to select
        importance_weight: Weight for importance score (0-1), higher = prioritize important memories
        relevance_weight: Weight for relevance score (0-1), higher = prioritize contextually relevant memories
        
    Returns:
        List of (memory, combined_score, importance, relevance) tuples for the
        top_k memories, sorted by combined score (highest first)
    """
    # Step 1: Get embedding vector for the user's query
    query_embedding = np.array(embeddings.embed_query(query))
    
    # Step 2: Embed all memory contents in one batched call instead of one request per memory
    memory_embeddings = embeddings.embed_documents([mem.content for mem in memories])
    
    # Step 3: Calculate both importance and relevance for each memory
    scored_memories = []
    
    for mem, mem_embedding in zip(memories, memory_embeddings):
        # Calculate importance score using our existing method
        importance = mem.calculate_importance_score()  # This is the global value score independent of context
        
        # Calculate cosine similarity between query and memory vectors
        mem_embedding = np.array(mem_embedding)
        similarity = np.dot(query_embedding, mem_embedding) / (
            np.linalg.norm(query_embedding) * np.linalg.norm(mem_embedding)
        )
        relevance = (similarity + 1) / 2  # Normalize cosine similarity from [-1, 1] to [0, 1] range
        
        # Combine importance and relevance using weighted average - importance_weight + relevance_weight should sum to 1
        combined_score = (importance_weight * importance + 
                         relevance_weight * relevance)

        # Store all scores for sorting and analysis
        scored_memories.append((mem, combined_score, importance, relevance))
    
    # Step 4: Sort by combined score (highest first)
    scored_memories.sort(key=lambda x: x[1], reverse=True)
    
    # Step 5: Return top k memories with their detailed scores
    return scored_memories[:top_k]

The implementation computes both dimensions for each memory, applies configurable weights to balance global value against contextual fit, and returns the ranked memories together with the breakdown of how each dimension contributed. Let's test it on three query types. The intent is that a price-related query surfaces the budget preference, a sustainability query surfaces the eco-friendly preference, and a technical query surfaces the cooling pad recommendation. In the output below, the eco-friendly query does put the sustainability preference first, but for the price query the budget preference has the highest relevance (0.919) and still ranks third: the relevance values differ by only a few hundredths, while the importance gaps are larger, so at a 60/40 weighting importance still dominates.
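In the output that follows, relevance values span only about 0.88 to 0.95, so after the `(similarity + 1) / 2` mapping every memory receives nearly the same relevance and the importance term decides the ranking. One possible mitigation, sketched here as an assumption rather than as part of this notebook's method, is to min-max rescale relevance across the candidate set before combining. The numbers below are the three memories printed for the price query; a real implementation would rescale over all candidates, and rescaling also amplifies small, possibly noisy, differences.

```python
from typing import List


def min_max_rescale(values: List[float]) -> List[float]:
    """Rescale values to span [0, 1]; identical values all map to 0.5."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Eco-friendly preference, cooling pad recommendation, budget preference (price query)
imp_scores = [0.859, 0.858, 0.787]
rel_scores = [0.884, 0.883, 0.919]
rescaled_rel = min_max_rescale(rel_scores)

raw_combined = [0.6 * i + 0.4 * r for i, r in zip(imp_scores, rel_scores)]
rescaled_combined = [0.6 * i + 0.4 * r for i, r in zip(imp_scores, rescaled_rel)]
for raw, rescaled in zip(raw_combined, rescaled_combined):
    print(f"raw combined={raw:.3f}  rescaled combined={rescaled:.3f}")
```

With rescaled relevance the budget preference moves to the top for the price query, which is the behavior the test queries are meant to demonstrate.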

In [10]:
# Test hybrid selection with different query types - each query is intended to surface a different memory
test_queries = [
    "I'm looking for a new laptop in my price range",      # Intended to surface the budget preference
    "Tell me about eco-friendly product options",           # Intended to surface the sustainability preference
    "What cooling solutions do you recommend?"              # Intended to surface the cooling pad recommendation
]

print("Hybrid Selection: Importance + Contextual Relevance")
print("=" * 70)

# Run selection for each query to demonstrate context-aware selection
for query in test_queries:
    print(f"\nQuery: '{query}'")
    print("-" * 70)

    # Select memories with 60% importance weight, 40% relevance weight
    selected = select_by_importance_and_relevance(
        sample_memories, 
        query, 
        embeddings,
        top_k=3,  # Select top 3 memories
        importance_weight=0.6,  # 60% weight on importance
        relevance_weight=0.4  # 40% weight on relevance
    )
    
    print(f"\nTop 3 memories (60% importance, 40% relevance):\n")

    # Display each selected memory with score breakdown
    for i, (mem, combined, importance, relevance) in enumerate(selected, 1):
        print(f"{i}. [Combined: {combined:.3f} | Importance: {importance:.3f} | Relevance: {relevance:.3f}]")
        print(f"   {mem.content}")
        print()

Hybrid Selection: Importance + Contextual Relevance

Query: 'I'm looking for a new laptop in my price range'
----------------------------------------------------------------------

Top 3 memories (60% importance, 40% relevance):

1. [Combined: 0.869 | Importance: 0.859 | Relevance: 0.884]
   User prefers eco-friendly packaging and sustainable products

2. [Combined: 0.868 | Importance: 0.858 | Relevance: 0.883]
   Recommended cooling pad, user purchased it and left positive review

3. [Combined: 0.840 | Importance: 0.787 | Relevance: 0.919]
   User's budget range is $800-$1200 for electronics


Query: 'Tell me about eco-friendly product options'
----------------------------------------------------------------------

Top 3 memories (60% importance, 40% relevance):

1. [Combined: 0.895 | Importance: 0.859 | Relevance: 0.949]
   User prefers eco-friendly packaging and sustainable products

2. [Combined: 0.865 | Importance: 0.858 | Relevance: 0.877]
   Recommended cooling pad, user purchas

Hybrid selection balances global importance with contextual fit:
1. Implements combined scoring that weights both importance and semantic relevance to the current query, so a memory's rank reflects its fit to the query as well as its overall value.
2. Uses configurable weight parameters allowing tuning based on whether you prioritize surfacing important information or contextually precise information.
3. Shows how the query changes relevance scores. For the eco-friendly query, the sustainability preference ranks first with a relevance of 0.949. For the price query, the budget memory has the highest relevance (0.919) but still ranks third: with a 0.6 importance weight, its lower importance (0.787) outweighs its relevance advantage.
4. Returns scoring breakdowns showing how importance and relevance each contributed to final selection, providing transparency and debuggability.

This approach favors memories that are both valuable and applicable to the current conversation, and the weights control the balance between the two.
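To see where the 60/40 split stops favoring importance, here is a small sketch that reuses the (importance, relevance) pairs from the price-query output above. It assumes the two weights sum to 1, as the comment in `select_by_importance_and_relevance` requires, so a single importance weight `w` describes the split. The helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (importance, relevance) per memory, copied from the price-range query output above
PRICE_QUERY: Dict[str, Tuple[float, float]] = {
    "eco_friendly": (0.859, 0.884),
    "cooling_pad": (0.858, 0.883),
    "budget_range": (0.787, 0.919),
}


def combined(importance: float, relevance: float, w: float) -> float:
    """Hybrid score with importance weight w and relevance weight 1 - w."""
    return w * importance + (1 - w) * relevance


def rank(scores: Dict[str, Tuple[float, float]], w: float) -> List[str]:
    """Memory labels ordered by combined score, highest first."""
    return sorted(scores, key=lambda k: combined(*scores[k], w), reverse=True)


def crossover_weight(a: Tuple[float, float], b: Tuple[float, float]) -> Optional[float]:
    """Importance weight at which memories a and b tie (None if they never tie).

    Solves w*ia + (1-w)*ra == w*ib + (1-w)*rb for w.
    """
    (ia, ra), (ib, rb) = a, b
    denom = (ra - rb) - (ia - ib)
    if denom == 0:
        return None
    return (ra - rb) / denom


print(rank(PRICE_QUERY, 0.6))  # ['eco_friendly', 'cooling_pad', 'budget_range']
print(rank(PRICE_QUERY, 0.2))  # ['budget_range', 'eco_friendly', 'cooling_pad']
print(round(crossover_weight(PRICE_QUERY["budget_range"], PRICE_QUERY["eco_friendly"]), 3))  # 0.327
```

With these particular scores, the budget preference overtakes the eco-friendly one only when the importance weight drops below about 0.33. Treat the crossover as a tuning aid for a specific memory store rather than a general rule, because it shifts whenever the underlying scores change.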

## Part 4: Token-constrained memory selection

In production AI systems, one of the hardest constraints we face is the finite context window. While top-k selection gives us control over the number of memories, it does not account for the fact that different memories consume different amounts of tokens - a brief preference like "User prefers dark mode" might use 5 tokens, while a detailed product recommendation could use 30 tokens. If we select a fixed count of memories, we might wastefully use only 40% of our available context budget, or we might exceed it entirely and cause truncation errors. What we really need is to pack as many high-value memories as possible into our available token budget, maximizing the information density of our context.

This is a classic optimization problem known as the 0/1 knapsack problem: given items with values (importance scores) and weights (token counts), select the subset that maximizes total value while staying under a weight limit. An exact solution uses dynamic programming with O(n * budget) complexity. A greedy heuristic is much simpler: sort memories by score, then add them one by one, skipping any that no longer fit. This runs in O(n log n) time, but ranking by score alone ignores how many tokens each memory costs, so it is not guaranteed to be near-optimal. A long, high-scoring memory can crowd out several shorter ones whose combined value is higher. The classic greedy approximation ranks by value density (score per token) instead. Ranking by score alone works reasonably well when memories are of similar length, as they are in our sample store.
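To see why ranking by score alone is only a heuristic, the sketch below compares three strategies on a toy set of three hypothetical memories. The strategies are: greedy by score (the ordering `select_with_token_budget` uses), greedy by score per token (the value-density heuristic), and the exact 0/1 knapsack dynamic program. The labels, scores and token counts are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical toy memories: (label, score, estimated_tokens)
Item = Tuple[str, float, int]
ITEMS: List[Item] = [("A", 0.90, 10), ("B", 0.80, 5), ("C", 0.75, 5)]


def greedy_by_score(items: List[Item], budget: int) -> List[str]:
    """Rank by score alone and add each item that still fits (no early break)."""
    chosen, used = [], 0
    for label, _score, tokens in sorted(items, key=lambda it: it[1], reverse=True):
        if used + tokens <= budget:
            chosen.append(label)
            used += tokens
    return chosen


def greedy_by_density(items: List[Item], budget: int) -> List[str]:
    """Rank by score per token (value density), then pack the same way."""
    chosen, used = [], 0
    ranked = sorted(items, key=lambda it: it[1] / max(it[2], 1), reverse=True)
    for label, _score, tokens in ranked:
        if used + tokens <= budget:
            chosen.append(label)
            used += tokens
    return chosen


def knapsack_exact(items: List[Item], budget: int) -> List[str]:
    """Exact 0/1 knapsack via dynamic programming in O(n * budget) time."""
    # best[b] = (total_score, labels) of the best subset using at most b tokens
    best: List[Tuple[float, List[str]]] = [(0.0, [])] * (budget + 1)
    for label, score, tokens in items:
        # Walk capacities downward so each item is used at most once
        for b in range(budget, tokens - 1, -1):
            candidate = best[b - tokens][0] + score
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - tokens][1] + [label])
    return best[budget][1]


for strategy in (greedy_by_score, greedy_by_density, knapsack_exact):
    print(strategy.__name__, strategy(ITEMS, budget=10))
# greedy_by_score ['A']
# greedy_by_density ['B', 'C']
# knapsack_exact ['B', 'C']
```

On this input, ranking by score alone takes the single 10-token memory (total score 0.90) and then has no room left. Both the density heuristic and the exact program pick the two 5-token memories instead (total 1.55). Density-greedy is not always optimal either, but it is the usual cheap baseline. The exact program stays affordable for modest candidate counts and budgets.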

Token-constrained selection adapts our previous selection strategies to respect hard budget limits. For pure importance selection, we sort by importance and pack greedily. For hybrid importance-relevance selection, we rank by the combined scores instead. The result fills the available context space with high-scoring memories and never exceeds the budget, so it avoids overflow errors. Let's implement this crucial production feature.

In [11]:
def estimate_tokens(text: str) -> int:
    """
    Estimate token count for text using a rough word-based approximation.
        
    Args:
        text: Text to estimate token count for
        
    Returns:
        Estimated token count
    """
    # Rough word-based heuristic. Note: the usual rule of thumb for English is
    # ~0.75 words per token (about 1.33 tokens per word), so multiplying by 0.75
    # undercounts; use a real tokenizer when budgets must be exact
    words = len(text.split())
    return int(words * 0.75)

def select_with_token_budget(memories: List[Memory],
                            token_budget: int,
                            query: Optional[str] = None,
                            embeddings: Optional[OpenAIEmbeddings] = None,
                            importance_weight: float = 0.7,
                            relevance_weight: float = 0.3) -> Dict:
    """
    Select memories to maximize importance within a token budget constraint.
        
    Args:
        memories: List of all available memories
        token_budget: Maximum tokens to use for memory content
        query: Optional query for hybrid selection (if None, uses pure importance)
        embeddings: Embedding model required if query provided
        importance_weight: Weight for importance in hybrid scoring
        relevance_weight: Weight for relevance in hybrid scoring
        
    Returns:
        Dict containing:
        - memories: Selected memory objects
        - memory_count: Number of memories selected
        - total_tokens: Actual tokens used
        - budget_used_pct: Percentage of budget utilized
        - avg_importance: Average ranking score of selected memories (pure importance, or the combined importance-relevance score when a query is given)
    """
    # Step 1: Calculate scores for each memory - Choose scoring method based on whether query is provided
    if query and embeddings:
        # Use hybrid importance + relevance scoring
        scored = select_by_importance_and_relevance(
            memories, query, embeddings, 
            top_k=len(memories),  # Get all memories ranked, we will filter by budget
            importance_weight=importance_weight,
            relevance_weight=relevance_weight
        )
        # Extract (memory, combined_score) tuples from full results
        scored_memories = [(mem, score) for mem, score, _, _ in scored]
    else:
        # Use pure importance scoring without contextual relevance
        scored_memories = [(mem, mem.calculate_importance_score()) for mem in memories]
        # Sort by importance score in descending order
        scored_memories.sort(key=lambda x: x[1], reverse=True)
    
    # Step 2: Greedy selection to maximize value within budget - Add memories in order of score until budget is exhausted
    selected = []
    total_tokens = 0
    
    for mem, score in scored_memories:
        # Calculate tokens required for this memory
        mem_tokens = estimate_tokens(mem.content)
        
        # Only add memory if it fits within remaining budget - This is the greedy decision: add if possible, skip otherwise
        if total_tokens + mem_tokens <= token_budget:
            selected.append((mem, score, mem_tokens))
            total_tokens += mem_tokens
        # No break here: keep scanning so smaller, lower-ranked memories can still fill the remaining budget

    # Step 3: Compile results with metadata for monitoring and debugging
    return {
        "memories": [mem for mem, _, _ in selected],  # Just the memory objects
        "memory_count": len(selected),  # How many memories fit
        "total_tokens": total_tokens,  # Actual tokens used
        "budget_used_pct": (total_tokens / token_budget * 100) if token_budget > 0 else 0,  # Efficiency metric
        # Average ranking score: pure importance, or the combined importance-relevance score when a query is given
        "avg_importance": sum(score for _, score, _ in selected) / len(selected) if selected else 0
    }

The implementation provides token-aware memory selection through a greedy knapsack heuristic. It estimates the token cost of each memory, ranks memories by importance or hybrid score, and adds them in rank order while they fit. It returns the selected memories along with metadata on budget utilization and selection quality. Let's compare three budget levels: tight (30 tokens), moderate (50 tokens), and generous (100 tokens). The sample memories are short (4 to 7 estimated tokens each), so even the tight budget fits most of the store and the differences between levels are small here. With longer memories, the budget would bind much sooner.

In [12]:
# Test token-constrained selection with varying budget levels
print("Token-Constrained Memory Selection")
print("=" * 70)

# Test with tight, moderate, and generous budgets
budgets = [30, 50, 100]
query = "I need laptop recommendations"

for budget in budgets:
    # Run selection with current budget
    result = select_with_token_budget(
        sample_memories,
        token_budget=budget,
        query=query,  # Use hybrid scoring with query relevance
        embeddings=embeddings
    )
    
    print(f"\nToken Budget: {budget} tokens")
    print("-" * 70)
    print(f"Selected: {result['memory_count']} memories")
    print(f"Used: {result['total_tokens']}/{budget} tokens ({result['budget_used_pct']:.1f}%)")
    print(f"Avg importance: {result['avg_importance']:.3f}")
    print(f"\nMemories included:")

    # Show which memories fit in this budget
    for i, mem in enumerate(result['memories'], 1):
        tokens = estimate_tokens(mem.content)
        # Truncate long content for display
        print(f"  {i}. [{tokens} tokens] {mem.content[:60]}...")
    
    print()

Token-Constrained Memory Selection

Token Budget: 30 tokens
----------------------------------------------------------------------
Selected: 5 memories
Used: 26/30 tokens (86.7%)
Avg importance: 0.748

Memories included:
  1. [7 tokens] Recommended cooling pad, user purchased it and left positive...
  2. [5 tokens] User prefers eco-friendly packaging and sustainable products...
  3. [5 tokens] User's budget range is $800-$1200 for electronics...
  4. [4 tokens] User mentioned they work from home...
  5. [5 tokens] User asked about laptop specifications on 2024-01-15...


Token Budget: 50 tokens
----------------------------------------------------------------------
Selected: 6 memories
Used: 31/50 tokens (62.0%)
Avg importance: 0.708

Memories included:
  1. [7 tokens] Recommended cooling pad, user purchased it and left positive...
  2. [5 tokens] User prefers eco-friendly packaging and sustainable products...
  3. [5 tokens] User's budget range is $800-$1200 for electronics...
  4. [4 

Token-constrained selection optimizes context usage:
1. Implements a rough, word-based token estimate for each memory to approximate how much space it needs.
2. Uses a greedy algorithm that adds the highest-scoring memories first and skips any that no longer fit, keeping total usage within the budget.
3. Tests multiple budget levels showing how memory selection adapts to available space - fewer memories at tight budgets, more at generous budgets.
4. Reports budget utilization and average importance metrics, enabling monitoring of selection efficiency and quality.

This keeps production agents within their context window while prioritizing the highest-scoring memories.
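`estimate_tokens` multiplies the word count by 0.75, but the common rule of thumb for English text with OpenAI tokenizers runs the other way: about 0.75 words per token, or roughly 4 characters per token. The estimates above therefore undercount, which is the risky direction for a hard budget. Below is a hypothetical, stdlib-only alternative that errs on the high side by taking the larger of a word-based and a character-based estimate. It is a sketch, not a tokenizer. For exact counts, a real tokenizer such as `tiktoken` is the safer choice.

```python
import math


def estimate_tokens_conservative(text: str) -> int:
    """Upper-leaning token estimate (hypothetical helper, not a real tokenizer).

    Combines two English rules of thumb, ~4/3 tokens per word and ~4 characters
    per token, and returns whichever estimate is larger.
    """
    if not text.strip():
        return 0
    by_words = math.ceil(len(text.split()) * 4 / 3)
    by_chars = math.ceil(len(text) / 4)
    return max(by_words, by_chars)


text = "User prefers eco-friendly packaging and sustainable products"
naive = int(len(text.split()) * 0.75)  # same formula as estimate_tokens above
print(naive, estimate_tokens_conservative(text))  # 5 15
```

Passing this estimator to `select_with_token_budget` in place of `estimate_tokens` would generally fit fewer memories per budget than the outputs shown above, in exchange for a lower risk of overflowing the context window.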

## Part 5: Production memory selection system

Bringing together all the techniques we have explored, we can now build a production-ready memory selection system that combines importance scoring, contextual relevance, and token budget management. This system should provide a clean interface for agents, handle various memory types appropriately, track selection metadata for observability, and adapt to different use cases through configurable parameters.

The production system integrates importance calculation, hybrid scoring, token budgeting, and selection logging into a cohesive architecture. This enables agents to seamlessly access the most valuable, relevant memories while respecting context constraints and providing transparency into selection decisions.

In [13]:
class MemorySelector:
    """Production memory selection system with importance-based ranking."""
    
    def __init__(self, 
                 embeddings: OpenAIEmbeddings,
                 default_importance_weight: float = 0.6,
                 default_relevance_weight: float = 0.4):
        """
        Initialize memory selector with embedding model and default weights.
        
        Args:
            embeddings: Embedding model for semantic similarity (required for hybrid selection)
            default_importance_weight: Default weight for importance in hybrid scoring
            default_relevance_weight: Default weight for relevance in hybrid scoring
        """
        self.embeddings = embeddings
        self.importance_weight = default_importance_weight
        self.relevance_weight = default_relevance_weight
    
    def select(self,
              memories: List[Memory],
              query: Optional[str] = None,
              token_budget: Optional[int] = None,
              top_k: Optional[int] = None,
              min_importance: float = 0.0,
              memory_types: Optional[List[MemoryType]] = None) -> Dict:
        """
        Select memories using appropriate strategy based on provided parameters.
        
        Strategy selection logic:
        - If token_budget provided: Use token-constrained selection
        - Elif top_k provided: Use top-k selection (hybrid if query, pure importance otherwise)
        - Else: Return all memories passing filters
        
        Args:
            memories: All available memories to select from
            query: Optional query for hybrid relevance scoring
            token_budget: Optional maximum tokens (triggers token-constrained selection)
            top_k: Optional maximum count (triggers top-k selection)
            min_importance: Minimum importance threshold (applied before selection)
            memory_types: Optional list of types to filter to (e.g., [USER_PREFERENCE, TASK_OUTCOME])
            
        Returns:
            Dict containing:
            - selected_memories: List of selected Memory objects
            - count: Number of memories selected
            - avg_importance: Average importance of selected memories
            - metadata: Strategy-specific information (method, parameters, efficiency)
            - total_available: How many memories were available after filtering
        """
        # Step 1: Apply pre-selection filters
        # Filter by memory type if specified - Allows selecting only preferences, only outcomes, etc.
        if memory_types:
            memories = [m for m in memories if m.memory_type in memory_types]
        
        # Filter by minimum importance threshold - Removes low-value memories before selection
        memories = [m for m in memories 
                   if m.calculate_importance_score() >= min_importance]
        
        # Step 2: Select appropriate strategy based on constraints
        if token_budget is not None:
            # Use token-constrained selection to maximize value within budget
            result = select_with_token_budget(
                memories,
                token_budget=token_budget,
                query=query,  # May be None (pure importance) or provided (hybrid)
                embeddings=self.embeddings if query else None,
                importance_weight=self.importance_weight,
                relevance_weight=self.relevance_weight
            )
            selected_memories = result['memories']
            # Build metadata with token budget information
            metadata = {
                'selection_method': 'token_budget',
                'token_budget': token_budget,
                'tokens_used': result['total_tokens'],
                'budget_efficiency': result['budget_used_pct']
            }
        elif top_k is not None:
            # Use top-k selection
            if query:
                # Hybrid selection: balance importance and relevance
                scored = select_by_importance_and_relevance(
                    memories, query, self.embeddings, top_k=top_k,
                    importance_weight=self.importance_weight,
                    relevance_weight=self.relevance_weight
                )
                # Extract just the memory objects from scored results
                selected_memories = [m for m, _, _, _ in scored]
            else:
                # Pure importance selection: no query context
                selected_memories = select_by_importance(
                    memories, top_k=top_k, min_importance=min_importance
                )
            # Build metadata indicating which variant was used
            metadata = {
                'selection_method': 'top_k_hybrid' if query else 'top_k_importance',
                'top_k': top_k
            }
        else:
            # Default: return all memories that passed filters - Useful when filters alone (type, min_importance) are sufficient
            selected_memories = memories
            metadata = {
                'selection_method': 'filter_only'
            }
        
        # Step 3: Update access tracking for selected memories - This is critical for importance scoring based on usage frequency
        for mem in selected_memories:
            mem.access_count += 1  # Increment usage counter
            mem.last_accessed = datetime.now()  # Record access time
        
        # Step 4: Compile comprehensive results
        return {
            'selected_memories': selected_memories,
            'count': len(selected_memories),
            # Calculate average importance as quality metric
            'avg_importance': sum(m.calculate_importance_score() for m in selected_memories) / len(selected_memories) if selected_memories else 0,
            'metadata': metadata,  # Strategy-specific details
            'total_available': len(memories)  # For monitoring filter effectiveness
        }

The `MemorySelector` class wraps all three selection strategies (pure importance, hybrid importance-relevance, and token-constrained) behind a single `select` method, which picks a strategy from the parameters it receives. It first applies pre-selection filters for memory type and minimum importance. It then dispatches to token-budget, top-k, or filter-only selection, in that order of precedence. It also updates access tracking on every selected memory so that usage can feed into future importance calculations. Finally, it returns the results with strategy-specific metadata for monitoring and debugging. Let's try three usage patterns: query-aware selection under a token budget, type-filtered preference selection, and task-outcome selection with a minimum importance threshold.
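The dispatch precedence can be isolated into a small pure function, which makes it easy to unit-test without embeddings or memory objects. This hypothetical helper mirrors the `selection_method` labels that `MemorySelector.select` reports. It uses `is not None` checks so that an explicit 0 counts as a real constraint; a plain truthiness test such as `if token_budget:` would silently fall through to the next strategy.

```python
from typing import Optional


def choose_strategy(token_budget: Optional[int] = None,
                    top_k: Optional[int] = None,
                    query: Optional[str] = None) -> str:
    """Return the selection_method label for a set of select() arguments.

    Hypothetical helper: token_budget takes precedence over top_k, and the
    query only changes the label in the top-k branch (the token-budget branch
    uses it internally for hybrid scoring).
    """
    if token_budget is not None:  # an explicit 0 is still a (very tight) budget
        return "token_budget"
    if top_k is not None:
        return "top_k_hybrid" if query else "top_k_importance"
    return "filter_only"


print(choose_strategy(token_budget=60, query="Looking for laptop within budget"))  # token_budget
print(choose_strategy(top_k=3))  # top_k_importance
print(choose_strategy())  # filter_only
```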

In [14]:
# Create selector and test with various scenarios
selector = MemorySelector(embeddings)

print("Production Memory Selection System")
print("=" * 70)

# Test various production scenarios
scenarios = [
    {
        'name': 'Budget-constrained with query',
        'query': "Looking for laptop within budget",
        'token_budget': 60,  # Cap on estimated tokens of memory content included in context
    },
    {
        'name': 'Top-3 preferences only',
        'top_k': 3,
        'memory_types': [MemoryType.USER_PREFERENCE]  # Filter to just preferences
    },
    {
        'name': 'High-importance task outcomes',
        'top_k': 2,
        'min_importance': 0.5,  # Only high-value memories
        'memory_types': [MemoryType.TASK_OUTCOME]  # Filter to task outcomes
    },
]

# Run each scenario and display results
for scenario in scenarios:
    # Extract scenario name separately (not a select() parameter)
    name = scenario.pop('name')
    # Call unified select interface with scenario parameters
    result = selector.select(sample_memories, **scenario)
    
    print(f"\nScenario: {name}")
    print("-" * 70)
    print(f"Method: {result['metadata']['selection_method']}")
    print(f"Selected: {result['count']} of {result['total_available']} memories")
    print(f"Avg importance: {result['avg_importance']:.3f}")

    # Show token usage if applicable
    if 'token_budget' in result['metadata']:
        print(f"Tokens: {result['metadata']['tokens_used']}/{result['metadata']['token_budget']}")

    # Display selected memories
    print(f"\nSelected memories:")
    for i, mem in enumerate(result['selected_memories'], 1):
        imp = mem.calculate_importance_score()
        print(f"  {i}. [Imp: {imp:.3f}] {mem.content[:60]}...")  # Truncate long content for display
    
    print()

Production Memory Selection System

Scenario: Budget-constrained with query
----------------------------------------------------------------------
Method: token_budget
Selected: 6 of 6 memories
Avg importance: 0.628
Tokens: 31/60

Selected memories:
  1. [Imp: 0.862] User prefers eco-friendly packaging and sustainable products...
  2. [Imp: 0.864] Recommended cooling pad, user purchased it and left positive...
  3. [Imp: 0.789] User's budget range is $800-$1200 for electronics...
  4. [Imp: 0.519] User mentioned they work from home...
  5. [Imp: 0.405] User asked about laptop specifications on 2024-01-15...
  6. [Imp: 0.328] Previous recommendation for expensive laptop was rejected...


Scenario: Top-3 preferences only
----------------------------------------------------------------------
Method: top_k_importance
Selected: 2 of 2 memories
Avg importance: 0.828

Selected memories:
  1. [Imp: 0.864] User prefers eco-friendly packaging and sustainable products...
  2. [Imp: 0.791] User's 

The production system provides comprehensive memory selection capabilities:
1. Implements a unified interface supporting multiple selection strategies (token budget, top-k, filtering) through a single clean API.
2. Handles optional query-based relevance scoring, gracefully falling back to importance-only when no query is provided.
3. Supports memory type filtering and importance thresholds, enabling fine-grained control over what memories are considered.
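4. Tracks access patterns by incrementing counts and updating timestamps for every selected memory, so future importance calculations can take usage frequency into account. Because selection itself counts as an access, frequently selected memories can keep reinforcing their own scores. The small upward shifts in importance between scenarios above (for example, 0.862 to 0.864 for the eco-friendly preference) are consistent with this feedback effect.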
5. Returns rich metadata about selection method, efficiency metrics, and aggregate statistics for monitoring and debugging.

This gives agents a single, configurable entry point for memory selection, and the returned metadata makes each selection decision inspectable.