# Memory system architecture

AI agents that learn from interactions and maintain context across conversations need memory systems. Without memory, every interaction starts from scratch - agents cannot remember user preferences, recall past conversations or build on previous exchanges. With well-designed memory, agents provide personalized experiences, reference historical context and improve over time. However, poorly designed memory systems create their own challenges: unbounded growth that eventually overwhelms context windows, difficulty surfacing relevant memories when needed, and storage of trivial information that dilutes signal from important facts.

Memory system architecture is about building sustainable, efficient systems for storing, organizing and retrieving information across different timescales. This involves understanding the types of memory—episodic versus semantic, short-term versus long-term—and applying appropriate strategies for each. It requires structured storage that enables efficient querying, semantic tagging that supports similarity-based retrieval, importance scoring that prioritizes valuable information, and consolidation rules that compress memories as they age. The goal is preserving the signal that matters while discarding or compressing noise.

In this notebook, we explore systematic techniques for designing memory systems that scale effectively and retrieve intelligently. We will examine different memory types and their uses, structured storage formats that enable querying, semantic tagging and metadata systems, importance and relevance scoring, consolidation rules for memory compression, and full lifecycle management from storage through retrieval to eventual discard.

In [1]:
import os
from typing import List, Dict, Optional, Any, Tuple
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
import json

### Initialize the language model

In [2]:
# Initialize the language model
llm = ChatOpenAI(
    model="gpt-4o-mini",
    api_key=os.getenv("OPENAI_API_KEY", "").strip(),
    temperature=0  # Set to 0 for more deterministic outputs
)

embeddings = OpenAIEmbeddings(model="text-embedding-ada-002", api_key=os.getenv("OPENAI_API_KEY", "").strip())

## Part 1: Memory types and their purposes

Not all information should be stored and retrieved the same way. A fact about user preferences requires different handling than a specific conversation turn, and information needed for the current session requires different management than facts that persist across months. Understanding memory types and their appropriate uses is the foundation of effective memory system architecture.

Memory classification happens along multiple dimensions: timespan distinguishes short-term session memory from long-term persistent memory, content type separates episodic events from semantic facts, and scope differentiates single-session data from cross-session information. Each combination of these dimensions calls for different storage strategies, retrieval approaches and lifecycle management. Choosing the right memory type for each piece of information determines whether our system can find what it needs when it needs it.

When building memory systems for AI agents, we need to think about how human memory works. Our brains distinguish between remembering specific events (like "I had coffee this morning") versus general knowledge (like "I prefer coffee over tea"). Similarly, we separate what we need to remember right now in a conversation from facts we carry with us for years. This natural categorization translates directly into AI agent memory architecture. Short-term episodic memory handles conversation flow and context - those references to "it" and "that" which need immediate context. Short-term semantic memory holds session-specific facts like a current shopping budget. Long-term episodic memory preserves the timeline of interactions, while long-term semantic memory stores enduring preferences and user profiles. Each type serves a distinct purpose, and mixing them inappropriately leads to either losing critical context or drowning in irrelevant details.

### Defining memory type categories
Let's start by defining the four fundamental memory types that our system will use. These categories form the backbone of our memory architecture and determine how we handle different kinds of information.

In [3]:
class MemoryType(Enum):
    """Types of memory in AI agent systems."""
    SHORT_TERM_EPISODIC = "short_term_episodic"  # Memory that holds recent conversation turns within the current session - Cleared when the session ends
    SHORT_TERM_SEMANTIC = "short_term_semantic"  # Facts and information extracted from the current session - Cleared when the session ends
    LONG_TERM_EPISODIC = "long_term_episodic"    # Historical conversation events that persist across sessions - Stored permanently for context and history
    LONG_TERM_SEMANTIC = "long_term_semantic"    # Learned facts, preferences, and user profile information - Stored permanently and used for personalization

- Enum class: Python's `Enum` provides type-safe memory classification, preventing invalid memory types from being used.

Now let's create concrete examples for each memory type to understand what kind of information belongs in each category.

In [4]:
# Create examples for each memory type to illustrate their use cases
memory_examples = {
    # Short-term episodic: Recent conversation turns and contextual references
    MemoryType.SHORT_TERM_EPISODIC: [
        "User asked about return policy 2 messages ago",
        "Agent suggested product X in the last response",
        "Current conversation is about laptop purchases"
    ],
    # Short-term semantic: Session-specific facts and temporary information
    MemoryType.SHORT_TERM_SEMANTIC: [
        "User's budget is $1500",
        "User prefers Windows over Mac",
        "User needs laptop for video editing"
    ],
    # Long-term episodic: Historical events and interaction timeline
    MemoryType.LONG_TERM_EPISODIC: [
        "User purchased Laptop Pro X1 on Jan 15, 2024",
        "User contacted support about shipping delay on Jan 20",
        "User left 5-star review on Jan 25"
    ],
    # Long-term semantic: Persistent user profile and learned preferences
    MemoryType.LONG_TERM_SEMANTIC: [
        "User prefers email over phone contact",
        "User is a software developer",
        "User lives in California (PST timezone)"
    ]
}

# Display the examples in a readable format
print("Memory Types and Examples:")
print("="*60)
# Iterate through each memory type and display its examples
for mem_type, examples in memory_examples.items():
    # Format the memory type name for display
    print(f"\n{mem_type.value.upper().replace('_', ' ')}:")
    for example in examples:
        print(f"  • {example}")

Memory Types and Examples:
============================================================

SHORT TERM EPISODIC:
  • User asked about return policy 2 messages ago
  • Agent suggested product X in the last response
  • Current conversation is about laptop purchases

SHORT TERM SEMANTIC:
  • User's budget is $1500
  • User prefers Windows over Mac
  • User needs laptop for video editing

LONG TERM EPISODIC:
  • User purchased Laptop Pro X1 on Jan 15, 2024
  • User contacted support about shipping delay on Jan 20
  • User left 5-star review on Jan 25

LONG TERM SEMANTIC:
  • User prefers email over phone contact
  • User is a software developer
  • User lives in California (PST timezone)


- The dictionary structure makes it easy to extend with more examples or modify existing ones.

Key Differences:
- Short-term: Cleared after session ends
- Long-term: Persists across sessions
- Episodic: Specific events with context
- Semantic: General facts without temporal context
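
The short-term versus long-term split has a direct lifecycle consequence, which can be sketched as a session-end cleanup. The helpers `is_long_term`, `is_episodic` and `end_session` are hypothetical names introduced for illustration, and the enum is repeated only so the snippet runs on its own.

```python
from enum import Enum
from typing import List, Tuple


class MemoryType(Enum):
    """Same four members as the notebook's enum, repeated so this runs standalone."""
    SHORT_TERM_EPISODIC = "short_term_episodic"
    SHORT_TERM_SEMANTIC = "short_term_semantic"
    LONG_TERM_EPISODIC = "long_term_episodic"
    LONG_TERM_SEMANTIC = "long_term_semantic"


def is_long_term(mem_type: MemoryType) -> bool:
    """Hypothetical helper: True for types that persist across sessions."""
    return mem_type.value.startswith("long_term")


def is_episodic(mem_type: MemoryType) -> bool:
    """Hypothetical helper: True for event-style memories."""
    return mem_type.value.endswith("episodic")


def end_session(memories: List[Tuple[MemoryType, str]]) -> List[Tuple[MemoryType, str]]:
    """Drop short-term entries when a session ends; keep long-term ones."""
    return [(t, text) for t, text in memories if is_long_term(t)]


session_memories = [
    (MemoryType.SHORT_TERM_EPISODIC, "Agent suggested product X in the last response"),
    (MemoryType.SHORT_TERM_SEMANTIC, "User's budget is $1500"),
    (MemoryType.LONG_TERM_EPISODIC, "User left 5-star review on Jan 25"),
    (MemoryType.LONG_TERM_SEMANTIC, "User is a software developer"),
]

kept = end_session(session_memories)
for mem_type, text in kept:
    kind = "event" if is_episodic(mem_type) else "fact"
    print(f"kept {kind}: {text}")
```

Only the two long-term entries survive the cleanup. A real system would probably promote some short-term facts to long-term storage before clearing them, which this sketch does not attempt.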
  
### When to use each memory type
Knowing the memory types is only half the battle - the real skill lies in deciding which type to use for each piece of information. When a user says "I prefer email contact," should that go into short-term semantic memory for this session, or long-term semantic memory to persist across all future interactions? When they mention "I bought a laptop last month," is that a long-term episodic event to record, or just conversational noise to ignore? The decision framework for memory type selection depends on answering a few key questions: Is this information session-specific or should it persist? Is it a specific event or a general fact? How important is it for future personalization? Let's build a decision matrix that maps common use cases to appropriate memory types.

In [5]:
# Decision matrix for memory type selection - This helps determine which memory type to use in different scenarios
memory_use_cases = {
    "Maintaining conversation flow": MemoryType.SHORT_TERM_EPISODIC,  # Conversation flow requires short-term episodic for immediate context
    "Tracking user preferences in session": MemoryType.SHORT_TERM_SEMANTIC,  # Session preferences are short-term semantic (cleared after session)
    "Referencing past purchases": MemoryType.LONG_TERM_EPISODIC,  # Historical transactions are long-term episodic events
    "Personalizing future interactions": MemoryType.LONG_TERM_SEMANTIC,  # Persistent preferences are long-term semantic facts
    "Understanding context of 'it' or 'that'": MemoryType.SHORT_TERM_EPISODIC,  # Pronoun resolution needs recent conversation context
    "Remembering customer communication preferences": MemoryType.LONG_TERM_SEMANTIC,  # Communication preferences persist across all sessions
}

print("Memory Type Selection Guide:")
print("="*60)
for use_case, mem_type in memory_use_cases.items():
    print(f"\n{use_case}:")
    print(f"  → Use {mem_type.value}")

Memory Type Selection Guide:
============================================================

Maintaining conversation flow:
  → Use short_term_episodic

Tracking user preferences in session:
  → Use short_term_semantic

Referencing past purchases:
  → Use long_term_episodic

Personalizing future interactions:
  → Use long_term_semantic

Understanding context of 'it' or 'that':
  → Use short_term_episodic

Remembering customer communication preferences:
  → Use long_term_semantic


This creates a decision matrix that maps common agent use cases to their appropriate memory types.
- Dictionary structure: Keys are use case descriptions, values are the recommended `MemoryType`.
- Categorization logic: Each use case is analyzed for persistence needs (short vs long-term) and information type (episodic vs semantic).

The dictionary acts as a lookup table for memory type selection during runtime. This pattern can be extended to include scoring logic or priority weighting for cases where multiple memory types might apply. In production systems, this matrix could be used by an automated classifier that determines memory type based on content analysis. The simple key-value structure makes it easy to add new use cases or modify existing mappings without changing code logic.
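
One way the lookup-table idea might be turned into a rule is to key the table on the two yes/no questions from the decision framework (does it persist, is it a specific event). This is a minimal sketch under that assumption: `TYPE_BY_ANSWERS` and `select_memory_type` are hypothetical names, the importance question is left out, and the enum is repeated so the snippet runs standalone.

```python
from enum import Enum
from typing import Dict, Tuple


class MemoryType(Enum):
    """Same four members as the notebook's enum, repeated so this runs standalone."""
    SHORT_TERM_EPISODIC = "short_term_episodic"
    SHORT_TERM_SEMANTIC = "short_term_semantic"
    LONG_TERM_EPISODIC = "long_term_episodic"
    LONG_TERM_SEMANTIC = "long_term_semantic"


# (persists_across_sessions, is_specific_event) -> memory type
TYPE_BY_ANSWERS: Dict[Tuple[bool, bool], MemoryType] = {
    (False, True): MemoryType.SHORT_TERM_EPISODIC,
    (False, False): MemoryType.SHORT_TERM_SEMANTIC,
    (True, True): MemoryType.LONG_TERM_EPISODIC,
    (True, False): MemoryType.LONG_TERM_SEMANTIC,
}


def select_memory_type(persists_across_sessions: bool, is_specific_event: bool) -> MemoryType:
    """Hypothetical selector: map the two answers to one of the four types."""
    return TYPE_BY_ANSWERS[(persists_across_sessions, is_specific_event)]


# "I prefer email contact": should persist, and is a general fact
email_preference = select_memory_type(persists_across_sessions=True, is_specific_event=False)

# "I bought a laptop last month": should persist, and is a specific event
past_purchase = select_memory_type(persists_across_sessions=True, is_specific_event=True)

print(email_preference.value)
print(past_purchase.value)
```

Answering the two questions is still the hard part. In practice that step might be delegated to a classifier or an LLM call, with this table only translating its answers into a type.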

## Part 2: Structured storage formats
Memory must be stored in structured formats that facilitate efficient retrieval and updates. The difference between effective and ineffective memory systems often comes down to how the information is stored. Structured storage means treating each memory as a distinct entity with properties rather than cramming everything into freeform text. Each memory needs an identifier for updates, a timestamp for recency tracking, importance scores for prioritization, tags for categorization and metadata for rich filtering. This structure enables operations that would be impossible with unstructured storage: "retrieve all memories about laptops from the last week with importance above 0.8" becomes a simple query rather than a text-parsing nightmare. Let's see the stark difference between these approaches.

### Bad approach: Unstructured text storage
Let's first look at what happens when we store memory as unstructured text. This approach might seem simple initially, but quickly becomes unmaintainable as the system scales.

In [6]:
# Example of unstructured memory storage - everything in one text blob
# This demonstrates the problematic approach where all memories are concatenated as plain text
unstructured_memory = """
User asked about laptops. They want something for video editing. Budget is around $1500.
They previously bought a tablet from us. They prefer email contact. They mentioned they're
a software developer. They asked about the return policy. They live in California.
"""

print("❌ Unstructured memory storage:")
print(unstructured_memory)

❌ Unstructured memory storage:

User asked about laptops. They want something for video editing. Budget is around $1500.
They previously bought a tablet from us. They prefer email contact. They mentioned they're
a software developer. They asked about the return policy. They live in California.



Problems:
- Can't easily update specific facts.
- Hard to search for specific information.
- No way to prioritize important vs trivial facts.
- Can't filter by recency or relevance.
- Difficult to remove outdated information.
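
The update problem can be made concrete with a small sketch. The text and figures below are made up for illustration: a naive string replacement meant to change only the budget also rewrites an unrelated price, while a keyed structure changes exactly one fact.

```python
# Free-text notes in which the same figure happens to appear twice (made-up example)
notes = "Budget is around $1500. They previously bought a tablet for $1500."

# Naive update: the user raised their budget to $2000
updated_notes = notes.replace("$1500", "$2000")
print(updated_notes)  # the tablet price was changed as well

# The same facts as keyed entries: only the targeted fact changes
facts = {"budget": "$1500", "tablet_price": "$1500"}
facts["budget"] = "$2000"
print(facts)
```

A more careful regular expression could rescue this particular case, but every new phrasing in the text would need its own pattern.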

### Good approach: Structured memory with schemas
The solution is to define a structured schema for memory entries using Pydantic models. Each memory becomes a first-class object with well-defined fields: a unique identifier, the memory type classification, the actual content, timestamp for temporal tracking, importance score for prioritization, tags for categorization and arbitrary metadata for extensibility. This structure transforms memory from opaque text into queryable, updateable data. Let's first define our memory entry schema.

In [7]:
class MemoryEntry(BaseModel):
    """Structured memory entry with metadata and validation."""
    
    id: str  # Unique identifier for this memory entry, used for updates and retrieval
    memory_type: MemoryType  # Type of memory (short/long-term, episodic/semantic)
    content: str  # The actual content/text of the memory
    timestamp: datetime  # When this memory was created or last updated
    importance: float = Field(ge=0.0, le=1.0, description="Importance score 0-1")  # Importance score between 0.0 (trivial) and 1.0 (critical)
    tags: List[str] = Field(default_factory=list)  # List of tags for categorization and retrieval
    metadata: Dict[str, Any] = Field(default_factory=dict)  # Arbitrary metadata for additional context - Can store source, category, verification status, etc.
    
    # No model configuration is set here, so memory_type stays a MemoryType member;
    # the equality checks in MemoryStore.get_by_type and ImportanceScorer rely on that.

The schema makes memories database-ready - this structure could map directly to database columns or document fields.
- Pydantic BaseModel: Provides automatic validation, type checking, and serialization for memory entries.
- Field constraints: The importance field uses Pydantic's `Field` with `ge` (greater-than-or-equal) and `le` (less-than-or-equal) validators to enforce 0.0-1.0 range.
- Default factories: `default_factory=list` and `default_factory=dict` ensure each instance gets its own mutable default values.
- Enum handling: `memory_type` is stored as a `MemoryType` member rather than a string, which is why the output below prints `MemoryType.SHORT_TERM_SEMANTIC`. In Pydantic v2, setting `model_config = ConfigDict(use_enum_values=True)` would store the string value instead, but comparisons against `MemoryType` members would then stop matching.
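
The default-factory and range-constraint ideas can be tried without Pydantic. The sketch below swaps in the standard library's `dataclasses` and a manual check in `__post_init__`, so it illustrates the behavior rather than Pydantic's own validation; `Note` is a hypothetical stand-in, not the `MemoryEntry` model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Note:
    """Hypothetical stand-in for a memory entry, using stdlib dataclasses."""
    content: str
    importance: float = 0.5
    tags: List[str] = field(default_factory=list)  # fresh list per instance

    def __post_init__(self) -> None:
        # Manual counterpart of a 0.0-1.0 range constraint
        if not 0.0 <= self.importance <= 1.0:
            raise ValueError(f"importance must be in [0, 1], got {self.importance}")


first = Note("User's budget is $1500")
second = Note("User prefers email contact")
first.tags.append("budget")  # must not leak into `second`

try:
    Note("out of range", importance=1.5)
    rejected = False
except ValueError as err:
    rejected = True
    print(f"rejected: {err}")

print(first.tags, second.tags)
```

Each instance keeps its own tag list, and an out-of-range importance is rejected at construction time, which is the same guarantee the schema's `Field` constraints aim for.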

Now let's create some structured memory entries using this schema. We will create memories representing the same information from the unstructured example, but this time with proper structure.

In [8]:
# Create structured memory entries with all required fields
structured_memories = [
    MemoryEntry(
        id="mem_001",
        memory_type=MemoryType.SHORT_TERM_SEMANTIC,
        content="User's budget is $1500 for laptop purchase",
        timestamp=datetime.now(),
        importance=0.9,
        tags=["budget", "purchase", "laptop"],
        metadata={"category": "preference", "context": "current_session"}
    ),
    MemoryEntry(
        id="mem_002",
        memory_type=MemoryType.LONG_TERM_SEMANTIC,
        content="User prefers email contact over phone",
        timestamp=datetime.now() - timedelta(days=30),
        importance=0.7,
        tags=["communication", "preference"],
        metadata={"category": "user_preference", "verified": True}
    ),
    MemoryEntry(
        id="mem_003",
        memory_type=MemoryType.LONG_TERM_EPISODIC,
        content="User purchased Tablet Mini on 2024-01-15 for $499",
        timestamp=datetime(2024, 1, 15),
        importance=0.8,
        tags=["purchase", "tablet", "transaction"],
        metadata={"order_id": "ORD-12345", "amount": 499.00}
    ),
]

print("✅ Structured memory entries:")
print("="*60)
for mem in structured_memories:
    print(f"\nID: {mem.id}")
    print(f"Type: {mem.memory_type}")
    print(f"Content: {mem.content}")
    print(f"Importance: {mem.importance}")
    print(f"Tags: {', '.join(mem.tags)}")
    print(f"Metadata: {mem.metadata}")

✅ Structured memory entries:
============================================================

ID: mem_001
Type: MemoryType.SHORT_TERM_SEMANTIC
Content: User's budget is $1500 for laptop purchase
Importance: 0.9
Tags: budget, purchase, laptop
Metadata: {'category': 'preference', 'context': 'current_session'}

ID: mem_002
Type: MemoryType.LONG_TERM_SEMANTIC
Content: User prefers email contact over phone
Importance: 0.7
Tags: communication, preference
Metadata: {'category': 'user_preference', 'verified': True}

ID: mem_003
Type: MemoryType.LONG_TERM_EPISODIC
Content: User purchased Tablet Mini on 2024-01-15 for $499
Importance: 0.8
Tags: purchase, tablet, transaction
Metadata: {'order_id': 'ORD-12345', 'amount': 499.0}


Benefits:
- Easy to query by type, tags, or importance
- Can update individual entries
- Metadata enables rich filtering
- Importance scores for prioritization

Compared to the unstructured approach, we can now: update specific memories by ID, query by importance threshold, filter by tags, sort by timestamp and aggregate by metadata categories.
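
Each of those operations is a one-liner once entries are objects. The following is a minimal sketch using a hypothetical stdlib `Entry` dataclass instead of the Pydantic model, with fixed made-up timestamps so the ordering is reproducible.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Entry:
    """Hypothetical stand-in for a structured memory entry."""
    id: str
    content: str
    timestamp: datetime
    importance: float
    tags: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


entries = [
    Entry("mem_001", "User's budget is $1500 for laptop purchase",
          datetime(2024, 3, 1), 0.9, ["budget", "purchase", "laptop"],
          {"category": "preference"}),
    Entry("mem_002", "User prefers email contact over phone",
          datetime(2024, 2, 1), 0.7, ["communication", "preference"],
          {"category": "user_preference"}),
    Entry("mem_003", "User purchased Tablet Mini on 2024-01-15 for $499",
          datetime(2024, 1, 15), 0.8, ["purchase", "tablet", "transaction"],
          {"order_id": "ORD-12345", "amount": 499.00}),
]

# Update a specific memory by ID
by_id = {e.id: e for e in entries}
by_id["mem_001"].content = "User's budget is $2000 for laptop purchase"

# Query by importance threshold
important_ids = [e.id for e in entries if e.importance >= 0.8]

# Filter by tag
purchase_ids = [e.id for e in entries if "purchase" in e.tags]

# Sort by timestamp, newest first
newest_first = [e.id for e in sorted(entries, key=lambda e: e.timestamp, reverse=True)]

# Aggregate by metadata category (entries without one count as "unknown")
per_category = Counter(e.metadata.get("category", "unknown") for e in entries)

print(important_ids, purchase_ids, newest_first, dict(per_category))
```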

## Part 3: Semantic tagging and metadata systems
Tags and metadata enable efficient retrieval and organization of memories. Having structured memory entries is valuable, but without effective tagging and indexing, retrieval still requires scanning through every single memory. Imagine having thousands of structured memories but needing to iterate through all of them to find those related to "laptop purchases" - the structure helps, but we are still doing linear search. Semantic tagging solves this by creating an inverted index: instead of going from memory to tags, we go from tags to memories.

Good tagging systems operate on multiple levels. Primary tags capture the main topic - "laptop", "budget", "purchase". Secondary tags might capture sentiment, context or relationships. Metadata provides a catch-all for additional structure that doesn't fit neatly into tags: verification status, source system, related entities, confidence scores. Together, tags and metadata transform memory retrieval from "find a needle in a haystack" to "open the drawer labeled 'needles'." Let's build a memory store with tag-based indexing.

### Building a memory store with semantic indexing
We will create a `MemoryStore` class that maintains both the memories themselves and inverted indices for fast tag-based lookup. Let's start by defining the class structure and initialization.

In [9]:
class MemoryStore:
    """In-memory storage with semantic tagging."""
    
    def __init__(self):
        self.memories: List[MemoryEntry] = []  # Main storage: list of all memory entries
        # Inverted index: maps each tag to list of memory IDs that have that tag
        self.tag_index: Dict[str, List[str]] = {}  # tag -> [memory_ids]
    
    def add_memory(self, memory: MemoryEntry):
        """Add memory and update tag index."""
        # Store the memory in our main list
        self.memories.append(memory)
        
        # Update the inverted index for each tag - This allows fast retrieval by tag without scanning all memories
        for tag in memory.tags:
            # Create a new list for this tag if it doesn't exist
            if tag not in self.tag_index:
                self.tag_index[tag] = []
            # Add this memory's ID to the tag's index
            self.tag_index[tag].append(memory.id)
    
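    def get_by_tags(self, tags: List[str], match_all: bool = False) -> List[MemoryEntry]:
        """Retrieve memories by tags."""
        # An empty tag list matches nothing (and avoids an IndexError on tags[0] below)
        if not tags:
            return []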
        if match_all:
            # AND logic: memory must have ALL specified tags
            # Start with memories that have the first tag
            matching_ids = set(self.tag_index.get(tags[0], []))
            # Intersect with memories that have each subsequent tag
            for tag in tags[1:]:
                matching_ids &= set(self.tag_index.get(tag, []))
        else:
            # OR logic: memory must have AT LEAST ONE tag
            matching_ids = set()
            # Union all memory IDs across all specified tags
            for tag in tags:
                matching_ids |= set(self.tag_index.get(tag, []))

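        # Convert memory IDs back to memory objects. Note that this step still scans the full list;
        # keeping an id -> memory dict alongside the list would avoid the scan.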
        return [m for m in self.memories if m.id in matching_ids]
    
    def get_by_type(self, memory_type: MemoryType) -> List[MemoryEntry]:
        """Retrieve memories by type."""
        # Filter memories by their type attribute
        return [m for m in self.memories if m.memory_type == memory_type]
    
    def get_by_importance(self, min_importance: float) -> List[MemoryEntry]:
        """Retrieve high-importance memories."""
        # Filter memories with importance >= threshold
        high_importance = [m for m in self.memories if m.importance >= min_importance]
        
        # Sort by importance in descending order (most important first)
        return sorted(high_importance, key=lambda x: x.importance, reverse=True)

Now let's test our MemoryStore by adding the structured memories we created earlier.

In [10]:
# Create a new memory store instance
store = MemoryStore()

# Add our previously created structured memories to the store
for mem in structured_memories:
    store.add_memory(mem)

print(f"Added {len(structured_memories)} memories to the store")
print(f"Tag index now contains {len(store.tag_index)} unique tags")

Added 3 memories to the store
Tag index now contains 7 unique tags


Let's add a few more memories to make our testing more interesting.

In [11]:
# Create additional memories with different tags and attributes
additional_memories = [
    MemoryEntry(
        id="mem_004",
        memory_type=MemoryType.SHORT_TERM_SEMANTIC,
        content="User needs laptop for video editing",
        timestamp=datetime.now(),
        importance=0.95,
        tags=["requirement", "laptop", "video_editing"],
        metadata={"category": "use_case"}
    ),
    MemoryEntry(
        id="mem_005",
        memory_type=MemoryType.LONG_TERM_SEMANTIC,
        content="User is a software developer",
        timestamp=datetime.now() - timedelta(days=60),
        importance=0.6,
        tags=["occupation", "profile"],
        metadata={"category": "user_profile"}
    ),
]

# Add the new memories to our store
for mem in additional_memories:
    store.add_memory(mem)

print(f"Total memories in store: {len(store.memories)}")
print(f"Total unique tags: {len(store.tag_index)}")
print(f"Tags: {', '.join(sorted(store.tag_index.keys()))}")

Total memories in store: 5
Total unique tags: 11
Tags: budget, communication, laptop, occupation, preference, profile, purchase, requirement, tablet, transaction, video_editing


### Testing tag-based retrieval
Now let's test our tag-based retrieval system to see how efficiently we can find memories by their tags.

In [12]:
# Test retrieval by tags
print("Retrieval by tags:")
print("="*60)

# Find all memories tagged with "laptop"
laptop_memories = store.get_by_tags(["laptop"])
print(f"\nMemories tagged 'laptop' ({len(laptop_memories)}):")
for mem in laptop_memories:
    print(f"  • {mem.content}")

# Find all memories tagged with "purchase"
purchase_memories = store.get_by_tags(["purchase"])
print(f"\nMemories tagged 'purchase' ({len(purchase_memories)}):")
for mem in purchase_memories:
    print(f"  • {mem.content}")

Retrieval by tags:
============================================================

Memories tagged 'laptop' (2):
  • User's budget is $1500 for laptop purchase
  • User needs laptop for video editing

Memories tagged 'purchase' (2):
  • User's budget is $1500 for laptop purchase
  • User purchased Tablet Mini on 2024-01-15 for $499
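
The cell above only uses single-tag lookups, so the difference between OR and AND matching never shows. A minimal sketch of both modes follows, using a hypothetical `TagIndex` class (not the notebook's `MemoryStore`) that also keeps an id-to-content dict so results are resolved without scanning a list.

```python
from collections import defaultdict
from typing import Dict, List, Set


class TagIndex:
    """Hypothetical mini-store: tag -> ids, plus id -> content for direct lookup."""

    def __init__(self) -> None:
        self.by_id: Dict[str, str] = {}
        self.by_tag: Dict[str, Set[str]] = defaultdict(set)

    def add(self, mem_id: str, content: str, tags: List[str]) -> None:
        self.by_id[mem_id] = content
        for tag in tags:
            self.by_tag[tag].add(mem_id)

    def find(self, tags: List[str], match_all: bool = False) -> List[str]:
        """Return sorted ids matching ALL tags (AND) or ANY tag (OR)."""
        if not tags:
            return []
        id_sets = [self.by_tag.get(tag, set()) for tag in tags]
        ids = set.intersection(*id_sets) if match_all else set.union(*id_sets)
        return sorted(ids)


index = TagIndex()
index.add("mem_001", "User's budget is $1500 for laptop purchase", ["budget", "purchase", "laptop"])
index.add("mem_003", "User purchased Tablet Mini on 2024-01-15 for $499", ["purchase", "tablet", "transaction"])
index.add("mem_004", "User needs laptop for video editing", ["requirement", "laptop", "video_editing"])

any_match = index.find(["laptop", "purchase"])                   # OR: either tag
all_match = index.find(["laptop", "purchase"], match_all=True)   # AND: both tags

print("OR :", [index.by_id[i] for i in any_match])
print("AND:", [index.by_id[i] for i in all_match])
```

With these three entries, OR matching returns all of them, while AND matching returns only `mem_001`, the one entry carrying both tags.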


### Testing importance-based retrieval
We can also filter memories by importance to prioritize critical information.

In [13]:
# Retrieve memories above importance threshold
# This is useful when context window is limited and we need only critical info
high_importance = store.get_by_importance(0.8)

print(f"High-importance memories (≥0.8) ({len(high_importance)}):")
for mem in high_importance:
    # Display with importance score for context
    print(f"  • [{mem.importance}] {mem.content}")

High-importance memories (≥0.8) (3):
  • [0.95] User needs laptop for video editing
  • [0.9] User's budget is $1500 for laptop purchase
  • [0.8] User purchased Tablet Mini on 2024-01-15 for $499


### Testing type-based retrieval
Finally, let's retrieve memories by their type classification.

In [14]:
# Retrieve all long-term semantic memories (persistent user facts/preferences)
semantic_memories = store.get_by_type(MemoryType.LONG_TERM_SEMANTIC)

print(f"Long-term semantic memories ({len(semantic_memories)}):")
for mem in semantic_memories:
    print(f"  • {mem.content}")

Long-term semantic memories (2):
  • User prefers email contact over phone
  • User is a software developer


In production, this pattern could be extended with vector similarity search using embeddings for semantic retrieval beyond exact tag matching.
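
As a rough illustration of that extension, the sketch below ranks memories by cosine similarity. The three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` and `top_k` are hypothetical helpers; in the notebook the vectors would presumably come from the `embeddings` object instead.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: List[float], vectors: Dict[str, List[float]], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k most similar memories as (score, text) pairs, best first."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in vectors.items()]
    return sorted(scored, reverse=True)[:k]


# Toy vectors standing in for real embeddings (values are made up)
memory_vectors = {
    "User needs laptop for video editing": [0.9, 0.1, 0.0],
    "User prefers email contact over phone": [0.0, 0.2, 0.9],
    "User's budget is $1500 for laptop purchase": [0.7, 0.6, 0.1],
}
query_vector = [1.0, 0.2, 0.0]  # pretend embedding of a laptop-related question

results = top_k(query_vector, memory_vectors)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Unlike exact tag matching, this ranking would still surface a laptop memory for a query that says "notebook computer", provided the embedding model places the two phrases close together.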

## Part 4: Importance and relevance scoring

Not all memories are equally important. Scoring helps prioritize what to retrieve and what to discard. Imagine having hundreds of memories - some critical ("user's budget is $1500"), some useful ("user prefers email"), and some trivial ("user said thanks"). Without importance scoring, retrieval treats them equally, wasting context window on low-value information. The solution is multi-dimensional scoring that considers intrinsic importance (how critical is this fact?), recency (when was this created?), and relevance (how related is this to the current query?).

Good scoring systems combine multiple signals. Base importance comes from content analysis - keywords like "budget", "need", "must" signal high importance. Recency scoring applies exponential decay - recent memories score higher, old ones fade. Relevance scoring matches memory tags against query context. The composite score weights these factors appropriately: maybe 40% importance, 30% recency, 30% relevance. This multi-factor approach ensures we surface the right memories at the right time. Let's start by seeing the problem without scoring.

In [15]:
# Example of retrieving ALL memories without any prioritization
# This demonstrates the problem: important and trivial facts mixed together
def retrieve_all_memories(store: MemoryStore) -> str:
    """Retrieve all memories without prioritization."""
    # Simply concatenate all memory contents
    return "\n".join([m.content for m in store.memories])

# Get all memories from our store
all_memories = retrieve_all_memories(store)

print("❌ All memories (no prioritization):")
print(all_memories)
print(f"\nTotal characters: {len(all_memories)}")

❌ All memories (no prioritization):
User's budget is $1500 for laptop purchase
User prefers email contact over phone
User purchased Tablet Mini on 2024-01-15 for $499
User needs laptop for video editing
User is a software developer

Total characters: 195


Problems:
- Trivial facts mixed with critical ones.
- Wastes context window on low-value info.
- No way to prioritize when space is limited.

### Implementing importance scoring
Now we will build a comprehensive scoring system that evaluates memories across multiple dimensions: intrinsic importance based on content keywords, recency using exponential time decay, relevance through tag matching and a composite score that combines all factors. This multi-dimensional approach ensures we retrieve the most valuable memories for any given context.

In [16]:
class ImportanceScorer:
    """Calculate importance scores for memories using multiple signals."""
    
    @staticmethod
    def calculate_base_importance(memory: MemoryEntry) -> float:
        """Calculate intrinsic importance based on content keywords."""
        content_lower = memory.content.lower()
        
        # Keywords indicating high-importance information - These suggest critical user requirements or decisions
        high_importance_keywords = [
            'purchase', 'order', 'budget', 'requirement', 
            'need', 'must', 'critical', 'urgent'
        ]
        
        # Keywords indicating medium-importance preferences - These suggest user preferences but not requirements
        medium_importance_keywords = [
            'prefer', 'like', 'want', 'interested'
        ]
        
        # Start with base score of 0.5 (neutral importance)
        score = 0.5

        # Add bonus for high-importance keywords
        for keyword in high_importance_keywords:
            if keyword in content_lower:
                score += 0.2

        # Add smaller bonus for medium-importance keywords
        for keyword in medium_importance_keywords:
            if keyword in content_lower:
                score += 0.1
        
        # Long-term semantic memories (user profile facts) are generally more important - These are persistently valuable across sessions
        if memory.memory_type == MemoryType.LONG_TERM_SEMANTIC:
            score += 0.1
        
        return min(score, 1.0)
    
    @staticmethod
    def calculate_recency_score(memory: MemoryEntry) -> float:
        """Calculate score based on recency using exponential decay."""
        # Calculate how many days old this memory is
        age_days = (datetime.now() - memory.timestamp).days
        
        # Apply exponential decay: score = e^(-age / half_life)
        # Despite its name, half_life acts as the decay time constant here:
        # the score is e^-1 (about 0.37) after 30 days and reaches 0.5 after about 21 days
        half_life = 30  # days
        import math
        recency_score = math.exp(-age_days / half_life)
        
        return recency_score
    
    @staticmethod
    def calculate_relevance_score(memory: MemoryEntry, query_tags: List[str]) -> float:
        """Calculate relevance to current query using tag overlap."""
        # If no query tags provided, use neutral relevance
        if not query_tags:
            return 0.5
        
        # Convert to sets for efficient set operations
        memory_tags = set(memory.tags)
        query_tag_set = set(query_tags)

        # If the memory has no tags, use neutral relevance
        # (query_tags is known to be non-empty at this point)
        if not memory_tags:
            return 0.5

        # Calculate Jaccard similarity: intersection / union
        intersection = len(memory_tags & query_tag_set)
        union = len(memory_tags | query_tag_set)
        
        return intersection / union if union > 0 else 0.0
    
    @staticmethod
    def calculate_composite_score(memory: MemoryEntry,
                                  query_tags: Optional[List[str]] = None,
                                  weights: Optional[Dict[str, float]] = None) -> float:
        """Calculate composite score from multiple factors."""
        # Default weights: balance importance, recency and relevance
        if weights is None:
            weights = {
                'importance': 0.4,  # 40% weight on intrinsic importance
                'recency': 0.3,     # 30% weight on how recent
                'relevance': 0.3    # 30% weight on query relevance
            }
        
        query_tags = query_tags or []

        # Calculate all three score components
        scores = {
            'importance': memory.importance,  # Use pre-calculated importance
            'recency': ImportanceScorer.calculate_recency_score(memory),
            'relevance': ImportanceScorer.calculate_relevance_score(memory, query_tags)
        }

        # Weighted sum of all components
        # composite = (0.4 × importance) + (0.3 × recency) + (0.3 × relevance)
        composite = sum(scores[k] * weights[k] for k in weights)
        
        return composite

We have built a multi-dimensional scoring system that evaluates memories using:
- Base importance: Keyword-based analysis assigns scores based on content criticality.
- Recency scoring: Exponential decay (e^(-t/τ)) reduces scores for older memories. The time constant τ is fixed at 30 days in this implementation, so a memory's recency score falls to about 0.37 after 30 days. The decay is smooth, with no hard cutoff.
- Relevance scoring: Jaccard similarity between memory tags and query tags measures contextual relevance. Jaccard similarity handles partial tag matches - memories with some relevant tags score higher than completely unrelated ones.
- Composite scoring: Weighted combination of all three factors produces final ranking score.

The scoring separates concerns: each method handles one dimension, making it easy to modify or extend individual scoring components. In production, base importance could use ML models instead of keywords, and relevance could use embedding similarity for semantic matching.
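
To make the decay parameter concrete, the sketch below contrasts the time-constant form used in `calculate_recency_score` with a true half-life form. The function names `decay_time_constant` and `decay_half_life` are illustrative assumptions and are not part of the notebook's classes.

```python
import math

def decay_time_constant(age_days: float, tau: float = 30.0) -> float:
    """Exponential decay e^(-age/tau); falls to e^-1 (about 0.37) when age == tau."""
    return math.exp(-age_days / tau)

def decay_half_life(age_days: float, half_life: float = 30.0) -> float:
    """Half-life decay 0.5^(age/half_life); falls to 0.5 when age == half_life."""
    return 0.5 ** (age_days / half_life)

# Compare the two forms at a few ages
for age in (0, 30, 60, 90):
    print(f"age={age:>2}d  time-constant={decay_time_constant(age):.3f}  "
          f"half-life={decay_half_life(age):.3f}")
```

The two forms describe the same curve under a change of parameter: setting `tau = half_life / ln(2)` makes them identical. Which one to expose is mostly a question of which number is easier to reason about when tuning.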

Now let's test our scoring system by scoring all memories in the context of a specific query.

In [17]:
# Define the current query context using tags
# This simulates a user asking about laptop purchases
query_tags = ["laptop", "purchase"]

print("Importance Scoring:")
print("="*60)
print(f"Query context tags: {query_tags}\n")

# Calculate composite scores for all memories
scored_memories = []
for mem in store.memories:
    # Get the composite score considering importance, recency, and relevance
    score = ImportanceScorer.calculate_composite_score(mem, query_tags)
    scored_memories.append((mem, score))

# Sort memories by score in descending order (highest scores first)
scored_memories.sort(key=lambda x: x[1], reverse=True)

# Display memories ranked by their composite scores
print("\nMemories ranked by composite score:")
for mem, score in scored_memories:
    print(f"\n[Score: {score:.3f}] {mem.content}")
    print(f"  Importance: {mem.importance:.2f} | Age: {(datetime.now() - mem.timestamp).days} days")
    print(f"  Tags: {', '.join(mem.tags)}")

Importance Scoring:
Query context tags: ['laptop', 'purchase']


Memories ranked by composite score:

[Score: 0.860] User's budget is $1500 for laptop purchase
  Importance: 0.90 | Age: 0 days
  Tags: budget, purchase, laptop

[Score: 0.755] User needs laptop for video editing
  Importance: 0.95 | Age: 0 days
  Tags: requirement, laptop, video_editing

[Score: 0.395] User purchased Tablet Mini on 2024-01-15 for $499
  Importance: 0.80 | Age: 693 days
  Tags: purchase, tablet, transaction

[Score: 0.390] User prefers email contact over phone
  Importance: 0.70 | Age: 30 days
  Tags: communication, preference

[Score: 0.281] User is a software developer
  Importance: 0.60 | Age: 60 days
  Tags: occupation, profile
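
As a check on the ranking above, the top score can be reproduced by hand. The sketch below recomputes it with plain functions (`jaccard` and `composite` are illustrative helpers, not the notebook's `ImportanceScorer`), using the default weights and the values printed for the budget memory.

```python
def jaccard(a, b):
    """Jaccard similarity of two tag collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def composite(importance, recency, relevance, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of the three score components."""
    w_imp, w_rec, w_rel = weights
    return w_imp * importance + w_rec * recency + w_rel * relevance

# Budget memory: importance 0.90, age 0 days (recency 1.0), two of three tags shared
relevance = jaccard(["budget", "purchase", "laptop"], ["laptop", "purchase"])
score = composite(importance=0.90, recency=1.0, relevance=relevance)
print(f"relevance={relevance:.3f}  composite={score:.3f}")
```

The arithmetic is 0.4 × 0.90 + 0.3 × 1.0 + 0.3 × 0.667 = 0.860, which matches the first entry in the ranking.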


## Part 5: Consolidation rules

As memories accumulate, consolidation compresses related memories to save space while preserving information. Without consolidation, memory systems grow unbounded. A user might mention their "$1500 budget" three times in a conversation, creating three nearly identical memories. Over weeks and months, this redundancy multiplies - thousands of memories saying essentially the same thing, consuming context window and slowing retrieval. The solution is intelligent consolidation that detects similar memories and merges them into single, comprehensive entries.

Effective consolidation requires careful design. First, identify similar memories using tag overlap, content similarity or embedding distance. Second, merge them intelligently: rather than simply deleting duplicates, create a new consolidated memory that preserves information from all sources. Third, maintain provenance by tracking which original memories contributed to the consolidated version. Fourth, preserve the most recent timestamp and highest importance score. This approach can save substantial space when redundancy is high (often 50-70%, although the small three-memory example in this section saves only 28%) while also improving information quality by removing redundancy. Let's see the problem first.

In [18]:
# Demonstrate the problem: redundant memories saying essentially the same thing
# These three memories all convey the same information about budget
redundant_memories = [
    MemoryEntry(
        id="mem_101",
        memory_type=MemoryType.SHORT_TERM_SEMANTIC,
        content="User's budget is $1500",
        timestamp=datetime.now() - timedelta(minutes=30),
        importance=0.8,
        tags=["budget"],
        metadata={}
    ),
    MemoryEntry(
        id="mem_102",
        memory_type=MemoryType.SHORT_TERM_SEMANTIC,
        content="User can spend up to $1500 on laptop",
        timestamp=datetime.now() - timedelta(minutes=20),
        importance=0.8,
        tags=["budget", "laptop"],
        metadata={}
    ),
    MemoryEntry(
        id="mem_103",
        memory_type=MemoryType.SHORT_TERM_SEMANTIC,
        content="User mentioned $1500 budget for new laptop",
        timestamp=datetime.now() - timedelta(minutes=10),
        importance=0.8,
        tags=["budget", "laptop"],
        metadata={}
    ),
]

# Show the redundancy problem
print("❌ Before consolidation (redundant):")
for mem in redundant_memories:
    print(f"  • [{mem.id}] {mem.content}")

# Report the baseline size before consolidation
print(f"\nTotal memories: {len(redundant_memories)}")
print(f"Total characters: {sum(len(m.content) for m in redundant_memories)}")

❌ Before consolidation (redundant):
  • [mem_101] User's budget is $1500
  • [mem_102] User can spend up to $1500 on laptop
  • [mem_103] User mentioned $1500 budget for new laptop

Total memories: 3
Total characters: 100


### Implementing consolidation rules
Now, we will build a consolidation system that intelligently merges similar memories. The system will use tag overlap to identify related memories, then use an LLM to create a consolidated version that preserves the essential information while eliminating redundancy.

In [19]:
class MemoryConsolidator:
    """Consolidate related memories to reduce redundancy."""
    
    def __init__(self, llm):
        # Store LLM reference for generating consolidated memories
        self.llm = llm
    
    def find_similar_memories(self, memories: List[MemoryEntry], 
                            similarity_threshold: float = 0.7) -> List[List[MemoryEntry]]:
        """Group similar memories together based on tag overlap."""
        # Simple greedy approach: each ungrouped memory seeds a group, and later
        # memories join it when their tags overlap enough with the seed's tags
        groups = []
        processed = set()  # Track which memories we have already grouped

        # Compare each memory with all subsequent memories
        for i, mem1 in enumerate(memories):
            # Skip if this memory is already in a group
            if mem1.id in processed:
                continue

            # Start a new group with this memory
            group = [mem1]
            processed.add(mem1.id)

            # Look for similar memories in the remaining list
            for mem2 in memories[i+1:]:
                if mem2.id in processed:
                    continue
                
                # Calculate tag overlap using Jaccard similarity
                tags1 = set(mem1.tags)
                tags2 = set(mem2.tags)
                
                if tags1 and tags2:
                    # Jaccard = intersection / union
                    overlap = len(tags1 & tags2) / len(tags1 | tags2)

                    # If overlap exceeds threshold, add to group
                    if overlap >= similarity_threshold:
                        group.append(mem2)
                        processed.add(mem2.id)

            # Only keep groups with multiple memories (single memories don't need consolidation)
            if len(group) > 1:
                groups.append(group)
        
        return groups
    
    def consolidate_group(self, group: List[MemoryEntry]) -> MemoryEntry:
        """Consolidate a group of similar memories into one using LLM."""
        # Format all memory contents for the LLM
        contents = "\n".join([f"- {m.content}" for m in group])

        # Ask LLM to create consolidated version
        prompt = f"""Consolidate these related memories into a single, concise statement:

{contents}

Consolidated memory (one sentence):"""
        
        response = self.llm.invoke(prompt)
        consolidated_content = response.content.strip()
        
        # Combine all tags from all memories (union)
        all_tags = set()
        for mem in group:
            all_tags.update(mem.tags)
        
        # Use the most recent timestamp from the group
        most_recent = max(group, key=lambda m: m.timestamp)

        # Use the highest importance score from the group
        highest_importance = max(m.importance for m in group)

        # Create consolidated memory with metadata tracking source memories
        return MemoryEntry(
            id=f"consolidated_{group[0].id}",
            memory_type=group[0].memory_type,  # Assume same type for group
            content=consolidated_content,
            timestamp=most_recent.timestamp,
            importance=highest_importance,
            tags=list(all_tags),  # Convert set back to list
            metadata={
                "consolidated_from": [m.id for m in group],
                "consolidation_date": datetime.now().isoformat()
            }
        )
    
    def consolidate_memories(self, memories: List[MemoryEntry]) -> List[MemoryEntry]:
        """Consolidate all similar memories in the list."""
        # Find groups of similar memories
        groups = self.find_similar_memories(memories)
        
        # Track which memories have been consolidated
        consolidated = []
        consolidated_ids = set()

        # Consolidate each group
        for group in groups:
            consolidated_mem = self.consolidate_group(group)
            consolidated.append(consolidated_mem)
            # Track all memory IDs that went into this consolidation
            consolidated_ids.update(m.id for m in group)
        
        # Keep memories that were not consolidated (singletons)
        for mem in memories:
            if mem.id not in consolidated_ids:
                consolidated.append(mem)
        
        return consolidated

The consolidation system implements intelligent memory merging through:
- Similarity detection: Jaccard similarity on tag sets identifies related memories above a configurable threshold. The similarity threshold (default 0.7) controls aggressiveness - lower values consolidate more, higher values are more conservative.
- LLM-powered merging: Uses the language model to create natural, consolidated summaries that preserve key information. LLM consolidation is more nuanced than simple string concatenation - it can rephrase, remove duplication, and create coherent summaries.
- Metadata preservation: Tracks provenance by storing source memory IDs in the consolidated entry.
- Property aggregation: Retains the most recent timestamp and highest importance score from the group.

In production, we might batch consolidation (run nightly), use embedding similarity instead of tag overlap for better semantic matching, or implement incremental consolidation that only processes new memories.
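
For the embedding-based variant mentioned above, similarity between two memories would typically be measured as cosine similarity between their embedding vectors. The sketch below uses tiny hand-written vectors as stand-ins for real embeddings. The vectors, the `cosine_similarity` helper and the 0.9 threshold are all assumptions for illustration.

```python
import math
from typing import Sequence

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional "embeddings" (real ones have hundreds of dimensions)
toy_vectors = {
    "budget_a": [0.9, 0.1, 0.0],
    "budget_b": [0.8, 0.2, 0.0],
    "contact":  [0.0, 0.1, 0.9],
}

threshold = 0.9  # assumed cutoff; would need tuning on real data
names = list(toy_vectors)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        sim = cosine_similarity(toy_vectors[a], toy_vectors[b])
        verdict = "merge candidate" if sim >= threshold else "keep separate"
        print(f"{a} vs {b}: {sim:.3f} ({verdict})")
```

Unlike tag overlap, this kind of comparison does not depend on two memories having been tagged with identical strings, which is why it tends to catch paraphrases that tag-based grouping misses.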

Now let's test consolidation on our redundant memories.

In [20]:
# Create consolidator instance
consolidator = MemoryConsolidator(llm)

print("\nConsolidating redundant memories...\n")
# Consolidate the redundant memories
consolidated_memories = consolidator.consolidate_memories(redundant_memories)

# Display the consolidated results
print("✅ After consolidation:")
for mem in consolidated_memories:
    print(f"  • [{mem.id}] {mem.content}")
    # Show which memories were consolidated if applicable
    if "consolidated_from" in mem.metadata:
        print(f"    (consolidated from: {', '.join(mem.metadata['consolidated_from'])})")

# Calculate space savings
original_count = len(redundant_memories)
consolidated_count = len(consolidated_memories)
original_chars = sum(len(m.content) for m in redundant_memories)
new_chars = sum(len(m.content) for m in consolidated_memories)
char_savings = original_chars - new_chars
percent_savings = (char_savings / original_chars * 100) if original_chars > 0 else 0

print(f"\nTotal memories: {original_count} → {consolidated_count}")
print(f"Total characters: {original_chars} → {new_chars}")
print(f"✅ Space saved: {char_savings} characters ({percent_savings:.1f}%)")


Consolidating redundant memories...

✅ After consolidation:
  • [consolidated_mem_102] User has a budget of up to $1500 for a new laptop.
    (consolidated from: mem_102, mem_103)
  • [mem_101] User's budget is $1500

Total memories: 3 → 2
Total characters: 100 → 72
✅ Space saved: 28 characters (28.0%)
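
Only two of the three budget memories were merged. A quick pairwise check of tag overlap shows why. This sketch recomputes the Jaccard values outside the class (`jaccard` is an illustrative helper), using the tag lists from the redundant memories above.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two tag collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Tags copied from the three redundant memories
tags_by_id = {
    "mem_101": ["budget"],
    "mem_102": ["budget", "laptop"],
    "mem_103": ["budget", "laptop"],
}

threshold = 0.7  # default similarity_threshold in find_similar_memories
for (id1, t1), (id2, t2) in combinations(tags_by_id.items(), 2):
    sim = jaccard(t1, t2)
    relation = ">=" if sim >= threshold else "<"
    print(f"{id1} vs {id2}: Jaccard={sim:.2f} ({relation} {threshold})")
```

`mem_101` shares one tag out of two with each of the others (0.5), which falls below the 0.7 default, while `mem_102` and `mem_103` have identical tag sets (1.0). Passing `similarity_threshold=0.5`, or tagging `mem_101` with `laptop` as well, would bring all three into one group.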


## Part 6: Memory lifecycle management
Effective memory systems manage the full lifecycle: what gets stored, how it is encoded, when it is retrieved, and when it is compressed or discarded. Memory lifecycle management is the orchestration layer that brings together all the techniques we have covered. It is not enough to have structured storage, tagging, scoring and consolidation as separate pieces - they need to work together in a coordinated flow that handles memories from birth to death.

The memory lifecycle has five critical stages. Storage decides what information deserves to be remembered in the first place - filtering out trivial greetings and acknowledgments while capturing substantive facts. Encoding transforms raw text into structured entries with appropriate types, tags, and importance scores. Retrieval uses scoring to surface the most relevant memories for each query context. Consolidation periodically merges similar memories to prevent unbounded growth. Finally, discard removes memories that are both old and unimportant, making room for new information. Let's build a complete lifecycle manager that coordinates all these stages.

In [21]:
class MemoryLifecycleManager:
    """Manage the full lifecycle of memories."""
    
    def __init__(self, llm, max_memories: int = 100):
        self.llm = llm  # LLM for tag extraction and encoding decisions
        self.store = MemoryStore()  # Memory store with inverted indexing
        self.consolidator = MemoryConsolidator(llm)  # Consolidator for merging similar memories
        self.max_memories = max_memories  # Maximum number of memories before forcing cleanup
    
    # Stage 1: STORAGE DECISION - What deserves to be remembered?
    def should_store(self, content: str, context: Dict) -> bool:
        """Decide if information should be stored based on content analysis."""
        content_lower = content.lower()
        
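        # Filter out trivial conversational phrases that add no value
        # Caveat: this is a substring test, so 'ok' also matches "looking" and
        # 'hi' matches "this"; whole-word matching would avoid such false positives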
        trivial_phrases = ['hello', 'hi', 'thanks', 'ok', 'bye']
        if any(phrase in content_lower for phrase in trivial_phrases):
            return False
        
        # Store factual information, preferences and important events - These keywords indicate substantive information worth remembering
        important_indicators = [
            'prefer', 'need', 'want', 'budget', 'purchase', 
            'order', 'like', 'dislike', 'always', 'never'
        ]
        if any(indicator in content_lower for indicator in important_indicators):
            return True
        
        # Store if content is substantial (more than five words)
        # Very short statements are likely acknowledgments, not facts
        return len(content.split()) > 5
    
    # Stage 2: ENCODING - Transform raw text into structured memory
    def encode_memory(self, content: str, context: Dict) -> MemoryEntry:
        """Encode information as a structured memory entry with tags and importance."""
        # Determine appropriate memory type based on content keywords
        if any(word in content.lower() for word in ['purchased', 'ordered', 'on 20']):
            # Past events with dates are long-term episodic
            memory_type = MemoryType.LONG_TERM_EPISODIC
        elif context.get('session_scope', False):
            # Session-specific facts are short-term semantic
            memory_type = MemoryType.SHORT_TERM_SEMANTIC
        else:
            # Default to long-term semantic for persistent facts
            memory_type = MemoryType.LONG_TERM_SEMANTIC
        
        # Extract tags using LLM - more sophisticated than keyword matching
        tag_prompt = f"""Extract 2-4 relevant tags from this statement. Return only tags separated by commas.

Statement: {content}

Tags:"""
        tag_response = self.llm.invoke(tag_prompt)
        # Clean and normalize the extracted tags, dropping empty entries
        tags = [tag.strip().lower() for tag in tag_response.content.split(',') if tag.strip()]
        
        # Calculate importance score using our scoring system
        # Create temporary memory to use with ImportanceScorer
        temp_memory = MemoryEntry(
            id="temp",
            memory_type=memory_type,
            content=content,
            timestamp=datetime.now(),
            importance=0.5,
            tags=tags,
            metadata={}
        )
        importance = ImportanceScorer.calculate_base_importance(temp_memory)
        
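        # Create final memory entry with generated ID
        # Caveat: a count-based ID can repeat once consolidation or discard shrinks
        # the store; a monotonic counter or UUID is safer outside a demo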
        memory_id = f"mem_{len(self.store.memories) + 1:03d}"
        return MemoryEntry(
            id=memory_id,
            memory_type=memory_type,
            content=content,
            timestamp=datetime.now(),
            importance=importance,
            tags=tags,
            metadata=dict(context)  # Store a copy of the original context as metadata
        )
    
    # Stage 3: RETRIEVAL - Find relevant memories for query context
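    def retrieve_relevant(self, query: str, query_tags: List[str],
                          max_results: int = 5) -> List[MemoryEntry]:
        """Retrieve the top-scoring memories for a query using composite scoring.

        Note: `query` is currently unused (ranking relies on `query_tags` only), and
        up to `max_results` memories are always returned, however low their scores.
        """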
        # Score all memories against the query context
        scored = []
        for mem in self.store.memories:
            # Use importance, recency, and relevance scoring
            score = ImportanceScorer.calculate_composite_score(mem, query_tags)
            scored.append((mem, score))
        
        # Sort by composite score (highest first) and return top N
        scored.sort(key=lambda x: x[1], reverse=True)
        return [mem for mem, _ in scored[:max_results]]
    
    # Stage 4: CONSOLIDATION - Compress similar memories
    def consolidate_if_needed(self):
        """Consolidate memories if approaching limit."""
        # Only consolidate when at 80% capacity to avoid frequent consolidation
        if len(self.store.memories) > self.max_memories * 0.8:
            print(f"Memory approaching limit ({len(self.store.memories)}/{self.max_memories}). Consolidating...")
            
            # Target short-term memories for consolidation - Long-term memories are typically more diverse and important
            short_term = [m for m in self.store.memories 
                         if m.memory_type in [MemoryType.SHORT_TERM_EPISODIC, MemoryType.SHORT_TERM_SEMANTIC]]

            # Only consolidate if we have enough short-term memories
            if len(short_term) > 5:
                consolidated = self.consolidator.consolidate_memories(short_term)
                
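                # Replace short-term memories with consolidated versions; keep all long-term memories unchanged
                # Caveat: assigning to store.memories directly bypasses store.add_memory,
                # so a tag index maintained by the store may not be updated here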
                self.store.memories = [m for m in self.store.memories 
                                      if m.memory_type not in [MemoryType.SHORT_TERM_EPISODIC, MemoryType.SHORT_TERM_SEMANTIC]]
                self.store.memories.extend(consolidated)
                
                print(f"Consolidated {len(short_term)} memories into {len(consolidated)}")
    
    # Stage 5: DISCARD - Remove old, unimportant memories
    def discard_old_memories(self, max_age_days: int = 90, min_importance: float = 0.3):
        """Remove old, low-importance memories."""
        before_count = len(self.store.memories)
        
        # Calculate the cutoff date for "old" memories
        cutoff_date = datetime.now() - timedelta(days=max_age_days)

        # Keep memories that meet at least one criterion:
        # 1. Recent (within max_age_days)
        # 2. Important (above min_importance threshold)
        # 3. Long-term semantic (always preserve user profile/preferences)
        self.store.memories = [
            m for m in self.store.memories
            if (m.timestamp > cutoff_date or 
                m.importance >= min_importance or 
                m.memory_type == MemoryType.LONG_TERM_SEMANTIC)
        ]

        # Report how many memories were discarded
        discarded = before_count - len(self.store.memories)
        if discarded > 0:
            print(f"Discarded {discarded} old/low-importance memories")
    
    def add_memory(self, content: str, context: Optional[Dict] = None):
        """Add memory through the full lifecycle: filter, encode, store, maintain."""
        context = context or {}
        
        # Stage 1: STORAGE DECISION - Should we remember this?
        if not self.should_store(content, context):
            print(f"Skipped (trivial): {content}")
            return
        
        # Stage 2: ENCODING - Transform to structured memory
        memory = self.encode_memory(content, context)
        self.store.add_memory(memory)
        print(f"Stored [{memory.memory_type.value}]: {content}")
        
        # Stages 4 & 5: MAINTENANCE - Consolidate and discard as needed
        self.consolidate_if_needed()
        self.discard_old_memories()

The memory lifecycle manager orchestrates all five stages of memory management:
- Storage filtering: Uses keyword-based rules to skip trivial content ("hi", "thanks") while storing substantive information.
- Intelligent encoding: An LLM extracts tags, keyword rules determine the memory type, and `ImportanceScorer` calculates the importance score.
- Retrieval with scoring: Combines importance, recency, and relevance to rank memories for any query.
- Automatic consolidation: Triggers when approaching 80% capacity, merging similar short-term memories.
- Selective discard: Removes old, low-importance memories while always preserving long-term semantic facts.

The `add_memory` method runs every memory through the same fixed sequence of stages (filter, encode, store, maintain), which keeps lifecycle processing consistent. In production, we would add persistence (database storage), async processing for LLM calls and monitoring/alerting for memory usage. The lifecycle stages could be extended: adding verification (fact-checking), enrichment (linking related memories) or archival (moving old memories to cold storage).
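
One detail of the storage filter deserves care: `should_store` tests trivial phrases as substrings, so 'ok' matches inside "looking" and 'hi' inside "this". The sketch below contrasts that behavior with whole-word matching. The names `is_trivial_substring` and `is_trivial_whole_word`, and the `max_words` guard, are assumptions introduced for illustration rather than part of the notebook's API.

```python
import re

TRIVIAL_WORDS = {"hello", "hi", "thanks", "ok", "bye"}

def is_trivial_substring(content: str) -> bool:
    """Substring test, as in should_store: also matches 'ok' inside 'looking'."""
    lowered = content.lower()
    return any(word in lowered for word in TRIVIAL_WORDS)

def is_trivial_whole_word(content: str, max_words: int = 3) -> bool:
    """Trivial only if a trivial word appears as its own token in a short message.

    max_words is an assumed guard so that 'Hi, my budget is $1500' is kept.
    """
    words = re.findall(r"[a-z']+", content.lower())
    return len(words) <= max_words and any(w in TRIVIAL_WORDS for w in words)

samples = [
    "Hi there",
    "Thanks",
    "The user is looking for a laptop",
    "Hi, my budget is $1500",
]
for text in samples:
    print(f"{text!r}: substring={is_trivial_substring(text)}, "
          f"whole_word={is_trivial_whole_word(text)}")
```

With whole-word matching, greetings and acknowledgments are still filtered, while sentences that merely contain those letter sequences are kept. The length guard is one possible way to keep messages that open with a greeting but go on to state a fact.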

Let's test the complete lifecycle management system with various types of input to see how it handles filtering, encoding and maintenance.

In [22]:
# Create lifecycle manager with small limit for testing
manager = MemoryLifecycleManager(llm, max_memories=10)

# Prepare test inputs - mix of trivial and substantive content
test_inputs = [
    ("Hi there", {}),  # Should be filtered out (trivial greeting)
    ("My budget is $1500 for a laptop", {"session_scope": True}),  # Should store
    ("I need it for video editing", {"session_scope": True}),  # Should store
    ("I prefer Windows over Mac", {}),  # Should store
    ("Thanks", {}),  # Should be filtered out (trivial acknowledgment)
    ("I purchased a tablet last month", {}),  # Should store (event)
]

print("Testing Memory Lifecycle Management")
print("="*60)
print("\nAdding memories:\n")

# Process each input through the full lifecycle
for content, context in test_inputs:
    manager.add_memory(content, context)

Testing Memory Lifecycle Management

Adding memories:

Skipped (trivial): Hi there
Stored [short_term_semantic]: My budget is $1500 for a laptop
Stored [short_term_semantic]: I need it for video editing
Stored [long_term_semantic]: I prefer Windows over Mac
Skipped (trivial): Thanks
Stored [long_term_episodic]: I purchased a tablet last month


In [23]:
# Display the final state of memories
print("\n" + "="*60)
print(f"\nTotal memories stored: {len(manager.store.memories)}")
print("\nStored memories:")

# Show each memory with its details
for mem in manager.store.memories:
    print(f"  • [{mem.memory_type.value}] {mem.content}")
    print(f"    Tags: {', '.join(mem.tags)} | Importance: {mem.importance:.2f}")



Total memories stored: 4

Stored memories:
  • [short_term_semantic] My budget is $1500 for a laptop
    Tags: budget, laptop, $1500 | Importance: 0.70
  • [short_term_semantic] I need it for video editing
    Tags: video editing, editing tools, multimedia | Importance: 0.70
  • [long_term_semantic] I prefer Windows over Mac
    Tags: windows, mac, preference | Importance: 0.70
  • [long_term_episodic] I purchased a tablet last month
    Tags: tablet, purchase, electronics | Importance: 0.70
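
The LLM-generated tags above include multi-word forms such as "video editing", while the hand-written memories earlier used `video_editing`. Since Jaccard relevance depends on exact string matches, some normalization step is likely needed before tags are compared. A minimal sketch, assuming a hypothetical `normalize_tags` helper applied to the raw comma-separated LLM output:

```python
import re
from typing import List

def normalize_tags(raw: str) -> List[str]:
    """Split a comma-separated tag string into lowercase, underscore-joined, unique tags."""
    seen = set()
    tags = []
    for part in raw.split(","):
        # Collapse internal whitespace: 'Video  Editing' -> 'video_editing'
        tag = re.sub(r"\s+", "_", part.strip().lower())
        # Drop empties and duplicates while keeping first-seen order
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags

print(normalize_tags("Video Editing, laptop, , Laptop"))
```

Whatever convention is chosen, the important point is that the same normalization is applied both when memories are encoded and when query tags are extracted.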


## Putting it all together: Production memory system
Now that we have built all the individual components (memory types, structured storage, semantic tagging, importance scoring, consolidation and lifecycle management), let's create a production-oriented system that ties everything together. This final implementation wraps our lifecycle manager with higher-level methods designed for real-world use cases: processing conversation turns, extracting facts, retrieving context for queries and tracking system statistics.

This is the layer that application developers would actually interact with when building memory-enabled agents.

In [25]:
class ProductionMemorySystem:
    """Production-ready memory system with all best practices."""
    
    def __init__(self, llm, max_memories: int = 100):
        # Wrap the lifecycle manager for higher-level operations
        self.manager = MemoryLifecycleManager(llm, max_memories)
    
    def process_interaction(self, user_message: str, agent_response: str, 
                          extract_facts: bool = True):
        """Process a conversation turn and extract memorable facts."""
        if extract_facts:
            # Use LLM to extract facts from conversation - This is more sophisticated than storing raw messages
            extraction_prompt = f"""Extract factual information and user preferences from this conversation. 
Return each fact as a separate line, or 'None' if no facts to extract.

User: {user_message}
Agent: {agent_response}

Extracted facts:"""
            
            response = self.manager.llm.invoke(extraction_prompt)
            # Parse the LLM response into individual facts
            # Caveat: list markers such as a leading "- " are kept as part of each fact
            facts = [f.strip() for f in response.content.split('\n') if f.strip() and f.strip().lower() != 'none']

            # Store each extracted fact through the lifecycle
            for fact in facts:
                self.manager.add_memory(fact, {"source": "conversation"})
        else:
            # Fallback: store the full user message as a memory - Useful when fact extraction is not needed
            self.manager.add_memory(
                f"User said: {user_message}",
                {"source": "conversation", "session_scope": True}
            )
    
    def get_context_for_query(self, query: str, query_tags: Optional[List[str]] = None) -> str:
        """Get relevant memory context formatted for inclusion in prompts."""
        if query_tags is None:
            # Extract tags from the query using LLM - This handles queries where tags are not provided
            tag_prompt = f"""Extract 2-3 key topics/tags from this query. Return only tags separated by commas.

Query: {query}

Tags:"""
            response = self.manager.llm.invoke(tag_prompt)
            query_tags = [tag.strip().lower() for tag in response.content.split(',') if tag.strip()]
        
        # Retrieve top relevant memories using our scoring system
        relevant = self.manager.retrieve_relevant(query, query_tags, max_results=5)

        # Retrieval returns the top N regardless of score, so this list is
        # only empty when the store holds no memories at all
        if not relevant:
            return "No relevant context from memory."
        
        # Build formatted context string for inclusion in prompts
        context_parts = ["Relevant context from memory:"]
        for mem in relevant:
            # Include both content and memory type for transparency
            context_parts.append(f"- {mem.content} [{mem.memory_type.value}]")
        
        return "\n".join(context_parts)
    
    def get_statistics(self) -> Dict:
        """Get memory system statistics for monitoring and debugging."""
        memories = self.manager.store.memories

        # Calculate comprehensive statistics
        stats = {
            "total_memories": len(memories),
            "by_type": {},  # Breakdown by memory type
            "avg_importance": sum(m.importance for m in memories) / len(memories) if memories else 0,
            "total_tags": len(self.manager.store.tag_index),
        }

        # Count memories by type
        for mem_type in MemoryType:
            count = len([m for m in memories if m.memory_type == mem_type])
            stats["by_type"][mem_type.value] = count
        
        return stats

The production memory system provides a complete, application-ready interface for memory-enabled agents:
- Conversation processing: Uses an LLM to extract facts from dialogue instead of storing raw messages. Each extracted line then passes through the lifecycle filter before it is stored.
- Context retrieval: Automatically tags queries, scores memories and formats relevant context for prompt inclusion.
- Statistics monitoring: Provides insights into memory usage, type distribution and system health.
- Lifecycle automation: All memory management (filtering, encoding, consolidation, discard) happens automatically.

In production deployments, we would add: persistent storage (PostgreSQL with pgvector for similarity search), async processing (background workers for LLM calls), caching (Redis for frequently accessed memories), monitoring (Prometheus metrics, alerting on thresholds), API layer (REST/GraphQL endpoints) and multi-tenancy (isolated memory spaces per user).

Let's test the complete production system by simulating a customer service conversation.

In [26]:
# Create production memory system
memory_system = ProductionMemorySystem(llm, max_memories=50)

# Simulate a realistic customer service conversation
conversations = [
    ("I'm looking for a laptop for video editing", 
     "I can help you find the perfect laptop for video editing. What's your budget?"),
    ("My budget is around $1500", 
     "Great! For video editing at $1500, I'd recommend our Laptop Pro X1."),
    ("I prefer Windows over Mac", 
     "Perfect! The Laptop Pro X1 runs Windows 11 Pro."),
]

print("Production Memory System Demo")
print("="*60)
print("\nProcessing conversations:\n")

# Process each conversation turn
for user_msg, agent_msg in conversations:
    print(f"User: {user_msg}")
    print(f"Agent: {agent_msg}")
    # Extract and store facts from this interaction
    memory_system.process_interaction(user_msg, agent_msg)
    print()

Production Memory System Demo

Processing conversations:

User: I'm looking for a laptop for video editing
Agent: I can help you find the perfect laptop for video editing. What's your budget?
Skipped (trivial): - The user is looking for a laptop for video editing.
Stored [long_term_semantic]: - The agent is offering to help the user find a suitable laptop.
Stored [long_term_semantic]: - The agent is inquiring about the user's budget.

User: My budget is around $1500
Agent: Great! For video editing at $1500, I'd recommend our Laptop Pro X1.
Stored [long_term_semantic]: - User's budget is around $1500.
Stored [long_term_semantic]: - Agent recommends the Laptop Pro X1 for video editing at $1500.

User: I prefer Windows over Mac
Agent: Perfect! The Laptop Pro X1 runs Windows 11 Pro.
Stored [long_term_semantic]: - User prefers Windows over Mac.
Stored [long_term_semantic]: - The Laptop Pro X1 runs Windows 11 Pro.



In [27]:
# Get and display system statistics
stats = memory_system.get_statistics()

print("="*60)
print("\nMemory System Statistics:")
print(f"Total memories: {stats['total_memories']}")
print(f"Average importance: {stats['avg_importance']:.2f}")
print(f"Total unique tags: {stats['total_tags']}")

# Show breakdown by memory type
print("\nMemories by type:")
for mem_type, count in stats['by_type'].items():
    if count > 0:
        print(f"  {mem_type}: {count}")


Memory System Statistics:
Total memories: 6
Average importance: 0.68
Total unique tags: 15

Memories by type:
  long_term_semantic: 6


In [28]:
# Test context retrieval for a new query
print("\n" + "="*60)
print("\nTesting context retrieval:\n")

# Simulate a follow-up query
query = "What laptop should I buy?"

# Get relevant context from memory
context = memory_system.get_context_for_query(query)

# Display the query and retrieved context
print(f"Query: {query}")
print(f"\n{context}")



Testing context retrieval:

Query: What laptop should I buy?

Relevant context from memory:
- - The Laptop Pro X1 runs Windows 11 Pro. [long_term_semantic]
- - The agent is offering to help the user find a suitable laptop. [long_term_semantic]
- - The agent is inquiring about the user's budget. [long_term_semantic]
- - User's budget is around $1500. [long_term_semantic]
- - User prefers Windows over Mac. [long_term_semantic]
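
The retrieved lines above begin with a doubled `- -`, which suggests that the extracted facts kept the bullet prefix from the LLM's list-formatted output. One possible fix, sketched here under the assumption that facts are plain strings, is to strip list markers before storing them. The helper name `strip_bullet_prefix` is hypothetical and does not appear in the notebook's classes.

```python
import re

# Matches one or more leading list markers such as "- ", "* ", "• " or "1. ".
# The set of markers is an assumption about how the LLM formats its output.
_BULLET_PREFIX = re.compile(r"^\s*(?:[-*•]\s+|\d+[.)]\s+)+")


def strip_bullet_prefix(fact: str) -> str:
    """Remove leading list markers and surrounding whitespace from a fact."""
    return _BULLET_PREFIX.sub("", fact).strip()


facts = [
    "- User prefers Windows over Mac.",
    "The Laptop Pro X1 runs Windows 11 Pro.",
]
cleaned = [strip_bullet_prefix(f) for f in facts]
print(cleaned)
```

Applying this step right after fact extraction would keep the stored content clean, so both the `Stored [...]` log lines and the retrieved context would show a single bullet.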


The system can be extended with memory verification (fact-checking against knowledge bases), cross-session memory merging (recognizing returning users), memory export and import (for data portability) and memory analytics (understanding which topics users raise most often).
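
As one illustration of the export/import extension, the sketch below round-trips memories through JSON. `MemoryRecord` is a hypothetical, simplified stand-in for the notebook's memory objects, and the importance value and tags are made up for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class MemoryRecord:
    """Hypothetical simplified record; real memories likely carry more fields."""
    content: str
    memory_type: str
    importance: float
    tags: List[str]


def export_memories(records: List[MemoryRecord]) -> str:
    # Convert each dataclass to a dict, then serialize the list as JSON.
    return json.dumps([asdict(r) for r in records], indent=2)


def import_memories(payload: str) -> List[MemoryRecord]:
    # Rebuild records from the JSON produced by export_memories.
    return [MemoryRecord(**item) for item in json.loads(payload)]


records = [
    MemoryRecord(
        content="User prefers Windows over Mac.",
        memory_type="long_term_semantic",
        importance=0.8,  # illustrative value, not taken from the run above
        tags=["preference", "os"],
    )
]
restored = import_memories(export_memories(records))
print(restored == records)
```

A plain JSON format like this keeps exported memories readable and portable; fields such as timestamps or embeddings would need their own serialization rules.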