![Redis](https://redis.io/wp-content/uploads/2024/04/Logotype.svg?auto=webp&quality=85,75&width=120)

# Module 4: Memory Systems

**⏱️ Time:** 45 minutes

## 🎯 Learning Objectives

By the end of this module, you will:

1. **Understand** why memory is essential for context engineering
2. **Implement** working memory for conversation continuity
3. **Use** long-term memory for persistent knowledge
4. **Know** how Agent Memory Server handles compression automatically

---

## 📚 Part 1: Why Memory Matters (10 min)

### The Grounding Problem

**Without memory**, an agent can't resolve references to earlier turns:

```
User: "Tell me about CS401"
Agent: "CS401 is Machine Learning. It covers supervised learning..."

User: "What are its prerequisites?"
Agent: ❌ "What does 'it' refer to? Please specify which course."
```

**With memory**, the conversation flows naturally:

```
User: "Tell me about CS401"
Agent: "CS401 is Machine Learning. It covers supervised learning..."

User: "What are its prerequisites?"
Agent: ✅ "CS401 requires CS301 (Intro to ML) and MATH201 (Linear Algebra)."
```
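The difference comes down to what the model actually receives. A minimal sketch, assuming an OpenAI-style list of `{"role", "content"}` dicts; `build_messages` is a hypothetical helper, not part of any library. The follow-up is only answerable when the earlier turns travel with it.

```python
from typing import Dict, List

Message = Dict[str, str]  # {"role": ..., "content": ...}

def build_messages(history: List[Message], question: str) -> List[Message]:
    """Hypothetical helper: prior turns followed by the new user question."""
    return history + [{"role": "user", "content": question}]

history: List[Message] = [
    {"role": "user", "content": "Tell me about CS401"},
    {"role": "assistant", "content": "CS401 is Machine Learning. It covers supervised learning..."},
]

# Without memory: the model sees only the ambiguous follow-up.
stateless = build_messages([], "What are its prerequisites?")

# With memory: the model also sees the turn that names CS401.
stateful = build_messages(history, "What are its prerequisites?")

print(len(stateless), len(stateful))                    # 1 3
print(any("CS401" in m["content"] for m in stateless))  # False
print(any("CS401" in m["content"] for m in stateful))   # True
```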

### Two Types of Memory

| Memory Type | Scope | Persistence | Example |
|-------------|-------|-------------|--------|
| **Working Memory** | Session | Temporary | Current conversation |
| **Long-term Memory** | User | Persistent | "Sarah prefers online courses" |
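The "Scope" column determines how each store is keyed. A minimal sketch with plain dictionaries (hypothetical stores, not the Agent Memory Server API): working memory is keyed by session and discarded when the session ends, while long-term memory is keyed by user and survives into the next session.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical stores keyed as in the table above.
working: Dict[str, List[str]] = defaultdict(list)    # session_id -> messages
long_term: Dict[str, List[str]] = defaultdict(list)  # user_id -> facts

def end_session(session_id: str) -> None:
    """Working memory is temporary: drop it when the session ends."""
    working.pop(session_id, None)

# Session 1: Sarah chats, and a durable fact is recorded under her user ID.
working["session-1"].append("user: I can only take online classes.")
long_term["sarah"].append("Sarah prefers online courses")
end_session("session-1")

# Session 2: the conversation starts empty, but the fact is still available.
print(working["session-2"])  # []
print(long_term["sarah"])    # ['Sarah prefers online courses']
```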

```python
# Setup
import os
import sys
import json
from pathlib import Path

repo_root = Path.cwd().parent
src_path = repo_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

# Load environment variables
from dotenv import load_dotenv
load_dotenv()  # Try current dir first
load_dotenv(repo_root / ".env")  # Then try parent

AMS_URL = os.getenv("AGENT_MEMORY_SERVER_URL", "http://localhost:8088")
print("✅ Setup complete!")
print(f"   Agent Memory Server: {AMS_URL}")
```

```
✅ Setup complete!
   Agent Memory Server: http://localhost:8088
```


---

## 📚 Part 2: Working Memory with Agent Memory Server (15 min)

### What is Agent Memory Server?

**Agent Memory Server** is a Redis-backed service that provides:
- Working memory (conversation history)
- Long-term memory (semantic search over facts)
- **Automatic compression** (truncation, sliding window, summarization)

### Key Benefit: Compression is Handled For You

You don't need to implement:
- Token counting and truncation
- Sliding window management
- Conversation summarization

**Agent Memory Server does this automatically!**

```python
# Working Memory Simulation
# In production, this uses Agent Memory Server (Redis-backed)
# Here we demonstrate the pattern with a simple in-memory implementation

import uuid
from dataclasses import dataclass, field
from typing import List, Dict

@dataclass
class Message:
    role: str
    content: str

@dataclass
class WorkingMemory:
    session_id: str
    messages: List[Message] = field(default_factory=list)

# Simple in-memory store (Agent Memory Server uses Redis)
memory_store: Dict[str, WorkingMemory] = {}

print("✅ Memory system initialized (demo mode)")
print("   In production: Agent Memory Server provides Redis-backed persistence")
```

```
✅ Memory system initialized (demo mode)
   In production: Agent Memory Server provides Redis-backed persistence
```


```python
# Working Memory: Store conversation messages
session_id = str(uuid.uuid4())

# Simulate a conversation
messages = [
    {"role": "user", "content": "I'm interested in machine learning courses."},
    {"role": "assistant", "content": "I found several ML courses. CS301 is great for beginners."},
    {"role": "user", "content": "What are the prerequisites for that one?"},
    {"role": "assistant", "content": "CS301 requires CS201 (Data Structures), which you've completed!"}
]

# Store messages in working memory
working_memory = WorkingMemory(session_id=session_id)
for msg in messages:
    working_memory.messages.append(Message(role=msg["role"], content=msg["content"]))

memory_store[session_id] = working_memory

print(f"✅ Stored {len(messages)} messages in session {session_id[:8]}...")
```

```
✅ Stored 4 messages in session b551e804...
```


```python
# Retrieve working memory
retrieved_memory = memory_store[session_id]

print("Working Memory Contents:")
print("="*60)
for msg in retrieved_memory.messages:
    print(f"{msg.role.upper()}: {msg.content}")
print("="*60)
print(f"\nTotal messages: {len(retrieved_memory.messages)}")
```

```
Working Memory Contents:
============================================================
USER: I'm interested in machine learning courses.
ASSISTANT: I found several ML courses. CS301 is great for beginners.
USER: What are the prerequisites for that one?
ASSISTANT: CS301 requires CS201 (Data Structures), which you've completed!
============================================================

Total messages: 4
```
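Before working memory reaches a chat model, the stored `Message` objects are typically converted to role/content dicts. A minimal sketch, assuming an OpenAI-style chat format; `to_chat_messages` is a hypothetical helper, and `Message` is redefined so the block runs on its own. It keeps only the most recent `max_messages` turns, the same idea as the `[-6:]` slice used later in this module.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Message:
    role: str
    content: str

def to_chat_messages(messages: List[Message], max_messages: int = 6) -> List[Dict[str, str]]:
    """Hypothetical helper: the most recent turns as role/content dicts."""
    # Guard against max_messages == 0, since messages[-0:] would return everything.
    recent = messages[-max_messages:] if max_messages > 0 else []
    return [{"role": m.role, "content": m.content} for m in recent]

history = [
    Message("user", "I'm interested in machine learning courses."),
    Message("assistant", "I found several ML courses. CS301 is great for beginners."),
    Message("user", "What are the prerequisites for that one?"),
    Message("assistant", "CS301 requires CS201 (Data Structures), which you've completed!"),
]

chat = to_chat_messages(history, max_messages=2)
print(chat[0])
# {'role': 'user', 'content': 'What are the prerequisites for that one?'}
```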


---

## 📚 Part 3: Long-term Memory (10 min)

### What is Long-term Memory?

Long-term memory stores **facts** that persist across sessions:
- User preferences ("prefers online courses")
- Important information ("completed CS201")
- Learned context ("interested in AI career")

### Semantic Search Over Facts

Working memory is read back in order; long-term memory is queried with **semantic search**, which matches on meaning rather than exact words:
- Query: "What format does the student prefer?"
- Finds: "Sarah prefers online courses" (even without exact match)

```python
# Long-term Memory: Store facts that persist across sessions
# In production, Agent Memory Server stores these with embeddings for semantic search

@dataclass
class LongTermFact:
    text: str
    user_id: str
    keywords: List[str] = field(default_factory=list)

# Simple long-term memory store
long_term_store: Dict[str, List[LongTermFact]] = {}

student_id = "sarah_chen_001"

facts = [
    ("Sarah prefers online courses due to her work schedule.", ["online", "preference", "format"]),
    ("Sarah has completed CS101, CS201, and MATH101.", ["completed", "courses", "prerequisites"]),
    ("Sarah is interested in machine learning and AI.", ["interest", "machine learning", "AI"]),
    ("Sarah's career goal is to become an AI Engineer.", ["career", "goal", "AI"]),
    ("Sarah learns best through hands-on projects.", ["learning", "preference", "projects"])
]

long_term_store[student_id] = [
    LongTermFact(text=text, user_id=student_id, keywords=kw) for text, kw in facts
]

print(f"✅ Stored {len(facts)} facts for {student_id}")
```

```
✅ Stored 5 facts for sarah_chen_001
```


```python
# Semantic search simulation (keyword-based for demo)
# In production, Agent Memory Server uses vector embeddings for true semantic search

def search_facts(user_id: str, query: str, top_k: int = 3) -> List[LongTermFact]:
    """Simple keyword-based search (production uses embeddings)."""
    user_facts = long_term_store.get(user_id, [])
    query_words = set(query.lower().split())
    
    # Score by keyword overlap
    scored = []
    for fact in user_facts:
        fact_words = set(fact.text.lower().split()) | set(kw.lower() for kw in fact.keywords)
        score = len(query_words & fact_words)
        scored.append((score, fact))
    
    scored.sort(key=lambda x: x[0], reverse=True)
    return [fact for _, fact in scored[:top_k]]

query = "What courses has the student taken?"
results = search_facts(student_id, query)

print(f"Query: '{query}'")
print("\nRelevant Facts:")
for i, result in enumerate(results, 1):
    print(f"  {i}. {result.text}")
```

```
Query: 'What courses has the student taken?'

Relevant Facts:
  1. Sarah has completed CS101, CS201, and MATH101.
  2. Sarah prefers online courses due to her work schedule.
  3. Sarah is interested in machine learning and AI.
```


```python
# Another semantic search
query = "learning preferences"
results = search_facts(student_id, query)

print(f"Query: '{query}'")
print("\nRelevant Facts:")
for i, result in enumerate(results, 1):
    print(f"  {i}. {result.text}")
```

```
Query: 'learning preferences'

Relevant Facts:
  1. Sarah is interested in machine learning and AI.
  2. Sarah learns best through hands-on projects.
  3. Sarah prefers online courses due to her work schedule.
```
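The keyword scorer in `search_facts` has two visible quirks: punctuation stays attached to words (`taken?` never matches `taken`), and facts that score zero still fill out the top-k, which is why "Sarah is interested in machine learning and AI." shows up for a question about courses taken. A small variation, still keyword-based and still only a stand-in for embedding search, normalizes tokens and drops non-matches. `search_facts_strict` and `tokens` are hypothetical names; it also splits multi-word keywords into single words, unlike the original.

```python
import re
from typing import List, Set, Tuple

def tokens(text: str) -> Set[str]:
    """Lowercase word tokens with punctuation removed ("taken?" -> "taken")."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def search_facts_strict(facts: List[Tuple[str, List[str]]], query: str, top_k: int = 3) -> List[str]:
    """Keyword-overlap search that skips facts sharing no words with the query."""
    query_words = tokens(query)
    scored = []
    for text, keywords in facts:
        # Multi-word keywords such as "machine learning" are split into words.
        fact_words = tokens(text) | tokens(" ".join(keywords))
        score = len(query_words & fact_words)
        if score > 0:
            scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [text for _, text in scored[:top_k]]

FACTS = [
    ("Sarah prefers online courses due to her work schedule.", ["online", "preference", "format"]),
    ("Sarah has completed CS101, CS201, and MATH101.", ["completed", "courses", "prerequisites"]),
    ("Sarah is interested in machine learning and AI.", ["interest", "machine learning", "AI"]),
    ("Sarah's career goal is to become an AI Engineer.", ["career", "goal", "AI"]),
    ("Sarah learns best through hands-on projects.", ["learning", "preference", "projects"]),
]

print(search_facts_strict(FACTS, "What courses has the student taken?"))
print(search_facts_strict(FACTS, "learning preferences"))
```

Even with these fixes, "learning preferences" still matches the machine-learning fact on the word "learning"; telling the two meanings apart is exactly what embedding-based search is for.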


---

## 📚 Part 4: Automatic Compression (10 min)

### The Compression Problem

As conversations grow, they eventually exceed the model's context window (its token limit). Common strategies:

| Strategy | How It Works | Trade-off |
|----------|--------------|----------|
| **Truncation** | Keep last N messages | Loses early context |
| **Sliding Window** | Keep recent + important messages | More logic to decide what is important |
| **Summarization** | LLM summarizes history | Cost + latency |

### Agent Memory Server Handles This!

**You don't need to implement compression.** Agent Memory Server:
- Automatically manages conversation length
- Applies appropriate compression strategies
- Extracts and stores important facts to long-term memory

**This is why we use Agent Memory Server instead of building from scratch.**

```python
# Demonstrate automatic context management
# Agent Memory Server handles this - you just call get_working_memory()

# The server automatically:
# 1. Tracks conversation length
# 2. Applies compression when needed
# 3. Extracts facts to long-term memory

print("Agent Memory Server Compression Strategies:")
print("="*60)
print("1. TRUNCATION: Keeps last N messages when limit exceeded")
print("2. SLIDING WINDOW: Keeps recent + pinned important messages")
print("3. SUMMARIZATION: LLM summarizes older messages")
print("4. FACT EXTRACTION: Important info → long-term memory")
print("="*60)
print("\n✅ All handled automatically by Agent Memory Server!")
```

```
Agent Memory Server Compression Strategies:
============================================================
1. TRUNCATION: Keeps last N messages when limit exceeded
2. SLIDING WINDOW: Keeps recent + pinned important messages
3. SUMMARIZATION: LLM summarizes older messages
4. FACT EXTRACTION: Important info → long-term memory
============================================================

✅ All handled automatically by Agent Memory Server!
```
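Summarization can be sketched the same way: older messages are replaced by a single summary message, and the most recent turns are kept verbatim. A minimal sketch, where `summarize_older` is a hypothetical helper and `summarize` is a callable standing in for an LLM call; here it is stubbed with a trivial function so the block runs offline. Putting the summary in a `system` message is a design assumption, not a documented server behavior.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

def summarize_older(
    messages: List[Message],
    keep_recent: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Replace all but the last `keep_recent` messages with one summary message."""
    if keep_recent <= 0:
        older, recent = messages, []
    else:
        older, recent = messages[:-keep_recent], messages[-keep_recent:]
    if not older:
        return list(messages)  # nothing old enough to compress
    summary = {"role": "system", "content": "Summary of earlier conversation: " + summarize(older)}
    return [summary] + recent

def stub_summarize(older: List[Message]) -> str:
    """Offline stand-in for an LLM: just lists what the user asked."""
    asks = [m["content"] for m in older if m["role"] == "user"]
    return "User asked: " + " | ".join(asks)

conversation = [
    {"role": "user", "content": "I'm interested in machine learning courses."},
    {"role": "assistant", "content": "I found several ML courses. CS301 is great for beginners."},
    {"role": "user", "content": "What are the prerequisites for that one?"},
    {"role": "assistant", "content": "CS301 requires CS201 (Data Structures), which you've completed!"},
]
compressed = summarize_older(conversation, keep_recent=2, summarize=stub_summarize)
for m in compressed:
    print(f"{m['role']}: {m['content']}")
```

The trade-off from the table shows up directly: every compression step costs one extra model call in place of `stub_summarize`.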


### Memory-Enhanced RAG Query

Now let's combine memory with our RAG system from Module 2.

```python
# Memory-enhanced RAG prompt assembly
# This shows how memory integrates with RAG

def build_memory_enhanced_prompt(user_query: str, session_id: str, user_id: str) -> str:
    """Build a RAG prompt with working and long-term memory."""
    
    # 1. Get working memory (conversation history)
    working_mem = memory_store.get(session_id, WorkingMemory(session_id=session_id))
    
    # 2. Search long-term memory for relevant facts
    long_term_facts = search_facts(user_id, user_query, top_k=3)
    
    # 3. Assemble full context
    system_prompt = "You are a course advisor with memory of past conversations."
    
    # Format conversation history (last 6 messages)
    history = "\n".join([f"{m.role}: {m.content}" for m in working_mem.messages[-6:]])
    
    # Format long-term facts
    facts = "\n".join([f"- {f.text}" for f in long_term_facts])
    
    # Sample course context (from Module 2)
    course_context = """Available Courses:
  • CS301: Machine Learning (intermediate, 4 credits)
  • CS401: Deep Learning (advanced, 4 credits)
  • CS402: Natural Language Processing (advanced, 3 credits)"""
    
    return f"""{system_prompt}

Student Facts:
{facts}

Recent Conversation:
{history}

{course_context}

Current Question: {user_query}"""

# Build the prompt
prompt = build_memory_enhanced_prompt(
    user_query="Based on my interests, what advanced courses should I take next?",
    session_id=session_id,
    user_id=student_id
)

print("Memory-Enhanced RAG Prompt:")
print("="*60)
print(prompt)
```

```
Memory-Enhanced RAG Prompt:
============================================================
You are a course advisor with memory of past conversations.

Student Facts:
- Sarah prefers online courses due to her work schedule.
- Sarah has completed CS101, CS201, and MATH101.
- Sarah is interested in machine learning and AI.

Recent Conversation:
user: I'm interested in machine learning courses.
assistant: I found several ML courses. CS301 is great for beginners.
user: What are the prerequisites for that one?
assistant: CS301 requires CS201 (Data Structures), which you've completed!

Available Courses:
  • CS301: Machine Learning (intermediate, 4 credits)
  • CS401: Deep Learning (advanced, 4 credits)
  • CS402: Natural Language Processing (advanced, 3 credits)

Current Question: Based on my interests, what advanced courses should I take next?
```


```python
# In production, this prompt would be sent to an LLM, for example:
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": prompt}],
#     max_tokens=500
# )
# print(response.choices[0].message.content)

print("📘 Memory-Enhanced RAG Complete!")
print("\nThe prompt includes:")
print("  • System instructions (advisor persona)")
print("  • Long-term facts (student preferences, history)")
print("  • Working memory (recent conversation)")
print("  • Course context (a hardcoded sample here; retrieved via semantic search in Module 2)")
print("\nThis enables personalized, context-aware responses!")
```

```
📘 Memory-Enhanced RAG Complete!

The prompt includes:
  • System instructions (advisor persona)
  • Long-term facts (student preferences, history)
  • Working memory (recent conversation)
  • Course context (a hardcoded sample here; retrieved via semantic search in Module 2)

This enables personalized, context-aware responses!
```
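Building the prompt is only half of the loop. After the model replies, the new user/assistant exchange is appended to working memory so the next request can see it. A minimal sketch against a plain in-memory dict that mirrors the demo store in Part 2 (the classes are redefined so the block runs on its own); `record_turn` is a hypothetical helper, and in production this write would go to Agent Memory Server instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Message:
    role: str
    content: str

@dataclass
class WorkingMemory:
    session_id: str
    messages: List[Message] = field(default_factory=list)

memory_store: Dict[str, WorkingMemory] = {}

def record_turn(session_id: str, user_text: str, assistant_text: str) -> WorkingMemory:
    """Append one user/assistant exchange, creating the session if needed."""
    wm = memory_store.setdefault(session_id, WorkingMemory(session_id=session_id))
    wm.messages.append(Message("user", user_text))
    wm.messages.append(Message("assistant", assistant_text))
    return wm

wm = record_turn(
    "demo-session",
    "Based on my interests, what advanced courses should I take next?",
    "(model reply would go here)",
)
print(len(wm.messages), wm.messages[-1].role)  # 2 assistant
```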


---

## 🎯 Key Takeaways

1. **Working memory** enables conversation continuity (session-scoped)
2. **Long-term memory** stores persistent facts (user-scoped)
3. **Semantic search** finds relevant facts without exact matches
4. **Agent Memory Server** handles compression automatically
5. **Memory + RAG** produces personalized, context-aware assistants

---

## ➡️ Next Module

In **Module 5: Building Agents**, you'll learn:
- LangGraph fundamentals (nodes, edges, state)
- Memory tools that LLMs can call
- Building a complete course advisor agent