# Memory Integration: Combining Working and Long-term Memory

## Introduction

In this notebook, you'll learn how to integrate working memory and long-term memory to create a complete memory system for your agent. You'll see how these two types of memory work together to provide both conversation context and persistent knowledge.

### What You'll Learn

- How working and long-term memory complement each other
- When to use each type of memory
- How to build a complete memory flow
- How automatic extraction works
- How to test multi-session conversations

### Prerequisites

- Completed `01_working_memory_with_extraction_strategies.ipynb`
- Completed `02_long_term_memory.ipynb`
- Redis 8 running locally
- Agent Memory Server running
- OpenAI API key set

## Concepts: Memory Integration

### The Complete Memory Architecture

A production agent needs both types of memory:

```
┌─────────────────────────────────────────────────┐
│              User Query                         │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│  1. Load Working Memory (current conversation)  │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│  2. Search Long-term Memory (relevant facts)    │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│  3. Agent Processes with Full Context           │
└─────────────────────────────────────────────────┘
                      ↓
┌─────────────────────────────────────────────────┐
│  4. Save Working Memory (with new messages)     │
│     → Automatic extraction to long-term         │
└─────────────────────────────────────────────────┘
```

### Memory Flow in Detail

**Turn 1:**
1. Load working memory (empty)
2. Search long-term memory (empty)
3. Process query
4. Save working memory
5. Extract important facts → long-term memory

**Turn 2 (same session):**
1. Load working memory (has Turn 1 messages)
2. Search long-term memory (has extracted facts)
3. Process query with full context
4. Save working memory (Turn 1 + Turn 2)
5. Extract new facts → long-term memory

**Turn 3 (new session, same user):**
1. Load working memory (empty - new session)
2. Search long-term memory (has all extracted facts)
3. Process query with long-term context
4. Save working memory (Turn 3 only)
5. Extract facts → long-term memory
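
The three turns above can be simulated without a running server. The sketch below is a toy model, not the Agent Memory Server API: `ToyMemory`, `save_turn`, and the keyword-based `looks_important` rule are hypothetical stand-ins for session-scoped working memory, user-scoped long-term memory, and the server's LLM-based extraction.

```python
from collections import defaultdict


def looks_important(text: str) -> bool:
    """Hypothetical stand-in for extraction: keep statements of preference or interest."""
    lowered = text.lower()
    return "prefer" in lowered or "interested" in lowered


class ToyMemory:
    """Toy model: working memory keyed by session, long-term memory keyed by user."""

    def __init__(self) -> None:
        self.working: dict[str, list[str]] = defaultdict(list)
        self.long_term: dict[str, list[str]] = defaultdict(list)

    def load_working(self, session_id: str) -> list[str]:
        return list(self.working[session_id])

    def search_long_term(self, user_id: str) -> list[str]:
        return list(self.long_term[user_id])

    def save_turn(self, session_id: str, user_id: str, user_text: str) -> None:
        # Step 4: save the message to the session's working memory.
        self.working[session_id].append(user_text)
        # Step 5: "extract" important facts into the user's long-term memory.
        if looks_important(user_text) and user_text not in self.long_term[user_id]:
            self.long_term[user_id].append(user_text)


memory = ToyMemory()
turns = [
    ("session_001", "I'm interested in learning about databases."),
    ("session_001", "I prefer online courses and morning classes."),
    ("session_002", "What database courses do you recommend for me?"),
]
for session_id, text in turns:
    history = memory.load_working(session_id)        # Step 1
    facts = memory.search_long_term("student_456")   # Step 2
    print(f"{session_id}: {len(history)} prior messages, {len(facts)} known facts")
    memory.save_turn(session_id, "student_456", text)  # Steps 4-5
```

In this toy run, the third turn starts with no prior messages because the session is new, yet it still sees the two facts captured during the first session.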

### When to Use Each Memory Type

| Scenario | Working Memory | Long-term Memory |
|----------|----------------|------------------|
| Current conversation | ✅ Always | ❌ No |
| User preferences | ❌ No | ✅ Yes |
| Recent context | ✅ Yes | ❌ No |
| Important facts | ❌ No | ✅ Yes |
| Cross-session data | ❌ No | ✅ Yes |
| Temporary info | ✅ Yes | ❌ No |

## Setup

In [None]:
import os
import asyncio
from datetime import datetime
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from agent_memory_client import MemoryAPIClient as MemoryClient, MemoryClientConfig

# Identifiers for the student and the two sessions used below
student_id = "student_456"
session_id_1 = "session_001"
session_id_2 = "session_002"

# Initialize the memory client
config = MemoryClientConfig(
    base_url=os.getenv("AGENT_MEMORY_URL", "http://localhost:8000"),
    default_namespace="redis_university"
)
memory_client = MemoryClient(config=config)

llm = ChatOpenAI(model="gpt-4o", temperature=0.7)

print(f"✅ Setup complete for {student_id}")

## Hands-on: Building Complete Memory Flow

### Session 1, Turn 1: First Interaction

Let's simulate the first turn of a conversation.

In [None]:
print("=" * 80)
print("SESSION 1, TURN 1")
print("=" * 80)

# Step 1: Load working memory (empty for first turn)
print("\n1. Loading working memory...")
_, working_memory = await memory_client.get_or_create_working_memory(
    session_id=session_id_1,
    model_name="gpt-4o"
)
print(f"   Messages in working memory: {len(working_memory.messages) if working_memory else 0}")

# Step 2: Search long-term memory (empty for first interaction)
print("\n2. Searching long-term memory...")
user_query = "Hi! I'm interested in learning about databases."
long_term_memories = await memory_client.search_long_term_memory(
    query=user_query,
    limit=3
)
print(f"   Relevant memories found: {len(long_term_memories.memories)}")

# Step 3: Process with LLM
print("\n3. Processing with LLM...")
messages = [
    SystemMessage(content="You are a helpful class scheduling agent for Redis University."),
    HumanMessage(content=user_query)
]
response = llm.invoke(messages)
print(f"\n   User: {user_query}")
print(f"   Agent: {response.content}")

# Step 4: Save working memory
print("\n4. Saving working memory...")
from agent_memory_client import WorkingMemory, MemoryMessage

# Convert this turn's exchange to MemoryMessage format
memory_messages = [
    MemoryMessage(role="user", content=user_query),
    MemoryMessage(role="assistant", content=response.content),
]

# Create WorkingMemory object
working_memory = WorkingMemory(
    session_id=session_id_1,
    user_id="demo_user",
    messages=memory_messages,
    memories=[],
    data={}
)

await memory_client.put_working_memory(
    session_id=session_id_1,
    memory=working_memory,
    user_id="demo_user",
    model_name="gpt-4o"
)
print("   ✅ Working memory saved")
print("   ✅ Agent Memory Server will automatically extract important facts to long-term memory")

### Session 1, Turn 2: Continuing the Conversation

Let's continue the conversation in the same session.

In [None]:
print("\n" + "=" * 80)
print("SESSION 1, TURN 2")
print("=" * 80)

# Step 1: Load working memory (now has Turn 1)
print("\n1. Loading working memory...")
_, working_memory = await memory_client.get_or_create_working_memory(
    session_id=session_id_1,
    model_name="gpt-4o"
)
print(f"   Messages in working memory: {len(working_memory.messages)}")
print("   Previous context available: ✅")

# Step 2: Search long-term memory
print("\n2. Searching long-term memory...")
user_query_2 = "I prefer online courses and morning classes."
long_term_memories = await memory_client.search_long_term_memory(
    query=user_query_2,
    limit=3
)
print(f"   Relevant memories found: {len(long_term_memories.memories)}")

# Step 3: Process with LLM (with conversation history)
print("\n3. Processing with LLM...")
messages = [
    SystemMessage(content="You are a helpful class scheduling agent for Redis University."),
]

# Add working memory messages
for msg in working_memory.messages:
    if msg.role == "user":
        messages.append(HumanMessage(content=msg.content))
    elif msg.role == "assistant":
        messages.append(AIMessage(content=msg.content))

# Add new query
messages.append(HumanMessage(content=user_query_2))

response = llm.invoke(messages)
print(f"\n   User: {user_query_2}")
print(f"   Agent: {response.content}")

# Step 4: Save working memory (with both turns)
print("\n4. Saving working memory...")
all_messages = [
    {"role": msg.role, "content": msg.content}
    for msg in working_memory.messages
]
all_messages.extend([
    {"role": "user", "content": user_query_2},
    {"role": "assistant", "content": response.content}
])

from agent_memory_client import WorkingMemory, MemoryMessage

# Convert messages to MemoryMessage format
memory_messages = [MemoryMessage(**msg) for msg in all_messages]

# Create WorkingMemory object
working_memory = WorkingMemory(
    session_id=session_id_1,
    user_id="demo_user",
    messages=memory_messages,
    memories=[],
    data={}
)

await memory_client.put_working_memory(
    session_id=session_id_1,
    memory=working_memory,
    user_id="demo_user",
    model_name="gpt-4o"
)
print("   ✅ Working memory saved with both turns")
print("   ✅ Preferences will be extracted to long-term memory")
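
Each turn repeats the same bookkeeping: take the stored history and append one user/assistant exchange before saving. A small helper keeps that step in one place. This is a minimal sketch using plain `role`/`content` dicts, the same shape the cell above passes to `MemoryMessage(**msg)`; the name `append_turn` and the placeholder assistant replies are hypothetical.

```python
def append_turn(
    history: list[dict[str, str]],
    user_text: str,
    assistant_text: str,
) -> list[dict[str, str]]:
    """Return a new history list with one user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


# An empty history models the first turn of a session.
history = append_turn([], "Hi! I'm interested in learning about databases.", "Welcome!")
history = append_turn(history, "I prefer online courses and morning classes.", "Noted.")
print([m["role"] for m in history])
```

Because the helper returns a new list rather than mutating its input, the history loaded from working memory stays unchanged until the save succeeds.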

### Verify Automatic Extraction

Let's check if the Agent Memory Server extracted facts to long-term memory.

In [None]:
# Wait a moment for extraction to complete
print("Waiting for automatic extraction...")
await asyncio.sleep(2)

# Search for extracted memories
print("\nSearching for extracted memories...\n")
memories = await memory_client.search_long_term_memory(
    query="student preferences",
    limit=5
)

if memories.memories:
    print("✅ Extracted memories found:\n")
    for i, memory in enumerate(memories.memories, 1):
        print(f"{i}. {memory.text}")
        # topics may be unset on some records, so fall back to an empty list
        print(f"   Type: {memory.memory_type} | Topics: {', '.join(memory.topics or [])}")
        print()
else:
    print("⏳ No memories extracted yet (extraction may take a moment)")
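
A fixed two-second sleep can be too short or longer than needed, since extraction runs in the background. One alternative is to poll until results appear or a timeout expires. The sketch below shows that pattern only; `poll_until_found` and `fake_search` are hypothetical names, and `fake_search` stands in for the real search call so the example runs without a server.

```python
import asyncio
import time
from typing import Awaitable, Callable, Sequence


async def poll_until_found(
    search: Callable[[], Awaitable[Sequence[str]]],
    timeout_s: float = 10.0,
    interval_s: float = 0.5,
) -> Sequence[str]:
    """Call `search` repeatedly until it returns results or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        results = await search()
        if results or time.monotonic() >= deadline:
            return results
        await asyncio.sleep(interval_s)


async def demo() -> Sequence[str]:
    calls = 0

    async def fake_search() -> Sequence[str]:
        # Hypothetical stand-in for the real search: empty twice, then one hit.
        nonlocal calls
        calls += 1
        return ["Student prefers online courses"] if calls >= 3 else []

    return await poll_until_found(fake_search, timeout_s=2.0, interval_s=0.01)


# In a notebook, use `found = await demo()` instead of asyncio.run().
found = asyncio.run(demo())
print(found)
```

Recording `time.monotonic()` before and after the call also gives a rough measurement of how long extraction takes.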

### Session 2: New Session, Same User

Now let's start a completely new session with the same user. Working memory will be empty, but long-term memory persists.

In [None]:
print("\n" + "=" * 80)
print("SESSION 2, TURN 1 (New Session, Same User)")
print("=" * 80)

# Step 1: Load working memory (empty - new session)
print("\n1. Loading working memory...")
_, working_memory = await memory_client.get_or_create_working_memory(
    session_id=session_id_2,
    model_name="gpt-4o"
)
print(f"   Messages in working memory: {len(working_memory.messages) if working_memory else 0}")
print("   (Empty - this is a new session)")

# Step 2: Search long-term memory (has data from Session 1)
print("\n2. Searching long-term memory...")
user_query_3 = "What database courses do you recommend for me?"
long_term_memories = await memory_client.search_long_term_memory(
    query=user_query_3,
    limit=5
)
print(f"   Relevant memories found: {len(long_term_memories.memories)}")
if long_term_memories.memories:
    print("\n   Retrieved memories:")
    for memory in long_term_memories.memories:
        print(f"   - {memory.text}")

# Step 3: Process with LLM (with long-term context)
print("\n3. Processing with LLM...")
context = "\n".join([f"- {m.text}" for m in long_term_memories.memories])
system_prompt = f"""You are a helpful class scheduling agent for Redis University.

What you know about this student:
{context}
"""

messages = [
    SystemMessage(content=system_prompt),
    HumanMessage(content=user_query_3)
]

response = llm.invoke(messages)
print(f"\n   User: {user_query_3}")
print(f"   Agent: {response.content}")
print("\n   ✅ Agent used long-term memory to personalize response!")

# Step 4: Save working memory
print("\n4. Saving working memory...")
from agent_memory_client import WorkingMemory, MemoryMessage

# Convert this turn's exchange to MemoryMessage format
memory_messages = [
    MemoryMessage(role="user", content=user_query_3),
    MemoryMessage(role="assistant", content=response.content),
]

# Create WorkingMemory object
working_memory = WorkingMemory(
    session_id=session_id_2,
    user_id="demo_user",
    messages=memory_messages,
    memories=[],
    data={}
)

await memory_client.put_working_memory(
    session_id=session_id_2,
    memory=working_memory,
    user_id="demo_user",
    model_name="gpt-4o"
)
print("   ✅ Working memory saved for new session")
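
When the search returns nothing, the system prompt built above ends with an empty "What you know about this student:" section. A small helper can skip that section and drop duplicate facts. This is a sketch, not part of the memory client: `build_system_prompt` and `BASE_PROMPT` are hypothetical names, and the input is assumed to be the `text` of each retrieved memory.

```python
BASE_PROMPT = "You are a helpful class scheduling agent for Redis University."


def build_system_prompt(memory_texts: list[str]) -> str:
    """Append known facts to the base prompt, skipping the section when there are none."""
    # Drop blanks and exact duplicates while preserving retrieval order.
    unique = list(dict.fromkeys(t.strip() for t in memory_texts if t.strip()))
    if not unique:
        return BASE_PROMPT
    facts = "\n".join(f"- {t}" for t in unique)
    return f"{BASE_PROMPT}\n\nWhat you know about this student:\n{facts}"


print(build_system_prompt([]))
print(build_system_prompt(["Prefers online courses", "Prefers online courses", "Likes mornings"]))
```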

## Testing: Memory Consolidation

Let's verify that both sessions' data is consolidated in long-term memory.

In [None]:
print("\n" + "=" * 80)
print("MEMORY CONSOLIDATION CHECK")
print("=" * 80)

# Check all memories about the student
print("\nAll memories about this student:\n")
all_memories = await memory_client.search_long_term_memory(
    query="",  # Empty query returns all
    limit=20
)

# Search results expose the matching records via `.memories`
semantic_memories = [m for m in all_memories.memories if m.memory_type == "semantic"]
episodic_memories = [m for m in all_memories.memories if m.memory_type == "episodic"]

print(f"Semantic memories (facts): {len(semantic_memories)}")
for memory in semantic_memories:
    print(f"  - {memory.text}")

print(f"\nEpisodic memories (events): {len(episodic_memories)}")
for memory in episodic_memories:
    print(f"  - {memory.text}")

print("\n✅ All memories from both sessions are consolidated in long-term memory!")
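
The cell above filters the results once per memory type. If more types need to be reported, a single grouping pass may be easier to extend. The sketch below assumes only that each record has `text` and `memory_type` fields; `Record` is a hypothetical stand-in for the client's result objects, and the sample texts are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a retrieved memory with `text` and `memory_type` fields."""
    text: str
    memory_type: str


def group_by_type(records: list[Record]) -> dict[str, list[str]]:
    """Group memory texts by their type, preserving first-seen order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for record in records:
        grouped[record.memory_type].append(record.text)
    return dict(grouped)


sample = [
    Record("Student prefers online courses", "semantic"),
    Record("Student asked about database courses", "episodic"),
    Record("Student prefers morning classes", "semantic"),
]
for memory_type, texts in group_by_type(sample).items():
    print(f"{memory_type}: {len(texts)}")
```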

## Key Takeaways

### Memory Integration Pattern

**Every conversation turn:**
1. Load working memory (conversation history)
2. Search long-term memory (relevant facts)
3. Process with full context
4. Save working memory (triggers extraction)

### Automatic Extraction

The Agent Memory Server automatically:
- ✅ Analyzes conversations
- ✅ Extracts important facts
- ✅ Stores in long-term memory
- ✅ Deduplicates similar memories
- ✅ Organizes by type and topics

### Memory Lifecycle

```
User says something
       ↓
Stored in working memory (session-scoped)
       ↓
Automatic extraction analyzes importance
       ↓
Important facts → long-term memory (user-scoped)
       ↓
Available in future sessions
```

### Best Practices

1. **Always load working memory first** - Get conversation context
2. **Search long-term memory for relevant facts** - Use semantic search
3. **Combine both in the prompt** - Put long-term facts in the system prompt and working memory in the message history so the LLM has full context
4. **Save working memory after each turn** - Enable extraction
5. **Trust automatic extraction** - Don't manually extract everything
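
The cells in this notebook use conversation history and long-term facts in separate turns. The sketch below shows one way to combine both in a single request, ordered as system prompt with facts, prior turns, then the new query. `assemble_messages` is a hypothetical helper, and plain `role`/`content` dicts are used in place of LangChain message classes to keep the example self-contained.

```python
def assemble_messages(
    base_prompt: str,
    facts: list[str],
    history: list[dict[str, str]],
    user_query: str,
) -> list[dict[str, str]]:
    """Order the context as: system prompt (with facts), prior turns, new query."""
    system = base_prompt
    if facts:
        system += "\n\nWhat you know about this student:\n" + "\n".join(f"- {f}" for f in facts)
    return [
        {"role": "system", "content": system},
        *history,
        {"role": "user", "content": user_query},
    ]


demo_messages = assemble_messages(
    base_prompt="You are a helpful class scheduling agent for Redis University.",
    facts=["Prefers online courses"],            # from long-term memory
    history=[                                    # from working memory
        {"role": "user", "content": "Hi!"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    user_query="What database courses do you recommend for me?",
)
print([m["role"] for m in demo_messages])
```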

## Exercises

1. **Multi-turn conversation**: Have a 5-turn conversation about course planning. Verify memories are extracted.

2. **Cross-session test**: Start a new session and ask "What do you know about me?" Does the agent remember?

3. **Memory search**: Try different search queries to find specific memories. How does semantic search perform?

4. **Extraction timing**: How long does automatic extraction take? Test with different conversation lengths.

## Summary

In this notebook, you learned:

- ✅ Working and long-term memory work together for complete context
- ✅ Load working memory → search long-term → process → save working memory
- ✅ Automatic extraction moves important facts to long-term memory
- ✅ Long-term memory persists across sessions
- ✅ This pattern enables truly personalized, context-aware agents

**Next:** In Section 4, we'll explore optimizations like context window management, retrieval strategies, and grounding techniques.