# 🧠 Lab 4: Memory Systems & Session Persistence
## Module 3 - Building Stateful Banking Agents

**Duration:** 30 minutes

**Objectives:**
- Implement short-term, long-term, and episodic memory
- Build session persistence for distributed deployments
- Carry customer context across separate interactions
- Optimize memory with summarization

**Banking Scenario:** Customer service agent with memory that persists across sessions

---

In [None]:
!pip install openai -q

In [None]:
import os
import json
import hashlib
import time
from datetime import datetime, timedelta
from typing import List, Dict, Optional
from dataclasses import dataclass, field, asdict

# =============================================================================
# GOOGLE COLAB SETUP - Add these secrets (click 🔑 icon):
#   - AZURE_OPENAI_KEY: Your API key
#   - AZURE_OPENAI_ENDPOINT: https://xxx.openai.azure.com/
#   - AZURE_OPENAI_DEPLOYMENT: Your model deployment name
# =============================================================================

DEMO_MODE = False
client = None
MODEL_NAME = "gpt-4o"

try:
    from google.colab import userdata
    AZURE_OPENAI_KEY = userdata.get('AZURE_OPENAI_KEY')
    AZURE_OPENAI_ENDPOINT = userdata.get('AZURE_OPENAI_ENDPOINT')
    try:
        # Keep the default model name if the optional secret is missing or empty
        MODEL_NAME = userdata.get('AZURE_OPENAI_DEPLOYMENT') or MODEL_NAME
    except Exception:
        pass
    if AZURE_OPENAI_KEY and AZURE_OPENAI_ENDPOINT:
        if not AZURE_OPENAI_ENDPOINT.startswith('http'):
            AZURE_OPENAI_ENDPOINT = 'https://' + AZURE_OPENAI_ENDPOINT
        print(f"✅ Credentials loaded. Model: {MODEL_NAME}")
    else:
        raise ValueError("Missing")
except Exception:
    print("⚠️ Running in DEMO MODE")
    DEMO_MODE = True

if not DEMO_MODE:
    from openai import AzureOpenAI
    client = AzureOpenAI(
        api_key=AZURE_OPENAI_KEY,
        api_version="2024-06-01",
        azure_endpoint=AZURE_OPENAI_ENDPOINT
    )
    print("✅ Client ready")

## Understanding Memory Types

| Memory Type | Scope | Persistence | Use Case |
|-------------|-------|-------------|----------|
| **Short-term** | Single session | In-memory | Current conversation |
| **Long-term** | Across sessions | Database | User profiles, preferences |
| **Episodic** | Specific events | Indexed storage | Past conversations |
| **Session Store** | Distributed | Redis/Cache | Multi-server deployments |

## Part 1: Basic Memory Manager

In [None]:
class BankingMemoryManager:
    """Manages short-term, long-term, and episodic memory"""
    
    def __init__(self, max_messages: int = 10):
        self.short_term: List[Dict] = []  # Current conversation
        self.long_term: Dict = {}  # Customer profiles
        self.episodic: List[Dict] = []  # Past interactions
        self.max_messages = max_messages
    
    def add_message(self, role: str, content: str):
        """Add message to short-term memory"""
        self.short_term.append({"role": role, "content": content, "timestamp": datetime.now().isoformat()})
        if len(self.short_term) > self.max_messages:
            self.short_term = self.short_term[-self.max_messages:]
    
    def get_conversation(self) -> List[Dict]:
        """Get messages for LLM (without timestamps)"""
        return [{"role": m["role"], "content": m["content"]} for m in self.short_term]
    
    def clear_conversation(self):
        self.short_term = []
    
    def set_customer_profile(self, customer_id: str, profile: Dict):
        """Store customer profile in long-term memory"""
        self.long_term[customer_id] = profile
    
    def get_customer_profile(self, customer_id: str) -> Optional[Dict]:
        return self.long_term.get(customer_id)
    
    def add_episode(self, customer_id: str, event_type: str, summary: str):
        """Record an interaction in episodic memory"""
        self.episodic.append({
            "customer_id": customer_id,
            "timestamp": datetime.now().isoformat(),
            "event_type": event_type,
            "summary": summary
        })
    
    def get_customer_episodes(self, customer_id: str, limit: int = 5) -> List[Dict]:
        """Get recent episodes for a customer"""
        return [e for e in self.episodic if e["customer_id"] == customer_id][-limit:]
    
    def build_context(self, customer_id: str) -> str:
        """Build context string for LLM"""
        parts = []
        if profile := self.get_customer_profile(customer_id):
            parts.append(f"CUSTOMER PROFILE: {json.dumps(profile)}")
        if episodes := self.get_customer_episodes(customer_id):
            parts.append("PAST INTERACTIONS: " + "; ".join(e['summary'] for e in episodes))
        return "\n".join(parts) or "New customer, no prior history."

print("✅ BankingMemoryManager defined")

In [None]:
# Initialize memory with sample data
memory = BankingMemoryManager()
memory.set_customer_profile("C-123", {
    "name": "Sarah Johnson",
    "risk_tolerance": "conservative",
    "tier": "Gold",
    "years_with_bank": 8
})
memory.add_episode("C-123", "inquiry", "Asked about mortgage rates")
memory.add_episode("C-123", "dispute", "Disputed $50 charge, resolved")
memory.add_episode("C-123", "interest", "Interested in CDs")

print("✅ Memory initialized")
print(f"\nContext for C-123:\n{memory.build_context('C-123')}")

## Part 2: Memory-Enabled Agent

In [None]:
class MemoryAgent:
    """Agent that uses memory for personalized responses"""
    
    def __init__(self, memory: BankingMemoryManager):
        self.memory = memory
        self.customer_id = None
    
    def start_session(self, customer_id: str):
        """Start a new session for a customer"""
        self.customer_id = customer_id
        self.memory.clear_conversation()
        print(f"🏦 Session started for {customer_id}")
    
    def chat(self, message: str) -> str:
        """Process a message with memory context"""
        self.memory.add_message("user", message)
        
        context = self.memory.build_context(self.customer_id)
        system_prompt = f"""You are a helpful banking assistant.
{context}
Be personalized and reference the customer's history when relevant."""
        
        messages = [{"role": "system", "content": system_prompt}]
        messages.extend(self.memory.get_conversation())
        
        if DEMO_MODE or not client:
            profile = self.memory.get_customer_profile(self.customer_id) or {}
            name = profile.get('name', 'Customer')
            if 'balance' in message.lower():
                response = f"Hi {name}! Your checking account has $5,432.10 and savings has $12,500.00. [DEMO]"
            elif 'invest' in message.lower():
                response = f"{name}, given your conservative risk profile, I'd recommend our 4.5% CD. [DEMO]"
            elif 'mortgage' in message.lower():
                response = "I see you asked about mortgages before. Current rates are around 6.5%. [DEMO]"
            else:
                response = f"How can I help you today, {name}? [DEMO]"
        else:
            try:
                result = client.chat.completions.create(model=MODEL_NAME, messages=messages)
                response = result.choices[0].message.content
            except Exception as e:
                response = f"Error: {e}"
        
        self.memory.add_message("assistant", response)
        return response
    
    def end_session(self, summary: str = None):
        """End session and optionally save summary to episodic memory"""
        if summary:
            self.memory.add_episode(self.customer_id, "chat", summary)
        self.customer_id = None
        print("🏦 Session ended")

agent = MemoryAgent(memory)
print("✅ Agent ready")

In [None]:
# Test the agent
agent.start_session("C-123")

print("\n👤 USER: What's my balance?")
print(f"🤖 AGENT: {agent.chat('What is my balance?')}")

print("\n👤 USER: I want to invest some money")
print(f"🤖 AGENT: {agent.chat('I want to invest some money')}")

print("\n👤 USER: What about that mortgage?")
print(f"🤖 AGENT: {agent.chat('What about that mortgage I asked about?')}")

agent.end_session("Discussed balance, investments, mortgage")
print(f"\n📋 Episodes: {[e['summary'] for e in memory.get_customer_episodes('C-123')]}")

## Part 3: Session Persistence for Distributed Deployments

In production, sessions need to persist across:
- Multiple server instances
- Server restarts
- Load-balanced requests
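
The requirement above comes down to keeping session state outside any single process. The following is a minimal sketch of that idea, assuming a hypothetical `FileSessionStore` that writes one JSON file per session to a shared directory. It is only a stand-in for Redis or CosmosDB, not part of the lab's classes.

```python
import json
import tempfile
import uuid
from pathlib import Path
from typing import Dict, Optional


class FileSessionStore:
    """Hypothetical stand-in for Redis: one JSON file per session on shared disk."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, session_id: str) -> Path:
        return self.root / f"{session_id}.json"

    def create(self, user_id: str) -> str:
        session_id = uuid.uuid4().hex
        self.save(session_id, {"user_id": user_id, "messages": []})
        return session_id

    def save(self, session_id: str, data: Dict) -> None:
        self._path(session_id).write_text(json.dumps(data))

    def load(self, session_id: str) -> Optional[Dict]:
        path = self._path(session_id)
        return json.loads(path.read_text()) if path.exists() else None


shared_dir = Path(tempfile.mkdtemp())

# "Server A" creates the session and records the first message.
server_a = FileSessionStore(shared_dir)
sid = server_a.create("C-123")
state = server_a.load(sid)
state["messages"].append({"role": "user", "content": "Hi, I need help."})
server_a.save(sid, state)

# "Server B" is a separate object with no shared Python state, yet it sees
# the same conversation because the state lives outside the process.
server_b = FileSessionStore(shared_dir)
restored = server_b.load(sid)
print(restored["user_id"], len(restored["messages"]))
```

Because every request reloads the session by ID, it does not matter which instance handles it, or whether that instance restarted in between.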

In [None]:
@dataclass
class Session:
    """Serializable session for persistence"""
    session_id: str
    user_id: str
    created_at: str
    last_active: str
    messages: List[Dict] = field(default_factory=list)
    context: Dict = field(default_factory=dict)
    
    def to_dict(self) -> Dict:
        return asdict(self)
    
    @classmethod
    def from_dict(cls, data: Dict) -> 'Session':
        return cls(**data)

class SessionStore:
    """Session storage (simulates Redis/CosmosDB)"""
    
    def __init__(self, ttl_hours: int = 24):
        self._store: Dict[str, Dict] = {}  # In production: Redis/CosmosDB
        self.ttl_hours = ttl_hours
    
    def create_session(self, user_id: str) -> Session:
        """Create a new session"""
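        # Demo only: this ID is derived from the user ID and the clock, so it is guessable.
        # In production, prefer a random token (e.g. secrets.token_hex).
        session_id = hashlib.sha256(f"{user_id}:{time.time()}".encode()).hexdigest()[:16]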
        now = datetime.now().isoformat()
        session = Session(session_id=session_id, user_id=user_id, created_at=now, last_active=now)
        self._store[session_id] = session.to_dict()
        return session
    
    def get_session(self, session_id: str) -> Optional[Session]:
        """Retrieve session (checks expiry)"""
        data = self._store.get(session_id)
        if not data:
            return None
        # Check TTL
        last_active = datetime.fromisoformat(data["last_active"])
        if datetime.now() - last_active > timedelta(hours=self.ttl_hours):
            del self._store[session_id]
            return None
        return Session.from_dict(data)
    
    def update_session(self, session: Session):
        """Update session in store"""
        session.last_active = datetime.now().isoformat()
        self._store[session.session_id] = session.to_dict()
    
    def delete_session(self, session_id: str):
        if session_id in self._store:
            del self._store[session_id]

print("✅ Session persistence classes defined")

In [None]:
class PersistentAgent:
    """Agent with session persistence for distributed deployments"""
    
    def __init__(self, session_store: SessionStore, long_term_memory: BankingMemoryManager):
        self.store = session_store
        self.ltm = long_term_memory  # Long-term memory (profiles, episodes)
    
    def start_session(self, user_id: str) -> str:
        """Start new session, return session_id"""
        session = self.store.create_session(user_id)
        return session.session_id
    
    def chat(self, session_id: str, message: str) -> Dict:
        """Process message with session persistence"""
        session = self.store.get_session(session_id)
        if not session:
            return {"error": "Session expired or not found"}
        
        # Add user message
        session.messages.append({"role": "user", "content": message})
        
        # Get long-term context
        ltm_context = self.ltm.build_context(session.user_id)
        system_prompt = f"""You are a helpful banking assistant.
{ltm_context}
Be personalized and helpful."""
        
        # Generate response
        if DEMO_MODE or not client:
            profile = self.ltm.get_customer_profile(session.user_id) or {}
            name = profile.get('name', 'Customer')
            response = f"Hello {name}! Session {session_id[:8]}... Message #{len(session.messages)} [DEMO]"
        else:
            messages = [{"role": "system", "content": system_prompt}]
            messages.extend([{"role": m["role"], "content": m["content"]} for m in session.messages])
            try:
                result = client.chat.completions.create(model=MODEL_NAME, messages=messages)
                response = result.choices[0].message.content
            except Exception as e:
                response = f"Error: {e}"
        
        # Save response and persist session
        session.messages.append({"role": "assistant", "content": response})
        self.store.update_session(session)
        
        return {"response": response, "session_id": session_id, "message_count": len(session.messages)}

print("✅ PersistentAgent defined")

In [None]:
# Test distributed scenario
store = SessionStore()
persistent_agent = PersistentAgent(store, memory)

print("=" * 50)
print("DISTRIBUTED SESSION DEMO")
print("=" * 50)

# Start session
session_id = persistent_agent.start_session("C-123")
print(f"\n📋 Session created: {session_id}")

# Simulate requests hitting different servers
print("\n--- Request on Server A ---")
result = persistent_agent.chat(session_id, "Hi, I need help with my account.")
print(f"🤖 {result['response']}")

print("\n--- Request on Server B (different instance) ---")
result = persistent_agent.chat(session_id, "What's my account balance?")
print(f"🤖 {result['response']}")

print("\n--- Request on Server C ---")
result = persistent_agent.chat(session_id, "What did I ask about first?")
print(f"🤖 {result['response']}")

print(f"\n📊 Total messages in session: {result['message_count']}")

## Part 4: Memory Summarization (Token Optimization)

For long conversations, summarize older messages to stay within token limits.
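
One way to sketch this is to fold everything except the most recent messages into a running summary. The function below is an illustration under stated assumptions: `fold_into_summary` and `fake_summarize` are hypothetical names, and the summarizer is passed in as a plain callable so the example runs without an API key.

```python
from typing import Callable, Dict, List, Tuple


def fold_into_summary(
    messages: List[Dict],
    previous_summary: str,
    keep_recent: int,
    summarize: Callable[[str], str],
) -> Tuple[str, List[Dict]]:
    """Return (updated_summary, recent_messages) after folding older messages."""
    split = max(len(messages) - max(keep_recent, 0), 0)
    older, recent = messages[:split], messages[split:]
    if not older:
        # Nothing old enough to summarize yet.
        return previous_summary, recent
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    prompt = (
        "Update the running summary of a banking support chat.\n"
        f"Existing summary: {previous_summary or '(none)'}\n"
        f"New messages:\n{transcript}"
    )
    return summarize(prompt), recent


def fake_summarize(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; returns a fixed summary.
    return "Customer checked balance and moved $100 to savings."


history = [
    {"role": "user", "content": "What's my balance?"},
    {"role": "assistant", "content": "Your balance is $5,000."},
    {"role": "user", "content": "Transfer $100 to savings."},
    {"role": "assistant", "content": "Done! New balance: $4,900."},
]
summary, recent = fold_into_summary(history, "", keep_recent=2, summarize=fake_summarize)
print(summary)
print(len(recent))
```

With a live client, the `summarize` argument could wrap a chat completion call. Passing the previous summary into the prompt lets each round refine the same summary without re-reading the full history.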

In [None]:
class SummarizingMemory:
    """Memory that summarizes old messages to save tokens"""
    
    def __init__(self, recent_count: int = 6, summarize_threshold: int = 10):
        self.messages: List[Dict] = []
        self.summary: str = ""
        self.recent_count = recent_count
        self.summarize_threshold = summarize_threshold
    
    def add_message(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.summarize_threshold:
            self._summarize()
    
    def _summarize(self):
        """Summarize older messages"""
        to_summarize = self.messages[:-self.recent_count]
        self.messages = self.messages[-self.recent_count:]
        
        # Create summary (in production, use LLM)
        summary_text = f"[Summary of {len(to_summarize)} messages]"
        self.summary = f"{self.summary} {summary_text}".strip()
        print(f"   📝 Summarized {len(to_summarize)} old messages")
    
    def get_messages_for_llm(self) -> List[Dict]:
        """Get messages with summary prefix"""
        messages = []
        if self.summary:
            messages.append({"role": "system", "content": f"Previous conversation summary: {self.summary}"})
        messages.extend(self.messages)
        return messages

# Test summarization
sum_memory = SummarizingMemory(recent_count=4, summarize_threshold=6)

print("Adding messages to trigger summarization...")
for role, content in [
    ("user", "What's my balance?"),
    ("assistant", "Your balance is $5,000."),
    ("user", "Transfer $100 to savings."),
    ("assistant", "Done! New balance: $4,900."),
    ("user", "What's my savings balance?"),
    ("assistant", "Savings: $12,600."),
    ("user", "Thanks!"),
    ("assistant", "You're welcome!"),
]:
    sum_memory.add_message(role, content)

print(f"\n📊 Recent messages: {len(sum_memory.messages)}")
print(f"📝 Summary: {sum_memory.summary}")

---
## 🎁 BONUS: Production Storage with Azure CosmosDB & Redis

**⚠️ Optional Section** - Requires Azure resources. Skip if you don't have CosmosDB/Redis.

Add these secrets to Colab:
- `COSMOS_ENDPOINT`: Your CosmosDB endpoint
- `COSMOS_KEY`: Your CosmosDB key
- `REDIS_HOST`: Your Redis hostname
- `REDIS_PASSWORD`: Your Redis password

In [None]:
# Install Azure SDKs (only run if using this section)
# !pip install azure-cosmos redis -q

In [None]:
# =============================================================================
# AZURE COSMOS DB - Long-term Memory Storage
# =============================================================================
# CosmosDB is ideal for:
# - Customer profiles (long-term memory)
# - Episodic memory (past interactions)
# - Global distribution with low latency
# =============================================================================

class CosmosDBMemoryStore:
    """Long-term memory storage using Azure CosmosDB"""
    
    def __init__(self, endpoint: str, key: str, database_name: str = "agent_memory"):
        """
        Initialize CosmosDB connection.
        
        Creates database and containers if they don't exist:
        - profiles: Customer profiles (partition key: /customer_id)
        - episodes: Interaction history (partition key: /customer_id)
        """
        from azure.cosmos import CosmosClient, PartitionKey
        
        self.client = CosmosClient(endpoint, key)
        
        # Create database if not exists
        self.database = self.client.create_database_if_not_exists(id=database_name)
        
        # Create containers. No explicit throughput is set: serverless accounts
        # reject provisioned RU/s, and provisioned accounts fall back to their default.
        self.profiles_container = self.database.create_container_if_not_exists(
            id="profiles",
            partition_key=PartitionKey(path="/customer_id")
        )
        
        self.episodes_container = self.database.create_container_if_not_exists(
            id="episodes",
            partition_key=PartitionKey(path="/customer_id")
        )
        
        print(f"✅ CosmosDB connected: {database_name}")
    
    def save_profile(self, customer_id: str, profile: Dict):
        """Save or update customer profile"""
        doc = {
            "id": customer_id,
            "customer_id": customer_id,
            **profile,
            "updated_at": datetime.now().isoformat()
        }
        self.profiles_container.upsert_item(doc)
    
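    def get_profile(self, customer_id: str) -> Optional[Dict]:
        """Retrieve customer profile, or None if it does not exist"""
        from azure.cosmos.exceptions import CosmosResourceNotFoundError
        try:
            return self.profiles_container.read_item(item=customer_id, partition_key=customer_id)
        except CosmosResourceNotFoundError:
            return None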
    
    def add_episode(self, customer_id: str, event_type: str, summary: str):
        """Add interaction episode"""
        doc = {
            "id": f"{customer_id}_{datetime.now().timestamp()}",
            "customer_id": customer_id,
            "event_type": event_type,
            "summary": summary,
            "timestamp": datetime.now().isoformat()
        }
        self.episodes_container.create_item(doc)
    
    def get_episodes(self, customer_id: str, limit: int = 10) -> List[Dict]:
        """Get recent episodes for customer"""
        # int() guards the interpolated LIMIT; the customer ID stays a bound parameter
        query = f"SELECT * FROM c WHERE c.customer_id = @cid ORDER BY c.timestamp DESC OFFSET 0 LIMIT {int(limit)}"
        items = list(self.episodes_container.query_items(
            query=query,
            parameters=[{"name": "@cid", "value": customer_id}],
            partition_key=customer_id  # Single-partition query
        ))
        return items

print("✅ CosmosDBMemoryStore class defined")
print("   To use: cosmos_store = CosmosDBMemoryStore(endpoint, key)")

In [None]:
# =============================================================================
# AZURE REDIS CACHE - Session Storage
# =============================================================================
# Redis is ideal for:
# - Session state (short-term, fast access)
# - Distributed caching across servers
# - Automatic TTL expiration
# =============================================================================

class RedisSessionStore:
    """Session storage using Azure Redis Cache"""
    
    def __init__(self, host: str, password: str, port: int = 6380, ttl_seconds: int = 3600):
        """
        Initialize Redis connection.
        
        Azure Redis uses SSL on port 6380 by default.
        """
        import redis
        
        self.client = redis.Redis(
            host=host,
            port=port,
            password=password,
            ssl=True,
            decode_responses=True
        )
        self.ttl = ttl_seconds
        
        # Test connection
        self.client.ping()
        print(f"✅ Redis connected: {host}")
    
    def create_session(self, user_id: str) -> str:
        """Create new session, return session_id"""
        session_id = hashlib.sha256(f"{user_id}:{time.time()}".encode()).hexdigest()[:16]
        
        session_data = {
            "session_id": session_id,
            "user_id": user_id,
            "created_at": datetime.now().isoformat(),
            "messages": []
        }
        
        # Store with TTL
        self.client.setex(
            f"session:{session_id}",
            self.ttl,
            json.dumps(session_data)
        )
        
        return session_id
    
    def get_session(self, session_id: str) -> Optional[Dict]:
        """Retrieve session (auto-expires via Redis TTL)"""
        data = self.client.get(f"session:{session_id}")
        if data:
            return json.loads(data)
        return None
    
    def update_session(self, session_id: str, session_data: Dict):
        """Update session and refresh TTL"""
        self.client.setex(
            f"session:{session_id}",
            self.ttl,
            json.dumps(session_data)
        )
    
    def add_message(self, session_id: str, role: str, content: str) -> bool:
        """Add message to session"""
        session = self.get_session(session_id)
        if not session:
            return False
        
        session["messages"].append({
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat()
        })
        
        self.update_session(session_id, session)
        return True
    
    def delete_session(self, session_id: str):
        """Delete session"""
        self.client.delete(f"session:{session_id}")

print("✅ RedisSessionStore class defined")
print("   To use: redis_store = RedisSessionStore(host, password)")

In [None]:
# =============================================================================
# EXAMPLE: Using Real Azure Resources
# =============================================================================
# Remove the surrounding triple quotes and run if you have Azure resources configured
# =============================================================================

"""
# Load credentials from Colab secrets
try:
    COSMOS_ENDPOINT = userdata.get('COSMOS_ENDPOINT')
    COSMOS_KEY = userdata.get('COSMOS_KEY')
    REDIS_HOST = userdata.get('REDIS_HOST')
    REDIS_PASSWORD = userdata.get('REDIS_PASSWORD')
    
    # Initialize stores
    cosmos_store = CosmosDBMemoryStore(COSMOS_ENDPOINT, COSMOS_KEY)
    redis_store = RedisSessionStore(REDIS_HOST, REDIS_PASSWORD)
    
    # Save a customer profile to CosmosDB
    cosmos_store.save_profile("C-123", {
        "name": "Sarah Johnson",
        "tier": "Gold",
        "risk_tolerance": "conservative"
    })
    print("✅ Profile saved to CosmosDB")
    
    # Create a session in Redis
    session_id = redis_store.create_session("C-123")
    redis_store.add_message(session_id, "user", "What's my balance?")
    redis_store.add_message(session_id, "assistant", "Your balance is $5,432.10")
    print(f"✅ Session created in Redis: {session_id}")
    
    # Retrieve and display
    profile = cosmos_store.get_profile("C-123")
    session = redis_store.get_session(session_id)
    print(f"\n📋 Profile from CosmosDB: {profile}")
    print(f"💬 Session from Redis: {len(session['messages'])} messages")
    
except Exception as e:
    print(f"⚠️ Azure resources not configured: {e}")
    print("   Add COSMOS_ENDPOINT, COSMOS_KEY, REDIS_HOST, REDIS_PASSWORD to Colab secrets")
"""

print("💡 Remove the triple quotes around the code above to test with real Azure resources")

### Azure Resource Setup (CLI Commands)

```bash
# Create Resource Group
az group create --name agent-memory-rg --location eastus

# Create CosmosDB Account (Serverless for low cost)
az cosmosdb create \
  --name agent-memory-cosmos \
  --resource-group agent-memory-rg \
  --capabilities EnableServerless

# Create Redis Cache (Basic tier for dev)
az redis create \
  --name agent-sessions-redis \
  --resource-group agent-memory-rg \
  --sku Basic \
  --vm-size c0

# Get access keys
az cosmosdb keys list --name agent-memory-cosmos --resource-group agent-memory-rg
az redis list-keys --name agent-sessions-redis --resource-group agent-memory-rg
```

**Estimated Cost:**
- CosmosDB Serverless: ~$0.25 per 1M RUs
- Redis Basic C0: ~$16/month
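
To turn these unit prices into a rough monthly figure, the arithmetic can be sketched as below. The two prices come from the estimate above. The workload numbers and the RU cost per read and write are hypothetical assumptions chosen for illustration, and storage charges are not included.

```python
COSMOS_PRICE_PER_MILLION_RU = 0.25  # USD, from the estimate above
REDIS_BASIC_C0_MONTHLY = 16.00      # USD, from the estimate above


def estimate_monthly_cost(
    sessions_per_day: int,
    reads_per_session: int,
    writes_per_session: int,
    ru_per_read: float = 1.0,    # Assumption: small point read
    ru_per_write: float = 10.0,  # Assumption: small document write
    days: int = 30,
) -> float:
    """Rough monthly cost in USD for CosmosDB request units plus one Redis cache."""
    ru_per_session = reads_per_session * ru_per_read + writes_per_session * ru_per_write
    total_ru = sessions_per_day * days * ru_per_session
    cosmos_cost = total_ru / 1_000_000 * COSMOS_PRICE_PER_MILLION_RU
    return round(cosmos_cost + REDIS_BASIC_C0_MONTHLY, 2)


# Hypothetical workload: 10,000 sessions per day, each with 2 profile/episode
# reads and 1 episode write.
monthly = estimate_monthly_cost(sessions_per_day=10_000, reads_per_session=2, writes_per_session=1)
print(f"Estimated monthly cost: ${monthly:.2f}")
```

Under these assumptions the request-unit charge stays under a dollar, so the fixed Redis price dominates at small scale.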

---
## ✅ Lab 4 Complete!

### Key Takeaways

| Memory Type | Purpose | Storage |
|-------------|---------|--------|
| Short-term | Current conversation | In-memory |
| Long-term | Customer profiles | Database |
| Episodic | Past interactions | Indexed store |
| Session | Distributed state | Redis/Cache |

### Production Considerations

1. **Session Storage**: Use Redis (for example, Azure Cache for Redis) for distributed sessions
2. **TTL Management**: Set appropriate expiry times (1-24 hours)
3. **Token Limits**: Summarize old messages to stay within limits
4. **User Profiles**: Store in database (CosmosDB, PostgreSQL)

**Next:** Open `05_semantic_kernel_agent.ipynb`