# LAB 2.2: BUILDING A STATEFUL CONVERSATION SYSTEM

**Course:** Advanced Prompt Engineering Training  
**Session:** Session 2 - Advanced Context Engineering  
**Duration:** 50 minutes  
**Difficulty:** ⭐⭐⭐⭐☆  
**Type:** Hands-on Conversation State Management

## LAB OVERVIEW

This lab focuses on **maintaining conversation state across multiple turns** without exceeding token limits. You'll learn to:

- Implement sliding window buffer memory
- Generate and use conversation summaries
- Track entities across conversation history
- Combine memory strategies for optimal performance
- Build production-ready conversation state managers

**Scenario:** You're building an AI loan advisor chatbot for a bank. Customers have extended conversations (20-30 turns) discussing loan options, asking questions, providing information, and making decisions. The chatbot must remember:
- What the customer is looking for
- Information they've already provided
- Decisions they've made
- Questions they've asked
- Context from earlier in the conversation

**Challenge:** Maintain perfect context across 30+ message exchanges while staying under an 8,000-token budget.

## LEARNING OBJECTIVES

By the end of this lab, you will be able to:

✓ Implement sliding window buffer memory  
✓ Generate conversation summaries programmatically  
✓ Build entity extraction and tracking systems  
✓ Combine multiple memory strategies  
✓ Handle conversation state in production applications  
✓ Optimize token usage across long conversations

### Step 1: Import Libraries

In [None]:
# Lab 2.2: Building a Stateful Conversation System
# Advanced Prompt Engineering Training - Session 2

import os
import json
from openai import OpenAI
import tiktoken
import pandas as pd
from typing import Dict, List, Any, Optional, Tuple
from datetime import datetime
from collections import defaultdict
import re
from dotenv import load_dotenv

load_dotenv(override=True)

print("✓ Libraries imported")

### Step 2: Configure OpenAI Client

In [None]:
# Check if API key exists
if not os.environ.get("OPENAI_API_KEY"):
    raise ValueError("OPENAI_API_KEY not found. Please set it in .env file")

# Initialize OpenAI client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Configuration
MODEL = os.getenv("MODEL_NAME")
TEMPERATURE = 0  # Minimizes sampling randomness for BFSI applications (not a strict determinism guarantee)

if not MODEL:
    raise ValueError("MODEL_NAME not found. Please set it in .env file")

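try:
    encoding = tiktoken.encoding_for_model(MODEL)
except KeyError:
    # Model name unknown to tiktoken: fall back to a general-purpose encoding
    # so token counts are approximate instead of failing outright.
    encoding = tiktoken.get_encoding("cl100k_base")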

def count_tokens(text: str) -> int:
    """Count tokens in text"""
    return len(encoding.encode(text))

print(f"✓ Model: {MODEL}")
print(f"✓ Tokenizer: {encoding.name}")

### Step 3: Create Helper Functions

In [None]:
def call_gpt4(
    messages: List[Dict],
    system_prompt: str = "You are a helpful AI assistant.",
    temperature: float = TEMPERATURE
) -> Dict:
    """
    Call the configured chat model (MODEL) with conversation history
    
    Args:
        messages (List[Dict]): Conversation messages
        system_prompt (str): System prompt
        temperature (float): Sampling temperature
    
    Returns:
        Dict: Response with metadata
    """
    try:
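        # Prepare messages. Forward only role and content: the memory classes
        # attach a "timestamp" key, which is not part of the chat message schema.
        full_messages = [{"role": "system", "content": system_prompt}]
        full_messages.extend(
            {"role": m["role"], "content": m["content"]} for m in messages
        )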
        
        response = client.chat.completions.create(
            model=MODEL,
            messages=full_messages,
            temperature=temperature
        )
        
        return {
            "content": response.choices[0].message.content,
            "role": "assistant",
            "prompt_tokens": response.usage.prompt_tokens,
            "completion_tokens": response.usage.completion_tokens,
            "total_tokens": response.usage.total_tokens,
            "success": True
        }
    except Exception as e:
        return {
            "content": "",
            "error": str(e),
            "success": False
        }

def calculate_conversation_tokens(messages: List[Dict]) -> int:
    """
    Calculate total tokens in conversation history
    (message content only; per-message formatting overhead is not counted)
    
    Args:
        messages (List[Dict]): Conversation messages
    
    Returns:
        int: Total token count
    """
    total = 0
    for msg in messages:
        total += count_tokens(msg.get("content", ""))
    return total

print("✓ Helper functions created")

### Step 4: Create Sample Conversation

In [None]:
# Simulated multi-turn loan advisory conversation

sample_conversation = [
    {
        "role": "user",
        "content": "Hi, I'm interested in getting a mortgage for my first home."
    },
    {
        "role": "assistant",
        "content": "Hello! I'd be happy to help you explore mortgage options for your first home. To get started, can you tell me a bit about the type of property you're looking at and your approximate budget?"
    },
    {
        "role": "user",
        "content": "I'm looking at a condo in Seattle, probably around $450,000. I've been saving for a while and have about $90,000 for a down payment."
    },
    {
        "role": "assistant",
        "content": "Great! A $90,000 down payment on a $450,000 property is 20%, which means you'd avoid PMI (private mortgage insurance). That's excellent. What's your annual household income, and do you have a sense of your credit score?"
    },
    {
        "role": "user",
        "content": "My income is $85,000 per year, and my credit score is around 720 last time I checked."
    },
    {
        "role": "assistant",
        "content": "Perfect! With a 720 credit score, you'll qualify for competitive interest rates. Based on your $85,000 income and assuming minimal other debts, you should be well within the debt-to-income requirements. Have you been pre-approved, or would you like to start that process?"
    },
    {
        "role": "user",
        "content": "Not yet pre-approved. What documents would I need?"
    },
    {
        "role": "assistant",
        "content": "For pre-approval, you'll typically need: 2 years of tax returns, 2 recent pay stubs, 2-3 months of bank statements, and documentation of your down payment source. Also, we'll run a credit check. The process usually takes 3-5 business days. Should I set that up for you?"
    },
    {
        "role": "user",
        "content": "Yes, please. Also, what interest rates are you seeing right now for someone with my profile?"
    },
    {
        "role": "assistant",
        "content": "For a borrower with your profile - 720 credit score, 20% down, $85K income - we're currently seeing rates between 6.5% and 7.0% for a 30-year fixed mortgage. The exact rate will depend on the final underwriting, but you'd likely qualify for the lower end of that range. Would you prefer a 30-year fixed, or would you like to explore 15-year or ARM options?"
    },
    {
        "role": "user",
        "content": "I think 30-year fixed makes sense for me. How much would my monthly payment be approximately?"
    },
    {
        "role": "assistant",
        "content": "With a $360,000 loan ($450,000 - $90,000 down) at 6.75% for 30 years, your principal and interest payment would be approximately $2,335 per month. Add in property taxes (varies by county but ~$400/month in Seattle), HOA fees for the condo (varies, maybe $200-400/month), and homeowner's insurance (~$100/month), you're looking at a total monthly payment around $3,000-3,200. Does that fit your budget?"
    }
]

print(f"Sample conversation loaded: {len(sample_conversation)} messages")
print(f"Total tokens: {calculate_conversation_tokens(sample_conversation)}")

## CONVERSATION STATE FUNDAMENTALS

### The Stateful Conversation Problem

**Challenge:** Each API call to an LLM is stateless. The model doesn't remember previous messages unless you send them again.

**Naive Approach:**
```python
# Send entire conversation history with each request
messages = conversation_history + [new_user_message]
response = call_gpt4(messages)
```

**Problem:** After 20 turns (40 messages), you might have 8,000+ tokens just in conversation history, leaving no room for context or instructions.
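
To see how quickly the naive approach reaches that ceiling, the sketch below estimates history size per turn. It assumes a flat 200 tokens per message, an illustrative figure chosen so that 40 messages reach 8,000 tokens rather than a measurement, and both function names are hypothetical.

```python
def history_tokens(turns: int, avg_tokens_per_message: int = 200) -> int:
    """Tokens of history resent with the request that follows `turns` completed turns.

    Each turn adds one user and one assistant message.
    avg_tokens_per_message is an assumed average, not a measured value.
    """
    return turns * 2 * avg_tokens_per_message


def cumulative_prompt_tokens(turns: int, avg_tokens_per_message: int = 200) -> int:
    """Total history tokens sent across `turns` requests.

    Request k resends the k-1 turns completed before it, so the total
    grows quadratically with conversation length.
    """
    return sum(history_tokens(k, avg_tokens_per_message) for k in range(turns))


for turns in (5, 10, 20, 30):
    print(f"{turns:>2} turns: {history_tokens(turns):>6} history tokens, "
          f"{cumulative_prompt_tokens(turns):>7} resent in total")
```

The per-request history grows linearly, but the total billed across the conversation grows quadratically, which is why long conversations need an explicit memory strategy.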

### Memory Patterns Comparison

| Pattern | Token Growth | Accuracy | Complexity | Best For |
|---------|--------------|----------|------------|----------|
| **Buffer** | Linear (capped) | High (recent) | Low | Short conversations |
| **Summary** | Slow (compressed) | Medium | Medium | Long conversations |
| **Entity** | Sub-linear | High (facts) | High | Complex interactions |
| **Hybrid** | Optimized | Highest | High | Production systems |

### Token Budget Example

```
Token Budget: 8,000 total
├─ System Prompt: 500 tokens
├─ Conversation State: 3,000 tokens (what we manage)
├─ New User Message: 100 tokens
├─ Retrieved Context: 2,000 tokens (optional)
├─ Model Response: 1,500 tokens
└─ Safety Buffer: 900 tokens

Goal: Keep conversation state under 3,000 tokens
```
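
The same budget can be written as a small check that runs before each request. This is a minimal sketch: the allocation figures are copied from the breakdown above, while the dictionary keys and the helper names `state_allowance` and `fits_budget` are hypothetical.

```python
TOKEN_BUDGET = 8_000

# Allocations copied from the budget breakdown above.
allocations = {
    "system_prompt": 500,
    "conversation_state": 3_000,
    "new_user_message": 100,
    "retrieved_context": 2_000,
    "model_response": 1_500,
    "safety_buffer": 900,
}


def state_allowance(budget: int, allocations: dict) -> int:
    """Tokens left for conversation state once every other slot is reserved."""
    reserved = sum(v for k, v in allocations.items() if k != "conversation_state")
    return budget - reserved


def fits_budget(state_tokens: int, budget: int, allocations: dict) -> bool:
    """True if the managed conversation state stays within its allowance."""
    return state_tokens <= state_allowance(budget, allocations)


assert sum(allocations.values()) == TOKEN_BUDGET
print(state_allowance(TOKEN_BUDGET, allocations))     # 3000
print(fits_budget(2_750, TOKEN_BUDGET, allocations))  # True
```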

## CHALLENGE 1: BUFFER MEMORY IMPLEMENTATION

**Time:** 10 minutes  
**Objective:** Implement sliding window buffer memory

### Background

Buffer memory keeps the N most recent messages. It is simple and effective for short conversations, but everything older than the window is lost.

### Student Exercise

In [None]:
# TODO: Implement BufferMemory class
# Requirements:
# - Store last N message pairs (user + assistant)
# - Automatically discard oldest when window exceeded
# - Provide token count

class BufferMemory:
    """
    Sliding window buffer memory
    """
    
    def __init__(self, max_pairs: int = 5):
        """
        Args:
            max_pairs (int): Maximum conversation pairs to keep
        """
        # TODO: Implement
        pass
    
    def add_message(self, role: str, content: str) -> None:
        """
        Add message to buffer
        
        Args:
            role (str): 'user' or 'assistant'
            content (str): Message content
        """
        # TODO: Implement
        pass
    
    def get_messages(self) -> List[Dict]:
        """
        Get current message buffer
        
        Returns:
            List[Dict]: Messages in buffer
        """
        # TODO: Implement
        pass
    
    def get_token_count(self) -> int:
        """
        Get total tokens in buffer
        
        Returns:
            int: Token count
        """
        # TODO: Implement
        pass

# TODO: Test with sample conversation

### Solution

In [None]:
# SOLUTION: Buffer Memory System

class BufferMemory:
    """
    Sliding window buffer memory for conversations
    """
    
    def __init__(self, max_pairs: int = 5):
        """
        Initialize buffer
        
        Args:
            max_pairs (int): Maximum conversation pairs (user + assistant) to keep
        """
        self.max_pairs = max_pairs
        self.messages = []
    
    def add_message(self, role: str, content: str) -> None:
        """
        Add message to buffer, maintaining size limit
        
        Args:
            role (str): 'user' or 'assistant'
            content (str): Message content
        """
        self.messages.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat()
        })
        
        # Enforce max_pairs limit (each pair is 2 messages)
        max_messages = self.max_pairs * 2
        if len(self.messages) > max_messages:
            # Remove oldest messages to maintain window
            self.messages = self.messages[-max_messages:]
    
    def get_messages(self) -> List[Dict]:
        """
        Get current message buffer
        
        Returns:
            List[Dict]: Messages in buffer
        """
        return self.messages
    
    def get_token_count(self) -> int:
        """
        Calculate total tokens in buffer
        
        Returns:
            int: Token count
        """
        return calculate_conversation_tokens(self.messages)
    
    def clear(self) -> None:
        """Clear all messages from buffer"""
        self.messages = []
    
    def get_stats(self) -> Dict:
        """
        Get buffer statistics
        
        Returns:
            Dict: Statistics
        """
        return {
            "message_count": len(self.messages),
            "max_pairs": self.max_pairs,
            "token_count": self.get_token_count(),
            "oldest_message": self.messages[0]["timestamp"] if self.messages else None,
            "newest_message": self.messages[-1]["timestamp"] if self.messages else None
        }

# Test buffer memory
print("BUFFER MEMORY TEST:")
print("=" * 80)

# Create buffer that keeps last 3 conversation pairs (6 messages)
buffer = BufferMemory(max_pairs=3)

# Simulate conversation
test_messages = [
    ("user", "Hi, I want a mortgage."),
    ("assistant", "I can help! What's your budget?"),
    ("user", "Around $400,000."),
    ("assistant", "Great! What's your down payment?"),
    ("user", "$80,000."),
    ("assistant", "That's 20% - excellent! Credit score?"),
    ("user", "720."),  # 7th message: eviction starts here (oldest message is dropped)
    ("assistant", "Perfect! You'll qualify for good rates."),
    ("user", "What rate can I get?"),  # 9th message: the budget ($400,000) is now evicted
    ("assistant", "Around 6.5% - 7.0% for your profile."),
]

for role, content in test_messages:
    buffer.add_message(role, content)
    stats = buffer.get_stats()
    print(f"\nAdded: {role} - '{content[:40]}...'")
    print(f"  Buffer size: {stats['message_count']} messages, {stats['token_count']} tokens")

print("\n" + "-" * 80)
print("\nFINAL BUFFER CONTENTS:")
for msg in buffer.get_messages():
    print(f"  {msg['role']}: {msg['content']}")

print("\n" + "=" * 80)

### Test Context Loss

In [None]:
# Demonstrate the limitation: context loss

print("\nCONTEXT LOSS DEMONSTRATION:")
print("=" * 80)

# The buffer has lost early context
# User's budget ($400,000) is no longer in buffer
# Let's see if the model can still answer

buffer_messages = buffer.get_messages()
test_query = "What was my budget again?"

response = call_gpt4(
    buffer_messages + [{"role": "user", "content": test_query}],
    "You are a loan advisor. Answer based only on the conversation history."
)

print(f"Query: {test_query}")
print(f"Buffer has budget info: {'$400,000' in str(buffer_messages)}")
print(f"Response: {response['content']}")

print("\n⚠ LIMITATION: Buffer memory loses early conversation context!")
print("=" * 80)

### Key Takeaways

✓ **Simple and fast** - Easy to implement  
✓ **Predictable token usage** - Bounded by roughly max_pairs × avg_tokens_per_pair  
✓ **Good for recent context** - Maintains flow of conversation  
✗ **Loses history** - Early messages are forgotten  
✗ **Not suitable for long conversations** - Critical info may be lost
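
A message-count window bounds tokens only on average, since one unusually long message can still exceed the budget. A possible variant trims by tokens instead of by message count. The sketch below is an illustration under stated assumptions: `rough_token_count` is a hypothetical stand-in that estimates about 4 characters per token, and `trim_to_token_budget` is not part of the lab code.

```python
from typing import Callable, Dict, List


def rough_token_count(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: about 4 characters per token."""
    return max(1, len(text) // 4)


def trim_to_token_budget(
    messages: List[Dict[str, str]],
    max_tokens: int,
    count: Callable[[str], int] = rough_token_count,
) -> List[Dict[str, str]]:
    """Keep the newest messages whose combined content fits within max_tokens."""
    kept: List[Dict[str, str]] = []
    used = 0
    # Walk from newest to oldest so recent context is kept first.
    for msg in reversed(messages):
        cost = count(msg["content"])
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    {"role": "user", "content": "a" * 40},       # 10 tokens under the rough estimate
    {"role": "assistant", "content": "b" * 40},  # 10
    {"role": "user", "content": "c" * 40},       # 10
]
trimmed = trim_to_token_budget(history, max_tokens=20)
print([m["content"][0] for m in trimmed])  # ['b', 'c']
```

In the lab, passing the `count_tokens` helper from Step 2 as `count` would replace the rough estimate with real tokenizer counts.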

## CHALLENGE 2: CONVERSATION SUMMARY MEMORY

**Time:** 10 minutes  
**Objective:** Generate and use conversation summaries to compress history

### Background

Instead of keeping all messages, periodically summarize the conversation and keep only the summary plus recent messages.

### Student Exercise

In [None]:
# TODO: Implement SummaryMemory class
# Requirements:
# - Summarize conversation every N messages
# - Keep summary + recent buffer
# - Reduce token usage while preserving key information

class SummaryMemory:
    """
    Conversation summary memory
    """
    
    def __init__(self, summarize_every: int = 10, buffer_size: int = 4):
        """
        Args:
            summarize_every (int): Summarize after this many messages
            buffer_size (int): Keep this many recent messages
        """
        # TODO: Implement
        pass
    
    def add_message(self, role: str, content: str) -> None:
        """Add message and trigger summarization if needed"""
        # TODO: Implement
        pass
    
    def _generate_summary(self) -> None:
        """Generate a conversation summary and store it on the instance"""
        # TODO: Implement
        pass

# TODO: Test with long conversation

### Solution

In [None]:
# SOLUTION: Conversation Summary Memory

class SummaryMemory:
    """
    Conversation memory with automatic summarization
    """
    
    def __init__(
        self,
        summarize_every: int = 10,
        buffer_size: int = 4,
        max_summary_tokens: int = 500
    ):
        """
        Initialize summary memory
        
        Args:
            summarize_every (int): Summarize after this many messages
            buffer_size (int): Number of recent messages to keep
            max_summary_tokens (int): Target tokens for summary
        """
        self.summarize_every = summarize_every
        self.buffer_size = buffer_size
        self.max_summary_tokens = max_summary_tokens
        
        self.summary = ""
        self.messages = []
        self.all_messages = []  # For summarization
        self.summary_count = 0
    
    def add_message(self, role: str, content: str) -> None:
        """
        Add message and trigger summarization if needed
        
        Args:
            role (str): 'user' or 'assistant'
            content (str): Message content
        """
        message = {
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat()
        }
        
        self.messages.append(message)
        self.all_messages.append(message)
        
        # Check if we need to summarize
        if len(self.messages) >= self.summarize_every:
            self._generate_summary()
    
    def _generate_summary(self) -> None:
        """Generate summary of conversation so far"""
        # Create summarization prompt
        conversation_text = "\n".join([
            f"{msg['role'].capitalize()}: {msg['content']}"
            for msg in self.all_messages
        ])
        
        summary_prompt = f"""
Summarize this conversation in {self.max_summary_tokens} tokens or less.
Preserve all key facts, decisions, and important details.
Focus on what matters for continuing the conversation.

CONVERSATION:
{conversation_text}

SUMMARY:
"""
        
        response = call_gpt4(
            [{"role": "user", "content": summary_prompt}],
            "You are a conversation summarizer. Be concise and factual."
        )
        
        if response['success']:
            # The prompt covers the full history (all_messages), so the new
            # summary replaces the old one; appending would duplicate earlier facts.
            self.summary = response['content']
            
            self.summary_count += 1
            
            # Keep only most recent messages in buffer
            self.messages = self.messages[-self.buffer_size:]
    
    def get_context(self) -> List[Dict]:
        """
        Get context to send to LLM
        
        Returns:
            List[Dict]: Summary + recent messages
        """
        context = []
        
        # Add summary if exists
        if self.summary:
            context.append({
                "role": "system",
                "content": f"CONVERSATION SUMMARY:\n{self.summary}"
            })
        
        # Add recent messages
        context.extend(self.messages)
        
        return context
    
    def get_token_count(self) -> int:
        """Get total tokens in context"""
        return calculate_conversation_tokens(self.get_context())
    
    def get_stats(self) -> Dict:
        """Get memory statistics"""
        return {
            "summary_count": self.summary_count,
            "summary_tokens": count_tokens(self.summary) if self.summary else 0,
            "buffer_message_count": len(self.messages),
            "buffer_tokens": calculate_conversation_tokens(self.messages),
            "total_tokens": self.get_token_count(),
            "total_messages_processed": len(self.all_messages)
        }

# Test summary memory
print("CONVERSATION SUMMARY MEMORY TEST:")
print("=" * 80)

# Create summary memory (summarize when the buffer reaches 6 messages, keep last 2)
summary_memory = SummaryMemory(
    summarize_every=6,
    buffer_size=2,
    max_summary_tokens=300
)

# Simulate longer conversation
long_conversation = [
    ("user", "I'm interested in a home mortgage."),
    ("assistant", "Great! What's your budget and location?"),
    ("user", "Budget is $500,000, looking in Portland."),
    ("assistant", "How much down payment do you have?"),
    ("user", "I have $100,000 saved."),
    ("assistant", "Excellent! That's 20%. What's your credit score?"),  # Triggers summarization
    ("user", "My credit score is 740."),
    ("assistant", "Perfect! With 740 credit, you qualify for premium rates."),
    ("user", "What documentation do I need?"),
    ("assistant", "Tax returns, pay stubs, bank statements. Standard process."),  # Message 10: second summarization (2 kept + 4 new = 6)
    ("user", "How long does pre-approval take?"),
    ("assistant", "Usually 3-5 business days."),
    ("user", "Great! And what interest rate would I get?"),
    ("assistant", "For your profile, probably 6.25% - 6.75% on a 30-year fixed."),  # Message 14: third summarization
]

for i, (role, content) in enumerate(long_conversation):
    summary_memory.add_message(role, content)
    stats = summary_memory.get_stats()
    
    print(f"\nMessage {i+1}: {role} - '{content[:50]}...'")
    print(f"  Total messages processed: {stats['total_messages_processed']}")
    print(f"  Summaries generated: {stats['summary_count']}")
    print(f"  Current tokens: {stats['total_tokens']} (summary: {stats['summary_tokens']}, buffer: {stats['buffer_tokens']})")

print("\n" + "=" * 80)
print("\nFINAL STATE:")
print("-" * 80)

print(f"\nSUMMARY ({summary_memory.get_stats()['summary_tokens']} tokens):")
print(summary_memory.summary)

print(f"\nRECENT BUFFER ({summary_memory.get_stats()['buffer_message_count']} messages):")
for msg in summary_memory.messages:
    print(f"  {msg['role']}: {msg['content']}")

print("\n" + "=" * 80)

# Compare with naive approach
naive_tokens = calculate_conversation_tokens([
    {"role": role, "content": content}
    for role, content in long_conversation
])

print(f"\nTOKEN COMPARISON:")
print(f"  Naive (all messages): {naive_tokens} tokens")
print(f"  Summary memory: {summary_memory.get_token_count()} tokens")
print(f"  Savings: {naive_tokens - summary_memory.get_token_count()} tokens ({(1 - summary_memory.get_token_count()/naive_tokens)*100:.1f}%)")
print("=" * 80)

### Key Takeaways

✓ **Scalable** - Handles long conversations  
✓ **Token efficient** - Compresses history significantly  
✓ **Preserves key facts** - Summarization retains important information  
✗ **Lossy** - Some details are lost in summarization  
✗ **Latency** - Summarization adds processing time  
✗ **Cost** - Extra API calls for summarization
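
One way to contain the latency and cost noted above is to fold only the messages leaving the buffer into a running summary, so each summarization call sees a short excerpt instead of the whole history. The sketch below shows that flow with the summarizer injected as a function, which also lets it run without an API key. `RollingSummary` and `fake_summarize` are hypothetical names; in the lab the injected function would wrap an LLM request.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class RollingSummary:
    """Summarize evicted messages and keep a short recent buffer."""

    def __init__(self, summarize: Callable[[str, List[Message]], str],
                 summarize_every: int = 6, buffer_size: int = 2):
        if not 0 < buffer_size < summarize_every:
            raise ValueError("need 0 < buffer_size < summarize_every")
        self.summarize = summarize
        self.summarize_every = summarize_every
        self.buffer_size = buffer_size
        self.summary = ""
        self.messages: List[Message] = []

    def add_message(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        if len(self.messages) >= self.summarize_every:
            evicted = self.messages[:-self.buffer_size]
            # Fold only the evicted messages into the running summary.
            self.summary = self.summarize(self.summary, evicted)
            self.messages = self.messages[-self.buffer_size:]


def fake_summarize(previous: str, evicted: List[Message]) -> str:
    """Hypothetical stand-in: joins user messages instead of calling a model."""
    facts = [m["content"] for m in evicted if m["role"] == "user"]
    return " ".join(filter(None, [previous, *facts]))


memory = RollingSummary(fake_summarize, summarize_every=4, buffer_size=2)
for i in range(1, 7):
    role = "user" if i % 2 else "assistant"
    memory.add_message(role, f"m{i}")
print(memory.summary)                           # m1 m3
print([m["content"] for m in memory.messages])  # ['m5', 'm6']
```

The trade-off is that each new summary is built on the previous one, so an error in an early summary can persist through later ones.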

## CHALLENGE 3: ENTITY MEMORY SYSTEM

**Time:** 10 minutes  
**Objective:** Track entities (facts) across conversation

### Background

Instead of summarizing conversations, extract and track specific entities: names, numbers, decisions, facts. This preserves precision.
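
The tracking half of this idea is a merge step: each turn yields a small dictionary of facts, and newer values overwrite older ones. The sketch below illustrates that step only and leaves extraction out. `merge_entities`, the category names, and the placeholder rule are assumptions made for the example, and the values are illustrative.

```python
from typing import Any, Dict

EntityStore = Dict[str, Dict[str, Any]]


def merge_entities(store: EntityStore, extracted: Dict[str, Any]) -> EntityStore:
    """Merge one turn's extracted entities into the store; newer values win."""
    for category, fields in extracted.items():
        if not isinstance(fields, dict):
            continue  # ignore malformed categories
        for key, value in fields.items():
            # Treat None, empty strings, and "..." as "not provided" (assumption).
            if value is None or value in ("", "..."):
                continue
            store.setdefault(category, {})[key] = value
    return store


store: EntityStore = {}
merge_entities(store, {"applicant_info": {"name": "Sarah Chen"}, "loan_info": {}})
merge_entities(store, {"property_info": {"location": "Seattle", "price": 600000}})
merge_entities(store, {"loan_info": {"down_payment": 120000},
                       "financial_info": {"income": 95000, "credit_score": 750}})
print(store["property_info"]["price"])          # 600000
print(store["financial_info"]["credit_score"])  # 750
```

Because exact values are stored as data, a later question such as "what was my credit score?" can be answered from the store even after the original message has left the buffer.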

### Solution

In [None]:
# SOLUTION: Entity Memory System

class EntityMemory:
    """
    Entity-based conversation memory
    """
    
    def __init__(self, buffer_size: int = 3):
        """
        Initialize entity memory
        
        Args:
            buffer_size (int): Number of recent message pairs (user + assistant) to keep
        """
        self.entities = {}  # entity_type -> {key: value}
        self.messages = []
        self.all_messages = []
        self.buffer_size = buffer_size
    
    def add_message(self, role: str, content: str) -> None:
        """
        Add message and extract entities
        
        Args:
            role (str): 'user' or 'assistant'
            content (str): Message content
        """
        message = {
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat()
        }
        
        self.messages.append(message)
        self.all_messages.append(message)
        
        # Extract entities after each user message
        if role == "user":
            self._extract_entities(content)
        
        # Maintain buffer size
        if len(self.messages) > self.buffer_size * 2:
            self.messages = self.messages[-(self.buffer_size * 2):]
    
    def _extract_entities(self, user_message: str) -> None:
        """
        Extract entities from user message
        
        Args:
            user_message (str): User's message
        """
        # Use LLM to extract entities
        extraction_prompt = f"""
Extract key entities from this user message in a loan advisory conversation.

USER MESSAGE:
{user_message}

Extract in this JSON format:
{{
  "applicant_info": {{"name": "...", "age": ..., ...}},
  "property_info": {{"location": "...", "price": ..., ...}},
  "loan_info": {{"amount": ..., "down_payment": ..., ...}},
  "financial_info": {{"income": ..., "credit_score": ..., ...}}
}}

Only include entities that are explicitly mentioned. Return empty {{}} for categories with no information.

ENTITIES:
"""
        
        response = call_gpt4(
            [{"role": "user", "content": extraction_prompt}],
            "You are an entity extraction system. Output only valid JSON."
        )
        
        if response['success']:
            try:
                # Parse JSON response
                # Extract JSON from response (might have markdown backticks)
                json_match = re.search(r'\{.*\}', response['content'], re.DOTALL)
                if json_match:
                    extracted = json.loads(json_match.group())
                    
                    # Merge with existing entities
                    for entity_type, entities in extracted.items():
                        # Skip empty or malformed categories (the model may return a non-dict)
                        if isinstance(entities, dict) and entities:
                            if entity_type not in self.entities:
                                self.entities[entity_type] = {}
                            self.entities[entity_type].update(entities)
            except json.JSONDecodeError:
                pass  # Silently fail if JSON parsing fails
    
    def get_context(self) -> List[Dict]:
        """
        Get context with entities + recent buffer
        
        Returns:
            List[Dict]: Context messages
        """
        context = []
        
        # Add entities as structured context
        if self.entities:
            entity_text = "KNOWN INFORMATION:\n"
            for entity_type, entities in self.entities.items():
                entity_text += f"\n{entity_type.replace('_', ' ').title()}:\n"
                for key, value in entities.items():
                    entity_text += f"  - {key}: {value}\n"
            
            context.append({
                "role": "system",
                "content": entity_text.strip()
            })
        
        # Add recent messages
        context.extend(self.messages)
        
        return context
    
    def get_entity_summary(self) -> str:
        """Get readable summary of tracked entities"""
        if not self.entities:
            return "No entities tracked yet."
        
        summary = []
        for entity_type, entities in self.entities.items():
            summary.append(f"{entity_type.replace('_', ' ').title()}: {entities}")
        return "\n".join(summary)
    
    def get_stats(self) -> Dict:
        """Get memory statistics"""
        entity_count = sum(len(entities) for entities in self.entities.values())
        entity_tokens = count_tokens(self.get_entity_summary())
        
        return {
            "entity_types": len(self.entities),
            "total_entities": entity_count,
            "entity_tokens": entity_tokens,
            "buffer_messages": len(self.messages),
            "buffer_tokens": calculate_conversation_tokens(self.messages),
            "total_tokens": calculate_conversation_tokens(self.get_context())
        }

# Test entity memory
print("ENTITY MEMORY SYSTEM TEST:")
print("=" * 80)

entity_memory = EntityMemory(buffer_size=2)

# Simulate conversation with entity extraction
entity_conversation = [
    ("user", "Hi, I'm Sarah Chen and I'm looking for a mortgage."),
    ("assistant", "Hello Sarah! I'd be happy to help you with a mortgage."),
    ("user", "I'm 35 years old and I found a house in Seattle for $600,000."),
    ("assistant", "Great location! How much down payment do you have?"),
    ("user", "I can put down $120,000. My annual income is $95,000 and credit score is 750."),
    ("assistant", "Excellent! With 750 credit and 20% down, you'll get great rates."),
    ("user", "What rate can I expect?"),
    ("assistant", "For your profile, we're seeing 6.25% - 6.75% on 30-year fixed."),
]

for i, (role, content) in enumerate(entity_conversation):
    entity_memory.add_message(role, content)
    stats = entity_memory.get_stats()
    
    print(f"\nMessage {i+1}: {role}")
    if role == "user":
        print(f"  Content: '{content}'")
        print(f"  Entities tracked: {stats['total_entities']} across {stats['entity_types']} types")

print("\n" + "=" * 80)
print("\nTRACKED ENTITIES:")
print("-" * 80)
print(entity_memory.get_entity_summary())

print("\n" + "=" * 80)
print("\nRECENT BUFFER:")
print("-" * 80)
for msg in entity_memory.messages:
    print(f"  {msg['role']}: {msg['content'][:60]}...")

print("\n" + "=" * 80)

# Test entity recall
test_query = "What was my name, property price, and credit score again?"

context = entity_memory.get_context()
response = call_gpt4(
    context + [{"role": "user", "content": test_query}],
    "You are a loan advisor. Answer using known information."
)

print(f"\nENTITY RECALL TEST:")
print(f"Query: {test_query}")
print(f"Response: {response['content']}")
print(f"Context tokens: {calculate_conversation_tokens(context)}")
print("=" * 80)

### Key Takeaways

✓ **Precise** - Preserves exact values  
✓ **Efficient** - Very low token usage  
✓ **Scalable** - Entities don't grow linearly  
✓ **Queryable** - Easy to look up specific facts  
✗ **Complex** - Requires entity extraction  
✗ **Lossy** - Loses conversational flow

## CHALLENGE 4: HYBRID MEMORY APPROACH

**Time:** 10 minutes  
**Objective:** Combine multiple memory strategies

### Background

Production systems use hybrid approaches: entities for facts, buffer for flow, summaries for long-term history.
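
How the three components come together can be sketched as a single assembly function: long-term summary first, then known facts, then the recent turns. This is a minimal illustration under assumptions, not the lab's implementation; `build_hybrid_context` and its argument shapes are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_hybrid_context(
    entities: Dict[str, Dict[str, object]],
    summary: str,
    recent_messages: List[Message],
) -> List[Message]:
    """Assemble context: long-term summary, then known facts, then recent turns."""
    context: List[Message] = []
    if summary:
        context.append({"role": "system",
                        "content": f"CONVERSATION SUMMARY:\n{summary}"})
    if entities:
        lines = ["KNOWN INFORMATION:"]
        for category, fields in entities.items():
            for key, value in fields.items():
                lines.append(f"- {category}.{key}: {value}")
        context.append({"role": "system", "content": "\n".join(lines)})
    # Forward only role and content; extra keys such as timestamps are dropped.
    context.extend({"role": m["role"], "content": m["content"]}
                   for m in recent_messages)
    return context


context = build_hybrid_context(
    entities={"financial_info": {"credit_score": 750}},
    summary="Customer wants a 30-year fixed mortgage.",
    recent_messages=[{"role": "user", "content": "What rate can I expect?",
                      "timestamp": "2024-01-01T00:00:00"}],
)
print([m["role"] for m in context])  # ['system', 'system', 'user']
```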

### Solution

In [None]:
# SOLUTION: Hybrid Memory System

class HybridMemory:
    """
    Production-grade hybrid memory combining all strategies
    """
    
    def __init__(
        self,
        buffer_size: int = 3,
        summarize_every: int = 10,
        track_entities: bool = True
    ):
        """
        Initialize hybrid memory
        
        Args:
            buffer_size (int): Recent message pairs to keep (buffer holds buffer_size * 2 messages)
            summarize_every (int): Summarize after this many messages
            track_entities (bool): Whether to extract entities
        """
        self.buffer_size = buffer_size
        self.summarize_every = summarize_every
        self.track_entities = track_entities
        
        # Storage
        self.entities = {}
        self.summary = ""
        self.messages = []
        self.all_messages = []
        self.summary_count = 0
    
    def add_message(self, role: str, content: str) -> None:
        """
        Add message and manage memory
        
        Args:
            role (str): 'user' or 'assistant'
            content (str): Message content
        """
        message = {
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat()
        }
        
        self.messages.append(message)
        self.all_messages.append(message)
        
        # Extract entities from user messages
        if self.track_entities and role == "user":
            self._extract_entities_simple(content)
        
        # Check if we need to summarize
        if len(self.messages) >= self.summarize_every:
            self._generate_summary()
    
    def _extract_entities_simple(self, text: str) -> None:
        """
        Simplified entity extraction (pattern-based)
        
        Args:
            text (str): Text to extract from
        """
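        # Extract numbers (amounts, credit scores, etc.); the pattern requires at
        # least one digit so stray commas are not captured
        numbers = re.findall(r'\$?\d+(?:,\d{3})*(?:\.\d+)?', text)
        
        # Extract capitalized words as candidate locations (basic pattern; it also
        # matches names and sentence-initial words, so expect false positives)
        locations = re.findall(r'\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b', text)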
        
        # Store in entities
        if 'numbers_mentioned' not in self.entities:
            self.entities['numbers_mentioned'] = set()
        if 'locations_mentioned' not in self.entities:
            self.entities['locations_mentioned'] = set()
        
        self.entities['numbers_mentioned'].update(numbers)
        self.entities['locations_mentioned'].update([loc for loc in locations if len(loc) > 3])
    
    def _generate_summary(self) -> None:
        """Summarize older messages and trim the buffer to the most recent ones"""
        # Split once so the summarized and retained ranges never overlap
        split = max(len(self.messages) - self.buffer_size, 0)
        older, recent = self.messages[:split], self.messages[split:]
        
        # Create conversation text from the older messages only
        conv_text = "\n".join([
            f"{msg['role'].capitalize()}: {msg['content']}"
            for msg in older
        ])
        
        if not conv_text:
            return
        
        summary_prompt = f"""
Concisely summarize this conversation excerpt in 200 tokens or fewer.
Focus on key facts, decisions, and important details.

{conv_text}
"""
        
        response = call_gpt4(
            [{"role": "user", "content": summary_prompt}],
            "You are a summarizer. Be concise and factual."
        )
        
        if response['success']:
            if self.summary:
                self.summary += f"\n{response['content']}"
            else:
                self.summary = response['content']
            
            self.summary_count += 1
            
            # Keep only the recent messages that were not summarized
            self.messages = recent
    
    def get_context(self) -> List[Dict]:
        """
        Build context from all memory components
        
        Returns:
            List[Dict]: Context messages
        """
        context = []
        
        # 1. Add summary if exists
        if self.summary:
            context.append({
                "role": "system",
                "content": f"CONVERSATION HISTORY:\n{self.summary}"
            })
        
        # 2. Add entities if tracked
        if self.entities:
            entity_text = "KEY INFORMATION MENTIONED:\n"
            for entity_type, values in self.entities.items():
                if values:
                    # Sort before truncating so the context is deterministic (sets are unordered)
                    entity_text += f"- {entity_type.replace('_', ' ').title()}: {', '.join(str(v) for v in sorted(values, key=str)[:5])}\n"
            
            context.append({
                "role": "system",
                "content": entity_text.strip()
            })
        
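        # 3. Add recent message buffer (role/content only; timestamps are
        #    internal bookkeeping, not part of the chat message format)
        context.extend(
            {"role": msg["role"], "content": msg["content"]}
            for msg in self.messages
        )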
        
        return context
    
    def get_stats(self) -> Dict:
        """Get comprehensive memory statistics"""
        return {
            "summary_exists": bool(self.summary),
            "summary_tokens": count_tokens(self.summary) if self.summary else 0,
            "summary_count": self.summary_count,
            "entity_types": len(self.entities),
            "buffer_messages": len(self.messages),
            "buffer_tokens": calculate_conversation_tokens(self.messages),
            "total_context_tokens": calculate_conversation_tokens(self.get_context()),
            "messages_processed": len(self.all_messages)
        }

# Test hybrid memory
print("HYBRID MEMORY SYSTEM TEST:")
print("=" * 80)

hybrid = HybridMemory(
    buffer_size=2,
    summarize_every=6,
    track_entities=True
)

# Simulate extended conversation
extended_conv = [
    ("user", "Hi, I'm looking for a $550,000 mortgage in Denver."),
    ("assistant", "Great! Tell me about your financial situation."),
    ("user", "I earn $105,000 annually and have $110,000 for down payment."),
    ("assistant", "Excellent 20% down! What's your credit score?"),
    ("user", "It's 735."),
    ("assistant", "Perfect! You qualify for premium rates."),  # Triggers summarization
    ("user", "What interest rate can I expect?"),
    ("assistant", "For your profile, 6.5% - 7.0% on 30-year fixed."),
    ("user", "How much would my monthly payment be?"),
    ("assistant", "Principal & interest around $2,780/month."),  # Buffer refills to 6: another summarization
    ("user", "What about property taxes?"),
    ("assistant", "Denver taxes are ~0.5%, so about $2,750/year or $230/month."),
    ("user", "Total monthly payment?"),
    ("assistant", "Around $3,100/month including taxes and insurance."),
]

for i, (role, content) in enumerate(extended_conv):
    hybrid.add_message(role, content)
    
    if (i + 1) % 4 == 0:  # Print stats every 4 messages
        stats = hybrid.get_stats()
        print(f"\nAfter message {i+1}:")
        print(f"  Summary: {stats['summary_exists']} ({stats['summary_tokens']} tokens)")
        print(f"  Entities: {stats['entity_types']} types tracked")
        print(f"  Buffer: {stats['buffer_messages']} messages ({stats['buffer_tokens']} tokens)")
        print(f"  Total context: {stats['total_context_tokens']} tokens")

print("\n" + "=" * 80)
print("\nFINAL HYBRID MEMORY STATE:")
print("-" * 80)

stats = hybrid.get_stats()

print(f"\n1. SUMMARY ({stats['summary_tokens']} tokens):")
if hybrid.summary:
    print(f"   {hybrid.summary[:200]}...")
else:
    print("   (No summary generated yet)")

print(f"\n2. ENTITIES ({stats['entity_types']} types):")
for entity_type, values in hybrid.entities.items():
    print(f"   - {entity_type}: {list(values)[:3]}")

print(f"\n3. RECENT BUFFER ({stats['buffer_messages']} messages):")
for msg in hybrid.messages[-4:]:
    print(f"   {msg['role']}: {msg['content'][:50]}...")

print(f"\n4. TOTAL CONTEXT: {stats['total_context_tokens']} tokens")

print("\n" + "=" * 80)

# Compare with naive approach
naive_tokens = calculate_conversation_tokens([
    {"role": role, "content": content}
    for role, content in extended_conv
])

print(f"\nEFFICIENCY COMPARISON:")
print(f"  Naive (all {len(extended_conv)} messages): {naive_tokens} tokens")
print(f"  Hybrid memory: {stats['total_context_tokens']} tokens")
print(f"  Savings: {naive_tokens - stats['total_context_tokens']} tokens ({(1 - stats['total_context_tokens']/naive_tokens)*100:.1f}%)")
print("=" * 80)

### Key Takeaways

✓ **Best of all worlds** - Combines strengths of each approach  
✓ **Production-ready** - Handles long conversations efficiently  
✓ **Flexible** - Can tune each component independently  
✓ **Accurate** - Preserves facts (entities) and flow (buffer)
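
The pattern-based extractor in the solution puts every number into one bucket, so the context cannot tell a credit score from a down payment. One way to tune the entity component on its own is to extract labeled slots instead. The sketch below is a minimal illustration under stated assumptions: `SLOT_PATTERNS` and `extract_slots` are hypothetical names, and the regexes only cover phrasings like those in this lab's sample conversations.

```python
import re
from typing import Dict, Optional

# Hypothetical slot patterns for the loan scenario; real traffic needs broader coverage.
SLOT_PATTERNS = {
    "credit_score": re.compile(r"credit score (?:is|of)\s*(\d{3})\b", re.IGNORECASE),
    "down_payment": re.compile(
        r"\$(\d[\d,]*)\s+(?:for|as)\s+(?:a\s+)?down payment", re.IGNORECASE
    ),
    "annual_income": re.compile(r"(?:earn|income is)\s+\$(\d[\d,]*)", re.IGNORECASE),
}

def extract_slots(text: str, slots: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Return a copy of `slots` updated with any values found in `text`."""
    updated = dict(slots or {})
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Strip thousands separators before converting to int
            updated[name] = int(match.group(1).replace(",", ""))
    return updated

# Later messages overwrite earlier values for the same slot
slots = extract_slots("I have $96,000 for down payment and my credit score is 710.")
slots = extract_slots("My annual income is $92,000. What rate can I get?", slots)
print(slots)
```

Labeled slots cost more effort to maintain than a catch-all regex, but they let the context say "Credit Score: 710" rather than listing unlabeled numbers.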

## CHALLENGE 5: PRODUCTION STATE MANAGER

**Time:** 10 minutes  
**Objective:** Build a complete, production-ready conversation manager

### Solution

In [None]:
# SOLUTION: Production Conversation State Manager

class ConversationStateManager:
    """
    Production-grade conversation state management
    """
    
    def __init__(
        self,
        conversation_id: str,
        token_budget: int = 3000,
        strategy: str = "hybrid"
    ):
        """
        Initialize state manager
        
        Args:
            conversation_id (str): Unique conversation ID
            token_budget (int): Maximum tokens for conversation state
            strategy (str): Memory strategy ('buffer', 'summary', 'entity', 'hybrid')
        """
        self.conversation_id = conversation_id
        self.token_budget = token_budget
        self.strategy = strategy
        
        # Initialize appropriate memory system
        if strategy == "buffer":
            self.memory = BufferMemory(max_pairs=5)
        elif strategy == "summary":
            self.memory = SummaryMemory(summarize_every=8, buffer_size=3)
        elif strategy == "entity":
            self.memory = EntityMemory(buffer_size=3)
        elif strategy == "hybrid":
            self.memory = HybridMemory(buffer_size=3, summarize_every=8)
        else:
            raise ValueError(f"Unknown strategy: {strategy}")
        
        self.turn_count = 0
        self.created_at = datetime.now().isoformat()
    
    def add_turn(self, user_message: str, system_prompt: Optional[str] = None) -> Dict:
        """
        Process a conversation turn
        
        Args:
            user_message (str): User's message
            system_prompt (str): Optional system prompt
        
        Returns:
            Dict: Assistant response with metadata
        """
        self.turn_count += 1
        
        # Add user message to memory
        self.memory.add_message("user", user_message)
        
        # Get context from memory
        context = self.memory.get_context() if hasattr(self.memory, 'get_context') else self.memory.get_messages()
        
        # Check token budget
        context_tokens = calculate_conversation_tokens(context)
        if context_tokens > self.token_budget:
            return {
                "success": False,
                "error": f"Context exceeds budget: {context_tokens} > {self.token_budget}",
                "content": ""
            }
        
        # Call LLM
        response = call_gpt4(
            context,
            system_prompt or "You are a helpful loan advisor assistant."
        )
        
        if response['success']:
            # Add assistant response to memory
            self.memory.add_message("assistant", response['content'])
        
        # Add metadata
        response['turn_number'] = self.turn_count
        response['context_tokens'] = context_tokens
        response['strategy'] = self.strategy
        
        return response
    
    def get_metrics(self) -> Dict:
        """Get conversation metrics"""
        memory_stats = self.memory.get_stats()
        
        return {
            "conversation_id": self.conversation_id,
            "turn_count": self.turn_count,
            "strategy": self.strategy,
            "created_at": self.created_at,
            **memory_stats
        }
    
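    def export_state(self) -> Dict:
        """Export conversation state for persistence"""
        entities = getattr(self.memory, 'entities', {})
        if isinstance(entities, dict):
            # Sets are not JSON-serializable; export them as sorted lists
            entities = {
                key: sorted(value, key=str) if isinstance(value, set) else value
                for key, value in entities.items()
            }
        
        return {
            "conversation_id": self.conversation_id,
            "turn_count": self.turn_count,
            "strategy": self.strategy,
            "created_at": self.created_at,
            "memory_state": {
                "entities": entities,
                "summary": getattr(self.memory, 'summary', ''),
                "messages": list(getattr(self.memory, 'messages', []))
            }
        }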

# Test production state manager
print("PRODUCTION STATE MANAGER TEST:")
print("=" * 80)

# Create conversation manager
manager = ConversationStateManager(
    conversation_id="CONV-2026-001",
    token_budget=2000,
    strategy="hybrid"
)

# Simulate realistic multi-turn conversation
conversation_turns = [
    "Hi, I need help with a mortgage application.",
    "I'm looking at a $480,000 home in Austin, Texas.",
    "I have $96,000 for down payment and my credit score is 710.",
    "My annual income is $92,000. What rate can I get?",
    "What documents do I need to provide?",
    "How long will the approval process take?",
    "Can I get pre-approved today?",
    "What's the next step?"
]

for i, user_msg in enumerate(conversation_turns):
    print(f"\n{'='*80}")
    print(f"TURN {i+1}")
    print(f"{'='*80}")
    print(f"User: {user_msg}")
    
    response = manager.add_turn(user_msg)
    
    if response['success']:
        print(f"\nAssistant: {response['content'][:150]}...")
        print(f"\nMetrics:")
        print(f"  Context tokens: {response['context_tokens']}/{manager.token_budget}")
        print(f"  Total tokens: {response['total_tokens']}")
        print(f"  Turn: {response['turn_number']}")
    else:
        print(f"\nError: {response['error']}")
        break

# Final metrics
print(f"\n{'='*80}")
print("FINAL CONVERSATION METRICS:")
print(f"{'='*80}")

metrics = manager.get_metrics()
for key, value in metrics.items():
    if not isinstance(value, (dict, list)):
        print(f"  {key}: {value}")

# Export state
state = manager.export_state()
print(f"\n{'='*80}")
print("EXPORTABLE STATE:")
print(f"{'='*80}")
print(json.dumps(state, indent=2, default=str)[:500] + "...")

print(f"\n{'='*80}")

### Key Features

✓ **Turn-based management** - Clean API for conversation flow  
✓ **Token budget enforcement** - Prevents context overflow  
✓ **Strategy selection** - Choose memory approach  
✓ **Metrics tracking** - Monitor conversation health  
✓ **State export** - Persistence ready
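
Exported state is only persistence-ready if it survives a round trip through storage. `HybridMemory` keeps entity values in sets, which JSON cannot represent directly, so one possible approach is to write them as sorted lists and rebuild the sets on load. This is a sketch, not part of the lab code: `state_to_json`, `state_from_json`, and the example values are assumptions, and the dictionary shape mirrors what `export_state()` returns for the hybrid strategy.

```python
import json
from typing import Any, Dict

def state_to_json(state: Dict[str, Any]) -> str:
    """Serialize exported state, writing entity collections as sorted lists."""
    memory = dict(state.get("memory_state", {}))  # shallow copy; input is not mutated
    memory["entities"] = {
        name: sorted(values) for name, values in memory.get("entities", {}).items()
    }
    return json.dumps({**state, "memory_state": memory}, indent=2)

def state_from_json(payload: str) -> Dict[str, Any]:
    """Load state and turn entity lists back into sets."""
    state = json.loads(payload)
    memory = state.setdefault("memory_state", {})
    memory["entities"] = {
        name: set(values) for name, values in memory.get("entities", {}).items()
    }
    return state

# Example state shaped like export_state() output (values are made up)
example_state = {
    "conversation_id": "CONV-2026-001",
    "turn_count": 3,
    "strategy": "hybrid",
    "memory_state": {
        "entities": {"numbers_mentioned": {"$480,000", "710"}},
        "summary": "Customer wants a mortgage in Austin.",
        "messages": [{"role": "user", "content": "What rate can I get?"}],
    },
}

payload = state_to_json(example_state)   # store this string in a database or file
restored = state_from_json(payload)      # later: rebuild the in-memory state
print(restored["memory_state"]["entities"])
```

In a real deployment the JSON string would typically be stored under the `conversation_id` key so a later request can reload the conversation.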

## LAB SUMMARY


### Token Savings Analysis

```
30-turn conversation (60 messages)

Naive approach: ~12,000 tokens
Buffer (last 6 messages): ~600 tokens (95% savings)
Summary: ~800 tokens (93% savings)  
Entity: ~400 tokens (97% savings)
Hybrid: ~1,200 tokens (90% savings, highest accuracy)
```
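
The percentages follow from the token estimates as savings = 1 - strategy tokens / naive tokens. The short check below reproduces them. The token counts are the approximate figures listed above, not measurements, and `savings_percent` is an illustrative helper name.

```python
NAIVE_TOKENS = 12_000  # approximate cost of sending all 60 messages

# Approximate context size per strategy, taken from the estimates above
estimates = {"Buffer": 600, "Summary": 800, "Entity": 400, "Hybrid": 1_200}

def savings_percent(strategy_tokens: int, naive_tokens: int) -> float:
    """Percentage of tokens saved relative to the naive approach."""
    return (1 - strategy_tokens / naive_tokens) * 100

for name, tokens in estimates.items():
    pct = savings_percent(tokens, NAIVE_TOKENS)
    print(f"{name:<8} {tokens:>6} tokens  {pct:.0f}% savings")
```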

### Production Checklist

Before deploying stateful conversations:

- [ ] Choose appropriate memory strategy
- [ ] Set token budget (typically 2,000-4,000)
- [ ] Implement conversation ID tracking
- [ ] Add state persistence (database)
- [ ] Monitor token usage per conversation
- [ ] Set conversation timeout/max turns
- [ ] Implement conversation reset capability
- [ ] Test with realistic conversation lengths
- [ ] Handle summarization failures gracefully
- [ ] Log conversation metrics
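
For the "handle summarization failures gracefully" item, one option is to retry the summarizer a bounded number of times and, if it still fails, fall back to a plain sliding window so the buffer cannot grow without limit. The sketch below assumes a hypothetical `summarize` callable that returns a dict with `success` and `content` keys (the same shape the solutions read from `call_gpt4`). `compress_buffer` and its parameters are illustrative names, not part of the lab code.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]

def compress_buffer(
    messages: List[Message],
    keep_recent: int,
    summarize: Callable[[List[Message]], Dict],
    max_attempts: int = 2,
) -> Tuple[str, List[Message]]:
    """Return (summary_text, trimmed_messages).

    If every attempt fails, older messages are dropped anyway and the
    summary is empty, which degrades to plain buffer memory.
    """
    split = len(messages) - keep_recent
    if split <= 0:
        return "", list(messages)  # nothing old enough to summarize

    older, recent = messages[:split], messages[split:]
    for _ in range(max_attempts):
        try:
            result = summarize(older)
        except Exception:
            continue  # treat exceptions like a failed attempt and retry
        if result.get("success"):
            return result["content"], recent
    return "", recent

# Stand-in summarizers for demonstration (no API calls)
def fake_summarizer(msgs: List[Message]) -> Dict:
    return {"success": True, "content": f"{len(msgs)} earlier messages summarized"}

def always_fails(msgs: List[Message]) -> Dict:
    raise TimeoutError("simulated API outage")

history = [{"role": "user", "content": f"message {i}"} for i in range(6)]
print(compress_buffer(history, keep_recent=2, summarize=fake_summarizer)[0])
print(len(compress_buffer(history, keep_recent=2, summarize=always_fails)[1]))
```

Dropping unsummarized messages loses information, so a production system would likely also log the failure and rely on tracked entities to preserve key facts.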