# Day 2: State Management & Persistence in LangGraph

## 🎯 Learning Objectives
By the end of this session, you will:
- Master LangGraph's graph architecture (nodes, edges, conditional routing)
- Implement state persistence with different checkpointer types
- Understand state serialization and recovery mechanisms
- Handle errors and implement recovery patterns
- Build stateful agents that persist across sessions

## ⏱️ Session Structure (2 hours)
- **Learning Materials** (30 min): Theory and concepts
- **Hands-on Code** (60 min): Implementation and examples  
- **Practical Exercises** (30 min): Build and extend functionality

---

## 📖 Learning Materials (30 minutes)

### 📺 Video and Reading Resources
- [LangGraph Persistence Guide](https://langchain-ai.github.io/langgraph/concepts/persistence/) - Official documentation
- [DeepLearning.AI - AI Agents in LangGraph](https://www.deeplearning.ai/short-courses/ai-agents-in-langgraph/) - Module 2: State Management
- [LangChain Academy - Persistence Patterns](https://academy.langchain.com/) - Checkpointer deep dive

### 🧠 Theory: Graph State Management

#### What is State in LangGraph?
State in LangGraph is the data and context your agent workflow carries from step to step. Each node receives the current state and returns an update, which LangGraph merges in before passing the state to the next node.

**Key Concepts:**
- **Nodes**: Individual processing units that modify state
- **Edges**: Define how state flows between nodes
- **Conditional Edges**: Route state based on conditions
- **State Schema**: Defines the structure of your state (Pydantic, TypedDict, or dataclass)
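
To make these terms concrete, here is a minimal plain-Python sketch of the same idea with no LangGraph dependency. The names `CounterState`, `run_graph`, `increment`, `finish`, and `route` are illustrative assumptions, not LangGraph APIs; the real library layers reducers and checkpointing on top of a loop like this one.

```python
from typing import Callable, Dict, TypedDict


class CounterState(TypedDict):
    """State schema: the shape of the data every node receives."""
    count: int
    log: list


def increment(state: CounterState) -> dict:
    """Node: return a partial update rather than mutating the input."""
    return {"count": state["count"] + 1, "log": state["log"] + ["increment"]}


def finish(state: CounterState) -> dict:
    """Node: record that the workflow finished."""
    return {"log": state["log"] + ["finish"]}


def route(state: CounterState) -> str:
    """Conditional edge: pick the next node based on the current state."""
    return "finish" if state["count"] >= 3 else "increment"


NODES: Dict[str, Callable[[CounterState], dict]] = {
    "increment": increment,
    "finish": finish,
}
# Edges: "increment" is followed by a routing decision, "finish" ends the run.
EDGES = {"increment": route, "finish": lambda state: "END"}


def run_graph(state: CounterState, entry: str = "increment") -> CounterState:
    """Walk the graph from the entry node until an edge returns "END"."""
    current = entry
    while current != "END":
        update = NODES[current](state)
        state = {**state, **update}  # merge the node's partial update
        current = EDGES[current](state)
    return state


final_state = run_graph({"count": 0, "log": []})
print(final_state)
```

The routing function only reads state and returns a node name, which is the same contract the conditional edge functions later in this notebook follow.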

#### Why Persistence?
Persistence allows agents to:
- Resume from interruptions
- Maintain conversation history
- Enable human-in-the-loop workflows
- Support debugging and replay
- Scale to long-running processes

#### Checkpointer Types
1. **InMemorySaver**: For testing and development
2. **SqliteSaver**: For local persistence and prototyping
3. **PostgresSaver**: For production environments
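
All three checkpointers share one job: store a snapshot of the state per thread so a later run can pick up where the last one stopped. The sketch below imitates that job with only the standard library. `TinyCheckpointer` and its `put` / `get_latest` methods are hypothetical names for illustration and do not mirror the real checkpointer interface or table layout.

```python
import json
import sqlite3
from typing import Optional


class TinyCheckpointer:
    """Hypothetical stand-in for a checkpointer: one JSON snapshot per step."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT, step INTEGER, state TEXT, "
            "PRIMARY KEY (thread_id, step))"
        )

    def put(self, thread_id: str, step: int, state: dict) -> None:
        """Save a snapshot of the state for this thread and step."""
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )
        self.conn.commit()

    def get_latest(self, thread_id: str) -> Optional[dict]:
        """Return the most recent snapshot for a thread, or None."""
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ? "
            "ORDER BY step DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return json.loads(row[0]) if row else None


# ":memory:" behaves like an in-memory saver; a file path would survive restarts.
saver = TinyCheckpointer(sqlite3.connect(":memory:"))
saver.put("thread-a", 1, {"step_count": 1})
saver.put("thread-a", 2, {"step_count": 2})
saver.put("thread-b", 1, {"step_count": 1})

resumed = saver.get_latest("thread-a")
print(resumed)
```

Swapping `":memory:"` for a file path is the only change needed to move this sketch from throwaway to durable storage, which is roughly the trade-off between the first two options in the list above.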

#### State Serialization
LangGraph uses `JsonPlusSerializer` to handle:
- LangChain objects (messages, documents)
- Pydantic models
- Python primitives
- Custom serializable objects
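
The core idea behind serializing richer objects is to record each value's type next to its data so it can be rebuilt on load. The following sketch shows that idea with the standard `json` module; it is not how `JsonPlusSerializer` is implemented, and `Note`, `encode`, and `decode` are hypothetical names chosen for the example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class Note:
    """Hypothetical custom object stored in state."""
    author: str
    text: str


def encode(obj):
    """json.dumps hook: wrap non-JSON types in a dict that records their type."""
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    if isinstance(obj, Note):
        return {"__type__": "Note", "value": asdict(obj)}
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def decode(d: dict):
    """json.loads hook: rebuild tagged objects, pass plain dicts through."""
    if d.get("__type__") == "datetime":
        return datetime.fromisoformat(d["value"])
    if d.get("__type__") == "Note":
        return Note(**d["value"])
    return d


state = {
    "step_count": 2,
    "created_at": datetime(2025, 1, 13, 10, 0, tzinfo=timezone.utc),
    "notes": [Note(author="user123", text="remember thread IDs")],
}

payload = json.dumps(state, default=encode)          # state -> text
restored = json.loads(payload, object_hook=decode)   # text -> state
print(restored == state)
```

Anything the encoder does not recognize raises `TypeError`, which is the same class of failure you see when a non-serializable object ends up in checkpointed state.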

---
## 💻 Hands-on Code (60 minutes)

### Setup and Imports

In [None]:
# Install required packages
!pip install langgraph langchain langchain-openai pydantic python-dotenv
!pip install langgraph-checkpoint-sqlite  # For SQLite persistence
!pip install langgraph-checkpoint-postgres "psycopg[binary]"  # For PostgreSQL (optional); PostgresSaver uses psycopg 3

In [None]:
import os
from typing import TypedDict, Literal, List, Optional
from pydantic import BaseModel, Field
from dotenv import load_dotenv

# LangGraph imports
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.types import Command

# LangChain imports
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage
from langchain_core.prompts import ChatPromptTemplate

# Load environment variables
load_dotenv()

# Configure OpenAI
openai_api_key = os.getenv("OPENAI_API_KEY")
if not openai_api_key:
    print("⚠️ Please set OPENAI_API_KEY in your .env file")
    print("Example: OPENAI_API_KEY=sk-...")
else:
    print("✅ OpenAI API key loaded successfully")

### 1. Basic State Schema with Pydantic

In [None]:
# Define state using Pydantic for type safety
class AgentState(BaseModel):
    """State schema for our agent with type validation"""
    messages: List[BaseMessage] = Field(default_factory=list, description="Conversation history")
    user_id: str = Field(description="User identifier")
    session_id: str = Field(description="Session identifier")
    task_status: Literal["pending", "processing", "completed", "error"] = Field(default="pending")
    metadata: dict = Field(default_factory=dict, description="Additional metadata")
    step_count: int = Field(default=0, description="Number of processing steps")

# Alternative: Using TypedDict (more performant, less validation)
class SimpleState(TypedDict):
    """Simpler state using TypedDict"""
    messages: List[BaseMessage]
    task_status: str
    step_count: int

print("📋 State schemas defined successfully")
print(f"AgentState fields: {list(AgentState.model_fields.keys())}")

### 2. Building a Basic Graph with State Flow

In [None]:
# Initialize OpenAI model
llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0,
    openai_api_key=openai_api_key
)

def process_input(state: AgentState) -> AgentState:
    """Process user input and update state"""
    print(f"📝 Processing input for user: {state.user_id}")
    
    # Generate an AI response only when the last message came from the user
    if state.messages and isinstance(state.messages[-1], HumanMessage):
        response = llm.invoke([state.messages[-1]])
        state.messages.append(response)
        print(f"🤖 Generated response: {response.content[:100]}...")
    
    # Count every pass through this node. If the counter advanced only after a
    # HumanMessage, the loop driven by should_continue would never reach its exit.
    state.step_count += 1
    state.task_status = "processing"
    
    return state

def finalize_response(state: AgentState) -> AgentState:
    """Finalize the response and mark as completed"""
    print(f"✅ Finalizing response for session: {state.session_id}")
    
    state.task_status = "completed"
    state.metadata["completed_at"] = "2025-01-13"  # In real app, use datetime.now()
    
    return state

# Conditional edge function
def should_continue(state: AgentState) -> Literal["finalize", "process_more"]:
    """Decide whether to continue processing or finalize"""
    if state.step_count >= 2:  # Simple condition
        return "finalize"
    return "process_more"

print("🔧 Node functions defined successfully")

### 3. InMemorySaver - For Development and Testing

In [None]:
# Create graph with InMemorySaver
def create_basic_graph():
    """Create a basic graph with in-memory persistence"""
    
    # Create the graph
    graph = StateGraph(AgentState)
    
    # Add nodes
    graph.add_node("process_input", process_input)
    graph.add_node("finalize", finalize_response)
    
    # Add edges
    graph.add_edge(START, "process_input")
    graph.add_conditional_edges(
        "process_input",
        should_continue,
        {
            "finalize": "finalize",
            "process_more": "process_input"  # Loop back
        }
    )
    graph.add_edge("finalize", END)
    
    # Compile with InMemorySaver
    memory = InMemorySaver()
    app = graph.compile(checkpointer=memory)
    
    return app, memory

# Test the basic graph
app, memory = create_basic_graph()
print("🎯 Basic graph created with InMemorySaver")

# Test with sample state
initial_state = AgentState(
    messages=[HumanMessage(content="Hello, can you help me learn LangGraph?")],
    user_id="user123",
    session_id="session456"
)

config = {"configurable": {"thread_id": "test-thread"}}

print("\n🚀 Running graph...")
result = app.invoke(initial_state, config=config)
# invoke() returns the final state as a dict, even when the schema is a Pydantic model
print(f"\n📊 Final state: Task status = {result['task_status']}, Steps = {result['step_count']}")
print(f"💬 Messages: {len(result['messages'])} total")

### 4. SqliteSaver - For Local Persistence

In [None]:
def create_sqlite_graph():
    """Create a graph with SQLite persistence"""
    
    # Create the graph (same structure)
    graph = StateGraph(AgentState)
    graph.add_node("process_input", process_input)
    graph.add_node("finalize", finalize_response)
    graph.add_edge(START, "process_input")
    graph.add_conditional_edges(
        "process_input",
        should_continue,
        {"finalize": "finalize", "process_more": "process_input"}
    )
    graph.add_edge("finalize", END)
    
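    # Compile with SQLite persistence. In recent langgraph-checkpoint-sqlite
    # releases from_conn_string() is a context manager, so pass a connection instead.
    import sqlite3  # local import keeps this cell self-contained
    conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
    sqlite_saver = SqliteSaver(conn)
    app = graph.compile(checkpointer=sqlite_saver)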
    
    return app, sqlite_saver

# Create SQLite-backed graph
sqlite_app, sqlite_saver = create_sqlite_graph()
print("💾 SQLite graph created - data persists to 'checkpoints.db'")

# Test persistence
config_sqlite = {"configurable": {"thread_id": "persistent-thread"}}

# First run
print("\n🔄 First run with SQLite persistence...")
persistent_state = AgentState(
    messages=[HumanMessage(content="Tell me about state persistence in LangGraph")],
    user_id="persistent_user",
    session_id="persistent_session"
)

result1 = sqlite_app.invoke(persistent_state, config=config_sqlite)
print(f"📈 After first run: {result1['task_status']}, Steps: {result1['step_count']}")

# Simulate resuming from checkpoint
print("\n🔄 Resuming from checkpoint...")
# Get current state
current_state = sqlite_app.get_state(config_sqlite)
# StateSnapshot.values holds the saved state as a dict
print(f"📋 Retrieved state: Steps = {current_state.values['step_count']}")
print(f"💬 Messages in history: {len(current_state.values['messages'])}")

### 5. Error Handling and Recovery

In [None]:
def error_prone_process(state: AgentState) -> AgentState:
    """A node that might fail to demonstrate error handling"""
    print(f"⚡ Processing step {state.step_count + 1}")
    
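    # Simulate an error on step 2 (only once, so the retry after recovery succeeds)
    if state.step_count == 1 and not state.metadata.get("recovered"):
        state.task_status = "error"
        state.metadata["error"] = "Simulated processing error"
        print("❌ Error occurred during processing")
        # Return the error state instead of raising: an exception would discard
        # this update and the graph would never route to the "recover" node
        return state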
    
    # Normal processing
    if state.messages:
        last_message = state.messages[-1]
        if isinstance(last_message, HumanMessage):
            response = llm.invoke([last_message])
            state.messages.append(response)
    
    state.step_count += 1
    state.task_status = "processing"
    
    return state

def recovery_node(state: AgentState) -> AgentState:
    """Recovery node that handles errors"""
    print("🔧 Attempting recovery...")
    
    if state.task_status == "error":
        # Reset error state
        state.task_status = "processing"
        state.metadata["recovered"] = True
        state.messages.append(AIMessage(content="I encountered an error but recovered successfully."))
        print("✅ Recovery successful")
    
    return state

def create_error_handling_graph():
    """Create a graph with error handling capabilities"""
    
    graph = StateGraph(AgentState)
    
    # Add nodes including recovery
    graph.add_node("process", error_prone_process)
    graph.add_node("recover", recovery_node)
    graph.add_node("finalize", finalize_response)
    
    # Add edges
    graph.add_edge(START, "process")
    
    # Conditional routing based on state
    def route_after_process(state: AgentState) -> Literal["recover", "finalize", "process"]:
        if state.task_status == "error":
            return "recover"
        elif state.step_count >= 3:
            return "finalize"
        else:
            return "process"
    
    graph.add_conditional_edges(
        "process",
        route_after_process,
        {"recover": "recover", "finalize": "finalize", "process": "process"}
    )
    
    graph.add_edge("recover", "process")  # Try again after recovery
    graph.add_edge("finalize", END)
    
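    # Use SQLite for persistence during error recovery. In recent releases
    # from_conn_string() is a context manager, so pass a connection instead.
    import sqlite3  # local import keeps this cell self-contained
    conn = sqlite3.connect("error_recovery.db", check_same_thread=False)
    saver = SqliteSaver(conn)
    app = graph.compile(checkpointer=saver)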
    
    return app

# Test error handling
error_app = create_error_handling_graph()
print("🛡️ Error handling graph created")

error_config = {"configurable": {"thread_id": "error-test"}}
error_state = AgentState(
    messages=[HumanMessage(content="Test error handling")],
    user_id="error_user",
    session_id="error_session"
)

print("\n🧪 Testing error handling and recovery...")
try:
    final_result = error_app.invoke(error_state, config=error_config)
    # invoke() returns the final state as a dict
    print(f"🎯 Final result: {final_result['task_status']}")
    print(f"🔄 Recovery attempted: {'recovered' in final_result['metadata']}")
except Exception as e:
    print(f"❌ Unhandled error: {e}")
    # Show how to recover from checkpoint
    print("📁 Checking saved checkpoint...")
    saved_state = error_app.get_state(error_config)
    # A StateSnapshot is always truthy, so check its values instead
    if saved_state.values:
        print(f"💾 Checkpoint exists with status: {saved_state.values['task_status']}")

### 6. Advanced State Management Patterns

In [None]:
# Complex state with nested data
class AdvancedAgentState(BaseModel):
    """Advanced state with complex data structures"""
    messages: List[BaseMessage] = Field(default_factory=list)
    user_profile: dict = Field(default_factory=dict)
    conversation_context: dict = Field(default_factory=dict)
    processing_history: List[dict] = Field(default_factory=list)
    current_task: Optional[str] = None
    subtasks: List[str] = Field(default_factory=list)
    metadata: dict = Field(default_factory=dict)

def context_aware_processor(state: AdvancedAgentState) -> AdvancedAgentState:
    """Process with context awareness"""
    print("🧠 Context-aware processing...")
    
    # Add to processing history
    state.processing_history.append({
        "step": len(state.processing_history) + 1,
        "timestamp": "2025-01-13T10:00:00Z",
        "action": "context_processing"
    })
    
    # Update conversation context
    if state.messages:
        last_msg = state.messages[-1]
        state.conversation_context["last_topic"] = "LangGraph learning"
        state.conversation_context["message_count"] = len(state.messages)
        
        # Generate contextual response
        context_prompt = f"""
        Based on the conversation context: {state.conversation_context}
        User message: {last_msg.content if hasattr(last_msg, 'content') else str(last_msg)}
        
        Provide a helpful response about LangGraph state management.
        """
        
        response = llm.invoke([HumanMessage(content=context_prompt)])
        state.messages.append(response)
    
    return state

# Create advanced graph
def create_advanced_graph():
    """Create graph with advanced state management"""
    
    graph = StateGraph(AdvancedAgentState)
    graph.add_node("context_process", context_aware_processor)
    graph.add_edge(START, "context_process")
    graph.add_edge("context_process", END)
    
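    # Use SQLite with a separate database file. In recent releases
    # from_conn_string() is a context manager, so pass a connection instead.
    import sqlite3  # local import keeps this cell self-contained
    conn = sqlite3.connect("advanced_state.db", check_same_thread=False)
    saver = SqliteSaver(conn)
    app = graph.compile(checkpointer=saver)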
    
    return app

# Test advanced state management
advanced_app = create_advanced_graph()
print("🚀 Advanced state management graph created")

advanced_state = AdvancedAgentState(
    messages=[HumanMessage(content="How does state serialization work?")],
    user_profile={"name": "Developer", "experience": "intermediate"},
    current_task="learning_state_management",
    subtasks=["understand_checkpoints", "implement_persistence"]
)

advanced_config = {"configurable": {"thread_id": "advanced-test"}}

print("\n🔬 Testing advanced state management...")
advanced_result = advanced_app.invoke(advanced_state, config=advanced_config)

# invoke() returns the final state as a dict
print(f"📊 Processing history steps: {len(advanced_result['processing_history'])}")
print(f"🎯 Current task: {advanced_result['current_task']}")
print(f"📝 Context keys: {list(advanced_result['conversation_context'].keys())}")
print(f"💬 Total messages: {len(advanced_result['messages'])}")

---
## 🛠️ Practical Exercises (30 minutes)

### Exercise 1: Build a Persistent Task Manager
**Goal**: Create an agent that manages a to-do list with SQLite persistence.

**Requirements**:
- State should track: tasks list, completed tasks, current priority
- Implement: add_task, complete_task, list_tasks nodes
- Use SQLite for persistence
- Handle task priorities (high, medium, low)

In [None]:
# Exercise 1: Your implementation here
class TaskManagerState(BaseModel):
    """State for task management system"""
    # TODO: Define your state schema
    pass

def add_task_node(state: TaskManagerState) -> TaskManagerState:
    """Add a new task"""
    # TODO: Implement task addition logic
    pass

def complete_task_node(state: TaskManagerState) -> TaskManagerState:
    """Mark a task as completed"""
    # TODO: Implement task completion logic
    pass

# TODO: Create and test your task manager graph
print("📝 Exercise 1: Implement your task manager here")

### Exercise 2: Implement Error Recovery System
**Goal**: Build a robust system that can recover from various types of failures.

**Requirements**:
- Create nodes that simulate different error types
- Implement recovery strategies for each error type  
- Use checkpoints to resume from last good state
- Log all recovery attempts

In [None]:
# Exercise 2: Your implementation here
class RobustAgentState(BaseModel):
    """State for robust error handling system"""
    # TODO: Define your state schema for error tracking
    pass

def network_operation(state: RobustAgentState) -> RobustAgentState:
    """Simulate network operation that might fail"""
    # TODO: Implement with potential network errors
    pass

def data_processing(state: RobustAgentState) -> RobustAgentState:
    """Simulate data processing that might fail"""
    # TODO: Implement with potential processing errors
    pass

# TODO: Create your robust error handling graph
print("🛡️ Exercise 2: Implement your error recovery system here")

### Challenge: Create a Stateful Conversation Agent
**Goal**: Build an advanced conversational agent that maintains context across sessions.

**Advanced Requirements**:
- Multi-turn conversation with context retention
- User preference learning and adaptation
- Conversation summarization for long sessions
- PostgreSQL persistence for production-ready deployment
- Session management with automatic cleanup

In [None]:
# Challenge: Your implementation here
class ConversationState(BaseModel):
    """Advanced conversation state"""
    # TODO: Design comprehensive conversation state
    pass

# TODO: Implement your advanced conversational agent
print("🎯 Challenge: Build your advanced conversation agent here")
print("💡 Hint: Consider conversation summarization, user preferences, and session management")

---
## 📚 Solutions and Best Practices

### Exercise 1 Solution: Task Manager

In [None]:
# Complete solution for Exercise 1
from enum import Enum
from datetime import datetime

class Priority(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class Task(BaseModel):
    id: str
    title: str
    priority: Priority
    created_at: str
    completed: bool = False

class TaskManagerState(BaseModel):
    tasks: List[Task] = Field(default_factory=list)
    completed_tasks: List[Task] = Field(default_factory=list)
    current_priority: Priority = Priority.MEDIUM
    last_action: str = ""
    user_id: str

def add_task_node(state: TaskManagerState) -> TaskManagerState:
    """Add a new task based on the last message"""
    # In a real implementation, you'd parse the task from user input
    new_task = Task(
        id=f"task_{len(state.tasks) + 1}",
        title=f"Sample task {len(state.tasks) + 1}",
        priority=state.current_priority,
        created_at=datetime.now().isoformat()
    )
    
    state.tasks.append(new_task)
    state.last_action = f"Added task: {new_task.title}"
    
    return state

def complete_task_node(state: TaskManagerState) -> TaskManagerState:
    """Complete the first pending task"""
    if state.tasks:
        task_to_complete = state.tasks.pop(0)
        task_to_complete.completed = True
        state.completed_tasks.append(task_to_complete)
        state.last_action = f"Completed task: {task_to_complete.title}"
    else:
        state.last_action = "No tasks to complete"
    
    return state

def list_tasks_node(state: TaskManagerState) -> TaskManagerState:
    """List current tasks"""
    pending_count = len(state.tasks)
    completed_count = len(state.completed_tasks)
    
    state.last_action = f"Listed tasks: {pending_count} pending, {completed_count} completed"
    
    return state

# Create task manager graph
def create_task_manager():
    graph = StateGraph(TaskManagerState)
    
    graph.add_node("add_task", add_task_node)
    graph.add_node("complete_task", complete_task_node) 
    graph.add_node("list_tasks", list_tasks_node)
    
    # Simple routing - in real app, would parse user intent
    graph.add_edge(START, "add_task")
    graph.add_edge("add_task", "list_tasks")
    graph.add_edge("list_tasks", END)
    
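    # In recent releases from_conn_string() is a context manager, so pass a connection
    import sqlite3  # local import keeps this cell self-contained
    conn = sqlite3.connect("task_manager.db", check_same_thread=False)
    saver = SqliteSaver(conn)
    return graph.compile(checkpointer=saver)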

# Test the task manager
task_app = create_task_manager()
task_state = TaskManagerState(user_id="user123")
task_config = {"configurable": {"thread_id": "task-session"}}

result = task_app.invoke(task_state, config=task_config)
# invoke() returns the final state as a dict
print(f"✅ Task Manager Solution: {result['last_action']}")
print(f"📋 Pending tasks: {len(result['tasks'])}, Completed: {len(result['completed_tasks'])}")

---
## 🔧 Troubleshooting Common Issues

### Serialization Errors
```python
# ❌ Common issue: Non-serializable objects in state
# Fix: Use Pydantic models or ensure all objects are JSON-serializable

# ✅ Good practice
class SerializableState(BaseModel):
    data: dict  # JSON-serializable
    timestamp: str  # Use string instead of datetime
```

### Database Connection Issues
```python
# ✅ Open the connection explicitly and fall back if the database cannot be opened
import sqlite3

try:
    conn = sqlite3.connect("my_app.db", check_same_thread=False)
    saver = SqliteSaver(conn)
except sqlite3.Error as e:
    print(f"Database error: {e}")
    # Fallback to in-memory
    saver = InMemorySaver()
```

### State Schema Changes
```python
# ✅ Handle schema migrations gracefully
def migrate_state(old_state: dict) -> AgentState:
    """Migrate old state format to new schema"""
    # Add migration logic here
    return AgentState(**old_state)
```
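
One way to fill in that migration logic is to tag each saved state with a version number and upgrade it one step at a time. The sketch below works on plain dicts; the `schema_version` key, the version numbers, and the `status` to `task_status` rename are hypothetical examples, not part of the schemas defined earlier in this notebook.

```python
from typing import Callable, Dict

CURRENT_VERSION = 3


def _v1_to_v2(state: dict) -> dict:
    """v2 added step_count; older checkpoints start from zero."""
    return {**state, "step_count": 0, "schema_version": 2}


def _v2_to_v3(state: dict) -> dict:
    """v3 renamed 'status' to 'task_status'."""
    migrated = {k: v for k, v in state.items() if k != "status"}
    migrated["task_status"] = state.get("status", "pending")
    migrated["schema_version"] = 3
    return migrated


# Each entry upgrades a state dict from the keyed version to the next one.
MIGRATIONS: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2, 2: _v2_to_v3}


def migrate_state_dict(old_state: dict) -> dict:
    """Apply migrations in order until the state reaches CURRENT_VERSION."""
    state = dict(old_state)  # never mutate the stored checkpoint
    version = state.get("schema_version", 1)  # untagged states count as v1
    while version < CURRENT_VERSION:
        state = MIGRATIONS[version](state)
        version = state["schema_version"]
    return state


legacy = {"messages": [], "status": "completed"}
upgraded = migrate_state_dict(legacy)
print(upgraded)
```

Because each step only knows about two adjacent versions, adding a fourth version means writing one new function and one new dictionary entry; states that are already current pass through unchanged.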

---
## 📖 Summary and Next Steps

### What You've Learned:
✅ **Graph Architecture**: Nodes, edges, and conditional routing  
✅ **State Management**: Pydantic schemas and type safety  
✅ **Persistence**: InMemory, SQLite, and PostgreSQL checkpointers  
✅ **Error Handling**: Recovery patterns and checkpoint resumption  
✅ **Advanced Patterns**: Complex state structures and context awareness  

### Best Practices Covered:
- Use Pydantic for type-safe state management
- Choose appropriate checkpointer for your use case
- Implement proper error handling and recovery
- Design state schemas for scalability
- Use meaningful thread IDs for session management
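
As a small illustration of the last point, a helper can build the config dict so that thread IDs always encode who the conversation belongs to. `make_thread_config` and the `user:session` format are assumptions made for this sketch; any stable, unique string works as a `thread_id`.

```python
def make_thread_config(user_id: str, session_id: str) -> dict:
    """Build a config dict whose thread_id encodes user and session."""
    if not user_id or not session_id:
        raise ValueError("user_id and session_id must be non-empty")
    return {"configurable": {"thread_id": f"{user_id}:{session_id}"}}


config_a = make_thread_config("user123", "session456")
config_b = make_thread_config("user123", "session789")
print(config_a)
```

Two sessions for the same user get different thread IDs, so their checkpoints stay separate while remaining easy to find by user.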

### Tomorrow's Preview (Day 3):
🧠 **Memory Systems & Knowledge Management**
- Semantic, episodic, and procedural memory
- Vector storage with OpenAI embeddings
- Cross-session knowledge retention
- Memory management tools and strategies

### Resources for Further Learning:
- [LangGraph Persistence Documentation](https://langchain-ai.github.io/langgraph/concepts/persistence/)
- [Checkpointer Reference](https://langchain-ai.github.io/langgraph/reference/checkpoints/)
- [State Management Best Practices](https://langchain-ai.github.io/langgraph/how-tos/)

**🎯 You're now ready to build stateful, persistent LangGraph applications!**

In [None]:
# List the database files created during this session (nothing is deleted)
print("🧹 Session complete! Database files created:")
import os
db_files = [f for f in os.listdir('.') if f.endswith('.db')]
for db_file in db_files:
    print(f"  📁 {db_file}")

print("\n🎉 Day 2 Complete! You've mastered state management and persistence in LangGraph.")
print("🚀 Ready for Day 3: Memory Systems & Knowledge Management")