# Advanced Patterns for Production LLM Systems

**Duration**: ~1-1.5 hours (streamlined)

## What You'll Learn

This notebook covers **essential patterns for building production LLM systems**:

1. **Framework Decision Guide** - When to use LangChain vs LangGraph
2. **Memory Patterns** - Managing conversation context
3. **Multi-Agent Basics** - Supervisor pattern for team workflows
4. **Human-in-the-Loop** - Approval workflows and interrupts
5. **Production Checklist** - Readiness for deployment

## Prerequisites

✅ **Completed Notebooks 03 & 04** - LangChain Essentials, LangGraph Essentials  
✅ OpenAI API key  
✅ Understanding of LCEL, RAG, and basic agents

## Learning Approach

**Concept-Focused**: We'll cover WHAT each pattern does and WHY it matters, with simple working examples.

For detailed implementations, see the **Advanced Reference** section at the end of this notebook.

---

In [None]:
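# Install packages
# langchain-community provides ChatMessageHistory; tenacity is used in the retry example
!pip install -qU langchain langchain-openai langchain-community langgraph langgraph-checkpoint-sqlite tenacity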

print("✅ Packages installed!")

In [None]:
# Setup API key
import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

print("✅ API key configured!")

---

## Section 1: Framework Decision Guide

### When to Use LangChain vs LangGraph?

**Decision Matrix**:

| Use Case | Framework | Why |
|----------|-----------|-----|
| Simple RAG chatbot | **LangChain (LCEL)** | Linear workflow, no cycles |
| Document Q&A | **LangChain (LCEL)** | Stateless retrieve → answer |
| Code generator with testing | **LangGraph** | Retry loops needed |
| Multi-agent content team | **LangGraph** | Multiple agents, shared state |
| Approval workflows | **LangGraph** | Human interrupts required |
| Research agent with decisions | **LangGraph** | Conditional routing |

### Quick Decision Flow

```
Do you need loops or retries?
├─ YES → LangGraph
└─ NO → Do you need complex shared state?
    ├─ YES → LangGraph
    └─ NO → Do you need multiple agents?
        ├─ YES → LangGraph
        └─ NO → Do you need human approval steps?
            ├─ YES → LangGraph
            └─ NO → LangChain (LCEL)
```

### Key Insight

- **LangChain (LCEL)** = Stateless chains with pipe syntax: `prompt | llm | parser`
- **LangGraph** = Stateful workflows with cycles, decisions, and multi-agent coordination

**Rule of thumb**: Start with LangChain. Migrate to LangGraph when you need state, loops, human approval, or multi-agent coordination.

---

In [None]:
# Decision helper function
def recommend_framework(
    needs_loops: bool = False,
    needs_shared_state: bool = False,
    needs_multi_agent: bool = False,
    needs_human_approval: bool = False,
) -> str:
    """Return the suggested framework name for the given requirements."""
    if any([needs_loops, needs_shared_state, needs_multi_agent, needs_human_approval]):
        return "LangGraph"
    return "LangChain (LCEL)"

# Test examples
print("Simple RAG chatbot:", recommend_framework())
print("Code gen with retry:", recommend_framework(needs_loops=True))
print("Multi-agent team:", recommend_framework(needs_multi_agent=True))
print("Approval workflow:", recommend_framework(needs_human_approval=True))

---

## Section 2: Memory Patterns

### Why Memory Matters

LLM chains are **stateless by default**. Each call is independent.

**Memory enables**:
- Multi-turn conversations
- Context retention across messages
- Personalized responses based on history

### Memory Pattern Comparison

| Pattern | When to Use | Pros | Cons |
|---------|-------------|------|------|
| **Buffer Memory** | Short conversations (< 10 turns) | Preserves all context | Token cost grows with every turn |
| **Window Memory** | Medium conversations (10-50 turns) | Bounded cost (last N messages) | Loses older context |
| **Summary Memory** | Long conversations (50+ turns) | Bounded token cost | May lose details; summarizing needs extra LLM calls |
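
The window and summary rows of the table can be sketched in plain Python. This is a minimal illustration, not LangChain's implementation: `trim_to_window`, `compact_history`, and the `summarize` callable are hypothetical names, and messages are modelled as simple `(role, text)` tuples rather than real message objects.

```python
from typing import Callable

Message = tuple[str, str]  # (role, text); a simplified stand-in for chat message objects


def trim_to_window(messages: list[Message], max_messages: int) -> list[Message]:
    """Window memory: keep system messages plus the last `max_messages` other messages."""
    system = [m for m in messages if m[0] == "system"]
    rest = [m for m in messages if m[0] != "system"]
    return system + (rest[-max_messages:] if max_messages > 0 else [])


def compact_history(
    messages: list[Message],
    keep_recent: int,
    summarize: Callable[[list[Message]], str],
) -> list[Message]:
    """Summary memory: replace older messages with a single summary message."""
    split = len(messages) - keep_recent
    if split <= 0:
        return list(messages)
    older, recent = messages[:split], messages[split:]
    summary = ("system", f"Summary of earlier conversation: {summarize(older)}")
    return [summary] + recent


# Build a 6-turn conversation (1 system message + 12 human/ai messages).
history: list[Message] = [("system", "You are a helpful assistant.")]
for i in range(1, 7):
    history.append(("human", f"question {i}"))
    history.append(("ai", f"answer {i}"))

windowed = trim_to_window(history, max_messages=4)
compacted = compact_history(
    history,
    keep_recent=2,
    summarize=lambda msgs: f"{len(msgs)} earlier messages",  # placeholder for an LLM call
)
print(len(history), len(windowed), len(compacted))  # 13 5 3
```

In a real system the `summarize` callable would itself be an LLM call, which is where the extra cost of summary memory comes from.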

### Modern Approach: RunnableWithMessageHistory

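A common way to add memory to LCEL chains. Note that the LangChain v0.3 documentation recommends LangGraph persistence (checkpointers) for new applications; `RunnableWithMessageHistory` remains supported for existing code.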

**Key components**:
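1. **Session management** - `session_id` keeps each conversation's history separate
2. **Message storage** - `ChatMessageHistory` holds the messages (in memory in this example)
3. **Automatic injection** - Stored history is inserted into the prompt at the `MessagesPlaceholder`, and new messages are saved after each call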

---

In [None]:
# Simple conversational chatbot with memory
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

# Create base chain
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{question}")
])

chain = prompt | llm

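# Session storage (in-memory only: histories are lost when the process restarts)
store: dict[str, ChatMessageHistory] = {}  # session_id -> ChatMessageHistory

def get_session_history(session_id: str) -> ChatMessageHistory:
    """Return the history for a session, creating it on first use."""
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]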

# Add memory to chain
chain_with_memory = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history"
)

# Test
config = {"configurable": {"session_id": "user-123"}}

r1 = chain_with_memory.invoke({"question": "My name is Alice"}, config)
print(f"Turn 1: {r1.content}")

r2 = chain_with_memory.invoke({"question": "What's my name?"}, config)
print(f"Turn 2: {r2.content}")

print("\n✅ If Turn 2 mentions Alice, the name was remembered across turns.")

---

## Section 3: Multi-Agent Basics

### The Supervisor Pattern

**Problem**: Complex tasks need specialized skills (research, writing, coding, etc.)

**Solution**: Supervisor pattern - one orchestrator coordinates multiple specialist agents.

```
           ┌─────────────┐
           │ Supervisor  │  ← Routes work to specialists
           └──────┬──────┘
                  │
       ┌──────────┼──────────┐
       ↓          ↓          ↓
  [Researcher] [Writer] [Reviewer]
```

### When to Use

- Content creation teams (research → write → edit)
- Data analysis pipelines (extract → transform → analyze)
- Customer support (classify → route → respond)

### Key Concept

**Supervisor** decides which specialist agent to call next based on current state. All agents share a common state.

---

In [None]:
# Simple 2-agent example: Researcher + Writer
from langgraph.graph import StateGraph, START, END
from typing import TypedDict, Annotated
from operator import add

# Shared state
class TeamState(TypedDict):
    messages: Annotated[list[str], add]
    topic: str
    research: str
    article: str
    next_agent: str

# Agents
def researcher(state: TeamState):
    topic = state["topic"]
    research = f"Research on {topic}: Key facts include..."
    return {
        "research": research,
        "messages": ["Researcher: Completed research"]
    }

def writer(state: TeamState):
    research = state["research"]
    article = f"Article based on: {research[:50]}..."
    return {
        "article": article,
        "messages": ["Writer: Completed article"]
    }

def supervisor(state: TeamState):
    """Decides next agent"""
    if not state.get("research"):
        return {"next_agent": "researcher"}
    elif not state.get("article"):
        return {"next_agent": "writer"}
    else:
        return {"next_agent": "END"}

# Build graph
workflow = StateGraph(TeamState)
workflow.add_node("supervisor", supervisor)
workflow.add_node("researcher", researcher)
workflow.add_node("writer", writer)

# Routes
workflow.add_edge(START, "supervisor")
workflow.add_conditional_edges(
    "supervisor",
    lambda s: s["next_agent"],
    {"researcher": "researcher", "writer": "writer", "END": END}
)
workflow.add_edge("researcher", "supervisor")
workflow.add_edge("writer", "supervisor")

team = workflow.compile()

# Run
result = team.invoke({"messages": [], "topic": "AI safety", "research": "", "article": "", "next_agent": ""})
print("\n".join(result["messages"]))
print(f"\nFinal article: {result['article']}")

---

## Section 4: Human-in-the-Loop

### Why Human Approval Matters

**Use cases**:
- Publishing content (review before public release)
- Executing sensitive operations (database deletions, API calls)
- Financial transactions (payment approvals)
- Policy decisions (final human judgment)

### How Interrupts Work

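1. The graph executes normally.
2. Before running a node listed in `interrupt_before`, the graph **PAUSES**.
3. The checkpointer saves the state under the current `thread_id`.
4. The application waits for human input.
5. Invoking the graph again with `None` and the same `thread_id` resumes execution.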

### Key Concept

**Checkpointing required**: Interrupts only work when the graph is compiled with a checkpointer, because the paused state must be saved before it can be resumed.
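
To see why pausing depends on saved state, here is a toy pause-and-resume runner in plain Python. It is only a mental model under simplifying assumptions, not how LangGraph is implemented: `run`, `checkpoints`, and `pause_before` are hypothetical names, and state is a plain dictionary.

```python
from typing import Callable, Optional

State = dict[str, str]
Step = tuple[str, Callable[[State], State]]

# thread_id -> (index of the next step to run, saved state)
checkpoints: dict[str, tuple[int, State]] = {}


def run(
    steps: list[Step],
    thread_id: str,
    state: Optional[State],
    pause_before: set[str],
) -> tuple[State, bool]:
    """Run steps in order, pausing before any step named in `pause_before`.

    Pass `state=None` to resume a paused thread from its checkpoint.
    Returns (state, finished).
    """
    if state is None:
        index, state = checkpoints.pop(thread_id)  # KeyError if nothing was saved
        resuming = True
    else:
        index, resuming = 0, False

    while index < len(steps):
        name, fn = steps[index]
        if name in pause_before and not resuming:
            checkpoints[thread_id] = (index, dict(state))  # save, then hand control back
            return state, False
        resuming = False  # only the first step after a resume skips the pause
        state = {**state, **fn(state)}
        index += 1
    return state, True


steps: list[Step] = [
    ("generate", lambda s: {"content": "Draft: Important announcement..."}),
    ("publish", lambda s: {"content": f"Published: {s['content']}"}),
]

paused_state, finished = run(steps, "approval-1", {"content": ""}, {"publish"})
print(finished, paused_state["content"])

final_state, finished = run(steps, "approval-1", None, {"publish"})
print(finished, final_state["content"])
```

Without the `checkpoints` entry the second call would have nothing to resume from, which is the same reason a LangGraph interrupt needs a checkpointer and a stable `thread_id`.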

---

In [None]:
# Simple approval workflow
from langgraph.checkpoint.memory import MemorySaver

class ApprovalState(TypedDict):
    content: str
    approved: bool

def generate_content(state: ApprovalState):
    return {"content": "Draft: Important announcement..."}

def publish_content(state: ApprovalState):
    return {"content": f"Published: {state['content']}"}

# Build workflow
approval_workflow = StateGraph(ApprovalState)
approval_workflow.add_node("generate", generate_content)
approval_workflow.add_node("publish", publish_content)

approval_workflow.add_edge(START, "generate")
approval_workflow.add_edge("generate", "publish")
approval_workflow.add_edge("publish", END)

# Compile with interrupt
memory = MemorySaver()
app = approval_workflow.compile(
    checkpointer=memory,
    interrupt_before=["publish"]  # Pause before publishing
)

# Run - will pause at publish
config = {"configurable": {"thread_id": "approval-1"}}
state = app.invoke({"content": "", "approved": False}, config)

print(f"Generated content: {state['content']}")
print("\n⏸️  PAUSED - Waiting for human approval...")
print("(In production, human would review and approve)")

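# Resume (simulate approval): passing None continues from the saved checkpoint.
# Note: this simplified example never reads `approved`; a real workflow would
# check the reviewer's decision before resuming.
final_state = app.invoke(None, config)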
print(f"\n{final_state['content']}")
print("✅ Human-in-the-loop workflow completed!")

---

## Section 5: Production Readiness Checklist

Before deploying LLM systems to production, ensure you have these patterns in place:

### Reliability

- ✅ **Retry logic** with exponential backoff (handle transient failures)
- ✅ **Circuit breakers** (fail fast when service is down)
- ✅ **Timeouts** (prevent hanging requests)
- ✅ **Fallback responses** (graceful degradation)
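
As one illustration of the circuit-breaker and fallback items, here is a minimal sketch in plain Python. The `CircuitBreaker` class, its parameters, and `flaky_llm_call` are hypothetical names chosen for this example; a production system would more likely rely on a tested library.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; allows a trial call after `reset_after` seconds."""

    def __init__(
        self,
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], str], fallback: str) -> str:
        # While open, fail fast with the fallback until the cool-down has elapsed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback
        self.failures = 0  # a success closes the circuit again
        return result


counter = {"calls": 0}


def flaky_llm_call() -> str:
    """Stand-in for a model call that always times out."""
    counter["calls"] += 1
    raise TimeoutError("upstream timed out")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
replies = [
    breaker.call(flaky_llm_call, fallback="Sorry, please try again later.")
    for _ in range(5)
]
print(counter["calls"])  # 2: after two failures the breaker opens and skips the other calls
```

The injectable `clock` is a design choice that makes the cool-down testable without sleeping.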

### Security

- ✅ **Input validation** (reduce prompt-injection risk)
- ✅ **Output filtering** (content moderation)
- ✅ **PII detection** (redact sensitive data)
- ✅ **Rate limiting** (prevent abuse)
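
The PII item can be illustrated with a small regex-based redactor. This is a deliberately simple sketch: the two patterns and the `redact_pii` helper are assumptions made for this example and will miss many real-world formats, so dedicated PII tooling is usually a better fit in production.

```python
import re

# Deliberately simple patterns for illustration; real PII detection needs broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact_pii(text: str) -> str:
    """Replace each match with a placeholder such as [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact_pii("Contact alice@example.com or call +1 (555) 123-4567."))
# Contact [EMAIL] or call [PHONE].
```

Redaction like this would typically run on user input before it reaches the model and again on logs before they are stored.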

### Performance

- ✅ **Caching** (in-memory, prompt caching, semantic caching)
- ✅ **Streaming** (lower perceived latency)
- ✅ **Batching** (process multiple requests efficiently)
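
The simplest form of the caching item is an exact-match, in-memory cache. The sketch below assumes deterministic calls (for example `temperature=0`); `ExactMatchCache` and `fake_llm` are hypothetical names, and `fake_llm` stands in for a real model call.

```python
import hashlib
from typing import Callable


class ExactMatchCache:
    """In-memory cache keyed by a hash of (model, prompt)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, prompt: str, call: Callable[[str], str]) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        self._store[key] = call(prompt)
        return self._store[key]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return prompt.upper()


cache = ExactMatchCache()
for question in ["what is lcel?", "what is lcel?", "What is LCEL?"]:
    cache.get_or_call("gpt-4o-mini", question, fake_llm)
print(cache.hits, cache.misses)  # 1 2
```

The third question misses because only the capitalization differs, which is the gap that semantic caching is meant to close.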

### Quality

- ✅ **Hallucination detection** (LLM-as-judge, fact-checking)
- ✅ **Output validation** (schema enforcement with Pydantic)
- ✅ **Evaluation metrics** (accuracy, relevance, quality scores)
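
Output validation can be sketched with the standard library alone. The checklist names Pydantic for schema enforcement; the example below swaps in `json` plus a dataclass purely so it runs without extra dependencies. The `Answer` schema and `parse_answer` helper are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    answer: str
    confidence: float


def parse_answer(raw: str) -> Answer:
    """Parse model output as JSON and check field types and ranges."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    answer, confidence = data.get("answer"), data.get("confidence")
    if not isinstance(answer, str) or not answer.strip():
        raise ValueError("'answer' must be a non-empty string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("'confidence' must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("'confidence' must be between 0 and 1")
    return Answer(answer=answer, confidence=float(confidence))


print(parse_answer('{"answer": "4", "confidence": 0.9}'))
```

A caller would usually catch `ValueError` and either re-prompt the model or return a fallback response.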

### Observability

- ✅ **Logging** (structured logs for debugging)
- ✅ **Metrics** (latency, error rate, token usage, cache hit rate)
- ✅ **Tracing** (LangSmith or similar for request tracking)
- ✅ **Alerting** (notify on errors, anomalies)
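
The logging and metrics items can be combined in one small helper that emits a structured record per call. This is a sketch using only the standard library; `traced_call` and the field names are assumptions for illustration, and a tracing product such as LangSmith would capture far more detail.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("llm")


@contextmanager
def traced_call(operation: str, **fields):
    """Log one JSON record per call with latency and outcome."""
    record = {"operation": operation, **fields}
    start = time.perf_counter()
    try:
        yield record  # the caller can attach extra fields, e.g. token counts
        record["status"] = "ok"
    except Exception as exc:
        record["status"] = "error"
        record["error"] = type(exc).__name__
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
        logger.info(json.dumps(record))


logging.basicConfig(level=logging.INFO, format="%(message)s")

with traced_call("chat", model="gpt-4o-mini", session_id="user-123") as rec:
    # Placeholder value; in practice read token usage from the model response.
    rec["total_tokens"] = 42
```

Because every record is a single JSON line, latency, error rate, and token usage can be aggregated later by whatever log pipeline is in place.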

**Next**: See `06_production_patterns_evaluation.ipynb` for detailed implementation of these patterns.

---

## Section 6: Summary & Key Takeaways

### What You Learned

**Framework Decision-Making**:
- ✅ When to use LangChain vs LangGraph
- ✅ Decision matrix for architecture choices

**Essential Patterns**:
- ✅ Memory management for multi-turn conversations
- ✅ Multi-agent supervisor pattern for complex workflows
- ✅ Human-in-the-loop for approval workflows

**Production Readiness**:
- ✅ Comprehensive checklist for deployment
- ✅ Understanding of reliability, security, performance, and quality requirements

### Key Insights

1. **Start Simple**: Use LangChain (LCEL) until you need LangGraph's advanced features
2. **Memory**: `RunnableWithMessageHistory` is a common way to add memory to LCEL chains; LangGraph persistence is the newer alternative
3. **Multi-Agent**: Supervisor pattern scales to complex team workflows
4. **Human Approval**: Requires checkpointing + interrupts
5. **Production**: See comprehensive checklist above

### Next Steps

1. **Practice**: Build a conversational chatbot with memory
2. **Experiment**: Try multi-agent pattern with 3+ agents
3. **Production Patterns**: Study `06_production_patterns_evaluation.ipynb`
4. **Deploy**: Follow production checklist before going live

### Resources

- [LangChain Documentation](https://python.langchain.com/)
- [LangGraph Documentation](https://www.langchain.com/langgraph)
- [LangSmith for Observability](https://www.langchain.com/langsmith)

---

**Well done!** You now understand the key patterns for building production LLM systems. 🎉

---

## Advanced Reference

This section contains **detailed implementations** for those who want to dive deeper. Not required for the main learning flow.

### Available Examples:
1. **Advanced State Management** - Custom reducers, Annotated types
2. **Production Error Handling** - Retry with exponential backoff

**Note**: These are reference materials. The concepts above are sufficient for most use cases.

---

### Advanced Example 1: Custom State Reducers

For specialized state merge behavior beyond default replacement.
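
As a rough mental model of what a reducer does, the plain-Python sketch below merges partial updates into a state dictionary: keys with a reducer are combined, all other keys are replaced. It is not LangGraph's implementation, and `merge_update` is a hypothetical helper.

```python
import operator
from typing import Any, Callable

Reducer = Callable[[Any, Any], Any]


def merge_update(state: dict, update: dict, reducers: dict[str, Reducer]) -> dict:
    """Apply a node's partial update: reduce keys that have a reducer, replace the rest."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers and key in merged:
            merged[key] = reducers[key](merged[key], value)
        else:
            merged[key] = value
    return merged


reducers: dict[str, Reducer] = {
    "messages": operator.add,  # append lists
    "score": max,              # keep the larger value
    "notes": lambda old, new: f"{old} | {new}" if old else new,  # join, skipping an empty start
}

state = {"messages": [], "score": 0, "notes": "", "status": "new"}
state = merge_update(
    state,
    {"messages": ["Message 1"], "score": 10, "notes": "Node1 executed", "status": "processing"},
    reducers,
)
state = merge_update(
    state,
    {"messages": ["Message 2"], "score": 15, "notes": "Node2 executed", "status": "completed"},
    reducers,
)
print(state)
```

After both updates, `messages` holds both entries, `score` is 15, `notes` is joined with a separator, and `status` has simply been overwritten.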

In [None]:
# Advanced: Custom state reducers
import operator
from typing import Annotated, TypedDict

class AdvancedState(TypedDict):
    # Append to list
    messages: Annotated[list, operator.add]
    
    # Take maximum value
    score: Annotated[int, lambda x, y: max(x, y)]
    
    # Concatenate strings with separator (skip the separator when the existing value is empty)
    notes: Annotated[str, lambda x, y: f"{x} | {y}" if x else y]
    
    # Default: replace
    status: str

# Usage in nodes
def node1(state: AdvancedState):
    return {
        "messages": ["Message 1"],
        "score": 10,
        "notes": "Node1 executed",
        "status": "processing"
    }

def node2(state: AdvancedState):
    return {
        "messages": ["Message 2"],
        "score": 15,  # Will take max(10, 15) = 15
        "notes": "Node2 executed",
        "status": "completed"  # Will replace "processing"
    }

print("✅ Advanced state management pattern")

### Advanced Example 2: Error Handling with Retry

A basic retry pattern with exponential backoff, using the `tenacity` library.

In [None]:
# Advanced: Retry logic with exponential backoff
from langchain_openai import ChatOpenAI
from tenacity import retry, stop_after_attempt, wait_exponential

# max_retries=0 turns off the client's built-in retries so tenacity alone controls retry behavior
retry_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, max_retries=0)

@retry(
    stop=stop_after_attempt(3),  # 3 attempts in total (1 initial call + 2 retries)
    wait=wait_exponential(multiplier=1, min=2, max=10),  # exponential wait, clamped to 2-10 seconds
    reraise=True,  # re-raise the last exception instead of tenacity.RetryError
)
def call_llm_with_retry(prompt: str):
    """LLM call that is retried automatically when it raises."""
    return retry_llm.invoke(prompt)

# Test
try:
    result = call_llm_with_retry("What is 2+2?")
    print(f"Result: {result.content}")
    print("✅ Retry pattern implemented (up to 3 attempts in total)")
except Exception as e:
    print(f"Failed after retries: {e}")

---

**End of Notebook**

For production patterns (guardrails, caching, hallucination detection, etc.), continue to:
👉 **06_production_patterns_evaluation.ipynb**