# LangGraph Tutorial: Building State Machine Agents

**Recommended for: Production systems requiring explicit control flow**

This tutorial will teach you how to build AI agents using LangGraph, a framework that models agent workflows as explicit state machines. By the end, you'll understand:

1. What LangGraph is and how it differs from other frameworks
2. How to define state with TypedDict
3. How to create nodes (processing steps) and edges (transitions)
4. How to implement tool calling loops
5. How to integrate with DSPy for prompt optimization
6. How to track experiments with MLflow

---

## Why LangGraph?

LangGraph treats agent workflows as **graphs**: nodes represent processing steps, and edges represent the transitions between them. This makes workflows:

- **Debuggable**: You can see exactly which step failed
- **Visualizable**: Generate diagrams of your workflow
- **Checkpointable**: Save and resume long-running workflows
- **Testable**: Test individual nodes in isolation

## 1. Installation & Setup

In [None]:
# Install required packages (dspy and mlflow are used in sections 7 and 8)
# !pip install langchain langchain-openai langgraph httpx beautifulsoup4 python-dotenv dspy mlflow

In [None]:
import os
import json
from typing import TypedDict, Annotated, Optional
from dotenv import load_dotenv

load_dotenv()

# Verify API key
api_key = os.getenv("OPENAI_API_KEY")
print(f"API key found: {api_key[:8]}..." if api_key else "WARNING: No API key found")

## 2. Core Concepts: State and Graphs

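In LangGraph, **state** is a TypedDict that flows through your graph. Each node receives the current state and returns a dict containing only the keys it wants to change; LangGraph merges that update into the state before the next node runs.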

In [None]:
from langgraph.graph import StateGraph, END

# Define the state that flows through our workflow
class ClaimState(TypedDict):
    """State for CAT (catastrophe) claim processing."""
    # Input fields
    city: str
    state: str
    postcode: str
    date: str
    
    # Fields populated by weather agent
    latitude: Optional[float]
    longitude: Optional[float]
    thunderstorms: Optional[str]
    strong_wind: Optional[str]
    
    # Fields populated by eligibility agent
    cat_status: Optional[str]
    decision: Optional[str]
    confidence: Optional[float]
    reasoning: Optional[str]

print("State schema defined!")
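
Because every non-input field starts as `None`, the initial state ends up being written out by hand each time the graph is invoked. The sketch below shows one way to build it from the schema instead. `make_initial_state` is a hypothetical helper, not part of LangGraph, and the class is repeated only so the snippet runs on its own. Note that a TypedDict is a plain `dict` at runtime, so nothing here validates the values.

```python
from typing import Optional, TypedDict, get_type_hints


class ClaimState(TypedDict):
    """Copy of the schema above so this snippet is self-contained."""
    city: str
    state: str
    postcode: str
    date: str
    latitude: Optional[float]
    longitude: Optional[float]
    thunderstorms: Optional[str]
    strong_wind: Optional[str]
    cat_status: Optional[str]
    decision: Optional[str]
    confidence: Optional[float]
    reasoning: Optional[str]


def make_initial_state(city: str, state: str, postcode: str, date: str) -> ClaimState:
    """Hypothetical helper: set the four inputs and default everything else to None."""
    blank = {name: None for name in get_type_hints(ClaimState)}
    blank.update(city=city, state=state, postcode=postcode, date=date)
    return ClaimState(**blank)  # builds an ordinary dict; no runtime validation


initial = make_initial_state("Brisbane", "QLD", "4000", "2025-03-07")
print(initial["city"], initial["latitude"])
```

If a field is later added to `ClaimState`, the helper picks it up automatically, which avoids the hand-written dictionaries drifting out of sync with the schema.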

## 3. Define Tools with @tool Decorator

In [None]:
import httpx
from bs4 import BeautifulSoup
from langchain_core.tools import tool

@tool
def geocode_location(city: str, state: str, postcode: str) -> dict:
    """Convert an Australian address to latitude/longitude coordinates.
    
    Args:
        city: City name (e.g., "Brisbane")
        state: Australian state code (e.g., "QLD")
        postcode: Postcode (e.g., "4000")
    
    Returns:
        Dictionary with latitude, longitude, and display_name
    """
    print(f"  Tool: geocode_location({city}, {state}, {postcode})")
    
    query = f"{city}, {state}, {postcode}, Australia"
    with httpx.Client() as client:
        response = client.get(
            "https://nominatim.openstreetmap.org/search",
            params={"q": query, "format": "json", "countrycodes": "au"},
            headers={"User-Agent": "LangGraphTutorial/1.0"},
            timeout=10.0
        )
        response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
        data = response.json()
    
    if data:
        result = {
            "latitude": float(data[0]["lat"]),
            "longitude": float(data[0]["lon"]),
            "display_name": data[0].get("display_name", "")
        }
    else:
        result = {"error": f"Location not found: {query}"}
    
    print(f"    Result: {result}")
    return result


@tool
def get_bom_weather(lat: float, lon: float, date: str, state: str) -> dict:
    """Fetch weather observations from Australian Bureau of Meteorology.
    
    Args:
        lat: Latitude
        lon: Longitude
        date: Date in YYYY-MM-DD format
        state: Australian state code
    
    Returns:
        Dictionary with thunderstorms and strong_wind observations
    """
    print(f"  Tool: get_bom_weather({lat}, {lon}, {date}, {state})")
    
    with httpx.Client() as client:
        response = client.get(
            "https://reg.bom.gov.au/cgi-bin/climate/storms/get_storms.py",
            params={
                "lat": round(lat, 1),
                "lon": round(lon, 1),
                "date": date,
                "state": state,
                "unique_id": "langgraph_tutorial"
            },
            timeout=15.0
        )
        response.raise_for_status()  # an HTTP error page would otherwise parse as "no observations"
    
    soup = BeautifulSoup(response.text, 'html.parser')
    thunderstorms = "No reports or observations"
    strong_wind = "No reports or observations"
    
    for row in soup.find_all('tr'):
        cells = row.find_all(['td', 'th'])
        if len(cells) >= 2:
            weather_type = cells[0].get_text(strip=True).lower()
            status = cells[1].get_text(strip=True)
            if 'thunderstorm' in weather_type:
                thunderstorms = status or "No reports or observations"
            elif 'wind' in weather_type:
                strong_wind = status or "No reports or observations"
    
    result = {"thunderstorms": thunderstorms, "strong_wind": strong_wind}
    print(f"    Result: {result}")
    return result

tools = [geocode_location, get_bom_weather]
print(f"Defined {len(tools)} tools")
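
The row-matching logic inside `get_bom_weather` is the part most likely to break if the page layout changes, and it can be tested without any network access once it is separated from the HTTP call. The following is a minimal sketch under that assumption. `summarize_rows` and `NO_REPORT` are hypothetical names, and the input is a list of `(label, status)` pairs rather than parsed HTML.

```python
from typing import Iterable, Tuple

NO_REPORT = "No reports or observations"


def summarize_rows(rows: Iterable[Tuple[str, str]]) -> dict:
    """Hypothetical helper: map (label, status) pairs to the tool's result dict."""
    result = {"thunderstorms": NO_REPORT, "strong_wind": NO_REPORT}
    for label, status in rows:
        label = label.strip().lower()
        if "thunderstorm" in label:
            result["thunderstorms"] = status.strip() or NO_REPORT
        elif "wind" in label:
            result["strong_wind"] = status.strip() or NO_REPORT
    return result


# Made-up rows: one observed, one empty status, one unrelated weather type.
sample = [("Thunderstorms", "Observed"), ("Strong Wind", ""), ("Hail", "Observed")]
summary = summarize_rows(sample)
print(summary)
```

The tool itself would then only fetch the page, extract the cell text, and hand the pairs to this function.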

## 4. Create Graph Nodes

Nodes are functions that receive state and return state updates.

In [None]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage, ToolMessage

# Create LLM with tools
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools(tools)

def weather_agent_node(state: ClaimState) -> dict:
    """Weather verification node with tool loop."""
    print("\n--- Weather Agent Node ---")
    
    messages = [
        SystemMessage(content="""You are a Weather Verification Agent.
        
STEPS:
1. Use geocode_location to get coordinates for the address
2. Use get_bom_weather to fetch weather observations
3. Return your findings

Always use your tools - never make up data."""),
        HumanMessage(content=f"Verify weather for {state['city']}, {state['state']}, {state['postcode']} on {state['date']}")
    ]
    
    # Tool loop - keep calling until no more tool calls
    coordinates = {}
    weather = {}
    
    for _ in range(5):  # Max iterations
        response = llm_with_tools.invoke(messages)
        messages.append(response)
        
        if not response.tool_calls:
            break
            
        # Execute each tool call
        for tool_call in response.tool_calls:
            tool_name = tool_call["name"]
            tool_args = tool_call["args"]
            
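            # Look up the tool by name; report unknown tools back to the model
            matched = next((t for t in tools if t.name == tool_name), None)
            if matched is None:
                result = {"error": f"Unknown tool: {tool_name}"}
            else:
                result = matched.invoke(tool_args)

            # Store results for state update
            if tool_name == "geocode_location":
                coordinates = result
            elif tool_name == "get_bom_weather":
                weather = result

            # Answer every tool call so the next LLM request stays valid
            messages.append(ToolMessage(
                content=json.dumps(result),
                tool_call_id=tool_call["id"]
            ))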
    
    # Return state updates
    return {
        "latitude": coordinates.get("latitude"),
        "longitude": coordinates.get("longitude"),
        "thunderstorms": weather.get("thunderstorms"),
        "strong_wind": weather.get("strong_wind")
    }

print("Weather agent node defined!")
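
The dispatch step of the tool loop can be isolated and exercised without an LLM. The sketch below uses a plain dictionary as a tool registry and always produces one reply per tool call, including for an unknown tool name or a tool that raises. `run_tool_calls` and `fake_geocode` are hypothetical names, and the replies are plain dicts standing in for `ToolMessage` objects.

```python
import json
from typing import Callable, Dict, List


def run_tool_calls(tool_calls: List[dict], registry: Dict[str, Callable[..., dict]]) -> List[dict]:
    """Return one reply per tool call, even when the tool is unknown or fails."""
    replies = []
    for call in tool_calls:
        func = registry.get(call["name"])
        if func is None:
            result = {"error": f"Unknown tool: {call['name']}"}
        else:
            try:
                result = func(**call["args"])
            except Exception as exc:  # report the failure to the model instead of crashing
                result = {"error": str(exc)}
        replies.append({"tool_call_id": call["id"], "content": json.dumps(result)})
    return replies


def fake_geocode(city: str, state: str, postcode: str) -> dict:
    """Offline stub standing in for geocode_location; the coordinates are made up."""
    return {"latitude": -27.47, "longitude": 153.03}


calls = [
    {"name": "geocode_location", "id": "call_1",
     "args": {"city": "Brisbane", "state": "QLD", "postcode": "4000"}},
    {"name": "not_a_tool", "id": "call_2", "args": {}},
]
replies = run_tool_calls(calls, {"geocode_location": fake_geocode})
for reply in replies:
    print(reply)
```

Returning an error payload rather than raising lets the model see what went wrong and decide whether to retry with different arguments.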

In [None]:
def eligibility_agent_node(state: ClaimState) -> dict:
    """Eligibility determination node (no tools - pure reasoning)."""
    print("\n--- Eligibility Agent Node ---")
    
    messages = [
        SystemMessage(content="""You are a Claims Eligibility Agent.

RULES:
- BOTH thunderstorms AND strong wind "Observed" = CONFIRMED → APPROVED
- Only ONE "Observed" = POSSIBLE → REVIEW
- Neither "Observed" = NOT_CAT → DENIED

Respond with JSON: {"cat_status": "...", "decision": "...", "confidence": 0.X, "reasoning": "..."}"""),
        HumanMessage(content=f"""Evaluate CAT eligibility:
- Location: {state['city']}, {state['state']}, {state['postcode']}
- Coordinates: ({state['latitude']}, {state['longitude']})
- Date: {state['date']}
- Thunderstorms: {state['thunderstorms']}
- Strong Wind: {state['strong_wind']}""")
    ]
    
    response = llm.invoke(messages)
    
    # Parse JSON from response
    try:
        # Extract JSON from response
        content = response.content
        if "```json" in content:
            content = content.split("```json")[1].split("```")[0]
        elif "```" in content:
            content = content.split("```")[1].split("```")[0]
        
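        result = json.loads(content)
        if not isinstance(result, dict):
            raise TypeError("expected a JSON object")
    except (json.JSONDecodeError, IndexError, TypeError, AttributeError):
        # Malformed or non-object reply: fall back to manual review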
        result = {
            "cat_status": "UNKNOWN",
            "decision": "REVIEW",
            "confidence": 0.5,
            "reasoning": response.content
        }
    
    return {
        "cat_status": result.get("cat_status"),
        "decision": result.get("decision"),
        "confidence": result.get("confidence"),
        "reasoning": result.get("reasoning")
    }

print("Eligibility agent node defined!")
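
The three RULES in the system prompt are deterministic, so they can also be written as ordinary code and used as a baseline when checking the LLM's answers. This is a sketch under the assumption that a status counts as observed when it contains the word "Observed". `rule_based_decision` is a hypothetical helper and is not part of the graph above.

```python
from typing import Optional


def rule_based_decision(thunderstorms: Optional[str], strong_wind: Optional[str]) -> dict:
    """Apply the three rules from the prompt without calling an LLM."""
    observed = sum(
        1 for value in (thunderstorms, strong_wind)
        if value is not None and "Observed" in value
    )
    if observed == 2:
        return {"cat_status": "CONFIRMED", "decision": "APPROVED"}
    if observed == 1:
        return {"cat_status": "POSSIBLE", "decision": "REVIEW"}
    return {"cat_status": "NOT_CAT", "decision": "DENIED"}


print(rule_based_decision("Observed", "Observed"))
print(rule_based_decision("Observed", "No reports or observations"))
print(rule_based_decision(None, None))
```

Comparing this baseline with the node's output on a handful of states is a cheap way to spot cases where the model drifts from the stated rules.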

## 5. Build the Graph

In [None]:
# Create the graph
workflow = StateGraph(ClaimState)

# Add nodes
workflow.add_node("weather_agent", weather_agent_node)
workflow.add_node("eligibility_agent", eligibility_agent_node)

# Define edges (flow)
workflow.set_entry_point("weather_agent")
workflow.add_edge("weather_agent", "eligibility_agent")
workflow.add_edge("eligibility_agent", END)

# Compile the graph
app = workflow.compile()

print("Graph compiled!")
print("\nGraph structure:")
print("  START → weather_agent → eligibility_agent → END")

In [None]:
# Run the graph
initial_state = {
    "city": "Brisbane",
    "state": "QLD",
    "postcode": "4000",
    "date": "2025-03-07",
    "latitude": None,
    "longitude": None,
    "thunderstorms": None,
    "strong_wind": None,
    "cat_status": None,
    "decision": None,
    "confidence": None,
    "reasoning": None
}

print("Running graph...")
print("="*60)

final_state = app.invoke(initial_state)

print("\n" + "="*60)
print("FINAL STATE")
print("="*60)
print(json.dumps(final_state, indent=2))
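
To make the data flow concrete, the plain-Python sketch below imitates what this linear graph does with a simple TypedDict state: call each node in order and overwrite the state keys that the node returns. It is a conceptual model only, not LangGraph's implementation, and `run_linear`, `fake_weather`, and `fake_eligibility` are hypothetical stand-ins.

```python
from typing import Callable, List

Node = Callable[[dict], dict]


def run_linear(nodes: List[Node], state: dict) -> dict:
    """Run nodes in sequence, merging each partial update into the state."""
    current = dict(state)  # copy so the caller's dict is left untouched
    for node in nodes:
        update = node(current)
        current = {**current, **update}  # keys in the update overwrite old values
    return current


def fake_weather(state: dict) -> dict:
    """Stub node: returns only the keys it wants to change."""
    return {"thunderstorms": "Observed", "strong_wind": "Observed"}


def fake_eligibility(state: dict) -> dict:
    """Stub node: reads what the previous node wrote."""
    both = state["thunderstorms"] == "Observed" and state["strong_wind"] == "Observed"
    return {"decision": "APPROVED" if both else "REVIEW"}


start = {"city": "Brisbane", "thunderstorms": None, "strong_wind": None, "decision": None}
final = run_linear([fake_weather, fake_eligibility], start)
print(final)
```

Keys that no node returns, such as `city` here, pass through unchanged, which is why each node in the tutorial returns only the fields it populates.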

## 6. Adding Conditional Routing

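LangGraph supports conditional edges for complex workflows: a routing function inspects the state and returns a label, and a mapping translates that label into the name of the next node.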

In [None]:
def should_escalate(state: ClaimState) -> str:
    """Determine if claim needs escalation based on weather data."""
    # Use `or ""` because the keys can be present with a value of None
    thunderstorms = state.get("thunderstorms") or ""
    strong_wind = state.get("strong_wind") or ""
    
    # Both observed = potential high-value claim, escalate
    if "Observed" in thunderstorms and "Observed" in strong_wind:
        return "priority_review"
    return "standard_review"

# Example of a more complex graph with conditional routing
# (Not executed - just showing the pattern)

complex_workflow_example = """
# Create graph with conditional routing
workflow = StateGraph(ClaimState)

workflow.add_node("weather_agent", weather_agent_node)
workflow.add_node("standard_review", standard_eligibility_node)
workflow.add_node("priority_review", priority_eligibility_node)

workflow.set_entry_point("weather_agent")

# Conditional edge based on weather severity
workflow.add_conditional_edges(
    "weather_agent",
    should_escalate,
    {
        "priority_review": "priority_review",
        "standard_review": "standard_review"
    }
)

workflow.add_edge("standard_review", END)
workflow.add_edge("priority_review", END)
"""

print("Conditional routing example:")
print(complex_workflow_example)
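
A routing function is ordinary Python, so it can be checked against a table of states before it is wired into a graph. The sketch below repeats the router in a form that treats `None` as an empty string and verifies that every label it returns is a key of the route mapping. `ROUTES` and `cases` are hypothetical names introduced for the example.

```python
def should_escalate(state: dict) -> str:
    """Copy of the router, written so that None values are treated as empty strings."""
    thunderstorms = state.get("thunderstorms") or ""
    strong_wind = state.get("strong_wind") or ""
    if "Observed" in thunderstorms and "Observed" in strong_wind:
        return "priority_review"
    return "standard_review"


# Same label-to-node mapping that is passed to add_conditional_edges.
ROUTES = {"priority_review": "priority_review", "standard_review": "standard_review"}

cases = [
    ({"thunderstorms": "Observed", "strong_wind": "Observed"}, "priority_review"),
    ({"thunderstorms": "Observed", "strong_wind": "No reports or observations"}, "standard_review"),
    ({"thunderstorms": None, "strong_wind": None}, "standard_review"),
    ({}, "standard_review"),
]

for state, expected in cases:
    label = should_escalate(state)
    assert label == expected, (state, label)
    assert label in ROUTES  # a label missing from the mapping would fail at run time

print("All routing cases passed")
```

The `None` case matters in practice because the weather node returns `None` for both fields when no tool call succeeds.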

## 7. DSPy Integration for Prompt Optimization

In [None]:
import dspy

# Configure DSPy
lm = dspy.LM(model="openai/gpt-4o-mini", max_tokens=1000)
dspy.configure(lm=lm)

# Define DSPy signature for eligibility
class EligibilityClassifier(dspy.Signature):
    """Classify CAT event eligibility based on weather observations."""
    thunderstorms: str = dspy.InputField()
    strong_wind: str = dspy.InputField()
    decision: str = dspy.OutputField(desc="APPROVED, REVIEW, or DENIED")
    reasoning: str = dspy.OutputField(desc="Brief explanation")

classifier = dspy.ChainOfThought(EligibilityClassifier)

# Test the DSPy classifier
result = classifier(
    thunderstorms="Observed",
    strong_wind="No reports or observations"
)
print(f"DSPy result: {result.decision} - {result.reasoning}")

In [None]:
# Create a LangGraph node that uses DSPy
def dspy_eligibility_node(state: ClaimState) -> dict:
    """Eligibility node backed by the DSPy classifier (no optimizer has been run on it yet)."""
    print("\n--- DSPy Eligibility Node ---")
    
    result = classifier(
        thunderstorms=state["thunderstorms"] or "Unknown",
        strong_wind=state["strong_wind"] or "Unknown"
    )
    
    # Map decision to cat_status
    cat_status_map = {
        "APPROVED": "CONFIRMED",
        "REVIEW": "POSSIBLE",
        "DENIED": "NOT_CAT"
    }
    
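    # Normalize the label, since the LM may vary case or add whitespace
    decision = (result.decision or "").strip().upper()

    return {
        "cat_status": cat_status_map.get(decision, "UNKNOWN"),
        "decision": decision,
        "confidence": 0.85,  # Placeholder: this signature has no confidence output field
        "reasoning": result.reasoning
    }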

# Create graph with DSPy node
dspy_workflow = StateGraph(ClaimState)
dspy_workflow.add_node("weather_agent", weather_agent_node)
dspy_workflow.add_node("dspy_eligibility", dspy_eligibility_node)
dspy_workflow.set_entry_point("weather_agent")
dspy_workflow.add_edge("weather_agent", "dspy_eligibility")
dspy_workflow.add_edge("dspy_eligibility", END)

dspy_app = dspy_workflow.compile()

print("DSPy-enhanced graph compiled!")

In [None]:
# Run the DSPy-enhanced graph
print("Running DSPy-enhanced graph...")
print("="*60)

dspy_result = dspy_app.invoke(initial_state)

print("\n" + "="*60)
print("DSPY RESULT")
print("="*60)
print(f"Decision: {dspy_result['decision']}")
print(f"Reasoning: {dspy_result['reasoning']}")
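
The `decision` output field is free text, so the model may return variants such as `approved.` or an unexpected label. One hedged way to guard the downstream mapping is to normalize the value and fall back to manual review when it is not one of the three allowed labels. `normalize_decision` and `VALID_DECISIONS` are hypothetical names, not part of DSPy.

```python
from typing import Optional

VALID_DECISIONS = {"APPROVED", "REVIEW", "DENIED"}


def normalize_decision(raw: Optional[str], default: str = "REVIEW") -> str:
    """Trim whitespace, quotes, and trailing periods, then validate the label."""
    cleaned = (raw or "").strip().strip(".\"'").upper()
    return cleaned if cleaned in VALID_DECISIONS else default


for raw in [" approved. ", '"Denied"', "maybe", None]:
    print(repr(raw), "->", normalize_decision(raw))
```

Defaulting to `REVIEW` is a design choice for this sketch: an unrecognized label is routed to a human rather than silently approved or denied.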

## 8. MLflow Integration

In [None]:
import mlflow

mlflow.set_experiment("LangGraph-CAT-Claims")

def process_with_tracking(city: str, state: str, postcode: str, date: str):
    """Process a claim with MLflow tracking."""
    
    with mlflow.start_run(run_name=f"{city}-{date}"):
        # Log inputs
        mlflow.log_param("city", city)
        mlflow.log_param("state", state)
        mlflow.log_param("postcode", postcode)
        mlflow.log_param("date", date)
        mlflow.log_param("framework", "LangGraph")
        
        # Run the graph
        initial = {
            "city": city, "state": state, "postcode": postcode, "date": date,
            "latitude": None, "longitude": None, "thunderstorms": None,
            "strong_wind": None, "cat_status": None, "decision": None,
            "confidence": None, "reasoning": None
        }
        
        result = app.invoke(initial)
        
        # Log outputs
        mlflow.log_param("thunderstorms", result.get("thunderstorms"))
        mlflow.log_param("strong_wind", result.get("strong_wind"))
        mlflow.log_param("decision", result.get("decision"))
        
        # Check the type rather than truthiness so that 0.0 is still logged
        for key in ("confidence", "latitude", "longitude"):
            value = result.get(key)
            if isinstance(value, (int, float)):
                mlflow.log_metric(key, value)
        
        print(f"Logged: {city} → {result.get('decision')}")
        return result

# Process with tracking
tracked_result = process_with_tracking("Brisbane", "QLD", "4000", "2025-03-07")
print("\nRun 'mlflow ui' to view results")
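
Deciding field by field what is a parameter and what is a metric becomes tedious as the state grows. The sketch below splits a final state into the two groups by type: numbers become metrics, other non-missing values become string parameters. `split_for_logging` is a hypothetical helper and the sample values are made up. The resulting dictionaries are shaped for `mlflow.log_params` and `mlflow.log_metrics`, which are not called here so the snippet runs without MLflow installed.

```python
from numbers import Real
from typing import Dict, Tuple


def split_for_logging(result: dict) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Split a final state into string params and numeric metrics, skipping None."""
    params: Dict[str, str] = {}
    metrics: Dict[str, float] = {}
    for key, value in result.items():
        if value is None:
            continue
        if isinstance(value, Real) and not isinstance(value, bool):
            metrics[key] = float(value)
        else:
            params[key] = str(value)
    return params, metrics


# Made-up final state for illustration.
sample_result = {
    "city": "Brisbane", "decision": "REVIEW", "confidence": 0.0,
    "latitude": -27.47, "reasoning": None,
}
params, metrics = split_for_logging(sample_result)
print(params)
print(metrics)
# Inside an active run: mlflow.log_params(params); mlflow.log_metrics(metrics)
```

A confidence of `0.0` is kept as a metric here, whereas a truthiness check such as `if value:` would drop it.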

## 9. Summary

In this tutorial, you learned:

1. **StateGraph**: Define workflows as explicit graphs
2. **TypedDict**: Schema for state that flows through nodes
3. **Nodes**: Functions that process and update state
4. **Edges**: Transitions between nodes (including conditional)
5. **Tool loops**: Implement within nodes for tool-calling agents
6. **DSPy Integration**: Use DSPy classifiers as nodes
7. **MLflow**: Track experiments across runs

### When to Use LangGraph

- Complex workflows with conditional routing
- Need for checkpointing/resumption
- Visual debugging requirements
- Production systems with multiple steps