```{contents}
```
## **Exception Capture in LangGraph**

Exception capture in LangGraph is the **systematic detection, propagation, and handling of runtime failures inside a graph execution**, allowing workflows to **recover, retry, branch, escalate, or terminate safely** without corrupting state or crashing the system.

---

### **1. Why Exception Capture Matters**

LLM systems interact with unreliable components:

| Failure Source | Example                         |
| -------------- | ------------------------------- |
| LLM            | Timeouts, invalid JSON          |
| Tools          | API failure, malformed response |
| State          | Missing keys, schema violation  |
| Infrastructure | Network errors, resource limits |

Without structured exception handling, workflows become **fragile** and **non-recoverable**.

---

### **2. Failure Model in LangGraph**

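By default, an exception that escapes a node aborts the graph run. The pattern used throughout this guide catches exceptions inside the node and writes them to state, so failures become **first-class control-flow events**.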

Execution becomes:

> **Node → Try → Catch → Decide → Continue / Recover / Terminate**

---

### **3. Basic Exception Capture at Node Level**

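```python
def safe_llm(state):
    # `llm` is assumed to be a model client created elsewhere.
    try:
        response = llm.invoke(state["prompt"])
        # A successful call clears any error left by an earlier attempt.
        return {"result": response, "error": None}
    except Exception as e:
        # Record the failure in state instead of letting it abort the run.
        return {"error": str(e)}
```

State now carries the failure signal, and a later success clears it.

---

### **4. Routing on Exceptions**

```python
def router(state):
    # A missing key and a None value both mean "no error".
    if state.get("error"):
        return "handle_error"
    return "next_step"
```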

```python
builder.add_conditional_edges(
    "safe_llm",
    router,
    {
        "handle_error": "error_handler",
        "next_step": "process_result"
    }
)
```

This transforms exceptions into **explicit execution paths**.
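
The capture-then-route idea can be factored into a reusable wrapper so that individual nodes do not repeat the `try`/`except`. The sketch below is plain Python and does not import LangGraph; the `capture_errors` decorator, the `parse_number` node, and the `error_type` key are illustrative names, not part of any library.

```python
import functools
from typing import Any, Callable, Dict

StateDict = Dict[str, Any]


def capture_errors(node: Callable[[StateDict], StateDict]) -> Callable[[StateDict], StateDict]:
    """Wrap a node so exceptions become a state update instead of propagating."""

    @functools.wraps(node)
    def wrapper(state: StateDict) -> StateDict:
        try:
            update = dict(node(state))
        except Exception as exc:
            # Keep the exception type so a router can branch on it later.
            return {"error": str(exc), "error_type": type(exc).__name__}
        # Success clears any error left over from an earlier attempt.
        update.setdefault("error", None)
        return update

    return wrapper


@capture_errors
def parse_number(state: StateDict) -> StateDict:
    # Hypothetical node: int() raises ValueError on non-numeric input.
    return {"result": int(state["raw"])}


def router(state: StateDict) -> str:
    return "handle_error" if state.get("error") else "next_step"


ok = parse_number({"raw": "42"})
bad = parse_number({"raw": "forty-two"})
```

Here `router(ok)` selects `"next_step"` and `router(bad)` selects `"handle_error"`. Storing the exception type alongside the message is a design choice: it lets later nodes distinguish, say, a timeout from a parsing error without inspecting the message text.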

---

### **5. Dedicated Error Handling Node**

```python
def error_handler(state):
    # `log` stands for whatever logging call the application uses.
    log(state["error"])
    return {"recovered": True}
```

Error nodes can:

* Log
* Alert
* Modify state
* Trigger fallback
* Escalate to human

---

### **6. Retry Pattern**

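```python
def retry_router(state):
    attempts = state.get("attempts", 0)
    # Retry only while an error is present and the budget is not spent.
    if state.get("error") and attempts < 3:
        return "retry"
    return "fail"
```

The retry edge has to pass through `retry_router`, and a node on the loop must increment `attempts` on every failure. An unconditional edge back to `safe_llm` would keep retrying until the graph's recursion limit aborts the run.

```python
# END is imported from langgraph.graph.
builder.add_conditional_edges("error_handler", retry_router, {"retry": "safe_llm", "fail": END})
```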

State drives bounded retries.
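
To see the bounded loop in isolation, the sketch below drives a node and a router by hand, the way a retry edge would. It is plain Python with no LangGraph import; `flaky_call`, `backoff_delay`, `run`, and the constants are illustrative assumptions.

```python
from typing import Any, Dict, Tuple

MAX_ATTEMPTS = 3
BASE_DELAY_S = 0.5  # hypothetical base delay for exponential backoff


def flaky_call(attempt: int) -> str:
    """Stand-in for an unreliable call: fails on the first two attempts."""
    if attempt < 2:
        raise TimeoutError(f"attempt {attempt} timed out")
    return "ok"


def safe_call(state: Dict[str, Any]) -> Dict[str, Any]:
    attempts = state.get("attempts", 0)
    try:
        return {"result": flaky_call(attempts), "error": None}
    except Exception as exc:
        # The node that fails is also the one that counts the attempt.
        return {"error": str(exc), "attempts": attempts + 1}


def retry_router(state: Dict[str, Any]) -> str:
    if not state.get("error"):
        return "done"
    return "retry" if state.get("attempts", 0) < MAX_ATTEMPTS else "fail"


def backoff_delay(attempts: int) -> float:
    """Delay before retry number `attempts`: 0.5s, 1s, 2s, ..."""
    return BASE_DELAY_S * 2 ** (attempts - 1)


def run(state: Dict[str, Any]) -> Tuple[Dict[str, Any], str]:
    """Apply node updates and follow the router until it stops retrying."""
    state = dict(state)
    while True:
        state.update(safe_call(state))
        decision = retry_router(state)
        if decision != "retry":
            return state, decision
        # A real node might sleep for backoff_delay(state["attempts"]) here.


final_state, decision = run({})
```

With these settings the call fails twice and succeeds on the third try, so `decision` is `"done"` and `final_state["attempts"]` is `2`. Incrementing `attempts` in the same update that records the error keeps the counter and the failure signal consistent.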

---

### **7. Fallback Pattern**

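```python
def primary_model(state):
    try:
        raise RuntimeError("Primary failed")  # stand-in for a failing model call
    except Exception as e:
        return {"error": str(e)}

def fallback_model(state):
    # The backup succeeded, so clear the error.
    return {"result": "Recovered with backup", "error": None}
```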

Graph structure:

```
Primary → ErrorHandler → Fallback → Continue
```
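
A compact variant keeps the whole fallback chain inside a single node instead of spreading it over several graph nodes. This is a sketch under that assumption, in plain Python; `MODELS`, `backup_model`, `call_with_fallback`, and the `model_used` and `failures` keys are illustrative names.

```python
from typing import Any, Callable, Dict, List, Tuple


def primary_model(prompt: str) -> str:
    # Stand-in for a model call that is currently failing.
    raise RuntimeError("Primary failed")


def backup_model(prompt: str) -> str:
    # Stand-in for a cheaper or more available backup model.
    return f"Recovered with backup: {prompt}"


# Models are tried in order; the names are illustrative.
MODELS: List[Tuple[str, Callable[[str], str]]] = [
    ("primary", primary_model),
    ("backup", backup_model),
]


def call_with_fallback(state: Dict[str, Any]) -> Dict[str, Any]:
    """Try each model in turn and record which ones failed."""
    failures: List[str] = []
    for name, model in MODELS:
        try:
            result = model(state["prompt"])
        except Exception as exc:
            failures.append(f"{name}: {exc}")
            continue
        return {"result": result, "model_used": name, "failures": failures, "error": None}
    # Every model failed: surface the combined error for the router.
    return {"error": "; ".join(failures), "failures": failures}


update = call_with_fallback({"prompt": "summarize"})
```

The trade-off is visibility: a single node is simpler to wire, while separate primary and fallback nodes make each hop show up in the graph's execution trace.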

---

### **8. Exception Capture with Human-in-the-Loop**

```python
def escalate(state):
    # `notify_human` is a placeholder for a paging, email, or ticketing call.
    notify_human(state["error"])
    return {"awaiting_review": True}
```

Used for **compliance, finance, medical, security** workflows.
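
The escalation round trip can be sketched as three plain functions: one that files the review request, one that merges the reviewer's decision, and a router that waits until the decision arrives. Everything here is an assumption made for illustration (`REVIEW_QUEUE`, `apply_review`, `review_router`, the state keys). Pausing and resuming a real run would normally rely on LangGraph's own interrupt and checkpoint features, which are not shown.

```python
from typing import Any, Dict, List

# Stand-in for a ticketing or paging system; a real one would be external.
REVIEW_QUEUE: List[Dict[str, Any]] = []


def escalate(state: Dict[str, Any]) -> Dict[str, Any]:
    """File a review request and mark the run as waiting."""
    REVIEW_QUEUE.append({"error": state["error"], "prompt": state.get("prompt")})
    return {"awaiting_review": True}


def apply_review(state: Dict[str, Any], decision: str) -> Dict[str, Any]:
    """Merge a reviewer's decision into state once it arrives."""
    if decision not in ("approve", "reject"):
        raise ValueError(f"unknown decision: {decision!r}")
    return {"awaiting_review": False, "review_decision": decision}


def review_router(state: Dict[str, Any]) -> str:
    if state.get("awaiting_review"):
        return "wait"
    return "continue" if state.get("review_decision") == "approve" else "terminate"


state = {"prompt": "wire $10,000", "error": "policy check failed"}
state.update(escalate(state))
pending = review_router(state)   # "wait"
state.update(apply_review(state, "approve"))
resolved = review_router(state)  # "continue"
```

Treating anything other than an explicit approval as `"terminate"` is the conservative default for the regulated workflows mentioned above.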

---

### **9. Checkpointing & Recovery**

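When an exception occurs in a graph compiled with a checkpointer:

* The state from the last completed step is already **checkpointed**
* The graph can be **resumed after the fix**
* Execution history is preserved

Checkpoints are stored per `thread_id`. The `recursion_limit` setting is separate from checkpointing: it caps the number of steps in a single run, which stops a retry loop from running indefinitely.

```python
# Assumes `graph = builder.compile(checkpointer=...)`; "run-1" is an example thread id.
graph.invoke(input, config={"configurable": {"thread_id": "run-1"}, "recursion_limit": 10})
```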

---

### **10. Production-Grade Failure Policies**

| Policy               | Behavior               |
| -------------------- | ---------------------- |
| Retry                | Auto recover           |
| Fallback             | Switch models/tools    |
| Circuit Breaker      | Stop repeated failures |
| Escalation           | Human review           |
| Graceful Degradation | Partial results        |
| Kill-Switch          | Immediate stop         |

---

### **11. Complete Example**

```python
from typing import Optional, TypedDict

from langgraph.graph import StateGraph, END

MAX_ATTEMPTS = 3

class State(TypedDict):
    data: str
    error: Optional[str]
    attempts: int

def risky(state):
    # Fails on the first two attempts, then succeeds.
    if state["attempts"] < 2:
        raise ValueError("Fail")
    # Clear the error so the router stops retrying after a success.
    return {"data": "Success", "error": None}

def safe(state):
    try:
        return risky(state)
    except Exception as e:
        return {"error": str(e), "attempts": state.get("attempts", 0) + 1}

def route(state):
    # Retry while an error is present and the retry budget is not spent.
    if state.get("error") and state.get("attempts", 0) < MAX_ATTEMPTS:
        return "safe"
    return END

builder = StateGraph(State)
builder.add_node("safe", safe)
builder.set_entry_point("safe")
builder.add_conditional_edges("safe", route, {"safe": "safe", END: END})

graph = builder.compile()
print(graph.invoke({"attempts": 0}))
```

---

### **12. Mental Model**

> **Exceptions become state.
> State drives control flow.
> Control flow guarantees system stability.**

This is the foundation of **robust, autonomous, production-grade LLM systems**.


### **13. Demonstration**

```python
from typing import TypedDict, Optional
from langgraph.graph import StateGraph, END

# ---------------- State ----------------

class State(TypedDict):
    attempts: int
    result: Optional[str]
    error: Optional[str]

MAX_RETRIES = 2

# ---------------- Risky Operation ----------------

def risky_node(state: State):
    if state["attempts"] < MAX_RETRIES:
        raise ValueError("Simulated failure")
    return {"result": "Task succeeded", "error": None}

# ---------------- Safe Wrapper ----------------

def safe_node(state: State):
    try:
        return risky_node(state)
    except Exception as e:
        return {
            "error": str(e),
            "attempts": state["attempts"] + 1,
            "result": None
        }

# ---------------- Router ----------------

def router(state: State):
    if state["error"] is not None and state["attempts"] <= MAX_RETRIES:
        return "retry"
    return END

# ---------------- Graph ----------------

builder = StateGraph(State)
builder.add_node("safe", safe_node)
builder.set_entry_point("safe")

builder.add_conditional_edges(
    "safe",
    router,
    {
        "retry": "safe",
        END: END
    }
)

graph = builder.compile()

# ---------------- Execution ----------------

output = graph.invoke({"attempts": 0, "result": None, "error": None})
print(output)
```

Output:

```
{'attempts': 2, 'result': 'Task succeeded', 'error': None}
```
