```{contents}
```
## **Error Node in LangGraph**

An **Error Node** in LangGraph is an ordinary node that, by convention, is dedicated to **detecting, capturing, routing, and recovering from failures** during graph execution. LangGraph has no built-in error-node type: the pattern is assembled from regular nodes, shared state, and conditional edges.
It lets a graph behave like a **fault-tolerant system** rather than a fragile pipeline.

---

### **1. Motivation**

LLM workflows interact with:

* External APIs
* Tools
* Databases
* Models with stochastic behavior

Failures are inevitable:

* Network timeouts
* Invalid tool outputs
* Model hallucinations
* Data corruption
* Resource exhaustion

The **Error Node** pattern makes failures **explicit, inspectable, and recoverable**.

---

### **2. Conceptual Model**

```
Normal Flow
   |
   v
[ Node A ] ---- error ----> [ Error Node ]
   |                          |
   v                          v
[ Node B ]              Recovery / Retry / Abort
```

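An **Error Node** runs **only when an upstream node reports a failure**. LangGraph does not route on raised exceptions, so the upstream node catches the exception and records it in state, and a conditional edge then sends execution to the Error Node.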

---

### **3. Core Responsibilities**

| Responsibility   | Description                          |
| ---------------- | ------------------------------------ |
| Error Capture    | Intercept runtime exceptions         |
| State Enrichment | Attach error details to state        |
| Routing          | Decide next recovery path            |
| Recovery         | Apply fix, retry, fallback, or abort |
| Observability    | Log and checkpoint failure           |

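The first two responsibilities, capture and state enrichment, can be sketched without any LangGraph API. The snippet below is a minimal illustration under stated assumptions: `capture_errors` is a hypothetical decorator (not part of LangGraph) that wraps a node function, catches its exceptions, and returns them as a state update so a downstream router can see them. The `error` key follows the state schema used in this document.

```python
import functools
from typing import Any, Callable, Dict

StateUpdate = Dict[str, Any]


def capture_errors(node: Callable[[dict], StateUpdate]) -> Callable[[dict], StateUpdate]:
    """Hypothetical helper: turn exceptions raised by `node` into state updates."""

    @functools.wraps(node)
    def wrapper(state: dict) -> StateUpdate:
        try:
            update = dict(node(state))
            update.setdefault("error", None)  # clear any earlier failure on success
            return update
        except Exception as exc:
            # Record the exception type and message instead of propagating it
            return {"error": f"{type(exc).__name__}: {exc}"}

    return wrapper


@capture_errors
def shout(state: dict) -> StateUpdate:
    if state["input"] == "fail":
        raise ValueError("Simulated failure")
    return {"output": state["input"].upper()}


ok = shout({"input": "hello"})   # success: output set, error cleared
bad = shout({"input": "fail"})   # failure: only the error field is updated
```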
---

### **4. Error Node Structure**

Error nodes receive the **same shared state**, augmented with error metadata:

```python
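from typing import Optional, TypedDict

class State(TypedDict):
    input: str
    output: Optional[str]   # set by a successful node
    error: Optional[str]    # set when a node fails, None otherwise
    retries: int            # number of recovery attempts so far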
```

---

### **5. Implementing an Error Node**

```python
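def risky_node(state):
    # Record failures in state; an uncaught exception would abort the run.
    try:
        if state["input"] == "fail":
            raise ValueError("Simulated failure")
        return {"output": state["input"].upper(), "error": None}
    except ValueError as exc:
        return {"error": str(exc)}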
```

```python
def error_handler(state):
    # Count the failed attempt; any recorded `error` message stays in state.
    return {"retries": state.get("retries", 0) + 1}
```

```python
from langgraph.graph import StateGraph
builder = StateGraph(State)
builder.add_node("risky", risky_node)
builder.add_node("error", error_handler)
```

---

### **6. Routing Errors**

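LangGraph does not route raised exceptions along an edge: `add_edge` takes no `on_error` argument, and an uncaught exception aborts the run and propagates to the caller. Instead, the risky node records its failure in state (here, in `error`) and a conditional edge picks the next node:

```python
# Go to "error" if the node recorded a failure, otherwise to "success".
builder.add_conditional_edges(
    "risky", lambda state: "error" if state.get("error") else "success"
)
```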

---

### **7. Controlled Recovery with Conditions**

```python
from langgraph.graph import END

def route_after_error(state):
    if state["retries"] < 3:
        return "risky"
    return END
```

```python
builder.add_conditional_edges("error", route_after_error, {
    "risky": "risky",
    END: END
})
```

This creates a **bounded retry loop**: `risky` runs at most three times before the graph ends.

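To see why the loop is bounded, the routing logic can be traced in plain Python. The sketch below does not use LangGraph: `run_with_retries` is a hypothetical driver that mimics the risky, error, router cycle for a node that always fails, and the local `END` constant is only a stand-in for `langgraph.graph.END`.

```python
END = "__end__"  # stand-in for langgraph.graph.END in this standalone sketch


def always_failing_node(state: dict) -> dict:
    # Worst case for the loop: the node never succeeds
    return {"error": "Simulated failure"}


def error_handler(state: dict) -> dict:
    # Count the failed attempt
    return {"retries": state.get("retries", 0) + 1}


def route_after_error(state: dict) -> str:
    return "risky" if state["retries"] < 3 else END


def run_with_retries(state: dict) -> tuple:
    """Hypothetical driver: repeat risky -> error -> router until END."""
    attempts = 0
    while True:
        attempts += 1
        state = {**state, **always_failing_node(state)}  # merge node update
        state = {**state, **error_handler(state)}        # merge handler update
        if route_after_error(state) == END:
            return state, attempts


final_state, attempts = run_with_retries({"input": "x", "retries": 0})
print(attempts, final_state["retries"])
```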
---

### **8. Common Error Handling Patterns**

| Pattern          | Purpose                     |
| ---------------- | --------------------------- |
| Retry            | Transient failures          |
| Fallback         | Use alternative model/tool  |
| Degrade          | Skip optional steps         |
| Abort            | Stop safely                 |
| Human Escalation | Manual intervention         |
| Circuit Breaker  | Disable unstable components |

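Of these patterns, the circuit breaker combined with a fallback is the least obvious to express. The following is a minimal sketch, assuming a simple consecutive-failure threshold. The `CircuitBreaker` class, its `threshold` parameter, and the `call` method are illustrative names rather than LangGraph features, and the timed half-open state that a production breaker would need is omitted.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Illustrative breaker: opens after `threshold` consecutive failures."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.threshold

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open:
            return fallback()  # skip the unstable component entirely
        try:
            result = primary()
        except Exception:
            self.failures += 1
            return fallback()
        self.failures = 0  # a success resets the consecutive-failure count
        return result


calls = {"primary": 0}


def unstable_tool() -> str:
    calls["primary"] += 1
    raise TimeoutError("upstream timed out")


def cached_answer() -> str:
    return "cached"


breaker = CircuitBreaker(threshold=2)
# Four requests: the first two try the tool, the last two skip it
results = [breaker.call(unstable_tool, cached_answer) for _ in range(4)]
```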
---

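### **9. Production Guarantees Enabled**

Most of these capabilities depend on compiling the graph with a checkpointer; the Error Node itself contributes the bounded retries and the failure details recorded in state.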

| Capability            | Benefit                 |
| --------------------- | ----------------------- |
| Checkpoint on failure | Safe resume             |
| State diff logging    | Postmortem analysis     |
| Replay                | Deterministic debugging |
| Bounded retries       | Prevent infinite loops  |
| Audit trails          | Compliance              |

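State diff logging, for instance, needs little more than a comparison of the state before and after a node runs. The sketch below shows one way to do it. `diff_state` is a hypothetical helper, not a LangGraph function, and it treats a missing key the same as `None`.

```python
from typing import Any, Dict, Tuple


def diff_state(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Hypothetical helper: map each changed key to its (old, new) values."""
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)  # absent keys read as None
        if old != new:
            changes[key] = (old, new)
    return changes


before = {"input": "fail", "output": None, "error": None, "retries": 0}
after = {**before, "error": "ValueError: Simulated failure", "retries": 1}

changes = diff_state(before, after)
for key, (old, new) in changes.items():
    print(f"{key}: {old!r} -> {new!r}")
```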
---

### **10. Why Error Nodes Matter**

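Without Error Nodes:

* An uncaught exception ends the run
* Failure details never reach the graph state
* There is no recovery path

With Error Nodes:

* Transient failures are retried within a fixed bound
* Failures are recorded in state and can be inspected
* Unrecoverable failures end the run in a controlled way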

---

### **11. Mental Model**

An Error Node turns a graph from a **script** into a **resilient control system**:

> **Execution = Computation + Monitoring + Recovery**


---

### **12. Demonstration**

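```python
import random
from typing import Optional, TypedDict

from langgraph.graph import StateGraph, END


class State(TypedDict):
    input: str
    output: Optional[str]
    error: Optional[str]
    retries: int


def risky_node(state: State):
    # Simulate a transient failure 60% of the time. The exception is caught
    # and recorded in state, because an uncaught exception aborts the run.
    try:
        if random.random() < 0.6:
            raise RuntimeError("Transient failure occurred")
        return {"output": state["input"].upper(), "error": None}
    except RuntimeError as exc:
        return {"error": str(exc)}


def error_node(state: State):
    # Count the failed attempt; the error message stays in state.
    return {"retries": state.get("retries", 0) + 1}


def success_node(state: State):
    print("SUCCESS:", state["output"])
    return {}
```

```python
def route_after_risky(state: State):
    # Take the error path only if risky_node recorded a failure.
    return "error" if state.get("error") else "success"


builder = StateGraph(State)

builder.add_node("risky", risky_node)
builder.add_node("error", error_node)
builder.add_node("success", success_node)

builder.set_entry_point("risky")
builder.add_conditional_edges(
    "risky",
    route_after_risky,
    {"success": "success", "error": "error"},
)
builder.add_edge("success", END)
```

```python
def route_after_error(state: State):
    if state["retries"] < 3:
        return "risky"
    return END

builder.add_conditional_edges(
    "error",
    route_after_error,
    {"risky": "risky", END: END}
)
```

```python
graph = builder.compile()
result = graph.invoke({
    "input": "langgraph",
    "output": None,
    "error": None,
    "retries": 0
})

print("\nFinal State:", result)
```

Because `risky_node` fails at random, the output differs between runs. A run in which one of the attempts succeeds prints the following line before the final state:

```
SUCCESS: LANGGRAPH
```

If all three attempts fail, the graph ends without printing `SUCCESS`, and the final state carries the last error message with `retries` equal to 3.

If `risky_node` raises instead of recording the failure, or if `risky` is connected to both `success` and `error` with plain `add_edge` calls, the exception is not routed to the error node and propagates out of `invoke`:

```
RuntimeError: Transient failure occurred
```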