```{contents}
```
## **Rollback in LangGraph**

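**Rollback** in LangGraph is the ability to **return a graph execution (running, failed, or completed) to a previously checkpointed state** and resume execution from that point. It is not a single API call. It is a pattern built on LangGraph's persistence layer (checkpointers) and on what the LangGraph documentation calls *time travel*: replaying or forking a thread from an earlier checkpoint.
It is a core reliability mechanism for building **fault-tolerant, human-supervised, production-grade LLM systems**.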

---

### **1. Why Rollback Is Necessary**

LLM workflows are **non-deterministic** and operate over long, multi-step processes.
Failures occur due to:

* model hallucinations
* tool failures
* invalid state updates
* policy violations
* human corrections

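Without rollback, a failure late in a run means restarting from the beginning and repeating every earlier LLM and tool call.
With rollback, the system becomes **recoverable and controllable**: a run can be resumed from its last good state or rewound to an earlier one.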

---

### **2. Conceptual Model**

LangGraph treats execution as a sequence of **state transitions**:

```
S₀ → S₁ → S₂ → S₃ → S₄ → ...
```

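Each `Sᵢ` is a **checkpointed state snapshot**. LangGraph saves one at the end of every super-step (one round of node executions).

**Rollback = return to a previous Sᵢ and continue execution from there.** The later snapshots are not deleted; the new steps form a branch off `Sᵢ`.

```
S₀ → S₁ → S₂ → S₃ → S₄        original history (kept)
          ↓
          S₃′ → S₄′           new branch after rolling back to S₂
```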
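To make the model concrete, here is a toy, in-memory snapshot log in plain Python. It is not LangGraph's implementation: `Snapshot`, `ToyCheckpointLog`, and their methods are hypothetical. It only illustrates two properties: every transition is saved as a snapshot, and rolling back to `Sᵢ` starts a new branch while the original history stays in the log.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Snapshot:
    checkpoint_id: int
    parent_id: Optional[int]
    values: dict


@dataclass
class ToyCheckpointLog:
    """Hypothetical append-only snapshot log; not LangGraph's implementation."""

    snapshots: list = field(default_factory=list)
    head: Optional[int] = None  # snapshot that execution continues from

    def save(self, values: dict) -> int:
        # Append a copy of the state whose parent is the current head
        snap = Snapshot(len(self.snapshots), self.head, deepcopy(values))
        self.snapshots.append(snap)
        self.head = snap.checkpoint_id
        return snap.checkpoint_id

    def current(self) -> dict:
        return deepcopy(self.snapshots[self.head].values)

    def rollback(self, checkpoint_id: int) -> None:
        # Only the head moves; later snapshots stay in the log
        self.head = checkpoint_id


log = ToyCheckpointLog()
state = {"value": 0}
for _ in range(5):                 # save S0..S4, incrementing between saves
    log.save(state)
    state = {"value": state["value"] + 1}

log.rollback(2)                    # rewind to S2
state = log.current()
state["value"] += 100              # take a different path from S2
branch_id = log.save(state)        # S3', whose parent is S2
```

After the rollback the log holds six snapshots: the original S₀ to S₄ plus S₃′, whose parent is S₂. LangGraph snapshots carry the same kind of link in their `parent_config` field.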

---

### **3. Components That Enable Rollback**

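| Component        | Purpose                                      | In LangGraph                                                        |
| ---------------- | -------------------------------------------- | ------------------------------------------------------------------- |
| Checkpoint store | Persists a snapshot after every super-step   | A checkpointer, e.g. `InMemorySaver`, `SqliteSaver`, `PostgresSaver` |
| State serializer | Converts state to and from a storable format | `JsonPlusSerializer` (the default)                                  |
| Thread ID        | Identifies one execution history             | `config["configurable"]["thread_id"]`                               |
| Versioned state  | Keeps every snapshot addressable             | `checkpoint_id`, `graph.get_state_history()`                        |
| Execution engine | Resumes or replays from a chosen snapshot    | The Pregel runtime behind `invoke` / `stream`                       |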

---

### **4. How Rollback Works Internally**

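1. Every node returns a **partial state update** (a dict containing only the keys it changes)
2. The updates from one super-step are merged into the current state, using a key's **reducer** if it declares one and overwriting the key otherwise
3. The merged state is **checkpointed** at the end of the super-step
4. If a node raises an exception:

   * the failing super-step's updates are not merged, so the latest checkpoint is still the last successful one
   * writes from nodes that succeeded in that same super-step are saved as *pending writes*, so they are not re-run
   * the exception propagates to the caller; invoking the same `thread_id` again with `None` as input resumes from that checkpoint, starting with the nodes listed in the snapshot's `next`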
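The commit rule in the steps above can be sketched in plain Python. This is a simplified model, not LangGraph's code: `run_supersteps`, `merge`, `REDUCERS`, and the list-of-lists layout of super-steps are hypothetical. Each super-step runs its nodes against the same state, merges their partial updates (with a reducer where one is declared, overwrite otherwise), and only then appends a checkpoint. Unlike LangGraph, this toy simply discards the writes of nodes that succeeded in a failing super-step instead of keeping them as pending writes.

```python
import operator
from copy import deepcopy
from typing import Any, Callable

State = dict[str, Any]
Node = Callable[[State], State]

# Hypothetical reducer table: "log" is merged by list concatenation,
# every other key is overwritten by the newest value.
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {"log": operator.add}


def merge(state: State, update: State) -> State:
    merged = dict(state)
    for key, new in update.items():
        reducer = REDUCERS.get(key)
        merged[key] = reducer(merged[key], new) if reducer and key in merged else new
    return merged


def run_supersteps(state: State, supersteps: list[list[Node]], checkpoints: list[State]) -> State:
    checkpoints.append(deepcopy(state))           # initial snapshot
    for nodes in supersteps:
        # Nodes in one super-step all read the same state. If any raises,
        # the exception escapes here, before anything is merged or saved.
        updates = [node(state) for node in nodes]
        for update in updates:
            state = merge(state, update)
        checkpoints.append(deepcopy(state))       # commit the super-step
    return state


def step_ok(state: State) -> State:
    return {"value": state["value"] + 1, "log": ["ok"]}


def step_fail(state: State) -> State:
    raise RuntimeError("tool call failed")


checkpoints: list[State] = []
try:
    run_supersteps({"value": 0, "log": []}, [[step_ok], [step_ok], [step_fail]], checkpoints)
except RuntimeError:
    pass

resume_from = checkpoints[-1]   # state after the last successful super-step
```

The failing third super-step leaves no checkpoint, so `resume_from` is the state after the second one, with both `"ok"` entries accumulated by the reducer.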

---

### **5. Enabling Checkpointing**

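```python
import sqlite3

# SqliteSaver ships in a separate package: pip install langgraph-checkpoint-sqlite
from langgraph.checkpoint.sqlite import SqliteSaver

# SqliteSaver wraps an open sqlite3 connection rather than a file path
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)

graph = builder.compile(checkpointer=checkpointer)
```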

---

### **6. Triggering Rollback**

A run stops at its last good checkpoint, ready to be resumed or rewound, when:

* a node raises an exception
* a human rejects an intermediate result
* a policy condition fails
* a retry budget is exceeded

Example of a controlled failure:

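```python
def risky_node(state):
    # Reject out-of-range state; the update below is never returned,
    # so it never reaches a checkpoint
    if state["value"] > 3:
        raise ValueError("Invalid state")
    return {"value": state["value"] + 1}
```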

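LangGraph does not actively roll anything back: the exception propagates to the caller, and because the failing step's update is never committed, the thread's latest checkpoint is still the state from before `risky_node` ran.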

---

### **7. Manual Rollback & Resume**

Using **thread_id**:

```python
config = {"configurable": {"thread_id": "session-42"}}
graph.invoke({"value": 0}, config=config)
```

If the run crashes, resume later:

```python
graph.invoke(None, config=config)
```

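Passing `None` as the input tells LangGraph to continue the thread from its latest checkpoint instead of starting a new run. After a process crash this works only with a persistent checkpointer such as `SqliteSaver`; `InMemorySaver` loses its checkpoints when the process exits.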

---

### **8. Human-in-the-Loop Rollback**

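```python
def review_node(state):
    # "approved" is written by a human reviewer (for example with
    # graph.update_state) before this node runs
    if not state["approved"]:
        # Stop the run; this step's update is never committed
        raise ValueError("Rejected by human")
    return {}  # approval changes nothing else in the state
```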

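A rejection stops the run at the checkpoint taken before `review_node`, so the rejected step adds no update to the state. The reviewer can then correct the state and resume the same thread. LangGraph also supports pausing *before* the review, with `interrupt_before=[...]` at compile time or the `interrupt()` function inside a node, which avoids treating a rejection as an error.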

---

### **9. Production Use Cases**

| Use Case           | Why Rollback Matters       |
| ------------------ | -------------------------- |
| Autonomous agents  | Recover from bad plans     |
| Compliance systems | Undo unsafe decisions      |
| Long pipelines     | Avoid full restarts        |
| Human review       | Correct mistakes           |
| Multi-agent debate | Rewind incorrect reasoning |

---

### **10. Safety Guarantees**

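| Guarantee      | Meaning                                                                                                   |
| -------------- | --------------------------------------------------------------------------------------------------------- |
| Consistency    | A failed step's update is never committed, so the stored state is never half-applied                     |
| Durability     | Checkpoints survive process crashes when the checkpointer is persistent (SQLite, Postgres); `InMemorySaver` does not |
| Recoverability | A thread can be resumed from its latest checkpoint or from any earlier one                               |
| Auditability   | Every checkpoint in a thread can be listed with `get_state_history()`                                    |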

---

### **11. Comparison With Traditional Systems**

| Traditional pipeline | LangGraph with a checkpointer          |
| -------------------- | -------------------------------------- |
| No rollback          | Resume or rewind to any checkpoint     |
| Crash = lost work    | Crash = resume from last checkpoint    |
| Stateless            | Stateful                               |
| Hard to debug        | Every step's state can be inspected    |

---

### **12. Mental Model**

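> **LangGraph execution behaves like a database transaction log.**
> Each step is logged, and any logged state can be reloaded and replayed. Rolling back adds a new branch; it does not erase what came after.

This is what allows **safe autonomy**: an agent can act on its own because any bad step can be inspected and the run resumed from a state before it.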


### Demonstration

```python
from typing import TypedDict

class State(TypedDict):
    value: int


def increment(state: State):
    return {"value": state["value"] + 1}

def risky(state: State):
    # Deliberately fail once the counter reaches 3
    if state["value"] == 3:
        raise RuntimeError("Simulated failure")
    return state
```

```python
from langgraph.graph import StateGraph, START, END

builder = StateGraph(State)

builder.add_node("inc", increment)
builder.add_node("risk", risky)

builder.add_edge(START, "inc")
builder.add_edge("inc", "risk")
# Loop back to "inc" until the counter reaches 5, then finish
builder.add_conditional_edges("risk", lambda state: END if state["value"] >= 5 else "inc")
```

```python
from langgraph.checkpoint.memory import InMemorySaver

checkpointer = InMemorySaver()
graph = builder.compile(checkpointer=checkpointer)
```


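```python
config = {"configurable": {"thread_id": "session-1"}}

try:
    graph.invoke({"value": 0}, config=config)
except RuntimeError as err:
    # The failed "risk" step was not committed; the thread's latest
    # checkpoint still holds the state written by the last "inc" step
    print(f"Execution failed: {err}")
```

```
Execution failed: Simulated failure
```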
