```{contents}
```
## Rollback Strategy

A **rollback strategy** in LangGraph is a **controlled mechanism for reverting a workflow’s execution state to a previously known safe point** when errors, inconsistencies, unsafe outputs, or policy violations occur.
It is a foundational requirement for **fault tolerance, reliability, and production-grade safety** in stateful LLM systems.

---

### **1. Motivation: Why Rollback is Necessary**

LLM workflows are:

* **Non-deterministic**
* **Long-running**
* **Multi-step**
* **Tool-integrated**
* **Stateful**

Failures can occur at any stage:

| Failure Source      | Example                         |
| ------------------- | ------------------------------- |
| Model hallucination | Incorrect financial computation |
| Tool failure        | API outage                      |
| Data corruption     | Invalid state mutation          |
| Policy violation    | Unsafe content                  |
| Human rejection     | Reviewer disapproves            |

Without rollback, the entire execution must restart.
With rollback, the system **recovers from the last correct state**.

---

### **2. Core Concepts of Rollback in LangGraph**

| Concept        | Role                               |
| -------------- | ---------------------------------- |
| Checkpoint     | Snapshot of state at safe boundary |
| State Version  | Ordered history of state           |
| Recovery Point | Where execution can resume         |
| Replay         | Re-execution from checkpoint       |
| Compensation   | Undo side-effects of failed steps  |

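LangGraph has no dedicated rollback call. Rollback is assembled from its **checkpointer**: with a checkpointer attached, each super-step is saved as a checkpoint under a `thread_id`, `get_state_history` lists those checkpoints, and a run can be resumed or forked from any of them. Compensation for external side effects is not automatic and must be modeled as explicit nodes (see Section 7).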
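
The vocabulary in the table can be made concrete with a small in-memory model. The sketch below is framework-independent and is not LangGraph's implementation; `CheckpointStore` and its methods are hypothetical names chosen for illustration. Each `save` appends a deep-copied snapshot (a checkpoint) to an ordered list (the state versions), and `fork_from` selects an earlier version as the recovery point and appends a copy of it as the newest checkpoint.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CheckpointStore:
    """Hypothetical in-memory store: thread_id -> ordered list of snapshots."""
    threads: dict[str, list[dict[str, Any]]] = field(default_factory=dict)

    def save(self, thread_id: str, state: dict[str, Any]) -> int:
        # Deep-copy so later mutations of `state` cannot corrupt the snapshot.
        history = self.threads.setdefault(thread_id, [])
        history.append(copy.deepcopy(state))
        return len(history) - 1  # version number of the new checkpoint

    def latest(self, thread_id: str) -> dict[str, Any]:
        return copy.deepcopy(self.threads[thread_id][-1])

    def history(self, thread_id: str) -> list[dict[str, Any]]:
        return [copy.deepcopy(s) for s in self.threads[thread_id]]

    def fork_from(self, thread_id: str, version: int,
                  patch: dict[str, Any] | None = None) -> int:
        # Roll back by appending a copy of an older version (optionally patched)
        # as the newest checkpoint; later versions stay in the audit trail.
        restored = copy.deepcopy(self.threads[thread_id][version])
        restored.update(patch or {})
        return self.save(thread_id, restored)


store = CheckpointStore()
store.save("order-42", {"value": 0})    # version 0
store.save("order-42", {"value": 1})    # version 1
store.save("order-42", {"value": 99})   # version 2: a bad update
store.fork_from("order-42", version=1)  # version 3: copy of version 1
print(store.latest("order-42"))         # {'value': 1}
print(len(store.history("order-42")))   # 4
```

Appending instead of truncating keeps the bad version inspectable, which is what makes the history useful for auditing.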

---

### **3. Where Rollback Fits in Execution Lifecycle**

```
Invoke → Execute Node A → Checkpoint → Execute Node B → Checkpoint
        → Execute Node C → ❌ Failure
                          ↓
                  Rollback to Checkpoint after Node B
                          ↓
                     Resume Execution
```
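
The lifecycle above can be sketched as a small linear runner. This is a framework-independent illustration under simplifying assumptions (a fixed node order, one node per step, a plain list standing in for the checkpointer); `run` and its arguments are hypothetical, not LangGraph APIs. Each checkpoint records the merged state and the index of the next node, so a retry skips nodes that already succeeded.

```python
from __future__ import annotations

import copy
from typing import Any, Callable

Node = Callable[[dict[str, Any]], dict[str, Any]]  # state -> partial update


def run(nodes: list[tuple[str, Node]], checkpoints: list[dict[str, Any]],
        initial: dict[str, Any] | None = None) -> dict[str, Any]:
    """Run nodes in order, checkpointing after each; resume if checkpoints exist."""
    if not checkpoints:
        # Fresh run: checkpoint the input before any node executes.
        checkpoints.append({"state": copy.deepcopy(initial or {}), "next": 0})
    cp = checkpoints[-1]  # last known-good recovery point
    state, start = copy.deepcopy(cp["state"]), cp["next"]
    for i in range(start, len(nodes)):
        _name, fn = nodes[i]
        update = fn(state)  # may raise; then nothing is saved for this step
        state = {**state, **update}
        checkpoints.append({"state": copy.deepcopy(state), "next": i + 1})
    return state


calls: list[str] = []
tool_down = True


def node_a(s):
    calls.append("A")
    return {"value": s["value"] + 1}


def node_b(s):
    calls.append("B")
    return {"value": s["value"] + 1}


def node_c(s):
    calls.append("C")
    if tool_down:
        raise RuntimeError("API outage")
    return {"done": True}


nodes = [("A", node_a), ("B", node_b), ("C", node_c)]
checkpoints: list[dict[str, Any]] = []
try:
    run(nodes, checkpoints, {"value": 0})
except RuntimeError as exc:
    print("failed at C:", exc)

tool_down = False                 # the outage is fixed
print(run(nodes, checkpoints))    # {'value': 2, 'done': True}
print(calls)                      # ['A', 'B', 'C', 'C']
```

Only `C` runs twice: the retry starts from the checkpoint saved after `B`, which is the recovery point in the diagram.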

---

### **4. Implementing Rollback with Checkpoints**

#### **Step 1: Enable Persistence**

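```python
import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver  # pip install langgraph-checkpoint-sqlite

# SqliteSaver wraps an open sqlite3 connection rather than a file path
conn = sqlite3.connect("state.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)
graph = builder.compile(checkpointer=checkpointer)  # builder: your StateGraph
```

This activates **automatic checkpointing**: after every super-step, LangGraph saves a snapshot of the graph state under the `thread_id` supplied in the run config. For tests and notebooks, `MemorySaver` from `langgraph.checkpoint.memory` is an in-memory alternative.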

---

#### **Step 2: Define Recovery Logic**

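```python
def validate(state):
    # `is_valid` is a placeholder for your own check (schema, policy, business rule)
    if not is_valid(state["result"]):
        raise ValueError("Invalid output")
    return {}  # no state update when the output is valid
```

```python
builder.add_node("validate", validate)
```

An exception raised inside a node aborts the run and propagates out of `invoke`. LangGraph does not commit that step's writes, so the thread's latest checkpoint is still the one saved before `validate` ran; that checkpoint is the effective rollback point.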

---

#### **Step 3: Resume from Failure**

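```python
config = {"configurable": {"thread_id": "order-42"}}
graph.invoke(None, config=config)  # None = continue from the last checkpoint
```

Passing `None` as the input tells LangGraph to load the latest checkpoint for that `thread_id` and re-execute the step that failed; nodes that already completed are not run again. Passing a new input dict instead starts a fresh run on the same thread from the entry point.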

---

### **5. Manual Rollback Control**

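You can explicitly revert execution by selecting an earlier checkpoint from the thread's history:

```python
config = {"configurable": {"thread_id": "order-42"}}

# StateSnapshot objects for this thread, newest first
history = list(graph.get_state_history(config))
target = history[1]  # e.g. the checkpoint just before the latest one
print(target.values, target.next)
```

Then resume from it. `target.config` carries that checkpoint's `checkpoint_id`, so passing `None` as input re-executes from there:

```python
graph.invoke(None, config=target.config)
```

To edit the state before resuming, `graph.update_state(target.config, {...})` writes a new checkpoint forked from `target` and returns its config; invoke with `None` and that config. Later checkpoints are not deleted, so the full history stays available for audit.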

---

### **6. Rollback + Human-in-the-Loop**

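```python
def approval(state):
    # Raising aborts the run, so nothing from this node is committed and the
    # thread stays at the checkpoint saved before `approval` ran
    if not state["approved"]:
        raise ValueError("Rejected by human")
    return {}
```

Rejection → rollback → human correction → resume: the reviewer's correction is applied with `graph.update_state(config, {...})`, and `graph.invoke(None, config)` then resumes from the corrected checkpoint. Alternatively, compiling with `interrupt_before=["approval"]` pauses the run before that node so a human can review the state without an exception being raised.

This enables **safe governance loops**: no step downstream of `approval` runs until a human signs off.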
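
A framework-independent sketch of that loop, assuming the draft step, the approval decision, and the human correction are supplied as plain callables (`approval_loop`, `draft`, `approve`, and `correct` are hypothetical stand-ins for graph nodes and a reviewer UI). The rejected draft is never merged; each retry starts from the pre-approval snapshot with only the reviewer's patch applied.

```python
from __future__ import annotations

import copy
from typing import Any, Callable

State = dict[str, Any]


def approval_loop(checkpoint: State,
                  draft: Callable[[State], State],
                  approve: Callable[[State], bool],
                  correct: Callable[[State, State], State],
                  max_rounds: int = 3) -> State:
    """Produce a draft, ask for approval, and roll back to `checkpoint` on rejection."""
    base = copy.deepcopy(checkpoint)
    for _ in range(max_rounds):
        candidate = {**base, **draft(base)}
        if approve(candidate):
            return candidate  # commit: this becomes the new checkpoint
        # Rejected: discard `candidate`, keep `base`, apply the human's patch.
        base = {**base, **correct(base, candidate)}
    raise RuntimeError(f"not approved after {max_rounds} rounds")


result = approval_loop(
    {"amount": 1000, "discount_pct": 0},
    draft=lambda s: {"total": s["amount"] * (100 - s["discount_pct"]) // 100},
    approve=lambda s: s["total"] <= 900,
    correct=lambda base, rejected: {"discount_pct": 10},
)
print(result)  # {'amount': 1000, 'discount_pct': 10, 'total': 900}
```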

---

### **7. Compensation Pattern (Side-Effect Safety)**

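Restoring a checkpoint reverts only the graph's own state; it cannot undo side effects that already reached external systems. Irreversible actions therefore need an explicit compensating action: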

| Step           | Safety Mechanism     |
| -------------- | -------------------- |
| Payment charge | Reversal transaction |
| Email sent     | Apology / retraction |
| DB update      | Compensating write   |

In the graph, compensation is an explicit branch taken when validation fails:

```
Action → Validate → Commit
         ↓ fail
       Compensate → Rollback
```
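
This compensation pattern is commonly called a saga: each irreversible action is paired with an undo action, and on failure the completed actions are undone in reverse order. A minimal framework-independent sketch; `Step`, `run_saga`, and the in-memory `ledger` are hypothetical and stand in for real payment or email APIs.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # performs the side effect
    compensate: Callable[[], None]  # undoes it (e.g. a reversal transaction)


def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done: list[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception as exc:
            print(f"{step.name} failed: {exc}; compensating")
            for prev in reversed(done):
                prev.compensate()
            return False
        done.append(step)
    return True


ledger: list[str] = []


def fail_commit() -> None:
    raise RuntimeError("validation failed")


ok = run_saga([
    Step("charge", lambda: ledger.append("charge 50"), lambda: ledger.append("refund 50")),
    Step("email", lambda: ledger.append("send receipt"), lambda: ledger.append("send retraction")),
    Step("commit", fail_commit, lambda: None),
])
print(ok, ledger)
# False ['charge 50', 'send receipt', 'send retraction', 'refund 50']
```

The failed step itself is not compensated because it never completed, and compensations are recorded as new entries rather than deleting the originals, matching the "compensating write" row in the table.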

---

### **8. Production Guarantees Provided**

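| Property        | How Rollback Supports It                                              |
| --------------- | --------------------------------------------------------------------- |
| Consistency     | Writes from a failed step are never committed to a checkpoint         |
| Fault tolerance | Failed runs resume from the last checkpoint instead of restarting     |
| Auditability    | Per-thread checkpoint history can be listed and inspected             |
| Safety          | Bad state can be reverted; external side effects need compensation    |
| Resilience      | Long-running workflows survive restarts when checkpoints are persisted |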

---

### **9. Typical Enterprise Use Cases**

* Financial workflows
* Autonomous agents
* Compliance pipelines
* Multi-agent systems
* Human-reviewed automation
* Mission-critical LLM services

---

### **10. Mental Model**

LangGraph rollback behaves like a **transaction system for LLM workflows**:

> **Try → Validate → Commit**
>
> **Fail → Revert → Fix → Retry**
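
The analogy can be made literal with a small snapshot-and-restore context manager: copy the state on entry, keep the changes if the block completes, and restore the copy if it raises. This is a generic Python sketch of transactional semantics (`transaction` is a hypothetical helper), not a LangGraph API.

```python
import copy
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def transaction(state: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Commit on success; revert `state` in place on any exception."""
    snapshot = copy.deepcopy(state)  # the checkpoint
    try:
        yield state                  # Try: the caller mutates state
    except Exception:
        state.clear()
        state.update(snapshot)       # Revert
        raise                        # let the caller Fix and Retry


state = {"balance": 100, "log": []}
try:
    with transaction(state) as s:
        s["balance"] -= 150
        s["log"].append("withdraw 150")
        if s["balance"] < 0:         # Validate
            raise ValueError("overdraft")
except ValueError as e:
    print("reverted:", e, state)     # reverted: overdraft {'balance': 100, 'log': []}

with transaction(state) as s:        # Retry with a fixed amount
    s["balance"] -= 50
    s["log"].append("withdraw 50")
print(state)                         # {'balance': 50, 'log': ['withdraw 50']}
```

As with restoring a checkpoint, this reverts only in-process state; side effects performed inside the block still need compensation.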


### Demonstration

```python
import sqlite3
from typing import TypedDict

from langgraph.checkpoint.sqlite import SqliteSaver  # pip install langgraph-checkpoint-sqlite
from langgraph.graph import StateGraph, END

# -------------------------------
# 1. Define Shared State
# -------------------------------
class State(TypedDict):
    value: int
    approved: bool

# -------------------------------
# 2. Define Nodes
# -------------------------------
def step_a(state: State):
    print("Step A")
    return {"value": state["value"] + 1}

def step_b(state: State):
    print("Step B")
    return {"value": state["value"] + 1}

def validate(state: State):
    print("Validating...")
    if state["value"] == 2:
        # Aborts the run; this step's writes are never checkpointed, so the
        # thread's latest checkpoint stays at the state saved after B
        raise ValueError("Validation failed!")
    return {}

# -------------------------------
# 3. Build Graph: A -> B -> V -> END
# -------------------------------
builder = StateGraph(State)
builder.add_node("A", step_a)
builder.add_node("B", step_b)
builder.add_node("V", validate)

builder.set_entry_point("A")
builder.add_edge("A", "B")
builder.add_edge("B", "V")
builder.add_edge("V", END)

# -------------------------------
# 4. Enable Checkpointing
# -------------------------------
checkpointer = SqliteSaver(sqlite3.connect("rollback_demo.db", check_same_thread=False))
graph = builder.compile(checkpointer=checkpointer)
config = {"configurable": {"thread_id": "job1"}}

# -------------------------------
# 5. First Run (Will Fail)
# -------------------------------
print("\n--- First Run ---")
try:
    graph.invoke({"value": 0, "approved": True}, config=config)
except Exception as e:
    print("Failure:", e)

# -------------------------------
# 6. Re-run the Same Thread with Corrected Input
# -------------------------------
# A new input dict starts again from the entry point (A); it does not
# resume the failed run at V
print("\n--- Resume After Fix ---")
graph.invoke({"value": 5, "approved": True}, config=config)
```



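```
--- First Run ---
Step A
Step B
Validating...
Failure: Validation failed!

--- Resume After Fix ---
Step A
Step B
Validating...
```

```
{'value': 7, 'approved': True}
```

The trace shows what actually happened. The exception in `validate` left thread `job1` at the checkpoint saved after `B` (`value == 2`); nothing from `V` was committed. The second call passes a new input, so LangGraph starts again from the entry point: `A` and `B` re-run on `value == 5`, producing `7`. To resume the failed run instead of restarting it, patch the saved state and invoke with `None`, e.g. `graph.update_state(config, {"value": 5}, as_node="B")` followed by `graph.invoke(None, config)` with `config = {"configurable": {"thread_id": "job1"}}`; execution then continues at `V` without re-running `A` or `B`.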