```{contents}
```
## Replay Mode

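**Replay Mode** in LangGraph (the LangGraph docs call it *time travel*) is a **checkpoint-based re-execution mechanism** that lets you **reproduce, inspect, and debug past graph runs** by restoring execution from saved **checkpoints** (state snapshots). Steps up to the chosen checkpoint are restored from storage rather than run again; nodes after it execute again, so a replay is only as deterministic as those nodes.
It is a core tool for **debugging, auditing, evaluation, compliance, and system reliability** in production LLM systems.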

---

### **1. Motivation and Intuition**

LLM workflows are:

* **Non-deterministic** (model randomness, external tools)
* **Stateful** (long-lived conversations, agents)
* **Distributed** (multi-step pipelines)

Replay Mode solves:

| Problem                 | Solution                |
| ----------------------- | ----------------------- |
| Non-reproducible bugs   | Deterministic replay    |
| Hidden failures         | Step-by-step inspection |
| Production incidents    | Post-mortem analysis    |
| Compliance requirements | Verifiable execution    |

**Core idea:**

> Capture *every important execution event* →
> Restore the exact state →
> Re-run from any point.

---

### **2. Conceptual Architecture**

```
Execution Run
   │
   ├─ State Snapshots
   ├─ Node Inputs / Outputs
   ├─ Tool Calls
   ├─ LLM Responses
   └─ Metadata
        ↓
Checkpoint Store
        ↓
Replay Engine
        ↓
Re-Execution from a Checkpoint
```
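The same pipeline can be sketched in plain Python to make the moving parts concrete. This is a toy model, not LangGraph's implementation: `ToyCheckpointStore`, `run`, and `replay` are hypothetical names, and a linear list of nodes stands in for a graph. The store keeps an append-only log per thread, and replay restores a saved checkpoint and runs forward, adding new checkpoints that branch off the restored one instead of overwriting history.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Node = Callable[[dict], dict]  # takes the state, returns a partial update


@dataclass(frozen=True)
class Checkpoint:
    checkpoint_id: int
    parent_id: Optional[int]  # previous checkpoint on the same branch
    values: dict              # full state after the step
    next_index: int           # index of the next node to run (len(nodes) when done)


class ToyCheckpointStore:
    """Append-only, per-thread checkpoint log (illustrative, not LangGraph's storage)."""

    def __init__(self) -> None:
        self._threads: Dict[str, List[Checkpoint]] = {}
        self._ids = itertools.count()

    def put(self, thread_id: str, parent_id: Optional[int], values: dict, next_index: int) -> Checkpoint:
        cp = Checkpoint(next(self._ids), parent_id, dict(values), next_index)
        self._threads.setdefault(thread_id, []).append(cp)
        return cp

    def get(self, thread_id: str, checkpoint_id: int) -> Checkpoint:
        return next(cp for cp in self._threads[thread_id] if cp.checkpoint_id == checkpoint_id)

    def history(self, thread_id: str) -> List[Checkpoint]:
        return list(reversed(self._threads.get(thread_id, [])))  # newest first


def _run_from(nodes: List[Node], store: ToyCheckpointStore, thread_id: str,
              values: dict, index: int, parent_id: int) -> dict:
    # Execute nodes[index:], saving a checkpoint after each one
    for i in range(index, len(nodes)):
        values = {**values, **nodes[i](values)}
        parent_id = store.put(thread_id, parent_id, values, i + 1).checkpoint_id
    return values


def run(nodes: List[Node], store: ToyCheckpointStore, thread_id: str, values: dict) -> dict:
    first = store.put(thread_id, None, values, 0)  # checkpoint holding the input
    return _run_from(nodes, store, thread_id, dict(values), 0, first.checkpoint_id)


def replay(nodes: List[Node], store: ToyCheckpointStore, thread_id: str, checkpoint_id: int) -> dict:
    # Restore the saved state and re-execute only the nodes after it; the new
    # checkpoints hang off the restored one, forming a fork instead of overwriting
    cp = store.get(thread_id, checkpoint_id)
    return _run_from(nodes, store, thread_id, dict(cp.values), cp.next_index, cp.checkpoint_id)


nodes: List[Node] = [lambda s: {"x": s["x"] + 1}, lambda s: {"x": s["x"] * 10}]
store = ToyCheckpointStore()
final = run(nodes, store, "t1", {"x": 1})              # {'x': 20}
mid = store.history("t1")[1]                            # saved after the first node
again = replay(nodes, store, "t1", mid.checkpoint_id)   # {'x': 20}, only node 2 re-ran
```

Because `replay` appends rather than overwrites, the original run stays inspectable alongside the forked one.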

---

### **3. What Is Captured During Execution**

| Artifact             | Purpose             |
| -------------------- | ------------------- |
| State snapshot       | Full graph state    |
| Node transitions     | Execution order     |
| LLM inputs / outputs | Reproduce reasoning |
| Tool invocations     | Side-effect trace   |
| Errors / exceptions  | Debugging           |
| Timestamps           | Timing analysis     |
| Metadata             | Audit, metrics      |

---

### **4. Enabling Replay in LangGraph**

Replay relies on **checkpointing**.

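```python
import sqlite3

# SqliteSaver ships in the separate langgraph-checkpoint-sqlite package
from langgraph.checkpoint.sqlite import SqliteSaver

# SqliteSaver wraps an existing sqlite3 connection (not a file path)
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)

# `builder` is a StateGraph whose nodes and edges are already defined
graph = builder.compile(checkpointer=checkpointer)
```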

Every run on a thread is now **persisted**: LangGraph writes a checkpoint of the graph state at every super-step.

---

### **5. Running a Graph Normally**

```python
# input_state: a dict matching the graph's state schema
result = graph.invoke(
    input_state,
    config={"configurable": {"thread_id": "session-42"}}
)
```

This creates a **versioned execution history** for `session-42`: a chain of checkpoints, one per super-step, each pointing to its parent.

---

### **6. Replaying an Execution**

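```python
config = {"configurable": {"thread_id": "session-42"}}

# get_state_history yields StateSnapshot objects, newest checkpoint first
history = list(graph.get_state_history(config))
to_replay = history[1]  # e.g. the checkpoint taken just before the final step

# None input + a config that carries a checkpoint_id replays from that checkpoint
replayed = graph.invoke(None, config=to_replay.config)
```

The config needs no `replay` flag: passing a `checkpoint_id` is what triggers replay. The engine:

1. Loads the checkpoint identified by `checkpoint_id`
2. Restores the state saved in it
3. Replays the steps before that checkpoint from storage (without re-running them) and re-executes every node after it, recording the new steps as a fork of the thread's history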

---

### **7. Partial Replay (Resume From Step N)**

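```python
config = {"configurable": {"thread_id": "session-42"}}

# Checkpoint IDs are opaque strings generated by LangGraph, not "step_N";
# look up the checkpoint saved at super-step 7 through its metadata
step_7 = next(
    snap for snap in graph.get_state_history(config)
    if snap.metadata["step"] == 7
)

replayed = graph.invoke(
    None,
    config={
        "configurable": {
            "thread_id": "session-42",
            "checkpoint_id": step_7.config["configurable"]["checkpoint_id"],
        }
    }
)
```

This restores the state saved at **exactly that checkpoint** and re-executes everything after it; the earlier steps are not run again.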

---

### **8. Practical Use Cases**

| Use Case     | Benefit                   |
| ------------ | ------------------------- |
| Debugging    | Reproduce production bugs |
| Evaluation   | Compare model changes     |
| Compliance   | Full audit trail          |
| Human review | Inspect decisions         |
| Recovery     | Resume after crash        |

---

### **9. Replay vs Normal Execution**

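| Feature       | Normal            | Replay from a checkpoint                                      |
| ------------- | ----------------- | ------------------------------------------------------------- |
| LLM calls     | Live              | Not re-run before the checkpoint; live after it               |
| Tools         | Live              | Not re-run before the checkpoint; live after it               |
| State         | Fresh             | Restored from the checkpoint                                  |
| Execution     | Non-deterministic | Reproduced up to the checkpoint; as in a normal run after it  |
| Debuggability | Limited           | Full                                                          |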
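LangGraph does not record LLM or tool responses and serve them back on replay: nodes after the replay checkpoint run live, so their outputs can differ from the original run. If a test or audit needs recorded outputs there, one option is a cassette-style wrapper around each nondeterministic call, a common testing technique rather than anything LangGraph provides. A minimal sketch; `Cassette`, `fake_llm`, and the key scheme are all hypothetical.

```python
import itertools
from typing import Any, Callable, Dict, Optional


class Cassette:
    """Hypothetical record/replay wrapper (not a LangGraph API).

    In "record" mode it calls the wrapped function and stores the result under a key;
    in "replay" mode it returns the stored result without calling anything.
    """

    def __init__(self, mode: str = "record", recordings: Optional[Dict[str, Any]] = None):
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.mode = mode
        self.recordings: Dict[str, Any] = dict(recordings or {})

    def call(self, key: str, fn: Callable[[], Any]) -> Any:
        if self.mode == "replay":
            if key not in self.recordings:
                raise KeyError(f"no recording for {key!r}")
            return self.recordings[key]
        result = fn()
        self.recordings[key] = result
        return result


_answers = itertools.count()


def fake_llm() -> str:
    """Stand-in for a nondeterministic model call: a new answer on every call."""
    return f"answer-{next(_answers)}"


# The key scheme (thread / step / call name) is an assumption for this sketch
key = "session-42/step-3/llm"
recorder = Cassette("record")
recorded = recorder.call(key, fake_llm)  # 'answer-0'

player = Cassette("replay", recorder.recordings)
replayed = player.call(key, fake_llm)    # 'answer-0', fake_llm is not called
live = fake_llm()                        # 'answer-1': a live call differs
```

In practice the recordings would be persisted alongside the checkpoints and keyed so that each call inside a node maps to exactly one entry.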

---

### **10. Production Best Practices**

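* Always enable checkpointing in production
* Store checkpoints in a durable database (e.g. Postgres via the `langgraph-checkpoint-postgres` package) rather than in memory
* Use one thread ID per user or session
* Retain checkpoint history for compliance-critical systems, since replay is only possible from checkpoints that still exist
* Make side-effecting nodes idempotent (or log their effects), because nodes after the replay checkpoint run again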

---

### **11. Mental Model**

Replay Mode turns LangGraph from a **workflow engine** into a **time machine** for your AI system:

> **Any thread → Any checkpoint → Restored and re-runnable**




### Demonstration

```python
# ---------------------- LangGraph Replay Mode : Complete Demo ----------------------
# Requires: pip install langgraph langgraph-checkpoint-sqlite

import sqlite3
from typing import TypedDict

from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import END, StateGraph

# ---------------------- 1. Define State ----------------------

class State(TypedDict):
    counter: int
    done: bool  # written by "check"; must be declared so the router can read it

# ---------------------- 2. Define Nodes ----------------------

def increment(state: State):
    return {"counter": state["counter"] + 1}

def stop_check(state: State):
    return {"done": state["counter"] >= 3}

# ---------------------- 3. Build Graph ----------------------

builder = StateGraph(State)

builder.add_node("inc", increment)
builder.add_node("check", stop_check)

builder.set_entry_point("inc")
builder.add_edge("inc", "check")

builder.add_conditional_edges(
    "check",
    lambda s: END if s["done"] else "inc",
    {"inc": "inc", END: END}
)

# ---------------------- 4. Enable Checkpointing ----------------------

conn = sqlite3.connect("replay.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)
graph = builder.compile(checkpointer=checkpointer)

# thread_id goes under the "configurable" key
config = {"configurable": {"thread_id": "demo-session"}}

# ---------------------- 5. Normal Execution ----------------------

print("NORMAL RUN:")
out1 = graph.invoke({"counter": 0}, config=config)
print(out1)

# ---------------------- 6. Replay From an Earlier Checkpoint ----------------------

# History is newest first: pick the checkpoint saved right after the
# first "inc" step (counter == 1, "check" still pending)
target = next(
    snap for snap in graph.get_state_history(config)
    if snap.next == ("check",) and snap.values.get("counter") == 1
)

print("\nREPLAY FROM counter == 1:")
# None input + the snapshot's config (which carries its checkpoint_id) replays:
# steps before the checkpoint are restored, later nodes run again
out2 = graph.invoke(None, config=target.config)
print(out2)

# ---------------------- 7. Inspect the Latest Checkpoint ----------------------

print("\nLATEST STATE:")
latest = graph.get_state(config)
print(latest.values, latest.next)  # next == () means the run has finished
```

```
NORMAL RUN:
{'counter': 3, 'done': True}

REPLAY FROM counter == 1:
{'counter': 3, 'done': True}

LATEST STATE:
{'counter': 3, 'done': True} ()
```
