```{contents}
```
## **Replay in LangGraph**

**Replay** in LangGraph is the capability to **re-execute a previously run workflow from a saved checkpoint**, using the **exact historical state and control flow**, without recomputing earlier steps.
It enables **debugging, auditing, failure recovery, experimentation, and compliance** for production-grade LLM systems.

---

### **1. Why Replay Exists**

LLM workflows are **long, expensive, and stateful**.
If a failure occurs after 15 steps, recomputing from step 1 wastes time and tokens, and it can repeat side effects such as tool calls that have already run.

Replay allows you to:

| Purpose          | Benefit                        |
| ---------------- | ------------------------------ |
| Debugging        | Inspect exact execution        |
| Failure recovery | Resume without restarting      |
| Auditing         | Reconstruct decisions          |
| Experimentation  | Test new logic from same state |
| Compliance       | Reproduce past behavior        |

---

### **2. Conceptual Model**

LangGraph treats execution as a **timeline of immutable state snapshots**.

```
State₀ → State₁ → State₂ → State₃ → ... → Stateₙ
           ↑
         Replay from here
```

Each snapshot is a **checkpoint**.
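As a rough illustration of this model in plain Python (not LangGraph's internals), the timeline can be pictured as a tuple of frozen snapshots. The names `Snapshot`, `step`, and `run` are hypothetical and exist only for this sketch. Continuing from an earlier snapshot produces a new branch and leaves the original timeline untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Snapshot:
    """One immutable point on the timeline (hypothetical, for illustration)."""
    index: int
    count: int


def step(snapshot: Snapshot) -> Snapshot:
    # A "node": derive the next snapshot instead of mutating the current one.
    return replace(snapshot, index=snapshot.index + 1, count=snapshot.count + 1)


def run(start: Snapshot, steps: int) -> tuple[Snapshot, ...]:
    # Build the timeline by applying `step` repeatedly, keeping every snapshot.
    timeline = [start]
    for _ in range(steps):
        timeline.append(step(timeline[-1]))
    return tuple(timeline)


original = run(Snapshot(index=0, count=0), steps=3)

# "Replay from here": continue from the second snapshot as a separate branch.
branch = run(original[1], steps=2)

print([s.count for s in original])  # [0, 1, 2, 3]
print([s.count for s in branch])    # [1, 2, 3]
```

Because the dataclass is frozen, assigning to `original[1].count` raises an error, which is the property that keeps the original timeline safe while a branch is explored.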

---

### **3. How Replay Works Internally**

Replay is built on three components:

| Component        | Role                           |
| ---------------- | ------------------------------ |
| Checkpoint Store | Saves state after each node    |
| Thread ID        | Identifies workflow instance   |
| Execution Engine | Restores and resumes execution |

At runtime:

1. Node executes
2. State updated
3. Checkpoint persisted
4. Execution pointer advanced
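The four runtime steps above can be sketched in plain Python. This is a simplified model under stated assumptions, not LangGraph's actual engine: `CheckpointStore` and `execute` are hypothetical names, nodes run in a fixed sequence, and checkpoint IDs are simple integers.

```python
import itertools
from typing import Callable

State = dict
Node = Callable[[State], State]


class CheckpointStore:
    """Hypothetical in-memory store: thread ID -> list of saved checkpoints."""

    def __init__(self) -> None:
        self._threads: dict[str, list[tuple[int, int, State]]] = {}
        self._ids = itertools.count(1)

    def put(self, thread_id: str, pointer: int, state: State) -> int:
        checkpoint_id = next(self._ids)
        # Save a shallow copy so later key updates do not alter the snapshot.
        self._threads.setdefault(thread_id, []).append(
            (checkpoint_id, pointer, dict(state))
        )
        return checkpoint_id

    def get(self, thread_id: str, checkpoint_id: int) -> tuple[int, State]:
        for saved_id, pointer, state in self._threads[thread_id]:
            if saved_id == checkpoint_id:
                return pointer, dict(state)
        raise KeyError(checkpoint_id)


def execute(nodes: list[Node], store: CheckpointStore, thread_id: str,
            state: State, pointer: int = 0) -> State:
    while pointer < len(nodes):
        update = nodes[pointer](state)            # 1. node executes
        state = {**state, **update}               # 2. state updated
        store.put(thread_id, pointer + 1, state)  # 3. checkpoint persisted
        pointer += 1                              # 4. execution pointer advanced
    return state


nodes: list[Node] = [
    lambda s: {"text": s["text"].strip()},
    lambda s: {"words": s["text"].split()},
    lambda s: {"n_words": len(s["words"])},
]
store = CheckpointStore()
final = execute(nodes, store, "case-42", {"text": "  analyze the report  "})

# Resume from the checkpoint written after the first node (ID 1).
pointer, saved = store.get("case-42", 1)
resumed = execute(nodes, store, "case-42", saved, pointer)
```

Here `resumed` equals `final`, but only the last two nodes ran during the second call, because the saved pointer tells the loop where to pick up.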

---

### **4. Enabling Replay (with Checkpointing)**

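```python
from langgraph.checkpoint.memory import MemorySaver

# `builder` is a StateGraph whose nodes and edges have already been added.
# MemorySaver keeps checkpoints in process memory only, so they are lost
# when the process exits.
checkpointer = MemorySaver()

graph = builder.compile(checkpointer=checkpointer)
```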

With a checkpointer attached, the graph saves a **checkpoint** after every execution step, as long as each run is given a thread ID.

---

### **5. Running a Graph with a Thread ID**

```python
result = graph.invoke(
    {"input": "Analyze report"},
    # The documented place for the thread ID is under the "configurable" key.
    config={"configurable": {"thread_id": "case-42"}},
)
```

This assigns the execution to a **persistent thread**.

---

### **6. Inspecting Execution History**

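```python
config = {"configurable": {"thread_id": "case-42"}}

# list() takes a config dict rather than a bare thread ID
# and yields checkpoint tuples, newest first.
for checkpoint_tuple in checkpointer.list(config):
    print(checkpoint_tuple.checkpoint)
```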

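Each checkpoint record contains:

* `id`: the unique checkpoint ID
* `ts`: the creation timestamp
* `channel_values`: the state values at that point
* `channel_versions` and `versions_seen`: version bookkeeping for the state channels and for what each node has already processed
* `v`: the checkpoint format version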

---

### **7. Replaying from a Checkpoint**

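```python
graph.invoke(
    None,  # no new input: continue from the saved state
    config={
        "configurable": {
            "thread_id": "case-42",
            "checkpoint_id": checkpoint_id,  # ID of the checkpoint to resume from
        }
    },
)
```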

LangGraph restores the saved state and continues execution **from that exact point**. Steps recorded before the checkpoint are not run again; the steps after it are re-executed.

---

### **8. Practical Use Cases**

| Scenario           | Use                                    |
| ------------------ | -------------------------------------- |
| Model bug          | Replay execution and inspect reasoning |
| System crash       | Resume from last checkpoint            |
| Human correction   | Modify state and continue              |
| What-if testing    | Change logic from mid-run              |
| Audit & compliance | Reconstruct decisions                  |

---

### **9. Production Safety Controls**

| Control                 | Purpose                     |
| ----------------------- | --------------------------- |
| State immutability      | Prevents corruption         |
| Versioned checkpoints   | Supports schema changes     |
| Deterministic execution | Guarantees identical replay |
| Human approval          | Safe resumption             |
| Audit logging           | Compliance traceability     |
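To make the "versioned checkpoints" row concrete, here is one possible approach, sketched in plain Python. Everything in it is an assumption for illustration: the `schema_version` field, the migration functions, and the field renames are hypothetical, and this field is separate from the internal format version that LangGraph stores in its own checkpoints.

```python
from typing import Callable


def v1_to_v2(state: dict) -> dict:
    # Hypothetical change: version 2 renamed "input" to "query".
    upgraded = {k: v for k, v in state.items() if k != "input"}
    upgraded["query"] = state["input"]
    upgraded["schema_version"] = 2
    return upgraded


def v2_to_v3(state: dict) -> dict:
    # Hypothetical change: version 3 added a retry counter defaulting to 0.
    return {**state, "retries": 0, "schema_version": 3}


MIGRATIONS: dict[int, Callable[[dict], dict]] = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3


def upgrade(state: dict) -> dict:
    # Apply one migration at a time until the state matches the current schema.
    while state["schema_version"] < CURRENT_VERSION:
        state = MIGRATIONS[state["schema_version"]](state)
    return state


old_checkpoint = {"schema_version": 1, "input": "Analyze report"}
print(upgrade(old_checkpoint))
```

Each migration returns a new dict, so the stored checkpoint itself is never modified, which keeps this approach compatible with the state immutability control in the same table.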

---

### **10. Mental Model**

Replay turns LangGraph into a **time machine for AI systems**:

> You can stop, inspect, rewind, modify, and continue any execution.

---

### **11. Comparison: Replay vs Rerun**

| Feature             | Replay                          | Rerun          |
| ------------------- | ------------------------------- | -------------- |
| Reuses past state   | Yes                             | No             |
| Reuses tool outputs | Yes                             | No             |
| Cost                | Low                             | High           |
| Determinism         | Guaranteed up to the checkpoint | Not guaranteed |
| Debugging value     | High                            | Low            |
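The cost difference can be seen in a small plain-Python sketch. It assumes a five-step workflow in which every step makes one expensive call; `expensive_step` is a hypothetical stand-in for an LLM or tool call, and `saved_states` plays the role of the checkpoint history.

```python
stats = {"calls": 0}


def expensive_step(value: int) -> int:
    # Stand-in for an LLM or tool call; counts how often it runs.
    stats["calls"] += 1
    return value + 1


def run_steps(value: int, n_steps: int) -> int:
    for _ in range(n_steps):
        value = expensive_step(value)
    return value


TOTAL_STEPS = 5

# Original run: keep the state after every step, as a checkpointer would.
saved_states = [0]
for _ in range(TOTAL_STEPS):
    saved_states.append(expensive_step(saved_states[-1]))

# Rerun: start again from the initial input and repeat every step.
stats["calls"] = 0
rerun_result = run_steps(0, TOTAL_STEPS)
rerun_calls = stats["calls"]

# Replay: start from the state saved after step 3 and run only the rest.
stats["calls"] = 0
replay_result = run_steps(saved_states[3], TOTAL_STEPS - 3)
replay_calls = stats["calls"]

print(rerun_calls, replay_calls)  # 5 2
```

Both paths reach the same final value, but the replay pays only for the steps after the chosen checkpoint.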

---

### **12. When Replay Is Essential**

Replay is required in:

* Autonomous agents
* Multi-step planning
* Long-running workflows
* Regulated systems (finance, healthcare, legal)
* High-cost LLM pipelines

---

### **13. Demonstration**


```python
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver


class State(TypedDict):
    count: int


def increment(state: State) -> State:
    print(f"Incrementing: {state['count']} -> {state['count'] + 1}")
    return {"count": state["count"] + 1}


def stop_if_done(state: State):
    if state["count"] >= 3:
        return END
    return "increment"
```

```python
builder = StateGraph(State)

builder.add_node("increment", increment)
builder.set_entry_point("increment")

builder.add_conditional_edges(
    "increment",
    stop_if_done,
    {
        "increment": "increment",
        END: END
    }
)
```

```
<langgraph.graph.state.StateGraph at 0x2191e89ba10>
```

```python
checkpointer = MemorySaver()
graph = builder.compile(checkpointer=checkpointer)
```

```python
graph.invoke(
    {"count": 0},
    config={"configurable": {"thread_id": "demo"}}
)
```

```
Incrementing: 0 -> 1
Incrementing: 1 -> 2
Incrementing: 2 -> 3

{'count': 3}
```

```python
checkpoints = list(
    checkpointer.list(
        {"configurable": {"thread_id": "demo"}}
    )
)

for c in checkpoints:
    print(c.checkpoint)
```


{'v': 4, 'ts': '2025-12-28T09:11:56.433050+00:00', 'id': '1f0e3cd4-20b6-6a07-8003-a86733ef495f', 'channel_versions': {'__start__': '00000000000000000000000000000002.0.08965396517704971', 'count': '00000000000000000000000000000005.0.7207157692347761', 'branch:to:increment': '00000000000000000000000000000005.0.7207157692347761'}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': '00000000000000000000000000000001.0.8163405563058226'}, 'increment': {'branch:to:increment': '00000000000000000000000000000004.0.15693087536000916'}}, 'updated_channels': ['count'], 'channel_values': {'count': 3}}
{'v': 4, 'ts': '2025-12-28T09:11:56.431611+00:00', 'id': '1f0e3cd4-20b3-61d0-8002-73072e04fa44', 'channel_versions': {'__start__': '00000000000000000000000000000002.0.08965396517704971', 'count': '00000000000000000000000000000004.0.15693087536000916', 'branch:to:increment': '00000000000000000000000000000004.0.15693087536000916'}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 