```{contents}
```
## **Checkpoint Store in LangGraph**

A **Checkpoint Store** (a *checkpointer* in LangGraph's API) is a **persistent execution memory system** that saves the full internal state of a graph after every super-step (each round of node execution), enabling **recovery, replay, inspection, debugging, and long-running autonomous workflows**.

---

### **1. Motivation**

LLM workflows are:

* long-running
* non-deterministic
* error-prone
* stateful
* often human-supervised

Without checkpoints, a crash or interruption causes **complete loss of progress**: the run must start over from the beginning.
Checkpointing makes graph execution **fault-tolerant and resumable**.

---

### **2. Conceptual Model**

```
Execution Step → State Snapshot → Checkpoint Store
                              ↓
                         Recovery / Replay
```

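Each checkpoint stores:

* **Thread ID** and a unique **checkpoint ID**
* **Execution step** (the super-step counter)
* **Full state object** (every state key's current value)
* **Graph position** (the nodes scheduled to run next)
* **Timestamp**
* **Metadata** (e.g. whether the checkpoint came from input, a normal step, or a manual update)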

---

### **3. Architecture**

```
LangGraph Runtime
   |
Checkpoint Interface
   |
-------------------------------------
| Memory | Redis | Postgres | S3 |
-------------------------------------
```

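Every backend implements the same `BaseCheckpointSaver` interface, so the store is **pluggable** and **backend-agnostic**: the graph code does not change when the backend does.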

---

### **4. Checkpoint Life Cycle**

| Stage   | Description                                        |
| ------- | -------------------------------------------------- |
| Create  | After each super-step (round of node execution)    |
| Persist | Serialized to the store                            |
| Load    | Retrieved on resume                                |
| Replay  | Re-execute from a snapshot                         |
| Inspect | Debugging / monitoring                             |
| Expire  | Clean up old checkpoints (backend or app policy)   |
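The stages can be illustrated with a toy, dependency-free store. This is a conceptual sketch only: `ToyCheckpointStore`, its methods, and the keep-last-N expiry policy are hypothetical and are not LangGraph's API (real LangGraph backends implement `BaseCheckpointSaver`):

```python
import json
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyCheckpointStore:
    """Hypothetical, conceptual store (NOT LangGraph's API).

    Keeps a list of JSON-serialized snapshots per thread, oldest first.
    """

    max_per_thread: int = 3
    _records: dict = field(default_factory=dict)

    def persist(self, thread_id: str, step: int, state: dict) -> None:
        # Create + Persist: serialize a snapshot of the state after a node runs.
        record = {"step": step, "ts": time.time(), "state": json.dumps(state)}
        self._records.setdefault(thread_id, []).append(record)
        self._expire(thread_id)

    def load(self, thread_id: str, step: Optional[int] = None) -> dict:
        # Load: the latest snapshot by default, or a specific step for replay.
        records = self._records[thread_id]
        if step is None:
            return json.loads(records[-1]["state"])
        for record in records:
            if record["step"] == step:
                return json.loads(record["state"])
        raise KeyError(f"no checkpoint for step {step} (expired or never saved)")

    def inspect(self, thread_id: str) -> list:
        # Inspect: which steps are still stored for this thread.
        return [record["step"] for record in self._records.get(thread_id, [])]

    def _expire(self, thread_id: str) -> None:
        # Expire: keep only the newest `max_per_thread` snapshots.
        del self._records[thread_id][: -self.max_per_thread]


def work(state: dict) -> dict:
    # Stand-in for a graph node.
    return {"count": state["count"] + 1}


store = ToyCheckpointStore(max_per_thread=3)
state = {"count": 0}
for step in range(5):
    state = work(state)
    store.persist("t1", step, state)

print(store.inspect("t1"))  # [2, 3, 4]  (steps 0 and 1 expired)
print(store.load("t1"))     # {'count': 5}

# Replay: re-execute the node from the snapshot saved at step 2.
print(work(store.load("t1", step=2)))  # {'count': 4}
```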

---

### **5. Why Checkpoints Are Fundamental**

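| Problem           | Checkpoint Solution                                  |
| ----------------- | ---------------------------------------------------- |
| Process crash     | Resume from the last checkpoint                      |
| LLM hallucination | Rewind to an earlier step and re-run                 |
| Human correction  | Edit or inject state before resuming                 |
| Compliance        | Full audit trail of every step                       |
| Scalability       | Any worker can resume a thread from shared storage   |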

---

### **6. Basic Usage Example**

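```python
import sqlite3

# SqliteSaver ships separately: pip install langgraph-checkpoint-sqlite
from langgraph.checkpoint.sqlite import SqliteSaver

# SqliteSaver.from_conn_string() is a context manager (use it with `with`);
# here we pass an open connection instead so the saver outlives this block.
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)

# `builder` is assumed to be a StateGraph whose state has an "input" key.
graph = builder.compile(checkpointer=checkpointer)

# Each run is keyed by a thread_id, passed under "configurable".
result = graph.invoke({"input": "Analyze sales data"},
                      config={"configurable": {"thread_id": "run-001"}})
```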

---

### **7. Recovery & Resume**

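```python
# Resume thread "run-001"; None means "no new input, continue from the checkpoint".
graph.invoke(
    None,
    config={"configurable": {"thread_id": "run-001"}}
)
```

Passing `None` as the input does not start a new run: LangGraph loads the **latest checkpoint** for that `thread_id` and continues with the nodes that were still pending.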

---

### **8. Human-in-the-Loop with Checkpoints**

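```python
from langgraph.types import interrupt

def review_node(state):
    # Pauses the graph here; the string is surfaced to the caller.
    # On resume, the node re-runs and interrupt() returns the human's input.
    feedback = interrupt("Approve output?")
    return {"approved": feedback}
```

Calling `interrupt()` pauses execution and saves the current state to the checkpointer, so a checkpointer is required.
The caller then resumes the same thread with `Command(resume=value)`: the interrupted node re-runs from its start, and this time `interrupt()` returns `value`.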

---

### **9. Production Patterns**

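| Pattern             | Usage                                        |
| ------------------- | -------------------------------------------- |
| Disaster recovery   | Resume after a crash                         |
| Safe agents         | Roll graph state back to before a bad step   |
| Debugging           | Inspect internal state                       |
| Compliance          | Step-by-step execution history               |
| Multi-day workflows | Persist across sessions                      |

Rolling back restores graph state only; it does not undo external side effects (such as an API call a node already made).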

---

### **10. Checkpoint vs Memory**

| Feature     | Checkpoint          | Memory            |
| ----------- | ------------------- | ----------------- |
| Purpose     | Execution recovery  | Knowledge storage |
| Granularity | Full state snapshot | Selective facts   |
| Lifetime    | Short to medium     | Long term         |
| Usage       | Runtime reliability | Reasoning quality |

---

### **11. Backends**

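| Backend   | Package                                       | Use Case           |
| --------- | --------------------------------------------- | ------------------ |
| In-memory | `langgraph-checkpoint` (`InMemorySaver`)      | Tests, notebooks   |
| SQLite    | `langgraph-checkpoint-sqlite`                 | Local dev          |
| Redis     | `langgraph-checkpoint-redis`                  | High speed         |
| Postgres  | `langgraph-checkpoint-postgres`               | Production         |
| S3 / Blob | Custom or third-party `BaseCheckpointSaver`   | Large scale        |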
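For the Postgres row, a hedged sketch assuming `langgraph-checkpoint-postgres` is installed and a database is reachable; `run_with_postgres` is a hypothetical helper and the connection string is a placeholder. `PostgresSaver.from_conn_string` is a context manager, and `setup()` creates the checkpoint tables on first use:

```python
def run_with_postgres(builder, db_uri: str, inputs: dict, thread_id: str):
    """Hypothetical helper: compile `builder` against Postgres and run one thread.

    Assumes `pip install langgraph-checkpoint-postgres` and a reachable database.
    """
    # Imported lazily so this module loads even where the package is missing.
    from langgraph.checkpoint.postgres import PostgresSaver

    # from_conn_string() is a context manager: the connection closes when the
    # block exits, so compile and invoke inside it.
    with PostgresSaver.from_conn_string(db_uri) as checkpointer:
        checkpointer.setup()  # creates the checkpoint tables if they do not exist
        graph = builder.compile(checkpointer=checkpointer)
        return graph.invoke(inputs, config={"configurable": {"thread_id": thread_id}})


# Placeholder connection string; replace with a real one.
DB_URI = "postgresql://user:password@localhost:5432/app"
```

A call would look like `run_with_postgres(builder, DB_URI, {"input": "Analyze sales data"}, "run-001")`, with `builder` being any `StateGraph`. In a long-lived service you would typically keep the saver open for the process lifetime rather than reopening it per run.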

---

### **12. Advanced Capabilities**

* Versioned state history
* Time-travel debugging
* Branching execution
* Distributed execution synchronization
* Auditable AI pipelines

---

### **13. Mental Model**

> A LangGraph Checkpoint Store is the **transaction log of an autonomous AI system**.

Without one, an agent cannot resume after a crash, pause for human review, or be audited after the fact, which leaves it **fragile in production**.


---

### **14. Demonstration**

```python
# One-cell LangGraph Checkpoint Demonstration

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import InMemorySaver
import time

# ---------------------------
# 1. Define State Schema
# ---------------------------

class State(TypedDict):
    step: int
    message: str

# ---------------------------
# 2. Define Nodes
# ---------------------------

def work_node(state: State):
    time.sleep(1)  # simulate long computation
    new_step = state["step"] + 1
    print(f"Working... step {new_step}")
    return {"step": new_step, "message": f"Progress {new_step}"}

def router(state: State):
    # Loop back to "work" until step 3, then finish.
    if state["step"] >= 3:
        return END
    return "work"

# ---------------------------
# 3. Build Graph
# ---------------------------

builder = StateGraph(State)
builder.add_node("work", work_node)
builder.set_entry_point("work")
builder.add_conditional_edges("work", router, {"work": "work", END: END})

# ---------------------------
# 4. Attach Checkpoint Store
# ---------------------------

checkpointer = InMemorySaver()  # a checkpoint is saved after every super-step
graph = builder.compile(checkpointer=checkpointer)

# ---------------------------
# 5. Execute (Run 1)
# ---------------------------

print("Starting execution...")
config = {"configurable": {"thread_id": "demo-run"}}
result = graph.invoke({"step": 0, "message": ""}, config=config)
print(result)
```

Output:

```
Starting execution...
Working... step 1
Working... step 2
Working... step 3
{'step': 3, 'message': 'Progress 3'}
```