# SESSION 11: Persistence in LangGraph | Time Travel in LangGraph

https://youtu.be/_IPP7_Bi8uA?list=PLKnIA16_RmvYsvB8qkUQuJmJNuiCUJFPL

## Persistence

In **LangGraph**, **persistence** means the ability to **save and restore the state of a graph across runs**.


Without persistence, the execution state of a LangGraph workflow (nodes + edges + state updates) exists only in memory: if the process crashes, or you want to continue later, that state is lost. Persistence solves this by storing the state externally.

### Key points about **persistence in LangGraph**:

1. **State Storage**

   * The graph’s **state** (variables, memory, messages, intermediate outputs, etc.) is saved to a storage backend by a **checkpointer**.
   * LangGraph ships checkpointers for **in-memory**, **SQLite**, and **PostgreSQL** storage; other backends such as **Redis** or **MongoDB** are available through separate integration packages.

2. **Checkpoints**

   * LangGraph saves a **checkpoint** (a snapshot of the state) at every **super-step** of execution, i.e. after each round of node executions.
   * When you invoke the graph again on the same thread, it can **resume execution from the last checkpoint** instead of starting over.

3. **Use Cases**

   * Long-running workflows (e.g., multi-turn conversations, document analysis).
   * Fault-tolerant execution (resuming after crashes).
   * Multi-agent systems where state continuity matters.
   * Human-in-the-loop workflows where you pause and later resume.

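4. **How it looks in practice**
   You typically compile the graph with a **checkpointer** as its persistence layer (the SQLite one lives in the separate `langgraph-checkpoint-sqlite` package):

   ```python
   import sqlite3

   from langgraph.checkpoint.sqlite import SqliteSaver

   # create persistence: wrap a SQLite connection in a checkpointer
   conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
   checkpointer = SqliteSaver(conn)

   # pass persistence when compiling the graph (`graph` is your StateGraph builder)
   app = graph.compile(checkpointer=checkpointer)
   ```

   Now the graph automatically saves its state into `checkpoints.db` after each step and reloads it whenever you invoke it again with the same thread ID (passed as `config={"configurable": {"thread_id": ...}}`).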

👉 In short: **persistence in LangGraph is about checkpointing and resuming execution by storing the graph state externally.**

Persistence enables four main capabilities.

### 1. **Short Term Memory**

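* The **current working state** of a thread (variables, intermediate outputs, recent messages).
* Because this state is checkpointed per thread, later invocations on the same thread can **restore it and build on it**, e.g. a chatbot remembering earlier turns of the same conversation.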
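
A minimal pure-Python sketch of why a thread remembers earlier turns: the saved state is merged with each new update field by field, so a list field with an appending reducer keeps growing while a plain field is simply overwritten. `REDUCERS` and `merge` are hypothetical names for illustration; in LangGraph itself, reducers are declared on the state schema with `Annotated` (for example `add_messages` for message lists).

```python
import operator
from typing import Any, Callable

# Hypothetical reducer table (not a LangGraph API):
# "messages" appends, every other key is overwritten
REDUCERS: dict[str, Callable[[Any, Any], Any]] = {"messages": operator.add}


def merge(state: dict, update: dict) -> dict:
    """Merge an update into the saved state, field by field."""
    new_state = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in new_state:
            new_state[key] = reducer(new_state[key], value)  # accumulate
        else:
            new_state[key] = value  # default: last write wins
    return new_state


# The saved state of one thread, carried between two invocations
saved: dict = {}
saved = merge(saved, {"messages": ["Hello"], "last_input": "Hello"})
saved = merge(saved, {"messages": ["How are you?"], "last_input": "How are you?"})

print(saved["messages"])    # ['Hello', 'How are you?']
print(saved["last_input"])  # How are you?
```

Without an appending reducer, the second call would replace the first message, so the state schema matters as much as the checkpointer for conversational memory.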

---

### 2. **Fault Tolerance**

* If the system crashes, you don’t lose progress.
* Because state is checkpointed, the graph can **resume from the last saved point** instead of restarting from scratch.
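
To see why checkpointing gives fault tolerance, here is a toy pure-Python model (not LangGraph code): a three-step pipeline saves a checkpoint after every step, "crashes" partway through, and a second run picks up from the last saved step instead of redoing the finished ones. `STEPS`, `run_pipeline`, and the plain dict used as storage are hypothetical stand-ins for a graph and its checkpointer.

```python
from typing import Callable, Optional

# Hypothetical three-step workflow; each step returns an updated state
STEPS: list[tuple[str, Callable[[dict], dict]]] = [
    ("fetch", lambda s: {**s, "doc": "raw text"}),
    ("clean", lambda s: {**s, "doc": s["doc"].upper()}),
    ("summarize", lambda s: {**s, "summary": s["doc"][:3]}),
]


def run_pipeline(store: dict, thread_id: str,
                 crash_after: Optional[str] = None) -> tuple[dict, list[str]]:
    """Run the steps, checkpointing after each; return final state and steps run."""
    # Resume from this thread's last checkpoint, or start from step 0
    saved = store.get(thread_id, {"next": 0, "state": {}})
    state, executed = saved["state"], []
    for i in range(saved["next"], len(STEPS)):
        name, step = STEPS[i]
        state = step(state)
        executed.append(name)
        store[thread_id] = {"next": i + 1, "state": state}  # checkpoint
        if name == crash_after:
            raise RuntimeError(f"crashed after {name}")
    return state, executed


store: dict = {}
try:
    run_pipeline(store, "job-1", crash_after="clean")
except RuntimeError as err:
    print(err)  # crashed after clean

final_state, executed = run_pipeline(store, "job-1")
print(executed)               # ['summarize']: only the unfinished step runs
print(final_state["summary"]) # RAW
```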

---

### 3. **Human in the Loop**

* Execution can **pause at a checkpoint**, wait for human input/approval, and then **resume later**.
* Useful for workflows that need **manual verification or decision-making**.
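
A toy sketch of the pause/resume cycle, again in plain Python rather than LangGraph (which offers its own mechanism for this, such as the `interrupt` function): the run stops before a sensitive action, stores a checkpoint marked as waiting for approval, and a later call supplies the human decision and finishes the run. `start`, `resume`, and the `"awaiting"` field are hypothetical names.

```python
# Hypothetical checkpoint store (thread_id -> saved state); not a LangGraph API
store: dict[str, dict] = {}


def start(thread_id: str, draft: str) -> dict:
    """Run up to the approval gate, then pause by saving a checkpoint."""
    state = {"draft": draft, "awaiting": "approval"}
    store[thread_id] = state  # checkpoint taken while paused
    return state


def resume(thread_id: str, approved: bool) -> dict:
    """Load the paused checkpoint, apply the human decision, and finish."""
    state = dict(store[thread_id])
    if state.get("awaiting") != "approval":
        raise ValueError(f"thread {thread_id!r} is not waiting for approval")
    state["awaiting"] = None
    state["status"] = "sent" if approved else "rejected"
    store[thread_id] = state  # checkpoint the finished run
    return state


start("ticket-7", "Refund the customer $20")
# ...later, once a human has reviewed the draft...
print(resume("ticket-7", approved=True)["status"])  # sent
```

With a durable backend instead of an in-memory dict, the pause can outlast the process that started the run.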

---

### 4. **Time Travel**

* You can **roll back** to an earlier checkpoint and re-run from there.
* Helps with debugging, auditing, or exploring “what-if” scenarios.

---

👉 In short:

* **Short Term Memory** = working state,
* **Fault Tolerance** = recover after crashes,
* **Human in the Loop** = pause + resume with human input,
* **Time Travel** = roll back to past checkpoints.


## Checkpointers

A **checkpointer** is a component that **saves and restores the graph’s state** at different points of execution.
Think of it as a "save slot" in a video game: after every step of execution, the checkpointer saves the **state, memory, and progress**, so you can resume later from there.

### 🔹 How Checkpointers Work

1. **Save (Checkpointing)**

   * After each **super-step** (one round in which the scheduled nodes run), the checkpointer **stores a snapshot of the state** (variables, intermediate outputs, messages, etc.) in its backend.
   * This ensures that if execution stops, you don’t lose progress.

2. **Load (Resuming)**

   * When you invoke the graph again with the same `thread_id` (passed as `config={"configurable": {"thread_id": ...}}`), the checkpointer **retrieves the latest saved state** for that thread and continues from there instead of starting from scratch.
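
The save/load cycle can be modelled with a small in-memory class. This is a hypothetical sketch (`ToyCheckpointer`, `put`, and `get_latest` are illustrative names, not LangGraph's checkpointer interface): each save appends a snapshot to the thread's list, and loading returns the newest snapshot for that thread, or `None` for a thread that has never run.

```python
import copy
from typing import Optional


class ToyCheckpointer:
    """Hypothetical in-memory checkpointer: thread_id -> list of snapshots."""

    def __init__(self) -> None:
        self._threads: dict[str, list[dict]] = {}

    def put(self, thread_id: str, state: dict) -> int:
        """Save a deep copy of the state; return its index in the thread."""
        snapshots = self._threads.setdefault(thread_id, [])
        snapshots.append(copy.deepcopy(state))  # later mutations can't alter it
        return len(snapshots) - 1

    def get_latest(self, thread_id: str) -> Optional[dict]:
        """Return the newest snapshot for the thread, or None if it has none."""
        snapshots = self._threads.get(thread_id)
        return copy.deepcopy(snapshots[-1]) if snapshots else None


saver = ToyCheckpointer()
state = {"step": 0, "messages": []}
for text in ["Hello", "How are you?"]:
    state["step"] += 1
    state["messages"].append(text)
    saver.put("123", state)  # "save" after each step

# A later run on the same thread starts from the saved state
print(saver.get_latest("123"))  # {'step': 2, 'messages': ['Hello', 'How are you?']}
print(saver.get_latest("999"))  # None: a new thread starts fresh
```

Copying on both save and load keeps each snapshot immutable from the caller's point of view, which is what makes older checkpoints trustworthy later on.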

### 🔹 Example

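```python
import sqlite3
from typing import TypedDict
from langgraph.checkpoint.sqlite import SqliteSaver  # pip install langgraph-checkpoint-sqlite
from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    input: str
    output: str

def respond(state: State) -> dict:
    return {"output": f"You said: {state['input']}"}  # trivial node writing a reply

# Build a simple one-node graph (StateGraph needs a state schema)
graph = StateGraph(State)
graph.add_node("respond", respond)
graph.add_edge(START, "respond")
graph.add_edge("respond", END)

# Create a checkpointer (SQLite backend) on an open connection
conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
checkpointer = SqliteSaver(conn)

# Compile graph with persistence
app = graph.compile(checkpointer=checkpointer)

# Run execution with a thread ID (it goes under "configurable")
app.invoke({"input": "Hello"}, config={"configurable": {"thread_id": "123"}})
```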

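* The **checkpointer** here (`SqliteSaver`) saves the state into `checkpoints.db` after each step.
* If you invoke the graph again with thread ID `"123"`, it first **loads that thread’s latest checkpoint**, so the new run builds on the saved state instead of starting empty (and an interrupted run can be resumed).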

### 🔹 Types of Checkpointers

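LangGraph provides different checkpointer implementations depending on the backend you want:

* **In-memory** → `InMemorySaver` (also available as `MemorySaver`), for temporary runs (state is lost on shutdown).
* **SQLite** → `SqliteSaver` from `langgraph-checkpoint-sqlite` (good for local development).
* **PostgreSQL** → `PostgresSaver` from `langgraph-checkpoint-postgres` (suited to production use cases).
* **Other databases** (e.g. MongoDB, Redis) → available through separate integration packages.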
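
The practical difference between the in-memory and database-backed options is durability. The sketch below uses Python's built-in `sqlite3` and `json` modules to store one JSON snapshot per row; it is a hypothetical illustration of the idea (the `SqliteToySaver` class and its table layout are invented here), not the schema `SqliteSaver` actually uses. A second saver opened on the same file, standing in for a restarted process, still sees the saved state.

```python
import json
import os
import sqlite3
import tempfile
from typing import Optional


class SqliteToySaver:
    """Hypothetical durable checkpointer: one JSON row per snapshot."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " thread_id TEXT NOT NULL, state TEXT NOT NULL)"
        )

    def put(self, thread_id: str, state: dict) -> None:
        with self.conn:  # commits the insert on success
            self.conn.execute(
                "INSERT INTO checkpoints (thread_id, state) VALUES (?, ?)",
                (thread_id, json.dumps(state)),
            )

    def get_latest(self, thread_id: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ?"
            " ORDER BY id DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def close(self) -> None:
        self.conn.close()


# Use a temporary directory so the example doesn't touch your own files
path = os.path.join(tempfile.mkdtemp(), "checkpoints.db")
first = SqliteToySaver(path)
first.put("123", {"messages": ["Hello"]})
first.close()  # simulate the process shutting down

second = SqliteToySaver(path)  # "restarted" process, same file
print(second.get_latest("123"))  # {'messages': ['Hello']}
```

An in-memory dict would have lost the state at `first.close()`; a database file is what lets a production deployment restart without losing conversations.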

✅ **In short:**
A **checkpointer** is the engine of persistence in LangGraph.
It decides **where** and **how** the state is saved, and makes it possible to pause, resume, and recover workflows reliably.


## Threads

In **LangGraph persistence**, a **thread** is simply an **independent execution context** (or “conversation/session”) that is tracked by a unique **thread ID**.

### 🔹 What is a Thread?

* Think of a **thread** as a **timeline of state** for a graph.
* Each thread has its own **state history**, made up of its **checkpoints**.
* Multiple threads can coexist, just like separate chat sessions or workflows running in parallel.

### 🔹 Why Threads?

* Without threads, every run of your graph would read and overwrite one shared state; in fact, a graph compiled with a checkpointer raises an error if invoked without a `thread_id`.
* Threads let you **separate different runs** of the same workflow.
* Example: if you build a chatbot, each user gets their **own thread**, so their history doesn’t mix with others’.

### 🔹 How Threads Work

1. When you run a graph, you pass a **`thread_id`** under the `"configurable"` key of the config.

   ```python
   app.invoke({"input": "Hello"}, config={"configurable": {"thread_id": "user_123"}})
   ```

   * The graph stores checkpoints for this specific thread.
   * Next time you call it with the same `thread_id`, it **loads the last checkpoint** and continues from there.

2. If you use a **different thread ID**, the graph starts **fresh** with no prior state.

   ```python
   app.invoke({"input": "Hi"}, config={"configurable": {"thread_id": "user_456"}})
   ```

   → This runs independently of `user_123`.

### 🔹 Example: Multi-user chatbot

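```python
# User A conversation (two calls on the same thread)
config_a = {"configurable": {"thread_id": "user_A"}}
app.invoke({"input": "Hello"}, config=config_a)
app.invoke({"input": "How are you?"}, config=config_a)
# user_A's thread now holds both turns, provided the state schema
# accumulates history (e.g. a messages list with the add_messages reducer)

# User B conversation (separate thread)
config_b = {"configurable": {"thread_id": "user_B"}}
app.invoke({"input": "Hi"}, config=config_b)
# user_B's thread sees only "Hi"
```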

* **User A’s thread** stores their entire chat history.
* **User B’s thread** has its own isolated history.
* Both are checkpointed via the persistence layer.

✅ **In short:**

* A **thread** = one independent execution context (like a chat session).
* **Checkpoints** are tied to threads, so you can pause/resume per thread.
* This lets a single compiled graph serve many users or sessions without their states mixing.


## Time-travel

**Time Travel** means the ability to **go back to a previous checkpointed state** of a graph and resume execution from there — instead of only continuing forward from the latest state.

In other words, every time the graph saves a checkpoint (via the checkpointer), that snapshot can be revisited later like a "save point" in a video game.

### 🔹 How It Works

* Each **thread** accumulates multiple **checkpoints** over time.
* Normally, you resume from the **latest checkpoint**.
* With **time travel**, you pick an **earlier checkpoint** (identified by its `checkpoint_id`) and continue execution as if you were at that moment in the past.
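
Rewinding and branching can be modelled as a list of snapshots in which each new checkpoint records its parent. In this hypothetical pure-Python sketch (`Checkpoint`, `save`, and `fork_from` are invented names; in LangGraph you would instead pick a past checkpoint from the thread's state history), forking from an earlier checkpoint creates a new branch while the original timeline stays untouched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    checkpoint_id: int
    parent_id: Optional[int]
    state: dict


history: list[Checkpoint] = []  # hypothetical store for one thread


def save(state: dict, parent_id: Optional[int]) -> Checkpoint:
    """Append a new checkpoint that remembers which checkpoint it came from."""
    cp = Checkpoint(len(history), parent_id, dict(state))
    history.append(cp)
    return cp


def fork_from(checkpoint_id: int, update: dict) -> Checkpoint:
    """Time travel: start a new branch from an earlier checkpoint's state."""
    past = history[checkpoint_id]
    return save({**past.state, **update}, parent_id=past.checkpoint_id)


# Original timeline: 0 -> 1 -> 2
c0 = save({"input": "book a flight"}, parent_id=None)
c1 = save({**c0.state, "city": "Paris"}, parent_id=c0.checkpoint_id)
c2 = save({**c1.state, "booked": True}, parent_id=c1.checkpoint_id)

# Rewind to checkpoint 0 and branch with a corrected city
branch = fork_from(0, {"city": "Rome"})
print(branch.checkpoint_id, branch.parent_id, branch.state["city"])  # 3 0 Rome
print(history[2].state["city"])  # Paris: the original timeline is unchanged
```

Appending forks instead of overwriting history is what keeps time travel safe for auditing: every earlier state remains available.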

### 🔹 Why is Time Travel Useful?

1. **Debugging**

   * If something goes wrong later in the workflow, you can rewind to an earlier state and step through the execution again.

2. **What-if Scenarios**

   * You can branch off from an earlier state and test a **different path of execution** without rerunning the whole workflow from the beginning.

3. **Auditing / Traceability**

   * Lets you review **past states** for compliance, reproducibility, or analysis.

4. **Human-in-the-Loop Corrections**

   * If a human gives incorrect input at some point, you can **rewind to before that input** and re-run with the corrected one.

---

### 🔹 Example

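```python
# Suppose thread "user_123" already has several checkpoints
config = {"configurable": {"thread_id": "user_123"}}

# By default, a call continues from the thread's latest checkpoint
app.invoke({"input": "continue"}, config=config)

# Checkpoints saved for this thread, newest first
history = list(app.get_state_history(config))
past_config = history[-1].config  # oldest snapshot; its config carries a checkpoint_id

# Time travel: passing None as input resumes execution from that checkpoint
app.invoke(None, config=past_config)
```

Here, you’ve effectively “traveled back in time” to an earlier checkpoint and restarted execution from there. Note that a `checkpoint_id` is an opaque ID read from the thread’s state history, not a step number. To re-run with *different* values instead, you can first edit the past state with `app.update_state(past_config, {...})`, which returns a config pointing at the new (forked) checkpoint, and invoke from that.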

---

✅ **In short:**
**Time travel** in LangGraph persistence lets you **rewind to an earlier checkpoint** and continue execution from that point.
It’s useful for **debugging, auditing, branching experiments, and correcting mistakes**.