# Tutorial 8 - State Management (Checkpoints and Time Travel)

## Where You Are in the Learning Journey

```
 Tutorials 1-5      Tutorial 6         Tutorial 7         Tutorial 8
 RAG Fundamentals   ReAct Agent        Reflection         State
 (the retrieval     (T6)               Self-Correction    Management
  pipeline)                            (T7)               (you are here)
```

**What changed from Tutorial 7:** the agent loop is extended with a
**StateManager** that snapshots agent state at every step.
You can pause the agent, inspect any past state, and rewind to any checkpoint.

**What you will learn in this tutorial:**
- What agent state is and why it matters for long-running tasks
- What a checkpoint is: a labelled snapshot of the agent's progress
- How 'time travel' works: restoring the agent to a previous state
- How to use human-in-the-loop review before the agent continues
- Practical use cases for checkpoints (debugging, auditing, recovery)

**Prerequisites:** Tutorials 6 and 7 (ReAct and reflection), plus basic Python.

```mermaid
flowchart LR
    S0[Initial State] --> CP0[Checkpoint 0: start]
    CP0 --> A1[Step 1: retrieve]
    A1 --> CP1[Checkpoint 1: after_step_1]
    CP1 --> A2[Step 2: reason]
    A2 --> CP2[Checkpoint 2: after_step_2]
    CP2 -- Error --> CP1
    CP1 -- Rewind --> A2b[Step 2 retry]
```


## Why Does an Agent Need State Management?

### The Problem with Stateless Loops

In Tutorials 6 and 7, the agent loop runs from start to finish without saving
any intermediate snapshots. This works for short tasks but creates problems for
real-world systems:

| Problem | What Can Go Wrong |
|---------|------------------|
| Long task interrupted | Network error at step 7 of 10 means starting over |
| Wrong tool call | Agent calls the wrong tool; no way to rewind without re-running everything |
| Audit requirement | Cannot show a regulator what the agent did step by step |
| Human approval needed | Cannot pause at a sensitive step and wait for a human to approve |

### The Solution: Checkpoints

A **checkpoint** is a labelled snapshot of the agent's complete state at a
specific point in time. Think of it like a save point in a video game:

```
Game without saves  : if you lose, you restart from the beginning
Game with saves     : if you lose, you reload from the last save point

Agent without CPs   : if step 7 fails, you re-run all 10 steps
Agent with CPs      : if step 7 fails, you rewind to the checkpoint at step 6
```
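
The save-point idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not the tutorial's `StateManager`: `run_step`, `run_with_checkpoints`, the plain-dict state, and the simulated failure at step 7 are all hypothetical.

```python
import copy
from typing import Optional

def run_step(state: dict, step: int, fail_at: Optional[int]) -> None:
    """Hypothetical step: record progress, or raise to simulate an outage."""
    if step == fail_at:
        raise ConnectionError(f"network error at step {step}")
    state["done"].append(step)

def run_with_checkpoints(total: int, fail_at: Optional[int], checkpoints: dict) -> dict:
    """Run steps 1..total, resuming from the newest checkpoint if one exists."""
    last = max(checkpoints, default=0)
    state = copy.deepcopy(checkpoints[last]) if checkpoints else {"done": []}
    for step in range(last + 1, total + 1):
        run_step(state, step, fail_at)
        checkpoints[step] = copy.deepcopy(state)  # save point after each step
    return state

checkpoints = {}
try:
    run_with_checkpoints(total=10, fail_at=7, checkpoints=checkpoints)
except ConnectionError:
    pass  # steps 1-6 were saved before the failure

resumed_from = max(checkpoints)
final = run_with_checkpoints(total=10, fail_at=None, checkpoints=checkpoints)
print(f"resumed after step {resumed_from}; completed steps: {final['done']}")
```

The second call does not repeat steps 1 to 6; it starts from the copy stored for step 6 and only runs steps 7 to 10.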

### What Is Time Travel?

'Time travel' means rewinding the agent state to any saved checkpoint and
resuming from there. You can:
- Inspect what the agent knew at any past step
- Re-run a failed step with different parameters
- Replay from a checkpoint to test a different code path
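
As a rough illustration of replaying from one checkpoint along different paths, the sketch below branches twice from the same saved snapshot. It assumes a plain-dict state, and `fake_retrieve` and `replay_from` are hypothetical helpers, not functions from `rag_tutorials`.

```python
import copy

def fake_retrieve(query: str) -> str:
    """Hypothetical stand-in for a retrieval tool."""
    return f"results for {query!r}"

def replay_from(saved: dict, action_input: str) -> dict:
    """Copy a saved snapshot and run one retrieval step on the copy."""
    branch = copy.deepcopy(saved)
    branch["steps"].append({
        "action": "retrieve",
        "action_input": action_input,
        "observation": fake_retrieve(action_input),
    })
    return branch

# Snapshot taken before any step ran.
snapshot = {"question": "What is the international work limit?", "steps": []}

branch_a = replay_from(snapshot, "international work days limit")
branch_b = replay_from(snapshot, "working abroad policy")

print(len(snapshot["steps"]))                  # the saved snapshot is untouched
print(branch_a["steps"][0]["action_input"])
print(branch_b["steps"][0]["action_input"])
```

Because each branch starts from its own copy, the two retries cannot interfere with each other or with the stored snapshot.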


## What Is Inside an AgentState?

The `AgentState` dataclass stores everything needed to resume or inspect the agent:

| Field | Type | What it stores |
|-------|------|----------------|
| `question` | str | The original user question |
| `steps` | list[dict] | Every Thought-Action-Observation cycle so far |
| `status` | str | 'running', 'paused', or 'completed' |
| `current_answer` | str | The agent's most recent draft answer |

A `Checkpoint` wraps an `AgentState` with:
- `checkpoint_id`: an 8-character unique identifier
- `step_number`: how many steps had been completed when the snapshot was taken
- `label`: a human-readable name (e.g. 'before_retrieval', 'after_step_2')

The `StateManager` stores all checkpoints in memory and provides:
- `save_checkpoint(state, label)` - snapshot and store
- `load_checkpoint(id)` - retrieve by id
- `list_checkpoints()` - list all stored snapshots in step order
- `rewind_to(id)` - return a copy of the state saved in a past snapshot
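
To make these three pieces concrete, here is one way they could be written. This is a minimal sketch, not the source of `rag_tutorials.agent_state`: the `Mini*` class names, the field defaults, the use of `uuid4` for ids, and the deep copy on save are all assumptions chosen to match the descriptions above.

```python
import copy
import uuid
from dataclasses import dataclass, field

@dataclass
class MiniAgentState:
    question: str
    steps: list = field(default_factory=list)   # one dict per cycle
    status: str = "running"                     # assumed default
    current_answer: str = ""                    # assumed default

@dataclass
class MiniCheckpoint:
    checkpoint_id: str
    step_number: int
    label: str
    state: MiniAgentState

class MiniStateManager:
    """In-memory checkpoint store (hypothetical sketch)."""

    def __init__(self) -> None:
        self._checkpoints = {}

    def save_checkpoint(self, state: MiniAgentState, label: str) -> str:
        cid = uuid.uuid4().hex[:8]  # 8-character id
        self._checkpoints[cid] = MiniCheckpoint(
            cid, len(state.steps), label, copy.deepcopy(state)
        )
        return cid

    def load_checkpoint(self, checkpoint_id: str) -> MiniCheckpoint:
        return self._checkpoints[checkpoint_id]

    def list_checkpoints(self) -> list:
        # sorted() is stable, so equal step numbers keep insertion order.
        return sorted(self._checkpoints.values(), key=lambda cp: cp.step_number)

    def rewind_to(self, checkpoint_id: str) -> MiniAgentState:
        return copy.deepcopy(self._checkpoints[checkpoint_id].state)

manager = MiniStateManager()
state = MiniAgentState(question="What is the international work limit?")
cid = manager.save_checkpoint(state, "start")
state.steps.append({"action": "retrieve"})
manager.save_checkpoint(state, "after_retrieve")

print([(cp.step_number, cp.label) for cp in manager.list_checkpoints()])
print(len(manager.rewind_to(cid).steps))
```

Copying on both save and rewind keeps stored snapshots isolated from the live state and from each other, at the cost of extra memory per checkpoint.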


In [None]:
import importlib
import importlib.util  # find_spec lives in this submodule; import it explicitly
import os
from pathlib import Path
import shutil
import subprocess
import sys

import pandas as pd
from dotenv import load_dotenv

if shutil.which("uv") is None:
    print("uv not found. Installing with pip...")
    subprocess.run([sys.executable, "-m", "pip", "install", "uv"], check=True)

cwd = Path.cwd().resolve()
repo_root = next(
    (path for path in [cwd, *cwd.parents] if (path / "pyproject.toml").exists() and (path / "src").exists()),
    cwd,
)
os.chdir(repo_root)
src_path = repo_root / "src"
if str(src_path) not in sys.path:
    sys.path.insert(0, str(src_path))

REQUIRED_PACKAGES = ["openai", "chromadb", "numpy", "pandas", "rank_bm25", "sentence_transformers", "dotenv"]
PIP_NAME_MAP = {"rank_bm25": "rank-bm25", "sentence_transformers": "sentence-transformers", "dotenv": "python-dotenv"}

def find_missing(packages):
    importlib.invalidate_caches()
    return [pkg for pkg in packages if importlib.util.find_spec(pkg) is None]

missing = find_missing(REQUIRED_PACKAGES)
if missing:
    print("Missing packages:", missing)
    subprocess.run(["uv", "sync"], check=True)

missing_after_sync = find_missing(REQUIRED_PACKAGES)
if missing_after_sync:
    pip_targets = [PIP_NAME_MAP.get(pkg, pkg) for pkg in missing_after_sync]
    subprocess.run([sys.executable, "-m", "pip", "install", *pip_targets], check=True)

final_missing = find_missing(REQUIRED_PACKAGES)
if final_missing:
    raise ImportError(f"Dependencies still missing: {final_missing}")

from rag_tutorials.io_utils import load_handbook_documents, load_queries
from rag_tutorials.chunking import semantic_chunk_documents
from rag_tutorials.pipeline import build_dense_retriever
from rag_tutorials.qa import answer_with_context

load_dotenv()
if not os.getenv("OPENAI_API_KEY"):
    raise EnvironmentError("OPENAI_API_KEY is required")

embedding_model = os.getenv("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small")
chat_model = os.getenv("OPENAI_CHAT_MODEL", "gpt-4.1-mini")

handbook_path = Path("data/handbook_manual.txt")
queries_path = Path("data/queries.jsonl")
if not handbook_path.exists() or not queries_path.exists():
    raise FileNotFoundError("Run: uv run python scripts/generate_data.py")

documents = load_handbook_documents(handbook_path)
queries = load_queries(queries_path)
chunks = semantic_chunk_documents(documents)
dense_retriever, _ = build_dense_retriever(
    chunks=chunks,
    collection_name="agent_tutorial_dense",
    embedding_model=embedding_model,
)

In [None]:
from rag_tutorials.agent_state import AgentState, StateManager
from rag_tutorials.agent_loop import run_react_loop, AgentStep

TOP_K = 3

def retrieve_tool(query: str) -> str:
    """Retrieve top-k chunks and return as a formatted string."""
    results = dense_retriever(query, top_k=TOP_K)
    if not results:
        return "No relevant chunks found."
    parts = [f"Chunk {i+1} [{r.chunk_id}]: {r.text}" for i, r in enumerate(results)]
    return "\n\n".join(parts)

tools = {"retrieve": retrieve_tool}
print("Agent loop and state manager ready.")

## Demo: Building Agent State with Checkpoints Step by Step

The simplest way to understand state management is to build it manually:
create an `AgentState`, add steps one by one, save checkpoints at each step,
then rewind to a past checkpoint and observe what changed.

The next cell does not call any LLM. It is pure data-structure manipulation, so
you can see what is happening without API costs.


In [None]:
# Manual state-building demo (no LLM calls)

manager = StateManager()

# Create initial state
state = AgentState(question="What is the international work limit?")
cid_start = manager.save_checkpoint(state, label="start")
print(f"Checkpoint saved: {cid_start!r}  (label=start, steps={len(state.steps)})")

# Simulate step 1: retrieve
state.steps.append({
    "thought": "I need to retrieve the international work policy.",
    "action": "retrieve",
    "action_input": "international work days limit",
    "observation": "Employees may work internationally for up to 14 days per year.",
})
cid_step1 = manager.save_checkpoint(state, label="after_retrieve")
print(f"Checkpoint saved: {cid_step1!r}  (label=after_retrieve, steps={len(state.steps)})")

# Simulate step 2: reason and finish
state.steps.append({
    "thought": "I have enough information to answer.",
    "action": "finish",
    "action_input": "The limit is 14 days.",
    "observation": "",
})
state.current_answer = "The limit is 14 days."
state.status = "completed"
cid_done = manager.save_checkpoint(state, label="completed")
print(f"Checkpoint saved: {cid_done!r}  (label=completed, steps={len(state.steps)})")

# List all checkpoints
print("\nAll checkpoints in step order:")
for cp in manager.list_checkpoints():
    print(f"  id={cp.checkpoint_id}  step={cp.step_number}  label={cp.label!r}")

## Time Travel: Rewinding to a Past Checkpoint

Now we demonstrate time travel: restore the agent to its state after step 1
(before it produced the final answer) and observe that the rewound state contains
exactly one step and has no `current_answer` yet.

In a real system you would use this to:
- Inspect what context the agent had before making a decision
- Re-run a step with different tool parameters
- Let a human approve or modify the state before the agent continues


In [None]:
# Rewind to after_retrieve checkpoint and inspect

print("Current state (before rewind):")
print(f"  Steps          : {len(state.steps)}")
print(f"  Current answer : {repr(state.current_answer)}")
print(f"  Status         : {state.status}")

# Time travel
rewound = manager.rewind_to(cid_step1)

print("\nRewound state (back to after_retrieve checkpoint):")
print(f"  Steps          : {len(rewound.steps)}")
print(f"  Current answer : {repr(rewound.current_answer)}")
print(f"  Status         : {rewound.status}")
print(f"\nStep 1 observation: {rewound.steps[0]['observation']}")

print("\nOriginal state is unchanged (rewind returns a copy):")
print(f"  Steps          : {len(state.steps)}")

## Live Demo: Agent Loop with Automatic Checkpointing

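Now we run the actual ReAct loop. `run_react_loop` is called without a checkpoint
hook here, so once it returns we replay its recorded steps into an `AgentState`
and save a checkpoint after each one. We then use the StateManager to inspect
any intermediate state and rewind to it.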

**What to observe:**
- One checkpoint is saved for each step the agent recorded
- You can see the exact observation the agent received at each step
- Rewinding gives you the state exactly as it was at any checkpoint

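
The cell below records checkpoints by replaying the steps that `run_react_loop` returns. If you control the loop yourself, you could instead snapshot as each step completes, so that an interrupted run still leaves usable checkpoints behind. The following is a toy sketch of that idea: `toy_loop`, its `on_step` hook, and the scripted steps are hypothetical and are not part of `rag_tutorials`.

```python
import copy
from typing import Callable

def toy_loop(scripted_steps: list, on_step: Callable[[dict], None]) -> dict:
    """Hypothetical agent loop that calls `on_step` after every step."""
    state = {"steps": [], "status": "running"}
    for step in scripted_steps:
        state["steps"].append(step)
        if step["action"] == "finish":
            state["status"] = "completed"
        on_step(state)  # checkpoint while the loop is still running
    return state

snapshots = []  # (label, independent copy of the state)

def record(state: dict) -> None:
    """Store a deep copy so later steps cannot alter this snapshot."""
    label = f"after_step_{len(state['steps'])}"
    snapshots.append((label, copy.deepcopy(state)))

final_state = toy_loop(
    [{"action": "retrieve"}, {"action": "finish"}],
    on_step=record,
)
print([label for label, _ in snapshots])
print([snap["status"] for _, snap in snapshots])
```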

In [None]:
# Run a checkpointed agent loop

question = "What is the remote work VPN requirement and who does it apply to?"

cp_manager = StateManager()
agent_state = AgentState(question=question)
cp_manager.save_checkpoint(agent_state, label="start")

# Run the agent loop and collect steps for checkpointing
agent_result = run_react_loop(
    question=question,
    tools=tools,
    model=chat_model,
    max_steps=5,
)

# Save a checkpoint after each step that was recorded
for i, step in enumerate(agent_result.steps, start=1):
    agent_state.steps.append({
        "thought": step.thought,
        "action": step.action,
        "action_input": step.action_input,
        "observation": step.observation,
    })
    cp_manager.save_checkpoint(agent_state, label=f"after_step_{i}")

agent_state.current_answer = agent_result.answer
agent_state.status = "completed"
cp_manager.save_checkpoint(agent_state, label="completed")

print("Checkpoints saved during the agent run:")
for cp in cp_manager.list_checkpoints():
    print(f"  id={cp.checkpoint_id}  step={cp.step_number}  label={cp.label!r}")

print(f"\nFinal answer: {agent_result.answer[:300]}")

In [None]:
# Inspect any intermediate checkpoint and compare with the final state

all_cps = cp_manager.list_checkpoints()

# Pick the first non-start checkpoint (after step 1) if it exists
if len(all_cps) >= 2:
    mid_cp = all_cps[1]  # checkpoint after first step
    restored = cp_manager.rewind_to(mid_cp.checkpoint_id)

    print(f"Inspecting checkpoint: id={mid_cp.checkpoint_id!r}  label={mid_cp.label!r}")
    print(f"  Steps in checkpoint   : {len(restored.steps)}")
    print(f"  Current answer at CP  : {repr(restored.current_answer) if restored.current_answer else '(none yet)'}")
    if restored.steps:
        last = restored.steps[-1]
        print(f"  Last action           : {last['action']}({repr(last['action_input'])[:60]})")
        obs = last['observation']
        shown = f"{obs[:200]}..." if len(obs) > 200 else obs  # truncate long text
        print(f"  Last observation      : {shown}")

    print(f"\nFinal state has {len(agent_state.steps)} steps vs checkpoint has {len(restored.steps)} steps.")
else:
    print("Agent completed in one step; no intermediate checkpoint to inspect.")
    print("Try a more complex question to see multiple checkpoints.")

## Human-in-the-Loop with Checkpoints

State management unlocks **human-in-the-loop** workflows. The pattern is:

```
1. Run agent until a sensitive step (e.g., before writing to a database)
2. Save a checkpoint and set state.status = 'paused'
3. Show the human what the agent is about to do
4. Human approves -> resume from checkpoint
   Human rejects  -> rewind to an earlier checkpoint and adjust
```

The next cell demonstrates the pause-and-inspect pattern using the first step of
the `agent_result` from the run above. It does not wait for real human input; the
decision is auto-approved for demonstration purposes.
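
The four-step pattern above can also be expressed as a small gate function with a pluggable decision callback. This is a sketch under assumptions: `review_gate`, the `decide` callback, the `write_db` action, and the plain-dict state are hypothetical and not part of the tutorial's code.

```python
import copy
from typing import Callable

def review_gate(state: dict, earlier: dict, decide: Callable[[dict], str]) -> dict:
    """Pause `state`, ask `decide` for a verdict, then resume or rewind.

    `earlier` is a snapshot taken before the sensitive step.
    """
    paused = copy.deepcopy(state)
    paused["status"] = "paused"
    if decide(paused) == "approve":
        paused["status"] = "running"
        return paused
    # Rejected: drop the sensitive step by going back to the earlier snapshot.
    rewound = copy.deepcopy(earlier)
    rewound["status"] = "running"
    return rewound

before = {"steps": [], "status": "running"}
pending = {
    "steps": [{"action": "write_db", "action_input": "insert row"}],
    "status": "running",
}

approved = review_gate(pending, before, decide=lambda s: "approve")
rejected = review_gate(pending, before, decide=lambda s: "reject")
print(len(approved["steps"]), len(rejected["steps"]))
```

In an interactive session, `decide` could wrap the built-in `input()` so that a person types the verdict; passing it as a callback keeps the gate testable without one.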


In [None]:
# Human-in-the-loop demonstration (auto-approved)

approval_manager = StateManager()
approval_state = AgentState(question=question, status="running")
start_cid = approval_manager.save_checkpoint(approval_state, label="start")

# Simulate adding one step and pausing for approval
if agent_result.steps:
    first_step = agent_result.steps[0]
    approval_state.steps.append({
        "thought": first_step.thought,
        "action": first_step.action,
        "action_input": first_step.action_input,
        "observation": first_step.observation,
    })
    approval_state.status = "paused"  # waiting for human review
    pause_cid = approval_manager.save_checkpoint(approval_state, label="paused_for_review")

    print("Agent paused. Human review required.")
    print(f"  Checkpoint id : {pause_cid}")
    print(f"  Last action   : {first_step.action}({repr(first_step.action_input)[:60]})")
    obs = first_step.observation
    shown = f"{obs[:150]}..." if len(obs) > 150 else obs  # truncate long text
    print(f"  Observation   : {shown}")
    print()

    # Auto-approve for demo purposes
    human_decision = "approve"
    print(f"Human decision: {human_decision!r}")

    if human_decision == "approve":
        approval_state.status = "running"
        approval_manager.save_checkpoint(approval_state, label="approved_resumed")
        print("Agent resumed. Continuing to final answer...")
        print(f"Final answer: {agent_result.answer[:200]}")
    else:
        # Rewind to before the step
        approval_state = approval_manager.rewind_to(start_cid)
        print("Rewound to start. Agent will retry.")
else:
    print("No steps to pause on for this question.")

## Learning Checkpoint: State Management

### What Works

- Checkpoints let you inspect exactly what the agent knew at any past step:
  what context it had, what tools it called, and what answers it considered.
- Time travel restores a full agent state from any checkpoint, enabling
  retry, audit, and debugging workflows.
- Human-in-the-loop review is possible by pausing at any step and resuming
  or rewinding based on human feedback.

### What Does Not Work Well

- This tutorial's StateManager stores checkpoints in memory. A production
  system needs persistent storage (database, file system) so checkpoints
  survive process restarts.
- Checkpointing adds overhead if saved very frequently in a tight loop.
  In practice, save checkpoints at meaningful boundaries (before a tool call,
  after a human decision) rather than every single computation.
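
One way to make checkpoints survive a restart is to serialise them to disk. A minimal sketch using only the standard library follows. `StoredState`, `save_to_disk`, and `load_from_disk` are hypothetical names, and the fields mirror the `AgentState` table earlier in this tutorial rather than the actual class.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path

@dataclass
class StoredState:
    question: str
    steps: list = field(default_factory=list)
    status: str = "running"
    current_answer: str = ""

def save_to_disk(state: StoredState, label: str, directory: Path) -> Path:
    """Write one checkpoint as a JSON file named after its label."""
    path = directory / f"{label}.json"
    path.write_text(json.dumps(asdict(state), indent=2), encoding="utf-8")
    return path

def load_from_disk(path: Path) -> StoredState:
    """Rebuild the state from a JSON checkpoint file."""
    return StoredState(**json.loads(path.read_text(encoding="utf-8")))

with tempfile.TemporaryDirectory() as tmp:
    original = StoredState(
        question="What is the international work limit?",
        steps=[{"action": "retrieve", "observation": "up to 14 days per year"}],
    )
    saved_path = save_to_disk(original, "after_retrieve", Path(tmp))
    restored = load_from_disk(saved_path)

print(restored == original)
```

This only works while every value in `steps` is JSON-serialisable; richer objects would need a custom encoder or a different storage format.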

### Where to Go Next

The three agent tutorials (6, 7, 8) form a self-contained extension of the
RAG series:

- **Tutorial 6** showed how to wrap RAG retrieval as an agent tool.
- **Tutorial 7** showed how a Critic improves answer quality.
- **Tutorial 8** showed how to save, inspect, and rewind agent state.

Together they build the foundation for production-grade agentic systems:
reliable retrieval + quality control + auditability.
