# ⏰ Time-Travel Debugging

> **Replay and debug agent decisions step by step.**

## Learning Objectives

By the end of this notebook, you will be able to:
1. Understand the Flight Recorder concept
2. Record agent execution traces
3. Replay decisions at any point in time
4. Debug why an agent made a specific choice
5. Compare alternate execution paths

---

## Why Time-Travel Debugging?

**Problem:** When an agent makes a bad decision, traditional debugging shows the final state, not how the agent got there.

**Solution:** Record every decision point. Replay the agent's reasoning at any moment.

```
Traditional Debugging:       Time-Travel Debugging:
                             
    Start                        Start
      ↓                            ↓ [recorded]
    Step 1                       Step 1 ← replay here
      ↓                            ↓ [recorded]
    Step 2                       Step 2 ← replay here
      ↓                            ↓ [recorded]
    Error! ← see this only       Step 3 ← replay here
                                   ↓ [recorded]
                                 Error! ← see full context
```
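
The record-then-replay idea can be sketched in a few lines of plain Python. This is not the `agent_os` implementation: `MiniRecorder` and `Snapshot` are hypothetical names used only for illustration. The one design point the sketch makes is that state is deep-copied at record time, so later mutations cannot rewrite the recorded history.

```python
import copy
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Snapshot:
    """One recorded decision point (hypothetical, not the agent_os Checkpoint)."""
    name: str
    state: dict[str, Any]
    reasoning: str
    timestamp: float = field(default_factory=time.time)


class MiniRecorder:
    """Illustrative in-memory recorder; a sketch, not the library's API."""

    def __init__(self) -> None:
        self._timeline: list[Snapshot] = []

    def checkpoint(self, name: str, state: dict[str, Any], reasoning: str) -> None:
        # Deep-copy so that later mutations of `state` cannot alter the record.
        self._timeline.append(Snapshot(name, copy.deepcopy(state), reasoning))

    def replay(self, name: str) -> Snapshot:
        # Return the first snapshot with a matching name.
        for snap in self._timeline:
            if snap.name == name:
                return snap
        raise KeyError(f"no checkpoint named {name!r}")


mini = MiniRecorder()
data = {"processed": False}
mini.checkpoint("before", {"data": data}, "about to process")
data["processed"] = True  # mutate the live object after recording
mini.checkpoint("after", {"data": data}, "processing done")

print(mini.replay("before").state)  # the earlier value is preserved
```

Without the `deepcopy`, both snapshots would share the same `data` dictionary and the "before" record would silently show `processed` as `True`.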

---

## Step 1: Install Dependencies

In [None]:
!pip install agent-os --quiet

## Step 2: Initialize the Flight Recorder

The Flight Recorder captures every decision point during agent execution.

In [None]:
from agent_os import KernelSpace
from agent_os.flight_recorder import FlightRecorder, Checkpoint

# Create kernel with flight recorder enabled
kernel = KernelSpace(policy="strict")
recorder = FlightRecorder(storage_path="./flight_data")

# Connect recorder to kernel
kernel.attach_recorder(recorder)

print("✅ Flight Recorder initialized")
print("   Storage: ./flight_data")

## Step 3: Create an Agent with Checkpoints

In [None]:
@kernel.register
async def analysis_agent(task: str):
    """
    An agent that performs multi-step analysis.
    Each step is checkpointed for replay.
    """
    results = []
    
    # Step 1: Parse input
    recorder.checkpoint(
        name="parse_input",
        state={"task": task},
        reasoning="Starting task parsing"
    )
    parsed = {"raw": task, "tokens": task.split()}
    results.append(f"Parsed {len(parsed['tokens'])} tokens")
    
    # Step 2: Analyze content
    recorder.checkpoint(
        name="analyze_content",
        state={"parsed": parsed},
        reasoning="Analyzing content structure"
    )
    analysis = {
        "word_count": len(parsed['tokens']),
        "has_question": "?" in task,
        "has_action": any(w in task.lower() for w in ["do", "make", "create"])
    }
    results.append(f"Analysis complete: {analysis}")
    
    # Step 3: Generate response
    recorder.checkpoint(
        name="generate_response",
        state={"analysis": analysis},
        reasoning="Generating final response based on analysis"
    )
    
    if analysis['has_question']:
        response = "This is a question. I would provide an answer."
    elif analysis['has_action']:
        response = "This is an action request. I would execute it."
    else:
        response = "This is a statement. I acknowledge it."
    
    results.append(f"Response: {response}")
    
    # Final checkpoint
    recorder.checkpoint(
        name="complete",
        state={"results": results, "response": response},
        reasoning="Task completed successfully"
    )
    
    return response

print("✅ Agent registered with checkpoints")

## Step 4: Execute and Record

In [None]:
# Start recording session
session_id = recorder.start_session(agent_id="analysis_agent")
print(f"🎬 Recording session: {session_id}")

# Execute multiple tasks
tasks = [
    "What is the weather today?",
    "Create a summary of the Q4 report",
    "The project deadline is Friday"
]

for task in tasks:
    result = await kernel.execute(analysis_agent, task)
    print(f"\n📝 Task: {task}")
    print(f"   Result: {result}")

# Stop recording
recorder.stop_session()
print(f"\n🛑 Recording stopped. {recorder.checkpoint_count()} checkpoints saved.")

## Step 5: List All Checkpoints

In [None]:
# Get all checkpoints from the session
checkpoints = recorder.list_checkpoints(session_id)

print(f"📊 Session: {session_id}")
print(f"   Total checkpoints: {len(checkpoints)}")
print("\n🔍 Checkpoint Timeline:")
print("-" * 70)

for i, cp in enumerate(checkpoints):
    print(f"\n  [{i+1}] {cp.name}")
    print(f"      Time: {cp.timestamp}")
    print(f"      Reasoning: {cp.reasoning}")

## Step 6: Replay a Specific Checkpoint

Go back in time to see the agent's state at any point:

In [None]:
import json

# Replay checkpoint by name
checkpoint = recorder.replay(session_id, checkpoint_name="analyze_content")

print("⏪ REPLAYING: analyze_content")
print("=" * 60)
print(f"\n📍 Timestamp: {checkpoint.timestamp}")
print(f"🧠 Reasoning: {checkpoint.reasoning}")
print("\n📦 State at this point:")
print(json.dumps(checkpoint.state, indent=2))
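
`json.dumps` raises `TypeError` when a state dictionary holds values such as `datetime` or `set` objects. Whether `checkpoint.state` can contain such values depends on the recorder, so the following is only a defensive sketch using the standard library; `dump_state` is a hypothetical helper and the sample dictionary is made up.

```python
import json
from datetime import datetime, timezone


def dump_state(state: dict) -> str:
    """Pretty-print a state dict, falling back to str() for unknown types."""
    # `default` is called only for objects json cannot serialize natively.
    return json.dumps(state, indent=2, default=str, sort_keys=True)


sample_state = {
    "parsed": {"raw": "What is the weather today?", "tokens": ["What", "is"]},
    "seen_at": datetime(2024, 1, 1, tzinfo=timezone.utc),  # not JSON-native
    "tags": {"question"},  # sets are not JSON-native either
}
print(dump_state(sample_state))
```

The fallback is lossy (a set becomes its string form), which is usually acceptable for inspection but not for restoring state.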

## Step 7: Compare Execution Paths

Run the same agent with different inputs and compare:

In [None]:
# Record two different sessions
session_a = recorder.start_session(agent_id="analysis_agent")
await kernel.execute(analysis_agent, "What time is it?")
recorder.stop_session()

session_b = recorder.start_session(agent_id="analysis_agent")
await kernel.execute(analysis_agent, "Make me a sandwich")
recorder.stop_session()

# Compare the two paths
comparison = recorder.compare_sessions(session_a, session_b)

print("📊 Execution Path Comparison")
print("=" * 60)
print(f"\nSession A: {session_a}")
print(f"Session B: {session_b}")
print(f"\n🔀 Divergence point: {comparison['divergence_point']}")
print("\n📝 Session A at divergence:")
print(f"   Reasoning: {comparison['session_a_state']['reasoning']}")
print("\n📝 Session B at divergence:")
print(f"   Reasoning: {comparison['session_b_state']['reasoning']}")
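
One plausible way to locate a divergence point is to walk both checkpoint timelines in lockstep and stop at the first pair that differs. The sketch below does this over plain dictionaries; `find_divergence` and the sample timelines are hypothetical, and `compare_sessions` may define divergence differently.

```python
from itertools import zip_longest
from typing import Optional


def find_divergence(timeline_a: list[dict], timeline_b: list[dict]) -> Optional[int]:
    """Return the index of the first differing checkpoint, or None if identical."""
    for index, (a, b) in enumerate(zip_longest(timeline_a, timeline_b)):
        # zip_longest pads the shorter timeline with None, so a length
        # mismatch also counts as a divergence.
        if a != b:
            return index
    return None


# Made-up timelines for illustration only.
path_a = [
    {"name": "parse_input", "state": {"has_question": True}},
    {"name": "generate_response", "state": {"branch": "question"}},
]
path_b = [
    {"name": "parse_input", "state": {"has_question": True}},
    {"name": "generate_response", "state": {"branch": "action"}},
]
print(find_divergence(path_a, path_b))
```

Comparing whole checkpoint dictionaries is the strictest choice; comparing only `name` would instead report where the control flow, rather than the data, first differs.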

## Step 8: Debug a Failure

When something goes wrong, replay to find the root cause:

In [None]:
@kernel.register
async def buggy_agent(task: str):
    """An agent with a bug we need to debug."""
    
    recorder.checkpoint("start", {"task": task}, "Beginning task")
    
    # Step 1: Process input
    data = {"input": task, "processed": False}
    recorder.checkpoint("process", {"data": data}, "Processing input")
    
    # Step 2: Bug! We forget to set processed=True
    # data["processed"] = True  # BUG: This line is missing!
    recorder.checkpoint("after_process", {"data": data}, "After processing")
    
    # Step 3: This fails because processed is False
    if not data["processed"]:
        recorder.checkpoint("error", {"data": data}, "ERROR: Data not processed!")
        raise ValueError("Data was not processed correctly!")
    
    return "Success"

# Run and catch the error
session_debug = recorder.start_session(agent_id="buggy_agent")
try:
    await kernel.execute(buggy_agent, "test task")
except ValueError as e:
    print(f"❌ Error: {e}")
finally:
    recorder.stop_session()

In [None]:
# Debug: Replay checkpoints to find the bug
print("🔍 Debugging the failure...")
print("=" * 60)

checkpoints = recorder.list_checkpoints(session_debug)

for cp in checkpoints:
    print(f"\n📍 Checkpoint: {cp.name}")
    print(f"   Reasoning: {cp.reasoning}")
    
    if 'data' in cp.state:
        print(f"   data['processed'] = {cp.state['data'].get('processed')}")

print("\n💡 Bug found! 'processed' is never set to True")
print("   Fix: Add 'data[\"processed\"] = True' after processing")
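
Reading the timeline by eye works for four checkpoints; for longer traces, the same search can be automated by stating an invariant and finding the first checkpoint that breaks it. The sketch below is self-contained and uses plain `(name, state)` tuples that mirror the `buggy_agent` run; `first_violation` and `processed_after_processing` are hypothetical helpers, not part of `agent_os`.

```python
from typing import Callable, Optional

Timeline = list[tuple[str, dict]]


def first_violation(
    timeline: Timeline, invariant: Callable[[str, dict], bool]
) -> Optional[str]:
    """Return the name of the first checkpoint where the invariant fails."""
    for name, state in timeline:
        if not invariant(name, state):
            return name
    return None


# Hand-written copy of the states recorded by the buggy run above.
timeline: Timeline = [
    ("start", {"task": "test task"}),
    ("process", {"data": {"input": "test task", "processed": False}}),
    ("after_process", {"data": {"input": "test task", "processed": False}}),
    ("error", {"data": {"input": "test task", "processed": False}}),
]


def processed_after_processing(name: str, state: dict) -> bool:
    # Checkpoints taken before processing finishes are exempt.
    if name in {"start", "process"}:
        return True
    return state["data"]["processed"] is True


print(first_violation(timeline, processed_after_processing))
```

The first failing checkpoint narrows the bug to the code between it and the previous checkpoint, here the missing assignment between `process` and `after_process`.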

## Step 9: Export for Analysis

In [None]:
# Export session data for external analysis
export_data = recorder.export_session(session_id, format="json")

# Save to file
with open("debug_export.json", "w") as f:
    f.write(export_data)

print("📤 Exported to debug_export.json")
print("   Can be loaded into visualization tools or analyzed offline")
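
Offline analysis can start with nothing more than the `json` module. The export schema is not shown in this notebook, so the sketch below assumes a top-level `checkpoints` list whose entries have a `name` field; that layout and the sample data are assumptions made for illustration, and the real file may be structured differently.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

# Made-up export with an ASSUMED schema; the real layout may differ.
assumed_export = {
    "session_id": "demo",
    "checkpoints": [
        {"name": "parse_input"},
        {"name": "analyze_content"},
        {"name": "parse_input"},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "debug_export.json"
    path.write_text(json.dumps(assumed_export), encoding="utf-8")

    # Offline analysis: load the file and count checkpoints by name.
    loaded = json.loads(path.read_text(encoding="utf-8"))
    counts = Counter(cp["name"] for cp in loaded["checkpoints"])

print(counts.most_common())
```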

## Cleanup

In [None]:
import os
import shutil

# Remove demo files
if os.path.exists("./flight_data"):
    shutil.rmtree("./flight_data")
    print("🗑️  Removed flight_data/")

if os.path.exists("debug_export.json"):
    os.remove("debug_export.json")
    print("🗑️  Removed debug_export.json")

---

## Summary

| Feature | What It Does |
|---------|-------------|
| `FlightRecorder` | Records agent execution |
| `checkpoint()` | Saves state at a decision point |
| `replay()` | Retrieves any recorded checkpoint |
| `compare_sessions()` | Diffs two execution paths |
| `export_session()` | Export for external analysis |

### Quick Reference

```python
from agent_os import KernelSpace
from agent_os.flight_recorder import FlightRecorder

# Setup
kernel = KernelSpace(policy="strict")
recorder = FlightRecorder(storage_path="./data")
kernel.attach_recorder(recorder)

# Record
session = recorder.start_session(agent_id="my_agent")
recorder.checkpoint("name", {"state": "data"}, "reasoning")
recorder.stop_session()

# Replay
cp = recorder.replay(session, checkpoint_name="name")
print(cp.state, cp.reasoning)

# Compare (session_a and session_b are IDs returned by start_session)
diff = recorder.compare_sessions(session_a, session_b)
```

---

## Next Steps

- [04-cross-model-verification](04-cross-model-verification.ipynb) - Detect hallucinations
- [05-multi-agent-coordination](05-multi-agent-coordination.ipynb) - Trust between agents
- [Flight Recorder Documentation](../docs/flight-recorder.md)