feat(agent): add sliding replay window and configurable self-verification #2

simonovic86 merged 8 commits into main
Conversation
Implement deterministic single-tick replay verification (CM-4) with a new internal/replay package. The replay engine creates an isolated wazero sandbox with replay-mode hostcalls that return recorded observation values, resumes from a checkpoint, executes one tick, and compares the resulting state against the expected post-tick checkpoint. Wire replay into the live tick loop: each tick captures pre/post state and the sealed event log, and the periodic checkpoint (every 5s) triggers replay verification of the last tick. Divergences are logged but do not halt execution (EI-6: Safety Over Liveness). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
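The comparison step at the heart of this can be sketched as follows. `ReplayResult` and `verifyTick` are hypothetical names standing in for the internal/replay API, which this sketch does not reproduce:

```go
package main

import (
	"bytes"
	"fmt"
)

// ReplayResult is a hypothetical result type for one replay verification.
type ReplayResult struct {
	Diverged bool
}

// verifyTick compares the state produced by re-executing one tick in the
// isolated replay sandbox against the expected post-tick checkpoint.
// Per EI-6 (Safety Over Liveness), divergence is reported, not fatal.
func verifyTick(replayedState, expectedState []byte) ReplayResult {
	return ReplayResult{Diverged: !bytes.Equal(replayedState, expectedState)}
}

func main() {
	fmt.Println(verifyTick([]byte{1, 2}, []byte{1, 2}).Diverged) // false
}
```

A diverged result is logged by the caller rather than halting the tick loop, matching the behavior described above.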
Source node includes replay data (pre-tick state + event log) in the migration package. Target node replays the last tick in an isolated sandbox before accepting the agent — if the replayed state diverges from the checkpoint, the migration is rejected. Adds ReplayData/ReplayEntry protocol types, staleness guard to ensure replay data matches the stored checkpoint, and ExtractAgentState helper for v1 checkpoint parsing. Backward compatible: nil replay data means verification is skipped. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
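A rough sketch of the protocol types and the staleness guard; field names here are illustrative assumptions, not the actual `ReplayData`/`ReplayEntry` definitions:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// ReplayEntry records one observation hostcall result for replay.
type ReplayEntry struct {
	Hostcall string // which observation hostcall was recorded
	Value    []byte // recorded return value fed back during replay
}

// ReplayData travels inside the migration package.
type ReplayData struct {
	PreTickState []byte        // agent state before the last tick
	EventLog     []ReplayEntry // sealed observation log for that tick
}

// checkStaleness rejects replay data whose pre-tick state does not match
// the checkpoint the target node actually stored. A nil ReplayData skips
// verification for backward compatibility.
func checkStaleness(rd *ReplayData, storedCheckpoint []byte) error {
	if rd == nil {
		return nil
	}
	if !bytes.Equal(rd.PreTickState, storedCheckpoint) {
		return errors.New("replay data stale: pre-tick state does not match stored checkpoint")
	}
	return nil
}

func main() {
	cp := []byte("checkpoint-v1")
	fmt.Println(checkStaleness(&ReplayData{PreTickState: cp}, cp) == nil) // true
}
```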
Spin up two in-process libp2p nodes, migrate an agent between them, and verify the full survival chain: tick → checkpoint → migrate → replay-verify → resume → continue ticking. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Richer example agent that tracks tick count, birth time, uptime, and a running luck value from random bytes. Demonstrates all three observation hostcalls with CM-4 replay-compatible lifecycle functions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion Replace single-tick replay state with a sliding window of TickSnapshots, enabling verification of any recent tick. Add --replay-window and --verify-interval CLI flags to configure retention size and verification frequency. Decouple verification from checkpoint timer to run on a tick counter instead. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
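The sliding window described above might look like the following minimal sketch; the real `TickSnapshot` carries more fields, and the method names are assumptions:

```go
package main

import "fmt"

// TickSnapshot captures what one tick needs for later replay verification.
type TickSnapshot struct {
	Tick     uint64
	PreState []byte
}

// ReplayWindow retains the last max snapshots, evicting FIFO.
type ReplayWindow struct {
	max   int
	items []TickSnapshot
}

func NewReplayWindow(size int) *ReplayWindow { return &ReplayWindow{max: size} }

func (w *ReplayWindow) Record(s TickSnapshot) {
	w.items = append(w.items, s)
	if len(w.items) > w.max {
		w.items = w.items[1:] // drop the oldest snapshot
	}
}

func (w *ReplayWindow) Len() int { return len(w.items) }

func (w *ReplayWindow) Latest() (TickSnapshot, bool) {
	if len(w.items) == 0 {
		return TickSnapshot{}, false
	}
	return w.items[len(w.items)-1], true
}

func main() {
	w := NewReplayWindow(3) // --replay-window 3
	for t := uint64(1); t <= 5; t++ {
		w.Record(TickSnapshot{Tick: t})
	}
	last, _ := w.Latest()
	fmt.Println(w.Len(), last.Tick) // 3 5
}
```

Decoupling verification from the checkpoint timer then reduces to a counter check in the tick loop, e.g. `if tickCount%verifyInterval == 0 { … }` driven by `--verify-interval`.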
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92bc522ad7
```go
if _, err := tickFn.Call(ctx); err != nil {
	result.Error = fmt.Errorf("replay: agent_tick: %w", err)
```
Bound replay tick execution with a timeout
ReplayTick invokes agent_tick with tickFn.Call(ctx) but never creates a deadline-bound context, so replay duration is fully unbounded by default. In the migration flow, this is called from handleIncomingMigration using context.Background(), which means a transferred agent whose agent_tick hangs can block replay verification indefinitely and tie up the migration handler. Use a bounded timeout context for replay ticks (matching live tick policy) to prevent migration DoS from malformed or malicious packages.
```go
// Capture pre-tick state for replay verification
preState, err := i.captureState(ctx)
if err != nil {
	return fmt.Errorf("pre-tick checkpoint failed: %w", err)
}
```
Enforce timeout and metering for replay snapshot captures
Instance.Tick now captures pre/post replay states via captureState outside the timed agent_tick section, and these calls run on the parent context rather than the 100ms tick timeout. Since agent_checkpoint is guest-provided code, an agent can do expensive work (or loop) during checkpoint capture and bypass both timeout enforcement and budget charging, undermining execution safety and billing invariants. Snapshot capture should be deadline-bound and included in per-tick cost accounting.
…eplay+metering improvements (#4)

- Add -race flag to test pipeline and CI job timeout (15 min)
- Fix MigrateAgent EI-1 race: compare-and-delete prevents removing a concurrently registered instance for the same agent ID
- Add 2-minute context timeout and 30s stream read deadline to the incoming migration handler to prevent resource exhaustion
- Cache WASM compilation in the replay engine via a wazero.CompilationCache shared across ReplayTick invocations (IMPROVEMENTS #1)
- Store a PostStateHash [32]byte instead of a full post-state copy in TickSnapshot, halving snapshot memory (IMPROVEMENTS #2)
- Use nanosecond precision for tick metering so sub-microsecond ticks are no longer free (IMPROVEMENTS #10)
- Replace custom bytesEqual with bytes.Equal

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
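The PostStateHash change can be illustrated with a small sketch: store a 32-byte digest of the post-tick state instead of the state itself, and verify a replay by hashing its result. Field names are illustrative, not the repo's exact definitions:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// TickSnapshot keeps the full pre-state (needed to seed replay) but only
// a digest of the post-state, roughly halving snapshot memory.
type TickSnapshot struct {
	PreState      []byte
	PostStateHash [32]byte // sha256 of the post-tick state
}

func newSnapshot(pre, post []byte) TickSnapshot {
	return TickSnapshot{PreState: pre, PostStateHash: sha256.Sum256(post)}
}

// matches reports whether a replayed post-state reproduces the recorded hash.
func (s TickSnapshot) matches(replayedPost []byte) bool {
	return sha256.Sum256(replayedPost) == s.PostStateHash
}

func main() {
	s := newSnapshot([]byte("pre"), []byte("post"))
	fmt.Println(s.matches([]byte("post")), s.matches([]byte("diverged")))
}
```

Since `[32]byte` is comparable, the divergence check becomes a plain `==` on fixed-size arrays rather than a byte-slice comparison over the full state.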
Replace MustInstantiate with error-returning Instantiate across all WASM init sites. Extract shared tick timeout constant to config.TickTimeout. Unify manifest sidecar loading into pkg/manifest.LoadSidecarData. Fix CI to install TinyGo before tests so WASM integration tests run. Add test coverage reporting. Add validateIncomingManifest and LoadSidecarData tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: code structure, docs accuracy, and test coverage

  Extract tick loop logic from cmd/igord/main.go into internal/runner for testability. Deduplicate captureState/replayResume (3 copies) into internal/wasmutil. Update HOSTCALL_ABI.md from "Design Draft" to reflect the implemented state. Add WASM hash mismatch and receipt corruption tests. Flag unimplemented spec items (CM-5, OA-2, EI-11) in the status doc.

* fix: code quality, security hardening, and test coverage (#2)

  Replace MustInstantiate with error-returning Instantiate across all WASM init sites. Extract the shared tick timeout constant to config.TickTimeout. Unify manifest sidecar loading into pkg/manifest.LoadSidecarData. Fix CI to install TinyGo before tests so WASM integration tests run. Add test coverage reporting. Add validateIncomingManifest and LoadSidecarData tests.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Replace the single-tick replay state with a sliding window of TickSnapshots (default: 16), enabling verification of any recent tick rather than only the most recent one
- Add `--replay-window` and `--verify-interval` CLI flags to configure replay retention size and verification frequency
- New `ReplayWindow` API; `ReplayWindowSize` and `VerifyInterval` fields

Test plan

- `make check` passes (fmt, vet, lint, 52 tests)
- `TestReplayWindow_Eviction`: verifies FIFO eviction with window size 3 over 5 ticks
- `TestLatestSnapshot`: verifies the accessor for empty and populated windows
- `TestTick_RecordObservations`: verifies snapshot storage in the `ReplayWindow`

🤖 Generated with Claude Code