
feat(agent): add sliding replay window and configurable self-verification #2

Merged
simonovic86 merged 8 commits into main from claude/loving-mcclintock
Mar 1, 2026

@simonovic86 simonovic86 (Owner) commented Mar 1, 2026

Summary

  • Replace single-tick replay state with a sliding window of TickSnapshots (default: 16), enabling verification of any recent tick rather than only the most recent one
  • Add --replay-window and --verify-interval CLI flags to configure replay retention size and verification frequency
  • Decouple self-verification from the checkpoint timer — verification now runs every N ticks (default: 5) and sweeps the window oldest-first
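The sliding window described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the TickSnapshot fields, the method names (Add, Latest, Ticks), and the slice-based storage are all assumptions. Only the FIFO eviction and the oldest-first ordering come from the summary.

```go
package main

import "fmt"

// TickSnapshot is a hypothetical stand-in for the PR's snapshot type;
// the real struct's fields are not shown in the PR description.
type TickSnapshot struct {
	Tick     uint64
	PreState []byte
}

// ReplayWindow retains at most size snapshots, evicting the oldest first.
type ReplayWindow struct {
	size  int
	snaps []TickSnapshot
}

// NewReplayWindow clamps size to at least 1 (an assumption of this sketch).
func NewReplayWindow(size int) *ReplayWindow {
	if size < 1 {
		size = 1
	}
	return &ReplayWindow{size: size}
}

// Add appends a snapshot and drops entries from the front once the window is full.
func (w *ReplayWindow) Add(s TickSnapshot) {
	w.snaps = append(w.snaps, s)
	if len(w.snaps) > w.size {
		w.snaps = w.snaps[len(w.snaps)-w.size:]
	}
}

// Latest returns the newest snapshot, or false when the window is empty.
func (w *ReplayWindow) Latest() (TickSnapshot, bool) {
	if len(w.snaps) == 0 {
		return TickSnapshot{}, false
	}
	return w.snaps[len(w.snaps)-1], true
}

// Ticks lists retained tick numbers oldest-first, the order a sweep would visit them.
func (w *ReplayWindow) Ticks() []uint64 {
	out := make([]uint64, 0, len(w.snaps))
	for _, s := range w.snaps {
		out = append(out, s.Tick)
	}
	return out
}

func main() {
	w := NewReplayWindow(3)
	for t := uint64(1); t <= 5; t++ {
		w.Add(TickSnapshot{Tick: t})
	}
	fmt.Println(w.Ticks()) // [3 4 5]
}
```

With a window of 3 and 5 ticks, ticks 1 and 2 are evicted, which mirrors the scenario named in the test plan.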

Test plan

  • make check passes (fmt, vet, lint, 52 tests)
  • TestReplayWindow_Eviction — verifies FIFO eviction with window size 3 over 5 ticks
  • TestLatestSnapshot — verifies accessor for empty and populated windows
  • TestTick_RecordObservations — verifies snapshot storage in ReplayWindow
  • Migration replay tests updated to use new ReplayWindow API
  • Multi-node integration test passes with replay data verification
  • Config defaults test covers new ReplayWindowSize and VerifyInterval fields

🤖 Generated with Claude Code

simonovic86 and others added 8 commits March 1, 2026 22:38
Implement deterministic single-tick replay verification (CM-4) with a new
internal/replay package. The replay engine creates an isolated wazero
sandbox with replay-mode hostcalls that return recorded observation values,
resumes from a checkpoint, executes one tick, and compares the resulting
state against the expected post-tick checkpoint.

Wire replay into the live tick loop: each tick captures pre/post state and
the sealed event log, and the periodic checkpoint (every 5s) triggers
replay verification of the last tick. Divergences are logged but do not
halt execution (EI-6: Safety Over Liveness).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
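
The replay idea in the commit message above (hostcalls that return recorded values, then a comparison against the expected post-tick state) might look like this in miniature. It is a toy model under stated assumptions: the Observations interface, the recorded type, and the tick function are hypothetical, and no wazero sandbox is involved.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// Observations is what a tick may ask the host for. The name and the
// single-method shape are assumptions made for this sketch.
type Observations interface {
	ClockNow() (int64, error)
}

// recorded replays values captured during the live tick, in order.
type recorded struct {
	values []int64
	next   int
}

func (r *recorded) ClockNow() (int64, error) {
	if r.next >= len(r.values) {
		return 0, errors.New("replay: event log exhausted")
	}
	v := r.values[r.next]
	r.next++
	return v, nil
}

// tick is a toy deterministic agent step: it appends the low byte of the
// observed clock value to a copy of the state.
func tick(state []byte, obs Observations) ([]byte, error) {
	now, err := obs.ClockNow()
	if err != nil {
		return nil, err
	}
	return append(append([]byte{}, state...), byte(now)), nil
}

// verify re-executes one tick from preState against the recorded log and
// reports whether the result matches the expected post-tick state.
func verify(preState, wantPost []byte, log []int64) (bool, error) {
	got, err := tick(preState, &recorded{values: log})
	if err != nil {
		return false, err
	}
	return bytes.Equal(got, wantPost), nil
}

func main() {
	ok, err := verify([]byte{1}, []byte{1, 42}, []int64{42})
	fmt.Println(ok, err) // true <nil>
}
```

Because the tick only sees recorded values, the same pre-state and log always produce the same post-state, which is what makes the comparison meaningful.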
Source node includes replay data (pre-tick state + event log) in the
migration package. Target node replays the last tick in an isolated
sandbox before accepting the agent — if the replayed state diverges
from the checkpoint, the migration is rejected.

Adds ReplayData/ReplayEntry protocol types, staleness guard to ensure
replay data matches the stored checkpoint, and ExtractAgentState helper
for v1 checkpoint parsing. Backward compatible: nil replay data means
verification is skipped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Spin up two in-process libp2p nodes, migrate an agent between them,
and verify the full survival chain: tick → checkpoint → migrate →
replay-verify → resume → continue ticking.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Richer example agent that tracks tick count, birth time, uptime, and
a running luck value from random bytes. Demonstrates all three
observation hostcalls with CM-4 replay-compatible lifecycle functions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion

Replace single-tick replay state with a sliding window of TickSnapshots,
enabling verification of any recent tick. Add --replay-window and
--verify-interval CLI flags to configure retention size and verification
frequency. Decouple verification from checkpoint timer to run on a
tick counter instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
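
For illustration, the two flags and the tick-counter trigger could be wired up along these lines using the standard flag package. The flag names and defaults (16 and 5) come from this PR; the Options struct, parseOptions, shouldVerify, and the choice to treat a non-positive interval as "disabled" are assumptions of this sketch.

```go
package main

import (
	"flag"
	"fmt"
)

// Options holds the two settings named in the commit message.
// The struct and helper names are hypothetical.
type Options struct {
	ReplayWindowSize int
	VerifyInterval   int
}

// parseOptions parses args with the defaults stated in the PR summary.
func parseOptions(args []string) (Options, error) {
	var o Options
	fs := flag.NewFlagSet("igord", flag.ContinueOnError)
	fs.IntVar(&o.ReplayWindowSize, "replay-window", 16, "number of tick snapshots to retain")
	fs.IntVar(&o.VerifyInterval, "verify-interval", 5, "run self-verification every N ticks")
	if err := fs.Parse(args); err != nil {
		return Options{}, err
	}
	return o, nil
}

// shouldVerify reports whether verification is due at this tick count.
// Treating interval <= 0 as disabled is an assumption, not documented behavior.
func shouldVerify(tick uint64, interval int) bool {
	return interval > 0 && tick > 0 && tick%uint64(interval) == 0
}

func main() {
	o, err := parseOptions([]string{"--verify-interval", "2"})
	if err != nil {
		panic(err)
	}
	for t := uint64(1); t <= 4; t++ {
		fmt.Println(t, shouldVerify(t, o.VerifyInterval))
	}
}
```

Driving verification from a tick counter rather than a wall-clock timer keeps the verification cadence proportional to the work actually done.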
@simonovic86 simonovic86 changed the title from "Claude/loving mcclintock" to "feat(agent): add sliding replay window and configurable self-verification" on Mar 1, 2026
@simonovic86 simonovic86 merged commit 0990ad1 into main Mar 1, 2026
1 check passed
@simonovic86 simonovic86 deleted the claude/loving-mcclintock branch March 1, 2026 22:45

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 92bc522ad7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +128 to +129
if _, err := tickFn.Call(ctx); err != nil {
	result.Error = fmt.Errorf("replay: agent_tick: %w", err)

[P1] Bound replay tick execution with a timeout

ReplayTick invokes agent_tick with tickFn.Call(ctx) but never creates a deadline-bound context, so replay duration is fully unbounded by default. In the migration flow, this is called from handleIncomingMigration using context.Background(), which means a transferred agent whose agent_tick hangs can block replay verification indefinitely and tie up the migration handler. Use a bounded timeout context for replay ticks (matching live tick policy) to prevent migration DoS from malformed or malicious packages.

Comment on lines +194 to +198
// Capture pre-tick state for replay verification
preState, err := i.captureState(ctx)
if err != nil {
	return fmt.Errorf("pre-tick checkpoint failed: %w", err)
}

[P1] Enforce timeout and metering for replay snapshot captures

Instance.Tick now captures pre/post replay states via captureState outside the timed agent_tick section, and these calls run on the parent context rather than the 100ms tick timeout. Since agent_checkpoint is guest-provided code, an agent can do expensive work (or loop) during checkpoint capture and bypass both timeout enforcement and budget charging, undermining execution safety and billing invariants. Snapshot capture should be deadline-bound and included in per-tick cost accounting.

simonovic86 added a commit that referenced this pull request Mar 3, 2026
…eplay+metering improvements

- Add -race flag to test pipeline and CI job timeout (15min)
- Fix MigrateAgent EI-1 race: compare-and-delete prevents removing
  a concurrently registered instance for the same agent ID
- Add 2-minute context timeout and 30s stream read deadline to
  incoming migration handler to prevent resource exhaustion
- Cache WASM compilation in replay engine via wazero.CompilationCache
  shared across ReplayTick invocations (IMPROVEMENTS #1)
- Store PostStateHash [32]byte instead of full post-state copy in
  TickSnapshot, halving snapshot memory (IMPROVEMENTS #2)
- Use nanosecond precision for tick metering so sub-microsecond ticks
  are no longer free (IMPROVEMENTS #10)
- Replace custom bytesEqual with bytes.Equal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
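
The compare-and-delete pattern mentioned in the commit message above can be illustrated as follows. Registry, Instance, register, and deleteIfSame are hypothetical names; the actual MigrateAgent code is not shown on this page.

```go
package main

import (
	"fmt"
	"sync"
)

// Instance and Registry are hypothetical stand-ins for the node's agent table.
type Instance struct{ ID string }

type Registry struct {
	mu     sync.Mutex
	agents map[string]*Instance
}

// register stores inst under id, replacing any existing entry.
func (r *Registry) register(id string, inst *Instance) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.agents[id] = inst
}

// deleteIfSame removes the entry for id only when it still points at inst,
// so a concurrently registered replacement is left untouched.
func (r *Registry) deleteIfSame(id string, inst *Instance) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if cur, ok := r.agents[id]; ok && cur == inst {
		delete(r.agents, id)
		return true
	}
	return false
}

func main() {
	old := &Instance{ID: "a1"}
	r := &Registry{agents: map[string]*Instance{}}
	r.register("a1", old)
	// Another registration for the same agent ID lands before cleanup runs.
	r.register("a1", &Instance{ID: "a1"})
	fmt.Println(r.deleteIfSame("a1", old), len(r.agents)) // false 1
}
```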
simonovic86 added a commit that referenced this pull request Mar 3, 2026
…eplay+metering improvements (#4)

- Add -race flag to test pipeline and CI job timeout (15min)
- Fix MigrateAgent EI-1 race: compare-and-delete prevents removing
  a concurrently registered instance for the same agent ID
- Add 2-minute context timeout and 30s stream read deadline to
  incoming migration handler to prevent resource exhaustion
- Cache WASM compilation in replay engine via wazero.CompilationCache
  shared across ReplayTick invocations (IMPROVEMENTS #1)
- Store PostStateHash [32]byte instead of full post-state copy in
  TickSnapshot, halving snapshot memory (IMPROVEMENTS #2)
- Use nanosecond precision for tick metering so sub-microsecond ticks
  are no longer free (IMPROVEMENTS #10)
- Replace custom bytesEqual with bytes.Equal

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
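
To make the PostStateHash change concrete, here is a small sketch of storing a fixed-size digest instead of a full post-state copy. SHA-256 is an assumption (the commit message only gives the [32]byte type), and the snapshot and matches helpers are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// TickSnapshot keeps the full pre-state (needed to re-run the tick) but only
// a 32-byte digest of the post-state (needed only for comparison).
type TickSnapshot struct {
	PreState      []byte
	PostStateHash [32]byte
}

// snapshot copies pre and hashes post. SHA-256 is assumed for this sketch.
func snapshot(pre, post []byte) TickSnapshot {
	return TickSnapshot{
		PreState:      append([]byte(nil), pre...),
		PostStateHash: sha256.Sum256(post),
	}
}

// matches reports whether a replayed post-state hashes to the stored digest.
// Fixed-size arrays are comparable with ==, so no helper is needed.
func (s TickSnapshot) matches(replayed []byte) bool {
	return sha256.Sum256(replayed) == s.PostStateHash
}

func main() {
	s := snapshot([]byte("pre"), []byte("post"))
	fmt.Println(s.matches([]byte("post")), s.matches([]byte("other"))) // true false
}
```

When pre- and post-state are of similar size, replacing the post-state copy with a digest roughly halves the per-snapshot memory, which is consistent with the claim in the commit message.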
simonovic86 added a commit that referenced this pull request Mar 5, 2026
Replace MustInstantiate with error-returning Instantiate across all WASM
init sites. Extract shared tick timeout constant to config.TickTimeout.
Unify manifest sidecar loading into pkg/manifest.LoadSidecarData. Fix CI
to install TinyGo before tests so WASM integration tests run. Add test
coverage reporting. Add validateIncomingManifest and LoadSidecarData tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
simonovic86 added a commit that referenced this pull request Mar 5, 2026
* fix: code structure, docs accuracy, and test coverage

Extract tick loop logic from cmd/igord/main.go into internal/runner for
testability. Deduplicate captureState/replayResume (3 copies) into
internal/wasmutil. Update HOSTCALL_ABI.md from "Design Draft" to reflect
implemented state. Add WASM hash mismatch and receipt corruption tests.
Flag unimplemented spec items (CM-5, OA-2, EI-11) in status doc.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: code quality, security hardening, and test coverage (#2)

Replace MustInstantiate with error-returning Instantiate across all WASM
init sites. Extract shared tick timeout constant to config.TickTimeout.
Unify manifest sidecar loading into pkg/manifest.LoadSidecarData. Fix CI
to install TinyGo before tests so WASM integration tests run. Add test
coverage reporting. Add validateIncomingManifest and LoadSidecarData tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>