fix(runtime): harden Phase 3 — validate bounds, fix leaks, add tests#7
simonovic86 merged 1 commit into main
Hardening pass before Phase 4 (Economics):

- Validate hostcall bounds: cap rand_bytes at 64MB and log_emit at 1MB to prevent OOM panics (EI-6), check malloc null pointer return
- Reject negative budget/price in checkpoint parsing (RE-3), guard int64 overflow in cost calculation
- Fix migration: return error on stale checkpoint deletion failure (EI-1), hold lock during instance Close to prevent use-after-close
- Plug memory leaks: copy-to-new-slice on eventlog and replay window eviction instead of slice truncation
- Clean stale .tmp files on FSProvider startup (RE-1 crash recovery)
- Add config validation (PricePerSecond, ReplayWindowSize, ReplayMode)
- Remove unused protocol types, consolidate logging in cmd/igord
- Add 13 new tests covering all hardening fixes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
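The "copy-to-new-slice" leak fix mentioned in the commit message can be illustrated with a minimal Go sketch. The function name evictOldest and the [][]byte window type are hypothetical and are not taken from the repository; the point is only that reslicing keeps the old backing array alive, while copying into a fresh slice lets the garbage collector reclaim it.

```go
package main

import "fmt"

// evictOldest keeps only the newest `keep` entries of window.
// It copies them into a freshly allocated slice so the old backing
// array, and everything the evicted entries point to, can be collected.
// (Hypothetical helper; the real eviction code may differ.)
func evictOldest(window [][]byte, keep int) [][]byte {
	if keep < 0 {
		keep = 0
	}
	if len(window) <= keep {
		return window
	}
	fresh := make([][]byte, keep)
	copy(fresh, window[len(window)-keep:])
	return fresh
}

func main() {
	window := make([][]byte, 0, 8)
	for i := 0; i < 8; i++ {
		window = append(window, []byte{byte(i)})
	}

	// Reslicing (window = window[6:]) would keep the full 8-slot array
	// reachable. Copying gives the survivors their own 2-slot array.
	window = evictOldest(window, 2)
	fmt.Println(len(window), cap(window), window[0][0], window[1][0])
}
```

The copy costs one small allocation per eviction, which is usually an acceptable trade for a bounded heap in a long-running process.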
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 81aa9bbd91
    if length == 0 {
        return 0
Record zero-length rand calls for deterministic replay
Returning early when length == 0 skips eventLog.Record, but replay always consumes one RandBytes entry per rand_bytes invocation (registerReplayRand in internal/replay/engine.go calls iter.next unconditionally). In the replay modes (on-migrate/full), any tick that calls rand_bytes(ptr, 0) therefore shifts the replay iterator and produces a false divergence/rejection even though execution was valid on the source node.
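The iterator shift described in this comment can be reproduced with a small self-contained Go sketch. The types and functions here (entry, recordRand, firstDivergence) are hypothetical stand-ins and not the project's actual event log or replay engine; the only assumption carried over from the review is that replay consumes exactly one recorded entry per rand_bytes call.

```go
package main

import "fmt"

// entry is a hypothetical event-log record for one rand_bytes call.
type entry struct{ data []byte }

// recordRand appends one entry per call. With skipEmpty set, it mimics the
// early return under review and records nothing when the request is empty.
func recordRand(log []entry, data []byte, skipEmpty bool) []entry {
	if len(data) == 0 && skipEmpty {
		return log
	}
	return append(log, entry{data: append([]byte(nil), data...)})
}

// firstDivergence replays calls with the given requested lengths, consuming
// one entry per call. It returns the index of the first call whose recorded
// entry is missing or has the wrong length, or -1 if replay matches.
func firstDivergence(log []entry, lengths []int) int {
	next := 0
	for i, n := range lengths {
		if next >= len(log) || len(log[next].data) != n {
			return i
		}
		next++
	}
	return -1
}

func main() {
	lengths := []int{4, 0, 2} // the second call is rand_bytes(ptr, 0)

	var buggy, fixed []entry
	for _, n := range lengths {
		buggy = recordRand(buggy, make([]byte, n), true)
		fixed = recordRand(fixed, make([]byte, n), false)
	}

	// prints: 1 -1
	fmt.Println(firstDivergence(buggy, lengths), firstDivergence(fixed, lengths))
}
```

In the skipping variant the log holds two entries for three calls, so the zero-length call reads the entry that belongs to the call after it.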
    if length == 0 || length > maxLogBytes {
        return
Preserve log_emit entries when dropping oversized or empty logs
This early return drops the log_emit call from the event log, but replay consumes one LogEmit entry for every hostcall invocation (registerReplayLog in internal/replay/engine.go calls iter.next on each call). In replay-enabled flows, agents that emit empty logs or logs above the cap will leave entries missing from the log and trigger deterministic replay failures during verification/migration.
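One way to address this, sketched below under stated assumptions, is to record exactly one entry per log_emit call and mark rejected payloads instead of skipping them. The eventLog and logEntry types and the Dropped flag are hypothetical, and maxLogBytes is assumed to be 1 << 20 to match the 1 MB cap named in the PR; the merged fix may look different.

```go
package main

import "fmt"

// maxLogBytes mirrors the 1 MB cap from the PR description (assumed 1 << 20).
const maxLogBytes = 1 << 20

// logEntry is a hypothetical event-log record for one log_emit call.
type logEntry struct {
	Payload []byte
	Dropped bool // true when the payload was empty or above the cap
}

// eventLog is a hypothetical stand-in for the runtime's event log.
type eventLog struct{ entries []logEntry }

// emit handles one log_emit call. It always appends exactly one entry so
// that replay can consume one entry per call, and it reports whether the
// payload was accepted.
func (l *eventLog) emit(msg []byte) bool {
	if len(msg) == 0 || len(msg) > maxLogBytes {
		l.entries = append(l.entries, logEntry{Dropped: true})
		return false
	}
	l.entries = append(l.entries, logEntry{Payload: append([]byte(nil), msg...)})
	return true
}

func main() {
	l := &eventLog{}
	l.emit([]byte("hello"))
	l.emit(nil)                         // empty: rejected but still recorded
	l.emit(make([]byte, maxLogBytes+1)) // oversized: rejected but still recorded

	fmt.Println(len(l.entries)) // one entry per call
}
```

Storing a marker rather than the oversized payload keeps the memory bound that the cap was introduced for.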
Implements the remaining optimization items from IMPROVEMENTS.md:

- #9 Arena-backed event log allocation to reduce GC pressure
- #3 Observation-weighted snapshot retention (replaces FIFO eviction)
- #5 Configurable replay divergence escalation (log/pause/intensify/migrate)
- #4 Multi-tick chain replay verification (N ticks in single wazero instance)
- #7 SDK checkpoint serialization helpers (Encoder/Decoder with chainable API)
- #6 Adaptive tick rate with agent hint (Tick() returns bool, 10ms/1s intervals)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
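Item #6 can be pictured as a scheduler that picks the next delay from the agent's hint. The sketch below assumes that a true return from Tick() means "more work is pending" and maps it to the 10ms interval, with false mapping to 1s; the commit message gives only the two intervals and the bool return, so that mapping and the name nextInterval are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

const (
	fastInterval = 10 * time.Millisecond // used while the agent reports pending work
	slowInterval = time.Second           // used while the agent is idle
)

// nextInterval maps the agent's hint to the delay before the next tick.
// Assumption: true means "more work pending", so the runtime ticks sooner.
func nextInterval(moreWork bool) time.Duration {
	if moreWork {
		return fastInterval
	}
	return slowInterval
}

func main() {
	// Hints as they might be returned by three consecutive ticks.
	hints := []bool{true, true, false}

	var total time.Duration
	for _, hint := range hints {
		total += nextInterval(hint)
	}
	fmt.Println("time spent waiting:", total)
}
```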
Changes
Input Validation & Bounds Checking
- Add a Validate() method to enforce positive PricePerSecond, non-negative ReplayWindowSize/VerifyInterval, and valid ReplayMode values
- Cap the log_emit buffer to 1 MB and rand_bytes to 64 MB; return error code -2 for oversized requests

Memory Leak Fixes

- […] EventLog.SealTick() and Instance.Tick() to release references and free underlying arrays
- Add cleanStaleTempFiles() to FSProvider.NewFSProvider() to remove leftover .tmp checkpoint files from interrupted writes

Migration & Error Handling

- Fix MigrateAgent() lock handling and ensure Close is called under lock
- […] Instance.Resume()

Testing

- […] AgentPackage, AgentTransfer, and AgentStarted

Code Quality

- […] logging.Info/Error to direct logger methods
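The config validation described under Input Validation & Bounds Checking might look roughly like the following Go sketch. The field types are guesses, and the accepted ReplayMode values are assumptions: "on-migrate" and "full" appear in the review comments on this PR, while "off" is added here purely for illustration.

```go
package main

import "fmt"

// Config is a hypothetical subset of the runtime configuration.
// Field names follow the PR summary; the types are assumptions.
type Config struct {
	PricePerSecond   int64
	ReplayWindowSize int
	VerifyInterval   int
	ReplayMode       string
}

// Validate returns the first rule violation it finds, or nil if the
// configuration satisfies all rules listed in the PR summary.
func (c Config) Validate() error {
	if c.PricePerSecond <= 0 {
		return fmt.Errorf("PricePerSecond must be positive, got %d", c.PricePerSecond)
	}
	if c.ReplayWindowSize < 0 {
		return fmt.Errorf("ReplayWindowSize must be non-negative, got %d", c.ReplayWindowSize)
	}
	if c.VerifyInterval < 0 {
		return fmt.Errorf("VerifyInterval must be non-negative, got %d", c.VerifyInterval)
	}
	switch c.ReplayMode {
	case "off", "on-migrate", "full": // "off" is an assumed value
		return nil
	default:
		return fmt.Errorf("unknown ReplayMode %q", c.ReplayMode)
	}
}

func main() {
	good := Config{PricePerSecond: 100, ReplayWindowSize: 16, ReplayMode: "full"}
	bad := Config{PricePerSecond: -1, ReplayMode: "full"}

	fmt.Println(good.Validate())
	fmt.Println(bad.Validate())
}
```

Calling Validate() once at startup, before any agent is loaded, turns a bad setting into an immediate error instead of a failure later at runtime.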