perf(execution): raw wasm module bytes + eager userland compile #216
Merged
NathanFlurry merged 5 commits on Jul 2, 2026
…s via event

The sidecar drains process output before emitting process_exited, and the frame stream and event pump are both FIFO, so once the exit event is observed no trailing output can follow it. The host-side quiet-turn drain (2 turns at 10ms) now remains only for the snapshot-poll fallback exit path.

Also stop awaiting signal-state refreshes on the exec critical path (start + first output); kill paths await the in-flight refresh instead. Clean up the parked refresh in finishProcess so fallback exits release it too.

Warm host-driven exec p50: node -e '' 30ms -> ~17ms, wasm true 52ms -> 39ms; whole wasm-command-floor lane down ~13ms/row; bench:gate green. New regression test: 10 sequential fast-exit 64KiB stdout captures.
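The ordering argument above can be illustrated with a minimal sketch. It assumes a single FIFO channel carrying both output and the exit event; `ProcessEvent` and `collect_until_exit` are hypothetical names for illustration, not the sidecar's actual types. Because the producer sends all output before the exit event and the channel preserves order, the consumer can stop reading at the exit event without a follow-up drain.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical event type; the real frame/event types are not shown in this PR.
#[derive(Debug)]
pub enum ProcessEvent {
    Stdout(Vec<u8>),
    Exited(i32),
}

/// Collects stdout until the exit event arrives.
/// Returns the captured bytes and the exit code, or `None` for the code if the
/// channel closed before an exit event was seen.
pub fn collect_until_exit(events: &Receiver<ProcessEvent>) -> (Vec<u8>, Option<i32>) {
    let mut stdout = Vec::new();
    // `iter()` yields events in send order and ends when all senders are dropped.
    for event in events.iter() {
        match event {
            ProcessEvent::Stdout(chunk) => stdout.extend_from_slice(&chunk),
            // FIFO delivery plus "drain before exit" on the producer side means
            // nothing can follow this event, so no quiet-turn wait is needed.
            ProcessEvent::Exited(code) => return (stdout, Some(code)),
        }
    }
    (stdout, None)
}
```

The guarantee only holds when output and exit share one ordered stream, which is presumably why the snapshot-poll fallback path keeps the quiet-turn drain.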
Bake the constant wasm runner + wasi shim (~300KB of JS previously recompiled per exec) into the per-process V8 userland snapshot via guest_runtime.snapshot_userland_code, keyed and cached process-wide. Mode env AGENTOS_WASM_SNAPSHOT_RUNNER=auto|block|off: auto probes the snapshot cache without blocking (async warm kicked once per process) and falls back to the byte-identical inline runner until ready, so the cold path is unchanged. The sync-RPC glue is re-evaluated per exec (the snapshot-baked copy cannot bind session bridge fns).

Module bytes are now base64-encoded once per module in a bounded, fingerprint-validated cache (64 entries, warn on evict, debug log reports per-entry and cumulative cache bytes) instead of fs::read+encode per exec.

Adds AGENTOS_V8_SESSION_PHASES timers (snapshot_get / blob_clone / isolate_new / user_code_execute) and a measured NOTE against EagerCompile at snapshot build (moves cost to isolate deserialize).

Warm wasm floor p50: true 39->33ms, pwd 48->43ms, ls-empty 81->75ms, date-version 127->115ms (52/62/96/140 before PR #212); cold first-exec improved; wasm suite green; bench:gate green. Remaining floor is isolate-per-exec + module decode; pooling is tracked as Stage C.
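A rough sketch of what a bounded, fingerprint-validated cache of this kind might look like follows. Everything here is an assumption for illustration: the `Fingerprint` fields, the `ModuleCache` name, and the oldest-first eviction order are not taken from the PR, which only states the bound (64 entries), the warn-on-evict behaviour, and the byte accounting. The loader closure stands in for the read-and-encode step.

```rust
use std::collections::{HashMap, VecDeque};
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Hypothetical fingerprint; the PR does not say which file attributes are compared.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fingerprint {
    pub len: u64,
    pub modified_nanos: u128,
}

struct Entry {
    fingerprint: Fingerprint,
    encoded: Arc<String>,
}

pub struct ModuleCache {
    capacity: usize,
    entries: HashMap<PathBuf, Entry>,
    order: VecDeque<PathBuf>, // insertion order, oldest first (assumed policy)
    pub evictions: usize,
}

impl ModuleCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new(), order: VecDeque::new(), evictions: 0 }
    }

    /// Returns the cached payload when the fingerprint still matches;
    /// otherwise runs `load` once and stores the result.
    pub fn get_or_load<F>(&mut self, path: &Path, current: Fingerprint, load: F) -> Arc<String>
    where
        F: FnOnce() -> String,
    {
        if let Some(entry) = self.entries.get(path) {
            if entry.fingerprint == current {
                return Arc::clone(&entry.encoded);
            }
        }
        // Miss or stale fingerprint: reload and overwrite.
        let encoded = Arc::new(load());
        let entry = Entry { fingerprint: current, encoded: Arc::clone(&encoded) };
        if self.entries.insert(path.to_path_buf(), entry).is_none() {
            self.order.push_back(path.to_path_buf());
            if self.entries.len() > self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.entries.remove(&oldest);
                    self.evictions += 1;
                    eprintln!("warn: module cache evicted {}", oldest.display());
                }
            }
        }
        encoded
    }

    /// Cumulative size of all cached payloads, as a debug log might report it.
    pub fn total_bytes(&self) -> usize {
        self.entries.values().map(|e| e.encoded.len()).sum()
    }
}
```

Validating the fingerprint on every lookup keeps a rebuilt module from being served stale, at the cost of one metadata check per exec instead of a full read and encode.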
The filtered install (--filter @secure-exec/benchmarks...) omitted packages/build-tools, so every bench-gate run died in the v8-bridge build script (missing Node dependencies) before gating anything — the PR gate has never actually run green on CI.
Pre-created V8 session workers (thread + snapshot-restored isolate, keyed by snapshot digest + heap limit) are claimed at session create via a warm hint, taking isolate_new (~4.2ms) and blob_clone off the per-exec critical path. Workers are never reused across guests (each exec still gets a virgin isolate); a wrong hint falls back to the existing lazy-create path. Capacity per key via AGENTOS_V8_WARM_ISOLATES (default 2, 0 disables); refill after claim; pool transitions logged; warm_worker_hit/miss counters in the session phases output.

Seeding is deliberately conservative: auto mode stays snapshot-only and only AGENTOS_WASM_SNAPSHOT_RUNNER=block seeds workers (wasm-runner key + default node key). Reason: creating isolates on background threads while other isolates execute guest WebAssembly deterministically SIGSEGVs the pinned V8 130 (WasmCodePointerTable::AllocateUninitializedEntry inside Isolate::New, even under the isolate lifecycle lock; the race is against V8-internal wasm threads). Backtraces + notes in the rusty-v8-upgrade todo; enabling default seeding rides that upgrade.

Warm floors (block, this host): node -e '' 17 -> 14.4ms p50, wasm true 33 -> ~30ms (warm_worker_hit 64/67, isolate_new down to 3 background calls); pwd 42.6 -> 37.5ms, printf-0b 44.7 -> 40.2ms, date 115 -> 111ms. wasm suite 8.9s (was 96s, pool+snapshot amortize debug isolates); v8-runtime tests and bench:gate green.
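The claim/fallback/refill flow described above might be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `PoolKey`, `Worker`, `WarmPool`, and `capacity_from_env` are hypothetical names, the worker is a placeholder rather than a thread with a real isolate, refill happens synchronously instead of in the background, and treating an unparsable capacity value as the default is a guess.

```rust
use std::collections::HashMap;

/// Hypothetical pool key: snapshot digest plus heap limit, as the text describes.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct PoolKey {
    pub snapshot_digest: String,
    pub heap_limit_mb: u32,
}

/// Placeholder for a parked worker (the real one owns a thread and an isolate).
#[derive(Debug)]
pub struct Worker {
    pub key: PoolKey,
    pub prewarmed: bool,
}

/// Parses a capacity setting such as AGENTOS_V8_WARM_ISOLATES.
/// Default 2 and "0 disables" come from the text; falling back to the
/// default on unparsable input is an assumption.
pub fn capacity_from_env(value: Option<&str>) -> usize {
    value.and_then(|v| v.trim().parse().ok()).unwrap_or(2)
}

pub struct WarmPool {
    capacity_per_key: usize,
    parked: HashMap<PoolKey, Vec<Worker>>,
    pub hits: u64,
    pub misses: u64,
}

impl WarmPool {
    pub fn new(capacity_per_key: usize) -> Self {
        Self { capacity_per_key, parked: HashMap::new(), hits: 0, misses: 0 }
    }

    /// Fills the slot for `key` up to capacity (a no-op when capacity is 0).
    pub fn seed(&mut self, key: &PoolKey) {
        let capacity = self.capacity_per_key;
        let slot = self.parked.entry(key.clone()).or_default();
        while slot.len() < capacity {
            slot.push(Worker { key: key.clone(), prewarmed: true });
        }
    }

    /// Claims a parked worker for the hinted key. On a miss, creates one lazily.
    /// A claimed worker is handed out once and never returned to the pool.
    pub fn claim(&mut self, hint: &PoolKey) -> Worker {
        match self.parked.get_mut(hint).and_then(|slot| slot.pop()) {
            Some(worker) => {
                self.hits += 1;
                self.seed(hint); // refill after claim
                worker
            }
            None => {
                self.misses += 1;
                Worker { key: hint.clone(), prewarmed: false }
            }
        }
    }

    pub fn parked_count(&self, key: &PoolKey) -> usize {
        self.parked.get(key).map_or(0, |slot| slot.len())
    }
}
```

Because a claim moves the worker out of the pool, each exec still receives an unused worker, and a miss costs no more than the existing lazy path.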
Module bytes now ride the in-process runtime protocol as Arc<Vec<u8>> from a raw-bytes cache (same bounded/fingerprinted semantics as the base64 cache it replaces) and are injected pre-exec as a __agentOSWasmModuleBytes Uint8Array; the runner prefers it over the AGENTOS_WASM_MODULE_BASE64 env fallback, so the per-exec base64 encode/decode (4.3ms @267KB, ~23ms @2.5MB) is gone in both snapshot and inline modes. The userland (wasm-runner) snapshot script is now compiled with EagerCompile: the fatter-blob isolate cost it causes is prepaid by parked warm workers, so the runner function no longer lazy-compiles inside user_code_execute.

Warm floors (block, this host): wasm true 30 -> 19.7ms p50 (52ms at the start of the 3.2 campaign), echo hello 24.0ms, pwd 29.3ms, ls-empty ~39ms (was 96), date --version 30.4ms (was 140; module-size tax eliminated), printf-64k 136ms (stdout streaming path, separate finding). Injection failure fails the exec loudly. wasm + v8-runtime suites green; bench:gate green (require_100_small flaked 4.3x once and passed 1.13x on rerun; bimodal, pre-existing, tracked).
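Two properties make the raw-bytes path cheaper, and a small sketch can show both. Padded base64 inflates a payload to 4 bytes per 3 raw bytes, and the encode/decode work scales with module size, whereas cloning an `Arc<Vec<u8>>` only bumps a reference count. `ExecRequest` and `request_for` below are hypothetical names, not the runtime protocol's actual types.

```rust
use std::sync::Arc;

/// Length of standard padded base64 output for `raw_len` input bytes.
pub fn base64_encoded_len(raw_len: usize) -> usize {
    (raw_len + 2) / 3 * 4
}

/// Hypothetical per-exec request carrying a shared handle to the module bytes.
pub struct ExecRequest {
    pub module_bytes: Arc<Vec<u8>>,
}

/// Builds a request from a cached module without copying the bytes.
pub fn request_for(cached: &Arc<Vec<u8>>) -> ExecRequest {
    // Arc::clone increments the reference count; the Vec itself is shared.
    ExecRequest { module_bytes: Arc::clone(cached) }
}
```

With this shape the per-exec cost of handing over the module is independent of its size, which matches the observation that the module-size tax on date --version disappeared.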