perf(execution): raw wasm module bytes + eager userland compile #216

Merged
NathanFlurry merged 5 commits into main from stack/perf-execution-raw-wasm-module-bytes-eager-userland-compile-xusommqr on Jul 2, 2026

@NathanFlurry NathanFlurry commented Jul 2, 2026

Member

Module bytes now ride the in-process runtime protocol as Arc<Vec<u8>> from a
raw-bytes cache (same bounded/fingerprinted semantics as the base64 cache it
replaces) and are injected pre-exec as a __agentOSWasmModuleBytes Uint8Array;
the runner prefers it over the AGENTOS_WASM_MODULE_BASE64 env fallback, so the
per-exec base64 encode/decode (4.3ms @267KB, ~23ms @2.5MB) is gone in both
snapshot and inline modes. The userland (wasm-runner) snapshot script is now
compiled with EagerCompile: the fatter-blob isolate cost it causes is prepaid
by parked warm workers, so the runner function no longer lazy-compiles inside
user_code_execute.
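
To make the raw-bytes cache described above concrete, here is a minimal sketch, not the code in this PR. The `Fingerprint` fields (length plus mtime), the names `RawModuleCache` and `get_or_load`, and the least-recently-used eviction policy are all assumptions made for illustration; the description only says the cache is bounded and fingerprint-validated. A hit hands out a clone of the `Arc<Vec<u8>>`, so no bytes are copied or re-encoded per exec.

```rust
use std::collections::HashMap;
use std::io;
use std::sync::Arc;

/// Hypothetical fingerprint: cheap file metadata used to detect a changed module.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fingerprint {
    pub len: u64,
    pub mtime_nanos: u128,
}

struct Entry {
    fingerprint: Fingerprint,
    bytes: Arc<Vec<u8>>,
    last_used: u64,
}

/// Bounded cache of raw module bytes, keyed by module path.
pub struct RawModuleCache {
    capacity: usize,
    tick: u64,
    entries: HashMap<String, Entry>,
}

impl RawModuleCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity: capacity.max(1), tick: 0, entries: HashMap::new() }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    /// Returns the cached bytes when the fingerprint still matches,
    /// otherwise calls `load` and stores the result.
    pub fn get_or_load<F>(
        &mut self,
        path: &str,
        fingerprint: Fingerprint,
        load: F,
    ) -> io::Result<Arc<Vec<u8>>>
    where
        F: FnOnce() -> io::Result<Vec<u8>>,
    {
        self.tick += 1;
        let tick = self.tick;
        if let Some(entry) = self.entries.get_mut(path) {
            if entry.fingerprint == fingerprint {
                entry.last_used = tick;
                return Ok(Arc::clone(&entry.bytes)); // hit: no read, no copy
            }
        }
        let bytes = Arc::new(load()?);
        // Assumed policy: evict the least recently used entry when full.
        if !self.entries.contains_key(path) && self.entries.len() >= self.capacity {
            let victim = self
                .entries
                .iter()
                .min_by_key(|(_, e)| e.last_used)
                .map(|(k, _)| k.clone());
            if let Some(victim) = victim {
                eprintln!("warn: evicting cached module {}", victim);
                self.entries.remove(&victim);
            }
        }
        self.entries.insert(
            path.to_string(),
            Entry { fingerprint, bytes: Arc::clone(&bytes), last_used: tick },
        );
        Ok(bytes)
    }
}
```

A caller would typically pass `|| std::fs::read(path)` as the loader, so the file is read only on a miss or after the fingerprint changes.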

Warm floors (block, this host): wasm true 30 -> 19.7ms p50 (52ms at the start
of the 3.2 campaign), echo hello 24.0ms, pwd 29.3ms, ls-empty ~39ms (was 96),
date --version 30.4ms (was 140 — module-size tax eliminated), printf-64k 136ms
(stdout streaming path, separate finding). Injection failure fails the exec
loudly. wasm + v8-runtime suites green; bench:gate green (require_100_small
flaked 4.3x once and passed 1.13x on rerun — bimodal, pre-existing, tracked).

…s via event

The sidecar drains process output before emitting process_exited and the
frame stream + event pump are FIFO, so once the exit event is observed no
trailing output can follow it; the host-side quiet-turn drain (2 turns at
10ms) only remains for the snapshot-poll fallback exit path. Also stop
awaiting signal-state refreshes on the exec critical path (start + first
output) — kill paths await the in-flight refresh instead — and clean up
the parked refresh in finishProcess so fallback exits release it too.
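
The ordering argument above can be illustrated with a small sketch. It uses a `std::sync::mpsc` channel as a stand-in for the FIFO frame stream and event pump; `ProcessEvent`, `run_fake_process` and `collect_until_exit` are hypothetical names, not identifiers from this repository. Because the producer sends all output before the exit event and the channel preserves order, the consumer can stop at the exit event without a quiet-turn drain.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical event type standing in for the sidecar's frames.
#[derive(Debug, PartialEq)]
pub enum ProcessEvent {
    Output(Vec<u8>),
    Exited(i32),
}

/// Stand-in for the sidecar: drains all output first, then emits the exit event.
pub fn run_fake_process(chunks: Vec<Vec<u8>>, code: i32) -> mpsc::Receiver<ProcessEvent> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        for chunk in chunks {
            let _ = tx.send(ProcessEvent::Output(chunk));
        }
        let _ = tx.send(ProcessEvent::Exited(code));
    });
    rx
}

/// Host side: collect output until the exit event is observed.
/// No timed drain is needed because the channel is FIFO.
pub fn collect_until_exit(rx: mpsc::Receiver<ProcessEvent>) -> Option<(Vec<u8>, i32)> {
    let mut captured = Vec::new();
    for event in rx {
        match event {
            ProcessEvent::Output(chunk) => captured.extend_from_slice(&chunk),
            ProcessEvent::Exited(code) => return Some((captured, code)),
        }
    }
    None // stream closed without an exit event
}
```

The same shape can mirror the new regression test: run ten fast-exit processes in sequence, each producing 64 KiB, and check that every capture is complete.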

Warm host-driven exec p50: node -e '' 30ms -> ~17ms, wasm true 52ms ->
39ms; whole wasm-command-floor lane down ~13ms/row; bench:gate green.
New regression test: 10 sequential fast-exit 64KiB stdout captures.

Bake the constant wasm runner + wasi shim (~300KB of JS previously
recompiled per exec) into the per-process V8 userland snapshot via
guest_runtime.snapshot_userland_code, keyed and cached process-wide.
Mode env AGENTOS_WASM_SNAPSHOT_RUNNER=auto|block|off: auto probes the
snapshot cache without blocking (async warm kicked once per process)
and falls back to the byte-identical inline runner until ready, so the
cold path is unchanged. The sync-RPC glue is re-evaluated per exec (the
snapshot-baked copy cannot bind session bridge fns). Module bytes are
now base64-encoded once per module in a bounded, fingerprint-validated
cache (64 entries, warn on evict, debug log reports per-entry and
cumulative cache bytes) instead of fs::read+encode per exec. Adds
AGENTOS_V8_SESSION_PHASES timers (snapshot_get / blob_clone /
isolate_new / user_code_execute) and a measured NOTE against
EagerCompile at snapshot build (moves cost to isolate deserialize).
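
As a rough sketch of the three modes described above (an illustration under assumptions, not the actual implementation): `SnapshotRunnerMode`, `RunnerSource` and `select_runner` are hypothetical names, the snapshot blob is modeled as plain bytes in a `OnceLock`, and treating an unset or unrecognized value as `auto` is an assumption. The sketch leaves out the one-time async warm that `auto` kicks off; it only shows the non-blocking probe with the inline fallback.

```rust
use std::sync::{Arc, OnceLock};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SnapshotRunnerMode {
    Auto,
    Block,
    Off,
}

impl SnapshotRunnerMode {
    /// Parses the value of AGENTOS_WASM_SNAPSHOT_RUNNER.
    /// Assumption: unset or unknown values behave like `auto`.
    pub fn parse(raw: Option<&str>) -> Self {
        match raw.map(str::trim) {
            Some("block") => Self::Block,
            Some("off") => Self::Off,
            _ => Self::Auto,
        }
    }
}

#[derive(Debug)]
pub enum RunnerSource {
    /// Use the userland snapshot that already contains the runner.
    Snapshot(Arc<Vec<u8>>),
    /// Evaluate the byte-identical inline runner for this exec.
    Inline,
}

/// Picks the runner source for one exec without blocking in `auto` mode.
pub fn select_runner<F>(
    mode: SnapshotRunnerMode,
    cache: &OnceLock<Arc<Vec<u8>>>,
    build: F,
) -> RunnerSource
where
    F: FnOnce() -> Vec<u8>,
{
    match mode {
        SnapshotRunnerMode::Off => RunnerSource::Inline,
        // Probe only: if the snapshot is not ready yet, fall back to inline.
        SnapshotRunnerMode::Auto => match cache.get() {
            Some(blob) => RunnerSource::Snapshot(Arc::clone(blob)),
            None => RunnerSource::Inline,
        },
        // Block until the snapshot exists, building it if needed.
        SnapshotRunnerMode::Block => {
            RunnerSource::Snapshot(Arc::clone(cache.get_or_init(|| Arc::new(build()))))
        }
    }
}
```

In a real process the mode would come from `std::env::var("AGENTOS_WASM_SNAPSHOT_RUNNER").ok().as_deref()`.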

Warm wasm floor p50: true 39->33ms, pwd 48->43ms, ls-empty 81->75ms,
date-version 127->115ms (52/62/96/140 before PR #212); cold first-exec
improved; wasm suite green; bench:gate green. Remaining floor is
isolate-per-exec + module decode — pooling tracked as Stage C.
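
For readers unfamiliar with per-phase timing of the kind AGENTOS_V8_SESSION_PHASES enables, a minimal sketch follows. `PhaseTimers` and its methods are hypothetical, and the output format is invented for illustration; only the phase names (snapshot_get, blob_clone, isolate_new, user_code_execute) come from this commit. How the environment variable is interpreted is not stated here, so the sketch takes a plain `enabled` flag.

```rust
use std::time::{Duration, Instant};

/// Records wall-clock time per named phase when enabled.
pub struct PhaseTimers {
    enabled: bool,
    phases: Vec<(&'static str, Duration)>,
}

impl PhaseTimers {
    pub fn new(enabled: bool) -> Self {
        Self { enabled, phases: Vec::new() }
    }

    /// Runs `f`, recording its duration under `name` when timing is enabled.
    pub fn time<T>(&mut self, name: &'static str, f: impl FnOnce() -> T) -> T {
        if !self.enabled {
            return f(); // disabled: no clock reads on the hot path
        }
        let start = Instant::now();
        let out = f();
        self.phases.push((name, start.elapsed()));
        out
    }

    /// Names of the recorded phases, in execution order.
    pub fn names(&self) -> Vec<&'static str> {
        self.phases.iter().map(|(name, _)| *name).collect()
    }

    /// One-line summary such as `isolate_new=4.200ms user_code_execute=9.100ms`.
    pub fn report(&self) -> String {
        self.phases
            .iter()
            .map(|(name, d)| format!("{}={:.3}ms", name, d.as_secs_f64() * 1000.0))
            .collect::<Vec<_>>()
            .join(" ")
    }
}
```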

NathanFlurry commented Jul 2, 2026

Member Author

Stack for rivet-dev/secure-exec

Get stack: forklift get 216
Push local edits: forklift submit
Merge when ready: forklift merge 216

The filtered install (--filter @secure-exec/benchmarks...) omitted
packages/build-tools, so every bench-gate run died in the v8-bridge build
script (missing Node dependencies) before gating anything — the PR gate has
never actually run green on CI.

Pre-created V8 session workers (thread + snapshot-restored isolate, keyed by
snapshot digest + heap limit) are claimed at session create via a warm hint,
taking isolate_new (~4.2ms) and blob_clone off the per-exec critical path.
Workers are never reused across guests (each exec still gets a virgin
isolate); a wrong hint falls back to the existing lazy-create path. Capacity
per key via AGENTOS_V8_WARM_ISOLATES (default 2, 0 disables); refill after
claim; pool transitions logged; warm_worker_hit/miss counters in the session
phases output.
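
The claim/refill behaviour described above could look roughly like the sketch below. It is a simplification under stated assumptions: `WarmKey`, `Worker` and `WarmPool` are hypothetical names, a worker is a plain struct rather than a thread with a snapshot-restored isolate, and refill happens synchronously instead of in the background. It does keep the stated rules: a claimed worker is handed out once and never returned to the pool, a miss falls back to creating a worker on demand, and a capacity of 0 disables parking.

```rust
use std::collections::{HashMap, VecDeque};

/// Pool key: snapshot digest plus heap limit, as in the description above.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct WarmKey {
    pub snapshot_digest: String,
    pub heap_limit_mb: u32,
}

/// Stand-in for a parked thread + isolate.
#[derive(Debug)]
pub struct Worker {
    pub key: WarmKey,
    pub id: u64,
}

pub struct WarmPool {
    capacity_per_key: usize,
    parked: HashMap<WarmKey, VecDeque<Worker>>,
    next_id: u64,
    pub hits: u64,
    pub misses: u64,
}

impl WarmPool {
    /// Builds a pool from the AGENTOS_V8_WARM_ISOLATES value (default 2, 0 disables).
    pub fn from_env_value(raw: Option<&str>) -> Self {
        let capacity = raw.and_then(|s| s.trim().parse::<usize>().ok()).unwrap_or(2);
        Self { capacity_per_key: capacity, parked: HashMap::new(), next_id: 0, hits: 0, misses: 0 }
    }

    fn create(&mut self, key: &WarmKey) -> Worker {
        self.next_id += 1;
        Worker { key: key.clone(), id: self.next_id }
    }

    pub fn parked_count(&self, key: &WarmKey) -> usize {
        self.parked.get(key).map_or(0, |queue| queue.len())
    }

    /// Parks fresh workers for `key` until the per-key capacity is reached.
    pub fn seed(&mut self, key: &WarmKey) {
        while self.parked_count(key) < self.capacity_per_key {
            let worker = self.create(key);
            self.parked.entry(key.clone()).or_default().push_back(worker);
        }
    }

    /// Claims a worker for a new session. Ownership moves to the caller,
    /// so a worker is never shared between guests.
    pub fn claim(&mut self, hint: &WarmKey) -> Worker {
        let parked = self.parked.get_mut(hint).and_then(|queue| queue.pop_front());
        match parked {
            Some(worker) => {
                self.hits += 1;
                self.seed(hint); // refill after claim
                worker
            }
            None => {
                self.misses += 1;
                self.create(hint) // wrong or cold hint: lazy-create path
            }
        }
    }
}
```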

Seeding is deliberately conservative: auto mode stays snapshot-only and only
AGENTOS_WASM_SNAPSHOT_RUNNER=block seeds workers (wasm-runner key + default
node key). Reason: creating isolates on background threads while other
isolates execute guest WebAssembly deterministically SIGSEGVs the pinned V8
130 (WasmCodePointerTable::AllocateUninitializedEntry inside Isolate::New,
even under the isolate lifecycle lock — the race is against V8-internal wasm
threads). Backtraces + notes in the rusty-v8-upgrade todo; enabling default
seeding rides that upgrade.

Warm floors (block, this host): node -e '' 17 -> 14.4ms p50, wasm true
33 -> ~30ms (warm_worker_hit 64/67, isolate_new down to 3 background calls);
pwd 42.6 -> 37.5ms, printf-0b 44.7 -> 40.2ms, date 115 -> 111ms. wasm suite
8.9s (was 96s, pool+snapshot amortize debug isolates); v8-runtime tests and
bench:gate green.
@NathanFlurry NathanFlurry force-pushed the stack/perf-v8-runtime-parked-warm-session-workers-zssyplrp branch from e831166 to 7e0fa8e Compare July 2, 2026 20:05
@NathanFlurry NathanFlurry force-pushed the stack/perf-execution-raw-wasm-module-bytes-eager-userland-compile-xusommqr branch from e6691d0 to 7be089c Compare July 2, 2026 20:05
@NathanFlurry NathanFlurry changed the base branch from stack/perf-v8-runtime-parked-warm-session-workers-zssyplrp to main July 2, 2026 23:42
@NathanFlurry NathanFlurry merged commit 7be089c into main Jul 2, 2026
3 checks passed
@NathanFlurry NathanFlurry deleted the stack/perf-execution-raw-wasm-module-bytes-eager-userland-compile-xusommqr branch July 2, 2026 23:42