
feat(vcr-ra): spill on register exhaustion — remove the optimized-path hard-fail (#242, VCR-RA-001)#580

Merged: avrabe merged 1 commit into main from feat/242-exhaustion-spill on Jul 2, 2026
avrabe commented Jul 2, 2026

Contributor

The hard-fail history

The optimized path (ir_to_arm) hard-failed on register exhaustion: #496 made alloc_i32_scratch / alloc_i64_pair flag pool exhaustion and decline the whole function to the direct selector — the honest fix that replaced the silent R12-borrow miscompile (R12/IP is the encoder's indexed-load scratch, #212). Honest, but it meant any function exceeding the R4-R8 pool never got the optimized path's benefits — the literal VCR-RA-001 "remove the register-exhaustion hard-fail" claim stayed open.

Why exhaustion hard-failed despite existing spill support: with the flag off, the spill machinery in ir_to_arm was effectively dead code. The Const handler could evict-and-spill, but any pressure that triggers eviction also exhausts the (smaller) scratch pool, which set the decline flag and discarded the body. The reload half was also unsound under real pressure (both operands of one op reload into R12; most handlers never reload at all), and the decline masked all of it.

Allocation-time Belady spill (flag: SYNTH_SPILL_ON_EXHAUST, default OFF)

A pre-step before each IR instruction (only when the flag is on AND every opcode in the function is modeled):

  • (a) reinstate spilled sources — reload each into a pool register (LDR rd, [SP,#slot]), put it back in vreg_to_arm, so get_arm_reg never returns the R12 placeholder;
  • (b) free a dest register — if the handler will allocate a fresh scratch and the pool is full, evict the live vreg with the farthest next use (Belady, linear next-use table over the IR; deterministic tie-breaks) via STR victim, [SP,#slot]. Victims always get a store (the function's final result is read by the epilogue, invisible to the linear table); the epilogue reloads a spilled final result straight into R0.

If no victim is evictable (everything pinned), the pre-step falls through and the #496 decline fires exactly as flag-off — the lever degrades to the status quo, never to a miscompile.
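The victim choice in step (b) can be sketched as follows. This is a minimal illustration, not the code in optimizer_bridge.rs: `IrInst`, `next_use`, and `pick_victim` are hypothetical names, the next-use lookup is a plain linear scan, and the tie-break (lowest vreg id wins) is an assumption standing in for whatever deterministic rule the allocator actually uses.

```rust
use std::collections::HashMap;

/// Hypothetical IR instruction: only the source vregs matter for next-use.
struct IrInst {
    srcs: Vec<u32>,
}

/// Index of the first instruction at or after `from` that reads `vreg`.
fn next_use(ir: &[IrInst], from: usize, vreg: u32) -> Option<usize> {
    ir.iter()
        .enumerate()
        .skip(from)
        .find(|(_, inst)| inst.srcs.contains(&vreg))
        .map(|(i, _)| i)
}

/// Pick the live vreg to evict before instruction `at` (Belady: farthest next use).
/// `live` maps vreg -> pool register; `pinned` vregs are never evicted.
/// Returns None when everything is pinned, so the caller can fall through
/// to the existing decline.
fn pick_victim(
    ir: &[IrInst],
    at: usize,
    live: &HashMap<u32, u8>,
    pinned: &[u32],
) -> Option<u32> {
    live.keys()
        .copied()
        .filter(|v| !pinned.contains(v))
        // A vreg with no further use sorts as infinitely far (assumption).
        // Ties break on the lower vreg id, so HashMap order never matters.
        .max_by_key(|&v| {
            (next_use(ir, at, v).unwrap_or(usize::MAX), std::cmp::Reverse(v))
        })
}
```

A caller would still emit the STR for the chosen victim unconditionally, as described above, because the epilogue's read of the final result is not visible to a table built from IR uses alone.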

Also fixed under the flag (latent, unreachable flag-off because any spill implied a decline): the appended trailing return missed the ADD SP frame epilogue, and Const-handler exhaustion could hand out reserved R9/R10/R11 (globals base / memsize / base-CSE) or clobber R3 — the pre-step pre-frees a pool register for Const dests too, so its allocator never reaches those fallbacks.

Flag choice

A separate flag, not SYNTH_SPILL_REALLOC: that lever is a post-hoc rewrite of already-allocated code (byte-level, same function set), while this one changes which functions take the optimized path — entangling them would make the realloc flip judge a population change. The flip decision later covers both.

i64 scope (honest)

i64 pair exhaustion keeps declining even flag-on. i64 values are two separate vregs with a pair invariant the v1 spill model does not carry (#518 class); spill_on_exhaust_supported also excludes control flow (a back-edge invalidates the linear next-use table), calls, and non-param locals. One unsupported opcode keeps the whole function on the #496 decline.
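The all-or-nothing scope rule can be pictured with a small classifier. The sketch below is illustrative only: the `Op` enum is a made-up opcode set (the real IR is not shown in this PR), and `spill_on_exhaust_supported_sketch` is a hypothetical stand-in for spill_on_exhaust_supported, whose actual signature is not given here.

```rust
/// Hypothetical opcode set, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    I32Const,
    I32Add,
    I32Sub,
    I32Xor,
    I32Mul,
    I64Add,
    Branch,
    Call,
    LocalGet { is_param: bool },
}

/// True when the v1 spill model is assumed to cover this opcode.
fn op_supported(op: Op) -> bool {
    match op {
        Op::I32Const | Op::I32Add | Op::I32Sub | Op::I32Xor | Op::I32Mul => true,
        // Only parameter locals are in scope; other locals are excluded.
        Op::LocalGet { is_param } => is_param,
        // i64 pairs, control flow, and calls keep the function on the decline.
        Op::I64Add | Op::Branch | Op::Call => false,
    }
}

/// One unsupported opcode disqualifies the whole function.
fn spill_on_exhaust_supported_sketch(body: &[Op]) -> bool {
    body.iter().all(|&op| op_supported(op))
}
```

Classifying per function rather than per instruction matches the description above: a single out-of-scope opcode leaves the whole body on the #496 decline.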

Red→green

scripts/repro/spill_on_exhaust_242.wat — 10 param-derived i32 values simultaneously live (const-folding can't collapse them), non-commutative fold:

  • RED (flag-off): declines; SYNTH_RECOVERY_STATS=1 shows rung=spill (direct selector's ladder produced the code). Pinned by spill_on_exhaust_red_flag_off_declines_to_direct_spill_rung.
  • GREEN (flag-on): stays on the optimized path (no recovery rung fires), and spill_on_exhaust_242_differential.py matches wasmtime on all 8 vectors under unicorn (incl. vectors where params force asymmetric sub/xor results).

Unit tests: #496 decline pinned; balance invariant (every LDR [SP,#x] preceded by a STR [SP,#x]); no R12 ships flag-on; Belady first victim is the farthest-next-use register (R8), not LRU's oldest (R4); below the pressure edge the lever is a byte-level no-op; scope classifier pinned.
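The balance invariant is simple to state as a check over straight-line code. The following is a sketch under assumptions, not the unit test in this PR: `ArmInst` is a hypothetical three-variant model of the emitted instructions, and `first_unbalanced_reload` is an invented helper name. A single linear pass is enough only because the supported scope excludes control flow.

```rust
use std::collections::HashSet;

/// Hypothetical model of emitted code: only SP-relative spill traffic matters.
#[derive(Clone, Copy)]
enum ArmInst {
    StrSp { slot: u32 },
    LdrSp { slot: u32 },
    Other,
}

/// Index of the first `LDR [SP,#slot]` with no earlier `STR [SP,#slot]`,
/// or None when every reload is preceded by a store to the same slot.
fn first_unbalanced_reload(code: &[ArmInst]) -> Option<usize> {
    let mut stored = HashSet::new();
    for (i, inst) in code.iter().enumerate() {
        match *inst {
            ArmInst::StrSp { slot } => {
                stored.insert(slot);
            }
            ArmInst::LdrSp { slot } if !stored.contains(&slot) => return Some(i),
            _ => {}
        }
    }
    None
}
```

Returning the offending index rather than a bare bool makes a failing case easier to locate in a disassembly.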

Flag-off proof (STOP condition)

  • 68/68 corpus fixtures byte-identical (full-ELF sha256) vs a main-built (bad0901) binary — both self-contained and --relocatable.
  • frozen_codegen_bytes 3/3, const-CSE golden, recovery_stats_242, promotion_exhaustion_fallback_474 all green.
  • cargo test -p synth-synthesis -p synth-cli fully green; cargo fmt --check and cargo clippy -p synth-synthesis -p synth-cli --all-targets -- -D warnings clean.

Flag-on validation

  • flight_seam_differential.py: seam 0x07FDF307 MATCH.
  • const_cse_differential.py: PASS (with the flag on top).
  • r12_spill_496_differential.py (control_step + flight_seam_flat, self-contained): PASS both flag states.
  • New fixture differential: PASS.
  • Corpus: 68/68 compile flag-on (zero failures). Ten corpus functions move off recovery rungs onto the optimized path; execution-checked vs wasmtime under unicorn flag-on: high_pressure_i32, filter_axis (.wat+.wasm), signed_div_const, uxth_fold, const_cse_direct, const_cse — all PASS. Residual: gust_kernel (1 of 5 fns moved) is compile-validated only here — its gust_poll needs a globals-base (R9) harness that fails identically flag-off, so no regression evidence; it's gale's silicon fixture family and the flip gate covers it.

Size (honest)

The newly-optimized fixture is larger on the optimized path today: hp = 216 B flag-on vs 98 B on the direct path (flag-off). The greedy linear allocator never frees dead vregs' registers, so at the pressure edge nearly every op evicts (STR/LDR churn). The win of this PR is removing the hard-fail class and keeping such functions on the optimized path where the downstream levers (range-realloc, const-CSE, spill-realloc forwarding/DCE #569/#576/#579) apply — dead-vreg pool release and slot reuse are the follow-on size levers before any flip.

Found while testing (pre-existing, NOT touched here)

The direct selector's spill rung miscompiles this fixture's shape: mul.w r2,r0,r1 clobbers a live spill candidate and a later ldr r0,[sp,#8] reads a slot never stored (disasm of the flag-off build, both --relocatable and self-contained). Flag-off vectors like hp(1,2) return 0xffffd1ce vs wasmtime's 0x2e37. That's instruction_selector.rs (out of scope per this PR's charter) — will file separately with the fixture as repro.

🤖 Generated with Claude Code

…h hard-fail (#242, VCR-RA-001)

The optimized path (ir_to_arm) declined every function whose R4-R8 scratch
pool exhausted (#496 — the honest fix that replaced the R12-borrow
miscompile). Behind SYNTH_SPILL_ON_EXHAUST=1 (default off), a pre-step now
spills at ALLOCATION time instead: when the pool is full, the live vreg with
the FARTHEST next use (Belady, linear next-use table over the IR) is STRed to
a fresh frame slot and its register reused; spilled sources are reloaded into
pool registers at their next use (never the flag-off R12 placeholder, which
cannot carry two operands and collides with the encoder's IP scratch).

Scope (v1, honest): straight-line i32-only functions — no i64 pairs
(alloc_i64_pair exhaustion keeps declining), no control flow (a back-edge
would invalidate the linear next-use table), no calls, no non-param locals.
One unsupported opcode keeps the whole function on the #496 decline.

Also fixed under the flag (latent, unreachable flag-off because any spill
implied a decline): the appended trailing return missed the ADD SP epilogue,
and Const-handler exhaustion could hand out reserved R9/R10/R11 or clobber
R3 — the pre-step now pre-frees a pool register for Const dests too.

Red→green: scripts/repro/spill_on_exhaust_242.wat (10 param-derived live
values) declines flag-off (rung=spill) and compiles on the optimized path
flag-on, matching wasmtime on all vectors under unicorn
(spill_on_exhaust_242_differential.py).

Flag-off proof: 68/68 corpus fixtures byte-identical (full-ELF sha256) vs a
main-built binary, both self-contained and --relocatable; frozen bytes 3/3;
const-CSE golden green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

codecov Bot commented Jul 2, 2026


Codecov Report

❌ Patch coverage is 82.48473% with 86 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/synth-synthesis/src/optimizer_bridge.rs | 82.48% | 86 Missing ⚠️ |


@avrabe avrabe merged commit 9474111 into main Jul 2, 2026
24 of 25 checks passed
@avrabe avrabe deleted the feat/242-exhaustion-spill branch July 2, 2026 18:40
avrabe added a commit that referenced this pull request Jul 2, 2026
…r-stored slot (#581) (#582)

The direct selector's #253 add/sub (and bitwise/addr) immediate folds delete
the const materialization BY source_line. Under the spill-on-exhaustion retry
rung (VCR-RA-001 3b-lite), alloc_temp_or_spill for the const's temp first
emits the victim's spill store `STR rX,[sp,#slot]` tagged with the SAME
source_line — the fold deleted the store along with the MOVW, leaving the
victim's vstack entry marked Spilled with no store. Its later reload read a
never-written frame slot: silent wrong value on the shipped --relocatable
path (spill_on_exhaust_242.wat hp(1,2) = 0xffffd1ce vs wasmtime 0x2e37).

Fix: `is_const_materialization` (MOVW/MOVT/MVN) filters every by-source_line
drop — drop_prev_const_materialization, splice_out_addr_const_materialization,
and the #209 reciprocal-mult dead-divisor retain — so spill stores survive any
fold. Defensive `assert_spill_reloads_have_stores` at the end of
select_with_stack (internal-bug panic pattern): a reload from the reserved
spill area without a preceding store to that slot can no longer leave the
selector silently.
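The shape of the fix can be illustrated with a small model. This is a sketch, not the code in instruction_selector.rs: `Kind`, `SelInst`, and `drop_const_for_line` are hypothetical, and the signature given to is_const_materialization here is assumed. The point it shows is that a drop keyed on source_line alone would also remove a spill store carrying the same tag, while filtering on the instruction kind keeps it.

```rust
/// Hypothetical instruction kinds, enough to show the fold.
#[derive(Clone, Debug, PartialEq)]
enum Kind {
    Movw,
    Movt,
    Mvn,
    StrSp,
    Add,
}

/// Hypothetical selector output entry, tagged with its originating source line.
#[derive(Clone, Debug, PartialEq)]
struct SelInst {
    kind: Kind,
    source_line: u32,
}

/// Only MOVW/MOVT/MVN count as const materialization (assumed signature).
fn is_const_materialization(inst: &SelInst) -> bool {
    matches!(inst.kind, Kind::Movw | Kind::Movt | Kind::Mvn)
}

/// Drop the const materialization tagged with `line`; anything else sharing
/// that tag, such as a victim's spill store, survives the fold.
fn drop_const_for_line(code: &mut Vec<SelInst>, line: u32) {
    code.retain(|i| !(i.source_line == line && is_const_materialization(i)));
}
```

With the filter in place, the victim's `STR` stays in the stream, so its later reload reads a slot that was actually written.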

Gates: new scripts/repro/spill_rung_581_differential.py (minimal fold-shape
fixture + the original #580 discovery fixture, unicorn vs wasmtime, direct
path) red on main → 12/12 green; unit fold_preserves_spill_store_581;
frozen 3/3 bit-identical; r12_spill_496 + AAPCS oracles green;
cargo test -p synth-synthesis -p synth-cli green; clippy -D warnings clean.

Closes #581

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jul 2, 2026
… Belady spilling default-on (#585)

Caps the four-lane arc: slot liveness (#579), exhaustion spill (#580),
spill-rung fix (#582), SYNTH_SPILL_REALLOC flip + refreeze (#583).
VCR-RA-001 -> verified; rivet release status v0.24.0: cuttable. Pin sweep +
lock + CHANGELOG.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>