feat(vcr-ra): spill on register exhaustion; remove the optimized-path hard-fail (#242, VCR-RA-001) #580
Merged
…h hard-fail (#242, VCR-RA-001)
The optimized path (ir_to_arm) declined every function whose R4-R8 scratch pool exhausted (#496, the honest fix that replaced the R12-borrow miscompile). Behind SYNTH_SPILL_ON_EXHAUST=1 (default off), a pre-step now spills at ALLOCATION time instead: when the pool is full, the live vreg with the FARTHEST next use (Belady, linear next-use table over the IR) is STRed to a fresh frame slot and its register reused; spilled sources are reloaded into pool registers at their next use (never the flag-off R12 placeholder, which cannot carry two operands and collides with the encoder's IP scratch).
Scope (v1, honest): straight-line i32-only functions, with no i64 pairs (alloc_i64_pair exhaustion keeps declining), no control flow (a back-edge would invalidate the linear next-use table), no calls, and no non-param locals. One unsupported opcode keeps the whole function on the #496 decline.
Also fixed under the flag (latent, unreachable flag-off because any spill implied a decline): the appended trailing return missed the ADD SP epilogue, and Const-handler exhaustion could hand out reserved R9/R10/R11 or clobber R3; the pre-step now pre-frees a pool register for Const dests too.
Red→green: scripts/repro/spill_on_exhaust_242.wat (10 param-derived live values) declines flag-off (rung=spill) and compiles on the optimized path flag-on, matching wasmtime on all vectors under unicorn (spill_on_exhaust_242_differential.py).
Flag-off proof: 68/68 corpus fixtures byte-identical (full-ELF sha256) vs a main-built binary, both self-contained and --relocatable; frozen bytes 3/3; const-CSE golden green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
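The v1 scope rule in the commit message above is a whole-function gate: a single unmodeled opcode keeps the function on the #496 decline. A minimal sketch of that shape follows. The `Op` enum, its variants, and `op_is_modeled` are hypothetical stand-ins (the real IR is not shown in this PR), and the signature given to `spill_on_exhaust_supported` is an assumption.

```rust
/// Hypothetical, simplified opcode set used only for illustration.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    ParamGet,
    I32Const,
    I32Add,
    I32Sub,
    I32Xor,
    I32Mul,
    Return,
    LocalGet, // non-param local: out of v1 scope
    I64Add,   // i64 pair: out of v1 scope
    Branch,   // control flow: out of v1 scope
    Call,     // calls: out of v1 scope
}

/// True when this single opcode is inside the assumed v1 model
/// (straight-line, i32-only, params only).
fn op_is_modeled(op: Op) -> bool {
    matches!(
        op,
        Op::ParamGet
            | Op::I32Const
            | Op::I32Add
            | Op::I32Sub
            | Op::I32Xor
            | Op::I32Mul
            | Op::Return
    )
}

/// Whole-function gate: every opcode must be modeled, otherwise the
/// caller stays on the existing decline path.
fn spill_on_exhaust_supported(body: &[Op]) -> bool {
    body.iter().all(|&op| op_is_modeled(op))
}
```

Keeping the gate all-or-nothing means a function is never spilled with a partly valid next-use table; anything outside the model simply behaves as it does flag-off.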
avrabe added a commit that referenced this pull request Jul 2, 2026
…r-stored slot (#581) (#582)
The direct selector's #253 add/sub (and bitwise/addr) immediate folds delete the const materialization BY source_line. Under the spill-on-exhaustion retry rung (VCR-RA-001 3b-lite), alloc_temp_or_spill for the const's temp first emits the victim's spill store `STR rX,[sp,#slot]` tagged with the SAME source_line, so the fold deleted the store along with the MOVW, leaving the victim's vstack entry marked Spilled with no store. Its later reload read a never-written frame slot: silent wrong value on the shipped --relocatable path (spill_on_exhaust_242.wat hp(1,2) = 0xffffd1ce vs wasmtime 0x2e37).
Fix: `is_const_materialization` (MOVW/MOVT/MVN) filters every by-source_line drop (drop_prev_const_materialization, splice_out_addr_const_materialization, and the #209 reciprocal-mult dead-divisor retain), so spill stores survive any fold.
Defensive `assert_spill_reloads_have_stores` at the end of select_with_stack (internal-bug panic pattern): a reload from the reserved spill area without a preceding store to that slot can no longer leave the selector silently.
Gates: new scripts/repro/spill_rung_581_differential.py (minimal fold-shape fixture + the original #580 discovery fixture, unicorn vs wasmtime, direct path) red on main → 12/12 green; unit fold_preserves_spill_store_581; frozen 3/3 bit-identical; r12_spill_496 + AAPCS oracles green; cargo test -p synth-synthesis -p synth-cli green; clippy -D warnings clean.
Closes #581
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
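The bug and the fix in this commit can be illustrated with a small model: a drop keyed only on source_line removes the spill store together with the MOVW, while a drop that also requires a const-materialization opcode leaves the store in place. The sketch below is an assumption-laden illustration, not the selector's code. `Insn`, `Tagged`, `drop_const_materialization`, and `first_unstored_reload` are hypothetical names; only `is_const_materialization` is named in the commit, and its signature here is guessed. The check returns the offending slot instead of panicking so that it is easy to test, and it treats every SP-relative slot as a spill slot.

```rust
use std::collections::HashSet;

/// Hypothetical instruction model with just enough shape for the example.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum Insn {
    Movw { rd: u8, imm: u16 },
    Movt { rd: u8, imm: u16 },
    Mvn { rd: u8, imm: u32 },
    StrSp { rt: u8, slot: u32 }, // STR rt, [sp, #slot]
    LdrSp { rt: u8, slot: u32 }, // LDR rt, [sp, #slot]
}

/// An instruction tagged with the source line that produced it.
struct Tagged {
    insn: Insn,
    source_line: u32,
}

/// Only MOVW/MOVT/MVN count as const materialization.
fn is_const_materialization(insn: &Insn) -> bool {
    matches!(insn, Insn::Movw { .. } | Insn::Movt { .. } | Insn::Mvn { .. })
}

/// Drop the const materialization emitted for `line`, and nothing else.
/// A spill store that shares the same source_line is kept.
fn drop_const_materialization(code: &mut Vec<Tagged>, line: u32) {
    code.retain(|t| !(t.source_line == line && is_const_materialization(&t.insn)));
}

/// Balance check: returns the first slot that is reloaded before any
/// store to it, or None when every reload has a preceding store.
fn first_unstored_reload(code: &[Tagged]) -> Option<u32> {
    let mut stored = HashSet::new();
    for t in code {
        match &t.insn {
            Insn::StrSp { slot, .. } => {
                stored.insert(*slot);
            }
            Insn::LdrSp { slot, .. } if !stored.contains(slot) => return Some(*slot),
            _ => {}
        }
    }
    None
}
```

In this model, filtering with `t.source_line != line` alone reproduces the failure mode: the store disappears and `first_unstored_reload` reports the slot.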
avrabe added a commit that referenced this pull request Jul 2, 2026
… Belady spilling default-on (#585) Caps the four-lane arc: slot liveness (#579), exhaustion spill (#580), spill-rung fix (#582), SYNTH_SPILL_REALLOC flip + refreeze (#583). VCR-RA-001 -> verified; rivet release status v0.24.0: cuttable. Pin sweep + lock + CHANGELOG. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
The hard-fail history
The optimized path (`ir_to_arm`) hard-failed on register exhaustion: #496 made `alloc_i32_scratch`/`alloc_i64_pair` flag pool exhaustion and decline the whole function to the direct selector. That was the honest fix that replaced the silent R12-borrow miscompile (R12/IP is the encoder's indexed-load scratch, #212). Honest, but it meant any function exceeding the R4-R8 pool never got the optimized path's benefits, so the literal VCR-RA-001 "remove the register-exhaustion hard-fail" claim stayed open.
Why exhaustion hard-failed despite existing spill support: the spill machinery in `ir_to_arm` was dead flag-off. The `Const` handler could evict-and-spill, but any pressure that triggers eviction also exhausts the (smaller) scratch pool, which set the decline flag and discarded the body. The reload half was also unsound for real pressure (both operands of one op reload into R12; most handlers never reload at all), all masked by the decline.

Allocation-time Belady spill (flag: `SYNTH_SPILL_ON_EXHAUST`, default OFF)
A pre-step before each IR instruction (only when the flag is on AND every opcode in the function is modeled):
… `LDR rd, [SP,#slot]`), put it back in `vreg_to_arm`, so `get_arm_reg` never returns the R12 placeholder;
… `STR victim, [SP,#slot]`. Victims always get a store (the function's final result is read by the epilogue, invisible to the linear table); the epilogue reloads a spilled final result straight into R0.
If no victim is evictable (everything pinned), the pre-step falls through and the #496 decline fires exactly as flag-off: the lever degrades to the status quo, never to a miscompile.
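The victim choice described above (farthest next use over a linear next-use table, with a fall-through when nothing is evictable) can be sketched as follows. This is an illustration under stated assumptions, not the allocator's code: `VReg`, `next_use`, `pick_victim`, the `resident` and `pinned` parameters, and the per-instruction `uses` table layout are all hypothetical, and breaking ties toward the higher register number is a choice made here only to keep the result deterministic.

```rust
/// Hypothetical virtual-register id.
type VReg = u32;

/// Linear next-use lookup: `uses[i]` lists the vregs read by instruction `i`.
/// Returns the index of the first read of `v` at or after `from`, if any.
/// Valid only for straight-line code, since a back-edge would break it.
fn next_use(uses: &[Vec<VReg>], v: VReg, from: usize) -> Option<usize> {
    uses.iter()
        .enumerate()
        .skip(from)
        .find(|(_, srcs)| srcs.contains(&v))
        .map(|(i, _)| i)
}

/// Belady choice: among resident (vreg, register) pairs that are not pinned,
/// pick the one whose next use is farthest away. A vreg that is never read
/// again counts as farthest. Returns None when everything is pinned, which
/// is the case where the caller falls back to the decline path.
fn pick_victim(
    resident: &[(VReg, u8)],
    uses: &[Vec<VReg>],
    at: usize,
    pinned: &[VReg],
) -> Option<(VReg, u8)> {
    resident
        .iter()
        .filter(|entry| !pinned.contains(&entry.0))
        .max_by_key(|entry| (next_use(uses, entry.0, at).unwrap_or(usize::MAX), entry.1))
        .copied()
}
```

With five values resident in R4 to R8 and read back in allocation order, the pick is the value in R8, since its next use is the farthest; an LRU policy would instead evict the oldest value in R4.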
Also fixed under the flag (latent, unreachable flag-off because any spill implied a decline): the appended trailing return missed the `ADD SP` frame epilogue, and Const-handler exhaustion could hand out reserved R9/R10/R11 (globals base / memsize / base-CSE) or clobber R3. The pre-step pre-frees a pool register for `Const` dests too, so its allocator never reaches those fallbacks.

Flag choice
A separate flag, not `SYNTH_SPILL_REALLOC`: that lever is a post-hoc rewrite of already-allocated code (byte-level, same function set), while this one changes which functions take the optimized path. Entangling them would make the realloc flip judge a population change. The flip decision later covers both.

i64 scope (honest)
i64 pair exhaustion keeps declining even flag-on. i64 values are two separate vregs with a pair invariant the v1 spill model does not carry (#518 class); `spill_on_exhaust_supported` also excludes control flow (a back-edge invalidates the linear next-use table), calls, and non-param locals. One unsupported opcode keeps the whole function on the #496 decline.

Red→green
`scripts/repro/spill_on_exhaust_242.wat`: 10 param-derived i32 values simultaneously live (const-folding can't collapse them), non-commutative fold:
`SYNTH_RECOVERY_STATS=1` shows `rung=spill` (direct selector's ladder produced the code). Pinned by `spill_on_exhaust_red_flag_off_declines_to_direct_spill_rung`.
`spill_on_exhaust_242_differential.py` matches wasmtime on all 8 vectors under unicorn (incl. vectors where params force asymmetric sub/xor results).
Unit tests: #496 decline pinned; balance invariant (every `LDR [SP,#x]` preceded by a `STR [SP,#x]`); no R12 ships flag-on; Belady first victim is the farthest-next-use register (R8), not LRU's oldest (R4); below the pressure edge the lever is a byte-level no-op; scope classifier pinned.

Flag-off proof (STOP condition)
… `bad0901`) binary, both self-contained and `--relocatable`.
`frozen_codegen_bytes` 3/3, const-CSE golden, `recovery_stats_242`, `promotion_exhaustion_fallback_474` all green.
`cargo test -p synth-synthesis -p synth-cli` fully green; `cargo fmt --check` and `cargo clippy -p synth-synthesis -p synth-cli --all-targets -- -D warnings` clean.

Flag-on validation
`flight_seam_differential.py`: seam 0x07FDF307 MATCH.
`const_cse_differential.py`: PASS (with the flag on top).
`r12_spill_496_differential.py` (control_step + flight_seam_flat, self-contained): PASS both flag states.
`high_pressure_i32`, `filter_axis` (.wat+.wasm), `signed_div_const`, `uxth_fold`, `const_cse_direct`, `const_cse`: all PASS.
Residual: `gust_kernel` (1 of 5 fns moved) is compile-validated only here. Its `gust_poll` needs a globals-base (R9) harness that fails identically flag-off, so there is no regression evidence; it's gale's silicon fixture family and the flip gate covers it.

Size (honest)
The newly-optimized fixture is larger on the optimized path today: `hp` = 216 B flag-on vs 98 B on the direct path (flag-off). The greedy linear allocator never frees dead vregs' registers, so at the pressure edge nearly every op evicts (STR/LDR churn). The win of this PR is removing the hard-fail class and keeping such functions on the optimized path where the downstream levers (range-realloc, const-CSE, spill-realloc forwarding/DCE #569/#576/#579) apply. Dead-vreg pool release and slot reuse are the follow-on size levers before any flip.

Found while testing (pre-existing, NOT touched here)
The direct selector's spill rung miscompiles this fixture's shape: `mul.w r2,r0,r1` clobbers a live spill candidate and a later `ldr r0,[sp,#8]` reads a slot never stored (disasm of the flag-off build, both `--relocatable` and self-contained). Flag-off vectors like `hp(1,2)` return 0xffffd1ce vs wasmtime's 0x2e37. That's `instruction_selector.rs` (out of scope per this PR's charter); will file separately with the fixture as repro.

🤖 Generated with Claude Code