feat(vcr-ra): liveness-based spill re-choice spike, flag-off (#242, VCR-RA-001) #569
Merged
…CR-RA-001)

flat_flight's hot segment runs peak register pressure 11 > the R0-R8 pool of 9, so every pressure-guarded optimization declines there and the greedy lowering's spill placement is naive (gale: 17 spills + 61% redundant const materializations on silicon). This ships the bounded spike toward Belady/farthest-first spill choice as a post-hoc pass like apply_const_cse:

- REPORT (measure-only, SYNTH_SPILL_REPORT=1): spill_choice_report. Per straight-line segment, it compares the frame-slot traffic actually emitted against the reload/store count that a farthest-next-use (Belady MIN) allocation over a k-register pool would need. flat_flight's peak-11 segment: actual 3ld+3st vs belady(k=9) 0ld+0st, so all of it is recovery headroom.
- REWRITE (simplest strictly-profitable case, SYNTH_SPILL_REALLOC=1): apply_spill_realloc, slot-value forwarding BETWEEN reloads. This is exactly the case forward_stack_reloads misses: when pressure clobbers the spill store's SOURCE register, reloads #2..#n can still forward from reload #1 (or a reg-reg copy). Each ldr becomes a 1-cycle mov (1-for-1), or is deleted outright when the target already holds the value. Per-segment commit gates: (a) semantics preserved by construction, (b) never grows (asserted), (c) post-transform peak value pressure fits the pool or does not exceed the pre-transform peak.

Measured (debug, 2026-07-02): flight_seam::flight_algo 306->300 B, 3 of 6 surviving reloads forwarded (6ld -> 3ld); flat_flight honestly unchanged (its 3 surviving reloads have no live holder; recovering them needs the actual spill RE-CHOICE step, the next VCR-RA-001 increment). Flag-off is byte-identical (frozen_codegen_bytes 3/3 + const_cse golden); flag-on matches wasmtime on const_cse_differential.py and frame_slot_dce_differential.py (flight_algo anchor 0x07FDF307 preserved).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
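The REPORT side compares emitted traffic against a Belady MIN replay. As a rough illustration only, here is a minimal Rust sketch of that replay over an abstract value trace. It is not the `liveness::spill_choice_report` code: `Event`, `Traffic`, `belady_traffic`, and `headroom` are hypothetical names, and the sketch assumes each event touches one value, each value is defined at most once per segment, and any value first seen as a use starts in its frame slot.

```rust
use std::collections::HashSet;

/// One step of a hypothetical straight-line value trace.
#[derive(Clone, Copy, Debug)]
enum Event {
    Def(u32), // value is produced into a register
    Use(u32), // value must be register-resident here
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Traffic {
    loads: usize,
    stores: usize,
}

/// Index of the next use of `v` at or after `from`, if any.
fn next_use(trace: &[Event], from: usize, v: u32) -> Option<usize> {
    trace[from..]
        .iter()
        .position(|e| matches!(e, Event::Use(x) if *x == v))
        .map(|p| from + p)
}

/// Replays `trace` over a `k`-register pool, evicting the resident value
/// whose next use is farthest away (never-used-again counts as farthest).
fn belady_traffic(trace: &[Event], k: usize) -> Traffic {
    assert!(k > 0, "need at least one register");
    let mut regs: Vec<u32> = Vec::new(); // values currently in registers
    let mut in_slot: HashSet<u32> = HashSet::new(); // values with a valid slot copy
    let mut t = Traffic { loads: 0, stores: 0 };
    for (i, &ev) in trace.iter().enumerate() {
        let (v, is_def) = match ev {
            Event::Def(v) => (v, true),
            Event::Use(v) => (v, false),
        };
        if regs.contains(&v) {
            continue; // hit: already register-resident
        }
        if !is_def {
            t.loads += 1; // miss on a use: reload from the slot
            in_slot.insert(v); // a reloaded value already has a slot copy
        }
        if regs.len() == k {
            let (mut victim, mut farthest) = (0, 0);
            for (j, &r) in regs.iter().enumerate() {
                let d = next_use(trace, i + 1, r).unwrap_or(usize::MAX);
                if d >= farthest {
                    farthest = d;
                    victim = j;
                }
            }
            let r = regs.swap_remove(victim);
            // Store only if the victim is needed later and has no slot copy yet.
            if next_use(trace, i + 1, r).is_some() && in_slot.insert(r) {
                t.stores += 1;
            }
        }
        regs.push(v);
    }
    t
}

/// Greedy-minus-Belady delta: the traffic a better spill choice could remove.
fn headroom(actual: Traffic, belady: Traffic) -> Traffic {
    Traffic {
        loads: actual.loads.saturating_sub(belady.loads),
        stores: actual.stores.saturating_sub(belady.stores),
    }
}
```

Because every event names a single value, two resident values can only tie when neither is used again, so the tie-break does not affect the counts. Farthest-next-use is the classic load-minimizing choice; this toy model does not claim it also minimizes stores.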
avrabe added a commit that referenced this pull request on Jul 2, 2026
…ag-off) + gimli 0.34 (#571) Cuts the accumulated increment: the last RV32 lever port (#568), the VCR-RA-001 spill spike with the CI-locked flat_flight Belady target (#569), and the gimli 0.34 bump (#535). VCR-RA-001 stays `implemented` (NOT strengthened to verified — the spike verified its increment, not the full allocator claim) and is re-scoped to v0.23.0. Pin sweep + lock + CHANGELOG. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request on Jul 2, 2026
… recovered (#242, VCR-RA-001) (#576)

Stage 2 of the spill re-choice, succeeding the #569 spike behind the same SYNTH_SPILL_REALLOC flag. Where NO register still holds a spilled slot's value at its reload (stage 1's honest decline: flat_flight's 3 surviving pairs), the value was evicted only because the greedy lowering re-used the holding register while a provably-dead register existed. That is exactly the eviction the Belady (farthest-next-use) MIN plan avoids. The rewrite renames each in-window kill-def of the holder (the def plus every use, via rewrite_op) onto a register proven dead across that def's live range (untouched in-range; its first touch afterwards is a pure in-segment def), so the value stays register-resident and the reload dissolves.

Per-segment commit gates:
(a) same value flow, checked EXECUTABLY: the rewritten segment's symbolic value trace (segment_value_trace: slot<->value dissolution, exit register and slot state) must equal the original's;
(b) strictly fewer instructions AND strictly smaller estimated bytes; a count-neutral mov-fold is discarded, so the function never grows;
(c) post-transform pool (R0-R8) value pressure <= 9;
(d) sub-word / register-offset [sp] accesses and unknown-slot reloads disqualify the segment (the #483-class frame-slot conservatism).

flat_flight (the CI-locked target): 412 -> 396 B, frame traffic 3ld+3st -> 0ld+2st. All three reloads dissolve (Belady's 0-load side fully met) and pair #1's store goes dead; the two surviving stores are blocked by the frame-slot reach-end conservatism (a slot live to function end is not provably dead), not by the re-choice. Corpus sweep (68 repro fixtures x optimized+relocatable): 40 function-instances shrink, zero grow, zero flag-on compile failures. Flag-off byte-identical (frozen_codegen_bytes 3/3, const_cse golden). Flag-on differentials green: const_cse, frame_slot_dce, flight_seam inlined+flat (anchor 0x07FDF307), high_pressure_i32, and a 4-input unicorn-vs-wasmtime run of the rewritten flat_flight itself.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
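The stage-2 dead-register test ("untouched in-range; first touch afterwards is a pure in-segment def") can be illustrated on a toy instruction model. This is a hedged sketch, not the synth code: `Inst`, `live_range_end`, `dead_across`, and `recolor_kill_def` are hypothetical, registers R0-R8 are modelled as `0..9`, calls and implicit clobbers are ignored, and gates (a) through (d) (value-trace equality, size, pressure, slot conservatism) are not modelled.

```rust
/// Hypothetical instruction: the registers it reads and writes.
struct Inst {
    uses: Vec<u8>,
    defs: Vec<u8>,
}

fn inst(uses: &[u8], defs: &[u8]) -> Inst {
    Inst { uses: uses.to_vec(), defs: defs.to_vec() }
}

fn touches(i: &Inst, r: u8) -> bool {
    i.uses.contains(&r) || i.defs.contains(&r)
}

/// Last use of the value defined into `h` at `start`. Returns None when the
/// value has no in-segment use or may be live at segment exit (conservative).
fn live_range_end(seg: &[Inst], start: usize, h: u8) -> Option<usize> {
    let mut last_use = None;
    for (j, i) in seg.iter().enumerate().skip(start + 1) {
        if i.uses.contains(&h) {
            last_use = Some(j);
        }
        if i.defs.contains(&h) {
            return last_use; // `h` is redefined here, so the value dies
        }
    }
    None
}

/// `r` is provably dead across `start..=end`: nothing there touches it, and
/// the first later instruction touching it writes it without reading it.
fn dead_across(seg: &[Inst], start: usize, end: usize, r: u8) -> bool {
    if seg[start..=end].iter().any(|i| touches(i, r)) {
        return false;
    }
    match seg[end + 1..].iter().find(|i| touches(i, r)) {
        Some(i) => i.defs.contains(&r) && !i.uses.contains(&r),
        None => false, // may be live at segment exit: not provably dead
    }
}

/// Renames the kill-def of `holder` at `start` (and every use of that def)
/// onto a dead pool register, so `holder` keeps its previous value.
fn recolor_kill_def(seg: &mut [Inst], start: usize, holder: u8, pool: u8) -> Option<u8> {
    let end = live_range_end(seg, start, holder)?;
    let r = (0..pool).find(|&r| r != holder && dead_across(seg, start, end, r))?;
    for d in seg[start].defs.iter_mut() {
        if *d == holder {
            *d = r; // only the def at `start`; its own operands read the old value
        }
    }
    for i in &mut seg[start + 1..=end] {
        for u in i.uses.iter_mut() {
            if *u == holder {
                *u = r;
            }
        }
    }
    Some(r)
}
```

Renaming only the def at `start`, not its operands, matters for instructions such as `add r1, r1, #1`, whose input is still the holder's old value.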
Motivation (const-CSE PR2 finding, #562)
flat_flight's hot segment runs peak register pressure 11 > the R0–R8 pool of 9, so every pressure-guarded optimization (const-CSE PR2, the extending-alias hoist) correctly declines there, and the greedy lowering's spill placement is naive: gale measured 17 spills + 61% redundant const materializations on silicon (G474RE, #209). The only lever that wins on a genuinely saturated segment is smarter spill placement: evict the value with the farthest next use (Belady). This PR is the bounded, flag-off first step of VCR-RA-001, shipped as a post-hoc rewrite pass like `apply_const_cse`, not a new allocator.

What shipped (scope level 2, stated honestly)
Full greedy-spill-choice replacement did not fit the no-grow gate in one PR (the swap rewrite fundamentally adds a save `mov` plus a counter-reload, +2 instructions). Per the scope contract, this ships the honest smaller increment:

- `liveness::spill_choice_report(instrs, k)` (wired behind `SYNTH_SPILL_REPORT=1`, measure-only like `SYNTH_SHADOW_ALLOC`). Per straight-line segment it dissolves the emitted frame traffic back into an abstract value trace (`str`/`ldr [sp,#N]` bind slot↔value, so reload consumers are uses of the original value; unknown-slot reloads stay charged to Belady too) and replays it with farthest-next-use eviction over a k-register pool. The greedy−Belady delta is the measured recovery headroom for the full spill-choice rewrite.
- `liveness::apply_spill_realloc` behind `SYNTH_SPILL_REALLOC=1`: slot-value forwarding between reloads. This is exactly the case the default-on `forward_stack_reloads` misses: it forwards only from the spill store's SOURCE register, so when pressure clobbers that source (the genuine-spill case), its reloads survive. Reloads #2..#n, however, provably still have the value register-resident in reload #1's target (tracked through reg-reg copies; killed on any redefinition, slot overwrite, unpinnable `[sp]` access, `Push`/`Pop`, or SP def). Each such `ldr` becomes a 1-cycle `mov` (1-for-1) or is deleted when the target already holds the value. Per-segment commit gates: (a) semantics identical by construction, (b) instruction count never grows (asserted), (c) post-transform peak value pressure ≤ pool or ≤ pre-transform peak, so the pass never turns a fitting segment into a spilling one and never worsens a saturated one.

Measured (debug build, 2026-07-02, optimized path)
Per-function results table (rows: `flight_algo`, `controller_step`/`filter_step`, `flat_flight`).

`[spill-report]` on flat_flight's hot segment: `len=106 peak=11 actual=3ld+3st belady(k=9)=0ld+0st`. All of its surviving frame traffic is recoverable by a value-based allocation, but none of it by forwarding (the holders are all clobbered because the greedy allocator reuses them). Recovering those needs the actual spill RE-CHOICE step; that is the next VCR-RA-001 increment, and this report is its now CI-locked baseline (`spill_realloc_242.rs` claim 4).

Gates (all foreground, exit-code-checked)
- `cargo build -p synth-cli` ✅
- Flag-off: `frozen_codegen_bytes` 3/3 ✅; `const_cse_reduction_242` golden (incl. the pinned flag-off `.text` FNV) unchanged ✅. The pass is opt-in and env-gated; off ⇒ zero byte change.
- Flag-on (`SYNTH_SPILL_REALLOC=1` exported): `scripts/repro/const_cse_differential.py` PASS ✅; `scripts/repro/frame_slot_dce_differential.py` PASS ✅ (flight_algo result anchor 0x07FDF307 preserved, results == wasmtime). Note: `flight_seam_differential.py` is broken on main independently of this PR (it looks up `func_0`/`func_1`, which #394's real-name DWARF change renamed; pre-existing, verified failing flag-off too).
- `cargo test -p synth-synthesis` 488+ ✅ (10 new unit tests: forwarding, deletion, holder-clobber/push-pop/slot-overwrite blocking, mov propagation, non-vacuous pressure-gate decline, Belady mechanics at k=2 and k=9, unknown-slot honesty)
- `cargo test -p synth-cli` all 17 binaries ✅ (new `spill_realloc_242.rs`: no-grow corpus gate + non-vacuous firing floor + flat_flight equality + headroom report oracle)
- `cargo fmt --check` ✅, `cargo clippy -p synth-synthesis -p synth-cli --all-targets -- -D warnings` ✅

Refs #242 (VCR-RA-001).
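Stage 1's slot-value forwarding between reloads (the `apply_spill_realloc` rewrite described under "What shipped") can be sketched over a toy op model. This is a minimal sketch under stated assumptions, not the PR's implementation: `Op`, `kill`, and `forward_reloads` are hypothetical, `Barrier` stands in for `Push`/`Pop`, SP defs, and unpinnable `[sp]` accesses, the pressure gate (c) is omitted, and a store's source register is deliberately not treated as a holder (that case is left to `forward_stack_reloads`).

```rust
use std::collections::HashMap;

/// Hypothetical straight-line op model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Str { slot: u32, src: u8 }, // str src, [sp, #slot]
    Ldr { slot: u32, dst: u8 }, // ldr dst, [sp, #slot]
    Mov { dst: u8, src: u8 },   // mov dst, src
    Def { dst: u8 },            // any other write to dst
    Barrier,                    // push/pop, SP def, unpinnable [sp] access
}

/// A write to `r` ends its role as a holder of any slot value.
fn kill(holders: &mut HashMap<u32, Vec<u8>>, r: u8) {
    for regs in holders.values_mut() {
        regs.retain(|&x| x != r);
    }
}

/// Forwards reloads #2..#n from registers that still hold the slot value.
fn forward_reloads(seg: &[Op]) -> Vec<Op> {
    // slot -> registers known to hold that slot's current value
    let mut holders: HashMap<u32, Vec<u8>> = HashMap::new();
    let mut out = Vec::with_capacity(seg.len());
    for &op in seg {
        match op {
            Op::Str { slot, .. } => {
                holders.remove(&slot); // slot overwritten: old holders are stale
                out.push(op);
            }
            Op::Ldr { slot, dst } => {
                let regs = holders.get(&slot).cloned().unwrap_or_default();
                if regs.contains(&dst) {
                    continue; // target already holds the value: delete the ldr
                }
                kill(&mut holders, dst);
                match regs.first() {
                    Some(&h) => out.push(Op::Mov { dst, src: h }), // ldr -> mov, 1-for-1
                    None => out.push(op),                          // no holder: keep the reload
                }
                holders.entry(slot).or_default().push(dst);
            }
            Op::Mov { dst, src } => {
                if dst != src {
                    kill(&mut holders, dst);
                    for regs in holders.values_mut() {
                        if regs.contains(&src) {
                            regs.push(dst); // copies propagate holder status
                        }
                    }
                }
                out.push(op);
            }
            Op::Def { dst } => {
                kill(&mut holders, dst);
                out.push(op);
            }
            Op::Barrier => {
                holders.clear();
                out.push(op);
            }
        }
    }
    assert!(out.len() <= seg.len(), "gate (b): the segment never grows");
    out
}
```

Each rewrite either replaces an `ldr` one-for-one or drops it, so the length assertion can only fire on a bug in the sketch itself.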
🤖 Generated with Claude Code