feat(vcr-ra): liveness-based spill re-choice spike, flag-off (#242, VCR-RA-001) #569

Merged
avrabe merged 1 commit into main from feat/242-vcr-ra-liveness-spilling on Jul 2, 2026

@avrabe avrabe commented Jul 2, 2026

Motivation (const-CSE PR2 finding, #562)

flat_flight's hot segment runs peak register pressure 11 > the R0–R8 pool of 9, so every pressure-guarded optimization (const-CSE PR2, the extending-alias hoist) correctly declines there, and the greedy lowering's spill placement is naive — gale measured 17 spills + 61% redundant const materializations on silicon (G474RE, #209). The only lever that wins on a genuinely saturated segment is smarter spill placement: evict the value with the farthest next use (Belady). This PR is the bounded, flag-off first step of VCR-RA-001, shipped as a post-hoc rewrite pass like apply_const_cse — not a new allocator.

What shipped (scope level 2, stated honestly)

Full greedy-spill-choice replacement did not fit the no-grow gate in one PR (the swap rewrite fundamentally adds a save mov + a counter-reload, +2 instructions). Per the scope contract, this ships the honest smaller increment:

  1. REPORT-ONLY Belady analysis: liveness::spill_choice_report(instrs, k) (wired behind SYNTH_SPILL_REPORT=1, measure-only like SYNTH_SHADOW_ALLOC). Per straight-line segment it dissolves the emitted frame traffic back into an abstract value trace (str/ldr [sp,#N] bind slot↔value, so reload consumers are uses of the original value; unknown-slot reloads stay charged to Belady too) and replays it with farthest-next-use eviction over a k-register pool. The greedy−Belady delta is the measured recovery headroom for the full spill-choice rewrite.
  2. The simplest strictly-profitable rewrite: liveness::apply_spill_realloc behind SYNTH_SPILL_REALLOC=1, which does slot-value forwarding between reloads. This is exactly the case the default-on forward_stack_reloads misses: it forwards only from the spill store's SOURCE register, so when pressure clobbers that source (the genuine-spill case), its reloads survive. But reloads #2..#n provably still have the value register-resident in reload #1's target (tracked through reg-reg copies; killed on any redefinition, slot overwrite, unpinnable [sp] access, Push/Pop, or SP def). Each such ldr becomes a 1-cycle mov (1-for-1) or is deleted when the target already holds the value. Per-segment commit gates: (a) semantics identical by construction, (b) instruction count never grows (asserted), (c) post-transform peak value pressure ≤ pool or ≤ pre-transform peak, so the pass never turns a fitting segment into a spilling one and never worsens a saturated one.
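The forwarding rule in item 2 can be illustrated on a toy IR. This is a minimal sketch, not the code in liveness.rs: the `Instr` enum, `forward_between_reloads`, and `kill` are hypothetical names, registers are plain `u8` indices, and every unpinnable [sp] access, Push/Pop, or SP def is collapsed into one `Barrier` variant. Store-source forwarding is deliberately left out, since the text assigns that to forward_stack_reloads.

```rust
use std::collections::HashMap;

/// Toy instruction set (hypothetical; the real pass works on synth's own IR).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Instr {
    Str { src: u8, slot: u32 }, // str src, [sp, #slot]
    Ldr { dst: u8, slot: u32 }, // ldr dst, [sp, #slot]
    Mov { dst: u8, src: u8 },   // reg-reg copy
    Def { dst: u8 },            // any other instruction that redefines dst
    Barrier,                    // Push/Pop, SP def, unpinnable [sp] access
}

/// Removes `reg` from every slot's holder list (the register was redefined).
fn kill(holders: &mut HashMap<u32, Vec<u8>>, reg: u8) {
    for regs in holders.values_mut() {
        regs.retain(|&r| r != reg);
    }
}

/// Rewrites reloads whose slot value is still register-resident in an
/// earlier reload's target (or a copy of it). Never adds instructions.
fn forward_between_reloads(seg: &[Instr]) -> Vec<Instr> {
    // holders[slot] = registers known to hold the slot's current value.
    let mut holders: HashMap<u32, Vec<u8>> = HashMap::new();
    let mut out = Vec::with_capacity(seg.len());
    for ins in seg {
        match *ins {
            Instr::Ldr { dst, slot } => {
                let known = holders.get(&slot).cloned().unwrap_or_default();
                if known.contains(&dst) {
                    continue; // target already holds the value: delete the reload
                }
                kill(&mut holders, dst); // dst is about to be redefined
                match known.first() {
                    Some(&h) => out.push(Instr::Mov { dst, src: h }), // ldr -> mov
                    None => out.push(*ins), // no live holder: keep the reload
                }
                holders.entry(slot).or_default().push(dst);
            }
            Instr::Str { slot, .. } => {
                holders.remove(&slot); // slot overwrite kills its holders
                out.push(*ins);
            }
            Instr::Mov { dst, src } => {
                if dst != src {
                    kill(&mut holders, dst);
                    for regs in holders.values_mut() {
                        if regs.contains(&src) {
                            regs.push(dst); // the copy is a holder too
                        }
                    }
                }
                out.push(*ins);
            }
            Instr::Def { dst } => {
                kill(&mut holders, dst);
                out.push(*ins);
            }
            Instr::Barrier => {
                holders.clear(); // conservatively forget everything
                out.push(*ins);
            }
        }
    }
    out
}
```

In this sketch a segment such as `str r0,[sp,#8]; <clobber r0>; ldr r1,[sp,#8]; ldr r2,[sp,#8]` keeps the first reload and turns the second into `mov r2, r1`. The pressure gate (c) is not modelled here; a real implementation would have to check it before committing the rewritten segment.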

Measured (debug build, 2026-07-02, optimized path)

| fixture | function | flag-off | flag-on | reloads |
|---|---|---|---|---|
| flight_seam.wat | flight_algo | 306 B | 300 B | 6 ld → 3 ld (3 forwarded) |
| flight_seam.wat | controller_step / filter_step | 250/180 B | 250/180 B (no growth) | 0 forwarded |
| flat_flight.loom.wasm | flat_flight | 412 B | 412 B (honestly unchanged) | 0 forwarded |

[spill-report] on flat_flight's hot segment: len=106 peak=11 actual=3ld+3st belady(k=9)=0ld+0st. All of its surviving frame traffic is recoverable by a value-based allocation, but none of it by forwarding (the holders are all clobbered, because the greedy allocator reuses them). Recovering those needs the actual spill RE-CHOICE step; that is the next VCR-RA-001 increment, and this report is its now-CI-locked baseline (spill_realloc_242.rs claim 4).
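For readers unfamiliar with the belady(k=…) column, here is a minimal sketch of a farthest-next-use replay over an abstract value trace. It is an illustration under stated assumptions, not spill_choice_report itself: `belady_replay` is a hypothetical name, values are bare `u32` ids, each value is assumed immutable (so it needs at most one store), and the dissolution of str/ldr traffic into the trace is assumed to have happened already.

```rust
use std::collections::{HashMap, HashSet};

/// Replays `trace` (one value id per def or use) over a pool of `k`
/// registers with farthest-next-use eviction. Returns (loads, stores).
fn belady_replay(trace: &[u32], k: usize) -> (usize, usize) {
    assert!(k > 0, "pool must hold at least one register");

    // next_use[i] = index of the next access to trace[i], or usize::MAX.
    let mut next_use = vec![usize::MAX; trace.len()];
    let mut later: HashMap<u32, usize> = HashMap::new();
    for (i, &v) in trace.iter().enumerate().rev() {
        if let Some(&j) = later.get(&v) {
            next_use[i] = j;
        }
        later.insert(v, i);
    }

    let mut resident: Vec<(u32, usize)> = Vec::new(); // (value, its next use)
    let mut seen: HashSet<u32> = HashSet::new();      // values already defined
    let mut stored: HashSet<u32> = HashSet::new();    // values with a slot copy
    let (mut loads, mut stores) = (0usize, 0usize);

    for (i, &v) in trace.iter().enumerate() {
        if let Some(pos) = resident.iter().position(|e| e.0 == v) {
            resident[pos].1 = next_use[i]; // hit: just refresh the next use
        } else {
            if seen.contains(&v) {
                loads += 1; // evicted earlier while live: reload it
            }
            if resident.len() == k {
                // Evict the value whose next use is farthest away
                // (dead values have next use usize::MAX and go first).
                let (idx, _) = resident
                    .iter()
                    .enumerate()
                    .max_by_key(|(_, e)| e.1)
                    .unwrap();
                let (victim, victim_next) = resident.swap_remove(idx);
                // A still-live victim needs one store, the first time only.
                if victim_next != usize::MAX && stored.insert(victim) {
                    stores += 1;
                }
            }
            resident.push((v, next_use[i]));
        }
        seen.insert(v);
    }
    (loads, stores)
}
```

On the trace `[1, 2, 3, 1, 2, 3]` this sketch needs one load and one store with k=2 and none with k=3. Subtracting such a result from the ld/st counts actually emitted gives the kind of greedy−Belady headroom figure the report line shows.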

Gates (all foreground, exit-code-checked)

  • cargo build -p synth-cli
  • Flag-OFF byte-identical: frozen_codegen_bytes 3/3 ✅; const_cse_reduction_242 golden (incl. the pinned flag-off .text FNV) unchanged ✅ — the pass is opt-in env-gated, off ⇒ zero byte change.
  • Flag-ON differentials (SYNTH_SPILL_REALLOC=1 exported): scripts/repro/const_cse_differential.py PASS ✅; scripts/repro/frame_slot_dce_differential.py PASS ✅ (flight_algo result anchor 0x07FDF307 preserved, results == wasmtime). Note: flight_seam_differential.py is broken on main independently of this PR (it looks up func_0/func_1, which #394's real-name DWARF change renamed; pre-existing, verified failing flag-off too).
  • cargo test -p synth-synthesis 488+ ✅ (10 new unit tests: forwarding, deletion, holder-clobber/push-pop/slot-overwrite blocking, mov propagation, non-vacuous pressure-gate decline, Belady mechanics at k=2 and k=9, unknown-slot honesty)
  • cargo test -p synth-cli all 17 binaries ✅ (new spill_realloc_242.rs: no-grow corpus gate + non-vacuous firing floor + flat_flight equality + headroom report oracle)
  • cargo fmt --check ✅, cargo clippy -p synth-synthesis -p synth-cli --all-targets -- -D warnings
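The flag-off gate above boils down to "pass skipped ⇒ pinned hash unchanged". A minimal sketch of that idea follows, with loud assumptions: the source only says "FNV", so the 64-bit FNV-1a variant is a guess; `flag_on` and `finish_text` are hypothetical helpers; and treating exactly `"1"` as "on" is an assumption about how the env var is parsed.

```rust
/// FNV-1a, 64-bit variant (assumed; the PR does not name the variant).
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    bytes
        .iter()
        .fold(OFFSET_BASIS, |h, &b| (h ^ u64::from(b)).wrapping_mul(PRIME))
}

/// Hypothetical gate: only the exact value "1" switches the pass on.
fn flag_on(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Hypothetical pipeline tail: runs `pass` over the .text bytes only when
/// the flag is on, so flag-off output is byte-identical by construction.
fn finish_text(
    text: Vec<u8>,
    flag: Option<&str>,
    pass: impl Fn(Vec<u8>) -> Vec<u8>,
) -> Vec<u8> {
    if flag_on(flag) {
        pass(text)
    } else {
        text
    }
}
```

A caller could obtain the flag with `std::env::var("SYNTH_SPILL_REALLOC").ok()` and pass `.as_deref()`; a golden test would then compare `fnv1a_64` of the flag-off output against the pinned constant.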

Refs #242 (VCR-RA-001).

🤖 Generated with Claude Code

…CR-RA-001)

flat_flight's hot segment runs peak register pressure 11 > the R0-R8 pool
of 9, so every pressure-guarded optimization declines there and the greedy
lowering's spill placement is naive (gale: 17 spills + 61% redundant const
materializations on silicon). This ships the bounded spike toward
Belady/farthest-first spill choice, as a post-hoc pass like apply_const_cse:

- REPORT (measure-only, SYNTH_SPILL_REPORT=1): spill_choice_report — per
  straight-line segment, the frame-slot traffic actually emitted vs the
  reload/store count a farthest-next-use (Belady MIN) allocation over a
  k-register pool would need. flat_flight's peak-11 segment: actual
  3ld+3st vs belady(k=9) 0ld+0st — all of it is recovery headroom.

- REWRITE (simplest strictly-profitable case, SYNTH_SPILL_REALLOC=1):
  apply_spill_realloc — slot-value forwarding BETWEEN reloads. Exactly the
  case forward_stack_reloads misses: when pressure clobbers the spill
  store's SOURCE register, reload #2..#n can still forward from reload #1
  (or a reg-reg copy). ldr -> 1-cycle mov (1-for-1) or outright deletion
  when the target already holds the value. Per-segment commit gates:
  (a) semantics by construction, (b) never grows (asserted),
  (c) post-transform peak value pressure fits the pool or does not exceed
  the pre-transform peak.

Measured (debug, 2026-07-02): flight_seam::flight_algo 306->300 B, 3 of 6
surviving reloads forwarded (6ld -> 3ld); flat_flight honestly unchanged
(its 3 surviving reloads have no live holder — recovering them needs the
actual spill RE-CHOICE step, the next VCR-RA-001 increment). Flag-off is
byte-identical (frozen_codegen_bytes 3/3 + const_cse golden); flag-on
matches wasmtime on const_cse_differential.py and
frame_slot_dce_differential.py (flight_algo anchor 0x07FDF307 preserved).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

codecov Bot commented Jul 2, 2026

Codecov Report

❌ Patch coverage is 96.67832% with 19 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/synth-synthesis/src/liveness.rs | 96.52% | 19 Missing ⚠️ |

@avrabe avrabe merged commit fd18d56 into main Jul 2, 2026
24 checks passed
@avrabe avrabe deleted the feat/242-vcr-ra-liveness-spilling branch July 2, 2026 10:18
avrabe added a commit that referenced this pull request Jul 2, 2026
…ag-off) + gimli 0.34 (#571)

Cuts the accumulated increment: the last RV32 lever port (#568), the
VCR-RA-001 spill spike with the CI-locked flat_flight Belady target (#569),
and the gimli 0.34 bump (#535). VCR-RA-001 stays `implemented` (NOT
strengthened to verified — the spike verified its increment, not the full
allocator claim) and is re-scoped to v0.23.0. Pin sweep + lock + CHANGELOG.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jul 2, 2026
… recovered (#242, VCR-RA-001) (#576)

Stage 2 of the spill re-choice, succeeding the #569 spike behind the same
SYNTH_SPILL_REALLOC flag: where NO register still holds a spilled slot's
value at its reload (stage 1's honest decline — flat_flight's 3 surviving
pairs), the value was evicted only because the greedy lowering re-used the
holding register while a provably-dead register existed — exactly the
eviction the Belady (farthest-next-use) MIN plan avoids. The rewrite
renames each in-window kill-def of the holder (def + every use, via
rewrite_op) onto a register proven dead across that def's live range
(untouched in-range; first touch afterwards is a pure in-segment def), so
the value stays register-resident and the reload dissolves.

Per-segment commit gates:
 (a) same value flow — EXECUTABLE: the rewritten segment's symbolic value
     trace (segment_value_trace: slot<->value dissolution, exit register
     and slot state) must equal the original's;
 (b) strictly fewer instructions AND strictly smaller estimated bytes —
     a count-neutral mov-fold is discarded, so the function never grows;
 (c) post-transform pool (R0-R8) value pressure <= 9;
 (d) sub-word / register-offset [sp] accesses and unknown-slot reloads
     disqualify the segment (the #483-class frame-slot conservatism).

flat_flight (the CI-locked target): 412 -> 396 B, frame traffic
3ld+3st -> 0ld+2st — all three reloads dissolve (Belady's 0-load side
fully met) and pair #1's store goes dead; the two surviving stores are
blocked by the frame-slot reach-end conservatism (a slot live to function
end is not provably dead), not by the re-choice. Corpus sweep (68 repro
fixtures x optimized+relocatable): 40 function-instances shrink, zero
grow, zero flag-on compile failures. Flag-off byte-identical
(frozen_codegen_bytes 3/3, const_cse golden). Flag-on differentials green:
const_cse, frame_slot_dce, flight_seam inlined+flat (anchor 0x07FDF307),
high_pressure_i32, and a 4-input unicorn-vs-wasmtime run of the rewritten
flat_flight itself.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>