feat(vcr-ra): const-CSE PR2: 32-bit `movw+movt` + pressure-guarded extending-alias hoist (#242) #562
Merged
…ending-alias hoist (#242)

const-CSE PR1 made `apply_const_cse` a post-hoc, size-guarded pass but recovered almost none of gale's measured redundant-const win: its extractor saw only 16-bit `movw`/`mov #imm` (not 32-bit `movw+movt`) and required `ra != rd`, so the greedy selector's SAME-register re-materialization (const clobbered between uses, no register holding it) was invisible.

PR2 adds two pieces, flag-off behind `SYNTH_CONST_CSE`:

1. `const_units` reconstructs 32-bit `movw+movt` pairs, so large constants are visible to CSE.
2. A same-register **extending-alias hoist**: for a value re-materialized into one register >=2x in a straight-line segment, pin it in a register that is provably FREE across the reuse window (`free_reg_over`), delete the repeats, and retarget the reads. Because it adds one live register, every touched segment is gated on post-transform peak pressure <= ALLOCATABLE_POOL (9) in addition to the #242 no-grow size guard, so it can never turn a fitting segment into a spilling one.

`apply_const_cse` now runs two chained, individually-guarded passes: the PR1 cross-register fold, then the PR2 hoist ON PASS 1's OUTPUT. The hoist therefore observes the register uses Pass 1 aliased; this is the fix for a direct-path miscompile where moving a materialization's destination stranded a Pass-1 alias.

Gates:

- Flag-OFF byte-identical: `frozen_codegen_bytes` 3/3, const-CSE golden hash unchanged (shared `const_materialization`/`redundant_const_defs` untouched; all new code reachable only under the flag).
- Flag-ON win (measured): flight_seam::flight_algo 306->302 B; const_cse::spill12 236->148 B (the 32-bit movw+movt hoist); no function grows across the corpus. flat_flight stays 412 B: its hot segment peaks at 11 > pool 9 (already spilling), so the pressure guard correctly declines; recovering it needs the separate liveness-based spilling lever, not const-CSE.
- Correctness: `const_cse_differential.py` green (optimized + direct paths).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
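A minimal sketch of the first piece, reconstructing adjacent `movw`/`movt` pairs into 32-bit constant units. The `Insn` and `ConstUnit` types and this `const_units` signature are hypothetical stand-ins, not synth's IR, and `mov #imm` is omitted; only the adjacent, same-register pairing rule described above is modeled.

```rust
/// Hypothetical, simplified Thumb-2 instruction model (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Insn {
    /// `movw rd, #imm16`: writes imm16 to rd, zeroing the upper half.
    Movw { rd: u8, imm: u16 },
    /// `movt rd, #imm16`: writes imm16 to the upper half of rd, keeping the lower half.
    Movt { rd: u8, imm: u16 },
    /// Any other instruction, opaque to this sketch.
    Other,
}

/// Hypothetical constant unit: `len` instructions starting at `start`
/// leave `value` in register `rd`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ConstUnit {
    start: usize,
    len: usize,
    rd: u8,
    value: u32,
}

/// Sketch of `const_units`: fuse `movw rd,#lo ; movt rd,#hi` into one
/// 32-bit unit; a `movw` without a matching `movt` stays a 16-bit unit.
fn const_units(code: &[Insn]) -> Vec<ConstUnit> {
    let mut units = Vec::new();
    let mut i = 0;
    while i < code.len() {
        if let Insn::Movw { rd, imm: lo } = code[i] {
            // Fuse only when the very next instruction is a movt to the same register.
            if let Some(&Insn::Movt { rd: rd2, imm: hi }) = code.get(i + 1) {
                if rd2 == rd {
                    let value = ((hi as u32) << 16) | lo as u32;
                    units.push(ConstUnit { start: i, len: 2, rd, value });
                    i += 2;
                    continue;
                }
            }
            units.push(ConstUnit { start: i, len: 1, rd, value: lo as u32 });
        }
        i += 1;
    }
    units
}
```

Requiring the `movt` to be the very next instruction and to target the same register keeps the rule conservative in this sketch: an intervening instruction could write `rd`, so a split pair is treated as a lone 16-bit `movw` unit.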
## const-CSE PR2: win recovery (VCR-RA, #242)
PR1 made `apply_const_cse` a post-hoc, per-segment size-guarded pass, but it recovered almost none of the redundant-const win gale measured (61% of flat_flight's materializations). Its value extractor saw only 16-bit `movw`/`mov #imm` (not 32-bit `movw+movt`) and required `ra != rd`, so the greedy selector's dominant pattern (the same register re-materialized at each reuse, clobbered in between, with no register holding the value) was invisible.

### What PR2 adds (all flag-off behind `SYNTH_CONST_CSE`)

1. **`movw+movt` reconstruction (`const_units`):** an adjacent `movw rd,#lo ; movt rd,#hi` becomes one 32-bit unit, so large constants are visible to CSE.
2. **Same-register extending-alias hoist:** for a value re-materialized into one register ≥2× in a straight-line segment, pin it in a register that is provably free across the reuse window (`free_reg_over`), delete the repeats, and retarget the reads. Because this introduces one extra live register, every touched segment is gated on post-transform peak pressure ≤ `ALLOCATABLE_POOL` (9) in addition to the #242 no-grow size guard, so it can never turn a fitting segment into a spilling one. The hoist is a post-hoc removal+retarget, not inline two-vreg aliasing, so it does not reintroduce the alias-eviction hazard; the only risk is pressure, which the guard measures directly.

`apply_const_cse` now runs two chained, individually-guarded passes: the PR1 cross-register fold, then the PR2 hoist on Pass 1's output. Running Pass 2 on the post-fold stream is load-bearing: it lets the hoist observe (and correctly retarget) a register use that Pass 1 aliased onto the register whose materialization Pass 2 then moves. This fixed a direct-path miscompile caught by the differential.

### Flag-OFF byte-identical (the STOP condition, verified)
- `cargo test -p synth-cli --test frozen_codegen_bytes` → 3/3.
- `const_cse_off_matches_frozen_baseline_242` (FNV-1a `0xa68a…`, 576 B) unchanged.
- `const_materialization`/`redundant_const_defs` left untouched; all new code is reachable only under the flag.

### Flag-ON measured win + no-function-grows
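| Function | Flag-OFF | Flag-ON |
|---|---|---|
| `flight_seam::flight_algo` | 306 B | 302 B |
| `const_cse::spill12` | 236 B | 148 B (the 32-bit `movw+movt` hoist) |
| `flat_flight::flat_flight` | 412 B | 412 B |

No function grows anywhere in the corpus.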
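Both the no-grow result and the `flat_flight` decline follow from the per-segment guard. A minimal sketch of that decision, assuming a simplified per-instruction view of register reads and writes: the `Op` type, `peak_pressure`, and `accept` are hypothetical, and only `ALLOCATABLE_POOL` = 9 comes from this PR. Peak pressure is computed with a backward liveness scan, and a rewrite is accepted only if it neither grows the segment nor exceeds the pool.

```rust
use std::collections::BTreeSet;

/// Size of the allocatable register pool, as stated in this PR.
const ALLOCATABLE_POOL: usize = 9;

/// Hypothetical view of one straight-line instruction: registers read and written.
#[derive(Debug, Clone)]
struct Op {
    uses: Vec<u8>,
    defs: Vec<u8>,
}

/// Peak number of simultaneously live registers in a straight-line segment,
/// from a backward liveness scan seeded with the registers live out of it.
fn peak_pressure(seg: &[Op], live_out: &[u8]) -> usize {
    let mut live: BTreeSet<u8> = live_out.iter().copied().collect();
    let mut peak = live.len();
    for op in seg.iter().rev() {
        // At the instruction itself, its results coexist with everything live after it.
        let mut at_op = live.clone();
        at_op.extend(op.defs.iter().copied());
        peak = peak.max(at_op.len());
        // Above the instruction, its results are dead and its operands are live.
        for d in &op.defs {
            live.remove(d);
        }
        live.extend(op.uses.iter().copied());
        peak = peak.max(live.len());
    }
    peak
}

/// Hypothetical acceptance rule for one rewritten segment: no byte growth and
/// peak pressure within the pool, so a fitting segment never starts spilling.
fn accept(bytes_before: usize, bytes_after: usize, peak_after: usize) -> bool {
    bytes_after <= bytes_before && peak_after <= ALLOCATABLE_POOL
}
```

With 11 registers simultaneously live, as in `flat_flight`'s hot segment, `accept` returns `false` even when the rewrite would save bytes.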
`flat_flight` deliberately does not shrink: its hot segment has peak register pressure 11 > pool 9 (it already spills), so the pressure guard correctly declines every hoist, because the extra live register would force a spill. Recovering `flat_flight`'s redundant consts needs the separate liveness-based spilling lever (the VCR-RA SSA allocator), not const-CSE. The corpus test asserts only that `flat_flight` does not grow.

### Correctness
`SYNTH_CONST_CSE=1 python scripts/repro/const_cse_differential.py` → PASS: the optimized path (large3/small3/neg/mixed/ctrl/spill12) and the direct `--relocatable` path (r1/r2) are all bit-identical to wasmtime; the direct r1 case exercises the Pass 1 → Pass 2 interaction.

### Tests / checks
- New unit tests: `const_units_reconstructs_a_32bit_movw_movt_pair`, `const_cse_hoists_a_same_register_reuse_into_a_free_register_242`, `const_cse_hoist_declines_when_value_is_live_out_of_segment_242`; all 5 existing const-CSE unit tests still pass.
- `const_cse_reduction_242.rs` corpus assertions (flight_seam + spill12 shrink; whole-corpus no-grow including flat_flight).
- `cargo test --workspace --exclude synth-verify` green; `cargo fmt --check`; `cargo clippy --workspace --all-targets -- -D warnings`.

Kept flag-off; the default-on flip remains a later silicon-gated step.
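The behavior the two hoist tests check (hoisting a same-register reuse into a free register, and declining when the value is live out of the segment) can be sketched on a toy straight-line IR. This is an illustration under stated assumptions, not synth's code: `Ins`, `hoist`, and this `free_reg_over` are hypothetical, the free-register check is deliberately conservative (any read or write from the first materialization to the segment end disqualifies a register), and the pressure guard is assumed to run separately.

```rust
use std::collections::HashSet;

/// Hypothetical straight-line instruction: a constant materialization or an
/// ordinary op with register reads (`uses`) and writes (`defs`).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Ins {
    Const { rd: u8, value: u32 },
    Op { uses: Vec<u8>, defs: Vec<u8> },
}

impl Ins {
    fn uses(&self) -> Vec<u8> {
        match self {
            Ins::Const { .. } => Vec::new(),
            Ins::Op { uses, .. } => uses.clone(),
        }
    }
    fn defs(&self) -> Vec<u8> {
        match self {
            Ins::Const { rd, .. } => vec![*rd],
            Ins::Op { defs, .. } => defs.clone(),
        }
    }
}

/// Hypothetical, conservative `free_reg_over`: the lowest pool register that no
/// instruction in `seg[from..]` reads or writes and that is not live out.
fn free_reg_over(seg: &[Ins], from: usize, pool: &[u8], live_out: &HashSet<u8>) -> Option<u8> {
    let touched: HashSet<u8> = seg[from..]
        .iter()
        .flat_map(|ins| ins.uses().into_iter().chain(ins.defs()))
        .collect();
    pool.iter().copied().find(|r| !touched.contains(r) && !live_out.contains(r))
}

/// Hypothetical hoist; returns `None` (decline) when nothing is re-materialized,
/// the register is live out, or no register is free.
fn hoist(seg: &[Ins], pool: &[u8], live_out: &HashSet<u8>) -> Option<Vec<Ins>> {
    // 1. Constant materializations as (index, rd, value).
    let consts: Vec<(usize, u8, u32)> = seg
        .iter()
        .enumerate()
        .filter_map(|(i, ins)| match ins {
            Ins::Const { rd, value } => Some((i, *rd, *value)),
            Ins::Op { .. } => None,
        })
        .collect();
    // Earliest (rd, value) pair that is materialized at least twice.
    let &(first, rd, value) = consts
        .iter()
        .find(|&&(_, r, v)| consts.iter().filter(|&&(_, r2, v2)| r2 == r && v2 == v).count() >= 2)?;
    // 2. Decline if rd is live out: dropping a repeat could change its exit value.
    if live_out.contains(&rd) {
        return None;
    }
    // 3. A register free from the first materialization to the segment end.
    let f = free_reg_over(seg, first, pool, live_out)?;
    // 4. Pin the value in f, drop the repeats, retarget reads that observe it.
    let mut out = Vec::with_capacity(seg.len());
    let mut holds = false; // does rd hold `value` here in the ORIGINAL stream?
    for (i, ins) in seg.iter().enumerate() {
        match ins {
            Ins::Const { rd: r, value: v } if *r == rd && *v == value => {
                if i == first {
                    out.push(Ins::Const { rd: f, value });
                }
                holds = true; // repeats are not emitted
            }
            Ins::Op { uses, defs } => {
                let uses: Vec<u8> = uses.iter().map(|&u| if u == rd && holds { f } else { u }).collect();
                if defs.contains(&rd) {
                    holds = false; // rd clobbered
                }
                out.push(Ins::Op { uses, defs: defs.clone() });
            }
            Ins::Const { rd: r, .. } => {
                if *r == rd {
                    holds = false; // a different constant clobbers rd
                }
                out.push(ins.clone());
            }
        }
    }
    Some(out)
}
```

On a segment where `r0` is materialized twice with `0x1234_5678` and clobbered in between, this sketch pins the value in the first free pool register, drops the repeat, and retargets both reads; adding `r0` to `live_out` makes it decline.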
🤖 Generated with Claude Code