feat(vcr-ra): const-CSE PR2 — 32-bit movw+movt + pressure-guarded extending-alias hoist (#242)#562

Merged: avrabe merged 1 commit into main from feat/242-const-cse-pr2-win-recovery on Jul 2, 2026

avrabe (Contributor) commented Jul 1, 2026

const-CSE PR2 — win recovery (VCR-RA, #242)

PR1 made apply_const_cse a post-hoc, per-segment size-guarded pass, but it recovered almost none of the redundant-const win gale measured (61% of flat_flight's materializations). Its value extractor saw only 16-bit movw/mov #imm (not 32-bit movw+movt) and required ra != rd, so the greedy selector's dominant pattern — the same register re-materialized at each reuse, clobbered in between, with no register holding the value — was invisible.

What PR2 adds (all flag-off behind SYNTH_CONST_CSE)

  1. 32-bit movw+movt reconstruction (const_units): an adjacent movw rd,#lo ; movt rd,#hi becomes one 32-bit unit, so large constants are visible to CSE.
  2. Same-register extending-alias hoist: for a value re-materialized into one register ≥2× in a straight-line segment, pin it in a register that is provably FREE across the reuse window (free_reg_over), delete the repeats, and retarget the reads. Because this introduces one extra live register, every touched segment is gated on post-transform peak pressure ≤ ALLOCATABLE_POOL (9) in addition to the #242 no-grow size guard, so it can never turn a fitting segment into a spilling one. This is post-hoc removal+retarget, not inline two-vreg aliasing, so it does not reintroduce the alias-eviction hazard; the only risk is pressure, which the guard measures directly.
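
For readers who want the shape of item 1, here is a minimal sketch of fusing an adjacent `movw`/`movt` pair into one 32-bit unit. The `Insn` and `ConstUnit` types and the name `const_units_sketch` are hypothetical stand-ins for illustration, not the real `const_units` code or the project's IR.

```rust
/// Hypothetical, simplified instruction model (not the project's real IR).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Insn {
    /// movw rd, #imm16: writes the low half and zeroes the high half.
    Movw { rd: u8, imm: u16 },
    /// movt rd, #imm16: writes the high half and keeps the low half.
    Movt { rd: u8, imm: u16 },
    /// Any other instruction, opaque to this sketch.
    Other,
}

/// One constant materialization: `len` instructions starting at `start`
/// leave `value` in register `rd`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ConstUnit {
    pub start: usize,
    pub len: usize,
    pub rd: u8,
    pub value: u32,
}

/// Scan a straight-line stream and report every constant unit found.
pub fn const_units_sketch(code: &[Insn]) -> Vec<ConstUnit> {
    let mut units = Vec::new();
    let mut i = 0;
    while i < code.len() {
        if let Insn::Movw { rd, imm: lo } = code[i] {
            // Fuse only when the very next instruction is a movt into the
            // same register; anything in between breaks adjacency.
            if let Some(&Insn::Movt { rd: rt, imm: hi }) = code.get(i + 1) {
                if rt == rd {
                    let value = (u32::from(hi) << 16) | u32::from(lo);
                    units.push(ConstUnit { start: i, len: 2, rd, value });
                    i += 2;
                    continue;
                }
            }
            // A lone movw is still a (16-bit) constant unit.
            units.push(ConstUnit { start: i, len: 1, rd, value: u32::from(lo) });
        }
        i += 1;
    }
    units
}
```

In this sketch a `movt` that does not directly follow a `movw` into the same register is ignored, which mirrors the "adjacent" requirement stated above.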

apply_const_cse now runs two chained, individually-guarded passes: the PR1 cross-register fold, then the PR2 hoist on Pass 1's output. Running Pass 2 on the post-fold stream is load-bearing — it lets the hoist observe (and correctly retarget) a register use that Pass 1 aliased onto the register whose materialization Pass 2 then moves. This fixed a direct-path miscompile caught by the differential.
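
The chaining can be pictured roughly as below. This is a simplified sketch, assuming a whole-stream guard rather than the per-segment guards the PR describes; `run_guarded` and `chained_passes_sketch` are hypothetical names, and the passes and guard are supplied by the caller.

```rust
/// Run `pass` on `code` and keep its output only if `guard(before, after)`
/// accepts it; otherwise return the input unchanged.
pub fn run_guarded<T, P, G>(code: &[T], pass: P, guard: G) -> Vec<T>
where
    T: Clone,
    P: Fn(&[T]) -> Vec<T>,
    G: Fn(&[T], &[T]) -> bool,
{
    let candidate = pass(code);
    if guard(code, &candidate) {
        candidate
    } else {
        code.to_vec()
    }
}

/// Pass 2 (`hoist`) consumes whatever Pass 1 (`fold`) actually produced,
/// so it sees any register uses that Pass 1 rewrote.
pub fn chained_passes_sketch<T, P1, P2, G>(code: &[T], fold: P1, hoist: P2, guard: G) -> Vec<T>
where
    T: Clone,
    P1: Fn(&[T]) -> Vec<T>,
    P2: Fn(&[T]) -> Vec<T>,
    G: Fn(&[T], &[T]) -> bool,
{
    let after_fold = run_guarded(code, fold, &guard);
    run_guarded(&after_fold, hoist, &guard)
}
```

Each pass is accepted or rejected on its own, so a rejected hoist still leaves the fold's result in place.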

Flag-OFF byte-identical (the STOP condition — verified)

  • cargo test -p synth-cli --test frozen_codegen_bytes: 3/3.
  • const-CSE golden test const_cse_off_matches_frozen_baseline_242 (FNV-1a 0xa68a…, 576 B) unchanged.
  • Shared const_materialization / redundant_const_defs left untouched; all new code is reachable only under the flag.
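
As an illustration of what a golden-bytes check like the one above compares, here is a minimal FNV-1a sketch. The 64-bit width is an assumption (the PR shows only a truncated hash), and `fnv1a_64` and `matches_golden` are hypothetical helpers, not the test's actual code.

```rust
/// 64-bit FNV-1a over a byte slice. The width is an assumption; the real
/// golden test may use a different FNV variant.
pub fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    bytes
        .iter()
        // XOR the byte in first, then multiply: that order is what makes it "1a".
        .fold(OFFSET_BASIS, |hash, &b| (hash ^ u64::from(b)).wrapping_mul(PRIME))
}

/// Flag-off output counts as unchanged when both length and hash match the baseline.
pub fn matches_golden(emitted: &[u8], golden_len: usize, golden_hash: u64) -> bool {
    emitted.len() == golden_len && fnv1a_64(emitted) == golden_hash
}
```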

Flag-ON measured win + no-function-grows

| function | off (B) | on (B) | Δ (B) |
|---|---|---|---|
| flight_seam::flight_algo | 306 | 302 | −4 |
| const_cse::spill12 | 236 | 148 | −88 (32-bit movw+movt hoist) |
| flat_flight::flat_flight | 412 | 412 | 0 |

No function grows anywhere in the corpus. flat_flight deliberately does not shrink: its hot segment has peak register pressure 11 > pool 9 (it already spills), so the pressure guard correctly declines every hoist — the extra live register would force a spill. Recovering flat_flight's redundant consts needs the separate liveness-based spilling lever (VCR-RA SSA allocator), not const-CSE. The corpus test asserts flat_flight merely does not grow.
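
The decline on flat_flight follows from the pressure gate. A minimal sketch of that check, assuming a hypothetical `LiveRange` model with half-open instruction-index ranges (the real liveness code is not shown in this PR); only the pool size of 9 comes from the text above.

```rust
/// Pool size quoted in the PR description.
pub const ALLOCATABLE_POOL: usize = 9;

/// Half-open live range [start, end) in instruction indices (hypothetical model).
#[derive(Debug, Clone, Copy)]
pub struct LiveRange {
    pub start: usize,
    pub end: usize,
}

/// Largest number of simultaneously live ranges at any instruction of the segment.
pub fn peak_pressure(ranges: &[LiveRange], segment_len: usize) -> usize {
    (0..segment_len)
        .map(|pc| ranges.iter().filter(|r| r.start <= pc && pc < r.end).count())
        .max()
        .unwrap_or(0)
}

/// A hoist pins one extra register over the reuse window `pinned`; accept it
/// only if the segment still fits in the pool afterwards.
/// Simplification: the ranges freed by deleting the repeats are not modelled.
pub fn hoist_fits(ranges: &[LiveRange], segment_len: usize, pinned: LiveRange) -> bool {
    let mut after = ranges.to_vec();
    after.push(pinned);
    peak_pressure(&after, segment_len) <= ALLOCATABLE_POOL
}
```

With 8 registers live across a segment, one pinned register brings the peak to 9 and the hoist is accepted; a segment already at 11, like flat_flight's hot segment, is rejected regardless of the window.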

Correctness

  • SYNTH_CONST_CSE=1 python scripts/repro/const_cse_differential.py: PASS (optimized path large3/small3/neg/mixed/ctrl/spill12 + direct --relocatable path r1/r2, all bit-identical to wasmtime; the direct r1 case exercises the Pass 1 → Pass 2 interaction).

Tests / checks

  • New lib tests: const_units_reconstructs_a_32bit_movw_movt_pair, const_cse_hoists_a_same_register_reuse_into_a_free_register_242, const_cse_hoist_declines_when_value_is_live_out_of_segment_242; all 5 existing const-CSE unit tests still pass.
  • Extended const_cse_reduction_242.rs corpus assertions (flight_seam + spill12 shrink; whole-corpus no-grow including flat_flight).
  • cargo test --workspace --exclude synth-verify green; cargo fmt --check; cargo clippy --workspace --all-targets -- -D warnings.

Kept flag-off; the default-on flip remains a later silicon-gated step.

🤖 Generated with Claude Code

…ending-alias hoist (#242)

const-CSE PR1 made `apply_const_cse` a post-hoc, size-guarded pass but recovered
almost none of gale's measured redundant-const win: its extractor saw only
16-bit `movw`/`mov #imm` (not 32-bit `movw+movt`) and required `ra != rd`, so
the greedy selector's SAME-register re-materialization (const clobbered between
uses, no register holding it) was invisible.

PR2 adds two pieces, flag-off behind `SYNTH_CONST_CSE`:

1. `const_units` reconstructs 32-bit `movw+movt` pairs, so large constants are
   visible to CSE.
2. A same-register **extending-alias hoist**: for a value re-materialized into
   one register >=2x in a straight-line segment, pin it in a register that is
   provably FREE across the reuse window (`free_reg_over`), delete the repeats,
   and retarget the reads. Because it adds one live register, every touched
   segment is gated on post-transform peak pressure <= ALLOCATABLE_POOL (9) in
   addition to the #242 no-grow size guard — it can never turn a fitting segment
   into a spilling one.

`apply_const_cse` now runs two chained, individually-guarded passes: the PR1
cross-register fold, then the PR2 hoist ON PASS 1's OUTPUT (so the hoist observes
the register uses Pass 1 aliased — the fix for a direct-path miscompile where
moving a materialization's destination stranded a Pass-1 alias).

Gates:
- Flag-OFF byte-identical: `frozen_codegen_bytes` 3/3, const-CSE golden hash
  unchanged (shared `const_materialization`/`redundant_const_defs` untouched; all
  new code reachable only under the flag).
- Flag-ON win (measured): flight_seam::flight_algo 306->302 B; const_cse::spill12
  236->148 B (the 32-bit movw+movt hoist); no function grows across the corpus.
  flat_flight stays 412 B — its hot segment peaks at 11 > pool 9 (already
  spilling), so the pressure guard correctly declines; recovering it needs the
  separate liveness-based spilling lever, not const-CSE.
- Correctness: `const_cse_differential.py` green (optimized + direct paths).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jul 1, 2026

Codecov Report

❌ Patch coverage is 96.05634% with 14 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/synth-synthesis/src/liveness.rs | 96.05% | 14 Missing ⚠️ |

@avrabe avrabe merged commit fdcc123 into main Jul 2, 2026
24 checks passed
@avrabe avrabe deleted the feat/242-const-cse-pr2-win-recovery branch July 2, 2026 00:00