feat(opt): span-based CSE replacement infrastructure + verifier-gap finding (v0.9.0 PR-K2) by avrabe · Pull Request #111 · pulseengine/loom

avrabe · 2026-05-14T16:54:00Z

Summary

Implements the span-based replacement mechanism for Expr::Call that PR-K (v0.8.0) couldn't ship because the existing CSE replacement loop only handled single-instruction Const exprs.

Critical finding during implementation: the Z3 translation validator models every Instruction::Call as a fresh symbolic constant (verify.rs:4035, conservative empty-sig-context path). When CSE tries to replace two call $pure(x) with local.tee + local.get, Z3 produces a counterexample (R1 + R2 → 2*R3 where R1 and R2 are independent BVs) and the per-function verify_or_revert reverts every dedup attempt. Per LOOM's proof-first policy, bypassing is not allowed. PR-K3 will fix this in verify.rs by modeling pure+no-trap calls as uninterpreted-function applications f(args) so that two identical calls produce equal BVs.

What lands

CSEAction::ReplaceSpanWithLoad variant for span-based replacement.
Phase 4 dispatch in the replacement loop, with skip_until window to handle overlapping spans.
Five defense-in-depth gates on top of PR-K's recognition guards:
1. Each arg must be a single-instruction pure pusher (Const / LocalGet).
2. Measured span length must equal args.len() + 1 (proves the span IS the call subtree).
3. occupied[] bitmap rejects spans overlapping another planned transform.
4. No LocalSet / LocalTee of an arg-local may appear between the first call and last duplicate.
5. PR-K's worth_dedup cost gate still runs unchanged.

The TranslationValidator::verify_or_revert is the downstream safety net — any unsound transform reverts cleanly, which is exactly what happens today (until PR-K3).

Tests

test_cse_dedupes_pure_clamp_calls_via_span_replacement — pins the EXPECTED end state. Marked #[ignore] with a comment explaining the PR-K3 verifier gap.
test_cse_dedupes_repeated_pure_calls — also #[ignore]'d for the same reason (updated comment).
Existing 3 safety tests (don't dedupe impure / may-trap / different-args) continue to pass.

Narrowing applied

Arg shapes: only single-instruction pure pushers (Const / LocalGet). Binary / Unary / nested-Call arg subtrees deferred — they need wider span-overlap analysis AND the verifier wouldn't accept them today anyway.
Touched only loom-core/src/lib.rs (336 inserts / 27 deletes).

Soundness reasoning

The mechanism is dominated by TranslationValidator::verify_or_revert. Any unsound transform reverts. The non-Call Const path is preserved bit-identically except for the new overlap check, which is conservative-pessimistic. PR-K3 will enable the dedup; this PR ships the mechanism.

Gale measurement

Zero bytes (gale lacks the duplicate-pure-call pattern even if the verifier permitted it). Infrastructure for PR-K3.

🤖 Generated with Claude Code

PR-K (v0.8.0) added IPA-aware RECOGNITION of pure+no-trap+single-result Call exprs in CSE: they're hashed, cost-gated, and tracked in hash_to_positions, but Phase 4's replacement loop only substituted single-instruction Const exprs. PR-K2 wires the REPLACEMENT side with span-based substitution so a duplicate Call's full `[arg_start ..= call_pos]` range collapses into one `local.get N`. ## What lands 1. New CSEAction::ReplaceSpanWithLoad { local_idx, end_pos } variant. The first occurrence still gets SaveToLocal (LocalTee after the Call); each subsequent occurrence gets ReplaceSpanWithLoad — the apply loop emits one local.get and skips through end_pos. 2. Phase 4 dispatch on Expr variant: - Const32 / Const64: pre-existing tee+load (unchanged behavior), now also runs through an `occupied[]` span-overlap check so it never clashes with a Call dedup planned in the same iteration. - Call { args, .. }: span dedup gated by FIVE safety checks (defense-in-depth on top of PR-K's recognition guards): a. Every arg must be a single-instruction pure pusher (Const or LocalGet). Binary/Unary/nested-Call args are deferred to a follow-up — they need a wider span-overlap analysis. b. Each occurrence's measured span length must equal args.len()+1. A shorter span would mean PR-K's leftmost-arg-start tracking missed an instruction, and we can't prove the span is exactly the call subtree. c. No span may overlap any already-planned transform (the `occupied[]` bitmap is union of every prior span). d. No LocalSet / LocalTee of any LocalGet-arg local may appear between the first call and the last duplicate's arg-start. (Conservative: caching across mutation would fold a stale value.) e. The existing PR-K cost gate `worth_dedup` still runs in Phase 2 — same +4 fixed overhead, same break-even at cost>=5 N=3. No change to that calculation. 3. Apply loop honors an outstanding `skip_until: Option<usize>` marker so a ReplaceSpanWithLoad action absorbs every instruction in the duplicate's span. ## Known gap (PR-K3 target) The two positive tests for Call CSE remain `#[ignore]`'d, with updated reasons. The mechanism is in place and exercises cleanly, but the existing Z3 translation validator models every Instruction::Call as a FRESH symbolic constant (verify.rs:4035 falls through to `BV::new_const("call_unknown_result", 32)` for the empty signature-context the CSE pass uses today). Two identical pure+no-trap calls thus translate to two INDEPENDENT BVs, and the optimization `R1 + R2 -> 2 * R3` is treated as unproven — `verify_or_revert` reverts every Call dedup attempt. Per the project's proof-first philosophy ("if we cannot prove it, we must not do it"), this is the correct behavior. Unblocking the test is PR-K3: teach the verifier to model pure+no-trap call results as `f(args)` uninterpreted-function applications when the IPA proves purity. With that, the dedup mechanism shipped here becomes immediately useful with no further CSE-side changes. The three existing safety tests (impure / may-trap / different-args) continue to pass — those rely on PR-K's RECOGNITION gates which this PR didn't touch. ## Narrowing applied The PR-K2 spec contemplated a NARROWER ship if the full design couldn't be made confident within an hour. The narrowing here is two-fold: - Arg shapes: only Const / LocalGet args participate (point 2a). Binary / Unary / nested-Call arg subtrees would need a wider span-overlap analysis that the verifier wouldn't accept anyway today; deferring keeps soundness and reduces risk surface. - Tests `#[ignore]`'d on the verifier blocker, with rationale. ## Measurement gale_in_baseline (1941 B) optimizes to 2826 B total wasm, with a code-section delta of 0 bytes vs v0.8.0 baseline (still 795 B code section). Expected — gale lacks the duplicate-pure-call pattern, and even if it had one the verifier would revert. Trace: REQ-3, REQ-5, REQ-14 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Bump workspace version 0.8.0 → 0.9.0. Measurement and harvest release. Three PRs landed in parallel via worktree-isolated agents: #110 PR-P corpus-wide LOOM vs wasm-opt -O3 measurement harness #111 PR-K2 span-based CSE replacement infrastructure + verifier-gap #112 PR-L2 Souper rule set: 3 → 12 identities, wire into pipeline First objective measured wins: | Workload | LOOM Δ% | wasm-opt Δ% | |----------|--------:|------------:| | gale | -4.9% | -0.8% (LOOM wins by 4.1 pts) | simple_component | -18.8% | n/a (wasm-opt errors) | calc_component | -11.3% | n/a (wasm-opt errors) | calculator_root | -0.4% | n/a (wasm-opt errors) LOOM beats wasm-opt on gale; PR-M's adapter spec delivers double- digit wins on adapter-heavy components. wasm-opt cannot process Component-Model components at all — LOOM's strategic moat. Critical finding (PR-K2): the Z3 verifier models every Call as a fresh symbolic constant, so cross-call CSE dedup is rejected even when the replacement is correct. PR-K3 (verifier-side) will model pure+no-trap calls as uninterpreted-function applications.

avrabe merged commit 62eeb21 into main May 14, 2026
9 of 18 checks passed

avrabe deleted the release/v0.9.0-pr-k2-span-cse branch May 14, 2026 17:01

avrabe mentioned this pull request May 14, 2026

release: v0.9.0 #113

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(opt): span-based CSE replacement infrastructure + verifier-gap finding (v0.9.0 PR-K2)#111

feat(opt): span-based CSE replacement infrastructure + verifier-gap finding (v0.9.0 PR-K2)#111
avrabe merged 1 commit into
mainfrom
release/v0.9.0-pr-k2-span-cse

avrabe commented May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

avrabe commented May 14, 2026

Summary

What lands

Tests

Narrowing applied

Soundness reasoning

Gale measurement

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant