fix(opt): decouple slot-stack from inst_id in wasm_to_ir (#121) #122

Open
avrabe wants to merge 1 commit into `main` from `fix/issue-121-wasm-to-ir-slot-stack`

@avrabe avrabe commented May 18, 2026

Summary

Closes #121. This is the root architectural fix that PR #117's five rounds of continue patches were skating around. The wasm_to_ir lowering now tracks producer vregs via an explicit `slot_stack: Vec<u32>` kept alongside `inst_id`, instead of overloading `inst_id` as both the IR id and the wasm-stack-slot index.

This is silicon-priority: the bug Gale reported in the field (wasm modules with Drop/LocalSet/Store mid-stream) is fixed here. PR #117's temporary demotion of `wasm_ops_lower_or_error` to `gating: false` is unaffected (the demote commit was on #117's branch, never reached main), but #117 may now rebase and revert its workflow change once this lands.

What was broken

`OptimizerBridge::wasm_to_ir` used a single `inst_id` counter as both the unique IR-instruction id AND a vreg-slot index. Binary/unary handlers used `inst_id.saturating_sub(N)` to look up operands, assuming a 1:1 correspondence with wasm value stack positions. That assumption broke whenever a wasm op consumed a slot without producing a vreg (Drop, LocalSet, GlobalSet, Stores, control-flow ops...). The result was either:

  • A loud panic at `get_arm_reg` (the defensive guard at line 1670 — what the fuzz harness kept hitting).
  • A silent miscompilation reading whatever stale vreg was bound to the consumed slot (the Gale class — much worse on hardware).

The non-optimized path (`select_with_stack`) was unaffected because it uses a real value stack.
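
To make the drift concrete, here is a minimal sketch of the failure mode. It is a toy model, not the code in `optimizer_bridge.rs`: the `Op` enum, both function names, and the `drop_takes_id` switch are hypothetical and exist only to compare `inst_id - 1` arithmetic against a real value stack.

```rust
/// Toy subset of wasm ops, just enough to show the index drift (hypothetical).
#[derive(Clone, Copy, Debug)]
enum Op {
    Const,
    Drop,
    Popcnt,
}

/// Legacy-style lookup: a unary op assumes its operand is vreg `inst_id - 1`.
/// `drop_takes_id` models whether Drop consumes an instruction id or is
/// skipped with `continue`; both variants go wrong, in different ways.
fn legacy_unary_src(ops: &[Op], drop_takes_id: bool) -> Option<u32> {
    let mut inst_id: u32 = 0;
    let mut src = None;
    for op in ops {
        match op {
            Op::Const => {}
            Op::Drop if !drop_takes_id => continue,
            Op::Drop => {}
            Op::Popcnt => src = Some(inst_id.saturating_sub(1)),
        }
        inst_id += 1;
    }
    src
}

/// Reference behaviour: a real value stack holding producer ids.
fn stack_unary_src(ops: &[Op]) -> Option<u32> {
    let mut stack: Vec<u32> = Vec::new();
    let mut inst_id: u32 = 0;
    let mut src = None;
    for op in ops {
        match op {
            Op::Const => stack.push(inst_id),
            Op::Drop => {
                stack.pop();
            }
            Op::Popcnt => {
                src = stack.pop();
                stack.push(inst_id);
            }
        }
        inst_id += 1;
    }
    src
}
```

For the sequence `[Const, Const, Drop, Popcnt]`, the stack version returns id 0 (the first const). The legacy model returns id 2 when Drop takes an id (an id that never defined a vreg, which corresponds to the panic shape) and id 1 when Drop is skipped (the dropped const, which corresponds to the stale-vreg shape).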

What this PR does

Drive-by fix

`i64_operand_count` was missing the i64 div/rem variants (I64DivS/U, I64RemS/U). The old `inst_id.saturating_sub(4)` math happened to work for the existing `i64_div.wast` test because the subtraction saturated at slot boundaries; the slot_stack refactor exposed the gap as a real `pop()` on an empty stack. Adding the four ops to `i64_operand_count` resolves it cleanly.

Tests

`crates/synth-synthesis/tests/regression_issue_121_slot_stack.rs` — 12 new tests, all passing:

Panic-free shapes (the fuzz-found inputs):

  • `drop_between_producer_and_consumer`: the round-6 input from PR #117 ("fix(lowering): return Err on stack underflow instead of panic", fuzz #113).
  • `local_set_between_producer_and_consumer`, `global_set_between_producer_and_consumer`, `i32_store_between_producer_and_consumer`.
  • `block_loop_end_between_producer_and_consumer`, `br_if_between_producer_and_consumer`, `local_tee_then_consumer`.
  • `double_drop_then_const`, `mixed_i32_i64_with_drop`, `i64_drop_between_i64_consts`.

Semantic correctness (proving the silent-miscompilation path is fixed, not just the panic path):

  • `drop_preserves_correct_value_for_consumer` — `[const(7), const(11), drop, popcnt]` must operate on 7, not 11. Asserts the Popcnt instruction's src points at const(7)'s slot.
  • `local_set_preserves_correct_value_for_consumer` — same shape with LocalSet instead of Drop.

Full workspace test (excluding synth-verify — z3 network issue): 1041 passing, 0 regressions. The 4 existing AAPCS/i64 regression tests from PR #100/#101/#103/#104 plus the #93 memset/i64-shift tests all continue to pass.

Test plan

  • CI green across Test / Clippy / Format / Z3 / Kani / Bazel.
  • Gating fuzz harness `wasm_ops_lower_or_error` passes on the bd4ae7f/120c187/round-6 corpus seeds (it'll run automatically).
  • No regression in existing AAPCS / i64 / i32 selector tests.
  • Once this lands, PR #117 ("fix(lowering): return Err on stack underflow instead of panic", fuzz #113) can be rebased; its temporary demotion of `wasm_ops_lower_or_error` to non-gating is no longer needed.

Refs

🤖 Generated with Claude Code

`OptimizerBridge::wasm_to_ir` overloaded `inst_id` as both the unique IR
instruction id AND a vreg-slot index, with back-references like
`inst_id.saturating_sub(2)` assuming a one-to-one correspondence with
the wasm value stack. That assumption broke whenever any wasm op
consumed a stack slot without producing one — Drop, LocalSet, GlobalSet,
the i32/i64 store family, BrIf, and the structural Block/Loop/End
markers. The next binary or unary op's back-reference would then index
a stale or never-mapped vreg, and `get_arm_reg` would either trip the
PR #101 defensive panic or (pre-PR-101) silently fall back to R0 — the
silent-miscompilation class first surfaced in issue #93.

Gale (the real-hardware test rig) caught WASM modules in the field
that tripped this on production silicon; the cargo-fuzz
`wasm_ops_lower_or_error` harness on PR #117 surfaced the same class
six different ways (Nop/Unreachable/Return were closed there; Drop,
LocalSet, Store, Block/Loop/End remained until this PR).

Fix: introduce `slot_stack: Vec<u32>` in `wasm_to_ir` that mirrors the
wasm value stack. Each producer pushes its dest vreg onto slot_stack;
each consumer pops to discover its source vreg. `inst_id` reverts to
its original meaning — a monotonically increasing unique IR id — and
is no longer used for slot lookup.
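
A minimal sketch of the push/pop discipline described above, under stated assumptions: `WasmOp`, `Ir`, `pop_slot`, and `lower` are hypothetical names invented for illustration, and the sketch returns `Err` on underflow as a design choice of its own rather than mirroring what `wasm_to_ir` does.

```rust
/// Hypothetical subset of wasm ops.
#[derive(Debug, Clone, Copy)]
enum WasmOp {
    I32Const(i32),
    Drop,
    LocalSet(u32),
    I32Add,
    I32Popcnt,
}

/// Hypothetical IR; `dest` is the unique instruction id of the producer.
#[derive(Debug, PartialEq)]
enum Ir {
    Const { dest: u32, value: i32 },
    Add { dest: u32, lhs: u32, rhs: u32 },
    Popcnt { dest: u32, src: u32 },
    StoreLocal { index: u32, src: u32 },
}

/// Pop the producer vreg for the current top of the wasm value stack.
fn pop_slot(slot_stack: &mut Vec<u32>, pos: usize) -> Result<u32, String> {
    slot_stack
        .pop()
        .ok_or_else(|| format!("value-stack underflow at op {pos}"))
}

fn lower(ops: &[WasmOp]) -> Result<Vec<Ir>, String> {
    let mut ir = Vec::new();
    let mut slot_stack: Vec<u32> = Vec::new(); // mirrors the wasm value stack
    let mut inst_id: u32 = 0; // unique IR id only, never used for lookup
    for (pos, op) in ops.iter().enumerate() {
        match *op {
            WasmOp::I32Const(value) => {
                ir.push(Ir::Const { dest: inst_id, value });
                slot_stack.push(inst_id);
            }
            WasmOp::Drop => {
                // Consumes a slot, emits no IR, takes no instruction id.
                pop_slot(&mut slot_stack, pos)?;
                continue;
            }
            WasmOp::LocalSet(index) => {
                // Consumer only: pops its source and pushes nothing.
                let src = pop_slot(&mut slot_stack, pos)?;
                ir.push(Ir::StoreLocal { index, src });
            }
            WasmOp::I32Add => {
                let rhs = pop_slot(&mut slot_stack, pos)?;
                let lhs = pop_slot(&mut slot_stack, pos)?;
                ir.push(Ir::Add { dest: inst_id, lhs, rhs });
                slot_stack.push(inst_id);
            }
            WasmOp::I32Popcnt => {
                let src = pop_slot(&mut slot_stack, pos)?;
                ir.push(Ir::Popcnt { dest: inst_id, src });
                slot_stack.push(inst_id);
            }
        }
        inst_id += 1;
    }
    Ok(ir)
}
```

Lowering `[I32Const(7), I32Const(11), Drop, I32Popcnt]` with this sketch yields a `Popcnt` whose `src` is 0, the id of `const(7)`, which is the shape the semantic-correctness tests assert.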

i64 values occupy two consecutive entries on slot_stack (lo first,
then hi), matching the (dest_lo, dest_hi) two-vreg-pair layout already
used by i64 opcodes. I64ExtendI32U/S aliases dest_lo to the consumed
i32 src vreg by IR convention (preserved); I32WrapI64 aliases dest to
src_lo (preserved). Drop becomes an explicit `slot_stack.pop(); continue`
no-IR-emit arm; Nop/Unreachable/Return emit Opcode::Nop with no
slot_stack effect.
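
The two-entry i64 layout and the aliasing conventions can be sketched as follows. The `SlotStack` wrapper and its method names are hypothetical (the PR describes a plain `Vec<u32>`); the sketch only illustrates lo-then-hi ordering and which vreg is reused on extend and wrap.

```rust
/// Hypothetical wrapper around the slot stack; entries are producer vregs.
#[derive(Default, Debug)]
struct SlotStack {
    slots: Vec<u32>,
}

impl SlotStack {
    fn push_i32(&mut self, vreg: u32) {
        self.slots.push(vreg);
    }

    fn pop_i32(&mut self) -> Option<u32> {
        self.slots.pop()
    }

    /// An i64 occupies two consecutive entries: lo first, then hi.
    fn push_i64(&mut self, lo: u32, hi: u32) {
        self.slots.push(lo);
        self.slots.push(hi);
    }

    /// Pops (lo, hi); leaves the stack untouched if fewer than two entries.
    fn pop_i64(&mut self) -> Option<(u32, u32)> {
        if self.slots.len() < 2 {
            return None;
        }
        let hi = self.slots.pop()?;
        let lo = self.slots.pop()?;
        Some((lo, hi))
    }

    /// Extend i32 to i64: dest_lo aliases the consumed i32 vreg,
    /// and only the hi half gets a fresh vreg.
    fn extend_i32_to_i64(&mut self, fresh_hi: u32) -> Option<(u32, u32)> {
        let src = self.pop_i32()?;
        self.push_i64(src, fresh_hi);
        Some((src, fresh_hi))
    }

    /// Wrap i64 to i32: dest aliases src_lo, and the hi half is discarded.
    fn wrap_i64_to_i32(&mut self) -> Option<u32> {
        let (lo, _hi) = self.pop_i64()?;
        self.push_i32(lo);
        Some(lo)
    }
}
```

Checking the length before popping in `pop_i64` keeps a failed pop from leaving the stack half-consumed, which would otherwise hide the underflow from the next consumer.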

Drive-by: `i64_operand_count` was missing I64DivS/I64DivU/I64RemS/
I64RemU (so `analyze_i64_local_gets` failed to mark their i64
operands), which was masked by the same inst_id-slot scrambling.
Added them; the existing i64-div WAST tests now exercise the i64
LocalGet path instead of fortuitously-correct i32 Loads.
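
The real bodies of `i64_operand_count` and `analyze_i64_local_gets` are not shown in this PR, so the following is only a guess at their shape. The `WasmOp` enum, the signature, and the simplified `mark_i64_local_gets` pass are all assumptions made for illustration; the point is that an op missing from the count table leaves its `LocalGet` operands unmarked.

```rust
/// Hypothetical subset of wasm ops.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WasmOp {
    LocalGet(u32),
    I32Add,
    I64Add,
    I64DivS,
    I64DivU,
    I64RemS,
    I64RemU,
    I64Clz,
}

/// Number of i64 operands an op consumes (assumed shape of the real helper).
fn i64_operand_count(op: WasmOp) -> usize {
    match op {
        WasmOp::I64Add
        | WasmOp::I64DivS
        | WasmOp::I64DivU
        | WasmOp::I64RemS
        | WasmOp::I64RemU => 2,
        WasmOp::I64Clz => 1,
        WasmOp::LocalGet(_) | WasmOp::I32Add => 0,
    }
}

/// Simplified stand-in for the analysis: mark each LocalGet that sits
/// directly in front of an i64 op as producing an i64 value.
fn mark_i64_local_gets(ops: &[WasmOp]) -> Vec<bool> {
    let mut is_i64 = vec![false; ops.len()];
    for (i, op) in ops.iter().enumerate() {
        let n = i64_operand_count(*op);
        for j in i.saturating_sub(n)..i {
            if matches!(ops[j], WasmOp::LocalGet(_)) {
                is_i64[j] = true;
            }
        }
    }
    is_i64
}
```

Writing the `match` without a wildcard arm means a newly added variant fails to compile until it is classified, which is one way to keep this kind of omission from recurring.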

The catch-all `_ => Opcode::Nop` is preserved as a bug-finder: unknown
ops do not touch slot_stack, so subsequent consumers fail loudly via
`slot_stack.pop().expect(...)` instead of silently mis-binding vregs.

Regression coverage: new
`crates/synth-synthesis/tests/regression_issue_121_slot_stack.rs`
exercises Drop/LocalSet/GlobalSet/Store/BrIf/Block-Loop-End/LocalTee
between producer-and-consumer plus i32 and i64 variants. Two
semantic-correctness probes confirm that Popcnt reads the surviving
stack value (not the dropped one) — proving the fix addresses silent
miscompilation, not just the panic.

Test delta: +12 tests, 0 regressions. The 4 fuzz-related regression
tests from #100/#101/#103/#104 plus the #93 memset/i64-shift tests
all continue to pass.

Refs: issue #121, PR #117 (fuzz-harness reproductions), issue #93
(silent-drop class), PR #101 (defensive panic), PR #100 (fuzz harness).

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 75.32957% with 131 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/synth-synthesis/src/optimizer_bridge.rs | 75.32% | 131 Missing ⚠️ |

Development

Successfully merging this pull request may close these issues.

wasm_to_ir: inst_id overloaded as both IR-id and vreg-slot — decouple via explicit slot_stack