fix(opt): decouple slot-stack from inst_id in wasm_to_ir (#121)#122
Open
avrabe wants to merge 1 commit into
Open
fix(opt): decouple slot-stack from inst_id in wasm_to_ir (#121)#122avrabe wants to merge 1 commit into
avrabe wants to merge 1 commit into
Conversation
`OptimizerBridge::wasm_to_ir` overloaded `inst_id` as both the unique IR instruction id AND a vreg-slot index, with back-references like `inst_id.saturating_sub(2)` assuming a one-to-one correspondence with the wasm value stack. That assumption broke whenever any wasm op consumed a stack slot without producing one — Drop, LocalSet, GlobalSet, the i32/i64 store family, BrIf, and the structural Block/Loop/End markers. The next binary or unary op's back-reference would then index a stale or never-mapped vreg, and `get_arm_reg` would either trip the PR #101 defensive panic or (pre-PR-101) silently fall back to R0 — the silent-miscompilation class first surfaced in issue #93. Gale (the real-hardware test rig) caught WASM modules in the field that tripped this on production silicon; the cargo-fuzz `wasm_ops_lower_or_error` harness on PR #117 surfaced the same class six different ways (Nop/Unreachable/Return were closed there; Drop, LocalSet, Store, Block/Loop/End remained until this PR). Fix: introduce `slot_stack: Vec<u32>` in `wasm_to_ir` that mirrors the wasm value stack. Each producer pushes its dest vreg onto slot_stack; each consumer pops to discover its source vreg. `inst_id` reverts to its original meaning — a monotonically increasing unique IR id — and is no longer used for slot lookup. i64 values occupy two consecutive entries on slot_stack (lo first, then hi), matching the (dest_lo, dest_hi) two-vreg-pair layout already used by i64 opcodes. I64ExtendI32U/S aliases dest_lo to the consumed i32 src vreg by IR convention (preserved); I32WrapI64 aliases dest to src_lo (preserved). Drop becomes an explicit `slot_stack.pop(); continue` no-IR-emit arm; Nop/Unreachable/Return emit Opcode::Nop with no slot_stack effect. Drive-by: `i64_operand_count` was missing I64DivS/I64DivU/I64RemS/ I64RemU (so `analyze_i64_local_gets` failed to mark their i64 operands), which was masked by the same inst_id-slot scrambling. Added them; the existing i64-div WAST tests now exercise the i64 LocalGet path instead of fortuitously-correct i32 Loads. The catch-all `_ => Opcode::Nop` is preserved as a bug-finder: unknown ops do not touch slot_stack, so subsequent consumers fail loudly via `slot_stack.pop().expect(...)` instead of silently mis-binding vregs. Regression coverage: new `crates/synth-synthesis/tests/regression_issue_121_slot_stack.rs` exercises Drop/LocalSet/GlobalSet/Store/BrIf/Block-Loop-End/LocalTee between producer-and-consumer plus i32 and i64 variants. Two semantic-correctness probes confirm that Popcnt reads the surviving stack value (not the dropped one) — proving the fix addresses silent miscompilation, not just the panic. Test delta: +12 tests, 0 regressions. The 4 fuzz-related regression tests from #100/#101/#103/#104 plus the #93 memset/i64-shift tests all continue to pass. Refs: issue #121, PR #117 (fuzz-harness reproductions), issue #93 (silent-drop class), PR #101 (defensive panic), PR #100 (fuzz harness).
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #121 — the root architectural fix that PR #117's five rounds of
continuepatches were skating around. The wasm_to_ir lowering now tracks producer vregs via an explicit `slot_stack: Vec` parallel to `inst_id`, instead of overloading `inst_id` as both the IR id and the wasm-stack-slot index.This is silicon-priority: the bug Gale reported in the field (wasm modules with Drop/LocalSet/Store mid-stream) is fixed here. PR #117's temporary demotion of `wasm_ops_lower_or_error` to `gating: false` is unaffected (the demote commit was on #117's branch, never reached main), but #117 may now rebase and revert its workflow change once this lands.
What was broken
`OptimizerBridge::wasm_to_ir` used a single `inst_id` counter as both the unique IR-instruction id AND a vreg-slot index. Binary/unary handlers used `inst_id.saturating_sub(N)` to look up operands, assuming a 1:1 correspondence with wasm value stack positions. That assumption broke whenever a wasm op consumed a slot without producing a vreg (Drop, LocalSet, GlobalSet, Stores, control-flow ops...). The result was either:
The non-optimized path (`select_with_stack`) was unaffected because it uses a real value stack.
What this PR does
Drive-by fix
`i64_operand_count` was missing the i64 div/rem variants (I64DivS/U, I64RemS/U). The old `inst_id.saturating_sub(4)` math happened to fortuitously work for the existing `i64_div.wast` test due to saturating-sub at slot boundaries; the slot_stack refactor unmasked this as a real `pop()` on an empty stack. Added the four ops to `i64_operand_count` to resolve cleanly.
Tests
`crates/synth-synthesis/tests/regression_issue_121_slot_stack.rs` — 12 new tests, all passing:
Panic-free shapes (the fuzz-found inputs):
Semantic correctness (proving the silent-miscompilation path is fixed, not just the panic path):
Full workspace test (excluding synth-verify — z3 network issue): 1041 passing, 0 regressions. The 4 existing AAPCS/i64 regression tests from PR #100/#101/#103/#104 plus the #93 memset/i64-shift tests all continue to pass.
Test plan
Refs
🤖 Generated with Claude Code