fix(opt): decline linear-memory ops — optimized path miscompiled addresses (#178)#179
Merged
Merged
Conversation
…esses (#178) The optimized (default) wasm_to_ir → ir_to_arm path miscompiles i32/i64 linear-memory load/store: the MemLoad/MemStore address operand is dropped (the 'ADD base, base, Raddr' comes out with garbage registers), so the access hits a fixed address (linmem_base + 0x100) regardless of the operand — for BOTH a dynamic pointer-param address AND a constant one. Any function dereferencing memory reads/writes the wrong location. This is the on-target blocker for gale's pointer-taking shims (z_impl_k_sem_give reads sem->count from 0x20000100, not *sem). --no-optimize (select_with_stack) is correct: 'ldr [fp, Raddr]'. So optimize_full now declines any module using linear-memory load/store with a typed error, and the backend falls back to select_with_stack — correct-but-unoptimized beats fast-but-wrong. Verified: the optimized pointer-deref output now byte-matches --no-optimize ('ldr [fp, r0]'). Regression tests: pointer_deref_optimized_matches_no_optimize_178 (end-to-end: optimized == --no-optimize, no fixed-0x100 load) + the optimized-memory audit tests updated to assert the decline. The optimized memory path's repair + re-enable (root-causing the MemLoad ADD-register corruption) is a tracked follow-up; issue_104 CSE tests are #[ignore]d until then. Closes #178. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This was referenced May 30, 2026
Open
avrabe
added a commit
that referenced
this pull request
May 30, 2026
…#182) Bumps workspace 0.11.2 → 0.11.3, sweeps the intra-workspace path-dep pins + MODULE.bazel, promotes [Unreleased] → [0.11.3] with the falsification statement. PR #179 (closes #178): the optimized path dropped the load/store address operand (fixed-0x100 access regardless of the pointer); now declines memory modules to the correct select_with_stack path. Follow-ups: #180 (re-enable optimized memory), #181 (host-ABI contract).
avrabe
added a commit
that referenced
this pull request
May 30, 2026
…cause #180, re-enable optimized memory path (#183) * fix(encoder): high-register Thumb ADD/ADDS/SUBS use 32-bit .W form (#180) Root-cause of the optimized linear-memory miscompilation (#178, mitigated in #179 by declining memory ops). The bug was in the Thumb *encoder*, not the optimizer lowering: `ArmOp::Add` (and `Adds`/`Subs`) reg-forms unconditionally emitted the 16-bit encoding, whose 3-bit register fields overflow for high registers. The MemLoad/MemStore base scratch is R12, so `add ip, ip, r0` was emitted as the corrupt `adds r4, r5, r1` (0x186C) — silently dropping the address operand, making every optimized pointer-deref read/write a fixed address. (i64 `Adds`/`Subs` low-word ops hit the same class via R8-R11 pairs.) Fix: guard the 16-bit form on rd/rn/rm < 8 and fall back to 32-bit ADD.W/ADDS.W/SUBS.W for high registers, exactly as the Sub handler already did. - arm_encoder.rs: high-reg guards + encode_thumb32_adds/subs_reg_raw helpers - optimizer_bridge.rs: remove the #179 decline + is_linear_memory_op (obsolete) - byte-level encoder regression tests (EB0C 0C00 / EB1A 0A08 / EBBA 0A08) - re-enable the 4 #[ignore]d issue_104 memory CSE tests - audit_optimized_aapcs: re-assert optimized memory codegen (not the decline) - wast_compile: rewrite the #178 CLI test to verify the address operand is used (no corrupt 6C 18; contains add.w ip, ip, rN) instead of byte-equality Verified: optimized `i32.load`/`i32.store` of a pointer param now lowers to `movw/movt ip; add.w ip, ip, r0; ldr/str [ip,#off]` (byte-checked via objdump), matching --no-optimize semantics. Full suite: 1282 passed, 0 failed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * chore(release): v0.11.4 — workspace pin sweep + CHANGELOG (#180) Bump [workspace.package] version and every path-dep version pin 0.11.3 → 0.11.4 in lockstep (Cargo.toml ×11 + MODULE.bazel + Cargo.lock), per the mandatory pre-tag pin-sweep. CHANGELOG [0.11.4] with falsification statement. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(encoder): return Err (not panic) on PC operand; fix collapsible-if (#185) Two fuzz/lint fixes surfaced while landing #180: - #185: `encoder_no_panic` found a pre-existing panic — feeding PC (R15) as a data operand to a Thumb-2 op guarded by `verify_reg_bits` aborts under `-Cdebug-assertions`. Convert the 11 `verify_reg_bits` debug-assert sites to a fallible `reg_bits_checked() -> Result<()>` that returns a typed Err, matching the established "return Err, not panic" pattern (#101/#120/#132). Add a deterministic unit test + seed the crash input into the fuzz corpus. - Clippy (CI rust 1.96.0) flagged a collapsible `if let { if }` in the #167 wast_compile test; rewrite as a let-chain. The encoder has further pre-existing totality gaps (arithmetic underflow in ARM32 BCond offset, etc.) tracked in #186 — out of scope for this bugfix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * style(backend): fix doc_lazy_continuation in #167 test doc comment CI clippy (rust 1.96.0) flags `-D clippy::doc-lazy-continuation`: the doc line starting with `>= num_imports` reads as a malformed blockquote. Wrap it in a code span so no doc line begins with `>`. Pre-existing on main; unblocks the v0.11.4 Clippy gate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes the gale-team miscompilation #178 — the on-target blocker for pointer-dereferencing shims.
Root cause
The optimized (default)
wasm_to_ir → ir_to_armpath miscompiles i32/i64 linear-memory load/store: theMemLoad/MemStoreaddress operand is dropped (theADD base, base, Raddremits garbage registers, e.g.adds r4, r5, r1), so the access hits a fixed address (linmem_base + 0x100) regardless of the operand. Confirmed for both a dynamic pointer-param address and a constant one.z_impl_k_sem_givereadssem->countfrom0x20000100instead of*sem.Fix
--no-optimize(select_with_stack) is correct (ldr [fp, Raddr]). Sooptimize_fullnow declines any module using linear-memory load/store (typedErr), and the backend falls back toselect_with_stack— correct-but-unoptimized beats fast-but-wrong for a compiler that's supposed to be verified.Verified: the optimized pointer-deref output now byte-matches
--no-optimize(ldr [fp, r0]); nomovw ip,#0x100.Regression tests
pointer_deref_optimized_matches_no_optimize_178— end-to-end: optimized.text==--no-optimize.text, and no fixed-0x100load signature.issue_104CSE tests#[ignore]d pending the optimizer repair.Tradeoff + follow-up
This disables optimization for memory-using modules (they were miscompiling). Re-enabling — root-causing the
MemLoadADD-register corruption and fixing the optimized memory codegen — is filed as a follow-up.Note on the host-ABI design comment
The latest #178 comment raises the broader
__linear_memory_base/native-pointer contract (separate from this bug). Addressed in a dedicated design issue; this PR is just the "optimized dynamic load must use the operand" fix the comment calls out as needed regardless.Closes #178.
🤖 Generated with Claude Code