fix(opt): decline linear-memory ops — optimized path miscompiled addresses (#178) by avrabe · Pull Request #179 · pulseengine/synth

avrabe · 2026-05-30T05:15:44Z

Fixes the gale-team miscompilation #178 — the on-target blocker for pointer-dereferencing shims.

Root cause

The optimized (default) wasm_to_ir → ir_to_arm path miscompiles i32/i64 linear-memory load/store: the MemLoad/MemStore address operand is dropped (the ADD base, base, Raddr emits garbage registers, e.g. adds r4, r5, r1), so the access hits a fixed address (linmem_base + 0x100) regardless of the operand. Confirmed for both a dynamic pointer-param address and a constant one. z_impl_k_sem_give reads sem->count from 0x20000100 instead of *sem.

Fix

--no-optimize (select_with_stack) is correct (ldr [fp, Raddr]). So optimize_full now declines any module using linear-memory load/store (typed Err), and the backend falls back to select_with_stack — correct-but-unoptimized beats fast-but-wrong for a compiler that's supposed to be verified.

Verified: the optimized pointer-deref output now byte-matches --no-optimize (ldr [fp, r0]); no movw ip,#0x100.

Regression tests

pointer_deref_optimized_matches_no_optimize_178 — end-to-end: optimized .text == --no-optimize .text, and no fixed-0x100 load signature.
Optimized-memory audit tests updated to assert the decline; issue_104 CSE tests #[ignore]d pending the optimizer repair.

Tradeoff + follow-up

This disables optimization for memory-using modules (they were miscompiling). Re-enabling — root-causing the MemLoad ADD-register corruption and fixing the optimized memory codegen — is filed as a follow-up.

Note on the host-ABI design comment

The latest #178 comment raises the broader __linear_memory_base/native-pointer contract (separate from this bug). Addressed in a dedicated design issue; this PR is just the "optimized dynamic load must use the operand" fix the comment calls out as needed regardless.

Closes #178.

🤖 Generated with Claude Code

…esses (#178) The optimized (default) wasm_to_ir → ir_to_arm path miscompiles i32/i64 linear-memory load/store: the MemLoad/MemStore address operand is dropped (the 'ADD base, base, Raddr' comes out with garbage registers), so the access hits a fixed address (linmem_base + 0x100) regardless of the operand — for BOTH a dynamic pointer-param address AND a constant one. Any function dereferencing memory reads/writes the wrong location. This is the on-target blocker for gale's pointer-taking shims (z_impl_k_sem_give reads sem->count from 0x20000100, not *sem). --no-optimize (select_with_stack) is correct: 'ldr [fp, Raddr]'. So optimize_full now declines any module using linear-memory load/store with a typed error, and the backend falls back to select_with_stack — correct-but-unoptimized beats fast-but-wrong. Verified: the optimized pointer-deref output now byte-matches --no-optimize ('ldr [fp, r0]'). Regression tests: pointer_deref_optimized_matches_no_optimize_178 (end-to-end: optimized == --no-optimize, no fixed-0x100 load) + the optimized-memory audit tests updated to assert the decline. The optimized memory path's repair + re-enable (root-causing the MemLoad ADD-register corruption) is a tracked follow-up; issue_104 CSE tests are #[ignore]d until then. Closes #178. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…#182) Bumps workspace 0.11.2 → 0.11.3, sweeps the intra-workspace path-dep pins + MODULE.bazel, promotes [Unreleased] → [0.11.3] with the falsification statement. PR #179 (closes #178): the optimized path dropped the load/store address operand (fixed-0x100 access regardless of the pointer); now declines memory modules to the correct select_with_stack path. Follow-ups: #180 (re-enable optimized memory), #181 (host-ABI contract).

…cause #180, re-enable optimized memory path (#183) * fix(encoder): high-register Thumb ADD/ADDS/SUBS use 32-bit .W form (#180) Root-cause of the optimized linear-memory miscompilation (#178, mitigated in #179 by declining memory ops). The bug was in the Thumb *encoder*, not the optimizer lowering: `ArmOp::Add` (and `Adds`/`Subs`) reg-forms unconditionally emitted the 16-bit encoding, whose 3-bit register fields overflow for high registers. The MemLoad/MemStore base scratch is R12, so `add ip, ip, r0` was emitted as the corrupt `adds r4, r5, r1` (0x186C) — silently dropping the address operand, making every optimized pointer-deref read/write a fixed address. (i64 `Adds`/`Subs` low-word ops hit the same class via R8-R11 pairs.) Fix: guard the 16-bit form on rd/rn/rm < 8 and fall back to 32-bit ADD.W/ADDS.W/SUBS.W for high registers, exactly as the Sub handler already did. - arm_encoder.rs: high-reg guards + encode_thumb32_adds/subs_reg_raw helpers - optimizer_bridge.rs: remove the #179 decline + is_linear_memory_op (obsolete) - byte-level encoder regression tests (EB0C 0C00 / EB1A 0A08 / EBBA 0A08) - re-enable the 4 #[ignore]d issue_104 memory CSE tests - audit_optimized_aapcs: re-assert optimized memory codegen (not the decline) - wast_compile: rewrite the #178 CLI test to verify the address operand is used (no corrupt 6C 18; contains add.w ip, ip, rN) instead of byte-equality Verified: optimized `i32.load`/`i32.store` of a pointer param now lowers to `movw/movt ip; add.w ip, ip, r0; ldr/str [ip,#off]` (byte-checked via objdump), matching --no-optimize semantics. Full suite: 1282 passed, 0 failed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * chore(release): v0.11.4 — workspace pin sweep + CHANGELOG (#180) Bump [workspace.package] version and every path-dep version pin 0.11.3 → 0.11.4 in lockstep (Cargo.toml ×11 + MODULE.bazel + Cargo.lock), per the mandatory pre-tag pin-sweep. CHANGELOG [0.11.4] with falsification statement. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(encoder): return Err (not panic) on PC operand; fix collapsible-if (#185) Two fuzz/lint fixes surfaced while landing #180: - #185: `encoder_no_panic` found a pre-existing panic — feeding PC (R15) as a data operand to a Thumb-2 op guarded by `verify_reg_bits` aborts under `-Cdebug-assertions`. Convert the 11 `verify_reg_bits` debug-assert sites to a fallible `reg_bits_checked() -> Result<()>` that returns a typed Err, matching the established "return Err, not panic" pattern (#101/#120/#132). Add a deterministic unit test + seed the crash input into the fuzz corpus. - Clippy (CI rust 1.96.0) flagged a collapsible `if let { if }` in the #167 wast_compile test; rewrite as a let-chain. The encoder has further pre-existing totality gaps (arithmetic underflow in ARM32 BCond offset, etc.) tracked in #186 — out of scope for this bugfix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * style(backend): fix doc_lazy_continuation in #167 test doc comment CI clippy (rust 1.96.0) flags `-D clippy::doc-lazy-continuation`: the doc line starting with `>= num_imports` reads as a malformed blockquote. Wrap it in a code span so no doc line begins with `>`. Pre-existing on main; unblocks the v0.11.4 Clippy gate. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

avrabe merged commit 7b4f4b5 into main May 30, 2026
3 of 11 checks passed

avrabe deleted the fix/optimizer-memory-miscompile-178 branch May 30, 2026 05:16

avrabe mentioned this pull request May 30, 2026

fix(encoder): high-register Thumb ADD/ADDS/SUBS use 32-bit .W — root-cause #180, re-enable optimized memory path #183

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(opt): decline linear-memory ops — optimized path miscompiled addresses (#178)#179

fix(opt): decline linear-memory ops — optimized path miscompiled addresses (#178)#179
avrabe merged 1 commit into
mainfrom
fix/optimizer-memory-miscompile-178

avrabe commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

avrabe commented May 30, 2026

Root cause

Fix

Regression tests

Tradeoff + follow-up

Note on the host-ABI design comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant