Skip to content

fix(opt): decline linear-memory ops — optimized path miscompiled addresses (#178)#179

Merged
avrabe merged 1 commit into
mainfrom
fix/optimizer-memory-miscompile-178
May 30, 2026
Merged

fix(opt): decline linear-memory ops — optimized path miscompiled addresses (#178)#179
avrabe merged 1 commit into
mainfrom
fix/optimizer-memory-miscompile-178

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented May 30, 2026

Fixes the gale-team miscompilation #178 — the on-target blocker for pointer-dereferencing shims.

Root cause

The optimized (default) wasm_to_ir → ir_to_arm path miscompiles i32/i64 linear-memory load/store: the MemLoad/MemStore address operand is dropped (the ADD base, base, Raddr emits garbage registers, e.g. adds r4, r5, r1), so the access hits a fixed address (linmem_base + 0x100) regardless of the operand. Confirmed for both a dynamic pointer-param address and a constant one. z_impl_k_sem_give reads sem->count from 0x20000100 instead of *sem.

Fix

--no-optimize (select_with_stack) is correct (ldr [fp, Raddr]). So optimize_full now declines any module using linear-memory load/store (typed Err), and the backend falls back to select_with_stack — correct-but-unoptimized beats fast-but-wrong for a compiler that's supposed to be verified.

Verified: the optimized pointer-deref output now byte-matches --no-optimize (ldr [fp, r0]); no movw ip,#0x100.

Regression tests

  • pointer_deref_optimized_matches_no_optimize_178 — end-to-end: optimized .text == --no-optimize .text, and no fixed-0x100 load signature.
  • Optimized-memory audit tests updated to assert the decline; issue_104 CSE tests #[ignore]d pending the optimizer repair.

Tradeoff + follow-up

This disables optimization for memory-using modules (they were miscompiling). Re-enabling — root-causing the MemLoad ADD-register corruption and fixing the optimized memory codegen — is filed as a follow-up.

Note on the host-ABI design comment

The latest #178 comment raises the broader __linear_memory_base/native-pointer contract (separate from this bug). Addressed in a dedicated design issue; this PR is just the "optimized dynamic load must use the operand" fix the comment calls out as needed regardless.

Closes #178.

🤖 Generated with Claude Code

…esses (#178)

The optimized (default) wasm_to_ir → ir_to_arm path miscompiles
i32/i64 linear-memory load/store: the MemLoad/MemStore address operand
is dropped (the 'ADD base, base, Raddr' comes out with garbage
registers), so the access hits a fixed address (linmem_base + 0x100)
regardless of the operand — for BOTH a dynamic pointer-param address
AND a constant one. Any function dereferencing memory reads/writes the
wrong location. This is the on-target blocker for gale's pointer-taking
shims (z_impl_k_sem_give reads sem->count from 0x20000100, not *sem).

--no-optimize (select_with_stack) is correct: 'ldr [fp, Raddr]'. So
optimize_full now declines any module using linear-memory load/store
with a typed error, and the backend falls back to select_with_stack —
correct-but-unoptimized beats fast-but-wrong. Verified: the optimized
pointer-deref output now byte-matches --no-optimize ('ldr [fp, r0]').

Regression tests: pointer_deref_optimized_matches_no_optimize_178
(end-to-end: optimized == --no-optimize, no fixed-0x100 load) +
the optimized-memory audit tests updated to assert the decline.

The optimized memory path's repair + re-enable (root-causing the
MemLoad ADD-register corruption) is a tracked follow-up; issue_104 CSE
tests are #[ignore]d until then.

Closes #178.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@avrabe avrabe merged commit 7b4f4b5 into main May 30, 2026
3 of 11 checks passed
@avrabe avrabe deleted the fix/optimizer-memory-miscompile-178 branch May 30, 2026 05:16
avrabe added a commit that referenced this pull request May 30, 2026
…#182)

Bumps workspace 0.11.2 → 0.11.3, sweeps the intra-workspace path-dep
pins + MODULE.bazel, promotes [Unreleased] → [0.11.3] with the
falsification statement.

PR #179 (closes #178): the optimized path dropped the load/store
address operand (fixed-0x100 access regardless of the pointer); now
declines memory modules to the correct select_with_stack path.
Follow-ups: #180 (re-enable optimized memory), #181 (host-ABI contract).
avrabe added a commit that referenced this pull request May 30, 2026
…cause #180, re-enable optimized memory path (#183)

* fix(encoder): high-register Thumb ADD/ADDS/SUBS use 32-bit .W form (#180)

Root-cause of the optimized linear-memory miscompilation (#178, mitigated in
#179 by declining memory ops). The bug was in the Thumb *encoder*, not the
optimizer lowering: `ArmOp::Add` (and `Adds`/`Subs`) reg-forms unconditionally
emitted the 16-bit encoding, whose 3-bit register fields overflow for high
registers. The MemLoad/MemStore base scratch is R12, so `add ip, ip, r0` was
emitted as the corrupt `adds r4, r5, r1` (0x186C) — silently dropping the
address operand, making every optimized pointer-deref read/write a fixed
address. (i64 `Adds`/`Subs` low-word ops hit the same class via R8-R11 pairs.)

Fix: guard the 16-bit form on rd/rn/rm < 8 and fall back to 32-bit
ADD.W/ADDS.W/SUBS.W for high registers, exactly as the Sub handler already did.

- arm_encoder.rs: high-reg guards + encode_thumb32_adds/subs_reg_raw helpers
- optimizer_bridge.rs: remove the #179 decline + is_linear_memory_op (obsolete)
- byte-level encoder regression tests (EB0C 0C00 / EB1A 0A08 / EBBA 0A08)
- re-enable the 4 #[ignore]d issue_104 memory CSE tests
- audit_optimized_aapcs: re-assert optimized memory codegen (not the decline)
- wast_compile: rewrite the #178 CLI test to verify the address operand is used
  (no corrupt 6C 18; contains add.w ip, ip, rN) instead of byte-equality

Verified: optimized `i32.load`/`i32.store` of a pointer param now lowers to
`movw/movt ip; add.w ip, ip, r0; ldr/str [ip,#off]` (byte-checked via objdump),
matching --no-optimize semantics. Full suite: 1282 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore(release): v0.11.4 — workspace pin sweep + CHANGELOG (#180)

Bump [workspace.package] version and every path-dep version pin 0.11.3 → 0.11.4
in lockstep (Cargo.toml ×11 + MODULE.bazel + Cargo.lock), per the mandatory
pre-tag pin-sweep. CHANGELOG [0.11.4] with falsification statement.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(encoder): return Err (not panic) on PC operand; fix collapsible-if (#185)

Two fuzz/lint fixes surfaced while landing #180:

- #185: `encoder_no_panic` found a pre-existing panic — feeding PC (R15) as a
  data operand to a Thumb-2 op guarded by `verify_reg_bits` aborts under
  `-Cdebug-assertions`. Convert the 11 `verify_reg_bits` debug-assert sites to a
  fallible `reg_bits_checked() -> Result<()>` that returns a typed Err, matching
  the established "return Err, not panic" pattern (#101/#120/#132). Add a
  deterministic unit test + seed the crash input into the fuzz corpus.

- Clippy (CI rust 1.96.0) flagged a collapsible `if let { if }` in the #167
  wast_compile test; rewrite as a let-chain.

The encoder has further pre-existing totality gaps (arithmetic underflow in
ARM32 BCond offset, etc.) tracked in #186 — out of scope for this bugfix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* style(backend): fix doc_lazy_continuation in #167 test doc comment

CI clippy (rust 1.96.0) flags `-D clippy::doc-lazy-continuation`: the doc line
starting with `>= num_imports` reads as a malformed blockquote. Wrap it in a
code span so no doc line begins with `>`. Pre-existing on main; unblocks the
v0.11.4 Clippy gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

arm optimizer: dynamic i32.load/store address constant-folded to a fixed 0x100 — pointer params ignored (--no-optimize correct)

1 participant