Skip to content

fix(encoder): high-register Thumb ADD/ADDS/SUBS use 32-bit .W — root-cause #180, re-enable optimized memory path#183

Merged
avrabe merged 4 commits into
mainfrom
fix/encoder-high-reg-add-180
May 30, 2026
Merged

fix(encoder): high-register Thumb ADD/ADDS/SUBS use 32-bit .W — root-cause #180, re-enable optimized memory path#183
avrabe merged 4 commits into
mainfrom
fix/encoder-high-reg-add-180

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented May 30, 2026

Summary

Root-causes and fixes the optimizer linear-memory miscompilation (#178, mitigated in #179; tracked for repair in #180). The bug was not in the optimizer lowering — it was in the Thumb encoder.

ArmOp::Add (and Adds/Subs) register-forms unconditionally emitted the 16-bit encoding, whose 3-bit register fields overflow for high registers. MemLoad/MemStore use R12 as the base+address scratch, so add ip, ip, r0 was emitted as the corrupt adds r4, r5, r1 (0x186C) — silently dropping the address operand, making every optimized pointer-deref hit a fixed address. The i64 low-word Adds/Subs ops hit the same class via R8–R11 register pairs.

Fix: guard the 16-bit form on rd/rn/rm < 8 and fall back to 32-bit ADD.W/ADDS.W/SUBS.W for high registers — exactly as the Sub handler already did. The bug was a copy-paste divergence (Add lacked the guard Sub had).

Before → after (the #178 reproducer, optimized path)

8:  186c        adds r4, r5, r1      ; ❌ operand dropped, fixed-address load
8:  eb0c 0c00   add.w ip, ip, r0     ; ✅ base + pointer operand
c:  f8dc 4000   ldr.w r4, [ip]

Changes

Verification

Follow-up (out of scope, audit-noted)

  • The CMN register-form encoder (arm_encoder.rs) is unguarded for high registers but is only reached today via the immediate form (32-bit CMN.W); filed as an audit follow-up rather than a speculative fallback.

Closes #180.

🤖 Generated with Claude Code

avrabe and others added 2 commits May 30, 2026 08:05
)

Root-cause of the optimized linear-memory miscompilation (#178, mitigated in
#179 by declining memory ops). The bug was in the Thumb *encoder*, not the
optimizer lowering: `ArmOp::Add` (and `Adds`/`Subs`) reg-forms unconditionally
emitted the 16-bit encoding, whose 3-bit register fields overflow for high
registers. The MemLoad/MemStore base scratch is R12, so `add ip, ip, r0` was
emitted as the corrupt `adds r4, r5, r1` (0x186C) — silently dropping the
address operand, making every optimized pointer-deref read/write a fixed
address. (i64 `Adds`/`Subs` low-word ops hit the same class via R8-R11 pairs.)

Fix: guard the 16-bit form on rd/rn/rm < 8 and fall back to 32-bit
ADD.W/ADDS.W/SUBS.W for high registers, exactly as the Sub handler already did.

- arm_encoder.rs: high-reg guards + encode_thumb32_adds/subs_reg_raw helpers
- optimizer_bridge.rs: remove the #179 decline + is_linear_memory_op (obsolete)
- byte-level encoder regression tests (EB0C 0C00 / EB1A 0A08 / EBBA 0A08)
- re-enable the 4 #[ignore]d issue_104 memory CSE tests
- audit_optimized_aapcs: re-assert optimized memory codegen (not the decline)
- wast_compile: rewrite the #178 CLI test to verify the address operand is used
  (no corrupt 6C 18; contains add.w ip, ip, rN) instead of byte-equality

Verified: optimized `i32.load`/`i32.store` of a pointer param now lowers to
`movw/movt ip; add.w ip, ip, r0; ldr/str [ip,#off]` (byte-checked via objdump),
matching --no-optimize semantics. Full suite: 1282 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bump [workspace.package] version and every path-dep version pin 0.11.3 → 0.11.4
in lockstep (Cargo.toml ×11 + MODULE.bazel + Cargo.lock), per the mandatory
pre-tag pin-sweep. CHANGELOG [0.11.4] with falsification statement.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
#185)

Two fuzz/lint fixes surfaced while landing #180:

- #185: `encoder_no_panic` found a pre-existing panic — feeding PC (R15) as a
  data operand to a Thumb-2 op guarded by `verify_reg_bits` aborts under
  `-Cdebug-assertions`. Convert the 11 `verify_reg_bits` debug-assert sites to a
  fallible `reg_bits_checked() -> Result<()>` that returns a typed Err, matching
  the established "return Err, not panic" pattern (#101/#120/#132). Add a
  deterministic unit test + seed the crash input into the fuzz corpus.

- Clippy (CI rust 1.96.0) flagged a collapsible `if let { if }` in the #167
  wast_compile test; rewrite as a let-chain.

The encoder has further pre-existing totality gaps (arithmetic underflow in
ARM32 BCond offset, etc.) tracked in #186 — out of scope for this bugfix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@avrabe avrabe self-assigned this May 30, 2026
@avrabe
Copy link
Copy Markdown
Contributor Author

avrabe commented May 30, 2026

Update: two CI findings fixed in 7167951

While CI ran, two issues surfaced — both addressed:

  1. Clippy (CI rust 1.96.0) flagged a collapsible_if in the pre-existing arm backend: every wasm call lowered to __meld_dispatch_import placeholder — no inlining, no relocations, non-linkable ELF #167 wast_compile test (a if let { if }). Rewritten as a let-chain.

  2. encoder_no_panic fuzz found a pre-existing panic (identical code on main, unrelated to opt: repair + re-enable the optimized linear-memory path (root-cause the MemLoad ADD-register corruption) #180): a PC/R15 operand fed to a verify_reg_bits-guarded Thumb-2 op aborts under -Cdebug-assertions. Converted the 11 contract sites to a fallible reg_bits_checked() -> Result<()> (typed Err, per the fix(opt): defensive panic on unmapped vreg instead of silent R0 fallback #101/wasm_to_ir: unmapped vreg panic still trips on compiler_builtins (float::div) and gale_compute_ipi_mask after v0.2.1's #97 memset fix #120/fix(opt): ir_to_arm returns Result — panic-free optimized path, fuzz re-gated #132 pattern) + deterministic unit test. Filed as arm encoder: panics (debug_assert) on PC/R15 operand instead of returning Err — fuzz-found, pre-existing #185 (fixed here).

The empty-corpus fuzzer finds a class of pre-existing totality gaps (e.g. an ARM32 BCond offset arithmetic underflow), so Fuzz Smoke is intermittently red on any PR — it is not a required check (branch protection requires none), and these gaps predate this PR. Tracked as #186 rather than expanding this bugfix unboundedly.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

CI clippy (rust 1.96.0) flags `-D clippy::doc-lazy-continuation`: the doc line
starting with `>= num_imports` reads as a malformed blockquote. Wrap it in a
code span so no doc line begins with `>`. Pre-existing on main; unblocks the
v0.11.4 Clippy gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@avrabe avrabe merged commit 7ccc413 into main May 30, 2026
8 of 12 checks passed
@avrabe avrabe deleted the fix/encoder-high-reg-add-180 branch May 30, 2026 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

opt: repair + re-enable the optimized linear-memory path (root-cause the MemLoad ADD-register corruption)

1 participant