fix(encoder): high-register Thumb ADD/ADDS/SUBS use 32-bit .W — root-cause #180, re-enable optimized memory path #183
Root-cause of the optimized linear-memory miscompilation (#178, mitigated in #179 by declining memory ops). The bug was in the Thumb *encoder*, not the optimizer lowering: `ArmOp::Add` (and `Adds`/`Subs`) reg-forms unconditionally emitted the 16-bit encoding, whose 3-bit register fields overflow for high registers. The MemLoad/MemStore base scratch is R12, so `add ip, ip, r0` was emitted as the corrupt `adds r4, r5, r1` (0x186C), silently dropping the address operand and making every optimized pointer-deref read/write a fixed address. (i64 `Adds`/`Subs` low-word ops hit the same class of bug via R8-R11 pairs.)

Fix: guard the 16-bit form on rd/rn/rm < 8 and fall back to 32-bit ADD.W/ADDS.W/SUBS.W for high registers, exactly as the Sub handler already did.

- arm_encoder.rs: high-reg guards + encode_thumb32_adds/subs_reg_raw helpers
- optimizer_bridge.rs: remove the #179 decline + is_linear_memory_op (obsolete)
- byte-level encoder regression tests (EB0C 0C00 / EB1A 0A08 / EBBA 0A08)
- re-enable the 4 #[ignore]d issue_104 memory CSE tests
- audit_optimized_aapcs: re-assert optimized memory codegen (not the decline)
- wast_compile: rewrite the #178 CLI test to verify the address operand is used (no corrupt 6C 18; contains add.w ip, ip, rN) instead of byte-equality

Verified: optimized `i32.load`/`i32.store` of a pointer param now lowers to `movw/movt ip; add.w ip, ip, r0; ldr/str [ip,#off]` (byte-checked via objdump), matching --no-optimize semantics.

Full suite: 1282 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
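To make the overflow concrete, here is a minimal Rust sketch of an unguarded 16-bit `ADDS` (register) encoder, encoding T1 `0001 100 Rm Rn Rd`. The function names are hypothetical and not taken from the repository. Feeding it `add ip, ip, r0` (R12, R12, R0) reproduces 0x186C: bit 3 of Rd spills into the low bit of the Rn field and bit 3 of Rn spills into the low bit of the Rm field, so a disassembler reads `adds r4, r5, r1` and the real address operand r0 is lost.

```rust
/// Hypothetical reconstruction of the unguarded path (names are illustrative).
/// 16-bit ADDS (register), encoding T1: 0001 100 Rm Rn Rd, with 3-bit fields.
/// Register numbers are shifted into place without masking or range checks.
fn encode_t1_adds_reg_unchecked(rd: u16, rn: u16, rm: u16) -> u16 {
    0x1800 | (rm << 6) | (rn << 3) | rd
}

/// Read the three 3-bit fields back out, as a disassembler would: (Rd, Rn, Rm).
fn decode_t1_adds_reg(insn: u16) -> (u16, u16, u16) {
    (insn & 0x7, (insn >> 3) & 0x7, (insn >> 6) & 0x7)
}

fn main() {
    const IP: u16 = 12; // R12, the MemLoad/MemStore base scratch
    const R0: u16 = 0; // the pointer operand that should be added
    let insn = encode_t1_adds_reg_unchecked(IP, IP, R0);
    let (rd, rn, rm) = decode_t1_adds_reg(insn);
    println!("{insn:#06X} decodes as adds r{rd}, r{rn}, r{rm}");
    // Thumb halfwords are stored little-endian in the object file.
    println!("bytes in the object file: {:02X?}", insn.to_le_bytes());
}
```

Running it prints `0x186C decodes as adds r4, r5, r1`; stored little-endian, that halfword is the `6C 18` byte pair the rewritten #178 CLI test checks is absent.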
Bump [workspace.package] version and every path-dep version pin 0.11.3 → 0.11.4 in lockstep (Cargo.toml ×11 + MODULE.bazel + Cargo.lock), per the mandatory pre-tag pin-sweep. Add a CHANGELOG [0.11.4] entry with a falsification statement.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two fuzz/lint fixes surfaced while landing #180:

- #185: `encoder_no_panic` found a pre-existing panic: feeding PC (R15) as a data operand to a Thumb-2 op guarded by `verify_reg_bits` aborts under `-Cdebug-assertions`. Convert the 11 `verify_reg_bits` debug-assert sites to a fallible `reg_bits_checked() -> Result<()>` that returns a typed Err, matching the established "return Err, not panic" pattern (#101/#120/#132). Add a deterministic unit test and seed the crash input into the fuzz corpus.
- Clippy (CI rust 1.96.0) flagged a collapsible `if let { if }` in the #167 wast_compile test; rewrite it as a let-chain.

The encoder has further pre-existing totality gaps (arithmetic underflow in the ARM32 BCond offset, etc.) tracked in #186; they are out of scope for this bugfix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
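The "return Err, not panic" conversion can be sketched in Rust as below. `EncodeError`, the `context` argument, the `encode_adds_w` caller and the exact predicate (reject R15 and anything above it) are assumptions for illustration; the commit only states that `reg_bits_checked()` returns `Result<()>` with a typed error. The point is that `?` hands the failure back to the caller instead of aborting a `-Cdebug-assertions` build, which is what a fuzzer like `encoder_no_panic` needs.

```rust
use std::fmt;

/// Hypothetical typed error; the real encoder's error type is not shown in the PR.
#[derive(Debug, PartialEq, Eq)]
enum EncodeError {
    InvalidRegister { reg: u8, context: &'static str },
}

impl fmt::Display for EncodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EncodeError::InvalidRegister { reg, context } => {
                write!(f, "register r{reg} is not encodable as {context}")
            }
        }
    }
}

impl std::error::Error for EncodeError {}

const PC: u8 = 15;

/// Fallible replacement for a debug-assert check. Assumed predicate:
/// PC (R15) and out-of-range numbers are rejected as data operands.
fn reg_bits_checked(reg: u8, context: &'static str) -> Result<(), EncodeError> {
    if reg < PC {
        Ok(())
    } else {
        Err(EncodeError::InvalidRegister { reg, context })
    }
}

/// Hypothetical caller: ADDS.W rd, rn, rm (LSL #0) as two halfwords.
fn encode_adds_w(rd: u8, rn: u8, rm: u8) -> Result<(u16, u16), EncodeError> {
    reg_bits_checked(rd, "rd")?;
    reg_bits_checked(rn, "rn")?;
    reg_bits_checked(rm, "rm")?;
    let (rd, rn, rm) = (u16::from(rd), u16::from(rn), u16::from(rm));
    Ok((0xEB10 | rn, (rd << 8) | rm))
}

fn main() {
    // PC as a data operand: an Err the caller can handle, not an abort.
    match encode_adds_w(10, 15, 8) {
        Ok((first, second)) => println!("{first:04X} {second:04X}"),
        Err(e) => println!("rejected: {e}"),
    }
}
```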
Update: two CI findings fixed in
Codecov Report: ✅ All modified and coverable lines are covered by tests.
CI clippy (rust 1.96.0) flags `-D clippy::doc-lazy-continuation`: the doc line starting with `>= num_imports` reads as a malformed blockquote. Wrap it in a code span so no doc line begins with `>`. Pre-existing on main; unblocks the v0.11.4 Clippy gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
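For illustration, a hypothetical doc comment of the kind described, in Rust. The function and its meaning are assumptions: WebAssembly places imported functions first in the function index space, so an index `>= num_imports` names a locally defined function. Only the `>= num_imports` text comes from the commit. In the flagged form, the middle doc line starts with `>` and opens a Markdown blockquote, and the following line then continues it lazily; wrapping the comparison in a code span avoids both.

```rust
// Flagged form (kept as a plain comment so it is not compiled as a doc):
//     /// Returns true when the index is
//     /// >= num_imports, i.e. the function is
//     /// defined locally rather than imported.

/// Returns `true` when the index is
/// `>= num_imports`, i.e. the function is
/// defined locally rather than imported.
pub fn is_defined_function(func_idx: u32, num_imports: u32) -> bool {
    func_idx >= num_imports
}

fn main() {
    let num_imports = 2; // hypothetical: two imported functions
    for idx in 0..4 {
        println!("func {idx}: defined = {}", is_defined_function(idx, num_imports));
    }
}
```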
Summary
Root-causes and fixes the optimizer linear-memory miscompilation (#178, mitigated in #179; tracked for repair in #180). The bug was not in the optimizer lowering — it was in the Thumb encoder.
`ArmOp::Add` (and `Adds`/`Subs`) register forms unconditionally emitted the 16-bit encoding, whose 3-bit register fields overflow for high registers. `MemLoad`/`MemStore` use R12 as the base+address scratch, so `add ip, ip, r0` was emitted as the corrupt `adds r4, r5, r1` (0x186C), silently dropping the address operand and making every optimized pointer-deref hit a fixed address. The i64 low-word `Adds`/`Subs` ops hit the same class of bug via R8–R11 register pairs.

Fix: guard the 16-bit form on `rd/rn/rm < 8` and fall back to 32-bit `ADD.W`/`ADDS.W`/`SUBS.W` for high registers, exactly as the `Sub` handler already did. The bug was a copy-paste divergence (`Add` lacked the guard `Sub` had).

Before → after (the #178 reproducer, optimized path)
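The guard reduces to one range check before choosing an encoding. A minimal Rust sketch for the flag-setting `ADDS rd, rn, rm` case, assuming registers are plain numbers 0 to 15 and using a hypothetical `Thumb` output enum (the real encoder's types are not shown here): the 16-bit T1 form when all three registers are low, otherwise the 32-bit T3 `ADDS.W` with a zero shift, whose 4-bit register fields have room for R8 to R12. With `r10, r10, r8` it yields `EB1A 0A08`, one of the patterns the regression tests pin.

```rust
/// Hypothetical output type: one 16-bit halfword, or a 32-bit Thumb-2
/// instruction as (first halfword, second halfword).
#[derive(Debug, PartialEq, Eq)]
enum Thumb {
    Narrow(u16),
    Wide(u16, u16),
}

/// Guarded `ADDS rd, rn, rm` (register form, no shift). Assumes every
/// register number is already in 0..=15; no other validation is done.
fn encode_adds_reg(rd: u16, rn: u16, rm: u16) -> Thumb {
    if rd < 8 && rn < 8 && rm < 8 {
        // T1, 16-bit: 0001 100 Rm Rn Rd (3-bit fields; sets flags outside an IT block)
        Thumb::Narrow(0x1800 | (rm << 6) | (rn << 3) | rd)
    } else {
        // T3, 32-bit ADDS.W: 1110 1011 0001 Rn | 0000 Rd 0000 Rm (S = 1, LSL #0)
        Thumb::Wide(0xEB10 | rn, (rd << 8) | rm)
    }
}

fn main() {
    for (rd, rn, rm) in [(0, 1, 2), (12, 12, 0), (10, 10, 8)] {
        println!("adds r{rd}, r{rn}, r{rm} -> {:X?}", encode_adds_reg(rd, rn, rm));
    }
}
```

The check covers all three registers, not just `rd`: a high `rn` or `rm` overflows its field just as badly, as the R12 case shows.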
Changes
- `arm_encoder.rs`: high-register guards on `Add`/`Adds`/`Subs` + `encode_thumb32_adds/subs_reg_raw` helpers
- `optimizer_bridge.rs`: remove the #179 decline and the now-obsolete `is_linear_memory_op`
- byte-level encoder regression tests (`EB0C 0C00` / `EB1A 0A08` / `EBBA 0A08`)
- re-enable the 4 `#[ignore]`d `issue_104` memory-CSE tests
- `audit_optimized_aapcs`: re-assert optimized memory codegen (was asserting the decline)
- `wast_compile`: rewrite the #178 CLI test to verify the address operand is used (no corrupt `6C 18`; contains `add.w ip, ip, rN`); the old byte-equality premise only held under the decline

Verification
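The three byte patterns in the regression tests follow from the Thumb-2 data-processing (shifted register) encoding. This Rust sketch uses a hypothetical `thumb32_dp_reg` helper (not the PR's `encode_thumb32_adds/subs_reg_raw`, whose signatures are not shown here) with the shift fixed at LSL #0, and also shows the little-endian byte order in which such halfwords land in the object file.

```rust
/// Op field values (bits 8..5 of the first halfword) in the Thumb-2
/// data-processing (shifted register) group.
const OP_ADD: u16 = 0b1000;
const OP_SUB: u16 = 0b1101;

/// Hypothetical helper: encodes `<op>{S}.W rd, rn, rm` with shift LSL #0,
/// returned as (first halfword, second halfword).
///   first  = 1110 101 op S Rn
///   second = 0 imm3 Rd imm2 type Rm   (all shift fields zero)
/// Rd = PC with S = 1 aliases CMN/CMP, and Rn = SP selects the SP forms;
/// this sketch does not special-case either.
fn thumb32_dp_reg(op: u16, set_flags: bool, rd: u16, rn: u16, rm: u16) -> (u16, u16) {
    let first = 0xEA00 | (op << 5) | (u16::from(set_flags) << 4) | rn;
    let second = (rd << 8) | rm;
    (first, second)
}

/// Memory order on a little-endian target: first halfword at the lower
/// address, each halfword stored little-endian.
fn thumb32_le_bytes((first, second): (u16, u16)) -> [u8; 4] {
    let [a, b] = first.to_le_bytes();
    let [c, d] = second.to_le_bytes();
    [a, b, c, d]
}

fn main() {
    let cases = [
        ("add.w ip, ip, r0", thumb32_dp_reg(OP_ADD, false, 12, 12, 0)),
        ("adds.w r10, r10, r8", thumb32_dp_reg(OP_ADD, true, 10, 10, 8)),
        ("subs.w r10, r10, r8", thumb32_dp_reg(OP_SUB, true, 10, 10, 8)),
    ];
    for (asm, (first, second)) in cases {
        let bytes = thumb32_le_bytes((first, second));
        println!("{asm:<20} {first:04X} {second:04X}  bytes {bytes:02X?}");
    }
}
```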
- … (`--exclude synth-verify` per z3 quirk)
- `arm-none-eabi-objdump` on the #178 reproducer
- … `--no-optimize` honor the r0 operand

Follow-up (out of scope, audit-noted)
- … (`arm_encoder.rs`) is unguarded for high registers but is only reached today via the immediate form (32-bit CMN.W); filed as an audit follow-up rather than a speculative fallback.

Closes #180.
🤖 Generated with Claude Code