Skip to content

arm optimizer: dynamic i32.load/store address constant-folded to a fixed 0x100 — pointer params ignored (--no-optimize correct) #178

@avrabe

Description

@avrabe

Summary

On synth 8f213351 (post-v0.11.2), the optimized path miscompiles i32.load/i32.store of a dynamic address (an address from local.get / a function parameter): it lowers to a load/store from a fixed constant address (linmem_base + 0x100), ignoring the dynamic operand entirely. --no-optimize is correct (ldr [fp, r0] — base register + param), so this is an optimizer bug. Any wasm function that dereferences a pointer parameter, compiled with optimization on (the default), reads/writes the wrong memory.

This is the on-target blocker for the cross-language-LTO-via-wasm route: the C shim's z_impl_k_sem_give(struct k_sem *sem) reads sem->count from 0x20000100 instead of from the sem pointer the kernel passes in r0, so it cannot work as a drop-in native kernel primitive.

(The relocation/encoding issues #167/#173/#174 are all fixed and verified — calls now resolve correctly. This is a separate, memory-addressing bug. gale's verified functions like gale_k_sem_give_decide(count, limit, has_waiter) are unaffected because they take values, not pointers — which is why GALE_USE_SYNTH worked. The bug surfaces the moment a wasm function dereferences a pointer arg.)

Minimal reproduction

(module
  (memory 1)
  (func $load_field (export "load_field") (param i32) (result i32)
    local.get 0
    i32.load offset=0)          ;; return *(i32*)param0
  (func $store_field (export "store_field") (param i32) (param i32)
    local.get 0
    local.get 1
    i32.store offset=4))        ;; param0->field4 = param1
synth compile ptr.wat --target cortex-m4f --all-exports --relocatable -o ptr.o
arm-zephyr-eabi-objcopy -O binary --only-section=.text ptr.o ptr.bin
arm-zephyr-eabi-objdump -D -b binary -marm -Mforce-thumb ptr.bin
00000000 <load_field>:
   0:  movw  ip, #0x100        ; \
   4:  movt  ip, #0x2000       ;  > ip = 0x20000100  (constant — should be linmem_base + r0)
   8:  adds  r4, r5, r1        ; dead/garbage (r5,r1 not initialized)
   a:  ldr.w r4, [ip]          ; load from 0x20000100 — r0 (the address operand) is never used
   e:  mov   r0, r4
  10:  bx    lr

00000014 <store_field>:
  14:  movw  ip, #0x100
  18:  movt  ip, #0x2000        ; ip = 0x20000100 again
  1e:  str.w r1, [ip, #4]       ; store to 0x20000104 — r0 ignored

--no-optimize is correct — it uses the param via the linear-memory base register fp:

00000000 <load_field>:
   0:  stmdb sp!, {r4,r5,r6,r7,r8,lr}
   4:  ldr.w r0, [fp, r0]      ; linmem_base(fp) + param0  — CORRECT, uses the operand
   8:  ldmia.w sp!, {...,pc}
0000000c <store_field>:
  10:  add.w ip, r0, #4        ; param0 + offset 4
  14:  str.w r1, [fp, ip]      ; store at linmem_base(fp) + (param0+4)  — CORRECT

So the optimizer is dropping the dynamic operand and substituting the constant 0x100. (The dead adds r4, r5, r1 in the optimized output, with r5/r1 uninitialized, looks like the remains of the intended base+offset computation after the operand was constant-folded away.)

Real-module manifestation

merged.both.wasm's z_impl_k_sem_give (the wasm shim hosting the C↔Rust seam):

800009e: movw ip,#0x100 ; movt ip,#0x2000   → ip = 0x20000100
80000a8: ldr.w r8,[ip]                       → sem->count   (kernel passed sem in r0 — ignored)
80000b6: ldr.w ip,[ip,#4]                    → sem->limit
80000fa: str.w r5,[sp,#4]                     → (the new_count store likewise never reaches sem)

The kernel calls z_impl_k_sem_give(real_sem_ptr), but the code reads/writes 0x20000100 — wrong RAM — so the semaphore is never actually updated.

Impact

Blocker for any optimized synth arm output that dereferences pointer parameters / does dynamic memory access — i.e., any wasm function operating on host-provided memory. Functions that take and return only scalar values are unaffected (which is why gale_k_sem_give_decide(count, limit, has_waiter) and the GALE_USE_SYNTH value-only path work).

Secondary / architectural note (separate from the optimizer bug)

Even the correct --no-optimize form uses fp as the linear-memory base ([fp, r0]). For a --relocatable object dropped into a native host (Zephyr kernel) where the caller passes a real pointer in r0 via a plain AAPCS call, the effective address becomes linmem_base + real_ptr, which is wrong unless linmem_base == 0. So making a pointer-dereferencing wasm function callable as a native drop-in also needs a way to set the linmem base to 0 (or otherwise treat pointer params as native addresses). Possibly already covered by the Meld/kiln host model (#34); flagging in case a --relocatable/native-ABI mode should zero the linmem base.

Environment

  • synth 8f213351 (post-v0.11.2), arm backend, target cortex-m4f
  • optimized (default) path miscompiles; --no-optimize is correct
  • Zephyr SDK 1.0.1 binutils 2.43.1

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions