Summary
On synth 8f213351 (post-v0.11.2), the optimized path miscompiles i32.load/i32.store of a dynamic address (an address from local.get / a function parameter): it lowers to a load/store from a fixed constant address (linmem_base + 0x100), ignoring the dynamic operand entirely. --no-optimize is correct (ldr [fp, r0] — base register + param), so this is an optimizer bug. Any wasm function that dereferences a pointer parameter, compiled with optimization on (the default), reads/writes the wrong memory.
This is the on-target blocker for the cross-language-LTO-via-wasm route: the C shim's z_impl_k_sem_give(struct k_sem *sem) reads sem->count from 0x20000100 instead of from the sem pointer the kernel passes in r0, so it cannot work as a drop-in native kernel primitive.
(The relocation/encoding issues #167/#173/#174 are all fixed and verified — calls now resolve correctly. This is a separate, memory-addressing bug. gale's verified functions like gale_k_sem_give_decide(count, limit, has_waiter) are unaffected because they take values, not pointers — which is why GALE_USE_SYNTH worked. The bug surfaces the moment a wasm function dereferences a pointer arg.)
Minimal reproduction
(module
(memory 1)
(func $load_field (export "load_field") (param i32) (result i32)
local.get 0
i32.load offset=0) ;; return *(i32*)param0
(func $store_field (export "store_field") (param i32) (param i32)
local.get 0
local.get 1
i32.store offset=4)) ;; param0->field4 = param1
synth compile ptr.wat --target cortex-m4f --all-exports --relocatable -o ptr.o
arm-zephyr-eabi-objcopy -O binary --only-section=.text ptr.o ptr.bin
arm-zephyr-eabi-objdump -D -b binary -marm -Mforce-thumb ptr.bin
00000000 <load_field>:
0: movw ip, #0x100 ; \
4: movt ip, #0x2000 ; > ip = 0x20000100 (constant — should be linmem_base + r0)
8: adds r4, r5, r1 ; dead/garbage (r5,r1 not initialized)
a: ldr.w r4, [ip] ; load from 0x20000100 — r0 (the address operand) is never used
e: mov r0, r4
10: bx lr
00000014 <store_field>:
14: movw ip, #0x100
18: movt ip, #0x2000 ; ip = 0x20000100 again
1e: str.w r1, [ip, #4] ; store to 0x20000104 — r0 ignored
--no-optimize is correct — it uses the param via the linear-memory base register fp:
00000000 <load_field>:
0: stmdb sp!, {r4,r5,r6,r7,r8,lr}
4: ldr.w r0, [fp, r0] ; linmem_base(fp) + param0 — CORRECT, uses the operand
8: ldmia.w sp!, {...,pc}
0000000c <store_field>:
10: add.w ip, r0, #4 ; param0 + offset 4
14: str.w r1, [fp, ip] ; store at linmem_base(fp) + (param0+4) — CORRECT
So the optimizer is dropping the dynamic operand and substituting the constant 0x100. (The dead adds r4, r5, r1 in the optimized output, with r5/r1 uninitialized, looks like the remains of the intended base+offset computation after the operand was constant-folded away.)
Real-module manifestation
merged.both.wasm's z_impl_k_sem_give (the wasm shim hosting the C↔Rust seam):
800009e: movw ip,#0x100 ; movt ip,#0x2000 → ip = 0x20000100
80000a8: ldr.w r8,[ip] → sem->count (kernel passed sem in r0 — ignored)
80000b6: ldr.w ip,[ip,#4] → sem->limit
80000fa: str.w r5,[sp,#4] → (the new_count store likewise never reaches sem)
The kernel calls z_impl_k_sem_give(real_sem_ptr), but the code reads/writes 0x20000100 — wrong RAM — so the semaphore is never actually updated.
Impact
Blocker for any optimized synth arm output that dereferences pointer parameters / does dynamic memory access — i.e., any wasm function operating on host-provided memory. Functions that take and return only scalar values are unaffected (which is why gale_k_sem_give_decide(count, limit, has_waiter) and the GALE_USE_SYNTH value-only path work).
Secondary / architectural note (separate from the optimizer bug)
Even the correct --no-optimize form uses fp as the linear-memory base ([fp, r0]). For a --relocatable object dropped into a native host (Zephyr kernel) where the caller passes a real pointer in r0 via a plain AAPCS call, the effective address becomes linmem_base + real_ptr, which is wrong unless linmem_base == 0. So making a pointer-dereferencing wasm function callable as a native drop-in also needs a way to set the linmem base to 0 (or otherwise treat pointer params as native addresses). Possibly already covered by the Meld/kiln host model (#34); flagging in case a --relocatable/native-ABI mode should zero the linmem base.
Environment
- synth
8f213351 (post-v0.11.2), arm backend, target cortex-m4f
- optimized (default) path miscompiles;
--no-optimize is correct
- Zephyr SDK 1.0.1 binutils 2.43.1
Summary
On synth
8f213351(post-v0.11.2), the optimized path miscompilesi32.load/i32.storeof a dynamic address (an address fromlocal.get/ a function parameter): it lowers to a load/store from a fixed constant address (linmem_base + 0x100), ignoring the dynamic operand entirely.--no-optimizeis correct (ldr [fp, r0]— base register + param), so this is an optimizer bug. Any wasm function that dereferences a pointer parameter, compiled with optimization on (the default), reads/writes the wrong memory.This is the on-target blocker for the cross-language-LTO-via-wasm route: the C shim's
z_impl_k_sem_give(struct k_sem *sem)readssem->countfrom0x20000100instead of from thesempointer the kernel passes inr0, so it cannot work as a drop-in native kernel primitive.(The relocation/encoding issues #167/#173/#174 are all fixed and verified — calls now resolve correctly. This is a separate, memory-addressing bug. gale's verified functions like
gale_k_sem_give_decide(count, limit, has_waiter)are unaffected because they take values, not pointers — which is whyGALE_USE_SYNTHworked. The bug surfaces the moment a wasm function dereferences a pointer arg.)Minimal reproduction
--no-optimizeis correct — it uses the param via the linear-memory base registerfp:So the optimizer is dropping the dynamic operand and substituting the constant
0x100. (The deadadds r4, r5, r1in the optimized output, withr5/r1uninitialized, looks like the remains of the intendedbase+offsetcomputation after the operand was constant-folded away.)Real-module manifestation
merged.both.wasm'sz_impl_k_sem_give(the wasm shim hosting the C↔Rust seam):The kernel calls
z_impl_k_sem_give(real_sem_ptr), but the code reads/writes0x20000100— wrong RAM — so the semaphore is never actually updated.Impact
Blocker for any optimized synth
armoutput that dereferences pointer parameters / does dynamic memory access — i.e., any wasm function operating on host-provided memory. Functions that take and return only scalar values are unaffected (which is whygale_k_sem_give_decide(count, limit, has_waiter)and theGALE_USE_SYNTHvalue-only path work).Secondary / architectural note (separate from the optimizer bug)
Even the correct
--no-optimizeform usesfpas the linear-memory base ([fp, r0]). For a--relocatableobject dropped into a native host (Zephyr kernel) where the caller passes a real pointer inr0via a plain AAPCS call, the effective address becomeslinmem_base + real_ptr, which is wrong unlesslinmem_base == 0. So making a pointer-dereferencing wasm function callable as a native drop-in also needs a way to set the linmem base to 0 (or otherwise treat pointer params as native addresses). Possibly already covered by the Meld/kiln host model (#34); flagging in case a--relocatable/native-ABI mode should zero the linmem base.Environment
8f213351(post-v0.11.2),armbackend, targetcortex-m4f--no-optimizeis correct