Summary
On synth v0.11.9 (00b64c2d), a pointer parameter that is live across calls is not preserved in a complex (register-heavy, i64) function: the subsequent i32.load (local.get <param>) reads from a fixed linmem_base + 0x100 + <clobbered r0> instead of the spilled/reloaded param. This is the #188 class of bug, but #188's minimal repro is now fixed and all my minimal reductions compile correctly; only the full function reproduces it.
The function is the loom-inlined gale seam: z_impl_k_sem_give with gale_k_sem_give_decide already inlined (great: loom v1.1.5 dissolves the seam, and this object compiles and links with 5 real kernel relocations). But the sem->count/sem->limit loads are wrong, so it can't run on-target.
Compiled z_impl (the bug)
88: sub sp, #24
8c: movw r8, #0x400
90: bl k_spin_lock ; r0 clobbered
9a: bl z_unpend_first_thread ; r0 := thread
a0: movw ip, #0x100
a4: movt ip, #0x2000 ; ip = 0x20000100
a8: add ip, ip, r0 ; r0 = thread, NOT sem
ac: ldr ip, [ip] ; sem->count from 0x20000100 + thread — WRONG
b8: add ip, ip, r0 ; same for sem->limit
bc: ldr ip, [ip, #4]
sem (r0 at entry) is live across k_spin_lock and z_unpend_first_thread but is neither spilled/reloaded nor kept in a callee-saved reg; the loads use the clobbered r0 plus a spurious +0x100.
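To make the address mismatch concrete, here is a small Python model of the two loads. It is only a sketch: LINMEM_BASE = 0x20000000 is inferred from the movw/movt pair above (0x20000100 = base + 0x100), the field offsets 0 and 4 are taken from the two ldr instructions, and the sem and thread values are made-up numbers for illustration, not values observed on target.

```python
MASK32 = 0xFFFFFFFF

# Assumption: inferred from "movw ip, #0x100; movt ip, #0x2000".
LINMEM_BASE = 0x2000_0000
SPURIOUS_OFFSET = 0x100


def expected_addr(sem: int, field_off: int) -> int:
    """Address the load should use: linear-memory base + preserved param."""
    return (LINMEM_BASE + sem + field_off) & MASK32


def emitted_addr(r0_after_calls: int, field_off: int) -> int:
    """Address the compiled code uses: base + 0x100 + whatever is in r0."""
    return (LINMEM_BASE + SPURIOUS_OFFSET + r0_after_calls + field_off) & MASK32


if __name__ == "__main__":
    sem = 0x0400      # hypothetical wasm pointer passed in r0 at entry
    thread = 0x0A40   # hypothetical return value of z_unpend_first_thread
    for name, off in (("count", 0), ("limit", 4)):
        want = expected_addr(sem, off)
        got = emitted_addr(thread, off)
        print(f"sem->{name}: expected {want:#010x}, emitted {got:#010x}")
```

With any thread value other than sem - 0x100 the two addresses differ, which is why the loads land outside the semaphore.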
Minimal cases all PASS (so it's complex-function / register-pressure specific)
These three all compile correctly (param spilled to [sp] and reloaded, then ldr [fp, r0]), including an i64.store/i32.load through the param (live3.wat).
So per-call preservation works in isolation. Something about the full function regresses param preservation under register pressure; it is the function that used to hit the #171 i64 regalloc failure and compiles only now that loom inlines decide.
Reproduction
WAT of the loom-inlined module (the repro): https://gist.github.com/avrabe/d30f965ac96b8ca4cac7d4fea2832a6a
# (from the synth#167 gist's merged.both.wasm)
loom optimize merged.both.wasm --passes inline -o merged.both.loom.wasm # loom v1.1.5
synth compile merged.both.loom.wasm --target cortex-m4f --all-exports --relocatable -o out.o
arm-zephyr-eabi-objcopy -O binary --only-section=.text out.o t.bin
arm-zephyr-eabi-objdump -D -b binary -marm -Mforce-thumb t.bin # see z_impl @ 0x88
Impact
This is the last blocker for the on-target wasm-cross-LTO measurement: the object now compiles (no #171 skip), links (5 real kernel relocations), and the seam is dissolved — but z_impl reads the semaphore from the wrong address, so it would corrupt memory on the NUCLEO-G474RE. Once the param is preserved here, we can flash and capture the handoff-cycle number.
Environment
- synth v0.11.9 (00b64c2d); input produced by loom v1.1.5 (95a5f982)
- target cortex-m4f; Zephyr SDK 1.0.1 binutils 2.43.1