Skip to content

fix(adapter): pad CABI alignment in async-lift retptr writeback (LS-A-10)#148

Merged
avrabe merged 1 commit into
mainfrom
fix/cabi-alignment-async-retptr
May 14, 2026
Merged

fix(adapter): pad CABI alignment in async-lift retptr writeback (LS-A-10)#148
avrabe merged 1 commit into
mainfrom
fix/cabi-alignment-async-retptr

Conversation

@avrabe
Copy link
Copy Markdown
Contributor

@avrabe avrabe commented May 14, 2026

Summary

Pre-release Mythos delta-pass on `meld-core/src/adapter/fact.rs` (the
Tier-5 file changed by v0.8.0) found a real bug. Release-blocking
for v0.8.0
— this PR fixes it.

The bug

Both `generate_async_callback_adapter` and the new
`generate_async_stackful_adapter` wrote each task.return result global
to the caller's retptr buffer with the byte offset advanced by only
the flat size of the value (4 or 8 bytes) — never aligning up to the
next field's natural alignment.

For any P3 async lift returning an aggregate whose flat lowering mixes
4-byte and 8-byte values — record `{u32, u64}`, tuple `(s32, s64)`,
`result<u64, u32>` — the canonical ABI requires the 8-byte field at
offset 8 (with 4 bytes of padding after the 4-byte field). The emitters
wrote it at offset 4.

Wasm engines treat `MemArg.align` as a hint and do not trap on
misalignment, so this was silent data corruption: the caller's
canon.lower reads the retptr per canonical ABI and sees stale / zero
bytes for the high-order field. Affects every downstream runtime
consuming a meld-fused module (kiln, wasmtime, browsers).

Why both emitters?

The defect pre-dated #146 / #147. The recent stackful emitter inherited
it via a faithful pattern copy from the callback emitter — exactly the
shared-pattern bug category where a refactor for code reuse amplifies
a latent defect because the original code looked correct enough to
copy.

The `emit_param_copy_step` refactor in #147 covered step 0.5 but did
not cover the step-3 retptr writeback. This PR closes that gap.

Fix

New `emit_globals_to_retptr_cabi` helper:

```rust
let (size, align) = natural(val_ty);
offset = align_up(offset, align);
// emit store at offset
offset += size;
```

`align_up` is the standard power-of-two bit-twiddle
`(n + a - 1) & !(a - 1)`. Both emitters now route through this single
CABI-correct writeback.

Test

Regression test `cabi_alignment_stackful_retptr_writes_i64_at_offset_8`
builds a synthetic component with `[I32, I64]` result globals, decodes
the emitted `i64.store` offset via ULEB128, and asserts it equals 8.

Before fix: test fails with offset 4 (confirmed).
After fix: test passes; 207/207 lib + 11/11 P3 async integration green.

Safety case

  • New loss scenario LS-A-10 approved in
    `safety/stpa/loss-scenarios.yaml` under UCA-A-13 (H-4 / H-4.5).
  • Causal factors documented including the "shared-pattern bug" trap
    and the "wasm align is a hint" silent-corruption multiplier.
  • Discovered by the v0.8.0 pre-release Mythos delta-pass per
    `scripts/mythos/discover.md`. Oracle pair: Kani harness + PoC test.

v0.8.0 release gate

Per `docs/roadmap.md`: "Releases without a Mythos delta-pass on
changed Tier-5 files are blocked." The pass ran, surfaced LS-A-10,
this PR fixes it and adds the regression. v0.8.0 cut proceeds after
this lands.

Test plan

  • `cargo test -p meld-core --lib` — 207 pass
  • `cargo test -p meld-core --test p3_async_lowering` — 11 pass
  • `cargo clippy --all-targets -- -D warnings` — clean
  • `cargo fmt --check` — clean
  • CI green on smithy runners

Refs

🤖 Generated with Claude Code

…-10)

Both generate_async_callback_adapter and generate_async_stackful_adapter
wrote result globals to the caller's retptr buffer by advancing the
byte offset by only the flat size of each value (4 or 8 bytes) and
never aligning up to the next field's natural alignment.

For any P3 async lift whose flat-lowered result globals contain an i32
or f32 followed by an i64 or f64 — e.g. record {u32, u64}, tuple
(s32, s64), result<u64, u32> — the canonical ABI requires the 8-byte
field at offset 8 (with 4 bytes of padding after the 4-byte field). The
emitters wrote it at offset 4. Wasm engines treat MemArg.align as a
hint and do not trap on misalignment, so the bug was silent data
corruption: the caller's canon.lower reads the retptr per canonical
ABI and sees stale/zero bytes for the high-order field.

The defect predated #146 / #147; the recent stackful emitter inherited
it via a faithful pattern copy from the callback emitter. The
emit_param_copy_step refactor from #147 did not cover the retptr
writeback, so the duplication remained.

Fix:
- New emit_globals_to_retptr_cabi helper aligns the offset up before
  every store (size, align) = natural(val_ty); offset = align_up(offset,
  align); offset += size). align_up uses the standard power-of-two bit-
  twiddle (n + a - 1) & !(a - 1).
- Both emitters now route through this single CABI-correct writeback.
- Regression test cabi_alignment_stackful_retptr_writes_i64_at_offset_8
  builds a synthetic [I32, I64] result, decodes the emitted i64.store
  offset via ULEB, and asserts it equals 8.

Discovery:
- v0.8.0 pre-release Mythos delta-pass on adapter/fact.rs per
  scripts/mythos/discover.md. Oracle pair: Kani harness (offset
  divergence between emitter loop and align_up reference model) +
  PoC failing test (the regression test added here).
- LS-A-10 added to safety/stpa/loss-scenarios.yaml under UCA-A-13 with
  approved status (the Mythos validate.md round produced a clean
  confirmation).

Tests: 207 lib pass (206 prior + 1 regression), 11 p3_async_lowering
pass (callback path unchanged behaviour for same-aligned results).

Refs: LS-A-10, UCA-A-13, H-4, H-4.5
Discovered by: Mythos discover.md v0.8.0 delta-pass

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@avrabe avrabe added this to the v0.8.0 milestone May 14, 2026
@avrabe avrabe merged commit 9c3d131 into main May 14, 2026
9 checks passed
@avrabe avrabe deleted the fix/cabi-alignment-async-retptr branch May 14, 2026 05:21
@avrabe avrabe mentioned this pull request May 14, 2026
7 tasks
avrabe added a commit that referenced this pull request May 14, 2026
P3 stackful lifting trampoline emitter (#140 / SR-32, sub-#94).

Phase 1 (#146) added the four thread_* host intrinsics under
pulseengine:async with pinned signatures, the uses_stackful_lift()
derived helper, and ADR-1 addendum (a) documenting the two-mode
policy.

Phase 2 (#147) shipped the trampoline emitter. The dispatcher routes
async-lift sites without a [callback] companion to a new
generate_async_stackful_adapter; the trampoline calls the lift
function directly and relies on the runtime to treat the call as a
transparent fiber boundary (transparent suspension on task.wait). The
Phase 1 thread_* ABI remains valid for component-internal concurrency
but is not consumed by the trampoline. ADR-1 addendum (b) records the
shipped design versus the originally sketched thread_new + drive loop
and why the simpler direct-call design was chosen.

Pre-release Mythos delta-pass on adapter/fact.rs surfaced LS-A-10 — a
CABI alignment padding bug in the async-lift retptr writeback that
predated this milestone and affected both callback and stackful
emitters. Fix landed in #148 before tag: extracted shared
emit_globals_to_retptr_cabi helper that pads via align_up before each
store. Regression pinned by
cabi_alignment_stackful_retptr_writes_i64_at_offset_8.

SR-32 status: implemented. Cross-memory (ptr, len) result returns
from stackful mode are deferred to a follow-up under #140 reopen if
needed; the emitter produces a clear error for that case rather than
emit wrong wasm.

Tests: 207 lib pass, 11 P3 async integration pass.

Refs: #94, #140 (closed), SR-32, LS-A-10, ADR-1, #146, #147, #148

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant