test(e2e): golden behavioural-equivalence harness (Tier A + B) by avrabe · Pull Request #213 · pulseengine/meld

avrabe · 2026-05-31T18:52:07Z

Answers "how do I know it really works" with a differential end-to-end test of meld's central claim — fusion preserves observable behaviour:

1. Run a real component unfused under deterministic wasmtime → that result is the golden (no hand-authored expected values).
2. `meld fuse` it.
3. Run the fused output the same way.
4. Assert identical observable behaviour (run-ok + stdout / typed return).
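
The four steps above can be sketched as a small generic check. This is only an illustration under stated assumptions, not the harness code in this PR: `Observed`, `check_equivalence`, and the injected `run` / `fuse` closures are hypothetical names, chosen so the sketch does not assume any particular wasmtime or meld API.

```rust
/// What one run exposes to the comparison (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Observed {
    /// True when the run finished without a trap or error exit.
    pub run_ok: bool,
    /// Captured stdout bytes (a typed return could be rendered here instead).
    pub stdout: Vec<u8>,
}

/// Differential check: the unfused run is the golden, the fused run must match.
/// `run` and `fuse` are injected stand-ins for "execute under wasmtime" and
/// "meld fuse"; both are assumptions of this sketch.
pub fn check_equivalence<R, F>(component: &[u8], run: R, fuse: F) -> Result<Observed, String>
where
    R: Fn(&[u8]) -> Observed,
    F: Fn(&[u8]) -> Result<Vec<u8>, String>,
{
    // Step 1: run the unfused component; its behaviour is the golden.
    let golden = run(component);
    // Step 2: fuse it. A fuse error is propagated to the caller.
    let fused = fuse(component)?;
    // Step 3: run the fused output the same way.
    let actual = run(&fused);
    // Step 4: assert identical observable behaviour.
    if actual == golden {
        Ok(golden)
    } else {
        Err(format!("divergence: golden={:?} fused={:?}", golden, actual))
    }
}
```

Because the golden is produced by the first run, no expected value appears anywhere in the check; only the two runs are compared.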

Tier A — active, green (12 checks)

Single-component round-trip equivalence over the wit-bindgen ABI fixtures and real cross-language command components (hello_rust / hello_c_cli / hello_cpp_cli), both memory strategies. The hello_* cases assert byte-identical stdout — a real behavioural diff, not just "didn't trap." SharedMemory+rebasing correctly declines memory.grow fixtures (logged skip — meld refusing is not a divergence).
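
The "logged skip, not a divergence" rule can be made explicit with a three-way outcome. The sketch below is an assumption about how such a classification might look; `CheckOutcome`, `FuseResult`, `classify`, and `gate` are hypothetical names and are not taken from the harness.

```rust
/// Outcome of one fuse-and-run check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum CheckOutcome {
    /// Fused output behaved identically to the golden.
    Equivalent,
    /// The fuser refused this fixture under this strategy; logged, not a failure.
    Skipped(String),
    /// Fused output ran but its stdout differs from the golden.
    Diverged(String),
}

/// What the fuser returned for a fixture (hypothetical type).
pub enum FuseResult {
    Fused(Vec<u8>),
    Declined(String),
}

/// Classify one check. `run_fused` stands in for executing the fused bytes
/// and capturing stdout; it is an assumption of this sketch.
pub fn classify(
    golden_stdout: &[u8],
    fuse: FuseResult,
    run_fused: impl Fn(&[u8]) -> Vec<u8>,
) -> CheckOutcome {
    match fuse {
        // A refusal (for example on a memory.grow fixture) is a skip.
        FuseResult::Declined(reason) => CheckOutcome::Skipped(reason),
        FuseResult::Fused(bytes) => {
            let got = run_fused(&bytes);
            if got.as_slice() == golden_stdout {
                CheckOutcome::Equivalent
            } else {
                CheckOutcome::Diverged(format!(
                    "expected {} bytes of stdout, got {}",
                    golden_stdout.len(),
                    got.len()
                ))
            }
        }
    }
}

/// The tier is green when no check diverged; skips do not fail it.
pub fn gate(outcomes: &[CheckOutcome]) -> bool {
    outcomes.iter().all(|o| !matches!(o, CheckOutcome::Diverged(_)))
}
```

Keeping `Skipped` distinct from `Equivalent` means the skip stays visible in the log instead of being counted as a pass.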

Tier B — discovery oracle (`#[ignore]` on #212)

Fuses a real two-component composition built offline with wasm-tools + wac (compose/build.sh): consumer.runner.compute() calls provider.add(20,22) = 42. The body asserts the meld-fused output computes 42 standalone — the acceptance test for #212, un-ignore when it lands.
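
As a rough illustration of the "ignored acceptance test" pattern, the sketch below models the composition in plain Rust. It is not the test body from this PR: `provider_add`, `consumer_compute`, and `run_fused_compute` are hypothetical stand-ins, and `run_fused_compute` simply returns an error to represent the current gap.

```rust
/// Stand-in for the provider component's `add` export.
fn provider_add(a: i32, b: i32) -> i32 {
    a + b
}

/// Stand-in for consumer.runner.compute(), which calls provider.add(20, 22).
fn consumer_compute(add: impl Fn(i32, i32) -> i32) -> i32 {
    add(20, 22)
}

/// Hypothetical hook that would fuse the composition and run the result
/// standalone. In this sketch it always fails, mirroring the state before #212.
fn run_fused_compute() -> Result<i32, String> {
    Err("multi-component fusion gap (see #212)".to_string())
}

/// The acceptance test stays in the tree but is skipped by default.
/// `cargo test -- --ignored` runs it on demand.
#[test]
#[ignore = "blocked on #212: un-ignore when multi-component fusion lands"]
fn tier_b_fused_composition_computes_42() {
    let golden = consumer_compute(provider_add);
    assert_eq!(run_fused_compute(), Ok(golden));
}
```

The `#[ignore = "..."]` form records the reason next to the test, so the condition for re-enabling it travels with the code.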

Building Tier B surfaced three real multi-component fusion gaps (filed as #212):

1. Separate-input cross-component interface links are not internalised (fused output still imports the dependency).
2. wac-composed inputs lose their top-level export (empty world `root {}`).
3. Bare world-level func exports drop their result type → invalid component.

Tier A proves real wit-bindgen compositions fuse + run with identical behaviour today; Tier B marks the boundary of what multi-component fusion doesn't yet handle.

Honest boundary

Equivalence is proven under wasmtime (the reference runtime), not the synth/kiln MCU target — a module passing here can still break after synth transcodes it. That cross-repo hardware smoke is tracked separately (owner: you).

Fixtures are committed (*.wasm is gitignored; force-added like the existing fixtures) and regenerable via compose/build.sh.

🤖 Generated with Claude Code

meld's central claim is "fusion preserves observable behaviour." This harness tests that claim differentially: run a real component unfused under deterministic wasmtime (the result IS the golden), meld-fuse it, run the fused output the same way, assert identical observable behaviour. No hand-authored expected values.

Tier A (active, green): single-component round-trip equivalence over the wit-bindgen ABI fixtures + real cross-language command components (hello_rust/c/cpp), both memory strategies. 12 fuse-and-run checks; the hello_* ones assert byte-identical stdout. SharedMemory+rebasing correctly declines memory.grow fixtures (logged skip, not a divergence).

Tier B (discovery oracle, #[ignore] on meld#212): fuse a real two-component composition (consumer.runner.compute -> provider.add(20,22) = 42, built offline via wasm-tools + wac, see compose/build.sh) and assert the fused output computes 42 standalone. Building it surfaced three real multi-component fusion gaps (meld#212): separate-input cross-component links not internalised, wac-composed exports dropped, bare-world func-export result type dropped. The test body is the fix's acceptance test; un-ignore it when #212 lands.

Honest boundary: equivalence under wasmtime (reference runtime), NOT the synth/kiln MCU target. That hardware smoke is tracked separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-31T18:54:43Z

LS-N verification gate

⚠️ 36/38 verified — 2 missing regression tests

| | count |
|---|---|
| Passed (≥1 test, all green) | 36 |
| Failed (≥1 test failure) | 0 |
| Missing (no `ls_<letter>_<num>_*` test found) | 2 |

_Approved `loss-scenarios.yaml` entries are expected to have a regression test named `ls_<letter>_<num>_*` (e.g. LS-A-11 → `ls_a_11_*`). The gate runs each prefix via `cargo test --lib --no-fail-fast` and aggregates pass/fail/missing._
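
The naming convention and the pass/fail/missing aggregation can be illustrated as follows. This is a sketch only, not the gate's actual script: `ls_test_prefix`, `LsStatus`, and `ls_status` are hypothetical names, and the sketch assumes ids have exactly the shape `LS-<letter>-<num>` with a single uppercase letter.

```rust
/// Map an id such as "LS-A-11" to the expected test-name prefix "ls_a_11_".
/// Returns None when the id does not have the assumed LS-<letter>-<num> shape.
pub fn ls_test_prefix(id: &str) -> Option<String> {
    let mut parts = id.split('-');
    let (tag, letter, num) = (parts.next()?, parts.next()?, parts.next()?);
    if parts.next().is_some() || tag != "LS" {
        return None;
    }
    if letter.len() != 1 || !letter.chars().all(|c| c.is_ascii_uppercase()) {
        return None;
    }
    if num.is_empty() || !num.chars().all(|c| c.is_ascii_digit()) {
        return None;
    }
    // The trailing underscore keeps "ls_a_1_" from matching "ls_a_11_...".
    Some(format!("ls_{}_{}_", letter.to_ascii_lowercase(), num))
}

/// Status of one loss-scenario entry (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum LsStatus {
    Passed,
    Failed,
    Missing,
}

/// Classify an entry from a list of (test name, passed) pairs.
pub fn ls_status(id: &str, tests: &[(&str, bool)]) -> Option<LsStatus> {
    let prefix = ls_test_prefix(id)?;
    let mut matched = tests
        .iter()
        .filter(|(name, _)| name.starts_with(&prefix))
        .peekable();
    if matched.peek().is_none() {
        // No test carries the prefix at all.
        return Some(LsStatus::Missing);
    }
    if matched.all(|(_, ok)| *ok) {
        Some(LsStatus::Passed)
    } else {
        Some(LsStatus::Failed)
    }
}
```

Under this convention the two entries reported as missing would need tests whose names start with `ls_r_13_` and `ls_m_6_`.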

Failed LS entries

(none)

Missing regression tests

LS-R-13
LS-M-6

_Updated automatically by `tools/post_verification_comment.py`. Source of truth: `safety/stpa/loss-scenarios.yaml`._

avrabe merged commit 45c6f42 into main Jun 2, 2026
14 checks passed

avrabe deleted the test/golden-e2e branch June 2, 2026 05:58

avrabe merged 1 commit into main from test/golden-e2e
