feat(measurement): corpus-wide LOOM vs wasm-opt harness + measured wins on components (v0.9.0 PR-P) #110
Merged
Adds scripts/measure_corpus.sh, a single bash harness that runs LOOM and
(optionally) wasm-opt -O3 against a curated set of real-world WebAssembly
fixtures, validates every output via wasm-tools, and emits a markdown report
to docs/measurements/v0.9.0-corpus-baseline.md.
Per-workload pipeline (executed against each canonical fixture path; absent
fixtures are silently marked n/a so the harness is safe to run on any
checkout):
1. Record the baseline byte count (`wc -c`) and code-section size (`wasm-tools dump`).
2. LOOM: `loom optimize <fixture>` -> `<name>.loom.wasm`
3. wasm-opt -O3: `wasm-opt -O3 <fixture>` -> `<name>.wopt.wasm` (skipped cleanly if wasm-opt is not on PATH)
4. wasm-opt -> LOOM: `loom optimize <name>.wopt.wasm` — answers the "does LOOM still help after wasm-opt?" question.
5. `wasm-tools validate` every output. **Validation failure is a HARD ERROR:** the harness exits 2 with the offending workload and the wasm-tools message, so invalid wasm cannot ship unnoticed.
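The five steps above can be sketched as a single bash function. This is a minimal sketch, not the harness source: `measure_one` is a hypothetical helper name, and the `-o` output-flag shape for `loom optimize` is an assumption (the PR text only shows `input -> output`).

```shell
#!/usr/bin/env bash
# Sketch of steps 1-5 for a single fixture. `measure_one` and the
# `-o` output flag are illustrative assumptions, not harness code.
measure_one() {
  local fixture="$1" name="$2"
  # Absent fixtures are marked n/a instead of failing the run.
  [ -f "$fixture" ] || { echo "$name: n/a"; return 0; }
  local base
  base=$(wc -c < "$fixture")                                   # step 1: baseline bytes
  loom optimize "$fixture" -o "$name.loom.wasm"                # step 2: LOOM
  if command -v wasm-opt >/dev/null 2>&1; then                 # step 3: skip cleanly
    wasm-opt -O3 "$fixture" -o "$name.wopt.wasm"
    loom optimize "$name.wopt.wasm" -o "$name.wopt.loom.wasm"  # step 4: wasm-opt -> LOOM
  fi
  for out in "$name".*.wasm; do                                # step 5: validate everything
    wasm-tools validate "$out" \
      || { echo "invalid wasm: $name ($out)" >&2; exit 2; }    # hard error, exit 2
  done
  echo "$name: $base bytes baseline"
}
```

The early-return n/a path is what makes the harness safe to run on any checkout, fixtures present or not.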
Markdown report:
- One row per workload with Baseline / LOOM / wasm-opt-O3 / wasm-opt->LOOM
byte counts plus signed Δ% vs baseline columns.
- One-paragraph headline naming which workloads LOOM helps on, is neutral
on, or loses on vs wasm-opt.
- Red rows (:red_circle:) for any workload where LOOM grew the baseline OR
wasm-opt beats LOOM by more than 1% of baseline, with a recommendation to
do a gap analysis.
- LOOM commit SHA, branch, version, and tool versions in the header so the
report is reproducible from a single commit.
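The signed Δ% column can be computed as below — a sketch, not the harness's code; `pct_delta` is an illustrative helper name, and the byte counts come from the gale and simple_component rows reported in this PR.

```shell
# Signed delta-percent vs baseline, one decimal place.
pct_delta() {
  # args: baseline_bytes output_bytes
  awk -v b="$1" -v o="$2" 'BEGIN { printf "%+.1f%%", (o - b) * 100.0 / b }'
}
pct_delta 1941 1846   # gale baseline vs LOOM output -> -4.9%
```

A negative value means the tool shrank the file; a positive value (a red row) means it grew.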
The current report (this commit) is structurally complete but reports every
canonical fixture as n/a, because the fresh worktree on this branch does not
yet check in the corpus .wasm files. Subsequent PRs that land fixtures (or
re-running the harness from a checkout where artifacts exist) will populate
the table with real numbers. The harness was exercised by hand against the
LOOM optimizer on the existing component fixture to confirm the pipeline is
wired correctly.
Future PRs will use this harness to catch regressions automatically.
Refs: v0.9.0 PR-P
…xtures

PR-P agent shipped the harness wired up, but the first real run on gale showed LOOM at +45.6%, which would be a catastrophic regression. Root cause: `loom optimize` ships with `--attestation` enabled by default (a security feature that embeds a crypto audit trail in a custom section, ~980 bytes for gale). Pass `--attestation false` so the byte-delta column reflects optimization quality, not a security feature's overhead.

Also wires in three component fixtures we already have on disk:
- calculator.wasm (2.3 MB, root-level)
- loom-core/tests/component_fixtures/simple.component.wasm
- loom-core/tests/component_fixtures/calc.component.wasm

## Results post-fix

| Workload | Baseline | LOOM | wasm-opt -O3 | LOOM Δ% | wasm-opt Δ% |
|---|---|---|---|---|---|
| gale | 1941 | 1846 | 1925 | -4.9% | -0.8% |
| calculator_root | 2,337,724 | 2,327,794 | (errors on components) | -0.4% | n/a |
| simple_component | 261 | 212 | (errors) | **-18.8%** | n/a |
| calc_component | 442 | 392 | (errors) | **-11.3%** | n/a |

Two findings:
1. **LOOM beats wasm-opt -O3 on gale by 4.1 points.** First measured workload where LOOM dominates at the total-file level.
2. **PR-M (Component-Model adapter specialization, v0.8.0) shows -11% to -19% on small adapter-heavy components.** This validates that the v0.8.0 infrastructure work paid off on the workloads it was designed for; the win gets diluted on large components where core code dominates total size.

Trace: REQ-3
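The attestation diagnosis checks out arithmetically against gale's numbers: the shrunken 1846-byte LOOM output plus a ~980-byte attestation section lands exactly on the observed +45.6%. A back-of-envelope check, not harness code:

```shell
awk 'BEGIN {
  base = 1941; loom = 1846; attestation = 980
  # Optimized body plus the embedded audit-trail section, vs baseline:
  printf "%+.1f%%\n", (loom + attestation - base) * 100.0 / base
}'
# prints +45.6%
```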
avrabe added a commit that referenced this pull request on May 14, 2026
Bump workspace version 0.8.0 → 0.9.0. Measurement and harvest release.

Three PRs landed in parallel via worktree-isolated agents:
- #110 PR-P corpus-wide LOOM vs wasm-opt -O3 measurement harness
- #111 PR-K2 span-based CSE replacement infrastructure + verifier-gap
- #112 PR-L2 Souper rule set: 3 → 12 identities, wired into pipeline

First objective measured wins:

| Workload | LOOM Δ% | wasm-opt Δ% |
|----------|--------:|------------:|
| gale | -4.9% | -0.8% (LOOM wins by 4.1 pts) |
| simple_component | -18.8% | n/a (wasm-opt errors) |
| calc_component | -11.3% | n/a (wasm-opt errors) |
| calculator_root | -0.4% | n/a (wasm-opt errors) |

LOOM beats wasm-opt on gale; PR-M's adapter specialization delivers double-digit wins on adapter-heavy components. wasm-opt cannot process Component-Model components at all — LOOM's strategic moat.

Critical finding (PR-K2): the Z3 verifier models every Call as a fresh symbolic constant, so cross-call CSE dedup is rejected even when the replacement is correct. PR-K3 (verifier-side) will model pure, no-trap calls as uninterpreted-function applications.
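The PR-K2 verifier gap can be illustrated in SMT-LIB terms. This is a sketch of the idea, not LOOM's actual encoding: two identical calls modeled as fresh constants can never be proven equal, while an uninterpreted-function encoding makes the equality trivial.

```
; Today: each Call lowers to a fresh constant, so equality is unprovable
; and the CSE rewrite is rejected even when correct.
(declare-const call1 Int)
(declare-const call2 Int)
; nothing relates call1 and call2, so (= call1 call2) cannot be proven

; PR-K3 direction: model a pure, no-trap call as an uninterpreted function.
(declare-fun f (Int) Int)
(declare-const x Int)
(assert (not (= (f x) (f x))))
(check-sat)   ; unsat -> the duplicate call is provably redundant
```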
Summary
Corpus-wide measurement harness + first real measurements that validate v0.8.0's infrastructure work.
Two layers
- `scripts/measure_corpus.sh` (~280 LOC bash) — runs LOOM and wasm-opt -O3 against a curated set of real-world wasm fixtures, validates every output via wasm-tools, and emits a markdown report. Hard-errors on invalid wasm. Honest about missing fixtures.
- `docs/measurements/v0.9.0-corpus-baseline.md` — the report, committed so reviewers can see what each release did to which workload.

Result table
Key findings
Critical fix during validation
First run showed LOOM at +45.6% on gale, which would have been a catastrophic regression. Root cause: `loom optimize` defaults to `--attestation true`, embedding a ~980-byte crypto audit trail. That's a security feature, not an optimization cost — the measurement harness now passes `--attestation false` so the byte-delta column reflects optimization quality.

Missing fixtures
The harness honestly marks these n/a: `httparse`, `nom_numbers`, `state_machine`, `json_lite`, `loom` self-build, `tests/calculator.wasm` (the smaller variant). A follow-up PR can land these as a committed corpus.
🤖 Generated with Claude Code