engine feature-support audit: 49-row corpus + cross-engine harness by danieljohnmorris · Pull Request #413 · ilo-lang/ilo

danieljohnmorris · 2026-05-18T22:24:48Z

Summary

Investigation-only PR for adoption brief 5. Lands the test corpus and harness needed to run the cross-engine feature audit, plus the empirical data needed for reference/engines.md (engines-as-contracts). No engine source changed.

49 small .ilo programs under tests/engine-matrix/, one per feature
tests/engine-matrix/run-matrix.sh runs every program through tree / VM / Cranelift JIT / Cranelift AOT and prints a markdown matrix
Audit write-up at research/engine-audit-2026-05.md in the worktree (local-only per .gitignore for research/, but informs the public engines-as-contracts page)
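The matrix output described above can be sketched in a few lines. This is a minimal illustration only: the real harness is a bash script whose internals are not shown in this PR, and the `Cell` type, the `render_matrix` function, and the short column labels are hypothetical names chosen for the example.

```rust
/// Outcome of running one corpus program on one engine.
/// (Hypothetical type; the real harness is a bash script.)
#[derive(Clone, Copy, Debug, PartialEq)]
enum Cell {
    Ok,
    Fail,
    Skip,
}

impl Cell {
    fn label(self) -> &'static str {
        match self {
            Cell::Ok => "OK",
            Cell::Fail => "FAIL",
            Cell::Skip => "SKIP",
        }
    }
}

/// Engine columns in the order the PR lists them (labels are assumed).
const ENGINES: [&str; 4] = ["tree", "VM", "JIT", "AOT"];

/// Render one markdown table: a header row, a separator row,
/// then one row per audited feature.
fn render_matrix(rows: &[(&str, [Cell; 4])]) -> String {
    let mut out = String::new();
    out.push_str("| feature |");
    for engine in ENGINES.iter() {
        out.push_str(&format!(" {} |", engine));
    }
    out.push('\n');
    out.push_str("| --- |");
    for _ in ENGINES.iter() {
        out.push_str(" --- |");
    }
    out.push('\n');
    for (feature, cells) in rows {
        out.push_str(&format!("| {} |", feature));
        for cell in cells.iter() {
            out.push_str(&format!(" {} |", cell.label()));
        }
        out.push('\n');
    }
    out
}
```

Calling `render_matrix(&[("map-lambda", [Cell::Ok, Cell::Ok, Cell::Ok, Cell::Fail])])` yields a three-line table whose last row marks only the AOT cell as FAIL; the feature name here is made up for the example.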

Results

Tree / VM / Cranelift JIT are at feature parity across the audited surface (46 of 47 non-skipped rows OK on each). The lone shared miss is returning a capturing closure from a function (a >F n n return type whose body is a lambda), which the parser rejects on every engine. That makes it a parser surface gap, not an engine bug.

AOT diverges on 9 rows; every failure is closure, HOF, or sum-type related:

map (x:n>n;*x 2) [1,2,3] returns [nil, nil, nil] from an AOT binary
map (x:n>n;+x k) [1,2,3] (capture) returns the same [nil, nil, nil]
map fn ctx xs (3-arg closure-bind) returns nil
fld add [1,2,3,4] 0 returns nil
grp and uniqby (the surface added in PR #391, "feature: srt/grp/uniqby off tree-bridge (Phase 2 PR3c)") return nil
Returning a fn-ref via >F n n;sq returns nil
S a b c sum-type variant: AOT compile error (Duplicate definition of identifier: ilo_strconst_1)

Every other AOT row matches its peers.

Bugs / gaps surfaced (full list in research doc)

AOT silently miscomputes any HOF that takes a function value: agents writing map (x:n>n;*x 2) xs will deploy a binary that returns nils with no warning. Proposed ILO-R014.
AOT default entry is "first declared function", not main — ilo compile file.ilo -o out && ./out silently segfaults (exit 139) for any program that declares a helper before main. The harness passes main explicitly to dodge this; recommend the dispatcher pick main when it exists.
AOT panics surface as SIGSEGV, not a JSON diagnostic — agents can't distinguish user-error from runtime-panic. Proposed ILO-R015.
AOT divide-by-zero returns nil and exits 0, whereas tree/VM/JIT return inf per f64 semantics. The same program gives two different answers depending on the engine.
AOT cannot compile sum-type variants: compilation fails with Duplicate definition of identifier in the Cranelift backend's string-constant interning when sum tags collide with other string constants.
SPEC.md drift: it claims closure capture is tree-only and that VM / JIT raise ILO-R012 with auto-fallback. Empirically, --run-vm and --jit now handle Phase 2 captures natively (likely since the PRs leading up to #391, "feature: srt/grp/uniqby off tree-bridge (Phase 2 PR3c)"). SPEC.md, ai.txt, and skills/ilo/SKILL.md all need a pass.
Returning a capturing closure has no surface syntax — mkadd k:n>F n n;x:n>n;+x k is rejected by the parser on every engine.
AOT does not support cross-compilation — --target flag does not exist.
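For reference, the f64 behaviour that the divide-by-zero item compares against can be checked directly in Rust, whose `f64` follows IEEE 754. The sketch below is not ilo code. `f64_div`, `engines_agree`, and `same` are hypothetical helpers, and `None` is only an assumed way to model an engine that returned nil.

```rust
/// IEEE 754 division as Rust's f64 implements it: a nonzero finite
/// numerator over zero gives a signed infinity, and 0/0 gives NaN.
fn f64_div(a: f64, b: f64) -> f64 {
    a / b
}

/// Hypothetical cross-engine check: did every engine report the same result?
/// `None` models an engine that returned nil instead of a number.
fn engines_agree(results: &[Option<f64>]) -> bool {
    match results.split_first() {
        None => true,
        Some((first, rest)) => rest.iter().all(|r| same(first, r)),
    }
}

/// Compare two results bit-for-bit so that NaN compares equal to itself.
fn same(a: &Option<f64>, b: &Option<f64>) -> bool {
    match (a, b) {
        (None, None) => true,
        (Some(x), Some(y)) => x.to_bits() == y.to_bits(),
        _ => false,
    }
}
```

With three engines reporting `Some(f64::INFINITY)` and a fourth reporting `None`, `engines_agree` returns false, which is the kind of divergence a matrix cell would flag.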

What's in the diff

One commit:

add engine-matrix test corpus + run harness for cross-engine audit — 49 .ilo programs + bash harness + README under tests/engine-matrix/. Each program has -- feature: + -- expected: header lines so future audits can re-run via the harness.
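A re-run tool could read the two header lines with something like the following. This is a sketch under assumptions: the PR only names the `-- feature:` and `-- expected:` prefixes, so the exact layout (one value per line, headers at the top of the file) is a guess, and `Header` and `parse_header` are hypothetical names.

```rust
/// Header metadata from a corpus program (hypothetical struct).
#[derive(Debug, Default, PartialEq)]
struct Header {
    feature: Option<String>,
    expected: Option<String>,
}

/// Read leading `-- key: value` comment lines and stop at the first
/// line that is not a `--` comment.
fn parse_header(source: &str) -> Header {
    let mut header = Header::default();
    for line in source.lines() {
        let line = line.trim();
        let rest = match line.strip_prefix("--") {
            Some(rest) => rest.trim_start(),
            None => break, // first non-comment line ends the header
        };
        if let Some(value) = rest.strip_prefix("feature:") {
            header.feature = Some(value.trim().to_string());
        } else if let Some(value) = rest.strip_prefix("expected:") {
            header.expected = Some(value.trim().to_string());
        }
    }
    header
}
```

Comparing `expected` against each engine's actual output would then decide whether a cell is OK or FAIL; how the real harness does that comparison is not described here.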

Test plan

cargo build --release --features cranelift (used by harness)
bash tests/engine-matrix/run-matrix.sh — produces the matrix; 9 AOT FAIL cells reproduce the bugs above
No source code in src/ touched
Corpus files have stable, minimal content suitable for re-running after any compiler change to detect regressions

Follow-ups

Per the brief, this PR does not dispatch fixes. Top three to prioritise from the gap list:

AOT HOF dispatch (rows 16, 17, 18, 31, 36, 37, 38) — biggest concrete user-facing wrongness
AOT default-entry main resolution — cheapest fix, removes a SIGSEGV trap
SPEC.md / ai.txt closure-capture drift — doc-only, blocks the engines-as-contracts brief

49 small .ilo programs each exercising one feature, plus a bash harness that runs every program through tree / VM / Cranelift JIT / Cranelift AOT and prints a markdown matrix. Used to populate the audit doc at research/engine-audit-2026-05.md (research/ is gitignored). Harness passes `main` as the AOT entry function explicitly. The default "first declared function in the file" entry is rarely correct for real programs and silently segfaults; flagged as a gap in the audit doc.
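The recommended entry rule (use `main` when it exists, otherwise keep today's first-declared behaviour) is small enough to sketch. `resolve_entry` is a hypothetical helper, not a function from the ilo source, and it assumes the declared function names are available in declaration order.

```rust
/// Pick the AOT entry function: prefer `main` when it is declared,
/// otherwise fall back to the first declared function.
fn resolve_entry<'a>(declared: &[&'a str]) -> Option<&'a str> {
    declared
        .iter()
        .copied()
        .find(|name| *name == "main")
        .or_else(|| declared.first().copied())
}
```

For a file that declares `helper` before `main`, `resolve_entry(&["helper", "main"])` returns `Some("main")`, so the helper-first layout would no longer select the wrong entry point.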

codecov · 2026-05-18T22:29:12Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.


Engine audit PR #413 found AOT silently returns nil for HOF / closure dispatch (map over a lambda, fld, grp, uniqby, fn-ref return). Root cause: AOT never publishes ACTIVE_PROGRAM, so jit_call_dyn and jit_call_builtin_tree hit their null-program guards and return TAG_NIL for every user-fn callback.

To fix, the AOT binary needs a CompiledProgram at runtime. Add a postcard wire format (chunks + func_names + is_tool + type_registry + ast) gated by schema_version. Chunk constants encode only the variants the compiler emits today (Nil / Number / Text / Bool / List); a future variant trips the From<&Value> guard in the test suite before any binary ships.

The AST is serialised as JSON inside the postcard envelope because the Program::serialize_decls custom impl uses serialize_seq(None) which postcard rejects with "The length of a sequence must be known". serde_json handles unsized seqs and the AST already serialises cleanly via JSON for --ast.

Round-trip unit tests cover the empty program, a map-lambda program, an fld user-fn program, type-registry preservation, and schema-version mismatch detection.
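The root cause described above can be modelled in miniature. This is a toy sketch, not the ilo runtime: `Runtime`, `Program`, `Value`, `call_dyn`, and `sample_program` are simplified stand-ins invented for illustration, and an `Option` field plays the role of the ACTIVE_PROGRAM slot.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the runtime value type.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Nil,
    Number(f64),
}

/// Simplified stand-in for a compiled program: function name -> body.
struct Program {
    funcs: HashMap<String, fn(f64) -> f64>,
}

/// Stand-in for the runtime slot the commit calls ACTIVE_PROGRAM.
struct Runtime {
    active_program: Option<Program>,
}

impl Runtime {
    /// Dynamic call with a null-program guard: no published program
    /// (or an unknown name) yields Nil instead of an error.
    fn call_dyn(&self, name: &str, arg: f64) -> Value {
        let program = match &self.active_program {
            Some(p) => p,
            None => return Value::Nil, // the silent-nil path
        };
        match program.funcs.get(name) {
            Some(f) => Value::Number(f(arg)),
            None => Value::Nil,
        }
    }

    /// `map` over numbers, dispatching each callback through call_dyn.
    fn map(&self, name: &str, xs: &[f64]) -> Vec<Value> {
        xs.iter().map(|x| self.call_dyn(name, *x)).collect()
    }
}

fn double(x: f64) -> f64 {
    x * 2.0
}

/// Build a program containing one user function named "dbl".
fn sample_program() -> Program {
    let mut funcs: HashMap<String, fn(f64) -> f64> = HashMap::new();
    funcs.insert("dbl".to_string(), double);
    Program { funcs }
}
```

With `active_program: None`, mapping `"dbl"` over `[1.0, 2.0, 3.0]` produces three `Value::Nil` entries, mirroring the `[nil, nil, nil]` seen from AOT binaries; once `sample_program()` is published in the slot, the same call produces the doubled numbers.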

danieljohnmorris merged commit 38d2bc5 into main May 18, 2026
5 checks passed

danieljohnmorris deleted the feature/engine-audit branch May 18, 2026 22:54
