engine feature-support audit: 49-row corpus + cross-engine harness #413

Merged: danieljohnmorris merged 1 commit into main from feature/engine-audit on May 18, 2026

@danieljohnmorris
Collaborator

Summary

Investigation-only PR for adoption brief 5. Lands the test corpus and harness needed to run the cross-engine feature audit, plus the empirical data needed for reference/engines.md (engines-as-contracts). No engine source changed.

  • 49 small .ilo programs under tests/engine-matrix/, one per feature
  • tests/engine-matrix/run-matrix.sh runs every program through tree / VM / Cranelift JIT / Cranelift AOT and prints a markdown matrix
  • Audit write-up at research/engine-audit-2026-05.md in the worktree (local-only per .gitignore for research/, but informs the public engines-as-contracts page)
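
To illustrate the output shape the harness produces, here is a minimal Rust sketch of rendering per-engine results as a markdown matrix. The real harness is a bash script; the `Cell` type, the `render_matrix` function, the column labels, and the example feature names are assumptions made for this sketch and are not taken from the repository.

```rust
/// Outcome of running one corpus program on one engine (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Cell {
    Ok,
    Fail,
    Skip,
}

impl Cell {
    /// Text shown in the table cell.
    fn label(self) -> &'static str {
        match self {
            Cell::Ok => "OK",
            Cell::Fail => "FAIL",
            Cell::Skip => "skip",
        }
    }
}

/// Column order: tree, VM, Cranelift JIT, Cranelift AOT (labels are assumed).
const ENGINES: [&str; 4] = ["tree", "VM", "JIT", "AOT"];

/// Render a header, a separator, then one table row per audited feature.
fn render_matrix(rows: &[(&str, [Cell; 4])]) -> String {
    let mut out = String::from("| feature |");
    for engine in ENGINES.iter() {
        out.push_str(&format!(" {} |", engine));
    }
    out.push('\n');
    out.push_str("|---|---|---|---|---|\n");
    for (feature, cells) in rows {
        out.push_str(&format!("| {} |", feature));
        for cell in cells.iter() {
            out.push_str(&format!(" {} |", cell.label()));
        }
        out.push('\n');
    }
    out
}
```

Calling `render_matrix(&[("map-lambda", [Cell::Ok, Cell::Ok, Cell::Ok, Cell::Fail])])` yields a three-line table whose last column carries the AOT result, which is where the divergences reported below would show up.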

Results

Tree / VM / Cranelift JIT are at feature parity across the audited surface (46 of 47 non-skipped rows OK on each). The lone shared miss is returning a capturing closure from a function (>F n n body of a lambda) which the parser rejects on every engine — call it a parser surface gap, not an engine bug.

AOT diverges on 9 rows, every failure closure / HOF / sum-type related:

  • map (x:n>n;*x 2) [1,2,3] returns [nil, nil, nil] from an AOT binary
  • map (x:n>n;+x k) [1,2,3] (capture) same shape
  • map fn ctx xs (3-arg closure-bind) returns nil
  • fld add [1,2,3,4] 0 returns nil
  • grp and uniqby (the surface added in PR #391, "feature: srt/grp/uniqby off tree-bridge (Phase 2 PR3c)") return nil
  • Returning a fn-ref via >F n n;sq returns nil
  • S a b c sum-type variant: AOT compile error (Duplicate definition of identifier: ilo_strconst_1)

Every other AOT row matches its peers.

Bugs / gaps surfaced (full list in research doc)

  1. AOT silently miscomputes any HOF that takes a function value — agents writing map (\x. *x 2) xs will deploy a binary that returns nils, no warning. Proposed ILO-R014.
  2. AOT default entry is "first declared function", not main: ilo compile file.ilo -o out && ./out silently segfaults (exit 139) for any program that declares a helper before main. The harness passes main explicitly to dodge this; recommend the dispatcher pick main when it exists.
  3. AOT panics surface as SIGSEGV, not a JSON diagnostic — agents can't distinguish user-error from runtime-panic. Proposed ILO-R015.
  4. AOT divide-by-zero returns nil, exits 0: tree/VM/JIT return inf per f64 semantics. Four engines, two answers.
  5. AOT cannot compile sum-type variants: Duplicate definition of identifier is raised in the Cranelift backend's string-constant interning when sum tags collide with other string constants.
  6. SPEC.md drift: it claims closure capture is tree-only and that VM / JIT raise ILO-R012 with auto-fallback. Empirically, --run-vm and --jit handle Phase 2 captures natively now (likely since the PRs leading up to #391, "feature: srt/grp/uniqby off tree-bridge (Phase 2 PR3c)"). SPEC.md, ai.txt, skills/ilo/SKILL.md all need a pass.
  7. Returning a capturing closure has no surface syntax: mkadd k:n>F n n;x:n>n;+x k is rejected by the parser on every engine.
  8. AOT does not support cross-compilation: the --target flag does not exist.
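
On the exit-139 and SIGSEGV items above: a shell reports a process killed by signal N as status 128 + N, so 139 corresponds to signal 11 (SIGSEGV). The following Rust sketch shows how a harness could separate a signal death from an ordinary non-zero exit. The `Outcome` type and `classify` function are hypothetical names for illustration, and the 128 + N rule is a shell convention rather than a guarantee (a program can also exit with such a code deliberately).

```rust
/// How a child process ended, as seen through a shell-style status code.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Status 0.
    Success,
    /// Ordinary non-zero exit code in 1..=128.
    ExitCode(i32),
    /// Status above 128, read as "killed by signal (status - 128)".
    Signal(i32),
}

/// Classify a status code using the 128 + N shell convention.
fn classify(shell_status: i32) -> Outcome {
    match shell_status {
        0 => Outcome::Success,
        s if s > 128 => Outcome::Signal(s - 128),
        s => Outcome::ExitCode(s),
    }
}
```

With this reading, `classify(139)` is `Outcome::Signal(11)`, which a harness could report as a crash rather than as a failed expectation.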

What's in the diff

One commit:

  • add engine-matrix test corpus + run harness for cross-engine audit — 49 .ilo programs + bash harness + README under tests/engine-matrix/. Each program has -- feature: + -- expected: header lines so future audits can re-run via the harness.
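
As an illustration of how the `-- feature:` and `-- expected:` header lines could be consumed, here is a small Rust sketch. The real harness is bash; the `Header` struct and `parse_header` function are hypothetical, and the exact format of the expected value is an assumption (it is treated here as an opaque string).

```rust
/// Metadata read from the comment header of one corpus program (hypothetical).
#[derive(Debug, PartialEq)]
struct Header {
    feature: String,
    expected: String,
}

/// Scan the source for `-- feature:` and `-- expected:` lines.
/// Returns None unless both are present; if a key repeats, the last one wins.
fn parse_header(source: &str) -> Option<Header> {
    let mut feature = None;
    let mut expected = None;
    for line in source.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("-- feature:") {
            feature = Some(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("-- expected:") {
            expected = Some(rest.trim().to_string());
        }
    }
    Some(Header {
        feature: feature?,
        expected: expected?,
    })
}
```

Keeping the expectation inside each program file means a re-run needs no separate manifest: the harness can compare each engine's output against `expected` directly.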

Test plan

  • cargo build --release --features cranelift (used by harness)
  • bash tests/engine-matrix/run-matrix.sh — produces the matrix; 9 AOT FAIL cells reproduce the bugs above
  • No source code in src/ touched
  • Corpus files have stable, minimal content suitable for re-running after any compiler change to detect regressions

Follow-ups

Per the brief, this PR does not dispatch fixes. Top three to prioritise from the gap list:

  1. AOT HOF dispatch (rows 16, 17, 18, 31, 36, 37, 38) — biggest concrete user-facing wrongness
  2. AOT default-entry main resolution — cheapest fix, removes a SIGSEGV trap
  3. SPEC.md / ai.txt closure-capture drift — doc-only, blocks the engines-as-contracts brief

49 small .ilo programs each exercising one feature, plus a bash harness
that runs every program through tree / VM / Cranelift JIT / Cranelift
AOT and prints a markdown matrix. Used to populate the audit doc at
research/engine-audit-2026-05.md (research/ is gitignored).

Harness passes `main` as the AOT entry function explicitly. The default
"first declared function in the file" entry is rarely correct for
real programs and silently segfaults; flagged as a gap in the audit doc.
@codecov

codecov Bot commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@danieljohnmorris danieljohnmorris merged commit 38d2bc5 into main May 18, 2026
5 checks passed
@danieljohnmorris danieljohnmorris deleted the feature/engine-audit branch May 18, 2026 22:54
danieljohnmorris added a commit that referenced this pull request May 19, 2026
Engine audit PR #413 found AOT silently returns nil for HOF / closure
dispatch (map over a lambda, fld, grp, uniqby, fn-ref return). Root cause:
AOT never publishes ACTIVE_PROGRAM, so jit_call_dyn and jit_call_builtin_tree
hit their null-program guards and return TAG_NIL for every user-fn callback.

To fix, the AOT binary needs a CompiledProgram at runtime. Add a postcard
wire format (chunks + func_names + is_tool + type_registry + ast) gated by
schema_version. Chunk constants encode only the variants the compiler
emits today (Nil / Number / Text / Bool / List); a future variant trips
the From<&Value> guard in the test suite before any binary ships.

The AST is serialised as JSON inside the postcard envelope because the
Program::serialize_decls custom impl uses serialize_seq(None) which postcard
rejects with "The length of a sequence must be known". serde_json handles
unsized seqs and the AST already serialises cleanly via JSON for --ast.

Round-trip unit tests cover the empty program, a map-lambda program, an
fld user-fn program, type-registry preservation, and schema-version
mismatch detection.
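
The schema_version gating described in this commit message can be sketched with the standard library alone. This is not the postcard format the commit uses: the 4-byte little-endian version prefix, the `SCHEMA_VERSION` value, and the `encode` / `decode` / `DecodeError` names are all assumptions chosen to show the idea of rejecting a payload written under a different schema.

```rust
/// Version the reader understands (value is hypothetical).
const SCHEMA_VERSION: u32 = 1;

#[derive(Debug, PartialEq)]
enum DecodeError {
    /// Fewer than 4 bytes, so no version prefix could be read.
    Truncated,
    /// The blob was written under a different schema version.
    SchemaMismatch { found: u32, expected: u32 },
}

/// Prefix the payload with the schema version as 4 little-endian bytes.
fn encode(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&SCHEMA_VERSION.to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Check the version prefix and return the payload bytes that follow it.
fn decode(bytes: &[u8]) -> Result<&[u8], DecodeError> {
    if bytes.len() < 4 {
        return Err(DecodeError::Truncated);
    }
    let (head, body) = bytes.split_at(4);
    let found = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    if found != SCHEMA_VERSION {
        return Err(DecodeError::SchemaMismatch {
            found,
            expected: SCHEMA_VERSION,
        });
    }
    Ok(body)
}
```

The payload is left opaque here, which mirrors the choice described above of carrying the JSON-serialised AST as a blob inside the envelope; a mismatched version fails before any payload bytes are interpreted.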