cli: --bench --json envelope with top-level engine field by danieljohnmorris · Pull Request #419 · ilo-lang/ilo

danieljohnmorris · 2026-05-18T23:22:12Z

Summary

Slim slice of adoption brief 4. Previously, --bench ignored the output mode and always dumped text. With --json it now emits one JSON object per engine measurement, with a top-level engine field naming the engine that produced the numbers, so agents can filter bench results without parsing prose.

$ ilo 'add a:n b:n>n;+a b' --bench add 1 2 --json
{"schemaVersion":1,"engine":"tree","result":"3","iterations":10000,"totalMs":6.83,"perCallNs":683}
{"schemaVersion":1,"engine":"vm","variant":"fresh","result":"3","iterations":10000,"totalMs":1.55,"perCallNs":155}
{"schemaVersion":1,"engine":"vm","variant":"reusable","result":"3","iterations":10000,"totalMs":0.66,"perCallNs":66}
{"schemaVersion":1,"engine":"jit","result":"3","iterations":10000,"totalMs":0.43,"perCallNs":43}
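
For an agent consuming these lines, filtering on the top-level engine field is a line-by-line job. A minimal Rust sketch, assuming the output has already been captured as a string; `field` and `fastest` are hypothetical helpers, and `field` only copes with flat one-line objects like the ones above (no commas or braces inside values), so real code would use a JSON parser such as serde_json instead.

```rust
// Sample `--bench --json` output, copied from the example above.
const SAMPLE: &str = concat!(
    r#"{"schemaVersion":1,"engine":"tree","result":"3","iterations":10000,"totalMs":6.83,"perCallNs":683}"#,
    "\n",
    r#"{"schemaVersion":1,"engine":"vm","variant":"fresh","result":"3","iterations":10000,"totalMs":1.55,"perCallNs":155}"#,
    "\n",
    r#"{"schemaVersion":1,"engine":"vm","variant":"reusable","result":"3","iterations":10000,"totalMs":0.66,"perCallNs":66}"#,
    "\n",
    r#"{"schemaVersion":1,"engine":"jit","result":"3","iterations":10000,"totalMs":0.43,"perCallNs":43}"#
);

/// Hypothetical helper: raw value for `key` in a flat one-line JSON object,
/// with surrounding quotes stripped. Not a general JSON parser.
fn field<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let needle = format!("\"{key}\":");
    let start = line.find(needle.as_str())? + needle.len();
    let rest = &line[start..];
    let end = rest.find(|c: char| c == ',' || c == '}')?;
    Some(rest[..end].trim().trim_matches('"'))
}

/// Fastest record for one engine as (variant, perCallNs); None if the engine is absent.
fn fastest<'a>(output: &'a str, engine: &str) -> Option<(Option<&'a str>, u64)> {
    output
        .lines()
        .filter(|line| field(line, "engine") == Some(engine))
        .filter_map(|line| {
            let ns = field(line, "perCallNs")?.parse::<u64>().ok()?;
            Some((field(line, "variant"), ns))
        })
        .min_by_key(|&(_, ns)| ns)
}

fn main() {
    for engine in ["tree", "vm", "jit"] {
        if let Some((variant, ns)) = fastest(SAMPLE, engine) {
            println!("{engine} {variant:?}: {ns} ns/call");
        }
    }
}
```

Because the VM appears twice, the variant travels with the timing so the caller can tell the fresh and reusable measurements apart.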

The other deliverables in the brief either already exist (engine docs at skills/ilo/ilo-engines.md) or were deliberately reversed (the --engine CLI flag, removed in 0e30891). This PR is just the JSON envelope.

What's in the diff

cli: --bench honors --json with per-engine envelope. run_bench takes a json flag from the resolved OutputMode, and each engine block emits either text or one JSON line. The Python comparator and the Summary are skipped in JSON mode (Python isn't one of the four engines; the Summary is prose).
tests: cover bench --json engine field; pin eval_inline bench to --text. The new regression_bench_engine_field.rs (4 tests) covers the JSON contract; the 7 pre-existing bench_* tests in eval_inline.rs need --text because subprocess mode now auto-resolves to JSON (matching every other subcommand).
docs: ilo-engines.md notes the bench --json schema. Replaces the misleading per-engine bench table (every --bench invocation already exercises tree, vm and jit) with the actual invocations and the JSON envelope shape.
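
A rough sketch of the cli change, assuming a two-variant OutputMode and a per-engine record type. `BenchRecord`, `render_bench`, the text layout and the summary line are hypothetical, not ilo's actual code. The sketch reproduces the envelope shape from the Summary, adds `variant` only when it is set, and keeps the prose summary in text mode only. It also assumes perCallNs is totalMs spread over the iterations, which matches the sample numbers (6.83 ms / 10000 = 683 ns).

```rust
// Hypothetical names throughout; ilo's real types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputMode {
    Text,
    Json,
}

#[derive(Debug, Clone, Copy)]
struct BenchRecord<'a> {
    engine: &'a str,          // "tree" | "vm" | "jit"
    variant: Option<&'a str>, // Some("fresh" | "reusable") for vm only
    result: &'a str,
    iterations: u64,
    total_ms: f64,
}

/// Minimal JSON string escaping (backslashes and quotes only).
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

impl BenchRecord<'_> {
    /// Assumption: per-call time is total time divided by iterations, in ns.
    fn per_call_ns(&self) -> u64 {
        (self.total_ms * 1_000_000.0 / self.iterations as f64).round() as u64
    }

    /// One JSON object per measurement; `variant` only appears when set.
    fn to_json_line(&self) -> String {
        let variant = self
            .variant
            .map(|v| format!(",\"variant\":\"{}\"", escape(v)))
            .unwrap_or_default();
        format!(
            "{{\"schemaVersion\":1,\"engine\":\"{}\"{},\"result\":\"{}\",\"iterations\":{},\"totalMs\":{:.2},\"perCallNs\":{}}}",
            escape(self.engine),
            variant,
            escape(self.result),
            self.iterations,
            self.total_ms,
            self.per_call_ns()
        )
    }
}

/// One line per engine; the prose summary is emitted in text mode only.
fn render_bench(mode: OutputMode, records: &[BenchRecord<'_>]) -> Vec<String> {
    let mut out: Vec<String> = records
        .iter()
        .map(|r| match mode {
            OutputMode::Json => r.to_json_line(),
            // Placeholder text layout, not ilo's real output.
            OutputMode::Text => format!("{}: {} ns/call", r.engine, r.per_call_ns()),
        })
        .collect();
    if mode == OutputMode::Text {
        out.push("Summary: ...".to_string()); // placeholder for the prose summary
    }
    out
}

fn main() {
    let records = [
        BenchRecord { engine: "tree", variant: None, result: "3", iterations: 10_000, total_ms: 6.83 },
        BenchRecord { engine: "vm", variant: Some("fresh"), result: "3", iterations: 10_000, total_ms: 1.55 },
    ];
    for line in render_bench(OutputMode::Json, &records) {
        println!("{line}");
    }
}
```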

Notes

aot is reserved in the JSON contract, but --bench doesn't measure AOT today (use ilo build, then time the produced binary). When a future change adds an AOT bench, it gets "engine":"aot" for free.
The VM emits two records, distinguished by variant: "fresh" | "reusable". Keeping both is useful: the reusable VmState measurement is closer to the steady-state cost agents care about, while the fresh-compile path matches what ilo file.ilo does today.
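
On the consuming side, the reserved aot name and the VM variant rule can be modelled up front, so an "engine":"aot" record is understood the day it appears. A hypothetical Rust sketch; the type names are illustrative, and treating a variant on a non-VM record (or a VM record without one) as invalid is an assumption drawn from the examples above, not a documented rule.

```rust
// Hypothetical agent-side model of the bench `engine` / `variant` contract.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Engine {
    Tree,
    Vm,
    Jit,
    Aot, // reserved in the contract; --bench does not emit it today
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VmVariant {
    Fresh,    // fresh-compile path, matching `ilo file.ilo`
    Reusable, // reused VmState, closer to steady-state cost
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Measurement {
    Vm(VmVariant),
    Other(Engine),
}

fn parse_engine(s: &str) -> Option<Engine> {
    match s {
        "tree" => Some(Engine::Tree),
        "vm" => Some(Engine::Vm),
        "jit" => Some(Engine::Jit),
        "aot" => Some(Engine::Aot),
        _ => None, // unknown names are rejected, not guessed at
    }
}

fn parse_variant(s: &str) -> Option<VmVariant> {
    match s {
        "fresh" => Some(VmVariant::Fresh),
        "reusable" => Some(VmVariant::Reusable),
        _ => None,
    }
}

/// Assumption: only vm records carry `variant`, and vm records always do.
fn classify(engine: &str, variant: Option<&str>) -> Option<Measurement> {
    match (parse_engine(engine)?, variant) {
        (Engine::Vm, Some(v)) => parse_variant(v).map(Measurement::Vm),
        (Engine::Vm, None) | (_, Some(_)) => None,
        (other, None) => Some(Measurement::Other(other)),
    }
}

fn main() {
    let records = [("tree", None), ("vm", Some("fresh")), ("vm", Some("reusable")), ("jit", None), ("aot", None)];
    for (engine, variant) in records {
        println!("{engine} {variant:?} -> {:?}", classify(engine, variant));
    }
}
```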

Test plan

cargo test --release --features cranelift (all green)
cargo fmt --check
cargo clippy --release --features cranelift --tests -- -D warnings
New regression test covers tree / vm (fresh + reusable) / jit
--text mode unchanged
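
The regression suite itself isn't reproduced here. As a hedged illustration of the kind of contract check it describes, this Rust sketch validates captured --bench --json output without spawning ilo: required keys, schemaVersion, the allowed engine/variant combinations, and a perCallNs consistency check. That last check is an assumption inferred from the sample numbers, and `parse_flat` and `check_contract` are hypothetical helpers.

```rust
use std::collections::HashMap;

const SAMPLE: &str = concat!(
    r#"{"schemaVersion":1,"engine":"tree","result":"3","iterations":10000,"totalMs":6.83,"perCallNs":683}"#, "\n",
    r#"{"schemaVersion":1,"engine":"vm","variant":"fresh","result":"3","iterations":10000,"totalMs":1.55,"perCallNs":155}"#, "\n",
    r#"{"schemaVersion":1,"engine":"vm","variant":"reusable","result":"3","iterations":10000,"totalMs":0.66,"perCallNs":66}"#, "\n",
    r#"{"schemaVersion":1,"engine":"jit","result":"3","iterations":10000,"totalMs":0.43,"perCallNs":43}"#
);

const REQUIRED: [&str; 6] = ["schemaVersion", "engine", "result", "iterations", "totalMs", "perCallNs"];

/// Split a flat one-line JSON object into key -> raw value, quotes stripped.
/// Only valid when no value contains ',' (true for these bench records).
fn parse_flat(line: &str) -> HashMap<&str, &str> {
    line.trim()
        .trim_start_matches('{')
        .trim_end_matches('}')
        .split(',')
        .filter_map(|pair| pair.split_once(':'))
        .map(|(k, v)| (k.trim().trim_matches('"'), v.trim().trim_matches('"')))
        .collect()
}

/// Number of records on success, or a message describing the first violation.
fn check_contract(output: &str) -> Result<usize, String> {
    let mut count = 0;
    for line in output.lines().filter(|l| !l.trim().is_empty()) {
        let rec = parse_flat(line);
        for key in REQUIRED {
            if !rec.contains_key(key) {
                return Err(format!("missing {key}: {line}"));
            }
        }
        if rec["schemaVersion"] != "1" {
            return Err(format!("unexpected schemaVersion: {line}"));
        }
        // tree and jit carry no variant; vm must say fresh or reusable.
        match (rec["engine"], rec.get("variant").copied()) {
            ("tree" | "jit", None) | ("vm", Some("fresh" | "reusable")) => {}
            (engine, variant) => return Err(format!("bad engine/variant {engine}/{variant:?}")),
        }
        let num = |key: &str| -> Result<f64, String> {
            rec[key].parse::<f64>().map_err(|e| format!("{key}: {e}"))
        };
        let (total_ms, iters, per_call) = (num("totalMs")?, num("iterations")?, num("perCallNs")?);
        // Assumption: perCallNs is totalMs spread over the iterations, within 1 ns.
        if iters <= 0.0 || (total_ms * 1e6 / iters - per_call).abs() > 1.0 {
            return Err(format!("perCallNs inconsistent: {line}"));
        }
        count += 1;
    }
    Ok(count)
}

fn main() {
    match check_contract(SAMPLE) {
        Ok(n) => println!("{n} records satisfy the contract"),
        Err(e) => eprintln!("contract violation: {e}"),
    }
}
```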

Follow-ups

Add a bench measurement for the AOT engine so "engine":"aot" is actually produced (separate brief; needs a design decision on whether bench builds and spawns a binary or compiles in-process).
Engine-can't-handle-program diagnostic, mentioned in the brief: separate PR, pending a design decision on what to do under the auto-fallback regime.

`--bench` previously dumped human-readable text regardless of output mode. Wire `run_bench` to the resolved `OutputMode` so `--json` (or non-TTY auto-detect) emits one JSON object per engine measurement instead. A top-level `engine` field names the engine that produced the numbers: `tree`, `vm` (with `variant: "fresh"|"reusable"` for the two VM modes), `jit`, or `llvm`. The Python transpiler comparison and the summary block are skipped under `--json`; text mode is unchanged. Lets agents read bench results without parsing prose. Closes the JSON-envelope slice of adoption brief 4.

New regression suite asserts the bench JSON contract:
- one envelope per engine measurement
- top-level `engine` is `tree`, `vm`, or `jit`
- VM emits two records with `variant: "fresh" | "reusable"`
- every line carries iterations / totalMs / perCallNs / result
- `--text` still produces the legacy human-readable text

The 7 pre-existing `bench_*` tests in eval_inline.rs grepped for "Rust interpreter" / "Register VM" / "Cranelift JIT" headings. Subprocess-mode output now auto-resolves to JSON when stderr isn't a TTY (matching every other ilo subcommand), so these tests need an explicit `--text` to stay on the legacy text contract they were written against.
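
The auto-resolution described above can be sketched with the standard library's `std::io::IsTerminal` trait (Rust 1.70+). `resolve_output_mode` is a hypothetical name, and letting `--json` win when both flags are passed is an assumption; the PR doesn't say how ilo breaks that tie.

```rust
use std::io::IsTerminal;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputMode {
    Text,
    Json,
}

/// Hypothetical resolution: an explicit flag wins; otherwise fall back to JSON
/// when stderr is not a terminal (for example, a test capturing a subprocess).
fn resolve_output_mode(json_flag: bool, text_flag: bool, stderr_is_tty: bool) -> OutputMode {
    if json_flag {
        OutputMode::Json // assumption: --json wins if both flags are given
    } else if text_flag || stderr_is_tty {
        OutputMode::Text
    } else {
        OutputMode::Json
    }
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    let json = args.iter().any(|a| a == "--json");
    let text = args.iter().any(|a| a == "--text");
    let mode = resolve_output_mode(json, text, std::io::stderr().is_terminal());
    println!("{mode:?}");
}
```

The no-flags, non-TTY case resolves to JSON, which is the situation the eval_inline subprocess tests are in and why they now pass `--text` explicitly.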

Replace the misleading per-engine bench invocation table (every `--bench` run already exercises tree/vm/jit regardless of engine flag) with one accurate text invocation, the JSON invocation, and the envelope shape so agents know what to parse.

codecov · 2026-05-18T23:26:41Z

Codecov Report

❌ Patch coverage is 98.48485% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/main.rs	98.48%	1 Missing ⚠️


danieljohnmorris added 3 commits May 19, 2026 00:21

danieljohnmorris merged commit 618236d into main May 18, 2026
5 checks passed

danieljohnmorris deleted the feature/bench-engine-field branch May 18, 2026 23:34
