cli: --bench --json envelope with top-level engine field #419

Merged: danieljohnmorris merged 3 commits into main from feature/bench-engine-field on May 18, 2026

@danieljohnmorris
Collaborator

Summary

Slim slice of adoption brief 4. --bench previously ignored the output mode and always dumped text. With --json it now emits one JSON object per engine measurement, with a top-level engine field naming which engine produced the numbers, so agents can filter bench results without parsing prose.

$ ilo 'add a:n b:n>n;+a b' --bench add 1 2 --json
{"schemaVersion":1,"engine":"tree","result":"3","iterations":10000,"totalMs":6.83,"perCallNs":683}
{"schemaVersion":1,"engine":"vm","variant":"fresh","result":"3","iterations":10000,"totalMs":1.55,"perCallNs":155}
{"schemaVersion":1,"engine":"vm","variant":"reusable","result":"3","iterations":10000,"totalMs":0.66,"perCallNs":66}
{"schemaVersion":1,"engine":"jit","result":"3","iterations":10000,"totalMs":0.43,"perCallNs":43}

The other deliverables in the brief either already exist (engine docs at skills/ilo/ilo-engines.md) or were deliberately reversed (the --engine CLI flag, removed in 0e30891). This PR is just the JSON envelope.
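
As a rough illustration of the "filter without parsing prose" point, the sketch below reads the four sample lines from the Summary as newline-delimited JSON and filters on the engine field. The helper names `parse_bench` and `by_engine` are hypothetical and not part of ilo; the field names are taken from the sample output.

```python
import json

# Sample output copied from the Summary (one JSON object per line).
BENCH_OUTPUT = """\
{"schemaVersion":1,"engine":"tree","result":"3","iterations":10000,"totalMs":6.83,"perCallNs":683}
{"schemaVersion":1,"engine":"vm","variant":"fresh","result":"3","iterations":10000,"totalMs":1.55,"perCallNs":155}
{"schemaVersion":1,"engine":"vm","variant":"reusable","result":"3","iterations":10000,"totalMs":0.66,"perCallNs":66}
{"schemaVersion":1,"engine":"jit","result":"3","iterations":10000,"totalMs":0.43,"perCallNs":43}
"""


def parse_bench(output: str) -> list:
    """Parse newline-delimited bench envelopes, skipping blank lines."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def by_engine(records: list, engine: str) -> list:
    """Keep only the records produced by the named engine."""
    return [r for r in records if r["engine"] == engine]


records = parse_bench(BENCH_OUTPUT)
vm_records = by_engine(records, "vm")
fastest = min(records, key=lambda r: r["perCallNs"])
print(fastest["engine"], fastest["perCallNs"])  # prints: jit 43
```

Filtering on `engine` alone returns both VM records, so a consumer that wants a single VM number also has to look at `variant`.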

What's in the diff

  • cli: --bench honors --json with per-engine envelope - run_bench takes a json flag from the resolved OutputMode. Each engine block emits either text or one JSON line. Python comparator + Summary skipped in JSON mode (Python isn't one of the four engines; Summary is prose).
  • tests: cover bench --json engine field; pin eval_inline bench to --text — new regression_bench_engine_field.rs (4 tests) covers the JSON contract; the 7 pre-existing bench_* tests in eval_inline.rs need --text because subprocess mode now auto-resolves to JSON (matching every other subcommand).
  • docs: ilo-engines.md notes the bench --json schema — replaces the misleading per-engine bench table (every --bench invocation already exercises all engines) with the actual invocations and the JSON envelope shape.
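
The envelope shape can be sketched from the consumer's point of view as follows. This is not the Rust code in `run_bench`; `bench_envelope` is a hypothetical helper, and the relation `perCallNs = totalMs * 1e6 / iterations` is an assumption inferred from the sample output (6.83 ms over 10000 iterations gives 683 ns), not something the PR states.

```python
import json


def bench_envelope(engine, result, iterations, total_ms, variant=None):
    """Build one compact JSON line in the field order of the sample output."""
    record = {"schemaVersion": 1, "engine": engine}
    if variant is not None:
        # Only the VM records in the sample carry a variant field.
        record["variant"] = variant
    record["result"] = result
    record["iterations"] = iterations
    record["totalMs"] = total_ms
    # Assumption: per-call time is the total divided by the iteration count.
    record["perCallNs"] = round(total_ms * 1_000_000 / iterations)
    return json.dumps(record, separators=(",", ":"))


print(bench_envelope("tree", "3", 10000, 6.83))
print(bench_envelope("vm", "3", 10000, 1.55, variant="fresh"))
```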

Notes

  • aot is reserved in the JSON contract but --bench doesn't measure AOT today (use ilo build then time the produced binary). When a future change adds an AOT bench it gets "engine":"aot" for free.
  • VM emits two records distinguished by variant: "fresh"|"reusable". Keeping both is useful — the reusable VmState measurement is closer to the steady-state cost agents care about, while the fresh-compile path matches what ilo file.ilo does today.
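
One way a consumer might act on these two notes is to keep a single record per engine, preferring the VM's reusable variant, and to treat a missing aot record as "not measured". The sketch below assumes that policy; `steady_state` is a hypothetical helper, and the records are trimmed copies of the sample output.

```python
def steady_state(records):
    """Return one record per engine, preferring the VM's 'reusable' variant."""
    chosen = {}
    for record in records:
        engine = record["engine"]
        if engine == "vm" and record.get("variant") != "reusable":
            # Keep the fresh-compile record only as a fallback.
            chosen.setdefault(engine, record)
            continue
        chosen[engine] = record
    return chosen


# Trimmed copies of the records shown in the Summary.
records = [
    {"engine": "tree", "perCallNs": 683},
    {"engine": "vm", "variant": "fresh", "perCallNs": 155},
    {"engine": "vm", "variant": "reusable", "perCallNs": 66},
    {"engine": "jit", "perCallNs": 43},
]

picked = steady_state(records)
for engine in ("tree", "vm", "jit", "aot"):
    record = picked.get(engine)
    # aot is reserved in the contract but not emitted by --bench today.
    print(engine, record["perCallNs"] if record else "not measured")
```

Looking engines up with `.get` rather than indexing means the same consumer keeps working if an `"engine":"aot"` record appears later.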

Test plan

  • cargo test --release --features cranelift (all green)
  • cargo fmt --check
  • cargo clippy --release --features cranelift --tests -- -D warnings
  • New regression test covers tree / vm (fresh + reusable) / jit
  • --text mode unchanged

Follow-ups

  • Add a bench measurement for the AOT engine so "engine":"aot" is actually produced (separate brief — needs design on whether bench builds + spawns a binary, or compiles in-process).
  • Engine-can't-handle-program diagnostic, mentioned in the brief — separate PR, needs the design decision on what to do under the auto-fallback regime.

`--bench` previously dumped human-readable text regardless of output
mode. Wire `run_bench` to the resolved `OutputMode` so `--json` (or
non-TTY auto-detect) emits one JSON object per engine measurement
instead. Top-level `engine` field names which engine produced the
numbers — `tree`, `vm` (with `variant: "fresh"|"reusable"` for the
two VM modes), `jit`, or `llvm`. Python transpiler comparison and
the summary block are skipped under `--json`; text mode is unchanged.

Lets agents read bench results without parsing prose. Closes the
JSON-envelope slice of adoption brief 4.

New regression suite asserts the bench JSON contract:
- one envelope per engine measurement
- top-level `engine` is `tree`, `vm`, or `jit`
- VM emits two records with `variant: "fresh" | "reusable"`
- every line carries iterations / totalMs / perCallNs / result
- `--text` still produces the legacy human-readable text
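
The contract listed above could be checked on the consumer side roughly as follows. This is a sketch, not the Rust tests in regression_bench_engine_field.rs; `check_line` and the constant names are hypothetical, and the allowed engine set mirrors what the list says `--bench` emits today.

```python
import json

REQUIRED_FIELDS = ("engine", "result", "iterations", "totalMs", "perCallNs")
ENGINES = {"tree", "vm", "jit"}
VM_VARIANTS = {"fresh", "reusable"}


def check_line(line: str) -> list:
    """Return a list of contract violations for one bench JSON line."""
    record = json.loads(line)
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in record]
    engine = record.get("engine")
    if engine is not None and engine not in ENGINES:
        problems.append(f"unexpected engine: {engine}")
    if engine == "vm" and record.get("variant") not in VM_VARIANTS:
        problems.append("vm record needs variant fresh or reusable")
    return problems


good = '{"schemaVersion":1,"engine":"vm","variant":"fresh","result":"3","iterations":10000,"totalMs":1.55,"perCallNs":155}'
bad = '{"schemaVersion":1,"engine":"vm","result":"3","iterations":10000,"totalMs":1.55,"perCallNs":155}'
print(check_line(good))
print(check_line(bad))
```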

The 7 pre-existing `bench_*` tests in eval_inline.rs grepped for
"Rust interpreter" / "Register VM" / "Cranelift JIT" headings.
Subprocess-mode output now auto-resolves to JSON when stderr isn't
a TTY (matching every other ilo subcommand), so these tests need
an explicit `--text` to stay on the legacy text contract they were
written against.

Replace the misleading per-engine bench invocation table (every
`--bench` run already exercises tree/vm/jit regardless of engine
flag) with one accurate text invocation, the JSON invocation, and
the envelope shape so agents know what to parse.
@codecov

codecov Bot commented May 18, 2026

Codecov Report

❌ Patch coverage is 98.48485% with 1 line in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/main.rs | 98.48% | 1 Missing ⚠️ |


@danieljohnmorris danieljohnmorris merged commit 618236d into main May 18, 2026
5 checks passed
@danieljohnmorris danieljohnmorris deleted the feature/bench-engine-field branch May 18, 2026 23:34