bench: statistical methodology (warmup, confidence intervals) by danieljohnmorris · Pull Request #684 · ilo-lang/ilo

danieljohnmorris · 2026-05-22T05:46:27Z

Summary

Implements ILO-349 (follow-up to ILO-65 / #608).

Warmup: 5 runs discarded before measurement (3 in --quick mode) to eliminate cold-start noise
30 measurement runs per engine/language (10 in --quick), each an independent process invocation; ilo uses its internal 10k-iter loop per run
Full statistics computed per (benchmark, language): min, max, mean, median, p95, p99, stddev
95% confidence intervals via t-distribution (df = n-1), converging to z = 1.960 for large samples
CI overlap detection: after all benchmarks complete, every pair of confidence intervals is compared; overlapping intervals are reported as "no statistically significant difference" (a non-fatal warning by default; exiting with status 1 can be enabled for continuous-integration gates)
results.json schema updated: values are now stats objects instead of bare scalars; a methodology block records warmup/measure counts and CI method
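
The statistics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code in bench/run.sh: the function names, the interpolation rule used for p95/p99, and the small lookup table of t critical values are all hypothetical choices made here. Only the field names ci95_lo and ci95_hi and the list of statistics come from this PR.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Standard table values; df = 9 and df = 29 match --quick (n = 10) and
# normal mode (n = 30). The exact value approaches z = 1.960 as df grows.
T_CRIT_95 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447,
    7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 15: 2.131, 20: 2.086,
    29: 2.045, 30: 2.042, 60: 2.000, 120: 1.980,
}


def t_crit_95(df):
    """Use the largest tabulated df <= df (assumption: errs on the wide side)."""
    return T_CRIT_95[max(k for k in T_CRIT_95 if k <= df)]


def percentile(sorted_xs, p):
    """Linear-interpolation percentile for p in [0, 100] (assumed method)."""
    if len(sorted_xs) == 1:
        return sorted_xs[0]
    rank = (len(sorted_xs) - 1) * p / 100
    lo, hi = math.floor(rank), math.ceil(rank)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (rank - lo)


def summarize(samples):
    """Build one stats object for a (benchmark, language) pair."""
    xs = sorted(samples)
    n = len(xs)
    mean = statistics.fmean(xs)
    # A single sample has no spread estimate, so the interval collapses
    # to the mean instead of raising.
    stddev = statistics.stdev(xs) if n > 1 else 0.0
    half = t_crit_95(n - 1) * stddev / math.sqrt(n) if n > 1 else 0.0
    return {
        "n": n, "min": xs[0], "max": xs[-1], "mean": mean,
        "median": statistics.median(xs),
        "p95": percentile(xs, 95), "p99": percentile(xs, 99),
        "stddev": stddev, "ci95_lo": mean - half, "ci95_hi": mean + half,
    }


def cis_overlap(a, b):
    """True when the two 95% intervals share at least one point."""
    return a["ci95_lo"] <= b["ci95_hi"] and b["ci95_lo"] <= a["ci95_hi"]
```

Calling summarize on the per-run timings of two languages and passing both results to cis_overlap gives the pairwise check. Overlapping intervals are a conservative signal: two means can differ significantly under a direct two-sample test while their individual 95% intervals still overlap, which fits reporting the overlap as a warning rather than a failure.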

Test plan

./bench/run.sh --quick --no-rust completes without error
bench/results.json contains ci95_lo/ci95_hi fields
CI overlap section prints without crashing on degenerate (single-sample) input
--quick produces 10 samples, normal mode produces 30
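
The second check in the test plan could be automated along these lines. The layout of bench/results.json is not shown in this PR, so the sketch makes two labeled assumptions: a stats object is any JSON object with a "mean" key, and the example document below (including the benchmark name "fib" and the methodology keys) is invented purely for illustration.

```python
import json

REQUIRED = ("ci95_lo", "ci95_hi")  # field names taken from the test plan


def missing_ci_fields(node, path="$"):
    """Return paths of required CI fields absent from any stats object.

    Assumption: a stats object is any dict that has a "mean" key.
    """
    problems = []
    if isinstance(node, dict):
        if "mean" in node:
            problems += [f"{path}.{k}" for k in REQUIRED if k not in node]
        for key, value in node.items():
            problems += missing_ci_fields(value, f"{path}.{key}")
    elif isinstance(node, list):
        for i, value in enumerate(node):
            problems += missing_ci_fields(value, f"{path}[{i}]")
    return problems


# Hypothetical document; the real schema may be shaped differently.
example = json.loads("""
{
  "methodology": {"warmup": 3, "measure": 10},
  "results": {
    "fib": {
      "ilo": {"mean": 1.2, "ci95_lo": 1.1, "ci95_hi": 1.3},
      "python": {"mean": 9.0}
    }
  }
}
""")

problems = missing_ci_fields(example)
```

For a real run, the example would be replaced by json.load on the opened bench/results.json, and a non-empty problems list would fail the check.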

🤖 Generated with Claude Code

codecov · 2026-05-22T05:58:39Z

❌ 1 Tests Failed:

Tests completed	Failed	Passed	Skipped
686	1	685	0

View the full list of 1 ❄️ flaky test(s)

ilo::interpreter::tests::interpret_braced_guard_in_loop_no_early_return

Flake rate in main: 66.67% (Passed 1 times, Failed 2 times)

Stack Traces | 0.011s run time

thread 'interpreter::tests::interpret_braced_guard_in_loop_no_early_return' (32872) panicked at src/interpreter/mod.rs:10202:9:
parse errors: [ParseError { code: "ILO-P003", position: 27, span: Span { start: 35, end: 36 }, message: "expected `{`, got `;`", hint: Some("ilo bodies are single-line, `;`-separated — not python/swift-style indented. Use either the brace-block form `name p:t>r;{body1;body2}` or the single-line form `name p:t>r;body1;body2`. For statements that require a block (`@k xs{...}`, `wh cond{...}`, `?subj{...}`), the `{...}` must be on the same line as the head.") }]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


danieljohnmorris · 2026-05-22T09:02:49Z

needs manual rebase (conflicts in: bench/run.sh)

danieljohnmorris · 2026-05-22T09:21:50Z

needs manual — rebase conflict in non-doc file(s). Auto-resolve skipped.

danieljohnmorris · 2026-05-22T09:22:16Z

needs manual — rebase conflict in non-doc file(s)

hotfix(codegen/js): handle Pattern::Or, Expr::Todo, Expr::Panic

danieljohnmorris added the mini label ("Created by mini PC autonomous workflow") May 22, 2026

Merge pull request #720 from ilo-lang/hotfix/js-codegen-new-variants

0485a68

hotfix(codegen/js): handle Pattern::Or, Expr::Todo, Expr::Panic

danieljohnmorris force-pushed the chore/bench-stats-methodology branch from 8527fc7 to 0485a68 Compare May 22, 2026 10:36

danieljohnmorris merged commit f32d1e6 into main May 22, 2026
8 of 10 checks passed

danieljohnmorris deleted the chore/bench-stats-methodology branch May 22, 2026 10:40

danieljohnmorris merged 1 commit into main from chore/bench-stats-methodology
