chore(bench): hardware standardisation for CI runner (ILO-348) by danieljohnmorris · Pull Request #683 · ilo-lang/ilo

danieljohnmorris · 2026-05-22T05:42:27Z

Summary

Follow-up to ILO-65 / #608. The bench harness was running on whatever runner GitHub picked, which made nightly numbers noisy and incomparable from run to run.

Pin runner shape — bench.yml uses ubuntu-latest (GitHub's standard 2-core / 7 GB RAM machines) and documents why it is the pinning point.
Record hardware in results.json — every result file now carries a top-level "hardware" block with cpu_model, cpu_count, and mem_gb.
Reject mismatched hardware — new "Collect hardware info and check baseline" step reads /proc/cpuinfo + /proc/meminfo, seeds bench/hw-baseline.json on first run, and fails the job if the shape differs from the baseline. Regression comparison also skips (rather than false-positives) when the two result files were produced on different hardware.
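As a rough illustration of the "record hardware" step, the sketch below turns the text of /proc/cpuinfo and /proc/meminfo into the three fields named above. It is not the code from bench.yml or bench/run.sh. The function names are hypothetical, and the rounding rule (MemTotal in kB divided by 1024², rounded to one decimal) is an assumption chosen only because it is consistent with the 6.8 GB figure quoted in this PR.

```python
import re


def parse_cpuinfo(text: str) -> tuple[str, int]:
    """Return (cpu_model, cpu_count) from /proc/cpuinfo-style text."""
    models = re.findall(r"^model name\s*:\s*(.+)$", text, flags=re.MULTILINE)
    # One "processor : N" entry appears per logical CPU.
    count = len(re.findall(r"^processor\s*:", text, flags=re.MULTILINE))
    model = models[0].strip() if models else "unknown"
    return model, count


def parse_meminfo(text: str) -> float:
    """Return total memory in GB from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found")
    # Assumption: kB / 1024^2, rounded to one decimal place.
    return round(int(match.group(1)) / (1024 * 1024), 1)


def hardware_block(cpuinfo_text: str, meminfo_text: str) -> dict:
    """Build the top-level "hardware" block with the three documented keys."""
    model, count = parse_cpuinfo(cpuinfo_text)
    return {
        "cpu_model": model,
        "cpu_count": count,
        "mem_gb": parse_meminfo(meminfo_text),
    }
```

On a Linux runner this could be fed with `open("/proc/cpuinfo").read()` and `open("/proc/meminfo").read()`. Taking the text as an argument keeps the parsing testable on machines without /proc.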

Files changed

| File | Change |
| --- | --- |
| `.github/workflows/bench.yml` | hardware check step; regression skip guard; seed `hw-baseline.json` in commit pattern |
| `bench/run.sh` | embed `hardware` block in `results.json` output |
| `bench/results.json` | back-filled `hardware` block for existing baseline |
| `bench/hw-baseline.json` | new; seeds baseline from existing run (AMD EPYC 7763, 2-core, 6.8 GB) |

Test plan

Trigger bench workflow manually on a standard ubuntu-latest runner — should print "Hardware matches baseline — proceeding."
Verify bench/results.json now contains a "hardware" key after a run
Delete bench/hw-baseline.json, re-run — should seed a new baseline and commit it
Force a cpu_count mismatch locally by editing hw-baseline.json — job should exit 1 with "HARDWARE MISMATCH"
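The three outcomes exercised by this test plan (seed, match, mismatch) can be sketched as a small function. This is an illustration only, not the workflow step itself. `check_baseline`, `exit_code`, and `SHAPE_KEYS` are hypothetical names, and treating all three fields as the runner "shape" is an assumption, since the PR does not say which fields are compared.

```python
import json
from pathlib import Path

# Assumption: the "shape" is all three recorded fields.
SHAPE_KEYS = ("cpu_model", "cpu_count", "mem_gb")


def check_baseline(current: dict, baseline_path: Path) -> str:
    """Return "seeded", "match", or "mismatch" for the current hardware."""
    if not baseline_path.exists():
        # First run: write the current hardware as the baseline.
        baseline_path.write_text(json.dumps(current, indent=2) + "\n")
        return "seeded"
    baseline = json.loads(baseline_path.read_text())
    differing = [k for k in SHAPE_KEYS if baseline.get(k) != current.get(k)]
    if differing:
        print("HARDWARE MISMATCH:", ", ".join(differing))
        return "mismatch"
    return "match"


def exit_code(status: str) -> int:
    """Map a status to a process exit code; only a mismatch fails the job."""
    return 1 if status == "mismatch" else 0
```

A CI wrapper would then call `sys.exit(exit_code(check_baseline(hw, Path("bench/hw-baseline.json"))))`, so that editing `cpu_count` in the baseline file produces exit code 1, as the last test-plan item expects.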

Closes ILO-348.

🤖 Generated with Claude Code

…-348)

- bench.yml: adds a "Collect hardware info and check baseline" step that reads /proc/cpuinfo and /proc/meminfo, seeds bench/hw-baseline.json on first run, and exits non-zero when the runner shape differs from the baseline so polluted results are never committed.
- bench/run.sh: embeds cpu_model, cpu_count, mem_gb into results.json under a top-level "hardware" key; falls back to live detection on local runs when .hw-info.json is absent.
- bench/results.json: back-fills hardware block for the existing baseline run (AMD EPYC 7763, 2-core, 6.8 GB; standard GitHub ubuntu-latest).
- bench/hw-baseline.json: seeds the hardware baseline from that same run.
- Regression check in bench.yml skips comparison when hardware changed between the two result files (belt-and-suspenders guard).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
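Two behaviours in this commit message lend themselves to a short sketch: the fallback to live detection when .hw-info.json is absent, and the guard that skips regression comparison across different hardware. The code below is a minimal illustration, not the actual run.sh or bench.yml logic. The function names are hypothetical, the `detect` callback stands in for whatever live detection the script performs, and treating a result file with no "hardware" key as not comparable is an assumption.

```python
import json
from pathlib import Path
from typing import Callable


def load_hardware(hw_info_path: Path, detect: Callable[[], dict]) -> dict:
    """Prefer the hardware file written by CI; otherwise detect live."""
    if hw_info_path.exists():
        return json.loads(hw_info_path.read_text())
    return detect()


def same_hardware(previous: dict, current: dict) -> bool:
    """True only when both results carry identical "hardware" blocks."""
    prev_hw = previous.get("hardware")
    curr_hw = current.get("hardware")
    # Assumption: a missing block means the hardware is unknown, so skip.
    return prev_hw is not None and prev_hw == curr_hw
```

A regression check would call `same_hardware(old_results, new_results)` first and only compare timings when it returns `True`, which avoids reporting a hardware change as a performance regression.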

codecov · 2026-05-22T05:54:56Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.


danieljohnmorris merged commit f61503a into main May 22, 2026
5 checks passed

danieljohnmorris deleted the chore/bench-hw-standardise branch May 22, 2026 06:44

danieljohnmorris merged 1 commit into main from chore/bench-hw-standardise