
chore(bench): hardware standardisation for CI runner (ILO-348) #683

Merged
danieljohnmorris merged 1 commit into main from chore/bench-hw-standardise on May 22, 2026

@danieljohnmorris
Collaborator

Summary

Follow-up to ILO-65 / #608. Bench harness was running on whatever runner GitHub picked, making nightly numbers noisy and incomparable.

  • Pin runner shape: bench.yml uses ubuntu-latest (GitHub's standard 2-core / 7 GB RAM machines) and documents why it is the pinning point.
  • Record hardware in results.json: every result file now carries a top-level "hardware" block with cpu_model, cpu_count, and mem_gb.
  • Reject mismatched hardware: a new "Collect hardware info and check baseline" step reads /proc/cpuinfo and /proc/meminfo, seeds bench/hw-baseline.json on first run, and fails the job if the shape differs from the baseline. The regression comparison is also skipped (rather than reporting a false positive) when the two result files were produced on different hardware.
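
For illustration only, the collect-and-check logic described above might look roughly like the following. This is a Python sketch, not the shell step that lives in bench.yml. The function names, the rounding of mem_gb to one decimal place, and the exact-equality comparison are all assumptions; the PR does not say how the real step compares shapes.

```python
import json
import re
from pathlib import Path


def parse_hardware(cpuinfo: str, meminfo: str) -> dict:
    """Build a hardware block from /proc/cpuinfo and /proc/meminfo text."""
    models = re.findall(r"^model name\s*:\s*(.+)$", cpuinfo, flags=re.MULTILINE)
    cpu_count = len(re.findall(r"^processor\s*:", cpuinfo, flags=re.MULTILINE))
    mem_match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo, flags=re.MULTILINE)
    if mem_match is None:
        raise ValueError("MemTotal not found in meminfo text")
    mem_kb = int(mem_match.group(1))
    return {
        "cpu_model": models[0].strip() if models else "unknown",
        "cpu_count": cpu_count,
        # Assumption: one decimal place, so 7110636 kB is reported as 6.8.
        "mem_gb": round(mem_kb / 1024 / 1024, 1),
    }


def check_baseline(current: dict, baseline_path: Path) -> str:
    """Seed the baseline on first run; otherwise compare against it."""
    if not baseline_path.exists():
        baseline_path.write_text(json.dumps(current, indent=2) + "\n")
        return "seeded"
    baseline = json.loads(baseline_path.read_text())
    # Assumption: the shapes must be exactly equal to count as a match.
    return "match" if baseline == current else "mismatch"


def main() -> int:
    """Hypothetical entry point; returns the exit code for the CI step."""
    hw = parse_hardware(
        Path("/proc/cpuinfo").read_text(),
        Path("/proc/meminfo").read_text(),
    )
    if check_baseline(hw, Path("bench/hw-baseline.json")) == "mismatch":
        print("HARDWARE MISMATCH")
        return 1
    return 0
```

Taking the file contents as strings keeps the parsing testable off-runner. Exact equality on mem_gb may be stricter than the real check if MemTotal drifts slightly between otherwise identical machines.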

Files changed

| File | Change |
| --- | --- |
| .github/workflows/bench.yml | hardware check step; regression skip guard; seed hw-baseline.json in commit pattern |
| bench/run.sh | embed hardware block in results.json output |
| bench/results.json | back-filled hardware block for existing baseline |
| bench/hw-baseline.json | new: seeds baseline from existing run (AMD EPYC 7763, 2-core, 6.8 GB) |
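
As a rough illustration of the bench/run.sh change, the sketch below embeds a hardware block under a top-level "hardware" key, preferring .hw-info.json and falling back to live detection when that file is absent. It is Python rather than the actual shell script. The helper names, the "benchmarks" key in the usage note, and the use of os.sysconf and platform.processor for local detection are assumptions, not taken from the PR.

```python
import json
import os
import platform
from pathlib import Path


def detect_hardware() -> dict:
    """Best-effort local detection (assumed approach, not the one in run.sh)."""
    try:
        mem_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        mem_gb = round(mem_bytes / 1024**3, 1)
    except (AttributeError, ValueError, OSError):
        mem_gb = None  # platform does not expose these sysconf names
    return {
        "cpu_model": platform.processor() or "unknown",
        "cpu_count": os.cpu_count(),
        "mem_gb": mem_gb,
    }


def load_hardware(hw_info_path: Path) -> dict:
    """Prefer the file written by CI; fall back to live detection."""
    if hw_info_path.exists():
        return json.loads(hw_info_path.read_text())
    return detect_hardware()


def embed_hardware(results: dict, hardware: dict) -> dict:
    """Return a copy of results with a top-level "hardware" key."""
    return {**results, "hardware": hardware}
```

A caller might combine the two as `embed_hardware({"benchmarks": []}, load_hardware(Path(".hw-info.json")))`, where "benchmarks" stands in for whatever the real results.json contains.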

Test plan

  • Trigger bench workflow manually on a standard ubuntu-latest runner — should print "Hardware matches baseline — proceeding."
  • Verify bench/results.json now contains a "hardware" key after a run
  • Delete bench/hw-baseline.json, re-run — should seed a new baseline and commit it
  • Force a cpu_count mismatch locally by editing hw-baseline.json — job should exit 1 with "HARDWARE MISMATCH"

Closes ILO-348.

🤖 Generated with Claude Code

…-348)

- bench.yml: adds a "Collect hardware info and check baseline" step that
  reads /proc/cpuinfo and /proc/meminfo, seeds bench/hw-baseline.json on
  first run, and exits non-zero when the runner shape differs from the
  baseline so polluted results are never committed.
- bench/run.sh: embeds cpu_model, cpu_count, mem_gb into results.json
  under a top-level "hardware" key; falls back to live detection on local
  runs when .hw-info.json is absent.
- bench/results.json: back-fills hardware block for the existing baseline
  run (AMD EPYC 7763, 2-core, 6.8 GB — standard GitHub ubuntu-latest).
- bench/hw-baseline.json: seeds the hardware baseline from that same run.
- Regression check in bench.yml skips comparison when hardware changed
  between the two result files (belt-and-suspenders guard).
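
The regression skip guard from the last item could be sketched as below. This is a hypothetical Python helper, not the check in bench.yml. Treating a missing "hardware" block as "do not compare" is an assumption.

```python
def should_compare(previous: dict, current: dict) -> bool:
    """Compare two result files only if both record the same hardware."""
    prev_hw = previous.get("hardware")
    curr_hw = current.get("hardware")
    # Assumption: a result without a hardware block is never compared.
    if prev_hw is None or curr_hw is None:
        return False
    return prev_hw == curr_hw
```

When this returns False, the comparison would be skipped rather than reported as a regression.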

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codecov

codecov Bot commented May 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@danieljohnmorris danieljohnmorris merged commit f61503a into main May 22, 2026
5 checks passed
@danieljohnmorris danieljohnmorris deleted the chore/bench-hw-standardise branch May 22, 2026 06:44