Add rolling benchmark history tracking#14

Merged
benvinegar merged 8 commits into main from explore/benchmark-history-feasibility
Apr 19, 2026

@benvinegar benvinegar commented Apr 18, 2026

Summary

  • add a benchmark:history pipeline that resolves each benchmark repo's default-branch commit as of a recorded timestamp, scans it, and upserts one JSONL datapoint per repo per UTC week
  • generate rolling history artifacts under benchmarks/history/known-ai-vs-solid-oss/ plus a markdown summary report at reports/known-ai-vs-solid-oss-history.md
  • add mini sparkline trends to the rolling history report and keep compact series data in latest.json for downstream consumers
  • show both the latest pinned score and the highest pinned score in the history tables so repos can be compared against their own prior peaks
  • mirror that trend view in the README benchmark table with latest/highest/delta columns and links to the pinned and rolling benchmark artifacts
  • support honest backfills with --recorded-at, skipping repos that did not exist yet instead of fabricating datapoints
  • add a weekly GitHub Actions workflow to refresh benchmark history automatically and commit updates back to the repo
  • seed the current benchmark set with a 4-week backfill of history datapoints
  • tighten the top-level README by removing redundant sections while keeping supported file extensions explicit
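
The per-week upsert, the honest-backfill skip, and the latest/highest/sparkline summary described above could look roughly like the sketch below. This is an illustration only, not the code in this PR: the `Datapoint` shape, the Monday-based UTC week key, the `createdAt` existence check, and every function name are assumptions made for the example.

```typescript
// Hypothetical datapoint shape; the real JSONL schema is not shown in this PR.
interface Datapoint {
  repo: string;
  week: string; // assumed key: Monday (UTC) of the week, YYYY-MM-DD
  recordedAt: string; // ISO 8601 timestamp passed via --recorded-at
  score: number;
}

// Monday (UTC) of the week containing `iso`, formatted as YYYY-MM-DD.
function utcWeekKey(iso: string): string {
  const d = new Date(iso);
  const daysSinceMonday = (d.getUTCDay() + 6) % 7; // getUTCDay(): 0 = Sunday
  const monday = new Date(
    Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate() - daysSinceMonday),
  );
  return monday.toISOString().slice(0, 10);
}

// Replace any existing datapoint for the same repo and week, then keep a stable order.
function upsert(history: Datapoint[], next: Datapoint): Datapoint[] {
  const kept = history.filter((p) => !(p.repo === next.repo && p.week === next.week));
  return [...kept, next].sort(
    (a, b) => a.repo.localeCompare(b.repo) || a.week.localeCompare(b.week),
  );
}

// Skip repos that did not exist yet at `recordedAt` instead of inventing a datapoint.
// Assumption: existence is judged by a known creation timestamp.
function recordDatapoint(
  history: Datapoint[],
  repo: { name: string; createdAt: string },
  recordedAt: string,
  score: number,
): Datapoint[] {
  if (Date.parse(recordedAt) < Date.parse(repo.createdAt)) return history;
  return upsert(history, { repo: repo.name, week: utcWeekKey(recordedAt), recordedAt, score });
}

// One JSON object per line, newline-terminated.
function toJsonl(points: Datapoint[]): string {
  return points.map((p) => JSON.stringify(p)).join("\n") + "\n";
}

// Map each value onto one of eight block characters, scaled to the series range.
function sparkline(values: number[]): string {
  if (values.length === 0) return "";
  const blocks = "▁▂▃▄▅▆▇█";
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values
    .map((v) => (max === min ? 0 : Math.round(((v - min) * 7) / (max - min))))
    .map((i) => blocks.charAt(i))
    .join("");
}

// Latest and highest score for one repo, plus a trend string for the report.
function summarize(history: Datapoint[], repo: string) {
  const scores = history
    .filter((p) => p.repo === repo)
    .sort((a, b) => a.week.localeCompare(b.week))
    .map((p) => p.score);
  if (scores.length === 0) return null;
  let latest = NaN;
  let highest = -Infinity;
  for (const s of scores) {
    latest = s;
    highest = Math.max(highest, s);
  }
  return { latest, highest, trend: sparkline(scores) };
}
```

Keying on the week rather than the exact timestamp means that re-running the pipeline within the same UTC week overwrites that week's datapoint instead of appending a duplicate.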

Validation

  • bun run format:check
  • bun run lint
  • bun test
  • bun run build
  • bun run benchmark:history --recorded-at 2026-03-23T12:00:00Z
  • bun run benchmark:history --recorded-at 2026-03-30T12:00:00Z
  • bun run benchmark:history --recorded-at 2026-04-06T12:00:00Z
  • bun run benchmark:history --recorded-at 2026-04-13T12:00:00Z
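
The four `--recorded-at` values above are one week apart, ending at 2026-04-13T12:00:00Z. A small helper along these lines could generate such a backfill series; it is a hypothetical sketch (the function name and its parameters are not part of this PR), and only the printed command matches what was actually run.

```typescript
// Hypothetical helper: weekly timestamps ending at `latest`, oldest first.
function weeklyBackfillTimestamps(latest: string, weeks: number): string[] {
  const WEEK_MS = 7 * 24 * 60 * 60 * 1000;
  const end = new Date(latest).getTime();
  const out: string[] = [];
  for (let i = weeks - 1; i >= 0; i--) {
    // toISOString() always emits milliseconds; trim ".000Z" to match the CLI examples.
    out.push(new Date(end - i * WEEK_MS).toISOString().replace(".000Z", "Z"));
  }
  return out;
}

// Print the commands for a 4-week backfill.
for (const ts of weeklyBackfillTimestamps("2026-04-13T12:00:00Z", 4)) {
  console.log(`bun run benchmark:history --recorded-at ${ts}`);
}
```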

This PR description was generated by Pi using GPT-5

@benvinegar benvinegar merged commit 5fdf2a2 into main Apr 19, 2026
3 checks passed