feat: add benchmark dataset with 13 reference repos #33

Merged: hummbl-dev merged 2 commits into main from feat/claude/benchmark on Apr 18, 2026

Conversation

@hummbl-dev
Owner

Summary

  • Adds benchmarks/reference_repos.json with 13 repos spanning grades A-D
  • New benchmark.py module with BenchmarkEntry dataclass and load_benchmark()
  • 7 new tests validating data integrity, field presence, sort order, range reasonableness
  • 346 tests passing
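As a rough illustration of the module shape described above, a minimal sketch follows. The field names (`repo`, `grade`, `expected_min`, `expected_max`), the `path` parameter, and the `parse_benchmark` helper are assumptions made for this example; the PR does not show the actual schema of reference_repos.json or the real signature of load_benchmark().

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BenchmarkEntry:
    # Hypothetical fields: the real dataclass may use different names.
    repo: str
    grade: str
    expected_min: float
    expected_max: float


def parse_benchmark(text: str) -> list[BenchmarkEntry]:
    """Turn a JSON array of objects into BenchmarkEntry records."""
    return [BenchmarkEntry(**item) for item in json.loads(text)]


def load_benchmark(path: Path) -> list[BenchmarkEntry]:
    """Read the benchmark file at `path` (an assumed parameter) and parse it."""
    return parse_benchmark(Path(path).read_text(encoding="utf-8"))
```

Splitting parsing from file access keeps the parsing logic testable without touching the filesystem.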

Purpose

Regression testing for Arbiter's scoring engine. If analyzer changes or weight adjustments cause a benchmark repo to fall outside its expected range, tests catch it.
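The drift check described here could be sketched roughly as below. `Expected`, `find_drift`, and the `score_repo` callable are hypothetical names for illustration; how Arbiter actually scores a repo is not shown in this PR.

```python
from typing import Callable, Iterable, NamedTuple


class Expected(NamedTuple):
    repo: str    # hypothetical identifier for a benchmark repo
    low: float   # lower bound of the expected score range
    high: float  # upper bound of the expected score range


def find_drift(
    expected: Iterable[Expected],
    score_repo: Callable[[str], float],
) -> list[tuple[str, float]]:
    """Return (repo, score) pairs whose score falls outside the expected range."""
    drifted = []
    for entry in expected:
        score = score_repo(entry.repo)
        if not (entry.low <= score <= entry.high):
            drifted.append((entry.repo, score))
    return drifted
```

A regression test would then assert that `find_drift(...)` returns an empty list after any analyzer or weight change.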

Test plan

  • File exists
  • Load returns entries with required fields
  • Constant matches file
  • Sorted by score descending
  • Known repos present
  • Score ranges ≤15 points wide
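Two of the checks in the plan above (sort order and range width) might look something like this. The dictionary keys `score`, `expected_min`, and `expected_max` are assumptions; only the 15-point limit comes from the test plan.

```python
MAX_RANGE_WIDTH = 15  # from the test plan: ranges at most 15 points wide


def is_sorted_descending(entries: list[dict]) -> bool:
    """True if entries are ordered from highest to lowest score."""
    scores = [e["score"] for e in entries]  # "score" is an assumed key
    return all(a >= b for a, b in zip(scores, scores[1:]))


def ranges_within_width(entries: list[dict], max_width: float = MAX_RANGE_WIDTH) -> bool:
    """True if every expected range is non-negative and at most max_width wide."""
    return all(
        0 <= e["expected_max"] - e["expected_min"] <= max_width
        for e in entries
    )
```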

🤖 Generated with Claude Code

Claude (agent) and others added 2 commits April 18, 2026 13:53
Curated set of popular Python repos with expected score ranges for
regression testing. Detects calibration drift when analyzers or
scoring weights change. Includes repos from 1K to 923K LOC across
grades A through D.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@hummbl-dev hummbl-dev merged commit cf2288f into main Apr 18, 2026
3 checks passed
@hummbl-dev hummbl-dev deleted the feat/claude/benchmark branch April 18, 2026 17:55

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d7e8fda15

Comment thread src/arbiter/benchmark.py
from dataclasses import dataclass
from pathlib import Path

_BENCHMARK_PATH = Path(__file__).parent.parent.parent / "benchmarks" / "reference_repos.json"

P1: Avoid repo-root path for benchmark data

_BENCHMARK_PATH assumes a source checkout layout (src/arbiter/... -> ../../../benchmarks), but in an installed package this resolves to something like .../lib/python3.x/benchmarks/reference_repos.json, which does not exist. Because BENCHMARK_REPOS = load_benchmark() runs at import time, importing arbiter.benchmark will raise FileNotFoundError for packaged installs, making the new benchmark API unusable outside this repo checkout.
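One possible way to address this, sketched under assumptions: ship the JSON inside the package and read it through `importlib.resources`, and defer loading until first use so that importing the module cannot fail. The `data/reference_repos.json` location, the `read_package_text` helper, and the `get_benchmark_repos` name are hypothetical, and the file would also need to be declared as package data in the build configuration.

```python
import json
from functools import lru_cache
from importlib.resources import files


def read_package_text(package: str, *parts: str) -> str:
    """Read a text resource shipped inside an importable package."""
    node = files(package)
    for part in parts:
        node = node.joinpath(part)
    return node.read_text(encoding="utf-8")


@lru_cache(maxsize=1)
def get_benchmark_repos() -> list:
    """Load the benchmark lazily, on first call rather than at import time.

    Assumes the file lives at arbiter/data/reference_repos.json
    (a hypothetical location, not the one used in this PR).
    """
    return json.loads(read_package_text("arbiter", "data", "reference_repos.json"))
```

With this shape, a missing data file surfaces only when `get_benchmark_repos()` is called, and the lookup works the same way from a source checkout and from an installed wheel.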
