feat: add benchmark dataset with 13 reference repos by hummbl-dev · Pull Request #33 · hummbl-dev/arbiter

hummbl-dev · 2026-04-18T17:53:29Z

Summary

Adds benchmarks/reference_repos.json with 13 repos spanning grades A-D
New benchmark.py module with BenchmarkEntry dataclass and load_benchmark()
7 new tests validating data integrity, field presence, sort order, and range reasonableness
346 tests passing
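The PR does not show the JSON schema or the test bodies. The following is a minimal sketch, under stated assumptions, of what a `BenchmarkEntry` dataclass, a `load_benchmark()` loader, and the four kinds of integrity checks could look like. The field names (`name`, `loc`, `grade`, `score_min`, `score_max`), the helpers `parse_benchmark` and `find_problems`, the 0-100 score scale, and the sort key are all hypothetical.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any

# Hypothetical schema: the real reference_repos.json may use other field names.
REQUIRED_FIELDS = ("name", "loc", "grade", "score_min", "score_max")
VALID_GRADES = {"A", "B", "C", "D"}


@dataclass(frozen=True)
class BenchmarkEntry:
    name: str          # repository identifier, e.g. "owner/repo"
    loc: int           # lines of code
    grade: str         # expected letter grade
    score_min: float   # lower bound of the expected score range
    score_max: float   # upper bound of the expected score range


def parse_benchmark(records: list[dict[str, Any]]) -> list[BenchmarkEntry]:
    """Turn raw JSON objects into entries, failing loudly on missing fields."""
    entries = []
    for i, rec in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in rec]
        if missing:
            raise ValueError(f"entry {i} is missing fields: {missing}")
        entries.append(
            BenchmarkEntry(
                name=str(rec["name"]),
                loc=int(rec["loc"]),
                grade=str(rec["grade"]),
                score_min=float(rec["score_min"]),
                score_max=float(rec["score_max"]),
            )
        )
    return entries


def load_benchmark(path: Path) -> list[BenchmarkEntry]:
    """Read a JSON array of benchmark records from disk."""
    with path.open(encoding="utf-8") as fh:
        return parse_benchmark(json.load(fh))


def find_problems(entries: list[BenchmarkEntry]) -> list[str]:
    """Integrity checks similar in spirit to the PR's tests (assumed, not copied)."""
    problems = []
    names = [e.name for e in entries]
    if len(names) != len(set(names)):
        problems.append("duplicate repo names")
    # Assumed sort key: highest expected score first.
    if entries != sorted(entries, key=lambda e: e.score_max, reverse=True):
        problems.append("entries are not sorted by score_max, descending")
    for e in entries:
        if e.grade not in VALID_GRADES:
            problems.append(f"{e.name}: unknown grade {e.grade!r}")
        # Assumes scores live on a 0-100 scale.
        if not 0 <= e.score_min <= e.score_max <= 100:
            problems.append(f"{e.name}: bad range [{e.score_min}, {e.score_max}]")
        if e.loc <= 0:
            problems.append(f"{e.name}: non-positive loc")
    return problems
```

Returning a list of problems rather than raising on the first one lets a single test report every bad entry at once.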

Purpose

Regression testing for Arbiter's scoring engine. If analyzer changes or weight adjustments cause a benchmark repo to fall outside its expected range, tests catch it.
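A minimal sketch of the drift check described above, assuming the fresh scores have already been computed by running Arbiter's scorer over each benchmark repo (the scorer call is not shown and its API is not assumed here). `ExpectedRange` and `find_drift` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class ExpectedRange:
    # Hypothetical minimal record: one benchmark repo and its allowed score band.
    name: str
    score_min: float
    score_max: float


def find_drift(
    expected: Sequence[ExpectedRange], actual: Mapping[str, Optional[float]]
) -> list[str]:
    """Describe every benchmark repo whose fresh score left its expected band."""
    drifted = []
    for rng in expected:
        score = actual.get(rng.name)
        if score is None:
            drifted.append(f"{rng.name}: no score produced")
        elif not rng.score_min <= score <= rng.score_max:
            drifted.append(
                f"{rng.name}: score {score} outside [{rng.score_min}, {rng.score_max}]"
            )
    return drifted
```

In a test this would be used as `assert find_drift(expected, scores) == []`, so a calibration change that moves several repos out of range fails once and lists all of them, instead of stopping at the first.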

Test plan

🤖 Generated with Claude Code

Curated set of popular Python repos with expected score ranges for regression testing. Detects calibration drift when analyzers or scoring weights change. Includes repos from 1K to 923K LOC across grades A through D. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5d7e8fda15


chatgpt-codex-connector · 2026-04-18T17:56:05Z

+from dataclasses import dataclass
+from pathlib import Path
+
+_BENCHMARK_PATH = Path(__file__).parent.parent.parent / "benchmarks" / "reference_repos.json"


Avoid repo-root path for benchmark data

_BENCHMARK_PATH assumes a source-checkout layout (src/arbiter/... -> ../../../benchmarks). In an installed package the same expression resolves to something like .../lib/python3.x/benchmarks/reference_repos.json, which does not exist. Because BENCHMARK_REPOS = load_benchmark() runs at import time, importing arbiter.benchmark will raise FileNotFoundError in packaged installs, making the new benchmark API unusable outside a checkout of this repo.
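One possible fix, sketched under assumptions: ship the JSON inside the package and locate it with `importlib.resources.files()`, and load it lazily so that importing the module never touches the filesystem. The in-package location `arbiter/data/reference_repos.json`, the optional `path` override, and the `benchmark_repos()` accessor (replacing the module-level `BENCHMARK_REPOS` constant) are hypothetical choices, not what the PR does.

```python
import json
from functools import lru_cache
from importlib.resources import files
from pathlib import Path
from typing import Any, Optional

# Hypothetical layout: assumes reference_repos.json is moved inside the package
# (e.g. src/arbiter/data/reference_repos.json) and shipped as package data.
_PACKAGE = "arbiter"
_RESOURCE_PARTS = ("data", "reference_repos.json")


def _packaged_resource():
    """Locate the JSON inside the package, wherever the package is importable from."""
    resource = files(_PACKAGE)
    for part in _RESOURCE_PARTS:
        resource = resource / part
    return resource


def load_benchmark(path: Optional[Path] = None) -> list[dict[str, Any]]:
    """Read benchmark records from an explicit path, or from package data by default."""
    source = path if path is not None else _packaged_resource()
    return json.loads(source.read_text(encoding="utf-8"))


@lru_cache(maxsize=1)
def benchmark_repos() -> tuple[dict[str, Any], ...]:
    """Load the packaged dataset on first call instead of at import time."""
    return tuple(load_benchmark())
```

With this shape, `import arbiter.benchmark` succeeds even if the data file is missing; any error surfaces only when `benchmark_repos()` is called. The build backend must also be configured to include the JSON in the wheel (for example via setuptools' `package-data` setting), which this sketch does not cover.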


Claude (agent) and others added 2 commits April 18, 2026 13:53

fix: remove unused json import in benchmark tests

d95228e

hummbl-dev merged commit cf2288f into main Apr 18, 2026
3 checks passed

hummbl-dev deleted the feat/claude/benchmark branch April 18, 2026 17:55