
feat(gym): add multi-file-refactor, plan-decomposition, failure-recovery scenarios (#1210)#1221

Merged
rysweet merged 1 commit into main from
engineer/add-more-gym-benchmark-scenarios-1777052268-c9e483
Apr 24, 2026

rysweet (Owner) commented Apr 24, 2026

Advances issue #1210 by adding three new gym benchmark scenarios in src/gym/scenarios.rs:

  • multi-file-rename-public-api — Refactoring class
  • plan-decomposition-vague-goal — SessionQuality class
  • failure-recovery-dirty-worktree — Debugging class
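As a rough illustration of what such entries might look like, here is a minimal sketch. Only the scenario ids, the three class names, and the `BENCHMARK_SCENARIOS` name come from this PR. The `ScenarioClass` and `BenchmarkScenario` field layout and the toy length of 3 are assumptions. It also assumes the real array is declared with an explicit length, as the "array size" wording below suggests. Under that assumption the length is part of the array's type, so adding entries without bumping the declared size fails to compile, and the count test has to move with it.

```rust
/// Hypothetical subset of scenario classes; the real enum in
/// src/gym/scenarios.rs very likely has more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScenarioClass {
    Refactoring,
    SessionQuality,
    Debugging,
}

/// Hypothetical shape of a scenario entry (id + class only).
#[derive(Debug)]
pub struct BenchmarkScenario {
    pub id: &'static str,
    pub class: ScenarioClass,
}

/// Toy stand-in for BENCHMARK_SCENARIOS. The `3` here plays the role of
/// the 149 -> 152 bump: it is part of the type, so a fourth entry
/// without changing it is a compile-time error.
pub static BENCHMARK_SCENARIOS: [BenchmarkScenario; 3] = [
    BenchmarkScenario { id: "multi-file-rename-public-api", class: ScenarioClass::Refactoring },
    BenchmarkScenario { id: "plan-decomposition-vague-goal", class: ScenarioClass::SessionQuality },
    BenchmarkScenario { id: "failure-recovery-dirty-worktree", class: ScenarioClass::Debugging },
];

#[cfg(test)]
mod tests {
    use super::*;

    /// Analogue of the count assertion kept in src/gym/tests_scenarios.rs.
    #[test]
    fn scenario_count_matches() {
        assert_eq!(BENCHMARK_SCENARIOS.len(), 3);
    }
}
```

The explicit length is redundant with the compiler's own check, but the separate count test still earns its place: it fails loudly in `cargo test` output if the array is ever changed to a slice, where the length would no longer be enforced by the type.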

The BENCHMARK_SCENARIOS array size bumps from 149 to 152, and the count assertion in src/gym/tests_scenarios.rs is updated accordingly. Existing class_specific_checks handlers for Refactoring, SessionQuality, and Debugging already cover the evidence ids these scenarios produce, so no new check ids were required.
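The "no new check ids were required" claim is a set-inclusion property: every evidence id a new scenario produces must already be covered by its class's handler. A sketch of that property follows. `class_specific_checks` is named in this PR, but the signature used here (class in, covered evidence ids out) is an assumption, and every evidence id string below is a hypothetical placeholder, not one of the real ids.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScenarioClass {
    Refactoring,
    SessionQuality,
    Debugging,
}

/// Hypothetical: the evidence ids each existing per-class handler
/// already knows how to check. Real ids and signature will differ.
pub fn class_specific_checks(class: ScenarioClass) -> &'static [&'static str] {
    match class {
        ScenarioClass::Refactoring => &["all_call_sites_updated", "build_passes"],
        ScenarioClass::SessionQuality => &["plan_steps_listed", "clarifying_question_asked"],
        ScenarioClass::Debugging => &["root_cause_identified", "worktree_clean"],
    }
}

/// Returns the evidence ids a scenario produces that its class handler
/// does NOT cover. An empty result means no new check ids are needed.
pub fn uncovered_evidence(class: ScenarioClass, produced: &[&'static str]) -> Vec<&'static str> {
    let covered: HashSet<&str> = class_specific_checks(class).iter().copied().collect();
    produced
        .iter()
        .copied()
        .filter(|id| !covered.contains(id))
        .collect()
}
```

A test that walks every scenario entry and asserts `uncovered_evidence` is empty would turn this review-time claim into a mechanical check.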

Verified locally with cargo check --lib (clean) and cargo test --lib -- gym (418 passed, 0 failed).

Refs #1210.

…ery scenarios (#1210)

Adds three new BenchmarkScenario entries to BENCHMARK_SCENARIOS, bumping
the array size from 149 to 152, and updates the scenario count assertion
in src/gym/tests_scenarios.rs:

- multi-file-rename-public-api (Refactoring)
- plan-decomposition-vague-goal (SessionQuality)
- failure-recovery-dirty-worktree (Debugging)

Existing class_specific_checks handlers for Refactoring, SessionQuality,
and Debugging already cover the evidence ids these scenarios produce, so
no new check ids were needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
rysweet merged commit 04d4f5b into main Apr 24, 2026
7 checks passed

Labels: None yet

Projects: None yet


1 participant