feat(gym): add multi-file-refactor, plan-decomposition, failure-recovery scenarios (#1210) by rysweet · Pull Request #1221 · rysweet/Simard

rysweet · 2026-04-24T17:43:17Z

Advances issue #1210 by adding three new gym benchmark scenarios in src/gym/scenarios.rs:

multi-file-rename-public-api — Refactoring class
plan-decomposition-vague-goal — SessionQuality class
failure-recovery-dirty-worktree — Debugging class

The BENCHMARK_SCENARIOS array size bumps from 149 to 152, and the count assertion in src/gym/tests_scenarios.rs is updated accordingly. Existing class_specific_checks handlers for Refactoring, SessionQuality, and Debugging already cover the evidence ids these scenarios produce, so no new check ids were required.
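For readers unfamiliar with the pattern, here is a minimal sketch of why adding entries means bumping a size and a count assertion together. It assumes the scenarios live in a fixed-length array whose length is part of its type. The `ScenarioClass` enum, the `id` and `class` fields, the helper functions, and the reduced length of 3 are all hypothetical illustrations, not the actual definitions in src/gym/scenarios.rs.

```rust
// Hypothetical sketch: type names, fields, and the array length of 3 are
// assumptions for illustration, not the real contents of src/gym/scenarios.rs.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ScenarioClass {
    Refactoring,
    SessionQuality,
    Debugging,
}

#[derive(Debug, Clone, Copy)]
pub struct BenchmarkScenario {
    pub id: &'static str,
    pub class: ScenarioClass,
}

// The length is part of the array type, so adding an entry without
// updating the declared length fails to compile.
pub static BENCHMARK_SCENARIOS: [BenchmarkScenario; 3] = [
    BenchmarkScenario {
        id: "multi-file-rename-public-api",
        class: ScenarioClass::Refactoring,
    },
    BenchmarkScenario {
        id: "plan-decomposition-vague-goal",
        class: ScenarioClass::SessionQuality,
    },
    BenchmarkScenario {
        id: "failure-recovery-dirty-worktree",
        class: ScenarioClass::Debugging,
    },
];

// Number of registered scenarios; a count assertion in a test module
// would compare this against the expected total.
pub fn scenario_count() -> usize {
    BENCHMARK_SCENARIOS.len()
}

// Look up a scenario by its id, returning None when no entry matches.
pub fn find_scenario(id: &str) -> Option<&'static BenchmarkScenario> {
    BENCHMARK_SCENARIOS.iter().find(|s| s.id == id)
}
```

Under these assumptions, a test such as `assert_eq!(scenario_count(), 3)` plays the role of the count assertion in src/gym/tests_scenarios.rs: it has to be updated whenever the array grows, which is the second half of the change described above.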

Verified locally with cargo check --lib (clean) and cargo test --lib -- gym (418 passed, 0 failed).

Refs #1210.

…ery scenarios (#1210)

Adds three new BenchmarkScenario entries to BENCHMARK_SCENARIOS, bumping the array size from 149 to 152, and updates the scenario count assertion in src/gym/tests_scenarios.rs:

- multi-file-rename-public-api (Refactoring)
- plan-decomposition-vague-goal (SessionQuality)
- failure-recovery-dirty-worktree (Debugging)

Existing class_specific_checks handlers for Refactoring, SessionQuality, and Debugging already cover the evidence ids these scenarios produce, so no new check ids were needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

rysweet merged commit 04d4f5b into main Apr 24, 2026
7 checks passed

rysweet mentioned this pull request Apr 24, 2026

feat(gym): add multi-file-refactor and flaky-test-triage scenarios #1224

Closed

rysweet merged 1 commit into main from engineer/add-more-gym-benchmark-scenarios-1777052268-c9e483
