feat: partial reward support #9
When `partial_reward=True`, failed verifier runs compute a fractional score from the error/success accumulators instead of a binary 0/1. Passing tasks are unaffected. Off by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
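The behaviour described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: `final_reward`, `passed`, and `partial_score` are hypothetical names, and falling back to 0.0 when no partial score is available is an assumption.

```python
from typing import Optional


def final_reward(
    passed: bool,
    partial_score: Optional[float],
    partial_reward: bool = False,  # off by default, as stated in the PR
) -> float:
    """Hypothetical helper: choose the reward for one verifier run."""
    if passed:
        # Passing tasks are unaffected by the flag.
        return 1.0
    if partial_reward and partial_score is not None:
        # Failed run with the flag on: use the fractional score.
        return partial_score
    # Flag off, or no usable partial score (assumed fallback): binary failure.
    return 0.0
```

With the flag left at its default, a failed run still yields 0.0 regardless of any partial score.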
fcf841b to cc5bf37
Cursor Bugbot has reviewed your changes and found 1 potential issue.
        re.DOTALL,
    )
    if not err_match and not suc_match:
        return None
Missing accumulator gives failed verifier a perfect score
High Severity
The guard `not err_match and not suc_match` returns `None` only when both accumulators are missing. When only `SUCCESS_ACCUMULATOR` is present (e.g., the verifier crashed before printing `ERROR_ACCUMULATOR`), `n_errors` defaults to 0, so the partial score computes to `n_success / n_success = 1.0`. This silently overrides a genuinely failed verifier with a perfect reward of 1.0, corrupting the training signal. The condition likely needs to be `not err_match or not suc_match`, so that both accumulators are required before a partial score is computed.
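A minimal sketch of the suggested guard, under stated assumptions: the stdout format (each accumulator printed on one line as a JSON list after its name) and the `n_success / (n_success + n_errors)` formula are assumed for illustration, and `parse_partial_reward` here is a stand-in, not the PR's actual `_parse_partial_reward`.

```python
import json
import re
from typing import Optional

# Assumed format (not confirmed by the PR): one line per accumulator,
# e.g.  ERROR_ACCUMULATOR: ["msg1", "msg2"]
_ERR_RE = re.compile(r"^ERROR_ACCUMULATOR:\s*(\[.*\])\s*$", re.MULTILINE)
_SUC_RE = re.compile(r"^SUCCESS_ACCUMULATOR:\s*(\[.*\])\s*$", re.MULTILINE)


def parse_partial_reward(stdout: str) -> Optional[float]:
    """Return a fractional score in [0, 1], or None if it cannot be computed."""
    err_match = _ERR_RE.search(stdout)
    suc_match = _SUC_RE.search(stdout)
    # Require BOTH accumulators: a missing one means the verifier output is
    # incomplete, so no partial score should be derived from it.
    if not err_match or not suc_match:
        return None
    try:
        n_errors = len(json.loads(err_match.group(1)))
        n_success = len(json.loads(suc_match.group(1)))
    except json.JSONDecodeError:
        return None
    total = n_errors + n_success
    if total == 0:
        # Nothing was checked; avoid dividing by zero.
        return None
    return n_success / total
```

With this guard, output that contains only `SUCCESS_ACCUMULATOR` yields `None` rather than 1.0, leaving the caller free to fall back to the binary failure reward.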


Summary
- Adds a `partial_reward` flag to `FleetTaskEnv` (off by default)
- Failed verifier runs compute a fractional score from `ERROR_ACCUMULATOR`/`SUCCESS_ACCUMULATOR` stdout instead of binary 0/1
- Adds a `_parse_partial_reward` static method on `FleetTaskEnv`

Test plan
- `partial_reward=False`

🤖 Generated with Claude Code