feat: partial reward support #9

Merged
dzorlu merged 1 commit into deniz/fleet_client from feat/partial-reward
Mar 11, 2026

@dzorlu (Collaborator) commented Mar 11, 2026

Summary

  • Adds partial_reward flag to FleetTaskEnv (off by default)
  • When enabled, failed verifier runs compute a fractional score from ERROR_ACCUMULATOR / SUCCESS_ACCUMULATOR stdout instead of binary 0/1
  • Passing tasks (score=1.0) are never modified
  • Encapsulated in a _parse_partial_reward static method on FleetTaskEnv
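
The gating described in the summary can be sketched roughly as follows. This is only an illustration and not the code from this PR: `resolve_reward` and the `parse_partial` callable are hypothetical names, and the real `FleetTaskEnv` signatures are not shown in this conversation.

```python
from typing import Callable, Optional


def resolve_reward(
    binary_score: float,
    stdout: str,
    partial_reward: bool = False,
    parse_partial: Optional[Callable[[str], Optional[float]]] = None,
) -> float:
    """Hypothetical helper: pick the final reward for one verifier run."""
    # Flag off (the default): keep the binary 0/1 behaviour untouched.
    if not partial_reward or parse_partial is None:
        return binary_score
    # Passing tasks (score=1.0) are never modified.
    if binary_score >= 1.0:
        return binary_score
    # Failed run: try to derive a fractional score from verifier stdout.
    partial = parse_partial(stdout)
    # If nothing could be parsed, fall back to the binary score.
    return binary_score if partial is None else partial
```

With the flag off, or when the parser returns `None`, the caller sees exactly the score it would have seen before this change.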

Test plan

  • Verify existing binary reward behavior unchanged when partial_reward=False
  • Enable flag and verify partial scores appear in logs for failing tasks

🤖 Generated with Claude Code

@dzorlu changed the base branch from deniz/fleet-logfire to main March 11, 2026 05:31
When `partial_reward=True`, failed verifier runs compute a fractional
score from the error/success accumulators instead of binary 0/1.
Passing tasks are unaffected. Off by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dzorlu force-pushed the feat/partial-reward branch from fcf841b to cc5bf37 March 11, 2026 05:34
@dzorlu changed the base branch from main to deniz/fleet_client March 11, 2026 05:34
@dzorlu merged commit 7c5d64f into deniz/fleet_client Mar 11, 2026
1 check passed

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

    re.DOTALL,
)
if not err_match and not suc_match:
    return None

Missing accumulator gives failed verifier a perfect score

High Severity

The guard `not err_match and not suc_match` only returns `None` when both accumulators are missing. When only `SUCCESS_ACCUMULATOR` is present (e.g., the verifier crashed before printing `ERROR_ACCUMULATOR`), `n_errors` defaults to 0, so the partial score computes to `n_success / n_success = 1.0`. This silently overrides a genuinely failed verifier with a perfect reward of 1.0, corrupting the training signal. The condition likely needs to be `not err_match or not suc_match` to require both accumulators before computing a partial score.
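
A minimal sketch of the stricter guard suggested above. It is not the implementation from this PR: the stdout format (`NAME: [json list]`), the two regexes, and the standalone function name `parse_partial_reward` are assumptions made for illustration, since the diff only shows the tail of the regex call and the guard.

```python
import json
import re
from typing import Optional

# Assumed stdout format (not confirmed by the PR): each accumulator is
# printed as "NAME: [json list]", possibly spanning several lines.
_ERR_RE = re.compile(r"ERROR_ACCUMULATOR:\s*(\[.*?\])", re.DOTALL)
_SUC_RE = re.compile(r"SUCCESS_ACCUMULATOR:\s*(\[.*?\])", re.DOTALL)


def parse_partial_reward(stdout: str) -> Optional[float]:
    """Return a fractional score, or None if it cannot be computed safely."""
    err_match = _ERR_RE.search(stdout)
    suc_match = _SUC_RE.search(stdout)
    # Require BOTH accumulators; a lone SUCCESS_ACCUMULATOR would otherwise
    # yield n_success / n_success = 1.0 for a failed run.
    if not err_match or not suc_match:
        return None
    try:
        n_errors = len(json.loads(err_match.group(1)))
        n_success = len(json.loads(suc_match.group(1)))
    except json.JSONDecodeError:
        return None
    total = n_errors + n_success
    if total == 0:
        return None  # nothing was checked, so there is no meaningful fraction
    return n_success / total
```

For example, under the assumed format, three successes and one error give 0.75, while output containing only `SUCCESS_ACCUMULATOR` gives `None`, so the caller keeps the binary score of 0.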
