feat: partial reward support by dzorlu · Pull Request #9 · fleet-ai/OpenEnv

dzorlu · 2026-03-11T05:30:33Z

Summary

Adds a `partial_reward` flag to `FleetTaskEnv` (off by default)
When enabled, failed verifier runs compute a fractional score from the `ERROR_ACCUMULATOR` / `SUCCESS_ACCUMULATOR` output in verifier stdout instead of a binary 0/1
Passing tasks (score=1.0) are never modified
Encapsulated in a `_parse_partial_reward` static method on `FleetTaskEnv`
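
The gating rules in the summary can be sketched as a small pure function. This is only an illustration of the described behavior, not the code in this PR: `select_reward`, `binary_score`, and the `parse_partial` callback are hypothetical names, and the fallback to the binary score when parsing yields nothing is an assumption.

```python
from typing import Callable, Optional


def select_reward(
    binary_score: float,
    stdout: str,
    partial_reward: bool,
    parse_partial: Callable[[str], Optional[float]],
) -> float:
    """Pick the reward for one verifier run (hypothetical helper)."""
    # Flag off, or the task already passed: keep the binary result untouched.
    if not partial_reward or binary_score >= 1.0:
        return binary_score
    # Failed run with the flag on: try to derive a fractional score from stdout.
    partial = parse_partial(stdout)
    # Assumption: if no score can be parsed, fall back to the binary result.
    return binary_score if partial is None else partial
```

Keeping the parser as an injected callback makes the two invariants from the summary (off by default, passing tasks never modified) easy to check in isolation.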

Test plan

Verify existing binary reward behavior is unchanged when `partial_reward=False`
Enable the flag and verify that partial scores appear in logs for failing tasks

🤖 Generated with Claude Code

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

cursor · 2026-03-11T05:37:57Z

+            re.DOTALL,
+        )
+        if not err_match and not suc_match:
+            return None


Missing accumulator gives failed verifier a perfect score

High Severity

The guard `not err_match and not suc_match` returns `None` only when both accumulators are missing. When only `SUCCESS_ACCUMULATOR` is present (e.g., the verifier crashed before printing `ERROR_ACCUMULATOR`), `n_errors` defaults to 0, so the partial score computes to `n_success / n_success = 1.0`. This silently overrides a genuinely failed verifier with a perfect reward of 1.0, corrupting the training signal. The condition likely needs to be `not err_match or not suc_match`, so that both accumulators are required before a partial score is computed.
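
To make the failure mode concrete, here is a minimal, self-contained sketch of a parser with both guard variants. It is not the PR's `_parse_partial_reward`: the stdout format (`ERROR_ACCUMULATOR: [...]` and `SUCCESS_ACCUMULATOR: [...]` followed by JSON lists of strings), the regexes, and the `require_both` switch are all assumptions made for illustration. Only the score formula, successes divided by successes plus errors, follows from the review comment.

```python
import json
import re
from typing import Optional

# Assumed stdout format: each marker is followed by a flat JSON list of strings.
_ERR_RE = re.compile(r"ERROR_ACCUMULATOR:\s*(\[.*?\])", re.DOTALL)
_SUC_RE = re.compile(r"SUCCESS_ACCUMULATOR:\s*(\[.*?\])", re.DOTALL)


def parse_partial_reward(stdout: str, require_both: bool = True) -> Optional[float]:
    """Return successes / (successes + errors), or None if no score applies."""
    err_match = _ERR_RE.search(stdout)
    suc_match = _SUC_RE.search(stdout)
    if require_both:
        # Stricter guard suggested in the review: both markers must be present.
        if not err_match or not suc_match:
            return None
    elif not err_match and not suc_match:
        # Guard as flagged in the review: bail out only if both are missing.
        return None
    n_errors = len(json.loads(err_match.group(1))) if err_match else 0
    n_success = len(json.loads(suc_match.group(1))) if suc_match else 0
    total = n_errors + n_success
    if total == 0:
        return None  # nothing recorded, so no meaningful fraction
    return n_success / total


# Verifier crashed before printing ERROR_ACCUMULATOR.
crashed = 'SUCCESS_ACCUMULATOR: ["a", "b"]\n'
print(parse_partial_reward(crashed, require_both=False))  # 1.0, the reported bug
print(parse_partial_reward(crashed))                      # None with the strict guard
```

With the strict guard, a truncated run yields `None`, so a caller can keep the binary failure instead of promoting it to a perfect score.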

dzorlu changed the base branch from deniz/fleet-logfire to main March 11, 2026 05:31

feat: add partial reward support behind flag

cc5bf37

When `partial_reward=True`, failed verifier runs compute a fractional score from the error/success accumulators instead of binary 0/1. Passing tasks are unaffected. Off by default.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

dzorlu force-pushed the feat/partial-reward branch from fcf841b to cc5bf37 March 11, 2026 05:34

dzorlu changed the base branch from main to deniz/fleet_client March 11, 2026 05:34

dzorlu merged commit 7c5d64f into deniz/fleet_client Mar 11, 2026
1 check passed
