-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Adding check traces with reward for VERL #317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/ci |
|
🚀 CI Watcher for correlation id-3540975835-mi2zskf6 triggered by comment 3540975835
✅ All runs completed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds tracking and validation for rollouts with rewards in the VERL (Verification for Reinforcement Learning) system. The changes introduce new metrics to monitor rollout success rates and ensure training stability.
Key Changes:
- Added
has_rewardtracking to distinguish between rollouts with and without actual rewards - Introduced
n_rollouts_w_rewardmetric alongside existingn_rollouts_w_tracemetric for both training and validation - Enhanced validation script to check that rollout counts remain consistent throughout training
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| scripts/validate_example_wandb.py | Added validation checks to ensure rollout counts for rewards and traces remain consistent across training runs |
| agentlightning/verl/daemon.py | Added has_reward field to rollout statistics and new n_rollouts_w_reward metrics for both training and validation paths |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if first_row["val/n_rollouts_w_reward"] != last_row["val/n_rollouts_w_reward"]: | ||
| print( | ||
| f"::error::Some rollouts have failed to produce rewards: {first_row['val/n_rollouts_w_reward']} -> {last_row['val/n_rollouts_w_reward']}" | ||
| ) | ||
| sys.exit(1) | ||
|
|
||
| if first_row["val/n_rollouts_w_trace"] != last_row["val/n_rollouts_w_trace"]: | ||
| print( | ||
| f"::error::Some rollouts have failed to produce traces: {first_row['val/n_rollouts_w_trace']} -> {last_row['val/n_rollouts_w_trace']}" | ||
| ) | ||
| sys.exit(1) |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code accesses "val/n_rollouts_w_reward" and "val/n_rollouts_w_trace" from the history dataframe, but line 27 only fetches keys=["val/reward"]. This will cause a KeyError. The run.history() call on line 27 needs to be updated to include these keys:
hist = run.history(keys=["val/reward", "val/n_rollouts_w_reward", "val/n_rollouts_w_trace"], pandas=True)| final_reward = self._fillna_reward(rollout) | ||
| if not rollout.triplets: | ||
| print(f"Warning: No triplets found for test rollout {rollout.rollout_id}.") | ||
| sample_stat_list.append({"reward": final_reward}) |
Copilot
AI
Nov 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a rollout has no triplets, the has_reward key is not added to the dictionary appended to sample_stat_list. This will cause issues when calculating val/n_rollouts_w_reward metric on lines 625 and 604-606, as it tries to access stat["has_reward"] for all stats. Add the missing key:
sample_stat_list.append({"reward": final_reward, "has_reward": final_reward_raw is not None})| sample_stat_list.append({"reward": final_reward}) | |
| sample_stat_list.append({"reward": final_reward, "has_reward": final_reward_raw is not None}) |
|
/ci |
|
🚀 CI Watcher for correlation id-3541258935-mi31vid8 triggered by comment 3541258935
✅ All runs completed. |
No description provided.