@ultmaster
Contributor

No description provided.

Copilot AI review requested due to automatic review settings November 17, 2025 10:18
@ultmaster
Contributor Author

/ci

@github-actions

github-actions bot commented Nov 17, 2025

🚀 CI Watcher for correlation id-3540975835-mi2zskf6 triggered by comment 3540975835
🏃‍♀️ Tracking 6 workflow run(s):

✅ All runs completed.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds tracking and validation for rollouts with rewards in the VERL (Volcano Engine Reinforcement Learning) integration. The changes introduce new metrics that count how many rollouts produced a reward or a trace, so that rollouts which silently fail become visible in training and validation logs.

Key Changes:

  • Added has_reward tracking to distinguish between rollouts with and without actual rewards
  • Introduced n_rollouts_w_reward metric alongside existing n_rollouts_w_trace metric for both training and validation
  • Enhanced validation script to check that rollout counts remain consistent throughout training
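
As a rough illustration of the metrics listed above, the sketch below counts rollouts with rewards and with traces from a list of per-rollout stat dicts. It is not the code in agentlightning/verl/daemon.py: the `summarize_rollouts` helper and the `has_trace` flag are hypothetical names introduced for this example, and only `has_reward`, `n_rollouts_w_reward`, and `n_rollouts_w_trace` come from the PR.

```python
from typing import Dict, List


def summarize_rollouts(stats: List[dict], prefix: str) -> Dict[str, int]:
    """Count rollouts that reported a reward and rollouts that produced a trace.

    Each stat dict is assumed to carry two booleans:
      - "has_reward": the rollout reported a real reward (not a filled-in default)
      - "has_trace":  hypothetical flag, True if the rollout produced any triplets
    """
    return {
        f"{prefix}/n_rollouts_w_reward": sum(1 for s in stats if s["has_reward"]),
        f"{prefix}/n_rollouts_w_trace": sum(1 for s in stats if s["has_trace"]),
    }


# Three validation rollouts: one fully successful, one traced but without a
# reward, and one that failed before producing anything.
val_stats = [
    {"reward": 1.0, "has_reward": True, "has_trace": True},
    {"reward": 0.0, "has_reward": False, "has_trace": True},
    {"reward": 0.0, "has_reward": False, "has_trace": False},
]
print(summarize_rollouts(val_stats, "val"))
```

Keeping the two counts separate makes it possible to tell a rollout that ran but never reported a reward apart from one that produced no trace at all.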

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| scripts/validate_example_wandb.py | Added validation checks to ensure rollout counts for rewards and traces remain consistent across training runs |
| agentlightning/verl/daemon.py | Added has_reward field to rollout statistics and new n_rollouts_w_reward metrics for both training and validation paths |

Comment on lines +37 to +47
    if first_row["val/n_rollouts_w_reward"] != last_row["val/n_rollouts_w_reward"]:
        print(
            f"::error::Some rollouts have failed to produce rewards: {first_row['val/n_rollouts_w_reward']} -> {last_row['val/n_rollouts_w_reward']}"
        )
        sys.exit(1)

    if first_row["val/n_rollouts_w_trace"] != last_row["val/n_rollouts_w_trace"]:
        print(
            f"::error::Some rollouts have failed to produce traces: {first_row['val/n_rollouts_w_trace']} -> {last_row['val/n_rollouts_w_trace']}"
        )
        sys.exit(1)

Copilot AI Nov 17, 2025

The code accesses "val/n_rollouts_w_reward" and "val/n_rollouts_w_trace" from the history dataframe, but line 27 only fetches keys=["val/reward"]. This will cause a KeyError. The run.history() call on line 27 needs to be updated to include these keys:

hist = run.history(keys=["val/reward", "val/n_rollouts_w_reward", "val/n_rollouts_w_trace"], pandas=True)
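
A defensive variant of the check could report a missing metric explicitly instead of failing with a KeyError. The sketch below is only an illustration under assumptions: `check_counts_stable` is a hypothetical helper, and plain dicts stand in for the first and last rows of the history dataframe so the example runs without W&B or pandas.

```python
from typing import List, Mapping, Sequence


def check_counts_stable(
    first_row: Mapping[str, float],
    last_row: Mapping[str, float],
    keys: Sequence[str],
) -> List[str]:
    """Return one error message per metric that is missing or has changed."""
    errors = []
    for key in keys:
        # A metric that was never fetched would otherwise surface as a KeyError.
        if key not in first_row or key not in last_row:
            errors.append(f"::error::Metric {key} is missing from the fetched history")
            continue
        if first_row[key] != last_row[key]:
            errors.append(
                f"::error::{key} changed during training: {first_row[key]} -> {last_row[key]}"
            )
    return errors


# Stand-in rows: the reward count drops from 8 to 6, and the trace count
# was not fetched at all.
first = {"val/reward": 0.2, "val/n_rollouts_w_reward": 8}
last = {"val/reward": 0.7, "val/n_rollouts_w_reward": 6}
for message in check_counts_stable(
    first, last, ["val/n_rollouts_w_reward", "val/n_rollouts_w_trace"]
):
    print(message)
```

Collecting the messages before exiting lets a CI run show every failed check at once rather than stopping at the first one.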

    final_reward = self._fillna_reward(rollout)
    if not rollout.triplets:
        print(f"Warning: No triplets found for test rollout {rollout.rollout_id}.")
        sample_stat_list.append({"reward": final_reward})

Copilot AI Nov 17, 2025

When a rollout has no triplets, the has_reward key is not added to the dictionary appended to sample_stat_list. This will cause issues when calculating val/n_rollouts_w_reward metric on lines 625 and 604-606, as it tries to access stat["has_reward"] for all stats. Add the missing key:

sample_stat_list.append({"reward": final_reward, "has_reward": final_reward_raw is not None})
Suggested change
- sample_stat_list.append({"reward": final_reward})
+ sample_stat_list.append({"reward": final_reward, "has_reward": final_reward_raw is not None})
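
To make the failure mode concrete, here is a minimal sketch of why every stat dict needs the `has_reward` key. The helpers `make_stat` and `count_with_reward` are hypothetical names for this example, and the default of 0.0 for a missing reward is an assumption rather than the behavior of `_fillna_reward`.

```python
from typing import List, Optional


def make_stat(final_reward_raw: Optional[float], default_reward: float = 0.0) -> dict:
    """Build a per-rollout stat dict that always carries `has_reward`."""
    # Assumed fill-in step: fall back to a default when no reward was reported.
    final_reward = default_reward if final_reward_raw is None else final_reward_raw
    return {"reward": final_reward, "has_reward": final_reward_raw is not None}


def count_with_reward(stats: List[dict]) -> int:
    """Count rollouts with a real reward; raises KeyError if a stat lacks the key."""
    return sum(1 for stat in stats if stat["has_reward"])


# With the key always present, a rollout without a reward is simply not counted.
stats = [make_stat(1.0), make_stat(None)]
print(count_with_reward(stats))

# A stat built the old way, without `has_reward`, breaks the aggregation.
try:
    count_with_reward([{"reward": 0.0}])
except KeyError as exc:
    print(f"KeyError: {exc}")
```

Building the dict in one place for both the normal path and the no-triplets path is one way to keep the two shapes from drifting apart.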

@ultmaster
Contributor Author

/ci

@github-actions

github-actions bot commented Nov 17, 2025

🚀 CI Watcher for correlation id-3541258935-mi31vid8 triggered by comment 3541258935
🏃‍♀️ Tracking 6 workflow run(s):

✅ All runs completed.

@ultmaster ultmaster merged commit d433418 into main Nov 17, 2025
14 checks passed
totoluo pushed a commit to totoluo/agent-lightning that referenced this pull request Dec 6, 2025