Adding check traces with reward for VERL #317

ultmaster · 2025-11-17T10:18:31Z

No description provided.

ultmaster · 2025-11-17T10:18:55Z

/ci

github-actions · 2025-11-17T10:19:04Z

🚀 CI Watcher for correlation id-3540975835-mi2zskf6 triggered by comment 3540975835
🏃‍♀️ Tracking 6 workflow run(s):

🔴 Spider - PR #317 - ci-all - id-3540975835-mi2zskf6 — completed/failure
🟢 APO - PR #317 - ci-all - id-3540975835-mi2zskf6 — completed/success
🔴 Backward Compatibility - PR #317 - ci-all - id-3540975835-mi2zskf6 — completed/failure
🔴 Calc-X - PR #317 - ci-all - id-3540975835-mi2zskf6 — completed/failure
🟢 GPU Test - PR #317 - ci-all - id-3540975835-mi2zskf6 — completed/success
🟢 Unsloth - PR #317 - ci-all - id-3540975835-mi2zskf6 — completed/success

✅ All runs completed.

Copilot

Pull Request Overview

This PR adds tracking and validation for rollouts with rewards in the VERL (Volcano Engine Reinforcement Learning) integration. The changes introduce new metrics that count how many rollouts produced a reward and how many produced a trace, so that rollouts failing silently become visible during training and validation.

Key Changes:

Added has_reward tracking to distinguish between rollouts with and without actual rewards
Introduced n_rollouts_w_reward metric alongside existing n_rollouts_w_trace metric for both training and validation
Enhanced validation script to check that rollout counts remain consistent throughout training
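
The difference between the two metrics listed above can be illustrated with a minimal sketch. The `Rollout` dataclass, its fields, and `count_rollouts` are hypothetical stand-ins, not the actual code in agentlightning/verl/daemon.py. The sketch assumes that a rollout "has a trace" when it carries at least one triplet and "has a reward" when its raw final reward is not None.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Rollout:
    """Hypothetical stand-in for a finished rollout (not the real class)."""
    rollout_id: str
    final_reward: Optional[float] = None  # None means no reward was reported
    triplets: List[dict] = field(default_factory=list)  # empty means no trace


def count_rollouts(rollouts: List[Rollout]) -> dict:
    """Count rollouts that produced a trace and rollouts that produced a reward."""
    return {
        "n_rollouts": len(rollouts),
        "n_rollouts_w_trace": sum(1 for r in rollouts if r.triplets),
        "n_rollouts_w_reward": sum(1 for r in rollouts if r.final_reward is not None),
    }


rollouts = [
    Rollout("r1", final_reward=1.0, triplets=[{"step": 0}]),
    Rollout("r2", final_reward=None, triplets=[{"step": 0}]),  # trace, no reward
    Rollout("r3", final_reward=0.0, triplets=[]),  # reward (0.0 counts), no trace
]
metrics = count_rollouts(rollouts)
print(metrics)
```

Note that a reward of 0.0 still counts as a reward here; only None is treated as missing, which mirrors the `is not None` test in the suggested change later in this thread.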

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
scripts/validate_example_wandb.py	Added validation checks to ensure rollout counts for rewards and traces remain consistent across training runs
agentlightning/verl/daemon.py	Added `has_reward` field to rollout statistics and new `n_rollouts_w_reward` metrics for both training and validation paths

Copilot · 2025-11-17T10:22:28Z

scripts/validate_example_wandb.py

+    if first_row["val/n_rollouts_w_reward"] != last_row["val/n_rollouts_w_reward"]:
+        print(
+            f"::error::Some rollouts have failed to produce rewards: {first_row['val/n_rollouts_w_reward']} -> {last_row['val/n_rollouts_w_reward']}"
+        )
+        sys.exit(1)
+
+    if first_row["val/n_rollouts_w_trace"] != last_row["val/n_rollouts_w_trace"]:
+        print(
+            f"::error::Some rollouts have failed to produce traces: {first_row['val/n_rollouts_w_trace']} -> {last_row['val/n_rollouts_w_trace']}"
+        )
+        sys.exit(1)


The code accesses "val/n_rollouts_w_reward" and "val/n_rollouts_w_trace" from the history dataframe, but line 27 only fetches keys=["val/reward"]. This will cause a KeyError. The run.history() call on line 27 needs to be updated to include these keys:

hist = run.history(keys=["val/reward", "val/n_rollouts_w_reward", "val/n_rollouts_w_trace"], pandas=True)
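
A minimal sketch of the consistency check discussed in this comment, assuming each history row behaves like a mapping from metric name to value (plain dicts stand in for the rows here). The helper `find_count_drift` is hypothetical and not part of scripts/validate_example_wandb.py; it reports a key that was never fetched as a message instead of letting a `KeyError` propagate.

```python
from typing import Any, List, Mapping

COUNT_KEYS = ["val/n_rollouts_w_reward", "val/n_rollouts_w_trace"]


def find_count_drift(first_row: Mapping[str, Any], last_row: Mapping[str, Any]) -> List[str]:
    """Return one error message per count metric that is missing or changed."""
    errors = []
    for key in COUNT_KEYS:
        if key not in first_row or key not in last_row:
            # The situation the review warns about: the key was never fetched.
            errors.append(f"::error::Metric {key} is missing from the fetched history")
        elif first_row[key] != last_row[key]:
            errors.append(f"::error::{key} changed: {first_row[key]} -> {last_row[key]}")
    return errors


# Plain dicts stand in for the first and last history rows.
first = {"val/reward": 0.1, "val/n_rollouts_w_reward": 8, "val/n_rollouts_w_trace": 8}
last = {"val/reward": 0.4, "val/n_rollouts_w_reward": 7, "val/n_rollouts_w_trace": 8}
drift = find_count_drift(first, last)

# Rows fetched with only "val/reward" lack both count metrics.
missing = find_count_drift({"val/reward": 0.1}, {"val/reward": 0.4})
```

A caller could print each returned message and exit with a non-zero status when the list is non-empty, which matches the `sys.exit(1)` behavior in the diff above.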

Copilot · 2025-11-17T10:22:28Z

agentlightning/verl/daemon.py

            final_reward = self._fillna_reward(rollout)
            if not rollout.triplets:
                print(f"Warning: No triplets found for test rollout {rollout.rollout_id}.")
                sample_stat_list.append({"reward": final_reward})


When a rollout has no triplets, the has_reward key is not added to the dictionary appended to sample_stat_list. This will cause issues when calculating val/n_rollouts_w_reward metric on lines 625 and 604-606, as it tries to access stat["has_reward"] for all stats. Add the missing key:

sample_stat_list.append({"reward": final_reward, "has_reward": final_reward_raw is not None})

Suggested change

-                sample_stat_list.append({"reward": final_reward})
+                sample_stat_list.append({"reward": final_reward, "has_reward": final_reward_raw is not None})
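
To illustrate why the missing key matters, the sketch below builds every per-rollout stat dictionary through a single helper, so the no-triplet branch and the normal branch both carry `has_reward`. The helper name `make_sample_stat` and the fill value of 0.0 are assumptions for illustration only; the actual daemon code and the behavior of `_fillna_reward` may differ.

```python
from typing import List, Optional


def make_sample_stat(final_reward_raw: Optional[float], fill_value: float = 0.0) -> dict:
    """Build a stat dict that always contains both keys (hypothetical helper)."""
    has_reward = final_reward_raw is not None
    return {
        "reward": final_reward_raw if has_reward else fill_value,
        "has_reward": has_reward,
    }


def count_with_reward(sample_stat_list: List[dict]) -> int:
    """Direct indexing: raises KeyError if any stat lacks 'has_reward'."""
    return sum(1 for stat in sample_stat_list if stat["has_reward"])


# A stat appended without the key, as in the branch flagged by the review.
incomplete = [{"reward": 0.0}, {"reward": 1.0, "has_reward": True}]
try:
    count_with_reward(incomplete)
    raised = False
except KeyError:
    raised = True

# Stats built through the helper are safe to aggregate.
complete = [make_sample_stat(None), make_sample_stat(1.0), make_sample_stat(0.0)]
n_with_reward = count_with_reward(complete)
```

Routing both branches through one constructor is one way to keep the dictionaries uniform; adding the key inline, as the suggested change does, achieves the same result with a smaller diff.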

ultmaster · 2025-11-17T11:17:09Z

/ci

github-actions · 2025-11-17T11:17:21Z

🚀 CI Watcher for correlation id-3541258935-mi31vid8 triggered by comment 3541258935
🏃‍♀️ Tracking 6 workflow run(s):

🔴 GPU Test - PR #317 - ci-all - id-3541258935-mi31vid8 — completed/failure
🟢 APO - PR #317 - ci-all - id-3541258935-mi31vid8 — completed/success
🟢 Backward Compatibility - PR #317 - ci-all - id-3541258935-mi31vid8 — completed/success
🟢 Spider - PR #317 - ci-all - id-3541258935-mi31vid8 — completed/success
🟢 Unsloth - PR #317 - ci-all - id-3541258935-mi31vid8 — completed/success
🔴 Calc-X - PR #317 - ci-all - id-3541258935-mi31vid8 — completed/failure

✅ All runs completed.

fix reward trace number check

86c0d60

Copilot AI review requested due to automatic review settings November 17, 2025 10:18

ultmaster added the ci-all label Nov 17, 2025

Copilot started reviewing on behalf of ultmaster November 17, 2025 10:18

Copilot finished reviewing on behalf of ultmaster November 17, 2025 10:21

Copilot AI reviewed Nov 17, 2025

.

f7472f0

ultmaster merged commit d433418 into main Nov 17, 2025
14 checks passed

totoluo pushed a commit to totoluo/agent-lightning that referenced this pull request Dec 6, 2025

Adding check traces with reward for VERL (microsoft#317)

fbdf79f

totoluo pushed a commit to totoluo/agent-lightning that referenced this pull request Dec 6, 2025

Adding check traces with reward for VERL (microsoft#317)

cf17ae6
