Skip to content

Conversation

@Intaik
Copy link
Contributor

@Intaik Intaik commented Nov 17, 2025

Summary:
Episodes with all rewards = 0 or =1 does not help learning as advantage would be 0. also, episodes with generations that are tuncated due to max_res_tokens would mostly get 0 rewards unnecessary as most of answers are at the end.

Dropping these episodes provides trainer better batches to learn from (at the cost of sampling efficiency)

{F1983571844}

{F1983571853}

Differential Revision: D87243621

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 17, 2025
@meta-codesync
Copy link

meta-codesync bot commented Nov 17, 2025

@Intaik has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87243621.

Summary:

Episodes with all rewards = 0 or =1 does not help learning as advantage would be 0. also, episodes with generations that are tuncated due to max_res_tokens would mostly get 0 rewards unnecessary as most of answers are at the end.

Dropping these episodes provides trainer better batches to learn from (at the cost of sampling efficiency)

{F1983571844}

{F1983571853}

Reviewed By: casteryh

Differential Revision: D87243621
Copy link
Contributor

@casteryh casteryh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review automatically exported from Phabricator review in Meta.

@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.10%. Comparing base (4410e90) to head (95b82c8).
⚠️ Report is 17 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #580   +/-   ##
=======================================
  Coverage   84.10%   84.10%           
=======================================
  Files          29       29           
  Lines        3687     3687           
=======================================
  Hits         3101     3101           
  Misses        586      586           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@meta-codesync meta-codesync bot merged commit 5bfcfae into meta-pytorch:main Nov 18, 2025
13 checks passed
daniellepintz pushed a commit to daniellepintz/torchforge that referenced this pull request Nov 20, 2025
Summary:
## Bug Description:

meta-pytorch#580 had incorrect indentation
cuasing the input_ids, episodes varibles to be deleted inside the episodes
building loop, causing program to hang.

Next diff shall make background thread crashes to be surfaced to the main thread so that we know what thread crashed for what reason.

Reviewed By: daniellepintz

Differential Revision: D87554570
daniellepintz pushed a commit to daniellepintz/torchforge that referenced this pull request Nov 20, 2025
Summary:

## Bug Description:

meta-pytorch#580 had incorrect indentation
cuasing the input_ids, episodes varibles to be deleted inside the episodes
building loop, causing program to hang.

Next diff shall make background thread crashes to be surfaced to the main thread so that we know what thread crashed for what reason.

Reviewed By: daniellepintz

Differential Revision: D87554570
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants