[FIX] reduce reward timeout warning#579

Merged
garrett4wade merged 5 commits into main from sxj/reduce_rw_warn
Nov 14, 2025
Conversation


@fishcrap fishcrap commented Nov 14, 2025

Description

This PR adds support for controlling the number of workers in AsyncRewardWrapper's ProcessPoolExecutor to reduce reward timeout warnings.

This prevents warnings when ProcessPoolExecutor is created with too many workers (typically when max_workers=None defaults to CPU count, which can be excessive in distributed training scenarios).
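The idea can be sketched roughly as follows. This is an illustrative stand-in, not the actual AReaL source: the real AsyncRewardWrapper's signature and internals may differ.

```python
# Rough sketch of the mechanism this PR adds: run a blocking reward
# function in a ProcessPoolExecutor whose size is capped explicitly
# instead of defaulting to the machine's CPU count.
import asyncio
import functools
from concurrent.futures import ProcessPoolExecutor


class AsyncRewardWrapper:
    """Illustrative stand-in for the real wrapper in the AReaL codebase."""

    def __init__(self, reward_fn, max_workers=None):
        # max_workers=None lets ProcessPoolExecutor spawn one worker per
        # CPU, which is the oversubscription this PR lets callers avoid.
        self._reward_fn = reward_fn
        self._executor = ProcessPoolExecutor(max_workers=max_workers)

    async def __call__(self, *args, **kwargs):
        loop = asyncio.get_running_loop()
        # functools.partial forwards kwargs through run_in_executor.
        fn = functools.partial(self._reward_fn, *args, **kwargs)
        return await loop.run_in_executor(self._executor, fn)
```

With several data-parallel replicas on one node, each creating its own pool, an uncapped default multiplies the worker count per replica, which is where the timeout warnings came from.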

Related Issue

N/A

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

N/A

Additional Context

N/A


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

@gemini-code-assist (Contributor)

Summary of Changes

Hello @fishcrap, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances resource management within the reinforcement learning workflows by introducing explicit control over the concurrency of asynchronous reward functions. By allowing the maximum number of workers for reward calculations to be configured and dynamically determined based on system resources and parallelization strategy, the changes aim to optimize performance and prevent resource-related warnings during model training and evaluation.

Highlights

  • Concurrency Control for Reward Functions: Introduced a new reward_fn_max_workers parameter to AsyncRewardWrapper and various workflow classes (MultiTurnV2Workflow, MultiTurn, RLVRWorkflow, VisionRLVRWorkflow) to explicitly limit the number of workers used for asynchronous reward function execution.
  • Dynamic Worker Calculation in Examples: Implemented dynamic calculation of reward_fn_max_workers in numerous example scripts. This calculation leverages os.cpu_count() and the parallel strategy's data parallel size to determine an optimal number of workers, aiming to prevent resource oversubscription.
  • Improved Resource Management: The changes are designed to reduce potential warnings and improve the stability and efficiency of training and evaluation by ensuring reward functions do not consume excessive CPU resources, especially in parallel processing environments.
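The per-example calculation described in the highlights can be approximated as below. Taking dp_size as an explicit argument is an assumption; the real example scripts read it from their parallel strategy object.

```python
import os


def reward_fn_max_workers(dp_size: int) -> int:
    """Split this node's CPUs across data-parallel replicas.

    Illustrative helper: the real examples compute this inline from the
    parallel strategy's data-parallel size.
    """
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cpus // max(1, dp_size))
```

For example, with 64 CPUs and dp_size=8, each replica's reward pool gets 8 workers instead of the 64 it would claim by default.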

@fishcrap fishcrap changed the title reduce reward warning [FIX] reduce reward timeout warning Nov 14, 2025
@gemini-code-assist[bot] left a comment
Code Review

This pull request introduces a reward_fn_max_workers parameter to control the number of workers for the asynchronous reward function, which is a good improvement for resource management. However, the implementation of calculating this value in the example scripts has a potential issue: os.cpu_count() can return None, which would cause a runtime error. I've added comments with suggestions to handle this case. Additionally, this calculation logic is duplicated across many example files. It would be beneficial to centralize this logic into a helper function to improve maintainability and avoid repeating the same bug.
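To make the reviewer's point concrete, here is a minimal demonstration; unittest.mock simulates a platform where os.cpu_count() returns None, which its documentation explicitly permits. The function names are hypothetical.

```python
import os
from unittest import mock


def naive_workers(dp_size):
    # TypeError when os.cpu_count() returns None: None // int is invalid.
    return os.cpu_count() // dp_size


def safe_workers(dp_size):
    return max(1, (os.cpu_count() or 1) // dp_size)


with mock.patch("os.cpu_count", return_value=None):
    assert safe_workers(4) == 1  # guarded version survives the None case
    try:
        naive_workers(4)
        crashed = False
    except TypeError:
        crashed = True
```

Centralizing safe_workers in one shared module, as the reviewer suggests, also means the None guard only has to be written (and fixed) once rather than in every example script.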

@fishcrap (Collaborator, Author)

/gemini review

@gemini-code-assist[bot] left a comment

Code Review

This pull request aims to reduce reward timeout warnings by intelligently setting a default for max_workers in AsyncRewardWrapper. The approach of calculating workers based on CPU and device count is sound.

However, I've identified a critical issue: the new logic inadvertently initializes the CUDA context in the main process before forking for the ProcessPoolExecutor. This can lead to deadlocks on Linux systems. I've provided details on how to resolve this by using the 'spawn' multiprocessing context.

Additionally, I've suggested a minor improvement for code readability regarding the worker calculation heuristic. Please review the comments for details.
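The fix the reviewer describes corresponds to a standard pattern: hand the executor a 'spawn' multiprocessing context so workers start as fresh interpreters instead of forking a parent that may already hold a CUDA context. The worker count here is illustrative.

```python
# With mp_context set to 'spawn', worker processes do not inherit the
# parent's memory via fork, so a CUDA context initialized in the parent
# cannot end up in a half-valid state inside a child and deadlock it.
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

spawn_ctx = mp.get_context("spawn")
executor = ProcessPoolExecutor(max_workers=4, mp_context=spawn_ctx)
```

spawn starts workers more slowly than fork, but the pool is long-lived here, so the one-time startup cost is negligible next to the deadlock it avoids.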

garrett4wade and others added 3 commits November 14, 2025 15:41
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@fishcrap (Collaborator, Author)

/gemini review

@gemini-code-assist[bot] left a comment

Code Review

This pull request introduces a sensible heuristic to dynamically determine the number of workers for the AsyncRewardWrapper's ProcessPoolExecutor, which should help prevent timeout warnings in distributed training scenarios. The implementation adds a new helper function to safely detect the number of devices and updates the __init__ method to use this information. The changes are logical and well-contained. I've added a couple of suggestions to improve the code's maintainability and debuggability.

Quoted excerpt under review:

    pass

    # Fallback: assume 8 devices for cautious max_workers calculation
    return 8
@gemini-code-assist[bot] commented (severity: medium):

The fallback value 8 is a magic number. It would be more readable and maintainable to define this as a module-level constant (e.g., _FALLBACK_DEVICE_COUNT = 8) near the top of the file. This makes the value's purpose clear and centralizes its configuration.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@garrett4wade garrett4wade merged commit d78f628 into main Nov 14, 2025
1 check passed
@garrett4wade garrett4wade deleted the sxj/reduce_rw_warn branch November 14, 2025 09:01
CormickKneey pushed a commit to CormickKneey/AReaL that referenced this pull request Nov 16, 2025
…nclusionAI#579)

* fix reward timeout warning by reducing the number of workers

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Wei Fu <36355462+garrett4wade@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Bruce-rl-hw pushed a commit to Bruce-rl-hw/AReaL-vllm that referenced this pull request Dec 4, 2025
…nclusionAI#579)

* fix reward timeout warning by reducing the number of workers

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Wei Fu <36355462+garrett4wade@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>