[FIX] reduce reward timeout warning#579

Merged
garrett4wade merged 5 commits into main from sxj/reduce_rw_warn
Nov 14, 2025
Conversation


@fishcrap fishcrap commented Nov 14, 2025

Description

This PR adds support for controlling the number of workers in AsyncRewardWrapper's ProcessPoolExecutor to reduce reward timeout warnings.

This prevents warnings when ProcessPoolExecutor is created with too many workers (typically when max_workers=None defaults to CPU count, which can be excessive in distributed training scenarios).
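The idea can be sketched roughly as follows. This is an illustrative stand-in, not the actual AReaL source: the real AsyncRewardWrapper's signature and internals may differ.

```python
# Rough sketch of the mechanism this PR adds: run a blocking reward
# function in a ProcessPoolExecutor whose size is capped explicitly
# instead of defaulting to the machine's CPU count.
import asyncio
import functools
from concurrent.futures import ProcessPoolExecutor


class AsyncRewardWrapper:
    """Illustrative stand-in for the real wrapper in the AReaL codebase."""

    def __init__(self, reward_fn, max_workers=None):
        # max_workers=None lets ProcessPoolExecutor spawn one worker per
        # CPU, which is the oversubscription this PR lets callers avoid.
        self._reward_fn = reward_fn
        self._executor = ProcessPoolExecutor(max_workers=max_workers)

    async def __call__(self, *args, **kwargs):
        loop = asyncio.get_running_loop()
        # functools.partial forwards kwargs through run_in_executor.
        fn = functools.partial(self._reward_fn, *args, **kwargs)
        return await loop.run_in_executor(self._executor, fn)
```

With several data-parallel replicas on one node, each creating its own pool, an uncapped default multiplies the worker count per replica, which is where the timeout warnings came from.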

Related Issue

N/A

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

N/A

Additional Context

N/A


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

@gemini-code-assist (Contributor)

Summary of Changes

Hello @fishcrap, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances resource management within the reinforcement learning workflows by introducing explicit control over the concurrency of asynchronous reward functions. By allowing the maximum number of workers for reward calculations to be configured and dynamically determined based on system resources and parallelization strategy, the changes aim to optimize performance and prevent resource-related warnings during model training and evaluation.

Highlights

  • Concurrency Control for Reward Functions: Introduced a new reward_fn_max_workers parameter to AsyncRewardWrapper and various workflow classes (MultiTurnV2Workflow, MultiTurn, RLVRWorkflow, VisionRLVRWorkflow) to explicitly limit the number of workers used for asynchronous reward function execution.
  • Dynamic Worker Calculation in Examples: Implemented dynamic calculation of reward_fn_max_workers in numerous example scripts. This calculation leverages os.cpu_count() and the parallel strategy's data parallel size to determine an optimal number of workers, aiming to prevent resource oversubscription.
  • Improved Resource Management: The changes are designed to reduce potential warnings and improve the stability and efficiency of training and evaluation by ensuring reward functions do not consume excessive CPU resources, especially in parallel processing environments.
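The per-example calculation described in the highlights can be approximated as below. Taking dp_size as an explicit argument is an assumption; the real example scripts read it from their parallel strategy object.

```python
import os


def reward_fn_max_workers(dp_size: int) -> int:
    """Split this node's CPUs across data-parallel replicas.

    Illustrative helper: the real examples compute this inline from the
    parallel strategy's data-parallel size.
    """
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    return max(1, cpus // max(1, dp_size))
```

For example, with 64 CPUs and dp_size=8, each replica's reward pool gets 8 workers instead of the 64 it would claim by default.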

@fishcrap fishcrap changed the title reduce reward warning [FIX] reduce reward timeout warning Nov 14, 2025
@gemini-code-assist[bot] left a comment
Code Review

This pull request introduces a reward_fn_max_workers parameter to control the number of workers for the asynchronous reward function, which is a good improvement for resource management. However, the implementation of calculating this value in the example scripts has a potential issue: os.cpu_count() can return None, which would cause a runtime error. I've added comments with suggestions to handle this case. Additionally, this calculation logic is duplicated across many example files. It would be beneficial to centralize this logic into a helper function to improve maintainability and avoid repeating the same bug.
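To make the reviewer's point concrete, here is a minimal demonstration; unittest.mock simulates a platform where os.cpu_count() returns None, which its documentation explicitly permits. The function names are hypothetical.

```python
import os
from unittest import mock


def naive_workers(dp_size):
    # TypeError when os.cpu_count() returns None: None // int is invalid.
    return os.cpu_count() // dp_size


def safe_workers(dp_size):
    return max(1, (os.cpu_count() or 1) // dp_size)


with mock.patch("os.cpu_count", return_value=None):
    assert safe_workers(4) == 1  # guarded version survives the None case
    try:
        naive_workers(4)
        crashed = False
    except TypeError:
        crashed = True
```

Centralizing safe_workers in one shared module, as the reviewer suggests, also means the None guard only has to be written (and fixed) once rather than in every example script.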

@fishcrap (Collaborator, Author)

/gemini review

@gemini-code-assist[bot] left a comment

Code Review

This pull request aims to reduce reward timeout warnings by intelligently setting a default for max_workers in AsyncRewardWrapper. The approach of calculating workers based on CPU and device count is sound.

However, I've identified a critical issue: the new logic inadvertently initializes the CUDA context in the main process before forking for the ProcessPoolExecutor. This can lead to deadlocks on Linux systems. I've provided details on how to resolve this by using the 'spawn' multiprocessing context.

Additionally, I've suggested a minor improvement for code readability regarding the worker calculation heuristic. Please review the comments for details.
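The fix the reviewer describes corresponds to a standard pattern: hand the executor a 'spawn' multiprocessing context so workers start as fresh interpreters instead of forking a parent that may already hold a CUDA context. The worker count here is illustrative.

```python
# With mp_context set to 'spawn', worker processes do not inherit the
# parent's memory via fork, so a CUDA context initialized in the parent
# cannot end up in a half-valid state inside a child and deadlock it.
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

spawn_ctx = mp.get_context("spawn")
executor = ProcessPoolExecutor(max_workers=4, mp_context=spawn_ctx)
```

spawn starts workers more slowly than fork, but the pool is long-lived here, so the one-time startup cost is negligible next to the deadlock it avoids.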

garrett4wade and others added 3 commits November 14, 2025 15:41
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@fishcrap (Collaborator, Author)

/gemini review

@gemini-code-assist[bot] left a comment

Code Review

This pull request introduces a sensible heuristic to dynamically determine the number of workers for the AsyncRewardWrapper's ProcessPoolExecutor, which should help prevent timeout warnings in distributed training scenarios. The implementation adds a new helper function to safely detect the number of devices and updates the __init__ method to use this information. The changes are logical and well-contained. I've added a couple of suggestions to improve the code's maintainability and debuggability.

Quoted excerpt under review:

    pass

    # Fallback: assume 8 devices for cautious max_workers calculation
    return 8
@gemini-code-assist[bot] commented (severity: medium):

The fallback value 8 is a magic number. It would be more readable and maintainable to define this as a module-level constant (e.g., _FALLBACK_DEVICE_COUNT = 8) near the top of the file. This makes the value's purpose clear and centralizes its configuration.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@garrett4wade garrett4wade merged commit d78f628 into main Nov 14, 2025
1 check passed
@garrett4wade garrett4wade deleted the sxj/reduce_rw_warn branch November 14, 2025 09:01
CormickKneey pushed a commit to CormickKneey/AReaL that referenced this pull request Nov 16, 2025
…nclusionAI#579)

* fix reward timeout warning by reducing the number of workers

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Wei Fu <36355462+garrett4wade@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Bruce-rl-hw pushed a commit to Bruce-rl-hw/AReaL-vllm that referenced this pull request Dec 4, 2025
…nclusionAI#579)

* fix reward timeout warning by reducing the number of workers

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Wei Fu <36355462+garrett4wade@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>