[FIX] reduce reward timeout warning (#579)
Conversation
Summary of Changes

Hello @fishcrap, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances resource management within the reinforcement learning workflows by introducing explicit control over the concurrency of asynchronous reward functions. By allowing the maximum number of workers for reward calculations to be configured and dynamically determined based on system resources and the parallelization strategy, the changes aim to optimize performance and prevent resource-related warnings during model training and evaluation.
Code Review
This pull request introduces a reward_fn_max_workers parameter to control the number of workers for the asynchronous reward function, which is a good improvement for resource management. However, the implementation of calculating this value in the example scripts has a potential issue: os.cpu_count() can return None, which would cause a runtime error. I've added comments with suggestions to handle this case. Additionally, this calculation logic is duplicated across many example files. It would be beneficial to centralize this logic into a helper function to improve maintainability and avoid repeating the same bug.
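The guard the review asks for can be sketched as follows (the helper name and the divide-by-device heuristic are illustrative, not the PR's actual code):

```python
import os


def compute_reward_fn_max_workers(n_devices: int) -> int:
    """Illustrative helper: cap reward workers at the CPU share per device."""
    # os.cpu_count() can return None on some platforms, so fall back to 1
    # before dividing; also guard against a zero device count.
    cpu_count = os.cpu_count() or 1
    return max(1, cpu_count // max(1, n_devices))
```

Centralizing this in one helper, as suggested, also means the `None` guard only has to be written once instead of in every example script.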
Force-pushed from c3e8f62 to 6dd5bd6.
/gemini review
Code Review
This pull request aims to reduce reward timeout warnings by intelligently setting a default for max_workers in AsyncRewardWrapper. The approach of calculating workers based on CPU and device count is sound.
However, I've identified a critical issue: the new logic inadvertently initializes the CUDA context in the main process before forking for the ProcessPoolExecutor. This can lead to deadlocks on Linux systems. I've provided details on how to resolve this by using the 'spawn' multiprocessing context.
Additionally, I've suggested a minor improvement for code readability regarding the worker calculation heuristic. Please review the comments for details.
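The 'spawn' fix described in the review can be sketched like this (the function name is hypothetical; only the use of a spawn multiprocessing context mirrors the suggestion):

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def run_reward_pool(fn, items, max_workers=4):
    # 'spawn' starts fresh interpreter processes, so workers never inherit
    # a CUDA context initialized in the parent; forking a process that
    # already holds a CUDA context can deadlock on Linux.
    ctx = mp.get_context("spawn")
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as pool:
        return list(pool.map(fn, items))
```

The `mp_context` parameter of `ProcessPoolExecutor` exists since Python 3.7, so passing a spawn context requires no extra dependencies.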
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
/gemini review
Code Review
This pull request introduces a sensible heuristic to dynamically determine the number of workers for the AsyncRewardWrapper's ProcessPoolExecutor, which should help prevent timeout warnings in distributed training scenarios. The implementation adds a new helper function to safely detect the number of devices and updates the __init__ method to use this information. The changes are logical and well-contained. I've added a couple of suggestions to improve the code's maintainability and debuggability.
Suggested change:

```python
        pass

    # Fallback: assume 8 devices for cautious max_workers calculation
    return 8
```
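A device probe with that fallback might look like the following sketch (reading `CUDA_VISIBLE_DEVICES` is an assumption about the detection strategy; the real helper may query the framework differently — the point is to avoid importing a CUDA-initializing library in the parent process):

```python
import os


def detect_device_count() -> int:
    """Probe the visible accelerator count without touching the CUDA runtime."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is not None:
        ids = [d for d in visible.split(",") if d.strip()]
        if ids:
            return len(ids)
    # Fallback: assume 8 devices for cautious max_workers calculation
    return 8
```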
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
…nclusionAI#579)

* fix reward timeout warning by reducing the number of workers

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Apply suggestion from @gemini-code-assist[bot]

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Wei Fu <36355462+garrett4wade@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Description
This PR adds support for controlling the number of workers in AsyncRewardWrapper's ProcessPoolExecutor to reduce reward timeout warnings. This prevents warnings when ProcessPoolExecutor is created with too many workers (typically when max_workers=None defaults to the CPU count, which can be excessive in distributed training scenarios).

Related Issue
N/A
Type of Change

- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist

- [ ] Documentation builds (jb build docs)
- [ ] AI review requested (/gemini review)

Breaking Change Details (if applicable):
N/A
Additional Context
N/A
Need help? Check the Contributing Guide or ask in
GitHub Discussions!