-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Client] Add warnings when user schedules many tasks with ray client #16454
[Client] Add warnings when user schedules many tasks with ray client #16454
Conversation
python/ray/util/client/worker.py
Outdated
logger.warning( | ||
f"More than {TASK_WARNING_THRESHOLD} remote tasks have been " | ||
"scheduled. This can be slow on Ray Client due to " | ||
"communication overhead. If you're running many fine-grained " | ||
"tasks consider batching them (details in the Ray Design " | ||
"Pattern document).") | ||
self.warning_raised = True | ||
if not self.warning_raised and \ | ||
self.total_outbound_message_size > MESSAGE_SIZE_THRESHOLD: | ||
logger.warning( | ||
"More than 10MB of messages have been created to schedule " | ||
"tasks on the server. If you're running many fine-grained " | ||
"tasks consider batching them (details in the Ray Design " | ||
"Pattern document). If you have large arguments that are " | ||
"frequently reused, consider storing them remotely with " | ||
"ray.put or wrapping them in an actor object.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardliaw , does this warning look right to you?
Robert suggested linking to the right section in the doc or some clear simple action steps for the user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM now!
python/ray/util/client/worker.py
Outdated
# Link to the Ray Design Pattern doc to use in the task overhead warning | ||
# message | ||
DESIGN_PATTERN_DOC_LINK = \ | ||
"https://docs.google.com/document/d/167rnnDFIVRhHhK4mznEIemOtj63IOhtIPvSYaPgI4Fg/" # noqa E501 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please use this link (which goes directly to the subsection):
https://docs.google.com/document/d/167rnnDFIVRhHhK4mznEIemOtj63IOhtIPvSYaPgI4Fg/edit#heading=h.f7ins22n6nyl
python/ray/util/client/worker.py
Outdated
"objects remotely with ray.put. An example of this is shown " | ||
"in the \"Closure capture of large / unserializable object\" " | ||
"section of the Ray Design Patterns document, available here: " | ||
f"{DESIGN_PATTERN_DOC_LINK}", UserWarning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need another link here, which points to the exact subsection.
https://docs.google.com/document/d/167rnnDFIVRhHhK4mznEIemOtj63IOhtIPvSYaPgI4Fg/edit#heading=h.1afmymq455wu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Left a few final comments.
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
4b6cdbd
to
80fb2f0
Compare
Reviewers can merge at their discretion |
Thanks @ckw017 ! |
…16454) * Add warnings when user schedules many tasks with ray client * add test_client_warnings to BUILD * better variable names * use util.debug.log_once() * batching -> explanation of batching * Switch to warnings.warn * Add links to Ray Design Pattern doc with code snippets * Cleaner linking and refer to sections directly * Better testNoWarning * add sys.exit(pytest.main(...)) * Update python/ray/util/client/worker.py Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu> * Update python/ray/util/client/worker.py Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu> * better error messages * Switch links to new readthedocs sections * Revert "Switch links to new readthedocs sections" This reverts commit d3785bf. Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
Why are these changes needed?
Using a lot of f.remote() calls with ray client can be extremely slow, so raises a one time warning can be useful to alert the user.
Raises a warning if the total number of tasks scheduled exceeds 1000 or the total size of ClientTask messages passed to the server exceeds 10MB since the beginning of the session. This warning will only be raised at most once per connection.
Related issue number
Closes #15009
Checks
scripts/format.sh
to lint the changes in this PR.