[Client] Add warnings when user schedules many tasks with ray client #16454

Merged
merged 16 commits into from
Jun 18, 2021

Conversation

@ckw017 ckw017 commented Jun 15, 2021

Why are these changes needed?

Making many f.remote() calls with Ray Client can be extremely slow, so raising a one-time warning is a useful way to alert the user.

This PR raises a warning if, since the beginning of the session, the total number of tasks scheduled exceeds 1000 or the total size of ClientTask messages passed to the server exceeds 10MB. The warning is raised at most once per connection.
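
For illustration, here is a minimal sketch of the once-per-connection check described above. It is not the code merged in this PR: the class `TaskOverheadTracker`, the method `record_task`, and the attribute `total_num_tasks_scheduled` are hypothetical names, and reading 10MB as 10 * 2**20 bytes is an assumption.

```python
import warnings

TASK_WARNING_THRESHOLD = 1000          # tasks scheduled since the session began
MESSAGE_SIZE_THRESHOLD = 10 * 2**20    # 10MB, read here as 10 * 2**20 bytes (assumption)


class TaskOverheadTracker:
    """Hypothetical helper: keeps per-connection totals and warns at most once."""

    def __init__(self):
        self.total_num_tasks_scheduled = 0    # hypothetical attribute name
        self.total_outbound_message_size = 0
        self.warning_raised = False

    def record_task(self, message_size_bytes):
        """Record one scheduled task. Return True if this call emitted the warning."""
        self.total_num_tasks_scheduled += 1
        self.total_outbound_message_size += message_size_bytes
        if self.warning_raised:
            # The warning was already emitted for this connection.
            return False
        if self.total_num_tasks_scheduled > TASK_WARNING_THRESHOLD:
            message = (
                f"More than {TASK_WARNING_THRESHOLD} remote tasks have been "
                "scheduled; consider batching fine-grained tasks.")
        elif self.total_outbound_message_size > MESSAGE_SIZE_THRESHOLD:
            message = (
                "More than 10MB of task messages have been created; consider "
                "batching tasks or storing large arguments remotely.")
        else:
            return False
        warnings.warn(message, UserWarning)
        self.warning_raised = True
        return True
```

Keeping one tracker instance per connection is what makes the warning fire at most once per connection rather than once per process.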

Related issue number

Closes #15009

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ckw017 ckw017 requested a review from AmeerHajAli June 16, 2021 16:47
@ckw017 ckw017 changed the title [wip] Add warnings when user schedules many tasks with ray client [Client] Add warnings when user schedules many tasks with ray client Jun 16, 2021
@AmeerHajAli AmeerHajAli requested a review from ijrsvt June 16, 2021 16:58
Comment on lines 341 to 356
    logger.warning(
        f"More than {TASK_WARNING_THRESHOLD} remote tasks have been "
        "scheduled. This can be slow on Ray Client due to "
        "communication overhead. If you're running many fine-grained "
        "tasks consider batching them (details in the Ray Design "
        "Pattern document).")
    self.warning_raised = True
if not self.warning_raised and \
        self.total_outbound_message_size > MESSAGE_SIZE_THRESHOLD:
    logger.warning(
        "More than 10MB of messages have been created to schedule "
        "tasks on the server. If you're running many fine-grained "
        "tasks consider batching them (details in the Ray Design "
        "Pattern document). If you have large arguments that are "
        "frequently reused, consider storing them remotely with "
        "ray.put or wrapping them in an actor object.")

@richardliaw , does this warning look right to you?

@AmeerHajAli

Robert suggested linking to the right section in the doc or some clear simple action steps for the user.
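
One form such action steps could take is a short batching sketch. The helpers below are hypothetical and plain Python, not part of Ray: `chunked` groups fine-grained work items so that each group can be handled by a single call, and `process_batch` stands in for the body of a remote task.

```python
from itertools import islice


def chunked(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    """Stand-in for a remote task body: handle a whole batch in one call."""
    return [x * x for x in batch]


# 10,000 fine-grained items are handled in 10 calls instead of 10,000.
batches = list(chunked(range(10_000), 1_000))
results = [y for batch in batches for y in process_batch(batch)]
```

With Ray Client, `process_batch` would be the function decorated with `@ray.remote`, and each batch would be submitted with a single `process_batch.remote(batch)` call, so the per-call communication overhead is paid once per batch rather than once per item.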

@ijrsvt ijrsvt left a comment

This LGTM now!

# Link to the Ray Design Pattern doc to use in the task overhead warning
# message
DESIGN_PATTERN_DOC_LINK = \
"https://docs.google.com/document/d/167rnnDFIVRhHhK4mznEIemOtj63IOhtIPvSYaPgI4Fg/" # noqa E501

Can you please use this link, which goes directly to the subsection:
https://docs.google.com/document/d/167rnnDFIVRhHhK4mznEIemOtj63IOhtIPvSYaPgI4Fg/edit#heading=h.f7ins22n6nyl

"objects remotely with ray.put. An example of this is shown "
"in the \"Closure capture of large / unserializable object\" "
"section of the Ray Design Patterns document, available here: "
f"{DESIGN_PATTERN_DOC_LINK}", UserWarning)

@AmeerHajAli AmeerHajAli left a comment

LGTM. Left a few final comments.

@AmeerHajAli AmeerHajAli added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jun 17, 2021
@ckw017 ckw017 removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jun 17, 2021

ckw017 commented Jun 18, 2021

Reviewers can merge at their discretion

@AmeerHajAli AmeerHajAli merged commit c91a1b1 into ray-project:master Jun 18, 2021
@AmeerHajAli

Thanks @ckw017 !

DmitriGekhtman pushed a commit that referenced this pull request Jun 21, 2021
…16454)

* Add warnings when user schedules many tasks with ray client

* add test_client_warnings to BUILD

* better variable names

* use util.debug.log_once()

* batching -> explanation of batching

* Switch to warnings.warn

* Add links to Ray Design Pattern doc with code snippets

* Cleaner linking and refer to sections directly

* Better testNoWarning

* add sys.exit(pytest.main(...))

* Update python/ray/util/client/worker.py

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>

* Update python/ray/util/client/worker.py

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>

* better error messages

* Switch links to new readthedocs sections

* Revert "Switch links to new readthedocs sections"

This reverts commit d3785bf.

Co-authored-by: Ameer Haj Ali <ameerh@berkeley.edu>
Development

Successfully merging this pull request may close these issues.

raise a warning when multiple f.remote() calls are made with ray client
5 participants