[Ray] Use num_workers API for Horovod Ray #2870

Merged 15 commits on Apr 29, 2021

amogkam (Collaborator) commented Apr 26, 2021

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

Closes #2702

Review process to land

  1. All tests and other checks must succeed.
  2. At least one member of the technical steering committee must review and approve.
  3. If any member of the technical steering committee requests changes, they must be addressed.

@amogkam amogkam changed the title Use num_workers API for Horovod Ray [Ray] Use num_workers API for Horovod Ray Apr 26, 2021
amogkam and others added 9 commits April 27, 2021 01:03
Signed-off-by: Amog Kamsetty <amogkamsetty@yahoo.com>
Signed-off-by: Richard Liaw <rliaw@berkeley.edu>
@amogkam amogkam marked this pull request as ready for review April 27, 2021 08:05
amogkam (Collaborator, Author) commented Apr 27, 2021

@richardliaw this is ready for review

tgaddair (Collaborator) commented

Awesome work @amogkam! Is this working with the cross_size issue we discussed, @richardliaw?

Comment on lines 189 to 198
    raise DeprecationWarning("`num_slots` is now deprecated. Please "
                             "use the `num_workers` API, "
                             "or to enforce an equal number of "
                             "workers on each node, set "
                             "`num_hosts` and `num_workers_per_host`")
if cpus_per_slot or gpus_per_slot:
    raise DeprecationWarning("`cpus_per_slot` and `gpus_per_slot` "
                             "have been deprecated. Use "
                             "`cpus_per_worker` and "
                             "`gpus_per_worker` instead.")
Review comment from a collaborator:

For one release, can we make this a soft warning (warnings.warn), and then make it a hard warning in a later release?

This will make migration much easier.
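
As a rough illustration of the soft-warning approach suggested above, the sketch below emits a `DeprecationWarning` through `warnings.warn` instead of raising, and keeps the call working by mapping the old arguments onto the new ones. The helper name `migrate_deprecated_args`, its defaults, and the mapping of `num_slots` onto `num_workers_per_host` are assumptions made for this example; they are not taken from the merged code.

```python
import warnings


def migrate_deprecated_args(num_slots=None, cpus_per_slot=None,
                            gpus_per_slot=None, num_workers_per_host=None,
                            cpus_per_worker=1, gpus_per_worker=None):
    """Hypothetical helper: warn about old argument names, then reuse
    their values for the new names so existing callers keep working."""
    if num_slots:
        warnings.warn(
            "`num_slots` is now deprecated. Please use the `num_workers` "
            "API, or to enforce an equal number of workers on each node, "
            "set `num_hosts` and `num_workers_per_host`.",
            category=DeprecationWarning, stacklevel=2)
        # Assumption: a per-host slot count maps onto workers per host.
        num_workers_per_host = num_slots
    if cpus_per_slot or gpus_per_slot:
        warnings.warn(
            "`cpus_per_slot` and `gpus_per_slot` have been deprecated. "
            "Use `cpus_per_worker` and `gpus_per_worker` instead.",
            category=DeprecationWarning, stacklevel=2)
        # Fall back to the new-style values when an old one is unset.
        cpus_per_worker = cpus_per_slot or cpus_per_worker
        gpus_per_worker = gpus_per_slot or gpus_per_worker
    return {
        "num_workers_per_host": num_workers_per_host,
        "cpus_per_worker": cpus_per_worker,
        "gpus_per_worker": gpus_per_worker,
    }
```

Note that Python hides `DeprecationWarning` by default unless it is triggered directly by code in `__main__`, so users may need `-W default` or a `warnings.simplefilter` call to see it. Turning the soft warning into a hard failure in a later release would then only require replacing each `warnings.warn(...)` call with a `raise`.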

richardliaw (Collaborator) commented

@tgaddair yep! just added a test for that too.

@richardliaw richardliaw self-assigned this Apr 29, 2021
richardliaw (Collaborator) left a comment

Looks great!

@richardliaw richardliaw merged commit 39aa85b into horovod:master Apr 29, 2021
github-actions commented

Unit Test Results

792 files ±0, 792 suites ±0, duration 5h 37m 0s ±0s
588 tests ±0: 555 passed ±0, 31 skipped ±0, 2 failed ±0
16 228 runs ±0: 12 345 passed ±0, 3 880 skipped ±0, 3 failed ±0

For more details on these failures, see this check.

Results for commit 39aa85b. ± Comparison against base commit 39aa85b.

amogkam added a commit to ray-project/ray_lightning that referenced this pull request Jan 6, 2022
Update to use latest Horovod-Ray API (horovod/horovod#2870)
Development

Successfully merging this pull request may close these issues.

[ray] Support 'num_gpus' for Horovod
3 participants