Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add process set support for MXNet #3043

Merged
merged 4 commits into from Jul 23, 2021
Merged

Conversation

maxhgerlach
Copy link
Collaborator

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

This is a straightforward extension of #2839, adding the process set feature to the MXNet API.

Included are the ops allgather, allreduce, alltoall, broadcast, and grouped_allreduce, as well as the DistributedOptimizer.

There's also the DistributedTrainer that I didn't look into. I don't have much experience working with MXNet, but it seems that a Gluon Trainer would be a higher level concept meant to operate on an entire neural net. In that case adding an overall process_set argument might not be the right call as users will typically want to non-globally aggregate the gradients for just a subset of their parameters, not for the entire model.

Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
Signed-off-by: Max H. Gerlach <git@maxgerlach.de>
@tgaddair tgaddair merged commit 0a42194 into horovod:master Jul 23, 2021
@github-actions
Copy link

github-actions bot commented Jul 23, 2021

Unit Test Results

     784 files  ±0       784 suites  ±0   6h 37m 3s ⏱️ ±0s
     639 tests ±0       596 ✔️ ±0       42 💤 ±0  1 ❌ ±0 
16 088 runs  ±0  12 458 ✔️ ±0  3 629 💤 ±0  1 ❌ ±0 

For more details on these failures, see this check.

Results for commit 0a42194. ± Comparison against base commit 0a42194.

♻️ This comment has been updated with latest results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants