Add process set support for MXNet #3043
Merged
Description
This is a straightforward extension of #2839, adding the process set feature to the MXNet API.
Included are the ops `allgather`, `allreduce`, `alltoall`, `broadcast`, and `grouped_allreduce`, as well as the `DistributedOptimizer`. There's also the `DistributedTrainer` that I didn't look into. I don't have much experience working with MXNet, but it seems that a Gluon Trainer would be a higher-level concept meant to operate on an entire neural net. In that case, adding an overall `process_set` argument might not be the right call, as users will typically want to non-globally aggregate the gradients for just a subset of their parameters, not for the entire model.