Improve new_group example in the context of SyncBatchNorm #48897
💊 CI failures summary and remediations
As of commit ff8d56c (more details on the Dr. CI page):
🕵️ 5 new failures recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build (1/5)
Step: "(Optional) Merge target branch"
torch/nn/modules/batchnorm.py
Outdated
@@ -435,7 +435,13 @@ class SyncBatchNorm(_BatchNorm):
>>> m = nn.SyncBatchNorm(100)
>>> # creating process group (optional)
>>> # process_ids is a list of int identifying rank ids.
>>> process_group = torch.distributed.new_group(process_ids)
>>> process_ids = list(range(8))
nit: maybe call these ranks instead of process_ids, since they might be confused with OS pids.
@rohan-varma has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Lint failure is unrelated: ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements-flake8.txt'
@rohan-varma merged this pull request in c0a0845.
Closes #48804
Improves the documentation and example in the SyncBN docs to show clearly that every rank must make every new_group() call used to create process subgroups, even for subgroups that the rank will not be part of. We then pick the right group, i.e. the group that the rank is part of, and pass that into the SyncBN APIs.
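For reviewers, here is a rough sketch of the pattern described above; it is not the exact text added to the docs. The helper name `pick_subgroup`, the `group_size` parameter, and the choice of contiguous rank blocks are assumptions made for illustration. The group constructor is passed in as an argument so the control flow can be read (and exercised) without a running distributed job.

```python
from typing import Any, Callable, List, Optional


def pick_subgroup(
    rank: int,
    world_size: int,
    group_size: int,
    new_group: Callable[[List[int]], Any],
) -> Optional[Any]:
    """Create every subgroup on this rank and return the one containing `rank`.

    `new_group` is expected to behave like torch.distributed.new_group(ranks):
    a call that all ranks must make, in the same order, for every subgroup.
    Splitting ranks into contiguous blocks is an assumption of this sketch.
    """
    my_group = None
    for start in range(0, world_size, group_size):
        ranks = list(range(start, min(start + group_size, world_size)))
        # Called unconditionally, even when `rank` is not a member of `ranks`.
        group = new_group(ranks)
        if rank in ranks:
            my_group = group
    return my_group


# Hypothetical usage inside an already-initialized distributed job:
#
#   import torch
#   import torch.distributed as dist
#
#   group = pick_subgroup(dist.get_rank(), dist.get_world_size(), 4,
#                         dist.new_group)
#   m = torch.nn.SyncBatchNorm(100, process_group=group)
```

The point of the loop is that new_group() is never skipped based on membership; only the selection of the returned group depends on the rank.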
Doc rendering: