Allow SyncBatchNorm without DDP in inference mode #24815
Conversation
Thanks for fixing this! Shall we add a test for it (SyncBatchNorm without DDP)?
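A minimal sketch of what such a test could look like, assuming the fix makes eval-mode forward work without an initialized process group (the test name and setup here are illustrative, not the test that eventually landed):

import torch
import torch.nn as nn

def test_sync_batch_norm_eval_without_ddp():
    # A SyncBatchNorm in eval mode should match plain BatchNorm2d
    # even when torch.distributed has never been initialized.
    bn = nn.BatchNorm2d(4)
    sync_bn = nn.SyncBatchNorm(4)
    sync_bn.load_state_dict(bn.state_dict())  # same weights and running stats
    bn.eval()
    sync_bn.eval()

    x = torch.randn(2, 4, 8, 8)
    # Before this fix, the forward pass below failed because it queried
    # the (nonexistent) default process group.
    assert torch.allclose(bn(x), sync_bn(x))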
process_group = torch.distributed.group.WORLD
if self.process_group:
    process_group = self.process_group
world_size = torch.distributed.get_world_size(process_group)
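For context, the quoted lines come from SyncBatchNorm.forward. A paraphrased sketch of the logic after this change (not the exact diff): the world-size query is deferred until synchronization is actually needed, so pure inference never touches torch.distributed.

# Paraphrased sketch of SyncBatchNorm.forward after the fix (not the
# exact diff): decide whether cross-process sync is needed *before*
# touching torch.distributed, so eval mode needs no process group.
need_sync = self.training or not self.track_running_stats
if need_sync:
    process_group = torch.distributed.group.WORLD
    if self.process_group:
        process_group = self.process_group
    world_size = torch.distributed.get_world_size(process_group)
    need_sync = world_size > 1

if not need_sync:
    # Local batch-norm path; F is torch.nn.functional.
    return F.batch_norm(
        input, self.running_mean, self.running_var, self.weight, self.bias,
        self.training or not self.track_running_stats,
        self.momentum, self.eps)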
process_group.size() should work as well.
From the implementation of _get_group_size it looks like .size() is not sufficient?
pytorch/torch/distributed/distributed_c10d.py
Lines 194 to 204 in dfdb86a
def _get_group_size(group):
    """
    Helper that gets a given group's world size
    """
    if group is GroupMember.WORLD:
        _check_default_pg()
        return _default_pg.size()
    if group not in _pg_group_ranks:
        raise RuntimeError("The given group does not exist")
    return len(_pg_group_ranks[group])
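In other words, judging from the quoted helper, torch.distributed.group.WORLD appears to be a sentinel that the helper resolves to _default_pg rather than a ProcessGroup instance itself, so .size() is only available once an actual group object is in hand. A small illustration, assuming a hypothetical two-process gloo setup (RANK, MASTER_ADDR, and MASTER_PORT supplied by the launcher):

import os
import torch.distributed as dist

# Hypothetical 2-process setup, for illustration only.
dist.init_process_group("gloo", rank=int(os.environ["RANK"]), world_size=2)

dist.get_world_size(dist.group.WORLD)  # -> 2; resolves the WORLD sentinel
sub = dist.new_group(ranks=[0])        # subgroup containing rank 0 only
if dist.get_rank() == 0:
    dist.get_world_size(sub)           # -> 1, via the group-ranks mapping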
Force-pushed from f45d957 to 01c5124
Force-pushed from 01c5124 to dc2d7c2
Hey @ppwwyyxx, thanks for adding the fix. Feel free to land if you need this feature urgently, but would really prefer to have a test for it.
@ppwwyyxx is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Fix pytorch#22538
Pull Request resolved: pytorch#24815
Test Plan: Can run a detectron2 evaluation without entering DDP. #sandcastle
Differential Revision: D16883694
fbshipit-source-id: a9199a82a55ba784a9d80005969cccf32d6b4827
Force-pushed from dc2d7c2 to e8a5a27
Summary: Fix #22538
Differential Revision: D16883694