Allow SyncBatchNorm without DDP in inference mode #24815


Closed: ppwwyyxx wants to merge 1 commit.

Conversation

ppwwyyxx (Collaborator) commented:

Summary: Fix #22538

Differential Revision: D16883694
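
For context, the change allows a SyncBatchNorm module to run a forward pass in eval mode without torch.distributed being initialized, i.e. outside DDP. A minimal sketch of the newly supported usage, assuming a single-process inference setup (the shapes and feature counts are illustrative):

    import torch
    import torch.nn as nn

    # Single-process inference: no init_process_group, no DistributedDataParallel.
    bn = nn.SyncBatchNorm(num_features=8)
    bn.eval()  # eval mode uses running stats, so no collective ops are needed

    with torch.no_grad():
        out = bn(torch.randn(4, 8, 16, 16))  # raised an error before this fix
    print(out.shape)  # torch.Size([4, 8, 16, 16])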

@pytorchbot added the module: nn label (Related to torch.nn) on Aug 18, 2019.
@mrshenli (Contributor) left a comment:

Thanks for fixing this! Shall we add a test for it (SyncBatchNorm without DDP)?
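
A test along those lines might look like the sketch below; the test name and structure are illustrative, not the test that eventually landed. In eval mode, SyncBatchNorm should match a plain BatchNorm2d that has identical parameters and running stats, with no process group initialized:

    import torch
    import torch.nn as nn

    def test_sync_batchnorm_without_ddp_inference():
        # Reference module; copy its parameters and running stats into a
        # SyncBatchNorm that has no process group behind it.
        bn = nn.BatchNorm2d(4).eval()
        sync_bn = nn.SyncBatchNorm(4).eval()
        sync_bn.load_state_dict(bn.state_dict())

        x = torch.randn(2, 4, 8, 8)
        with torch.no_grad():
            assert torch.allclose(bn(x), sync_bn(x))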

An inline review thread on this hunk:

    process_group = torch.distributed.group.WORLD
    if self.process_group:
        process_group = self.process_group
    world_size = torch.distributed.get_world_size(process_group)
A Contributor commented:

process_group.size() should work as well.

ppwwyyxx (Collaborator, Author) replied:

From the implementation of _get_group_size, it looks like .size() is not sufficient? For the default group, the group argument is the GroupMember.WORLD sentinel rather than the process group object itself, so the helper has to special-case it:

    def _get_group_size(group):
        """
        Helper that gets a given group's world size
        """
        if group is GroupMember.WORLD:
            _check_default_pg()
            return _default_pg.size()
        if group not in _pg_group_ranks:
            raise RuntimeError("The given group does not exist")
        return len(_pg_group_ranks[group])

@mrshenli (Contributor) left a comment:

Hey @ppwwyyxx, thanks for adding the fix. Feel free to land if you need this feature urgently, but would really prefer to have a test for it.

@facebook-github-bot (Contributor) left a comment:

@ppwwyyxx is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary:
Fix pytorch#22538
Pull Request resolved: pytorch#24815

Test Plan:
Can run a detectron2 evaluation without entering DDP.

#sandcastle

Differential Revision: D16883694

fbshipit-source-id: a9199a82a55ba784a9d80005969cccf32d6b4827
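
To illustrate that test plan, a single-process evaluation of a SyncBatchNorm-converted model might look like the sketch below; the toy model stands in for a detectron2 model and is not from the PR:

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
    # Convert BatchNorm layers to SyncBatchNorm, as a DDP training setup would.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

    # Evaluate in a single process: no init_process_group, no DDP wrapper.
    model.eval()
    with torch.no_grad():
        out = model(torch.randn(1, 3, 32, 32))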
@facebook-github-bot (Contributor) commented:

@ppwwyyxx merged this pull request in 927fb56.

@ppwwyyxx ppwwyyxx deleted the export-D16883694 branch September 26, 2024 23:24
Labels: Merged; module: nn (Related to torch.nn)

Linked issue this PR closes: SyncBatchNorm test mode (#22538)

5 participants