import torch: AttributeError: module 'torch.distributed' has no attribute 'BuiltinCommHookType' #47153

Closed
kshitij12345 opened this issue Oct 31, 2020 · 2 comments
Labels: high priority, oncall: distributed, triage review

Comments

@kshitij12345
Collaborator

kshitij12345 commented Oct 31, 2020

After building the latest master (ee0033a) with:

USE_DISTRIBUTED=0 USE_GLOO=0 BUILD_TEST=0 USE_CUDA=1 USE_MKLDNN=0 DEBUG=0 python setup.py install

On importing torch, I get the following error:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.conda/envs/PytorchENV/lib/python3.7/site-packages/torch/__init__.py", line 526, in <module>
    from .functional import *
  File "/home/user/.conda/envs/PytorchENV/lib/python3.7/site-packages/torch/functional.py", line 6, in <module>
    import torch.nn.functional as F
  File "/home/user/.conda/envs/PytorchENV/lib/python3.7/site-packages/torch/nn/__init__.py", line 3, in <module>
    from .parallel import DataParallel
  File "/home/user/.conda/envs/PytorchENV/lib/python3.7/site-packages/torch/nn/parallel/__init__.py", line 5, in <module>
    from .distributed import DistributedDataParallel
  File "/home/user/.conda/envs/PytorchENV/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 93, in <module>
    class DistributedDataParallel(Module):
  File "/home/user/.conda/envs/PytorchENV/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 1064, in DistributedDataParallel
    self, comm_hook_type: dist.BuiltinCommHookType
AttributeError: module 'torch.distributed' has no attribute 'BuiltinCommHookType'
>>> 

Previously, the same build imported without any issue.

Probably related to ee0033a
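
The traceback points at a parameter annotation inside the DistributedDataParallel class body, which would explain why a plain `import torch` fails: on interpreters that evaluate annotations eagerly (such as the Python 3.7 shown above), the annotation expression runs as soon as the `def` statement executes. Below is a minimal sketch of that mechanism. It does not use torch at all; `fake_dist`, `FakeDDP` and `register_hook` are hypothetical names standing in for a `torch.distributed` module built without distributed support and for the method in the traceback.

```python
import types

# Hypothetical stand-in for torch.distributed in a USE_DISTRIBUTED=0 build:
# the module exists, but BuiltinCommHookType was never bound on it.
fake_dist = types.ModuleType("fake_dist")


def define_class():
    # Only defines the class; the method itself is never called.
    class FakeDDP:
        def register_hook(self, comm_hook_type: fake_dist.BuiltinCommHookType):
            pass

    return FakeDDP


try:
    define_class()
    # Reached only on interpreters that defer annotation evaluation.
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

With eager annotation evaluation, the AttributeError is raised while the class is being defined, before any method is called, which matches the shape of the traceback above.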

cc @ezyang @gchanan @zou3519 @bdhirsh @heitorschueroff @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @xush6528 @osalpekar @jiayisuse @agolynski

@facebook-github-bot added the oncall: distributed label Oct 31, 2020
@jonasteuwen
Contributor

Same problem on OS X.

@robieta

robieta commented Nov 2, 2020

@SciPioneer This breaks any USE_DISTRIBUTED=0 build.

wayi1 pushed a commit that referenced this issue Nov 2, 2020
… as built-in comm hooks"

Revert the diff because of #47153

Original PR issue: C++ DDP Communication Hook #46348

Differential Revision: [D24691866](https://our.internmc.facebook.com/intern/diff/D24691866/)

ghstack-source-id: 115720415
Pull Request resolved: #47234
facebook-github-bot pushed a commit that referenced this issue Nov 3, 2020
… as built-in comm hooks" (#47234)

Summary:
Pull Request resolved: #47234

Revert the diff because of #47153

Original PR issue: C++ DDP Communication Hook #46348
ghstack-source-id: 115720415

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D24691866

fbshipit-source-id: 58fe0c45943a2ae2a09fe5d5eac4a4d947586539
wayi1 self-assigned this Nov 3, 2020
wayi1 closed this as completed Nov 3, 2020
wayi1 pushed a commit that referenced this issue Nov 3, 2020
…in comm hooks

This is almost the same as #46959, except that in caffe2/torch/nn/parallel/distributed.py, BuiltinCommHookType is imported conditionally, only when dist.is_available(). Otherwise, this Python enum type, defined in caffe2/torch/csrc/distributed/c10d/init.cpp, cannot be imported. This is similar to ReduceOp, another enum type defined in the same file. See #47153

To review the diff on top of #46959, compare V1 vs Latest.

Main Changes in V1 (#46959):
1. Implemented the Pybind part.
2. In the reducer, once the builtin_comm_hook_type is set, a C++ comm hook instance is created in Reducer::autograd_hook.
3. Added unit tests for the built-in comm hooks.

Original PR issue: C++ DDP Communication Hook #46348

Differential Revision: [D24700959](https://our.internmc.facebook.com/intern/diff/D24700959/)

ghstack-source-id: 115753518
Pull Request resolved: #47270
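
The guard described in this commit message can be sketched as follows. This illustrates the pattern only and is not the code from the PR: `make_fake_dist`, `guarded_lookup` and the enum member names are hypothetical, and the stand-in module mimics only the `is_available()` call that `torch.distributed` provides.

```python
import enum
import types


def make_fake_dist(available: bool) -> types.ModuleType:
    """Build a hypothetical stand-in for torch.distributed."""
    mod = types.ModuleType("fake_dist")
    mod.is_available = lambda: available
    if available:
        # Hypothetical enum; the member names are illustrative only.
        mod.BuiltinCommHookType = enum.Enum(
            "BuiltinCommHookType", ["HOOK_A", "HOOK_B"]
        )
    return mod


def guarded_lookup(dist: types.ModuleType):
    # Touch the enum only when the package reports itself as available.
    if dist.is_available():
        return dist.BuiltinCommHookType
    return None


with_dist = guarded_lookup(make_fake_dist(available=True))
without_dist = guarded_lookup(make_fake_dist(available=False))
print(with_dist is not None, without_dist is None)
```

The lookup is skipped entirely when `is_available()` returns False, so a build without distributed support never references the missing attribute.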
facebook-github-bot pushed a commit that referenced this issue Nov 4, 2020
…in comm hooks (#47270)

Summary:
Pull Request resolved: #47270

This is almost the same as #46959, except that in caffe2/torch/nn/parallel/distributed.py, BuiltinCommHookType should be imported conditionally, only when dist.is_available(). Otherwise, this Python enum type, defined in caffe2/torch/csrc/distributed/c10d/init.cpp, cannot be imported. See #47153

I tried to follow the approach used for ReduceOp, another enum type defined in the same file, but it did not work: the C++ enum class behind ReduceOp is defined in the torch/lib/c10d library, whereas BuiltinCommHookType is defined in the torch/csrc/distributed library, and the two libraries are compiled in different ways.

To avoid adding typing to the distributed package, which could be a project of its own, I simply removed the BuiltinCommHookType argument type annotation in this file.
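
A minimal sketch of why dropping the annotation is enough to keep the import working, under the same assumptions as before: `fake_dist`, `FakeDDP` and `register_hook` are hypothetical names, and no torch code is involved.

```python
import types

# Hypothetical stand-in for torch.distributed without the enum bound on it.
fake_dist = types.ModuleType("fake_dist")


class FakeDDP:
    # No annotation on comm_hook_type, so nothing on fake_dist is looked up
    # while the class body is being executed.
    def register_hook(self, comm_hook_type):
        return comm_hook_type


ddp = FakeDDP()
result = ddp.register_hook("any value")
print(result)
```

The trade-off is that the parameter loses its static type information, which is the typing work the commit message chooses to leave for later.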

To review the diff on top of #46959, compare V1 vs Latest:
https://www.internalfb.com/diff/D24700959?src_version_fbid=270445741055617

Main Changes in V1 (#46959):
1. Implemented the Pybind part.
2. In the reducer, once the builtin_comm_hook_type is set, a C++ comm hook instance is created in Reducer::autograd_hook.
3. Added unit tests for the built-in comm hooks.

Original PR issue: C++ DDP Communication Hook #46348
ghstack-source-id: 115783237

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_builtin_ddp_comm_hooks_nccl

//arvr/projects/eye_tracking/Masquerade:python_test

USE_DISTRIBUTED=0 USE_GLOO=0 BUILD_TEST=0 USE_CUDA=1 USE_MKLDNN=0 DEBUG=0 python setup.py install

Reviewed By: mrshenli

Differential Revision: D24700959

fbshipit-source-id: 69f303a48ae275aa856e6e9b50e12ad8602e1c7a