DataParallel with Torch 1.5 #40457
Comments
This appears to be a regression, so I am tentatively labeling it as high-pri.
This is a known regression: after #33907, module replicas created by DataParallel no longer expose their parameters through parameters().
If you really need to access those parameters, one hacky solution is to read them the way torch/nn/parallel/distributed.py does (lines 344 to 354 at commit 8066fba).
cc @ngimel
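The workaround referenced in distributed.py amounts to falling back to a _former_parameters dict stored on each replicated submodule when the ordinary parameters() call comes back empty. A minimal, torch-free sketch of that pattern (the DummyModule class and every name except _former_parameters are made up for illustration; they are not the actual torch internals):

```python
def replica_parameters(module):
    """Yield parameters of a module, preferring the _former_parameters
    dict that DataParallel replication stores on each submodule."""
    for m in module.modules():
        if hasattr(m, "_former_parameters"):
            # Replica: real parameters live in this side dict.
            yield from m._former_parameters.values()
        else:
            # Ordinary module: use the normal parameter registry.
            yield from m.parameters(recurse=False)

class DummyModule:
    """Torch-free stand-in for nn.Module, for illustration only."""
    def __init__(self, former=None, own=None):
        self._own = own or []
        if former is not None:
            self._former_parameters = former
        self._children = []

    def modules(self):
        yield self
        for c in self._children:
            yield from c.modules()

    def parameters(self, recurse=True):
        return iter(self._own)

# A "replica" root whose parameters moved into _former_parameters,
# plus a child that still registers its parameter normally.
root = DummyModule(former={"weight": "w0", "bias": "b0"})
root._children.append(DummyModule(own=["w1"]))
print(list(replica_parameters(root)))  # ['w0', 'b0', 'w1']
```

The same traversal works on both replicas and ordinary modules, which is why it is a usable (if hacky) drop-in for parameters() in this situation.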
Thank you for answering. I don't need to access those parameters directly, but this bug caused a crash when I was using the Hugging Face Transformers package. I downgraded to 1.4 and it worked fine.
Is there an issue tracking the intersection of this and Hugging Face Transformers?
According to @ngimel, Hugging Face already has an update that deals with this BC breakage.
I ran into a similar issue. It turns out self.parameters() is called when figuring out which GPU is in use. In my case, I made the following change to the Hugging Face implementation, and it works.
@wmmxk Thanks for the heads up! I can confirm this is a bug, and your fix (along with one additional fix, noted below) resolves the issue. Two changes I made: in transformers/generation_utils.py, change
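The exact diff did not survive in this transcript, but the fix being described follows a common pattern: instead of deriving the device from next(self.parameters()).device, which raises StopIteration on a DataParallel replica whose parameter list is empty, cache the device before replication and read the cached value. A torch-free sketch of the two variants (class and attribute names are illustrative, not the actual Transformers code):

```python
class BrokenModel:
    """Derives its device from the parameter list at call time."""
    def __init__(self, params, device):
        self._params = params  # empty on a DataParallel replica

    def parameters(self):
        return iter(self._params)

    def current_device(self):
        # Raises StopIteration when the replica has no parameters.
        return next(self.parameters()).device

class FixedModel:
    """Caches the device at construction, before replication."""
    def __init__(self, params, device):
        self._params = params
        self._device = device  # survives replication unchanged

    def current_device(self):
        return self._device

replica = FixedModel(params=[], device="cuda:1")
print(replica.current_device())  # cuda:1
```

Any approach that avoids touching parameters() inside the replicated forward path (a cached attribute, an explicitly passed device argument) sidesteps the regression the same way.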
🐛 Bug
I tried to leverage multiple GPUs using nn.DataParallel. I got an error with torch 1.5, but the same code works with torch 1.4.
To Reproduce
I tested it with the code in this tutorial from PyTorch.org
The following code can be used to reproduce the error:
And I got the following error message:
Expected behavior
With torch 1.4, I got the following output without any error.
Environment
Collecting environment information...
PyTorch version: 1.5.1
Is debug build: No
CUDA used to build PyTorch: 10.2
OS: Ubuntu 18.04.3 LTS
GCC version: (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
CMake version: Could not collect
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
Nvidia driver version: 418.87.01
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
Versions of relevant libraries:
[pip3] numpy==1.18.5
[pip3] torch==1.5.1
[conda] Could not collect
cc @ezyang @gchanan @zou3519