nn.parallel.gather does not accept scalars (0-dim tensors) in v0.4 #6983
Comments
That's a bug, we should handle this. Thanks for the report!
This has the same problem as ... Thoughts?
Another possibility would be to change the API so that dp.gather returns a list/tuple; from there you could just add the results (which works for everything) or concatenate them (which works for everything except scalars), but that would push the problem onto other APIs instead of the data parallel ones.
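A rough sketch of what that would look like from the caller's side, assuming a hypothetical variant of dp.gather that hands back the per-device results as a list instead of concatenating them:

```python
import torch

# Hypothetical: suppose gather returned the per-device losses as a list
# instead of a single concatenated tensor.
per_device_losses = [torch.tensor(0.53), torch.tensor(0.71)]  # 0-dim tensors

total = sum(per_device_losses)    # addition works for scalars and vectors alike
# torch.cat(per_device_losses)    # concatenation fails for 0-dim tensors
```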
Yet another possibility: we could use ...
Hmm, good point. I don't think we should be adding a flag like this. It would be much simpler to check that no inputs are scalars, and raise a readable error if they are.
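A minimal sketch of that check (a hypothetical helper, not the code that actually landed in PyTorch):

```python
import torch

def check_gatherable(outputs):
    # Hypothetical validation: refuse 0-dim tensors with a readable message
    # instead of letting gather fail deep inside torch.cat.
    for i, out in enumerate(outputs):
        if isinstance(out, torch.Tensor) and out.dim() == 0:
            raise ValueError(
                "gather received a 0-dim tensor at position {}; scalars cannot "
                "be concatenated. Unsqueeze per-device outputs to 1-dim tensors "
                "before gathering.".format(i))
```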
Indeed, fixed by #7973. However, your script should be changed a bit. In particular, use `random_input_scatter = nn.parallel.scatter([random_input], devices)`, because otherwise each device will see a single tensor in ..., so your target should also be `random_target = torch.randn((4, 1))`. Without these changes it will only work with batch size == num devices. However, considering how subtle this is, I think it makes sense to let ...
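For reference, a small sketch of the scatter difference being pointed out here (illustrative names, assuming two visible GPUs):

```python
import torch
import torch.nn as nn

devices = [0, 1]
random_input = torch.randn(4, 1)

# Scattering the bare tensor chunks it along dim 0:
# each device gets one (2, 1) tensor.
chunks = nn.parallel.scatter(random_input, devices)

# Scattering a list containing the tensor chunks every element instead,
# so each device receives its chunk wrapped as an argument container,
# which is the form parallel_apply expects.
args_per_device = nn.parallel.scatter([random_input], devices)
```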
Issue description
Since v0.4 the loss is returned as a scalar (0-dim tensor), and gathering the scalar losses manually raises an error, as in the example below. Unsqueezing the scalar losses back to 1-dim tensors, as in previous versions, works, but is this the intended behavior of `nn.parallel.gather`? The parallel-GPU code scheme in question is the one used in the Annotated Transformer implementation.
Code example
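The original snippet is not preserved on this page, so the following is a hedged reconstruction of the failure and the unsqueeze workaround (names such as random_input and random_target follow the comment above; assumes at least two GPUs):

```python
import torch
import torch.nn as nn

devices = [0, 1]
model = nn.Linear(1, 1).cuda(devices[0])
criterion = nn.MSELoss()

random_input = torch.randn(4, 1).cuda(devices[0])
random_target = torch.randn(4, 1).cuda(devices[0])

replicas = nn.parallel.replicate(model, devices)
# Wrap tensors in lists so each replica receives its chunk as an argument list.
random_input_scatter = nn.parallel.scatter([random_input], devices)
random_target_scatter = nn.parallel.scatter([random_target], devices)

outputs = nn.parallel.parallel_apply(replicas, random_input_scatter)

# Since v0.4, each per-device loss is a 0-dim tensor (scalar).
losses = [criterion(out, tgt[0]) for out, tgt in zip(outputs, random_target_scatter)]

# Raises an error on v0.4: gather cannot concatenate 0-dim tensors.
# total = nn.parallel.gather(losses, devices[0]).sum()

# Workaround: unsqueeze back to 1-dim tensors before gathering.
total = nn.parallel.gather([l.unsqueeze(0) for l in losses], devices[0]).sum()
```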
System Info
PyTorch version: 0.4.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
CMake version: version 3.5.1
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration:
GPU 0: TITAN Xp
GPU 1: TITAN Xp
GPU 2: TITAN Xp
GPU 3: TITAN Xp
Nvidia driver version: 390.30
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.0.5
/usr/lib/x86_64-linux-gnu/libcudnn_static.a
/usr/local/MATLAB/R2017b/bin/glnxa64/libcudnn.so.5.1.5
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn.so.6.0.21
/usr/local/cuda-8.0/targets/x86_64-linux/lib/libcudnn_static.a
Versions of relevant libraries:
[pip] msgpack-numpy (0.4.1)
[pip] numpy (1.13.3)
[pip] torch (0.4.0)
[pip] torchfile (0.1.0)
[pip] torchnet (0.0.1)
[pip] torchtext (0.2.3)
[pip] torchvision (0.2.1)
[conda] cuda90 1.0 h6433d27_0 pytorch
[conda] pytorch 0.4.0 py36_cuda9.0.176_cudnn7.1.2_1 [cuda90] pytorch
[conda] torchfile 0.1.0
[conda] torchnet 0.0.1
[conda] torchtext 0.2.3
[conda] torchvision 0.2.1 py36_1 pytorch