Wrong conv2d output on GPU when kernel has many zeros #35655
Labels
module: convolution
Problems related to convolutions (THNN, THCUNN, CuDNN)
module: cuda
Related to torch.cuda, and CUDA support in general
module: dependency bug
Problem is not caused by us, but caused by an upstream library we use
module: numerical-stability
Problems related to numerical stability of operations
triaged
This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 Bug
When the kernel has many zeros (e.g. in a masked convolution), the conv2d output is wrong on GPU.
To Reproduce
Here's a test function, in which I convolve x
0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 1 1 1
0 0 1 1 1
with k
1 1 1 1 1
1 1 1 1 1
1 1 0 0 0
1 1 0 0 0
1 1 0 0 0
On CPU, the output is 0 (expected, since the nonzero regions of x and k do not overlap). On GPU, the output varies depending on the number of channels of x.
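The original test function is not included above, so here is a minimal reproduction sketch based on the description: the input is zero except for a 3x3 block of ones in the bottom-right corner, and the kernel is ones except for a zero 3x3 block in the same corner, so the valid 5x5 convolution must be exactly 0. The function name `repro` and the channel count are illustrative choices, not from the original report.

```python
import torch
import torch.nn.functional as F

def repro(channels, device):
    # Input x: zeros, with a 3x3 block of ones in the bottom-right corner.
    x = torch.zeros(1, channels, 5, 5, device=device)
    x[:, :, 2:, 2:] = 1.0
    # Kernel k: ones, with a 3x3 block of zeros in the bottom-right corner.
    k = torch.ones(1, channels, 5, 5, device=device)
    k[:, :, 2:, 2:] = 0.0
    # The supports of x and k are disjoint, so every elementwise product
    # in the convolution is zero and the single output value must be 0.
    return F.conv2d(x, k).item()

print("cpu:", repro(channels=64, device="cpu"))   # 0.0
if torch.cuda.is_available():
    # On the affected setup (PyTorch 1.4, CUDA 10.0, V100), this can be
    # nonzero for some channel counts.
    print("cuda:", repro(channels=64, device="cuda"))
```

Varying `channels` changes which (possibly nonzero) value the GPU path produces, which is what the report describes.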
Environment
PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.0
OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CMake version: Could not collect
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
GPU 4: Tesla V100-SXM2-32GB
GPU 5: Tesla V100-SXM2-32GB
GPU 6: Tesla V100-SXM2-32GB
GPU 7: Tesla V100-SXM2-32GB
Nvidia driver version: 410.79
cuDNN version: Could not collect
Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] torch==1.4.0
[pip] torchvision==0.5.0
[conda] blas 1.0 mkl
[conda] mkl 2020.0 166
[conda] mkl-service 2.3.0 py36he904b0f_0
[conda] mkl_fft 1.0.15 py36ha843d7b_0
[conda] mkl_random 1.1.0 py36hd6b4f25_0
[conda] pytorch 1.4.0 py3.6_cuda10.0.130_cudnn7.6.3_0 pytorch
[conda] torchvision 0.5.0 py36_cu100 pytorch
cc @ezyang @gchanan @zou3519 @ngimel