
Wrong conv2d output on GPU when kernel has many zeros #35655

Open

jtxapl opened this issue Mar 30, 2020 · 3 comments
Labels

module: convolution - Problems related to convolutions (THNN, THCUNN, CuDNN)
module: cuda - Related to torch.cuda, and CUDA support in general
module: dependency bug - Problem is not caused by us, but caused by an upstream library we use
module: numerical-stability - Problems related to numerical stability of operations
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


jtxapl commented Mar 30, 2020

🐛 Bug

When the kernel has many zeros (e.g., in a masked convolution), the conv2d output is wrong on GPU.

To Reproduce

Here's a test function, in which I convolve x

0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 1 1 1
0 0 1 1 1

with k

1 1 1 1 1
1 1 1 1 1
1 1 0 0 0
1 1 0 0 0
1 1 0 0 0

On CPU, the output is 0 (as expected). On GPU, the output varies with the channel count of x.

import torch

def test(c):
    x = torch.ones(1, c, 5, 5)
    x[:, :, :2] = 0     # zero out the top two rows
    x[:, :, :, :2] = 0  # zero out the left two columns
    k = 1 - x           # complementary mask: k is nonzero exactly where x is zero
    print(torch.nn.functional.conv2d(x.cuda(), k.cuda()))
    print(torch.nn.functional.conv2d(x.cpu(), k.cpu()))

test(32)
tensor([[[[-1.9055e-05]]]], device='cuda:0')
tensor([[[[0.]]]])

test(320)                                                                       
tensor([[[[-0.1693]]]], device='cuda:0')
tensor([[[[0.]]]])

test(640)                                                                       
tensor([[[[5.0290]]]], device='cuda:0')
tensor([[[[0.]]]])
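For reference, the exact result is zero by construction: with a 5x5 kernel over a 5x5 input and no padding, the single output element is sum(x * k), and the nonzero regions of x and k are disjoint. A quick sanity check of that claim (a sketch, not part of the original report):

import torch

c = 640
x = torch.ones(1, c, 5, 5)
x[:, :, :2] = 0
x[:, :, :, :2] = 0
k = 1 - x
# With no padding, the 5x5 kernel fits the 5x5 input exactly once,
# so the lone output element equals (x * k).sum().
print((x * k).sum())  # tensor(0.) -- the supports of x and k do not overlap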

Environment

PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.0

OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CMake version: Could not collect

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
GPU 4: Tesla V100-SXM2-32GB
GPU 5: Tesla V100-SXM2-32GB
GPU 6: Tesla V100-SXM2-32GB
GPU 7: Tesla V100-SXM2-32GB

Nvidia driver version: 410.79
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] torch==1.4.0
[pip] torchvision==0.5.0
[conda] blas 1.0 mkl
[conda] mkl 2020.0 166
[conda] mkl-service 2.3.0 py36he904b0f_0
[conda] mkl_fft 1.0.15 py36ha843d7b_0
[conda] mkl_random 1.1.0 py36hd6b4f25_0
[conda] pytorch 1.4.0 py3.6_cuda10.0.130_cudnn7.6.3_0 pytorch
[conda] torchvision 0.5.0 py36_cu100 pytorch

cc @ezyang @gchanan @zou3519 @ngimel

ailzhang (Contributor) commented

I can confirm this is reproducible on master.
Marking as high pri for CUDA to fix.

ailzhang added the high priority and module: cuda labels Mar 30, 2020
ngimel (Collaborator) commented Mar 31, 2020

On my device, the output was

tensor([[[[0.]]]], device='cuda:0')
tensor([[[[0.]]]])
tensor([[[[0.]]]], device='cuda:0')
tensor([[[[0.]]]])
tensor([[[[5.0290]]]], device='cuda:0')
tensor([[[[0.]]]])

Only the last convolution produced the wrong result. Looking at the profile, the Winograd algorithm was used in this case, and since it first applies transforms to the input and output, the accuracy is expected to be lower, especially for a big kernel like 5x5.
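For reference, one way to see which algorithm was picked is to profile the call and inspect the CUDA kernel names; a minimal sketch (kernel names vary across cuDNN versions, so treat the "winograd" substring as a heuristic):

import torch

x = torch.ones(1, 640, 5, 5, device='cuda')
x[:, :, :2] = 0
x[:, :, :, :2] = 0
k = 1 - x

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    torch.nn.functional.conv2d(x, k)
# Winograd-based kernels usually carry "winograd" in their name;
# implicit-GEMM kernels carry "implicit_gemm" or similar.
print(prof.key_averages().table(sort_by="cuda_time_total"))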
You can work around this by setting

torch.backends.cudnn.deterministic = True

This causes the implicit-GEMM-based algorithm to be used, which produces a more accurate result.
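Applied to the failing case from the report, the workaround looks like this (a sketch; the flag must be set before the convolution runs):

import torch

torch.backends.cudnn.deterministic = True  # steer cuDNN away from the Winograd algorithm

x = torch.ones(1, 640, 5, 5)
x[:, :, :2] = 0
x[:, :, :, :2] = 0
k = 1 - x
# Should now match the CPU result of 0 instead of 5.0290
print(torch.nn.functional.conv2d(x.cuda(), k.cuda()))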

ezyang added the module: convolution label Mar 31, 2020
jtxapl (Author) commented Mar 31, 2020

> tensor([[[[5.0290]]]], device='cuda:0')
> tensor([[[[0.]]]])
>
> Only the last convolution produced the wrong result. Looking at the profile, the Winograd algorithm was used in this case, and since it first applies transforms to the input and output, the accuracy is expected to be lower, especially for a big kernel like 5x5.

But an error of 5 feels a bit too large? Also, what's so special about 640 channels? The error is much smaller elsewhere. Could there be some other hidden bugs?

In [2]: test(320)
tensor([[[[-0.1693]]]], device='cuda:0')
tensor([[[[0.]]]])

In [3]: test(640)
tensor([[[[5.0290]]]], device='cuda:0')
tensor([[[[0.]]]])

In [4]: test(960)
tensor([[[[0.0085]]]], device='cuda:0')
tensor([[[[0.]]]])

In [5]: test(1024)
tensor([[[[0.0150]]]], device='cuda:0')
tensor([[[[0.]]]])
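For what it's worth, the erratic error sizes are consistent with float32 cancellation: the Winograd transforms produce large intermediates of mixed sign whose exact cancellation is lost to rounding, so the residue depends on accumulation order and operand magnitudes rather than growing smoothly with channel count. A standalone sketch of the effect (plain tensor sums, not cuDNN's actual arithmetic):

import torch

torch.manual_seed(0)
v = torch.randn(16000) * 100        # large-magnitude stand-ins for transformed products
w = torch.cat([v, -v])              # the exact sum is 0
w = w[torch.randperm(w.numel())]    # shuffle so cancellation happens at varied partial sums
print(w.sum())                      # float32: typically a small nonzero residue
print(w.double().sum())             # float64: much closer to 0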

> You can work around this by setting
>
> torch.backends.cudnn.deterministic = True
>
> This causes the implicit-GEMM-based algorithm to be used, which produces a more accurate result.

I confirm this works.

ezyang added the module: dependency bug, module: numerical-stability, and triaged labels Apr 1, 2020