
Wrong conv2d output on GPU when kernel has many zeros #35655

Open

jtxapl opened this issue Mar 30, 2020 · 3 comments
Labels

module: convolution - Problems related to convolutions (THNN, THCUNN, CuDNN)
module: cuda - Related to torch.cuda, and CUDA support in general
module: dependency bug - Problem is not caused by us, but caused by an upstream library we use
module: numerical-stability - Problems related to numerical stability of operations
triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


jtxapl commented Mar 30, 2020

🐛 Bug

When the kernel has many zeros (e.g., in a masked convolution), the conv2d output is wrong on GPU.

To Reproduce

Here's a test function, in which I convolve x

0 0 0 0 0
0 0 0 0 0
0 0 1 1 1
0 0 1 1 1
0 0 1 1 1

with k

1 1 1 1 1
1 1 1 1 1
1 1 0 0 0
1 1 0 0 0
1 1 0 0 0

On CPU, the output is 0 (as expected). On GPU, the output varies with the channel count of x.

import torch

def test(c):
    x = torch.ones(1, c, 5, 5)
    x[:, :, :2] = 0     # zero out the top two rows
    x[:, :, :, :2] = 0  # zero out the left two columns
    k = 1 - x           # complementary mask: k is nonzero exactly where x is zero
    print(torch.nn.functional.conv2d(x.cuda(), k.cuda()))
    print(torch.nn.functional.conv2d(x.cpu(), k.cpu()))

test(32)
tensor([[[[-1.9055e-05]]]], device='cuda:0')
tensor([[[[0.]]]])

test(320)                                                                       
tensor([[[[-0.1693]]]], device='cuda:0')
tensor([[[[0.]]]])

test(640)                                                                       
tensor([[[[5.0290]]]], device='cuda:0')
tensor([[[[0.]]]])
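For reference, the exact result is zero by construction: with a 5x5 kernel over a 5x5 input and no padding, the single output element is sum(x * k), and the nonzero regions of x and k are disjoint. A quick sanity check of that claim (a sketch, not part of the original report):

import torch

c = 640
x = torch.ones(1, c, 5, 5)
x[:, :, :2] = 0
x[:, :, :, :2] = 0
k = 1 - x
# With no padding, the 5x5 kernel fits the 5x5 input exactly once,
# so the lone output element equals (x * k).sum().
print((x * k).sum())  # tensor(0.) -- the supports of x and k do not overlap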

Environment

PyTorch version: 1.4.0
Is debug build: No
CUDA used to build PyTorch: 10.0

OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
CMake version: Could not collect

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla V100-SXM2-32GB
GPU 1: Tesla V100-SXM2-32GB
GPU 2: Tesla V100-SXM2-32GB
GPU 3: Tesla V100-SXM2-32GB
GPU 4: Tesla V100-SXM2-32GB
GPU 5: Tesla V100-SXM2-32GB
GPU 6: Tesla V100-SXM2-32GB
GPU 7: Tesla V100-SXM2-32GB

Nvidia driver version: 410.79
cuDNN version: Could not collect

Versions of relevant libraries:
[pip] numpy==1.18.1
[pip] torch==1.4.0
[pip] torchvision==0.5.0
[conda] blas 1.0 mkl
[conda] mkl 2020.0 166
[conda] mkl-service 2.3.0 py36he904b0f_0
[conda] mkl_fft 1.0.15 py36ha843d7b_0
[conda] mkl_random 1.1.0 py36hd6b4f25_0
[conda] pytorch 1.4.0 py3.6_cuda10.0.130_cudnn7.6.3_0 pytorch
[conda] torchvision 0.5.0 py36_cu100 pytorch

cc @ezyang @gchanan @zou3519 @ngimel

ailzhang (Contributor) commented

I can confirm this is reproducible on master.
Marking as high pri for CUDA to fix.

ailzhang added the high priority and module: cuda labels Mar 30, 2020
ngimel (Collaborator) commented Mar 31, 2020

On my device, the output was

tensor([[[[0.]]]], device='cuda:0')
tensor([[[[0.]]]])
tensor([[[[0.]]]], device='cuda:0')
tensor([[[[0.]]]])
tensor([[[[5.0290]]]], device='cuda:0')
tensor([[[[0.]]]])

Only the last convolution produced the wrong result. Looking at the profile, the Winograd algorithm was used in this case, and since it first applies transforms to the input and output, the accuracy is expected to be lower, especially for a big kernel like 5x5.
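For reference, one way to see which algorithm was picked is to profile the call and inspect the CUDA kernel names; a minimal sketch (kernel names vary across cuDNN versions, so treat the "winograd" substring as a heuristic):

import torch

x = torch.ones(1, 640, 5, 5, device='cuda')
x[:, :, :2] = 0
x[:, :, :, :2] = 0
k = 1 - x

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    torch.nn.functional.conv2d(x, k)
# Winograd-based kernels usually carry "winograd" in their name;
# implicit-GEMM kernels carry "implicit_gemm" or similar.
print(prof.key_averages().table(sort_by="cuda_time_total"))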
You can work around this by setting

torch.backends.cudnn.deterministic = True

This causes the implicit-GEMM-based algorithm to be used, which produces a more accurate result.
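Applied to the failing case from the report, the workaround looks like this (a sketch; the flag must be set before the convolution runs):

import torch

torch.backends.cudnn.deterministic = True  # steer cuDNN away from the Winograd algorithm

x = torch.ones(1, 640, 5, 5)
x[:, :, :2] = 0
x[:, :, :, :2] = 0
k = 1 - x
# Should now match the CPU result of 0 instead of 5.0290
print(torch.nn.functional.conv2d(x.cuda(), k.cuda()))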

ezyang added the module: convolution label Mar 31, 2020
jtxapl (Author) commented Mar 31, 2020

> tensor([[[[5.0290]]]], device='cuda:0')
> tensor([[[[0.]]]])
>
> Only the last convolution produced the wrong result. Looking at the profile, the Winograd algorithm was used in this case, and since it first applies transforms to the input and output, the accuracy is expected to be lower, especially for a big kernel like 5x5.

But an error of 5 feels a bit too large? Also, what's so special about 640 channels? The error is much smaller elsewhere. Could there be some other hidden bugs?

In [2]: test(320)
tensor([[[[-0.1693]]]], device='cuda:0')
tensor([[[[0.]]]])

In [3]: test(640)
tensor([[[[5.0290]]]], device='cuda:0')
tensor([[[[0.]]]])

In [4]: test(960)
tensor([[[[0.0085]]]], device='cuda:0')
tensor([[[[0.]]]])

In [5]: test(1024)
tensor([[[[0.0150]]]], device='cuda:0')
tensor([[[[0.]]]])
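For what it's worth, the erratic error sizes are consistent with float32 cancellation: the Winograd transforms produce large intermediates of mixed sign whose exact cancellation is lost to rounding, so the residue depends on accumulation order and operand magnitudes rather than growing smoothly with channel count. A standalone sketch of the effect (plain tensor sums, not cuDNN's actual arithmetic):

import torch

torch.manual_seed(0)
v = torch.randn(16000) * 100        # large-magnitude stand-ins for transformed products
w = torch.cat([v, -v])              # the exact sum is 0
w = w[torch.randperm(w.numel())]    # shuffle so cancellation happens at varied partial sums
print(w.sum())                      # float32: typically a small nonzero residue
print(w.double().sum())             # float64: much closer to 0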

> You can work around this by setting
>
> torch.backends.cudnn.deterministic = True
>
> This causes the implicit-GEMM-based algorithm to be used, which produces a more accurate result.

I confirm this works.

ezyang added the module: dependency bug, module: numerical-stability, and triaged labels Apr 1, 2020