
FP16 results in "Floating point exception" #14119

Closed

ngoyal2707 opened this issue Nov 16, 2018 · 3 comments

Comments

@ngoyal2707

🐛 Bug

FP16 results in "Floating point exception" with the following minimal example.

To Reproduce

Steps to reproduce the behavior:

>>> import torch
>>> from torch import nn
>>> x = torch.rand(1, 1, 188, 621).cuda().half()
>>> conv1 = nn.Conv2d(1, 64, kernel_size=1, bias=False).cuda().half()
>>> conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False).cuda().half()
>>> loss = conv2(conv1(x))
>>> loss.sum().backward()
Floating point exception

Expected behavior

The same calculation works correctly in FP32:

>>> x = torch.rand(1, 1, 188, 621).cuda()
>>> conv1 = nn.Conv2d(1, 64, kernel_size=1, bias=False).cuda()
>>> conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False).cuda() 
>>> loss = conv2(conv1(x))
>>> loss.sum().backward()
>>>

Environment

PyTorch version: 1.0.0.dev20181115
Is debug build: No
CUDA used to build PyTorch: 9.2.148

OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.12.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.2.88
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
GPU 4: Tesla V100-SXM2-16GB
GPU 5: Tesla V100-SXM2-16GB
GPU 6: Tesla V100-SXM2-16GB
GPU 7: Tesla V100-SXM2-16GB

Nvidia driver version: 396.51
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy (1.15.1)
[pip3] torch-nightly (1.0.0.dev20181115)
[pip3] torchvision (0.2.1)
[pip3] torchvision-nightly (0.2.1)
[conda] magma-cuda92              2.3.0                         1    pytorch
[conda] torch-nightly             1.0.0.dev20181115           <pip>
[conda] torchvision               0.2.1                     <pip>
[conda] torchvision-nightly       0.2.1                     <pip>
@ngimel (Collaborator) commented Nov 16, 2018

Unfortunately, the 9.2 nightlies are built with cuDNN 7.1.4, which has a known FPE (floating point exception) bug. The solution here is to build the nightlies with a more recent cuDNN version.
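
A possible interim workaround, assuming the FPE originates in cuDNN's convolution backward path, is to disable cuDNN so the convolutions fall back to PyTorch's native CUDA kernels (slower, but unaffected by the cuDNN 7.1.4 bug); this is a sketch, not verified in this thread:

>>> import torch
>>> torch.backends.cudnn.enabled = False  # skip cuDNN; use PyTorch's native kernels instead
>>> # rerunning the FP16 repro above should then complete backward() without the FPE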

@soumith (Member) commented Nov 16, 2018

I'm updating the cuDNN version today. Hopefully nightlies from tomorrow will have this fixed.

@soumith (Member) commented Nov 19, 2018

Tomorrow's nightlies should fix this; the cuDNN version has been bumped.
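
Once a new nightly is installed, the cuDNN version it was built against can be checked with PyTorch's public API; whether a given number reflects the bumped build is an assumption, but anything above 7104 is newer than the affected release:

>>> import torch
>>> torch.backends.cudnn.version()  # returns an int, e.g. 7104 == cuDNN 7.1.4 (the affected version)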

soumith closed this as completed Nov 19, 2018