
FP16 results in "Floating point exception" #14119

Closed

ngoyal2707 opened this issue Nov 16, 2018 · 3 comments

Comments

@ngoyal2707

🐛 Bug

FP16 results in "Floating point exception" with the following minimal example.

To Reproduce

Steps to reproduce the behavior:

>>> import torch
>>> from torch import nn
>>> x = torch.rand(1, 1, 188, 621).cuda().half()
>>> conv1 = nn.Conv2d(1, 64, kernel_size=1, bias=False).cuda().half()
>>> conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False).cuda().half()
>>> loss = conv2(conv1(x))
>>> loss.sum().backward()
Floating point exception

Expected behavior

The same calculation works correctly in FP32:

>>> x = torch.rand(1, 1, 188, 621).cuda()
>>> conv1 = nn.Conv2d(1, 64, kernel_size=1, bias=False).cuda()
>>> conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False).cuda() 
>>> loss = conv2(conv1(x))
>>> loss.sum().backward()
>>>

Environment

PyTorch version: 1.0.0.dev20181115
Is debug build: No
CUDA used to build PyTorch: 9.2.148

OS: Ubuntu 16.04.4 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
CMake version: version 3.12.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.2.88
GPU models and configuration: 
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
GPU 4: Tesla V100-SXM2-16GB
GPU 5: Tesla V100-SXM2-16GB
GPU 6: Tesla V100-SXM2-16GB
GPU 7: Tesla V100-SXM2-16GB

Nvidia driver version: 396.51
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy (1.15.1)
[pip3] torch-nightly (1.0.0.dev20181115)
[pip3] torchvision (0.2.1)
[pip3] torchvision-nightly (0.2.1)
[conda] magma-cuda92              2.3.0                         1    pytorch
[conda] torch-nightly             1.0.0.dev20181115           <pip>
[conda] torchvision               0.2.1                     <pip>
[conda] torchvision-nightly       0.2.1                     <pip>
@ngimel (Collaborator) commented Nov 16, 2018

Unfortunately, the 9.2 nightlies are built with cuDNN 7.1.4, which has a known FPE (floating point exception) bug. The solution here is to build the nightlies with a more recent cuDNN version.
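
A possible interim workaround, assuming the FPE originates in cuDNN's convolution backward path, is to disable cuDNN so the convolutions fall back to PyTorch's native CUDA kernels (slower, but unaffected by the cuDNN 7.1.4 bug); this is a sketch, not verified in this thread:

>>> import torch
>>> torch.backends.cudnn.enabled = False  # skip cuDNN; use PyTorch's native kernels instead
>>> # rerunning the FP16 repro above should then complete backward() without the FPE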

@soumith (Member) commented Nov 16, 2018

I'm updating the cuDNN version today. Hopefully nightlies from tomorrow will have this fixed.

@soumith (Member) commented Nov 19, 2018

Tomorrow's nightlies should fix this; the cuDNN version has been bumped.
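
Once a new nightly is installed, the cuDNN version it was built against can be checked with PyTorch's public API; whether a given number reflects the bumped build is an assumption, but anything above 7104 is newer than the affected release:

>>> import torch
>>> torch.backends.cudnn.version()  # returns an int, e.g. 7104 == cuDNN 7.1.4 (the affected version)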

soumith closed this as completed Nov 19, 2018