🐛 Bug
When the input is a torch.float16 tensor whose values are all 0, torch.nn.functional.layer_norm returns nan. This reproduces in PyTorch 1.4.0 and 1.5.1 (newer versions not tried), while PyTorch 1.3.1 behaves correctly and returns an all-zero tensor.
To Reproduce
The following code reproduces the issue: the final print shows a tensor whose values are all nan.
```python
import torch

dim = 10
x = torch.zeros(1, dim, device="cuda")  # all-zero float32 input
normalized_shape = (dim,)
weight = torch.ones(dim, device="cuda")
bias = torch.zeros(dim, device="cuda")
eps = 1e-12

# float32: prints an all-zero tensor, as expected
output = torch.nn.functional.layer_norm(x, normalized_shape, weight, bias, eps)
print(output)

# float16: prints an all-nan tensor
output = torch.nn.functional.layer_norm(x.half(), normalized_shape, weight.half(), bias.half(), eps)
print(output)
```
Expected behavior
Current output:
```
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0')
tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]], device='cuda:0', dtype=torch.float16)
```
Expected output:
```
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0')
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], device='cuda:0', dtype=torch.float16)
```
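One plausible cause, not confirmed against the PyTorch source, is that the float16 path rounds eps to float16 before adding it to the variance. An eps of 1e-12 is far below the smallest float16 subnormal (2^-24, about 6e-8), so it becomes 0; for an all-zero row the variance is also 0, so 1/sqrt(var + eps) is inf and (x - mean) * inf is 0 * inf = nan. The pure-Python sketch below reproduces that arithmetic without a GPU. The helpers to_half, rsqrt and layer_norm_1d are hypothetical and only emulate the rounding; they are not PyTorch code.

```python
import math
import struct


def to_half(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value (round-trip via struct 'e')."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def rsqrt(v: float) -> float:
    """1/sqrt(v) with IEEE semantics: rsqrt(0) is +inf instead of raising ZeroDivisionError."""
    return math.inf if v == 0.0 else 1.0 / math.sqrt(v)


def layer_norm_1d(xs, weight, bias, eps):
    """Textbook layer norm over one row; eps is used exactly as passed in."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    rstd = rsqrt(var + eps)
    return [(x - mean) * rstd * w + b for x, w, b in zip(xs, weight, bias)]


dim = 10
xs, weight, bias = [0.0] * dim, [1.0] * dim, [0.0] * dim
eps = 1e-12

print(to_half(eps))  # 0.0: 1e-12 underflows to zero in float16
print(layer_norm_1d(xs, weight, bias, to_half(eps))[:3])  # [nan, nan, nan]: 0 * inf
print(layer_norm_1d(xs, weight, bias, eps)[:3])  # [0.0, 0.0, 0.0]: eps kept at full precision
print(to_half(1e-5) > 0.0)  # True: the default LayerNorm eps of 1e-5 survives the cast
```

If this hypothesis holds, passing an eps that is representable in float16 (such as the default 1e-5) or running layer_norm on a float32 copy of the input and casting the result back to half would sidestep the nan.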
Environment
PyTorch version: 1.5.1
Is debug build: No
CUDA used to build PyTorch: 10.2
OS: Enterprise Linux Server 7.2 (Paladin)
GCC version: (GCC) 5.4.0
CMake version: Could not collect
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: Tesla V100-PCIE-32GB
GPU 1: Tesla V100-PCIE-32GB
GPU 2: Tesla V100-PCIE-32GB
GPU 3: Tesla V100-PCIE-32GB
Nvidia driver version: 440.33.01
cuDNN version: /home/common/cuda_toolkit/cudnn_7.4.2/cuda10.0/lib64/libcudnn.so.7.4.2
Versions of relevant libraries:
[pip] Could not collect
[conda] cudatoolkit 10.2.89 hfd86e86_1 anaconda
[conda] numpy 1.18.1 pypi_0 pypi
[conda] pytorch-memlab 0.1.0 pypi_0 pypi
[conda] torch 1.5.1 pypi_0 pypi
[conda] torchvision 0.6.1 pypi_0 pypi
Additional context
cc @ezyang @gchanan @zou3519 @albanD @mruberry