
torch.norm produces incorrect results #20551

Open
ecvgit opened this issue May 15, 2019 · 9 comments
Labels
module: norms and normalization · module: numerical-reproducibility · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@ecvgit

ecvgit commented May 15, 2019

🐛 Bug

torch.norm gives incorrect results on CPU in the latest nightly build as well as in 1.1.0 stable.

To Reproduce


>>> import torch
>>> a=torch.rand(2000,2000,64)
>>> b=torch.norm(a)
>>> c=torch.norm(a.cuda())
>>> b,c
(tensor(5792.6187), tensor(9237.8311, device='cuda:0'))

Expected behavior

Both b and c should have the same values.

Environment

PyTorch version: 1.1.0.dev20190514
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Red Hat Enterprise Linux Server release 7.4 (Maipo)
GCC version: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16)
CMake version: version 2.8.12.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Tesla K40m
GPU 1: Tesla K40m

Nvidia driver version: 387.26
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] msgpack-numpy==0.4.1
[pip3] numpy==1.14.3
[pip3] torch==0.4.0
[pip3] torchtext==0.2.3
[pip3] torchvision==0.2.1
[conda] blas 1.0 mkl
[conda] mkl 2018.0.2 1
[conda] mkl_fft 1.0.1 py36h3010b51_0
[conda] mkl_random 1.0.1 py36h629b387_0
[conda] pytorch-nightly 1.1.0.dev20190514 py3.6_cuda9.0.176_cudnn7.5.1_0 pytorch
[conda] torchtext 0.2.3

Additional context

@soumith
Member

soumith commented May 15, 2019

I think this is a float precision issue.

In float64, it seems to be working fine:

In [1]: import torch

In [2]: a=torch.rand(2000,2000,64)
In [3]: a[0][0][0]
Out[3]: tensor(0.4834)

In [4]: a=torch.rand(2000,2000,64, dtype=torch.float64)

In [5]: b=torch.norm(a)

In [6]: c=torch.norm(a.cuda())

In [7]: b, c
Out[7]:
(tensor(9237.5918, dtype=torch.float64),
 tensor(9237.5918, device='cuda:0', dtype=torch.float64))
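For what it's worth, the CPU number in the original report looks like float32 accumulator saturation rather than ordinary rounding noise (my reading of it, not something taken from the kernel code): near 2**25 the gap between adjacent float32 values is 4, so a running sum of squares of values drawn from [0, 1) effectively cannot grow past that region, and sqrt(2**25) is almost exactly the reported 5792.6187.

import math

# Back-of-the-envelope check (an assumption about the mechanism, not the kernel):
# the reported CPU value is essentially sqrt(2**25).
print(math.sqrt(2 ** 25))   # ~5792.6187, matching b above
# Near 2**25 the spacing between representable float32 values is 4, far larger
# than any single squared input from [0, 1), so a float32 running sum of those
# squares stops growing once it reaches that magnitude.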

@soumith
Member

soumith commented May 15, 2019

cc: @umanwizard to double-check. I think you made norm TensorIterator-compatible, if I remember correctly (sorry if it wasn't you).

@umanwizard
Contributor

No, it was done by @jjsjann123 in #15414

@ecvgit
Author

ecvgit commented May 15, 2019

The code above gives correct results in PyTorch 0.4, but not in PyTorch 1.1.0. If it is a float precision issue, I'm not sure why it works correctly in PyTorch 0.4.

@colesbury
Member

> If it is a float precision issue, I'm not sure why it works correctly in PyTorch 0.4.

PyTorch 0.4 did the accumulation using double: https://github.com/pytorch/pytorch/blob/v0.4.1/aten/src/TH/generic/THTensorMath.cpp#L4307

Now it's using float accumulation:
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cpu/ReduceOpsKernel.cpp#L57

CUDA uses float accumulation, but is saved because the necessary parallelism forces a form of pairwise summation. We should probably do the same thing for CPU.
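A minimal sketch of that difference, using NumPy as a stand-in (np.cumsum for strictly sequential float32 accumulation, np.sum for NumPy's pairwise summation; neither is the actual ATen kernel):

import numpy as np

# Illustration only, not the ATen kernel. np.cumsum accumulates strictly
# sequentially in float32, like the linked CPU reduction; np.sum uses pairwise
# summation, like the CUDA-style tree reduction. Needs roughly 1 GB of RAM.
x = np.random.rand(1 << 26).astype(np.float32)   # ~67M uniform values
sq = x * x

seq  = np.cumsum(sq, dtype=np.float32)[-1]       # sequential float32 accumulation
pair = np.sum(sq, dtype=np.float32)              # pairwise summation, still float32
ref  = np.sum(sq, dtype=np.float64)              # float64 reference

print(np.sqrt(seq), np.sqrt(pair), np.sqrt(ref))
# The sequential result comes out noticeably low; the pairwise float32 result
# stays close to the float64 reference.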

@jjsjann123
Collaborator

As Sam mentioned, numerical behavior is very different on CPU and GPU because of the level of parallelism. A sacrifice like this (using double instead of float) is necessary on CPU, but it is not needed in the CUDA code and would hurt performance there.
We could probably plumb some logic through so that each device uses a different accumulation type. But the code shared between the CPU and GPU kernels is really not significant; it might be easier to just keep the CPU and GPU kernels separate.
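Until the kernels change, a user-space wrapper along these lines captures the "different accumulation per device" idea as a workaround (just a sketch, not anything the library provides):

import torch

# Sketch of a user-space workaround (an assumption, nothing official): route
# CPU float32 tensors through a float64 norm, where the sequential float32
# accumulation loses precision, and leave other cases on the native kernel.
def frobenius_norm(t):
    if t.is_cuda or t.dtype != torch.float32:
        return torch.norm(t)
    return torch.norm(t.double()).to(t.dtype)

a = torch.rand(2000, 2000, 64)
print(frobenius_norm(a))   # ~9237.x, in line with the CUDA and float64 results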

@colesbury
Member

@jjsjann123 I was suggesting something different: that we use pairwise summation on the CPU in reduction kernels -- not that we switch to double accumulation.

@li-roy added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on May 16, 2019
@mruberry
Collaborator

This is still an issue.

@foreverlms

This still exists. Could anyone take a look, or change the ATen operator in the torch CPU backend?
