
torch.norm on CPU gives incorrect results for large Tensors #15602

Closed
soumith opened this issue Dec 29, 2018 · 4 comments
Labels: cherry-picked (This PR was cherry-picked onto a release branch from master), high priority

Comments

@soumith
Member

soumith commented Dec 29, 2018

🐛 Bug

>>> import torch
>>> x = torch.ones(40000)
>>> torch.norm(x)
tensor(266.0605)
>>> torch.norm(x.to(dtype=torch.float32, device='cuda'))
tensor(200., device='cuda:0')

Originally reported at https://discuss.pytorch.org/t/output-of-torch-norm-x-depends-on-size-of-x-not-in-a-good-way/33299/8
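For reference, the 2-norm of a vector of n ones is √n, so the expected value for 40000 ones is √40000 = 200, which matches the CUDA result and shows the CPU value 266.0605 is wrong. A quick sanity check:

```python
import math

n = 40000
expected = math.sqrt(n)  # 2-norm of a vector of n ones is sqrt(n)
print(expected)          # 200.0, agreeing with the CUDA output above
```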

@cpuhrsch
Contributor

cpuhrsch commented Jan 9, 2019

cc @xhzhao - I was able to bisect this down to 99c0b96 - do you have time to also take a look?

The script I used to bisect this is

python -c "import torch; import sys; sys.exit(0) if ((2 * torch.norm(torch.ones(10000))) == torch.norm(torch.ones(40000))) else sys.exit(1)"

For now I'll write a PR to revert the optimizations.

@cpuhrsch
Contributor

cpuhrsch commented Jan 9, 2019

cc @colesbury for ReduceOps

@cpuhrsch
Contributor

cpuhrsch commented Jan 9, 2019

@xhzhao - I think you're using at::parallel_reduce incorrectly. parallel_reduce takes a function that performs a reduction over a section of the input and a function that combines two partial results.

For example, you might have a tensor of 10000 entries and want to sum all the elements. parallel_reduce with a grain_size of 2500 will allocate an intermediate result tensor with 4 elements. It then executes the function "f" you provide on each chunk of 2500 values (0-2499, 2500-4999, etc.) and writes the result for each chunk into the intermediate result tensor. After that it reduces the partial results from each chunk into a single number using the scalar function "sf", which for sum would be "+". This is similar to TBB's approach, where you provide a function to accumulate a subrange, a function to combine two partial results, and an identity.
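As a rough Python model of these semantics (a sketch only, not the actual ATen implementation; the names parallel_reduce, f, and sf are stand-ins for the C++ API described above):

```python
def parallel_reduce(data, grain_size, f, sf):
    # Split the input into chunks of at most grain_size elements,
    # reduce each chunk with f, then fold the partial results with sf.
    partials = [f(data[i:i + grain_size])
                for i in range(0, len(data), grain_size)]
    result = partials[0]
    for p in partials[1:]:
        result = sf(result, p)
    return result

# Summing 10000 ones with grain_size 2500 yields 4 partial results of
# 2500.0 each, which sf ("+") combines into the correct total.
total = parallel_reduce([1.0] * 10000, 2500, sum, lambda a, b: a + b)
print(total)  # 10000.0
```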

In the norm kernel code you also pass this plus operator to combine two partial results. That means that when passing a tensor of 40000 1s, as in the reproduction script, parallel_reduce with the default grain size splits the tensor into two chunks, 0-32767 and 32768-39999. Your partial reduction function returns the norm of each chunk, 181.019 and 85.0412 respectively, but combining those two values with add does not give the correct overall result. You need a different scalar combination function: for the 2-norm, square the partial results, sum them, and take the square root again. Even then, this can introduce numerical error depending on the values and the number of chunks.
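The arithmetic above can be checked directly. This sketch (plain Python, not the ATen kernel) reproduces the buggy sum-of-norms combination and the correct sqrt-of-sum-of-squares combination for the default-grain split of 40000 ones:

```python
import math

def l2(xs):
    # Reference 2-norm of a chunk.
    return math.sqrt(sum(x * x for x in xs))

# The two chunks parallel_reduce produces for 40000 ones: 0-32767 and 32768-39999.
chunk1, chunk2 = [1.0] * 32768, [1.0] * 7232

# Buggy combination: adding partial norms, as the kernel did.
wrong = l2(chunk1) + l2(chunk2)                    # 181.019 + 85.0412 ≈ 266.06
# Correct combination for the 2-norm: sqrt(a^2 + b^2).
right = math.sqrt(l2(chunk1) ** 2 + l2(chunk2) ** 2)  # sqrt(40000) = 200.0
print(wrong, right)
```

The "wrong" value matches the 266.0605 seen in the original bug report, while the "right" value matches the CUDA result.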

facebook-github-bot pushed a commit that referenced this issue Jan 11, 2019
…5885)

Summary:
See #15602
Pull Request resolved: #15885

Differential Revision: D13614841

Pulled By: cpuhrsch

fbshipit-source-id: 5d3e45f499d36ac287dbbc2e45798aa51eb5bfdf
zdevito pushed a commit to zdevito/ATen that referenced this issue Jan 11, 2019
@soumith
Member Author

soumith commented Jan 16, 2019

closed via #15885

@soumith soumith closed this as completed Jan 16, 2019
cpuhrsch added a commit to cpuhrsch/pytorch that referenced this issue Jan 17, 2019
@soumith soumith added the cherry-picked This PR was cherry-picked onto a release branch from master label Jan 17, 2019
soumith pushed a commit that referenced this issue Jan 17, 2019
soumith pushed a commit that referenced this issue Jan 18, 2019