
Fix consistency of histc on CPU and CUDA #87832

Conversation

@Aidyn-A (Collaborator) commented Oct 27, 2022

Fixes #87657

The main reason why histc returns slightly different outputs is the difference in how the bin position is calculated.
The CPU kernel (aten/src/ATen/native/cpu/HistogramKernel.cpp) computes it as:

pos = static_cast<int64_t>((elt - leftmost_edge[dim])
/ (rightmost_edge[dim] - leftmost_edge[dim])
* (num_bin_edges[dim] - 1));

which is essentially (i - a) / (b - a) * N, while the CUDA kernel (aten/src/ATen/native/cuda/SummaryOps.cu) computes

IndexType bin = (int)(((bVal - minvalue)) * nbins / (maxvalue - minvalue));

which is (i - a) * N / (b - a).

For some inputs, such as the one in #87657, the order of these arithmetic operations matters because of floating-point round-off.
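To make the round-off concrete, here is a small standalone sketch (an illustration only; it emulates both formulas with NumPy float32 rather than running the actual kernels) that applies both orderings to the input from #87657 and counts the elements that land in different bins:

import numpy as np

# Emulate the two bin-index formulas in float32; the real kernels live in
# HistogramKernel.cpp (CPU) and SummaryOps.cu (CUDA).
a, b, nbins = np.float32(0.0), np.float32(0.99), 10
x = np.linspace(0, 0.99, 1001, dtype=np.float32)

cpu_style = ((x - a) / (b - a) * nbins).astype(np.int64)   # (i - a) / (b - a) * N
cuda_style = ((x - a) * nbins / (b - a)).astype(np.int64)  # (i - a) * N / (b - a)

# A nonzero count illustrates the kind of disagreement described above.
print("elements assigned to different bins:", np.count_nonzero(cpu_style != cuda_style))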


Not sure where the most appropriate place for the unit test would be; I hope test_reductions::test_histc will do.

cc @VitalyFedyunin @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10

@pytorch-bot bot commented Oct 27, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87832

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit e187347:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Oct 27, 2022
@soumith (Member) commented Oct 27, 2022

great, if you fix lint we can merge it.

@@ -2843,6 +2843,13 @@ def test_against_np(tensor, bins=100, min=0, max=0):
expanded = torch.randn(1, 5, 1, 2, device=device).expand(3, 5, 7, 2)
test_against_np(expanded)

if torch.cuda.is_available():

Collaborator:

would test_against_np catch it instead?

Collaborator Author:

Unfortunately, there is a discrepancy with NumPy for this input tensor. The code
The code

import torch
import numpy as np

def fn(x):
    # torch.histc with 10 bins over [0, 0.99]
    return torch.histc(x, bins=10, min=0, max=0.99)

def fn_np(x):
    # NumPy reference with the same bins and range
    return torch.from_numpy(np.histogram(x.numpy(), bins=10, range=(0, 0.99))[0])

x = torch.linspace(0, 0.99, 1001, dtype=torch.float32)

print(fn(x))         # CPU
print(fn(x.cuda()))  # CUDA
print(fn_np(x))      # NumPy

Produces:

tensor([101.,  99., 101.,  99., 101.,  99., 100., 100., 100., 101.])
tensor([101.,  99., 101.,  99., 101.,  99., 100., 100., 100., 101.], device='cuda:0')
tensor([101,  99, 100, 100, 100, 100, 100, 100, 100, 101])

Collaborator:

Is NumPy's histogram more accurate? Making histc consistent on CPU and CUDA is nice, but is the histc algorithm we're using actually correct?

@Aidyn-A (Collaborator Author) Oct 27, 2022

NumPy might be more accurate because it uses a direct comparison (https://github.com/numpy/numpy/blob/13d55a3c2f016a58a6e9d6b8086f338e07c7478f/numpy/lib/histograms.py#L862-L866) to determine whether a value is within the range or not. However, I noticed that it struggles with this case too, as it produces different results for float32 and float64.
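For reference, the float32/float64 sensitivity mentioned above can be probed with a few lines (the exact counts may vary across NumPy versions and platforms, so this is only a check, not an expected output):

import numpy as np

# Same data and binning, fed to np.histogram in float32 and float64; the comment
# above reports that the two dtypes do not produce identical counts on this input.
x32 = np.linspace(0, 0.99, 1001, dtype=np.float32)
x64 = x32.astype(np.float64)

print(np.histogram(x32, bins=10, range=(0, 0.99))[0])
print(np.histogram(x64, bins=10, range=(0, 0.99))[0])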

Collaborator:

Instead of conditioning on CUDA being available, just compare the result to the expected outcome of

tensor([101.,  99., 101.,  99., 101.,  99., 100., 100., 100., 101.])

directly, and add a comment that NumPy produces a different result.
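A hedged sketch of what such a check could look like (the helper name and exact placement are illustrative, not the final test added to test_reductions.py):

import torch

def check_histc_boundary_rounding(device):
    # Input from #87657: 1001 evenly spaced float32 values over [0, 0.99].
    x = torch.linspace(0, 0.99, 1001, dtype=torch.float32, device=device)
    actual = torch.histc(x, bins=10, min=0, max=0.99)
    # NumPy's np.histogram gives [101, 99, 100, 100, 100, 100, 100, 100, 100, 101]
    # for this input; histc is expected to match the CPU/CUDA output shown
    # earlier in this thread instead.
    expected = torch.tensor(
        [101., 99., 101., 99., 101., 99., 100., 100., 100., 101.], device=device)
    torch.testing.assert_close(actual, expected)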

Collaborator:

If you changed the boundary computation to what the CPU is doing now (instead of to what CUDA is doing), would the results match?

Collaborator Author:

Yes, the results will match exactly if I slightly change the input tensor or the histogram boundaries by a machine epsilon.
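A small way to probe that remark, reusing the float32 emulation from the earlier sketch (illustrative only; which perturbation removes the disagreement depends on the rounding behaviour of the hardware):

import numpy as np

# Nudge the upper boundary by one float32 machine epsilon and recount how often
# the two formula orderings disagree (with the lower boundary a = 0).
x = np.linspace(0, 0.99, 1001, dtype=np.float32)
nbins = 10
eps = np.float32(np.finfo(np.float32).eps)
for b in (np.float32(0.99), np.float32(0.99) + eps):
    divide_first = (x / b * nbins).astype(np.int64)    # old CPU order
    multiply_first = (x * nbins / b).astype(np.int64)  # CUDA order
    print(float(b), np.count_nonzero(divide_first != multiply_first))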

@mruberry (Collaborator)

One test tweak needed (see comment inline) and then this should be OK. I'd really like to get the CUDA version of histogramdd added, so we can consistently direct people to that. It's more accurate and more consistent with NumPy.

@kit1980 (Member) commented Nov 18, 2022

@pytorchbot rebase

@kit1980 (Member) commented Nov 18, 2022

@Aidyn-A should we merge this now?

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator)

Successfully rebased fix_consistency_between_cpu_and_cuda_for_histc onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout fix_consistency_between_cpu_and_cuda_for_histc && git pull --rebase)

@pytorchmergebot force-pushed the fix_consistency_between_cpu_and_cuda_for_histc branch from ad890b0 to e187347 on November 18, 2022 02:05
@github-actions bot added the module: cpu (CPU specific problem (e.g., perf, algorithm)) label on Nov 18, 2022
@Aidyn-A (Collaborator Author) commented Nov 18, 2022

@kit1980 yes, we can merge it now.
@pytorchbot merge -g

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022

Pull Request resolved: pytorch#87832
Approved by: https://github.com/soumith
Labels
ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: cpu (CPU specific problem (e.g., perf, algorithm)), open source
Development
Successfully merging this pull request may close these issues: histc return inconsistent value on CPU and CUDA (#87657).