Fix permuted sum precision issue for lower precision on CPU #108559
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/108559
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit 9f491ad with merge base 3381f28. FLAKY - the following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 213911d to ae9d9f8.
Force-pushed from ae9d9f8 to 1604dec.
Force-pushed from 1604dec to 9859973.
Force-pushed from 9859973 to 6f8180f.
What's the performance impact? Do we have a comparison between fp32 and bf16 with the same inputs?
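For context, one quick way to measure that would be something like the following (a sketch, not the benchmark actually run for this PR; shape and iteration count are arbitrary):

import torch
from torch.utils.benchmark import Timer

# Time a reduction over a permuted view in fp32 vs bf16 on CPU.
x32 = torch.randn(128, 128, 128).permute(2, 1, 0)
x16 = x32.to(torch.bfloat16)
for x in (x32, x16):
    t = Timer(stmt="x.sum(dim=(1, 2))", globals={"x": x})
    print(x.dtype, t.timeit(100))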
test/test_reductions.py (Outdated)

def helper(self, shape, reduce_dims, device, dtype):
    permute_list = dim_sequences[len(shape)]
    random.shuffle(permute_list)
Does it make sense to specify the permutation instead of randomizing it, so that we are sure the non-contiguous scenario always happens?
Use permutations instead of random.shuffle.
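A deterministic version of the helper could iterate over all permutations, e.g. (a sketch, assuming a float32 sum is a fair reference; the function name and tolerances are illustrative, not the PR's actual test code):

import itertools
import torch

def check_permuted_sum(shape, reduce_dims, dtype=torch.bfloat16):
    # Exercise every permutation deterministically instead of relying
    # on random.shuffle to hit the non-contiguous layout by chance.
    x = torch.randn(shape, dtype=dtype)
    for perm in itertools.permutations(range(len(shape))):
        ref = x.permute(perm).to(torch.float32).sum(dim=reduce_dims)
        out = x.permute(perm).sum(dim=reduce_dims)
        # Loose tolerances: bf16 carries roughly 3 decimal digits.
        torch.testing.assert_close(out.float(), ref, rtol=1e-2, atol=1e-2)

check_permuted_sum((32, 32, 32), reduce_dims=(0, 1))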
aten/src/ATen/native/ReduceOps.cpp (Outdated)

if (!at::isReducedFloatingType(iter.common_dtype())) {
  return false;
}
if (ndim < 2 || iter.noutputs() != 1) {
It happens with ndim >= 3, so should we check ndim < 3 here?
Thanks for the comment. Fixed.
Force-pushed from 7682040 to 8cc4a4e.
The changes LGTM now; will stamp if there's no performance impact.
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as Stale.
Force-pushed from 4a843e8 to 9be3b64.
Could you please also update the PR description with the root cause of the issue?
@peterbell10 Could you please help review this PR? Thanks.
@mruberry Could you please help review this PR? Thanks.
// See https://github.com/pytorch/pytorch/issues/83149
if (should_use_acc_buffer(iter)) {
  auto tmp_output = at::empty(result.sizes(), result.options().dtype(kFloat));
  at::sum_outf(self.to(ScalarType::Float), opt_dim, keepdim, /*dtype=*/c10::nullopt, tmp_output);
  // Cast the float32 accumulation back to the original reduced dtype.
  result.copy_(tmp_output);
}
Note that in my original comment I suggested adding mixed dtype kernels with half precision as the input and float as the output, like we have on CUDA. This is okay for now though I guess. It just won't perform as well.
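For reference, that float accumulation can already be requested explicitly from Python through sum's dtype argument, which is essentially what the acc-buffer fallback does internally before casting back (a usage sketch; shape and dims are arbitrary):

import torch

x = torch.randn(64, 64, 64, dtype=torch.bfloat16).permute(2, 1, 0)

# Explicit float32 accumulation; the result stays float32.
acc = x.sum(dim=(1, 2), dtype=torch.float32)

# Plain bf16 sum; with this PR it is routed through a float32
# accumulation buffer and then cast back to bf16.
out = x.sum(dim=(1, 2))
print((out.float() - acc).abs().max())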
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Fixes #83149

There is a limitation of TensorIterator reductions: a non-permuted input tensor is coalesced down to a 2-d tensor by TensorIterator, whereas the permuted case may remain a >2-d operation (for example, two reduced dimensions plus a non-reduced one). Since the CPU reduction loop of TensorIterator only operates on two dimensions at a time, each 2-d pass stores its partial sums in the lower-precision output, so subsequent passes accumulate on already-truncated values and the error compounds.
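A minimal repro of the discrepancy (shape and reduced dims are illustrative; before this fix the permuted sum drifts noticeably from a float32 reference):

import torch

torch.manual_seed(0)
x = torch.randn(128, 128, 128, dtype=torch.bfloat16)
ref = x.to(torch.float32).sum(dim=(0, 1))

# Contiguous case: coalesced to 2-d, partial sums stay in high precision.
good = x.sum(dim=(0, 1))

# Permuted case: a >2-d reduction; before this fix the partial sums
# were truncated back to bf16 between the 2-d passes.
bad = x.permute(2, 1, 0).sum(dim=(1, 2))

print((good.float() - ref).abs().max())
print((bad.float() - ref).abs().max())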