Migrate renorm to ATen (CPU and CUDA) #59250
Conversation
💊 CI failures summary and remediations

As of commit e4b4f8f (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages:

- pytorch_xla_linux_bionic_py3_6_clang9_test (1/1), Step: "Run tests"
```cpp
if (acc_type != dtype) {
  norm = at::linalg_vector_norm(self, p.toDouble(), reduce_dims,
                                /*keepdim=*/true, /*dtype=*/acc_type);
} else {
  norm = at::linalg_vector_norm(self, p.toDouble(), reduce_dims,
                                /*keepdim=*/true);
}
```
For complex types, the input is cast to `dtype` and the output has type `dtype`, so it has to be complex. However, if I leave the argument out entirely, the result is real. Seems like an odd way for this to work.
cc @kurtamohler for norm `dtype` behavior. Should we disallow specifying a non-complex `dtype` for complex inputs? I believe there was a warning about that at some point. But then it doesn't let users specify a different precision when the input is complex.
I agree that it's odd behavior. The explanation is that the input is converted to `dtype` before the norm is calculated, so if we allowed a non-complex `dtype` arg when the input is complex, we would be discarding the imaginary part. So at the moment we throw an error in that case: https://github.com/pytorch/pytorch/blob/afdfd2288ab/aten/src/ATen/native/LinearAlgebra.cpp#L2271
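As a quick Python-side illustration of that check (the exact error and message come from the C++ code linked above, so treat this as a sketch of the expected behavior rather than a guarantee):

```python
import torch

z = torch.randn(3, dtype=torch.complex64)

# Omitting dtype (or passing a complex dtype) is fine for complex input.
torch.linalg.vector_norm(z, ord=2)

# Requesting a non-complex dtype would silently drop the imaginary part,
# so the check linked above is expected to reject it.
try:
    torch.linalg.vector_norm(z, ord=2, dtype=torch.float32)
except RuntimeError as err:
    print("rejected:", err)
```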
Maybe the result should always be real though, even if `dtype` is complex. For complex outputs, the imaginary part is always going to be 0.
If the result were always real, then `norm(tensor, ..., dtype=dtype)` would behave just like `norm(tensor.to(dtype), ...)`. I think that's reasonable.
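For real inputs that equivalence already holds with `torch.linalg.vector_norm`, the function the new C++ code dispatches to; the open question above is whether the same rule should extend to complex inputs (where the result would be the real value type). A quick illustration with made-up tensors:

```python
import torch

x = torch.randn(3, 4)  # real input

# Passing dtype upcasts the input before the reduction...
a = torch.linalg.vector_norm(x, ord=2, dim=1, dtype=torch.float64)
# ...which matches casting first and omitting dtype.
b = torch.linalg.vector_norm(x.to(torch.float64), ord=2, dim=1)

print(torch.allclose(a, b), a.dtype == b.dtype)  # True True
```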
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```cpp
  norm = at::linalg_vector_norm(
      self, p, reduce_dims, /*keepdim=*/true);
}
auto grad_output = (self.conj() * grad).sum(
```
Out of curiosity, is discarding the imaginary part expected here?
Yes. The norm is real-valued so its gradient should be as well. In fact the gradcheck tests fail if I don't intentionally cast to real.
But `/*dtype=*/c10::toValueType(acc_type)` will always be throwing warnings when the input is converted to real, even though that's what we want. Maybe `(self.conj() * grad).real().contiguous().sum(...)`? (`.contiguous` is optional; on CPU I suspect a discontiguous reduction will be very bad, on the GPU, on the other hand, `.contiguous()` + reduction will be slower than just the reduction, and in any case people don't seem to care about renorm backward performance.)
> Maybe `(self.conj() * grad).real().contiguous().sum(...)`?

The awkward thing is that `at::real` isn't a no-op for real types. It throws an error, so you can't write a one-liner. But if this way creates a warning then I guess there's no ignoring it.
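For the complex-input case the two variants discussed above should produce the same real-valued term; only the cast-based one triggers the discard warning. A small Python sketch (the shapes and reduction dim are made up, and this uses the Python-level API rather than the ATen code in question):

```python
import torch

# Stand-ins for the tensors in the backward formula.
self_ = torch.randn(4, 5, dtype=torch.complex64)
grad = torch.randn(4, 5, dtype=torch.complex64)
reduce_dims = [1]

# Option 1: drop the imaginary part explicitly before reducing.
a = (self_.conj() * grad).real.sum(dim=reduce_dims, keepdim=True)

# Option 2: reduce in complex, then cast to the value type (float32 for
# complex64), roughly what /*dtype=*/c10::toValueType(acc_type) amounts to;
# this is the variant that warns about discarding the imaginary part.
b = (self_.conj() * grad).sum(dim=reduce_dims, keepdim=True).to(torch.float32)

print(torch.allclose(a, b))  # the values agree; only the warning differs
```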
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
Resubmit of pytorch#59108, closes pytorch#24754, closes pytorch#24616

This reuses `linalg_vector_norm` to calculate the norms. I just add a new kernel that turns the norm into a normalization factor, then multiply the original tensor using a normal broadcasted `mul` operator. The result is less code, and better performance to boot.

#### Benchmarks (CPU):

| Shape | Dim | Before | After (1 thread) | After (8 threads) |
|:------------:|:---:|--------:|-----------------:|------------------:|
| (10, 10, 10) | 0 | 11.6 us | 4.2 us | 4.2 us |
| | 1 | 14.3 us | 5.2 us | 5.2 us |
| | 2 | 12.7 us | 4.6 us | 4.6 us |
| (50, 50, 50) | 0 | 330 us | 120 us | 24.4 us |
| | 1 | 350 us | 135 us | 28.2 us |
| | 2 | 417 us | 130 us | 24.4 us |

#### Benchmarks (CUDA)

| Shape | Dim | Before | After |
|:------------:|:---:|--------:|--------:|
| (10, 10, 10) | 0 | 12.5 us | 12.1 us |
| | 1 | 13.1 us | 12.2 us |
| | 2 | 13.1 us | 11.8 us |
| (50, 50, 50) | 0 | 33.7 us | 11.6 us |
| | 1 | 36.5 us | 15.8 us |
| | 2 | 41.1 us | 15 us |

Pull Request resolved: pytorch#59250
Reviewed By: mruberry
Differential Revision: D28820359
Pulled By: ngimel
fbshipit-source-id: 572486adabac8135d52a9b8700f9d145c2a4ed45
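For readers who want the idea without reading the kernels, here is a rough Python sketch of the approach in the summary (a reference illustration, not the ATen implementation; the small 1e-7 guard mirrors the historical renorm behavior and is an assumption here):

```python
import torch

def renorm_reference(x, p, dim, maxnorm):
    # Norms of every slice along `dim`, computed with vector_norm...
    reduce_dims = [d for d in range(x.dim()) if d != dim]
    norms = torch.linalg.vector_norm(x, ord=p, dim=reduce_dims, keepdim=True)
    # ...turned into per-slice normalization factors (slices already within
    # maxnorm are left untouched)...
    factors = torch.where(norms > maxnorm,
                          maxnorm / (norms + 1e-7),
                          torch.ones_like(norms))
    # ...and applied with a single broadcasted multiply.
    return x * factors

x = torch.randn(50, 50, 50)
print(torch.allclose(renorm_reference(x, p=2, dim=0, maxnorm=1.0),
                     torch.renorm(x, p=2, dim=0, maxnorm=1.0)))
```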