
Port cholesky_inverse to ATen #50269

Closed
wants to merge 50 commits

Conversation

IvanYashchuk
Collaborator

Now we can remove _th_potri!

Compared to the original TH-based cholesky_inverse, complex (#33152) and batched (#7500) inputs are now supported on both CPU and CUDA.

Closes #24685.
Closes #24543.

Ref. #49421, #42666
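For readers unfamiliar with the operation, here is a minimal sketch of what `torch.cholesky_inverse` computes, extended to a batch the way this PR does. NumPy is used as a stand-in for the PyTorch kernel; the function name and shapes are illustrative, not PyTorch internals. Given A = L @ L.conj().T, the inverse is inv(L).conj().T @ inv(L).

```python
import numpy as np

def cholesky_inverse(L):
    """Inverse of A from its lower Cholesky factor L (A = L @ L.conj().T)."""
    Linv = np.linalg.inv(L)
    return Linv.conj().T @ Linv

rng = np.random.default_rng(0)
# A batch of 4 symmetric positive-definite 3x3 matrices.
B = rng.standard_normal((4, 3, 3))
A = B @ B.transpose(0, 2, 1) + 3.0 * np.eye(3)
L = np.linalg.cholesky(A)                       # batched lower Cholesky factors
Ainv = np.stack([cholesky_inverse(Li) for Li in L])
assert np.allclose(Ainv @ A, np.eye(3), atol=1e-8)
```

The batched case is just the single-matrix computation applied per matrix, which is why TH's single-matrix-only implementation could be generalized in ATen.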

@IvanYashchuk IvanYashchuk added module: porting Issues related to porting TH/THNN legacy to ATen native module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul labels Jan 8, 2021
@facebook-github-bot
Contributor

facebook-github-bot commented Jan 8, 2021

💊 CI failures summary and remediations

As of commit f117e84 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@IvanYashchuk
Collaborator Author

IvanYashchuk commented Jan 8, 2021

Benchmarks for torch.float64:
Before this PR:

[-------------------- cholesky_inverse (TH) torch.float64 --------------------]
                              |  cholesky_inverse CUDA  |  cholesky_inverse CPU
------------------------------------------------------------------------------
      torch.Size([32, 32])    |          5198.2         |            7.4
      torch.Size([512, 512])  |          8669.2         |         5555.0

After:

[-------------------- cholesky_inverse (ATen) torch.float64 --------------------]
                                   |  cholesky_inverse CUDA  |  cholesky_inverse CPU
-------------------------------------------------------------------------------------
      torch.Size([32, 32])         |           5402.7        |            16.2
      torch.Size([512, 512])       |           6949.7        |          7204.2

      torch.Size([1, 32, 32])      |             84.1        |            16.4
      torch.Size([32, 32, 32])     |             87.7        |           241.9
      torch.Size([256, 32, 32])    |            191.9        |          1627.3
      torch.Size([1, 512, 512])    |           2491.6        |          7307.1
      torch.Size([32, 512, 512])   |          48022.1        |        180274.9
      torch.Size([256, 512, 512])  |         375196.1        |       1432507.8

Times are in microseconds (us).
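As a rough idea of the kind of timing loop behind tables like the one above, here is a hedged sketch using `timeit` and NumPy as a stand-in for PyTorch (the real numbers came from CUDA/CPU PyTorch builds; only the shapes and microsecond unit match the table, and the helper name is made up):

```python
import timeit
import numpy as np

def bench_cholesky_inverse(shape, number=5):
    """Average time, in microseconds, of one cholesky-inverse-like call."""
    rng = np.random.default_rng(0)
    B = rng.standard_normal(shape)
    # Build a (batch of) symmetric positive-definite matrices and factor once.
    A = B @ B.swapaxes(-2, -1) + shape[-1] * np.eye(shape[-1])
    L = np.linalg.cholesky(A)

    def op():
        Linv = np.linalg.inv(L)
        return Linv.swapaxes(-2, -1) @ Linv

    seconds = timeit.timeit(op, number=number) / number
    return seconds * 1e6  # microseconds per call

for shape in [(32, 32), (8, 32, 32)]:
    print(shape, round(bench_cholesky_inverse(shape), 1), "us")
```

Note that CUDA timings additionally require synchronization around the timed region, which utilities like torch.utils.benchmark handle automatically.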

@gchanan gchanan added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jan 8, 2021
@mruberry
Collaborator

So this PR pretty significantly regresses performance on both CUDA and CPU? That's worrying. What do you think is happening and can we address this?

cc @ngimel

TORCH_CHECK(result.device() == input.device(),
"result device ", result.device(), " does not match input device ", input.device());

// Single matrix MAGMA routine requires 'infos' to reside in CPU memory,
Collaborator

Wait -- doesn't this PR implement batched CUDA, though?

Collaborator Author

Oh, this comment is a bit outdated; I need to change it. At the time I thought the batched MAGMA routine would require infos to live on the GPU.
Now the CUDA path is implemented in terms of MAGMA's apply_cholesky_solve. It differs from the other batched functions in that it doesn't take an array-of-ints infos argument and doesn't raise any errors related to the algorithm; it returns only a single integer indicating whether all passed arguments were valid.
If cuSOLVER were used instead in the future, infos would need to be created on the GPU.

I will change it to:

  • if the input is on CPU, we need infos of size batchsize(input) to fill with error codes from LAPACK for each matrix in the batched tensor;
  • if the input is on GPU, we need only one integer living on CPU to store the error code from MAGMA.

It also means that for some inputs the CPU could raise an error while the GPU would output something, because the operations used are different.
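The two error-reporting conventions described above can be sketched in plain Python (the function names are made up for illustration, not PyTorch or MAGMA internals): the CPU/LAPACK path fills one info code per matrix in the batch, while the MAGMA routine used on CUDA returns a single status for the whole call.

```python
import numpy as np

def lapack_style_infos(L_batch):
    """One LAPACK-style code per matrix: i+1 if diagonal entry i is zero, else 0."""
    infos = []
    for L in L_batch:
        zeros = np.flatnonzero(np.diag(L) == 0)
        infos.append(int(zeros[0]) + 1 if zeros.size else 0)
    return infos

def magma_style_status(L_batch):
    """A single status for the whole call: 0 if the arguments look valid."""
    ok = L_batch.ndim == 3 and L_batch.shape[-1] == L_batch.shape[-2]
    return 0 if ok else -1

batch = np.stack([np.eye(2), np.array([[1.0, 0.0], [2.0, 0.0]])])
assert lapack_style_infos(batch) == [0, 2]  # second matrix: zero at diagonal 2
assert magma_style_status(batch) == 0       # no per-matrix error information
```

The second matrix has a zero pivot, which the per-matrix convention pinpoints and the single-status convention cannot; this is the divergence discussed in the next comments.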

Collaborator

It also means that for some inputs the CPU could raise an error while the GPU would output something, because the operations used are different.

Are there some specific cases you're thinking of?

Collaborator Author

I checked the docs. It will happen for inputs with a zero on the diagonal: the CPU version raises an error with the index of the zero diagonal element, while CUDA gives inf. I need to add this to the tests.

In [1]: import torch

In [2]: a = torch.randn(2, 2)

In [3]: a
Out[3]: 
tensor([[-0.6556, -0.4479],
        [-0.9347,  0.8169]])

In [4]: a[1, 1] = 0

In [5]: a
Out[5]: 
tensor([[-0.6556, -0.4479],
        [-0.9347,  0.0000]])

In [6]: torch.cholesky_inverse(a)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-680561e9e67c> in <module>
----> 1 torch.cholesky_inverse(a)

RuntimeError: Lapack Error potri : A(2,2) is 0, A cannot be factorized at ../aten/src/TH/generic/THTensorLapack.cpp:245

In [7]: torch.cholesky_solve(torch.eye(2), a)
Out[7]: 
tensor([[inf, -inf],
        [-inf, inf]])

In [8]: torch.cholesky_solve(torch.eye(2, device='cuda'), a.cuda())
Out[8]: 
tensor([[inf, -inf],
        [-inf, inf]], device='cuda:0')

Collaborator

Testing (and, in the near future, documenting) this behavior sounds great.

Collaborator

@mruberry left a comment

Great work, @IvanYashchuk. There are a couple of follow-ups (adjustments based on future OpInfo fixes, docs), as @anjali411 points out, but I think they're separable. I've added this to the list of operators to review in our 1.8 scrub.

When you're happy with this PR, ping me and let's merge it.

Contributor

@facebook-github-bot left a comment

@anjali411 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@anjali411
Contributor

@IvanYashchuk please rebase the PR and let me know once it's ready for merge.

@IvanYashchuk
Collaborator Author

@anjali411 I resolved the conflict.

Contributor

@facebook-github-bot left a comment

@anjali411 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@anjali411 merged this pull request in 6e4746c.

@dncliss
Contributor

dncliss commented Feb 2, 2021

After this PR was merged, the builds on the ppc64le platform started failing again, for the same reason they had been failing shortly before #51217 was merged. That merge fixed the issue, and this merge reintroduced it.
I believe this feature is simply missing an equivalent line in aten/src/ATen/native/BatchLinearAlgebraKernel.cpp at about line 177; most likely the following line should be present (compare to the REGISTER_VSX_DISPATCH lines shortly below this point):

REGISTER_VSX_DISPATCH(cholesky_inverse_stub, &cholesky_inverse_kernel_impl);
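The failure mode can be sketched in plain Python (this is an illustration of the dispatch-registration idea, not the real DispatchStub machinery): every CPU capability a build targets needs a registered kernel, and the missing REGISTER_VSX_DISPATCH plays the role of the missing table entry below; in C++ it surfaces as the "undefined reference" at link time.

```python
_kernels = {}

def register_dispatch(stub, capability, fn):
    """Register a kernel for one (stub, CPU capability) pair."""
    _kernels[(stub, capability)] = fn

def dispatch(stub, capability):
    """Look up the kernel for this capability; fail loudly if none exists."""
    try:
        return _kernels[(stub, capability)]
    except KeyError:
        raise RuntimeError(f"no kernel registered for {stub} on {capability}")

def cholesky_inverse_kernel_impl():
    return "ok"

register_dispatch("cholesky_inverse_stub", "DEFAULT", cholesky_inverse_kernel_impl)
# Without this line, a VSX (ppc64le) build has no kernel to call, which is
# the analogue of omitting REGISTER_VSX_DISPATCH in BatchLinearAlgebraKernel.cpp:
register_dispatch("cholesky_inverse_stub", "VSX", cholesky_inverse_kernel_impl)
assert dispatch("cholesky_inverse_stub", "VSX")() == "ok"
```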

@anjali411
Contributor

@dncliss thanks for reporting the issue, and yeah the fix should be similar. @IvanYashchuk could you create a fix for this?

@dncliss
Contributor

dncliss commented Feb 2, 2021

@anjali411 Thanks for confirming. I didn't open a new issue (I figure @IvanYashchuk can just push the one-line fix), but if you want it reported that way I can do so.

For reference, in the regular ppc64le builds here: https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le-gpu/, builds older than 1047 had the error that PR #51217 fixed; build 1047 itself was successful; and builds from 1048 onward show the failure introduced by merging this cholesky_inverse PR. The error, as before, is an "undefined reference" during linking.

@IvanYashchuk
Collaborator Author

@anjali411, @dncliss I submitted the fix. Sorry for the trouble!

facebook-github-bot pushed a commit that referenced this pull request Feb 2, 2021
Summary:
It was overlooked that the VSX dispatch registration is also needed for the cholesky_inverse CPU dispatch.
See #50269 (comment)

Pull Request resolved: #51562

Reviewed By: H-Huang

Differential Revision: D26199581

Pulled By: anjali411

fbshipit-source-id: 5d02c6da52ce1d2e9e26001f5d4648a71dd0e829
@ngimel ngimel mentioned this pull request May 6, 2021
14 tasks
facebook-github-bot pushed a commit that referenced this pull request Dec 9, 2021
…puts. (#69069)

Summary:
While implementing #68720,
we found empirically that `torch.cholesky_inverse` supports batched inputs, but this is not explained in the docs: [link](#68720 (review))
`torch.cholesky_inverse` was implemented in #50269, and the docs were updated in #31275 but not merged.
neerajprad

Pull Request resolved: #69069

Reviewed By: mrshenli

Differential Revision: D32979362

Pulled By: neerajprad

fbshipit-source-id: 0967c969434ce6e0ab15889c240149c23c0bce44
PaliC added a commit that referenced this pull request Dec 10, 2021 (#69069; same message as the commit above)
PaliC added a commit that referenced this pull request Dec 10, 2021 (#69069; same message as the commit above)
desertfire pushed a commit that referenced this pull request Dec 13, 2021 (#69069; same message as the commit above)
desertfire pushed a commit that referenced this pull request Dec 14, 2021 (#69069; same message as the commit above)
Labels
cla signed Merged module: complex Related to complex number support in PyTorch module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul module: porting Issues related to porting TH/THNN legacy to ATen native open source triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate cholesky_inverse from the TH to Aten (CPU)
Migrate cholesky_inverse from the TH to Aten (CUDA)
9 participants