Port CPU torch.orgqr to ATen #50502
Conversation
native_functions.yml
auto infos = at::empty({std::max<int64_t>(1, batchCount(input))}, input.options().dtype(kInt).device(kCPU));

// if result is not empty and not in batched column major format we have to allocate a temporary tensor
if (result.numel() != 0 && !result.transpose(-2, -1).is_contiguous()) {
This is really nice.
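As background for the contiguity check in the snippet above, here is a hedged NumPy sketch of the same idea for the 2-D case (as_column_major is a hypothetical helper, not part of ATen): LAPACK expects column-major (Fortran-order) storage, and a matrix is column-major exactly when its transpose is row-major, which is what result.transpose(-2, -1).is_contiguous() tests.

```python
import numpy as np

# Hypothetical helper mirroring the ATen check above (2-D sketch).
# LAPACK wants column-major data; a matrix is column-major exactly
# when its transpose is C-contiguous.
def as_column_major(result):
    swapped = np.swapaxes(result, -2, -1)
    if result.size != 0 and not swapped.flags['C_CONTIGUOUS']:
        # not column-major: allocate a Fortran-ordered temporary copy
        return np.asfortranarray(result)
    return result  # empty or already column-major: use as-is
```

The empty-size short circuit matters: for an empty result no temporary is needed at all, which foreshadows the empty-matrix discussion later in this thread.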
Tensor orgqr(const Tensor& input, const Tensor& tau) {
  Tensor result = at::empty({0}, input.options());
  result = at::orgqr_outf(input, tau, result);
orgqr_outf must be a typo here, right?
No, that's the actual function now. at::orgqr_outf(input, tau, result) is equivalent to at::orgqr_out(result, input, tau).
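To illustrate the convention being discussed, here is a hedged pure-Python sketch (not ATen code; add is a stand-in operation): the _outf variant takes the same arguments as the functional form with the output appended last, while the _out variant takes the output first. Both write into the caller-provided buffer.

```python
import numpy as np

# Sketch of the ATen out-variant naming convention (assumption: `add`
# stands in for any op). `add_out` takes the output buffer first,
# `add_outf` takes it last; both fill the caller-provided buffer.
def add_out(result, a, b):
    np.add(a, b, out=result)
    return result

def add_outf(a, b, result):
    # same computation, out argument moved to the end
    return add_out(result, a, b)
```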
@@ -4049,21 +4049,98 @@ def test_renorm_ps(self, device):

@onlyCPU
@skipCPUIfNoLapack
def test_orgqr_errors(self, device):
@dtypes(torch.float32, torch.float64, torch.complex64, torch.complex128)
def test_orgqr(self, device, dtype):
Pretty cool that this was never tested previously.
actual = torch.orgqr(reflectors, tau)
self.assertEqual(expected, actual)

out = torch.empty_like(A)
Let's add an OpInfo for this function either in the CUDA port PR or after it. Then we won't need this out test, I think.
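As background for what such tests exercise: orgqr forms the matrix Q from the Householder reflectors and scalar factors tau that a QR factorization (geqrf) produces. A hedged NumPy sketch of that computation follows — note the reflectors and tau below are constructed directly with tau_i = 2/(v_i·v_i) for illustration, rather than taken from a real LAPACK factorization.

```python
import numpy as np

def householder_product(reflectors, tau):
    # Form Q = H_0 H_1 ... H_{k-1} with H_i = I - tau[i] * v_i v_i^T,
    # where v_i is column i of `reflectors` with v_i[:i] zeroed and
    # v_i[i] set to 1 -- the storage convention LAPACK's ?orgqr uses.
    m = reflectors.shape[0]
    Q = np.eye(m)
    for i in range(len(tau)):
        v = reflectors[:, i].copy()
        v[:i] = 0.0
        v[i] = 1.0
        Q = Q @ (np.eye(m) - tau[i] * np.outer(v, v))
    return Q
```

With tau_i = 2/(v_i·v_i), every factor is an exact reflection, so the product is orthogonal — a property a test can assert without needing a reference QR decomposition.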
Hey @IvanYashchuk!
This port looks pretty good. @ngimel and I made a few comments.
However, we were looking at this function and wondering if it should just be removed. We could find no uses of it within Facebook. Do you think it's interesting vs. torch.linalg.qr? If we want to support this functionality, maybe we'd offer it as a mode on torch.linalg.qr instead of this cryptically named function.
Looking forward to hearing your thoughts.
EDIT: We preemptively edited the tracking issue #49421 to reflect deprecating and removing, not porting, functions like orgqr. If we decide we don't want to keep it then we can change the issue back.
Some more context on deprecating this function: it is used in one function that appears to be copied around some repos on GitHub.
There is a recent request for this function to support CUDA and differentiation (#50104). I definitely agree that we shouldn't use LAPACK's name. I was thinking to introduce a more descriptive name together with the PR for the backward rule, something like
Aha! Thank you for reminding me. We can work on a name improvement like you've suggested in a future PR (using an alias).
Force-pushed from 511f4dd to 20b0654.
CI failures are not related to this PR.
Thanks @IvanYashchuk!
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
This is hitting a series of errors internally. I suggest we rebase and resubmit it using ci-all, especially since this base looks like it has several failing builds. The failures are the CPU variants of test_orgqr: caffe2/test:linalg - test_orgqr_cpu_float32 (test_linalg.TestLinalgCPU). Example error output:
int lwork = -1;
scalar_t wkopt;
lapackOrgqr<scalar_t>(m, n_columns, k, self_data, lda, tau_data, &wkopt, lwork, &infos_data[0]);
lwork = static_cast<int>(real_impl<scalar_t, value_t>(wkopt));
I don't like this error: Argument 8 has illegal value. The 8th argument is lwork; it is an integer for which we should obtain the value from LAPACK (the size of the work array, lwork ≥ n).
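The two-phase protocol under discussion can be sketched in hedged pure Python (fake_orgqr is a stand-in, not a real LAPACK binding): calling with lwork = -1 is a workspace query that only reports the optimal size in work[0]; passing a too-small lwork on the real call yields info = -8, i.e. "argument 8 (lwork) has an illegal value".

```python
# Stand-in for a LAPACK ?orgqr call (assumption: real LAPACK applies
# the actual Householder transformations; only the lwork protocol is
# modeled here).
def fake_orgqr(m, n, work, lwork):
    optimal = max(1, n)       # reference LAPACK requires lwork >= max(1, n)
    if lwork == -1:           # workspace query: report size, compute nothing
        work[0] = float(optimal)
        return 0              # info == 0: success
    if lwork < optimal:
        return -8             # info == -8: lwork (argument 8 of ?orgqr) illegal
    return 0                  # workspace large enough: would compute here

def call_with_query(m, n):
    work = [0.0]
    info = fake_orgqr(m, n, work, -1)   # phase 1: ask for the optimal lwork
    assert info == 0
    lwork = int(work[0])
    work = [0.0] * lwork                # phase 2: allocate and call for real
    return fake_orgqr(m, n, work, lwork)
```

Note that max(1, n) makes the query return 1 even for n == 0, which is the empty-matrix behavior of reference LAPACK discussed further down the thread.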
The version used internally seems to be much older than the version in CI.
Could it be the change in how the function is called, using m instead of lda? I'll try to debug internally, too, to get a better sense for what's going on. @ngimel points out that torch.linalg.qr must be relying on this same function, so it's surprising we haven't seen this issue previously.
at::native::resize_as_(result, input.transpose(-2, -1), MemoryFormat::Contiguous);
result.transpose_(-2, -1);
}
Adding an early return here

// early return for empty matrices
if (result.numel() == 0) {
  infos.fill_(0);
  return result;
}

fixes the internal error. I now wonder how the OSS tests pass, because for an empty matrix lwork is returned as 0, and that's an illegal value (it should be at least 1).
Reference LAPACK returns lwork as 1 for empty matrices:
https://github.com/Reference-LAPACK/lapack/blob/master/SRC/dorgqr.f#L193-L196
Maybe the older version of MKL didn't do that, while the newer one matches the reference implementation; that would explain why OSS tests pass.
We haven't seen issues with torch.linalg.qr because the early return is used there.
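The fix described above can be sketched as a hedged Python analogue (orgqr_guarded and lapack_call are hypothetical names, not ATen APIs): bail out before the workspace query whenever the result is empty, so a LAPACK/MKL build that reports lwork = 0 for empty inputs is never consulted.

```python
import numpy as np

def orgqr_guarded(result, infos, lapack_call):
    # Early return for empty matrices: mark every batch entry as
    # successful (info == 0) and skip the LAPACK call entirely,
    # including its lwork workspace query.
    if result.size == 0:
        infos.fill(0)
        return result
    return lapack_call(result, infos)
```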
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Codecov Report
@@ Coverage Diff @@
## master #50502 +/- ##
==========================================
+ Coverage 80.99% 81.01% +0.01%
==========================================
Files 1916 1917 +1
Lines 209552 209556 +4
==========================================
+ Hits 169736 169762 +26
+ Misses 39816 39794 -22
Now we can remove _th_orgqr! Compared to the original TH-based orgqr, complex (#33152) and batched inputs are now supported. CUDA support will be added in a follow-up PR.
Closes #24747
Ref. #49421, #42666