
Port CPU torch.orgqr to ATen #50502

Closed · wants to merge 19 commits

Conversation

IvanYashchuk (Collaborator)
Now we can remove _th_orgqr!

Compared to the original TH-based orgqr, complex (#33152) and batched inputs are now supported.
CUDA support will be added in a follow-up PR.

Closes #24747

Ref. #49421, #42666
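As background for reviewers (not part of this PR's code): orgqr assembles the explicit Q factor of a QR decomposition from the packed Householder reflectors and tau scalars that geqrf produces. A self-contained NumPy sketch of the unbatched, real-valued computation, with hypothetical helper names mirroring the LAPACK routines:

```python
import numpy as np

def geqrf(a):
    # Householder QR with LAPACK-style packing: R on and above the
    # diagonal, the reflector vectors v (implicit v[0] == 1) below it.
    a = np.array(a, dtype=float)
    m, n = a.shape
    k = min(m, n)
    tau = np.zeros(k)
    for j in range(k):
        x = a[j:, j]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue  # zero column: H_j = I, tau[j] stays 0
        alpha = -np.copysign(norm_x, x[0])
        tau[j] = (alpha - x[0]) / alpha
        v = x.copy()
        v[0] -= alpha
        v /= v[0]  # normalize so v[0] == 1
        # apply H_j = I - tau v v^T to the trailing submatrix
        a[j:, j:] -= tau[j] * np.outer(v, v @ a[j:, j:])
        a[j, j] = alpha
        a[j + 1:, j] = v[1:]  # stash the reflector below the diagonal
    return a, tau

def orgqr(a, tau):
    # Assemble the explicit m x k matrix Q = H_1 H_2 ... H_k.
    m = a.shape[0]
    k = len(tau)
    q = np.eye(m, k)
    for j in reversed(range(k)):
        v = np.zeros(m)
        v[j] = 1.0
        v[j + 1:] = a[j + 1:, j]
        q[j:, :] -= tau[j] * np.outer(v[j:], v[j:] @ q[j:, :])
    return q
```

For an m x n input with m >= n, `Q = orgqr(*geqrf(A))` is orthonormal and `Q @ R` recovers A, where R is the upper triangle of the packed result. The ported kernel delegates this loop to LAPACK's ?orgqr instead.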

@IvanYashchuk IvanYashchuk added the following labels on Jan 13, 2021: module: porting (Issues related to porting TH/THNN legacy to ATen native), better-engineering (Relatively self-contained tasks for better engineering contributors), module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul).
@facebook-github-bot (Contributor) commented Jan 13, 2021:

💊 CI failures summary and remediations

As of commit 5f67bc9 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)


@mruberry mruberry requested a review from ngimel January 13, 2021 22:09
@mruberry mruberry added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jan 13, 2021
auto infos = at::empty({std::max<int64_t>(1, batchCount(input))}, input.options().dtype(kInt).device(kCPU));

// if result is not empty and not in batched column major format we have to allocate a temporary tensor
if (result.numel() != 0 && !result.transpose(-2, -1).is_contiguous()) {
Review comment (Collaborator):
This is really nice.
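A note on the layout check in the snippet above: LAPACK expects column-major (Fortran-ordered) data, and "transpose is contiguous" is exactly that test. The NumPy analogue, as an illustration (`is_column_major` is a hypothetical helper, not a PyTorch API):

```python
import numpy as np

def is_column_major(a):
    # A matrix is column-major iff its transpose is C-contiguous --
    # the same condition ATen checks with
    # result.transpose(-2, -1).is_contiguous().
    return a.T.flags["C_CONTIGUOUS"]

c_ordered = np.zeros((3, 4))              # default row-major layout
f_ordered = np.asfortranarray(c_ordered)  # column-major copy
```

When the check fails, the kernel allocates a temporary in the layout LAPACK wants instead of handing it a row-major result buffer.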


Tensor orgqr(const Tensor& input, const Tensor& tau) {
Tensor result = at::empty({0}, input.options());
result = at::orgqr_outf(input, tau, result);
Review comment (Collaborator):
orgqr_outf must be a typo here, right?

Reply (Collaborator, Author):
No, that's the actual function now.
at::orgqr_outf(input, tau, result) is equivalent to at::orgqr_out(result, input, tau).

@@ -4049,21 +4049,98 @@ def test_renorm_ps(self, device):

@onlyCPU
@skipCPUIfNoLapack
-    def test_orgqr_errors(self, device):
+    @dtypes(torch.float32, torch.float64, torch.complex64, torch.complex128)
+    def test_orgqr(self, device, dtype):
Review comment (Collaborator):
Pretty cool that this was never tested previously.

test/test_linalg.py (outdated review comment, resolved)
actual = torch.orgqr(reflectors, tau)
self.assertEqual(expected, actual)

out = torch.empty_like(A)
Review comment (Collaborator):
Let's add an OpInfo for this function either in the CUDA port PR or after it. Then we won't need this out test, I think.

@mruberry (Collaborator) left a comment:

Hey @IvanYashchuk!

This port looks pretty good. @ngimel and I made a few comments.

However, we were looking at this function and wondering if it should just be removed. We could find no uses of it within Facebook. Do you think it's interesting vs. torch.linalg.qr? If we want to support this functionality, maybe we'd offer it as a mode on torch.linalg.qr instead of this cryptically named function.

Looking forward to hearing your thoughts.

EDIT: We preemptively edited the tracking issue #49421 to reflect deprecating and removing, not porting, functions like orgqr. If we decide we don't want to keep it then we can change the issue back.

@mruberry (Collaborator):

Some more context on deprecating this function: it is used in one helper function,

def basis(A):
    """Return orthogonal basis of A columns.
    """
    if A.is_cuda:
        # torch.orgqr is not available in CUDA
        Q, _ = torch.qr(A, some=True)
    else:
        Q = torch.orgqr(*torch.geqrf(A))
    return Q

that appears to be copied around some repos on Github.
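For comparison, the same helper reduces to a single call in plain NumPy (a sketch of the snippet's intent, not PyTorch code):

```python
import numpy as np

def basis(a):
    # Orthonormal basis for the column space of a -- the NumPy
    # counterpart of the torch snippet above: Q from a reduced QR.
    q, _ = np.linalg.qr(a, mode="reduced")
    return q
```

The CPU/CUDA branching in the quoted snippet exists only because torch.orgqr lacked CUDA support, which is part of the motivation for this port.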

@IvanYashchuk (Collaborator, Author):

> However, we were looking at this function and wondering if it should just be removed. We could find no uses of it within Facebook. Do you think it's interesting vs. torch.linalg.qr? If we want to support this functionality, maybe we'd offer it as a mode on torch.linalg.qr instead of this cryptically named function.

There is a recent request for this function to support CUDA and differentiation (#50104). I definitely agree that we shouldn't use LAPACK's name. I was thinking of introducing a more descriptive name together with the PR for the backward rule, something like householder_orthogonal and householder_orthogonal_mul for torch.ormqr. Maybe this function was not used much exactly because it lacks CUDA and autograd support.

@mruberry (Collaborator):

> There is a recent request for this function to support CUDA and differentiation (#50104). I definitely agree that we shouldn't use LAPACK's name. I was thinking of introducing a more descriptive name together with the PR for the backward rule, something like householder_orthogonal and householder_orthogonal_mul for torch.ormqr. Maybe this function was not used much exactly because it lacks CUDA and autograd support.

Aha! Thank you for reminding me. We can work on a name improvement like you've suggested in a future PR (using an alias).

@IvanYashchuk IvanYashchuk force-pushed the port-orgqr branch 2 times, most recently from 511f4dd to 20b0654 Compare January 15, 2021 13:43
@IvanYashchuk (Collaborator, Author):

CI failures are not related to this PR.
Mobile builds now pass. The problem was that the implementation of the templated function apply_orgqr was missing from the header.
@mruberry this is ready for another round. I addressed the comments.

@mruberry mruberry self-requested a review January 19, 2021 13:56
@mruberry mruberry changed the title Port torch.orgqr to ATen Port CPU torch.orgqr to ATen Jan 19, 2021
@mruberry (Collaborator) left a comment:

Thanks @IvanYashchuk!

@facebook-github-bot (Contributor) left a comment:

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry (Collaborator):

This is hitting a series of errors internally. I suggest we rebase and resubmit it using ci-all, especially since this base looks like it has several failing builds. The failures are the CPU variants of test_orgqr:

caffe2/test:linalg - test_orgqr_cpu_float32 (test_linalg.TestLinalgCPU)
caffe2/test:linalg - test_orgqr_cpu_float64 (test_linalg.TestLinalgCPU)
caffe2/test:linalg - test_orgqr_cpu_complex64 (test_linalg.TestLinalgCPU)
caffe2/test:linalg - test_orgqr_cpu_complex128 (test_linalg.TestLinalgCPU)

Example error output:

RuntimeError: orgqr: Argument 8 has illegal value
  File "/usr/local/fbcode/platform007/lib/python3.7/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/fbcode/platform007/lib/python3.7/unittest/case.py", line 628, in run
    testMethod()
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/opt/gen/caffe2/test/linalg#binary,link-tree/torch/testing/_internal/common_device_type.py", line 286, in instantiated_test
    raise rte
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/opt/gen/caffe2/test/linalg#binary,link-tree/torch/testing/_internal/common_device_type.py", line 281, in instantiated_test
    result = test_fn(self, *args)
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/opt/gen/caffe2/test/linalg#binary,link-tree/torch/testing/_internal/common_device_type.py", line 678, in only_fn
    return fn(slf, device, *args, **kwargs)
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/opt/gen/caffe2/test/linalg#binary,link-tree/torch/testing/_internal/common_device_type.py", line 572, in dep_fn
    return fn(slf, device, *args, **kwargs)
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/opt/gen/caffe2/test/linalg#binary,link-tree/test_linalg.py", line 4100, in test_orgqr
    run_test(shape)
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/opt/gen/caffe2/test/linalg#binary,link-tree/test_linalg.py", line 4080, in run_test
    actual = torch.orgqr(reflectors, tau)

Comment on lines +67 to +70
int lwork = -1;
scalar_t wkopt;
lapackOrgqr<scalar_t>(m, n_columns, k, self_data, lda, tau_data, &wkopt, lwork, &infos_data[0]);
lwork = static_cast<int>(real_impl<scalar_t, value_t>(wkopt));
Reply (Collaborator, Author):

@mruberry ci-all is all green (except this one failure, which seems unrelated).
Could you check what BLAS/LAPACK implementation PyTorch is linked to in that failing build (torch.__config__.show() would tell)? Is it MKL?

Review comment (Collaborator):
The version used internally seems to be much older than the version in CI.

Could it be the change in how the function is called, using m instead of lda? I'll try to debug internally, too, to get a better sense for what's going on. @ngimel points out that torch.linalg.qr must be relying on this same function, so it's surprising we haven't seen this issue previously.

at::native::resize_as_(result, input.transpose(-2, -1), MemoryFormat::Contiguous);
result.transpose_(-2, -1);
}

Review comment (Collaborator):
Adding an early return here:

  //early return for empty matrices
  if (result.numel() == 0) {
    infos.fill_(0);
    return result;
  }

fixes the internal error. I now wonder how the OSS tests pass, because for an empty matrix lwork is returned as 0, and that's an illegal value (it should be at least 1).

Reply (Collaborator, Author):

Reference LAPACK returns lwork as 1 for empty matrices:
https://github.com/Reference-LAPACK/lapack/blob/master/SRC/dorgqr.f#L193-L196
Maybe the older version of MKL didn't do that, while the current one matches the reference implementation; that would explain why OSS CI passes.
We haven't seen issues with torch.linalg.qr because the early return is used there.

@IvanYashchuk (Collaborator, Author):

@mruberry, @ngimel I added an early return for empty matrices. It should fix the internal tests.

@facebook-github-bot (Contributor) left a comment:

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

codecov bot commented Jan 22, 2021:

Codecov Report

Merging #50502 (5f67bc9) into master (e34992e) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #50502      +/-   ##
==========================================
+ Coverage   80.99%   81.01%   +0.01%     
==========================================
  Files        1916     1917       +1     
  Lines      209552   209556       +4     
==========================================
+ Hits       169736   169762      +26     
+ Misses      39816    39794      -22     

@facebook-github-bot (Contributor):

@mruberry merged this pull request in 627a331.

Successfully merging this pull request may close these issues:

Migrate orgqr from the TH to Aten (CPU)