
Implement torch.linalg.svd #45562

Closed · wants to merge 75 commits

Conversation

antocuni
Contributor

This is related to #42666.
I am opening this PR to have the opportunity to discuss things.
First, we need to consider the differences between torch.svd and numpy.linalg.svd (a short sketch illustrating points 1 and 2 follows the list):

  1. torch.svd takes some=True, while numpy.linalg.svd takes full_matrices=True, which is effectively the opposite (and with the opposite default, too!)

  2. torch.svd returns (U, S, V), while numpy.linalg.svd returns (U, S, VT) (i.e., V transposed).

  3. torch.svd always returns a 3-tuple; numpy.linalg.svd returns only S in case compute_uv==False

  4. numpy.linalg.svd also takes an optional hermitian=False argument.
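
To make points 1 and 2 concrete, here is a minimal comparison sketch (singular vectors are only unique up to sign, so only the singular values are compared directly):

```python
import numpy as np
import torch

A = torch.randn(5, 3)

# Legacy PyTorch API: some=True (the default) gives the reduced factorization,
# and the third output is V (not transposed).
U, S, V = torch.svd(A, some=True)

# NumPy API: full_matrices=False is the equivalent of some=True, and the third
# output is VT (V transposed; conjugate-transposed for complex inputs).
Un, Sn, VTn = np.linalg.svd(A.numpy(), full_matrices=False)

# The singular values agree; V and VT differ by a transpose.
np.testing.assert_allclose(S.numpy(), Sn, rtol=1e-4, atol=1e-5)
```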

I think that the plan is to eventually deprecate torch.svd in favor of torch.linalg.svd, so this PR does the following:

  1. Rename/adapt the old svd C++ functions into linalg_svd: in particular, now linalg_svd takes full_matrices and returns VT

  2. Re-implement the old C++ interface on top of the new (by negating full_matrices and transposing VT).

  3. The C++ version of linalg_svd always returns a 3-tuple (we can't do anything else). So, there is a python wrapper which manually calls torch._C._linalg.linalg_svd to tweak the return value in case compute_uv==False.
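
A rough sketch of what the Python wrapper in point 3 could look like (illustrative only; the exact return convention for compute_uv=False is discussed further down in this thread, and the binding's signature is an assumption here):

```python
import torch

def linalg_svd(input, full_matrices=True, compute_uv=True):
    # The C++ op always returns three tensors.
    U, S, VT = torch._C._linalg.linalg_svd(input, full_matrices, compute_uv)
    if not compute_uv:
        # Mirror numpy.linalg.svd, which returns only S in this case.
        # (Whether to do this, or to return placeholders instead, is the open question.)
        return S
    return U, S, VT
```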

Currently, linalg_svd_backward is broken because it has not been adapted to the V ==> VT change yet; before spending more time on it, I wanted to make sure that the general approach is fine.

…which is the OPPOSITE (and with a different default) than the old some=True. The signature of at::svd is unchanged
…agam expect VT to be in column-major order, so what we were passing around was effectively V, not VT.

- Fix the comment and actually create a column-major version of VT, which we can return directly from linalg_svd

- kill the .t() from python's version of linalg.svd(), since we are directly getting VT from C++ now

- call VT.transpose_() inside the legacy version of svd, to keep backwards compatibility

This is WIP because autograd is still broken
@dr-ci

dr-ci bot commented Sep 30, 2020

💊 CI failures summary and remediations

As of commit 81510d5 (more details on the Dr. CI page):


  • 1/2 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)
  • 1/2 broken upstream at merge base 321b988 from Jan 06 until Jan 07

1 job timed out:

  • pytorch_linux_bionic_py3_8_gcc9_coverage_test1

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch:

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


This comment was automatically generated by Dr. CI. Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

This comment has been revised 232 times.

@mruberry
Collaborator

mruberry commented Oct 5, 2020

Thanks for your thoughtful write-up of the issues, @antocuni, and sorry I couldn't respond earlier. Also, please reach out on Slack if you'd like to have a more direct discussion of these issues.

One design principle I'd like us to consider is that functions should always return the same type. PyTorch already fails to do this in some cases. For example, we have max(tensor), which returns a tensor, but then max(tensor, 1) returns a tuple of the reduced tensor and the indices of the maximum elements in the original tensor. I think it'd be clearer if the latter function had a name like max_with_indices or the max function always returned a tuple.
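
(A quick illustration of that inconsistency:)

```python
import torch

t = torch.arange(6.).reshape(2, 3)

print(type(torch.max(t)))     # <class 'torch.Tensor'> -- a single scalar tensor
print(type(torch.max(t, 1)))  # <class 'torch.return_types.max'> -- a (values, indices) named tuple
```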

In this case, compute_uv is the input giving us a problem. Setting compute_uv=False seems very rare, so I don't want to optimize for that case, either. How does following the Torch convention of always returning a tuple of three tensors sound? The current torch.svd() will return tensors U and V filled with zeros when compute_uv=False. I'm not totally sure why it does that instead of returning None or empty tensors, however. Do you have an idea for what those tensors should be when compute_uv=False?

As for deprecating torch.svd(), let's just focus on implementing torch.linalg.svd() in a NumPy-compatible way (except when compute_uv=False) for the moment.

What are your thoughts, @antocuni?

@mruberry
Collaborator

mruberry commented Oct 5, 2020

I just updated the torch.linalg.norm tracking issue (although it needs review from the Quansight team) and pulled some SVD-related issues. A couple that seem relevant are:

#45821
#41306

Do we want to address #45821 while implementing a NumPy-compatible signature? Also, is this a good opportunity to compare the current CUDA implementation with an alternative implementation, like reusing the CPU implementation?

@antocuni
Contributor Author

antocuni commented Oct 6, 2020

> One design principle I'd like us to consider is that functions should always return the same type. PyTorch already fails to do this in some cases. For example, we have max(tensor), which returns a tensor, but then max(tensor, 1) returns a tuple of the reduced tensor and the indices of the maximum elements in the original tensor. I think it'd be clearer if the latter function had a name like max_with_indices or the max function always returned a tuple.

I agree in general: return types should not depend on the values of the arguments.

However, for the specific case at hand, we need to decide whether we value API consistency (including consistency between C++ and Python) more, or compatibility with NumPy. My understanding was that the goal is to be drop-in compatible with NumPy; isn't that the case?

It's worth noting that since we are moving svd to a different namespace there are no BC issues and we can choose the signature freely, but if we make a mistake it will be hard to change our minds later.

> In this case, compute_uv is the input giving us a problem. Setting compute_uv=False seems very rare, so I don't want to optimize for that case, either. How does following the Torch convention of always returning a tuple of three tensors sound? The current torch.svd() will return tensors U and V filled with zeros when compute_uv=False. I'm not totally sure why it does that instead of returning None or empty tensors, however. Do you have an idea for what those tensors should be when compute_uv=False?

Looking at the code, I think that the decision to return 0-filled tensors was for convenience of the implementation. The C++ return type is std::tuple<Tensor, Tensor, Tensor>, and I don't think there is a way to set one item to "None/NULL", is there?
The closest alternative would have been to return an undefined tensor, but the tensors were already allocated, so it was probably easier to just fill them with 0.

From the programmer's point of view, returning a tuple (None, S, None) is better, because it ensures that I cannot use the zero-filled U and V by mistake, which might happen now if I pass the wrong value for compute_uv. Note that this is a real risk: torch.svd(some=True) has a different default and the opposite meaning from torch.linalg.svd(full_matrices=True), so people migrating from the former to the latter can easily get it wrong.

Re #45821: yes, it seems a good idea to update the docs while we are here.

Re #41306: I would personally leave it for another PR, to make this more self-contained and to reduce the risk of breaking stuff.

@mruberry
Collaborator

mruberry commented Oct 6, 2020

> One design principle I'd like us to consider is that functions should always return the same type. PyTorch already fails to do this in some cases. For example, we have max(tensor), which returns a tensor, but then max(tensor, 1) returns a tuple of the reduced tensor and the indices of the maximum elements in the original tensor. I think it'd be clearer if the latter function had a name like max_with_indices or the max function always returned a tuple.
>
> I agree in general: return types should not depend on the values of the arguments.
>
> However, for the specific case at hand, we need to decide whether we value API consistency (including consistency between C++ and Python) more, or compatibility with NumPy. My understanding was that the goal is to be drop-in compatible with NumPy; isn't that the case?

Being NumPy-compatible is a goal, but we also have UX goals. For example, we recently discovered that in NumPy, functions and their method variants can have different behavior, but a UX goal of PyTorch is that they always behave identically. In this case I think it's more important we be consistent with the return type than be NumPy-compatible. I could be mistaken.

> It's worth noting that since we are moving svd to a different namespace there are no BC issues and we can choose the signature freely, but if we make a mistake it will be hard to change our minds later.
>
> In this case, compute_uv is the input giving us a problem. Setting compute_uv=False seems very rare, so I don't want to optimize for that case, either. How does following the Torch convention of always returning a tuple of three tensors sound? The current torch.svd() will return tensors U and V filled with zeros when compute_uv=False. I'm not totally sure why it does that instead of returning None or empty tensors, however. Do you have an idea for what those tensors should be when compute_uv=False?
>
> Looking at the code, I think that the decision to return 0-filled tensors was for convenience of the implementation. The C++ return type is std::tuple<Tensor, Tensor, Tensor>, and I don't think there is a way to set one item to "None/NULL", is there?
> The closest alternative would have been to return an undefined tensor, but the tensors were already allocated, so it was probably easier to just fill them with 0.
>
> From the programmer's point of view, returning a tuple (None, S, None) is better, because it ensures that I cannot use the zero-filled U and V by mistake, which might happen now if I pass the wrong value for compute_uv. Note that this is a real risk: torch.svd(some=True) has a different default and the opposite meaning from torch.linalg.svd(full_matrices=True), so people migrating from the former to the latter can easily get it wrong.

Let's return None in Python for them and empty tensors in C++?

> Re #45821: yes, it seems a good idea to update the docs while we are here.

Cool.

> Re #41306: I would personally leave it for another PR, to make this more self-contained and to reduce the risk of breaking stuff.

Sounds good.

@antocuni
Contributor Author

antocuni commented Oct 6, 2020

> Being NumPy-compatible is a goal, but we also have UX goals. For example, we recently discovered that in NumPy, functions and their method variants can have different behavior, but a UX goal of PyTorch is that they always behave identically. In this case I think it's more important we be consistent with the return type than be NumPy-compatible. I could be mistaken.

Sounds good to me. I also suggest documenting the differences w.r.t. NumPy in the docs.

If we want to be super nice to users, we could have a global flag that enables runtime warnings for the cases in which PyTorch behaves differently from NumPy. Not sure whether (1) this is a goal and (2) it should be part of this PR, though.

> Let's return None in Python for them and empty tensors in C++?

+1.
By "empty tensor" do you mean an undefined tensor (i.e. Tensor()), or something more specific?

@mruberry
Collaborator

mruberry commented Oct 6, 2020

> Being NumPy-compatible is a goal, but we also have UX goals. For example, we recently discovered that in NumPy, functions and their method variants can have different behavior, but a UX goal of PyTorch is that they always behave identically. In this case I think it's more important we be consistent with the return type than be NumPy-compatible. I could be mistaken.
>
> Sounds good to me. I also suggest documenting the differences w.r.t. NumPy in the docs.

Great idea.

> If we want to be super nice to users, we could have a global flag that enables runtime warnings for the cases in which PyTorch behaves differently from NumPy. Not sure whether (1) this is a goal and (2) it should be part of this PR, though.

Definitely not part of this PR. There are some functions which are divergent from NumPy and we might eventually offer more compatible alternatives for them, but not at the moment.

> Let's return None in Python for them and empty tensors in C++?
>
> +1.
>
> By "empty tensor" do you mean an undefined tensor (i.e. Tensor()), or something more specific?

How about at::empty({0}, ...) with options derived from the self tensor? So, for example, if the input is a double tensor on the CPU and compute_uv=False, then you'll get two empty double tensors on the CPU as part of the output.
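
In Python terms, the proposal at this point in the discussion would look roughly like this from the caller's side (a sketch, assuming the empty placeholders are surfaced directly):

```python
import torch

A = torch.randn(5, 3, dtype=torch.float64)  # double tensor on the CPU

U, S, VT = torch.linalg.svd(A, compute_uv=False)  # U and VT would be placeholders
print(S.shape)                        # torch.Size([3]) -- the singular values
print(U.shape, U.dtype, U.device)     # torch.Size([0]) torch.float64 cpu
print(VT.shape, VT.dtype, VT.device)  # torch.Size([0]) torch.float64 cpu
```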

Total aside and not for this PR: eventually we should think about making these return types named tuples so users can more readably access the results.

@mruberry
Collaborator

Hey @antocuni, just an fyi that I was reviewing a similar issue for another operator, and it looks like the standard we're going to go with for these return values is a named tuple with empty tensors in both Python and C++ for the tensors that aren't computed, plus a docs warning that this behavior is likely to change in the future. I'll also file an issue soon to discuss this behavior and request that we add support for, in Python, only returning the tensors that are computed.

@antocuni
Contributor Author

> Hey @antocuni, just an fyi that I was reviewing a similar issue for another operator, and it looks like the standard we're going to go with for these return values is a named tuple with empty tensors in both Python and C++

ok thanks, I'll update the code to reflect that. Do you have a link to that issue?

@antocuni
Contributor Author

Also, what do we want to do with the hermitian=False parameter? Quoting the NumPy docs:

    hermitian : bool, optional
        If True, `a` is assumed to be Hermitian (symmetric if real-valued),
        enabling a more efficient method for finding singular values.
        Defaults to False.

I think we have at least three options:

  1. we simply don't support this parameter for now
  2. we add the parameter but ignore it (since it's just an optimization)
    2b. same as 2, but we emit a run-time warning if you pass =True
  3. we implement it immediately.
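
For context on what option 3 would involve: the trick NumPy uses is to go through an eigendecomposition when the input is Hermitian. A rough sketch (assuming a 2-D input and that an eigh-style function is available in torch.linalg; this is not part of the PR):

```python
import torch

def svd_hermitian(a):
    # For Hermitian a = Q diag(w) Q^H with real eigenvalues w, the singular
    # values are |w|; U comes from Q and Vh absorbs the signs of w.
    w, q = torch.linalg.eigh(a)  # assumes an eigh op is available
    signs = torch.sign(w)
    s, order = torch.sort(w.abs(), descending=True)  # singular values, descending
    u = q[:, order]
    vh = (u * signs[order]).conj().transpose(-2, -1)
    return u, s, vh
```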

@mruberry
Collaborator

> Hey @antocuni, just an fyi that I was reviewing a similar issue for another operator, and it looks like the standard we're going to go with for these return values is a named tuple with empty tensors in both Python and C++
>
> ok thanks, I'll update the code to reflect that. Do you have a link to that issue?

Just created #46187. We actually decided to go with returning None instead of empty tensors, but we haven't worked out how this translates from C++ yet.

@mruberry
Collaborator

> Also, what do we want to do with the hermitian=False parameter? Quoting the NumPy docs:
>
>     hermitian : bool, optional
>         If True, `a` is assumed to be Hermitian (symmetric if real-valued),
>         enabling a more efficient method for finding singular values.
>         Defaults to False.
>
> I think we have at least three options:
>
>   1. we simply don't support this parameter for now

Sounds great. It won't be BC-breaking to add because it (1) appears last in the param list and (2) has a default value. If that wasn't true then I'd suggest we add it to the signature now.

@antocuni
Contributor Author

> Just created #46187. We actually decided to go with returning None instead of empty tensors, but we haven't worked out how this translates from C++ yet.

Ok, then I'll go for None in Python / empty tensor in C++ for now.

> Sounds great. It won't be BC-breaking to add because it (1) appears last in the param list and (2) has a default value. If that wasn't true then I'd suggest we add it to the signature now.

+1

…es Vh, so we need to transpose AND conjugate. OTOH, linalg.svd returns Vh, so nothing is needed in that case
@codecov

codecov bot commented Dec 23, 2020

Codecov Report

Merging #45562 (24c6506) into master (6e6231f) will decrease coverage by 0.00%.
The diff coverage is 97.87%.

@@            Coverage Diff             @@
##           master   #45562      +/-   ##
==========================================
- Coverage   80.67%   80.67%   -0.01%     
==========================================
  Files        1899     1899              
  Lines      206066   206104      +38     
==========================================
+ Hits       166241   166265      +24     
- Misses      39825    39839      +14     

@antocuni
Contributor Author

@mruberry it seems that all CI tests have passed, so the PR is ready for a new review round

  Tensor U_tmp, S_tmp, VT_tmp;
- std::tie(U_tmp, S_tmp, VT_tmp) = at::_svd_helper(self, some, compute_uv);
+ std::tie(U_tmp, S_tmp, VT_tmp) = at::linalg_svd(self, full_matrices, compute_uv);
  U.resize_as_(U_tmp).copy_(U_tmp);
Collaborator

@mruberry mruberry Dec 28, 2020

See note on error-checking out above. Here we should require same dtype and same device for now, and use resize_output instead of resize_as_.

Contributor Author

Done in commit e483647.
I noticed that resize_output doesn't check the device, so I wrote the check manually. What is the convention for output tensors? Is it generally allowed to pass output tensors on different devices? If not, why isn't the check done by resize_output itself?

Collaborator

This is an excellent question. I actually wrote a brief description of out= handling in the Developer FAQ: https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch. I think it's currently a gap in the tools PyTorch provides that we require some operations to implement this check manually.

@ezyang is actually developing a new architecture that I think solves this issue. Maybe we should extend resize_output, too. Currently it doesn't have the device information necessary to implement this check. @heitorschueroff actually had that same idea recently.
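
To make the out= contract being discussed concrete, here is a small sketch of what a caller would see (assuming the out= variant is exposed the same way as for torch.svd, and with the same-dtype/same-device checks added in this PR):

```python
import torch

A = torch.randn(4, 3, dtype=torch.float64)

# Pre-allocated outputs: dtype and device must match the input; shapes are resized as needed.
U = torch.empty(0, dtype=torch.float64)
S = torch.empty(0, dtype=torch.float64)
VT = torch.empty(0, dtype=torch.float64)
torch.linalg.svd(A, full_matrices=False, out=(U, S, VT))  # ok

bad = torch.empty(0, dtype=torch.float32)
# torch.linalg.svd(A, full_matrices=False, out=(bad, S, VT))  # would raise: dtype mismatch
```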

Contributor Author

thank you for the answer, I have a better picture now

torch/_torch_docs.py (outdated review comments, resolved)
torch/linalg/__init__.py (outdated review comments, resolved)
@mruberry mruberry self-requested a review December 30, 2020 07:47
Collaborator

@mruberry mruberry left a comment

This looks great, @antocuni! I made a few small notes and this needs a rebase. Once that happens ping me and we'll get this merged!

@antocuni
Contributor Author

antocuni commented Jan 5, 2021

> This looks great, @antocuni! I made a few small notes and this needs a rebase. Once that happens ping me and we'll get this merged!

@mruberry I addressed the remaining issues. Tests are running, hopefully they will pass

@mruberry
Collaborator

mruberry commented Jan 6, 2021

Both failures are in upstream. I think we're gtg!

Contributor

@facebook-github-bot facebook-github-bot left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@antocuni
Contributor Author

antocuni commented Jan 7, 2021

@mruberry there was a conflict due to 6643e9f. I updated this PR with the fix (tests still running at this point)

Contributor

@facebook-github-bot facebook-github-bot left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@mruberry merged this pull request in 5c5abd5.

hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 14, 2021
Summary: (same as the PR description above)

Pull Request resolved: pytorch#45562

Reviewed By: H-Huang

Differential Revision: D25803557

Pulled By: mruberry

fbshipit-source-id: 4966f314a0ba2ee391bab5cda4563e16275ce91f
Labels
cla signed · Merged · open source · triaged