
Implement torch.linalg.svd #45562

Closed · wants to merge 75 commits

Conversation

antocuni
Contributor

This is related to #42666.
I am opening this PR to have the opportunity to discuss things.
First, we need to consider the differences between torch.svd and numpy.linalg.svd (a short sketch illustrating points 1 and 2 follows the list):

  1. torch.svd takes some=True, while numpy.linalg.svd takes full_matrices=True, which is effectively the opposite (and with the opposite default, too!)

  2. torch.svd returns (U, S, V), while numpy.linalg.svd returns (U, S, VT) (i.e., V transposed).

  3. torch.svd always returns a 3-tuple; numpy.linalg.svd returns only S in case compute_uv==False

  4. numpy.linalg.svd also takes an optional hermitian=False argument.
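
To make points 1 and 2 concrete, here is a minimal comparison sketch (singular vectors are only unique up to sign, so only the singular values are compared directly):

```python
import numpy as np
import torch

A = torch.randn(5, 3)

# Legacy PyTorch API: some=True (the default) gives the reduced factorization,
# and the third output is V (not transposed).
U, S, V = torch.svd(A, some=True)

# NumPy API: full_matrices=False is the equivalent of some=True, and the third
# output is VT (V transposed; conjugate-transposed for complex inputs).
Un, Sn, VTn = np.linalg.svd(A.numpy(), full_matrices=False)

# The singular values agree; V and VT differ by a transpose.
np.testing.assert_allclose(S.numpy(), Sn, rtol=1e-4, atol=1e-5)
```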

I think that the plan is to eventually deprecate torch.svd in favor of torch.linalg.svd, so this PR does the following:

  1. Rename/adapt the old svd C++ functions into linalg_svd: in particular, now linalg_svd takes full_matrices and returns VT

  2. Re-implement the old C++ interface on top of the new (by negating full_matrices and transposing VT).

  3. The C++ version of linalg_svd always returns a 3-tuple (we can't do anything else). So, there is a python wrapper which manually calls torch._C._linalg.linalg_svd to tweak the return value in case compute_uv==False.
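
A rough sketch of what the Python wrapper in point 3 could look like (illustrative only; the exact return convention for compute_uv=False is discussed further down in this thread, and the binding's signature is an assumption here):

```python
import torch

def linalg_svd(input, full_matrices=True, compute_uv=True):
    # The C++ op always returns three tensors.
    U, S, VT = torch._C._linalg.linalg_svd(input, full_matrices, compute_uv)
    if not compute_uv:
        # Mirror numpy.linalg.svd, which returns only S in this case.
        # (Whether to do this, or to return placeholders instead, is the open question.)
        return S
    return U, S, VT
```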

Currently, linalg_svd_backward is broken because it has not been adapted to the V ==> VT change yet; before spending more time on it, I wanted to make sure that the general approach is fine.

…which is the OPPOSITE (and with a different default) than the old some=True. The signature of at::svd is unchanged
…agam expect VT to be in column-major order, so what we were passing around was effectively V, not VT.

- Fix the comment and actually create a column-major version of VT, which we can return directly from linalg_svd

- kill the .t() from python's version of linalg.svd(), since we are directly getting VT from C++ now

- call VT.transpose_() inside the legacy version of svd, to keep backwards compatibility

This is WIP because autograd is still broken
@dr-ci

dr-ci bot commented Sep 30, 2020

💊 CI failures summary and remediations

As of commit 81510d5 (more details on the Dr. CI page):


  • 1/2 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)
  • 1/2 broken upstream at merge base 321b988 from Jan 06 until Jan 07

1 job timed out:

  • pytorch_linux_bionic_py3_8_gcc9_coverage_test1

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch:

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


This comment was automatically generated by Dr. CI. Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

This comment has been revised 232 times.

@mruberry
Collaborator

mruberry commented Oct 5, 2020

Thanks for your thoughtful write-up of the issues, @antocuni, and sorry I couldn't respond earlier. Also, please reach out on Slack if you'd like to have a more direct discussion of these issues.

One design principle I'd like us to consider is that functions should always return the same type. PyTorch already fails to do this in some cases. For example, we have max(tensor), which returns a tensor, but then max(tensor, 1) returns a tuple of the reduced tensor and the indices of the maximum elements in the original tensor. I think it'd be clearer if the latter function had a name like max_with_indices or the max function always returned a tuple.
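
(A quick illustration of that inconsistency:)

```python
import torch

t = torch.arange(6.).reshape(2, 3)

print(type(torch.max(t)))     # <class 'torch.Tensor'> -- a single scalar tensor
print(type(torch.max(t, 1)))  # <class 'torch.return_types.max'> -- a (values, indices) named tuple
```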

In this case, compute_uv is the input giving us a problem. Setting compute_uv=False seems very rare, so I don't want to optimize for that case, either. How does following the Torch convention of always returning a tuple of three tensors sound? The current torch.svd() will return tensors U and V filled with zeros when compute_uv=False. I'm not totally sure why it does that instead of returning None or empty tensors, however. Do you have an idea for what those tensors should be when compute_uv=False?

As for deprecating torch.svd(), let's just focus on implementing torch.linalg.svd() in a NumPy-compatible way (except when compute_uv=False) for the moment.

What are your thoughts, @antocuni?

@mruberry
Collaborator

mruberry commented Oct 5, 2020

I just updated the torch.linalg.norm tracking issue (although it needs review from the Quansight team) and pulled some SVD-related issues. A couple that seem relevant are:

#45821
#41306

Do we want to address #45821 while implementing a NumPy-compatible signature? Also, is this a good opportunity to compare the current CUDA implementation with an alternative implementation, like reusing the CPU implementation?

@antocuni
Contributor Author

antocuni commented Oct 6, 2020

> One design principle I'd like us to consider is that functions should always return the same type. PyTorch already fails to do this in some cases. For example, we have max(tensor), which returns a tensor, but then max(tensor, 1) returns a tuple of the reduced tensor and the indices of the maximum elements in the original tensor. I think it'd be clearer if the latter function had a name like max_with_indices or the max function always returned a tuple.

I agree in general: return types should not depend on the values of the arguments.

However, for the specific case at hand, we need to decide whether we value API consistency (including consistency between C++ and Python) more, or compatibility with NumPy. My understanding was that the goal is to be drop-in compatible with NumPy; isn't that the case?

It's worth noting that since we are moving svd to a different namespace there are no BC issues and we can choose the signature freely, but if we make a mistake it will be hard to change our minds later.

> In this case, compute_uv is the input giving us a problem. Setting compute_uv=False seems very rare, so I don't want to optimize for that case, either. How does following the Torch convention of always returning a tuple of three tensors sound? The current torch.svd() will return tensors U and V filled with zeros when compute_uv=False. I'm not totally sure why it does that instead of returning None or empty tensors, however. Do you have an idea for what those tensors should be when compute_uv=False?

Looking at the code, I think that the decision to return 0-filled tensors was for convenience of the implementation. The C++ return type is std::tuple<Tensor, Tensor, Tensor>, and I don't think there is a way to set one item to "None/NULL", is there?
The closest alternative would have been to return an undefined tensor, but the tensors were already allocated, so it was probably easier to just fill them with 0.

From the programmer's point of view, returning a tuple (None, S, None) is better, because it ensures that I cannot use the zero-filled U and V by mistake, which might happen now if I pass the wrong value for compute_uv. Note that this is a real risk: torch.svd(some=True) has a different default and the opposite meaning from torch.linalg.svd(full_matrices=True), so people migrating from the former to the latter can easily get it wrong.

Re #45821: yes, it seems a good idea to update the docs while we are here.

Re #41306: I would personally leave it for another PR, to make this more self-contained and to reduce the risk of breaking stuff.

@mruberry
Collaborator

mruberry commented Oct 6, 2020

> One design principle I'd like us to consider is that functions should always return the same type. PyTorch already fails to do this in some cases. For example, we have max(tensor), which returns a tensor, but then max(tensor, 1) returns a tuple of the reduced tensor and the indices of the maximum elements in the original tensor. I think it'd be clearer if the latter function had a name like max_with_indices or the max function always returned a tuple.
>
> I agree in general: return types should not depend on the values of the arguments.
>
> However, for the specific case at hand, we need to decide whether we value API consistency (including consistency between C++ and Python) more, or compatibility with NumPy. My understanding was that the goal is to be drop-in compatible with NumPy; isn't that the case?

Being NumPy-compatible is a goal, but we also have UX goals. For example, we recently discovered that in NumPy, functions and their method variants can have different behavior, but a UX goal of PyTorch is that they always behave identically. In this case I think it's more important we be consistent with the return type than be NumPy-compatible. I could be mistaken.

> It's worth noting that since we are moving svd to a different namespace there are no BC issues and we can choose the signature freely, but if we make a mistake it will be hard to change our minds later.
>
> In this case, compute_uv is the input giving us a problem. Setting compute_uv=False seems very rare, so I don't want to optimize for that case, either. How does following the Torch convention of always returning a tuple of three tensors sound? The current torch.svd() will return tensors U and V filled with zeros when compute_uv=False. I'm not totally sure why it does that instead of returning None or empty tensors, however. Do you have an idea for what those tensors should be when compute_uv=False?
>
> Looking at the code, I think that the decision to return 0-filled tensors was for convenience of the implementation. The C++ return type is std::tuple<Tensor, Tensor, Tensor>, and I don't think there is a way to set one item to "None/NULL", is there?
> The closest alternative would have been to return an undefined tensor, but the tensors were already allocated, so it was probably easier to just fill them with 0.
>
> From the programmer's point of view, returning a tuple (None, S, None) is better, because it ensures that I cannot use the zero-filled U and V by mistake, which might happen now if I pass the wrong value for compute_uv. Note that this is a real risk: torch.svd(some=True) has a different default and the opposite meaning from torch.linalg.svd(full_matrices=True), so people migrating from the former to the latter can easily get it wrong.

Let's return None in Python for them and empty tensors in C++?

> Re #45821: yes, it seems a good idea to update the docs while we are here.

Cool.

> Re #41306: I would personally leave it for another PR, to make this more self-contained and to reduce the risk of breaking stuff.

Sounds good.

@antocuni
Contributor Author

antocuni commented Oct 6, 2020

> Being NumPy-compatible is a goal, but we also have UX goals. For example, we recently discovered that in NumPy, functions and their method variants can have different behavior, but a UX goal of PyTorch is that they always behave identically. In this case I think it's more important we be consistent with the return type than be NumPy-compatible. I could be mistaken.

Sounds good to me. I also suggest documenting the differences w.r.t. NumPy in the docs.

If we want to be super nice to users, we could have a global flag that enables runtime warnings for the cases in which PyTorch behaves differently from NumPy. Not sure whether (1) this is a goal and (2) it should be part of this PR, though.

> Let's return None in Python for them and empty tensors in C++?

+1.
By "empty tensor" do you mean an undefined tensor (i.e. Tensor()), or something more specific?

@mruberry
Collaborator

mruberry commented Oct 6, 2020

> Being NumPy-compatible is a goal, but we also have UX goals. For example, we recently discovered that in NumPy, functions and their method variants can have different behavior, but a UX goal of PyTorch is that they always behave identically. In this case I think it's more important we be consistent with the return type than be NumPy-compatible. I could be mistaken.
>
> Sounds good to me. I also suggest documenting the differences w.r.t. NumPy in the docs.

Great idea.

> If we want to be super nice to users, we could have a global flag that enables runtime warnings for the cases in which PyTorch behaves differently from NumPy. Not sure whether (1) this is a goal and (2) it should be part of this PR, though.

Definitely not part of this PR. There are some functions which are divergent from NumPy and we might eventually offer more compatible alternatives for them, but not at the moment.

> Let's return None in Python for them and empty tensors in C++?
>
> +1.
>
> By "empty tensor" do you mean an undefined tensor (i.e. Tensor()), or something more specific?

How about at::empty({0}, ...) with options derived from the self tensor? So, for example, if the input is a double tensor on the CPU and compute_uv=False, then you'll get two empty double tensors on the CPU as part of the output.
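
In Python terms, the proposal at this point in the discussion would look roughly like this from the caller's side (a sketch, assuming the empty placeholders are surfaced directly):

```python
import torch

A = torch.randn(5, 3, dtype=torch.float64)  # double tensor on the CPU

U, S, VT = torch.linalg.svd(A, compute_uv=False)  # U and VT would be placeholders
print(S.shape)                        # torch.Size([3]) -- the singular values
print(U.shape, U.dtype, U.device)     # torch.Size([0]) torch.float64 cpu
print(VT.shape, VT.dtype, VT.device)  # torch.Size([0]) torch.float64 cpu
```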

Total aside and not for this PR: eventually we should think about making these return types named tuples so users can more readably access the results.

@mruberry
Collaborator

Hey @antocuni, just an fyi that I was reviewing a similar issue for another operator, and it looks like the standard we're going to go with for these return values is a named tuple with empty tensors in both Python and C++ for the tensors that aren't computed, plus a docs warning that this behavior is likely to change in the future. I'll also file an issue soon to discuss this behavior and request that we add support for, in Python, only returning the tensors that are computed.

@antocuni
Contributor Author

> Hey @antocuni, just an fyi that I was reviewing a similar issue for another operator, and it looks like the standard we're going to go with for these return values is a named tuple with empty tensors in both Python and C++

ok thanks, I'll update the code to reflect that. Do you have a link to that issue?

@antocuni
Contributor Author

Also, what do we want to do with the hermitian=False parameter? Quoting the NumPy docs:

    hermitian : bool, optional
        If True, `a` is assumed to be Hermitian (symmetric if real-valued),
        enabling a more efficient method for finding singular values.
        Defaults to False.

I think we have at least three options:

  1. we simply don't support this parameter for now
  2. we add the parameter but ignore it (since it's just an optimization)
    2b. same as 2, but we emit a run-time warning if you pass =True
  3. we implement it immediately.
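
For context on what option 3 would involve: the trick NumPy uses is to go through an eigendecomposition when the input is Hermitian. A rough sketch (assuming a 2-D input and that an eigh-style function is available in torch.linalg; this is not part of the PR):

```python
import torch

def svd_hermitian(a):
    # For Hermitian a = Q diag(w) Q^H with real eigenvalues w, the singular
    # values are |w|; U comes from Q and Vh absorbs the signs of w.
    w, q = torch.linalg.eigh(a)  # assumes an eigh op is available
    signs = torch.sign(w)
    s, order = torch.sort(w.abs(), descending=True)  # singular values, descending
    u = q[:, order]
    vh = (u * signs[order]).conj().transpose(-2, -1)
    return u, s, vh
```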

@mruberry
Collaborator

> Hey @antocuni, just an fyi that I was reviewing a similar issue for another operator, and it looks like the standard we're going to go with for these return values is a named tuple with empty tensors in both Python and C++
>
> ok thanks, I'll update the code to reflect that. Do you have a link to that issue?

Just created #46187. We actually decided to go with returning None instead of empty tensors, but we haven't worked out how this translates from C++ yet.

@mruberry
Collaborator

> Also, what do we want to do with the hermitian=False parameter? Quoting the NumPy docs:
>
>     hermitian : bool, optional
>         If True, `a` is assumed to be Hermitian (symmetric if real-valued),
>         enabling a more efficient method for finding singular values.
>         Defaults to False.
>
> I think we have at least three options:
>
>   1. we simply don't support this parameter for now

Sounds great. It won't be BC-breaking to add because it (1) appears last in the param list and (2) has a default value. If that wasn't true then I'd suggest we add it to the signature now.

@antocuni
Contributor Author

> Just created #46187. We actually decided to go with returning None instead of empty tensors, but we haven't worked out how this translates from C++ yet.

Ok, then I'll go for None in Python / empty tensor in C++ for now.

> Sounds great. It won't be BC-breaking to add because it (1) appears last in the param list and (2) has a default value. If that wasn't true then I'd suggest we add it to the signature now.

+1

…es Vh, so we need to transpose AND conjugate. OTOH, linalg.svd returns Vh, so nothing is needed in that case
@codecov

codecov bot commented Dec 23, 2020

Codecov Report

Merging #45562 (24c6506) into master (6e6231f) will decrease coverage by 0.00%.
The diff coverage is 97.87%.

@@            Coverage Diff             @@
##           master   #45562      +/-   ##
==========================================
- Coverage   80.67%   80.67%   -0.01%     
==========================================
  Files        1899     1899              
  Lines      206066   206104      +38     
==========================================
+ Hits       166241   166265      +24     
- Misses      39825    39839      +14     

@antocuni
Contributor Author

@mruberry it seems that all CI tests have passed, so the PR is ready for a new review round

  Tensor U_tmp, S_tmp, VT_tmp;
- std::tie(U_tmp, S_tmp, VT_tmp) = at::_svd_helper(self, some, compute_uv);
+ std::tie(U_tmp, S_tmp, VT_tmp) = at::linalg_svd(self, full_matrices, compute_uv);
  U.resize_as_(U_tmp).copy_(U_tmp);
Collaborator

@mruberry mruberry Dec 28, 2020

See note on error-checking out above. Here we should require same dtype and same device for now, and use resize_output instead of resize_as_.

Contributor Author

Done in commit e483647.
I noticed that resize_output doesn't check the device, so I wrote the check manually. What is the convention for output tensors? Is it generally allowed to pass output tensors on different devices? If not, why isn't the check done by resize_output itself?

Collaborator

This is an excellent question. I actually wrote a brief description of out= handling in the Developer FAQ: https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch. I think it's currently a gap in the tools PyTorch provides that we require some operations to implement this check manually.

@ezyang is actually developing a new architecture that I think solves this issue. Maybe we should extend resize_output, too. Currently it doesn't have the device information necessary to implement this check. @heitorschueroff actually had that same idea recently.
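
To make the out= contract being discussed concrete, here is a small sketch of what a caller would see (assuming the out= variant is exposed the same way as for torch.svd, and with the same-dtype/same-device checks added in this PR):

```python
import torch

A = torch.randn(4, 3, dtype=torch.float64)

# Pre-allocated outputs: dtype and device must match the input; shapes are resized as needed.
U = torch.empty(0, dtype=torch.float64)
S = torch.empty(0, dtype=torch.float64)
VT = torch.empty(0, dtype=torch.float64)
torch.linalg.svd(A, full_matrices=False, out=(U, S, VT))  # ok

bad = torch.empty(0, dtype=torch.float32)
# torch.linalg.svd(A, full_matrices=False, out=(bad, S, VT))  # would raise: dtype mismatch
```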

Contributor Author

thank you for the answer, I have a better picture now

torch/_torch_docs.py (outdated review comments, resolved)
torch/linalg/__init__.py (outdated review comments, resolved)
@mruberry mruberry self-requested a review December 30, 2020 07:47
Collaborator

@mruberry mruberry left a comment

This looks great, @antocuni! I made a few small notes and this needs a rebase. Once that happens ping me and we'll get this merged!

@antocuni
Contributor Author

antocuni commented Jan 5, 2021

> This looks great, @antocuni! I made a few small notes and this needs a rebase. Once that happens ping me and we'll get this merged!

@mruberry I addressed the remaining issues. Tests are running, hopefully they will pass

@mruberry
Collaborator

mruberry commented Jan 6, 2021

Both failures are in upstream. I think we're gtg!

Contributor

@facebook-github-bot facebook-github-bot left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@antocuni
Contributor Author

antocuni commented Jan 7, 2021

@mruberry there was a conflict due to 6643e9f. I updated this PR with the fix (tests still running at this point)

Contributor

@facebook-github-bot facebook-github-bot left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@mruberry merged this pull request in 5c5abd5.

hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 14, 2021
Summary: (same as the PR description above)

Pull Request resolved: pytorch#45562

Reviewed By: H-Huang

Differential Revision: D25803557

Pulled By: mruberry

fbshipit-source-id: 4966f314a0ba2ee391bab5cda4563e16275ce91f
Labels
cla signed · Merged · open source · triaged