Implement torch.linalg.svd #45562
Conversation
…nt svd as a tiny wrapper around it
…which is the OPPOSITE (and with a different default) than the old some=true. The signature of at::svd is unchanged
…agam expect VT to be in column-major order, so what we were passing around was effectively V, not VT.
- Fix the comment and actually create a column-major version of VT, which we can return directly from linalg_svd
- kill the .t() from python's version of linalg.svd(), since we are directly getting VT from C++ now
- call VT.transpose_() inside the legacy version of svd, to keep backwards compatibility

This is WIP because autograd is still broken
…ave been modified
💊 CI failures summary and remediations

As of commit 81510d5 (more details on the Dr. CI page):
1 job timed out:
🚧 1 fixed upstream failure: These were probably caused by upstream breakages that were already fixed.
Please rebase on the …
Thanks for your thoughtful write-up of the issues, @antocuni, and sorry I couldn't respond earlier. Also, please reach out on Slack if you'd like to have a more direct discussion of these issues.

One design principle I'd like us to consider is that functions should always return the same type. PyTorch already fails to do this in some cases. For example, we have max(tensor), which returns a tensor, but max(tensor, 1) returns a tuple of the reduced tensor and the indices of the maximum elements in the original tensor. I think it'd be clearer if the latter had a name like max_with_indices, or if the max function always returned a tuple.

In this case, … As for deprecating …

What are your thoughts, @antocuni?
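For reference, a minimal illustration of the return-type inconsistency described above:

```python
import torch

t = torch.tensor([[1., 5.],
                  [3., 2.]])

# Full reduction returns a single tensor...
print(torch.max(t))                    # tensor(5.)

# ...but reducing over a dim returns a (values, indices) pair.
values, indices = torch.max(t, dim=1)
print(values)                          # tensor([5., 3.])
print(indices)                         # tensor([1, 0])
```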
I just updated the torch.linalg.norm tracking issue (although it needs review from the Quansight team) and pulled out some SVD-related issues. A couple that seem relevant are: …

Do we want to address #45821 while implementing a NumPy-compatible signature? Also, is this a good opportunity to compare the current CUDA implementation with an alternative, like reusing the CPU implementation?
I agree in general: return types should not depend on the values of the arguments. However, for the specific case at hand, we need to decide whether we value API consistency (including consistency between C++ and Python) more than compatibility with NumPy. I understood that the goal was to be drop-in compatible with NumPy, isn't that the case? It's worth noting that since we are moving …
Looking at the code, I think the decision to return 0-filled tensors was for the convenience of the implementation. The C++ return type is … From the programmer's point of view, returning a tuple …

Re #45821: yes, it seems like a good idea to update the docs while we are here.

Re #41306: I would personally leave it for another PR, to keep this one self-contained and to reduce the risk of breaking stuff.
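For concreteness, the legacy zero-filled behavior described above can be observed directly (a small sketch; exact placeholder shapes aside):

```python
import torch

a = torch.randn(4, 3)

# Legacy torch.svd with compute_uv=False still returns a 3-tuple,
# with U and V as zero-filled placeholder tensors.
U, S, V = torch.svd(a, compute_uv=False)
print(S)                               # singular values are computed
print(U.abs().sum(), V.abs().sum())    # placeholders: both sums are 0
```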
Being NumPy-compatible is a goal, but we also have UX goals. For example, we recently discovered that in NumPy, functions and their method variants can have different behavior, whereas a UX goal of PyTorch is that they always behave identically. In this case I think it's more important that we be consistent in the return type than NumPy-compatible. I could be mistaken.
Let's return None in Python for them and empty tensors in C++?
Cool.
Sounds good.
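For concreteness, a rough sketch of the kind of Python wrapper being discussed, assuming the C++ binding torch._C._linalg.linalg_svd (mentioned later in the PR summary) takes (input, full_matrices, compute_uv) in that order:

```python
import torch

def linalg_svd(input, full_matrices=True, compute_uv=True):
    # The C++ entry point always returns a 3-tuple; when compute_uv=False,
    # map the placeholder tensors to None on the Python side.
    U, S, VT = torch._C._linalg.linalg_svd(input, full_matrices, compute_uv)
    if not compute_uv:
        return None, S, None
    return U, S, VT
```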
Sounds good to me. I also suggest documenting the differences w.r.t. NumPy in the docs. If we want to be extra nice to users, we could have a global flag that enables runtime warnings for the cases in which PyTorch behaves differently from NumPy. Not sure whether (1) this is a goal and (2) it should be part of this PR, though.
+1.
Great idea.
Definitely not part of this PR. There are some functions which are divergent from NumPy and we might eventually offer more compatible alternatives for them, but not at the moment.
How about …?

Total aside, and not for this PR: eventually we should think about making these return types named tuples so users can access the results more readably.
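For reference, PyTorch's existing named-tuple returns already support this style of access, e.g. with the legacy torch.svd:

```python
import torch

a = torch.randn(4, 3)
result = torch.svd(a)       # a named tuple with fields U, S, V
print(result.U.shape)       # torch.Size([4, 3])
print(result.S.shape)       # torch.Size([3])
print(result.V.shape)       # torch.Size([3, 3])
```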
…implement linalg_svd on top of it
…n empty tensors in C++ and None in Python
Hey @antocuni, just an FYI: I was reviewing a similar issue for another operator, and it looks like the standard we're going to go with for these return values is a named tuple with empty tensors in both Python and C++ for the tensors that aren't computed, plus a docs warning that this behavior is likely to change in the future. I'll also file an issue soon to discuss this behavior and request that we add support, in Python, for returning only the tensors that are computed.
OK thanks, I'll update the code to reflect that. Do you have a link to that issue?
Also, what do we want to do with the …?
I think we have at least three options: …
Just created #46187. We actually decided to go with returning None instead of empty tensors, but we haven't worked out how this translates from C++ yet.
Sounds great. It won't be BC-breaking to add because it (1) appears last in the parameter list and (2) has a default value. If that weren't true, I'd suggest we add it to the signature now.
OK, then I'll go for …
+1
…es Vh, so we need to transpose AND conjugate. OTOH, linalg.svd returns Vh, so nothing is needed in that case
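To illustrate what the commit message above refers to: the legacy torch.svd returns V, so recovering Vh requires both a transpose and a conjugation, while torch.linalg.svd returns VT/Vh directly. A minimal sketch, assuming complex support in both functions:

```python
import torch

a = torch.randn(5, 3, dtype=torch.complex64)

# Legacy torch.svd returns V: reconstruction needs transpose AND conjugate.
U, S, V = torch.svd(a)
recon_legacy = U @ torch.diag(S).to(a.dtype) @ V.t().conj()

# torch.linalg.svd returns VT directly, so nothing extra is needed.
U2, S2, VT = torch.linalg.svd(a, full_matrices=False)
recon_new = U2 @ torch.diag(S2).to(a.dtype) @ VT

print(torch.allclose(recon_legacy, a, atol=1e-5))   # True
print(torch.allclose(recon_new, a, atol=1e-5))      # True
```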
Codecov Report
@@            Coverage Diff             @@
##           master   #45562      +/-   ##
==========================================
- Coverage   80.67%   80.67%   -0.01%
==========================================
  Files        1899     1899
  Lines      206066   206104      +38
==========================================
+ Hits       166241   166265      +24
- Misses      39825    39839      +14
@mruberry it seems that all CI tests have passed, so the PR is ready for a new review round |
  Tensor U_tmp, S_tmp, VT_tmp;
- std::tie(U_tmp, S_tmp, VT_tmp) = at::_svd_helper(self, some, compute_uv);
+ std::tie(U_tmp, S_tmp, VT_tmp) = at::linalg_svd(self, full_matrices, compute_uv);
  U.resize_as_(U_tmp).copy_(U_tmp);
See note on error-checking out above. Here we should require same dtype and same device for now, and use resize_output instead of resize_as_.
Done in commit e483647. I noticed that resize_output doesn't check the device, so I wrote the check manually. So, what is the convention for output tensors? Is it generally allowed to pass output tensors on different devices? If not, why isn't the check done by resize_output itself?
This is an excellent question. I actually wrote a brief description of out= handling in the Developer FAQ: https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch. I think it's currently a gap in the tools PyTorch provides that we require some operations to implement this check manually.

@ezyang is actually developing a new architecture that I think solves this issue. Maybe we should extend resize_output, too; currently it doesn't have the device information necessary to implement this check. @heitorschueroff actually had that same idea recently.
Thank you for the answer, I have a better picture now.
This looks great, @antocuni! I made a few small notes and this needs a rebase. Once that happens ping me and we'll get this merged!
Both failures are upstream. I think we're good to go!
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
This is related to pytorch#42666. I am opening this PR to have the opportunity to discuss things.

First, we need to consider the differences between `torch.svd` and `numpy.linalg.svd`:

1. `torch.svd` takes `some=True`, while `numpy.linalg.svd` takes `full_matrices=True`, which is effectively the opposite (and with the opposite default, too!)
2. `torch.svd` returns `(U, S, V)`, while `numpy.linalg.svd` returns `(U, S, VT)` (i.e., V transposed).
3. `torch.svd` always returns a 3-tuple; `numpy.linalg.svd` returns only `S` in case `compute_uv==False`.
4. `numpy.linalg.svd` also takes an optional `hermitian=False` argument.

I think that the plan is to eventually deprecate `torch.svd` in favor of `torch.linalg.svd`, so this PR does the following:

1. Rename/adapt the old `svd` C++ functions into `linalg_svd`: in particular, `linalg_svd` now takes `full_matrices` and returns `VT`.
2. Re-implement the old C++ interface on top of the new one (by negating `full_matrices` and transposing `VT`).
3. The C++ version of `linalg_svd` *always* returns a 3-tuple (we can't do anything else), so there is a Python wrapper which manually calls `torch._C._linalg.linalg_svd` to tweak the return value in case `compute_uv==False`.

Currently, `linalg_svd_backward` is broken because it has not been adapted yet after the `V ==> VT` change, but before continuing and spending more time on it I wanted to make sure that the general approach is fine.

Pull Request resolved: pytorch#45562
Reviewed By: H-Huang
Differential Revision: D25803557
Pulled By: mruberry
fbshipit-source-id: 4966f314a0ba2ee391bab5cda4563e16275ce91f
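To make the signature differences above concrete, a small comparison sketch (we compare reconstructions rather than the factors themselves, since U and V are only unique up to sign/phase):

```python
import numpy as np
import torch

a = np.random.randn(5, 3)
t = torch.from_numpy(a)

# numpy.linalg.svd: full_matrices=True by default, returns (U, S, VT)
U_np, S_np, VT_np = np.linalg.svd(a, full_matrices=False)

# legacy torch.svd: some=True by default (the reduced SVD), returns (U, S, V)
U_t, S_t, V_t = torch.svd(t)

# Both reconstruct the input; torch's V must be transposed to act as VT.
print(np.allclose(U_np @ np.diag(S_np) @ VT_np, a))        # True
print(torch.allclose(U_t @ torch.diag(S_t) @ V_t.t(), t))  # True
```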