torch.linalg in PyTorch 1.10 tracker #42666

Closed
2 of 4 tasks
mattip opened this issue Aug 6, 2020 · 37 comments
Labels: high priority, module: linear algebra, tracker, triaged

Comments

@mattip
Collaborator

mattip commented Aug 6, 2020

This is a tracking issue for torch.linalg tasks for PyTorch 1.10. The goals for the 1.10 release are:

Tasks

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @xwang233 @lezcano @rgommers @vincentqb @vishwakftw @ssnl

@mruberry mruberry added the module: linear algebra, module: numpy, and high priority labels Aug 6, 2020
@mruberry mruberry added the triaged label and removed the triage review label Aug 6, 2020
@rgommers
Collaborator

rgommers commented Aug 7, 2020

xref gh-42053, the correct way to implement aliases is still under discussion there.

@mruberry
Collaborator

mruberry commented Aug 7, 2020

xref gh-42053, the correct way to implement aliases is still under discussion there.

We ultimately decided on the approach taken in #42586. In the future we may implement an "alias" entry in native_functions.yaml, too.

@albanD
Collaborator

albanD commented Aug 24, 2020

I think an important thing to keep in mind for these new ops is to make sure we don't make the same mistake as with the current ones, and that we provide proper documentation for them.
In particular, the documentation should contain enough information for people who know the field to understand what happens (for example, high-level functions should mention how they are implemented).
Moreover, for less experienced people, it would be important to have information like:

  • What the requirements on the inputs are for this function to be valid
  • What the numerical properties of this function are in general
  • When one exists, a pointer to a more numerically stable alternative for problematic inputs

Finally, when the backward is not implemented for all cases that the function supports, this should be mentioned as well.

@mruberry mruberry changed the title from "ENH: fill out linalg namespace (tracking issue)" to "torch.linalg tracking issue" Sep 9, 2020
@rgommers
Collaborator

Regarding the task list here: I'm not too sure about deprecating so many functions. For norm it makes sense, because the torch.linalg.norm function is different from (and arguably nicer than) torch.norm. But for simple aliases, I'd at least do those last, because they only add value once the whole linalg namespace is complete and we can say "torch.linalg matches numpy.linalg". And for some heavily used functions I'd consider not doing the deprecation at all.

@rgommers
Collaborator

@mruberry, @ezyang and I had a chat about priorities:

  1. Implement functions that don’t yet exist, most-used ones first (e.g. kron, eigh - here is some usage data)
  2. Implement functions that exist in torch but need a change to match numpy when they go into torch.linalg
  3. Add aliases to linalg namespace for functions that exist and already match numpy
  4. Add the necessary deprecations to old functions in one go when the torch.linalg namespace is complete (TBD whether they should all be deprecated, see my comment above).

@mruberry
Collaborator

@rgommers I updated the rollup to reflect these priorities and remove all deprecation tasks (except deprecating torch.norm, as we discussed earlier). I appreciate your point about maybe being less aggressive with deprecations than this initially was. Let's review on a case-by-case basis.

@antocuni
Contributor

picking up torch.linalg.svd

@IvanYashchuk
Collaborator

Picking up kron, eigh, cond, tensorinv.

@antocuni
Contributor

I have a design question about torch.linalg.svd, but I think it's general enough that it's worth discussing here.
numpy.linalg.svd differs from torch.svd in two major ways:

  1. the numpy version has a parameter called full_matrices, while torch.svd has one called some which is basically its opposite, so this is easy to adapt

  2. the numpy signature is irregular, and the return type changes depending on the value of compute_uv. In particular, if compute_uv==True it returns a 3-tuple, else it returns a single array. On the other hand, torch.svd always returns a 3-tuple.

I see that the existing torch.linalg.* functions all provide a C++ API in the torch::linalg namespace, but point 2 makes it hard/impossible to provide a C++ signature which is consistent with the Python one. I see at least two solutions:

  1. implement torch.linalg.svd entirely in Python on top of torch.svd, and not offer a C++ API

  2. implement a C++ version of torch::linalg::svd which always returns a 3-tuple, and write a thin Python wrapper which discards the two extra elements in the compute_uv==False case (see the sketch below).
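A minimal Python sketch of what solution 2's wrapper could look like, layered over the existing torch.svd; the name `linalg_svd_sketch` and the exact return convention (returning V^H like NumPy) are illustrative assumptions, not a settled API:

```python
import torch

def linalg_svd_sketch(a, full_matrices=True, compute_uv=True):
    # Hypothetical wrapper: torch.svd's `some` is the opposite of NumPy's `full_matrices`.
    u, s, v = torch.svd(a, some=not full_matrices, compute_uv=compute_uv)
    if not compute_uv:
        # NumPy returns only the singular values in this case.
        return s
    # NumPy returns Vh (the conjugate transpose), while torch.svd returns V.
    return u, s, v.transpose(-2, -1).conj()
```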

Moreover, what are we going to do with torch.svd? Should we:

  1. keep it as is and implement torch.linalg.svd on top of it OR
  2. move the current implementation to torch.linalg.svd and re-implement the old torch.svd on top of it for backwards compatibility?

@IvanYashchuk
Collaborator

I'm in favor of this solution:

implement a C++ version of torch::linalg::svd which always returns a 3-tuple, and write a thin Python wrapper which discards the two extra elements in the compute_uv==False case.

I have done the same for ChainerX here.

@rgommers
Collaborator

Moreover, what are we going to do with torch.svd? Should we:

  1. keep it as is and implement torch.linalg.svd on top of it OR
  2. move the current implementation to torch.linalg.svd and re-implement the old torch.svd on top of it for backwards compatibility?

It seems to me that (1) is a little less work, but would be much less appealing if torch.svd gets deprecated in the future. That's not decided, but seems fairly likely to happen at some point. So (2) is probably preferred. @mruberry does that make sense to you?

@antocuni
Contributor

It seems to me that (1) is a little less work, but would be much less appealing if torch.svd gets deprecated in the future. That's not decided, but seems fairly likely to happen at some point. So (2) is probably preferred. @mruberry does that make sense to you?

I opened PR #45562 to discuss this more concretely. I'll wait for some feedback on it before continuing

@mruberry
Collaborator

mruberry commented Oct 1, 2020

@antocuni Sorry for the delay, working through a backlog now that the 1.7 branch is cut. I'll get to this ASAP (definitely before Monday).

@mruberry
Collaborator

mruberry commented Oct 5, 2020

Moreover, what are we going to do with torch.svd? Should we:

  1. keep it as is and implement torch.linalg.svd on top of it OR
  2. move the current implementation to torch.linalg.svd and re-implement the old torch.svd on top of it for backwards compatibility?

It seems to me that (1) is a little less work, but would be much less appealing if torch.svd gets deprecated in the future. That's not decided, but seems fairly likely to happen at some point. So (2) is probably preferred. @mruberry does that make sense to you?

Unfortunately I'm not very familiar with the code. I would do the simplest thing to implement torch.linalg.svd(), as discussed on the PR. We can always rearrange the functions internally later.

@mruberry mruberry changed the title from "torch.linalg tracking issue" to "Linear Algebra tracking issue" Oct 5, 2020
@mruberry
Collaborator

mruberry commented Oct 5, 2020

@muthuArivoli It's taken a while, but I was finally able to review linear algebra related issues. You mentioned you were interested in working on one. Would torch.inner(), torch.kron(), fixing torch.einsum, torch.linalg.cond(), or torch.linalg.matrix_rank() be interesting? Or maybe reviewing whether there's an alternative to a MAGMA-based implementation of a function like torch.svd() or torch.lstsq?

xsacha pushed a commit to xsacha/pytorch that referenced this issue Mar 31, 2021
Summary:
This PR modifies the behavior of `_out` variants to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch
With this PR result and input tensors must be on the same device and have the same "type kind".

I skipped `qr` and `eig` in this process as they require a bit more work.

Functions that can use the provided storage directly do so. If `result` is not empty and is not in the batched column-major format, or does not have the same type as the input, then we have to allocate a temporary tensor and copy it.
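From the user's side, the out= contract described above boils down to something like the following (a hedged usage sketch with torch.linalg.inv, not tied to this PR's internals):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
A = A @ A.T + 3 * torch.eye(3, dtype=torch.float64)  # well-conditioned, invertible input

out = torch.empty(3, 3, dtype=torch.float64)          # same device, compatible dtype as A
torch.linalg.inv(A, out=out)                          # the result is written into `out`
print(torch.allclose(A @ out, torch.eye(3, dtype=torch.float64)))  # True
```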

TODO:

- [x] Add more tests for same device and valid safe dtype
- [x] Move inv and solve changes to separate PRs pytorch#51968, pytorch#51977

Ref. pytorch#42666

Pull Request resolved: pytorch#51560

Reviewed By: albanD

Differential Revision: D26400734

Pulled By: heitorschueroff

fbshipit-source-id: a6201ed7e919c1670c6ff3ef60217d1dbfb72e67
xsacha pushed a commit to xsacha/pytorch that referenced this issue Mar 31, 2021
Summary:
This PR modifies the behavior of the `linalg_inv_out` variant to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch
With this PR result and input tensors must be on the same device and have the same "type kind".
It's allowed to pass out tensors with complex dtypes for float inputs.

Ref. pytorch#42666

Pull Request resolved: pytorch#51977

Reviewed By: H-Huang

Differential Revision: D26725718

Pulled By: mruberry

fbshipit-source-id: 2acc2a311328268706ce27ce060fc88fc7416753
xsacha pushed a commit to xsacha/pytorch that referenced this issue Mar 31, 2021
Summary:
This PR modifies the behavior of the `linalg_solve_out` variant to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch
With this PR result and input tensors must be on the same device and have the same "type kind".
It's allowed to pass out tensors with complex dtypes for float inputs.

`linalg_solve_out` was broken for batched vector inputs and it's now fixed.
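For context, the batched-vector case mentioned above looks like this from Python (a usage sketch; `b` with one fewer dimension than `A` is treated as a batch of vectors):

```python
import torch

A = torch.randn(10, 4, 4, dtype=torch.float64)
b = torch.randn(10, 4, dtype=torch.float64)  # a batch of vector right-hand sides

x = torch.linalg.solve(A, b)                 # solves A[i] @ x[i] = b[i] for each i
print(x.shape)                               # torch.Size([10, 4])
```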

Ref. pytorch#42666

Pull Request resolved: pytorch#51968

Reviewed By: H-Huang

Differential Revision: D26728825

Pulled By: mruberry

fbshipit-source-id: c06fe937e7f452193b23ba09ca6cfa2703488455
facebook-github-bot pushed a commit that referenced this issue Apr 5, 2021
…solve (#54315)

Summary:
This PR adds cusolver potrs and potrsBatched to the backend of torch.cholesky_solve and torch.linalg.cholesky_solve.

`cholesky_solve` heuristics:

- If magma is not installed, or batch_size is 1:
  - If batch_size > 1 and nrhs == 1, dispatch to `cusolverDn<T>potrsBatched`,
  - Otherwise, dispatch to `cusolverDnXpotrs` (64 bit) and `cusolverDn<T>potrs` (legacy).
- Otherwise, use magma.

Note: `cusolverDn<T>potrsBatched` only supports `nrhs == 1`. It is used for `nrhs==1` batched matrix if magma is **not** installed.
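Whichever backend the heuristic picks, the user-facing call is the same; a small usage sketch:

```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)
A = A @ A.T + 4 * torch.eye(4, dtype=torch.float64)  # symmetric positive definite
B = torch.randn(4, 2, dtype=torch.float64)           # two right-hand sides

L = torch.linalg.cholesky(A)       # lower-triangular Cholesky factor of A
X = torch.cholesky_solve(B, L)     # solves A @ X = B using the precomputed factor
print(torch.allclose(A @ X, B))    # True
```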

See also #42666 #47953

Todo:

- [x] benchmark and heuristic

Pull Request resolved: #54315

Reviewed By: ngimel

Differential Revision: D27562225

Pulled By: mruberry

fbshipit-source-id: 323e5d60610abbbdc8369f5eb112d9fa01da40f6
facebook-github-bot pushed a commit that referenced this issue Apr 6, 2021
Summary:
This PR adds `torch.linalg.eig`, and `torch.linalg.eigvals` for NumPy compatibility.

MAGMA uses a hybrid CPU-GPU algorithm and doesn't have a GPU interface for the non-symmetric eigendecomposition. This forces us to transfer inputs living in GPU memory to CPU before calling MAGMA, and then transfer the results back to GPU memory. That is rather slow for smaller matrices, and MAGMA is faster than the CPU path only for matrices larger than 3000x3000.
Unfortunately, there is no cuSOLVER function for this operation.

Autograd support for `torch.linalg.eig` will be added in a follow-up PR.
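A small usage sketch of the new functions (note that the outputs are complex-valued even for real inputs):

```python
import torch

A = torch.randn(5, 5)

w, V = torch.linalg.eig(A)        # eigenvalues and right eigenvectors
w_only = torch.linalg.eigvals(A)  # eigenvalues only

print(w.dtype, V.dtype)           # torch.complex64 torch.complex64 for float32 input
print(w.shape, V.shape)           # torch.Size([5]) torch.Size([5, 5])
```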

Ref #42666

Pull Request resolved: #52491

Reviewed By: anjali411

Differential Revision: D27563616

Pulled By: mruberry

fbshipit-source-id: b42bb98afcd2ed7625d30bdd71cfc74a7ea57bb5
facebook-github-bot pushed a commit that referenced this issue Apr 15, 2021
…== 1 on CUDA (#54676)

Summary:
This PR adds the functionality to use cusolver potrs as the backend of cholesky_inverse for batch_size == 1 on CUDA.

Cusolver `potri` is **not** used, because

- it only returns the upper or lower triangular matrix as a result. Although the other half is zero, we may still need extra kernels to get the full Hermitian matrix
- it's no faster than cusolver potrs in most cases
- it doesn't have a batched version or 64-bit version

`cholesky_inverse` dispatch heuristics:

- If magma is not installed, or batch_size is 1, dispatch to `cusolverDnXpotrs` (64 bit) and `cusolverDn<T>potrs` (legacy).
- Otherwise, use magma.
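For reference, the user-facing call this dispatch sits behind (a usage sketch):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
A = A @ A.T + 3 * torch.eye(3, dtype=torch.float64)  # symmetric positive definite

L = torch.linalg.cholesky(A)       # lower-triangular Cholesky factor
A_inv = torch.cholesky_inverse(L)  # inverse of A reconstructed from the factor
print(torch.allclose(A @ A_inv, torch.eye(3, dtype=torch.float64)))  # True
```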

See also #42666 #47953

Pull Request resolved: #54676

Reviewed By: ngimel

Differential Revision: D27723805

Pulled By: mruberry

fbshipit-source-id: f65122812c9e56a781aabe4d87ed28b309abf93f
facebook-github-bot pushed a commit that referenced this issue Apr 26, 2021
Summary:
Currently `torch.linalg.matrix_rank` accepts only Python's float for `tol=` argument. The current behavior is not NumPy compatible and this PR adds the possibility to pass Tensor for matrix-wise tolerances.
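A sketch of the behavior this PR adds: a per-matrix tolerance tensor broadcast over the batch dimensions. The `tol=` spelling is the one this PR targets; later releases expose the same idea through `atol`/`rtol`:

```python
import torch

A = torch.randn(4, 3, 5, 5)      # a (4, 3) batch of 5x5 matrices
tol = torch.full((4, 3), 1e-6)   # one tolerance per matrix in the batch

ranks = torch.linalg.matrix_rank(A, tol=tol)
print(ranks.shape)               # torch.Size([4, 3])
```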

Ref. #42666

Pull Request resolved: #54157

Reviewed By: ezyang

Differential Revision: D27961548

Pulled By: mruberry

fbshipit-source-id: 47318eefa07a7876e6360dae089e5389b9939489
facebook-github-bot pushed a commit that referenced this issue May 11, 2021
… 11.3 (#57788)

Summary:
This PR enables the usage of cusolver potrf batched as the backend of Cholesky decomposition (`torch.linalg.cholesky` and `torch.linalg.cholesky_ex`) when cuda version is greater than or equal to 11.3.

Benchmark available at https://github.com/xwang233/code-snippet/tree/master/linalg/cholesky-new. It is seen that cusolver potrf batched performs better than magma potrf batched in most cases.

## cholesky dispatch heuristics:

### before:

- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched

### after:

cuda >= 11.3:
- batch size == 1: cusolver potrf
- batch size > 1: cusolver potrf batched

cuda < 11.3 (not changed):
- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched
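The dispatch itself is internal; a batched call from Python looks like this (usage sketch — `cholesky_ex` additionally returns per-matrix error codes instead of raising):

```python
import torch

A = torch.randn(8, 64, 64)
A = A @ A.transpose(-2, -1) + 64 * torch.eye(64)  # a batch of SPD matrices

L = torch.linalg.cholesky(A)              # batched factorization, raises on failure
L_ex, info = torch.linalg.cholesky_ex(A)  # `info` is nonzero where factorization failed
print(int(info.sum()))                    # 0 for SPD inputs
```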

 ---

See also #42666 #47953 #53104 #53879

Pull Request resolved: #57788

Reviewed By: ngimel

Differential Revision: D28345530

Pulled By: mruberry

fbshipit-source-id: 3022cf73b2750e1953c0e00a9e8b093dfc551f61
facebook-github-bot pushed a commit that referenced this issue May 11, 2021
Summary:
We are ready to move to the new stage for our `torch.linalg` module, which is stable (or STABLE?).

Ref. #42666

Pull Request resolved: #58043

Reviewed By: ngimel

Differential Revision: D28356172

Pulled By: mruberry

fbshipit-source-id: e2c1effa79b9635b2ef0a820a03a0685105042bd
@mruberry mruberry added the tracker label and removed the module: numpy label Jun 17, 2021
@mruberry mruberry changed the title from "Linear Algebra tracking issue" to "torch.linalg in PyTorch 1.10 tracker" Jun 17, 2021
facebook-github-bot pushed a commit that referenced this issue Jul 30, 2021
…da >= 11.3 U1 (#62003)

Summary:
This PR adds the `cusolverDn<T>SyevjBatched` function to the backend of `torch.linalg.eigh` (eigenvalue solver for Hermitian matrices). Using the heuristics from #53040 (comment) and my local tests, the `syevj_batched` path is only used when `batch_size > 1` and `matrix_size <= 32`. This gives us a huge performance boost in those cases.

Since there were known numerical issues on cusolver `syevj_batched` before cuda 11.3 update 1, this PR only enables the dispatch when cuda version is no less than that.
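The regime this heuristic targets (many small Hermitian matrices) looks like this from the user side; the CUDA path is chosen automatically when available (usage sketch):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.randn(256, 16, 16, device=device)
A = A + A.transpose(-2, -1)   # make each matrix in the batch symmetric

w, V = torch.linalg.eigh(A)   # ascending eigenvalues, orthonormal eigenvectors
print(w.shape, V.shape)       # torch.Size([256, 16]) torch.Size([256, 16, 16])
```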

See also #42666 #47953 #53040

Pull Request resolved: #62003

Reviewed By: heitorschueroff

Differential Revision: D30006316

Pulled By: ngimel

fbshipit-source-id: 3a65c5fc9adbbe776524f8957df5442c3d3aeb8e
@mruberry
Collaborator

Closing in favor of individual issues
