torch.linalg in PyTorch 1.10 tracker #42666
xref gh-42053; the correct way to implement aliases is still under discussion there.
I think an important thing to keep in mind for these new ops is to make sure we don't make the same mistakes as the current ones, and to provide proper documentation for them.
Finally, when the backward is not implemented for all cases that a function supports, this should be mentioned as well.
Regarding the task list here: I'm not too sure about deprecating so many functions. For …
@mruberry, @ezyang and I had a chat about priorities:
@rgommers I updated the rollup to reflect these priorities and removed all deprecation tasks (except deprecating torch.norm, as we discussed earlier). I appreciate your point about being less aggressive with deprecations than this list initially was. Let's review on a case-by-case basis.
Picking up …
Picking up …
I have a design question about …
I see that the existing …
Moreover, what are we going to do with …?
I'm in favor of this solution:
I have done the same for ChainerX here.
It seems to me that (1) is a little less work, but would be much less appealing if …
I opened PR #45562 to discuss this more concretely. I'll wait for some feedback on it before continuing.
@antocuni Sorry for the delay, working through a backlog now that the 1.7 branch is cut. I'll get to this ASAP (definitely before Monday).
Unfortunately I'm not very familiar with the code. I would do the simplest thing to implement …
@muthuArivoli It's taken a while, but I was finally able to review the linear algebra related issues. You mentioned you were interested in working on one. Would torch.inner(), torch.kron(), fixing torch.einsum, torch.linalg.cond(), or torch.linalg.matrix_rank() be interesting? Or maybe investigating whether there's an alternative to a MAGMA-based implementation of a function like torch.svd() or torch.lstsq()?
Summary: This PR modifies the behavior of `_out` variants to match the description at https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch. With this PR, result and input tensors must be on the same device and have the same "type kind". I skipped `qr` and `eig` in this process as they require a bit more work. Functions that can use the provided storage directly do so. If `result` is not empty and not in the batched column-major format, or does not have the same type as the input, then we have to allocate a temporary tensor and copy into it.

TODO:
- [x] Add more tests for same device and valid safe dtype
- [x] Move inv and solve changes to separate PRs pytorch#51968, pytorch#51977

Ref. pytorch#42666
Pull Request resolved: pytorch#51560
Reviewed By: albanD
Differential Revision: D26400734
Pulled By: heitorschueroff
fbshipit-source-id: a6201ed7e919c1670c6ff3ef60217d1dbfb72e67
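A minimal sketch of the `out=` contract described above, using `torch.linalg.inv` as the example op (the specific matrix is illustrative):

```python
import torch

# A small, well-conditioned matrix so the inverse is numerically clean.
A = torch.tensor([[2., 1.], [1., 3.]], dtype=torch.float64)

# An out tensor on the same device with the same "type kind": per the
# out= contract, its storage can be reused directly by the _out variant.
out = torch.empty(2, 2, dtype=torch.float64)
torch.linalg.inv(A, out=out)

# The result satisfies A @ inv(A) = I.
assert torch.allclose(out @ A, torch.eye(2, dtype=torch.float64))
```

Passing an out tensor on a different device or of an unsafe dtype kind (e.g. an integer tensor for a float input) is rejected under this contract.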
Summary: This PR modifies the behavior of the `linalg_inv_out` variant to match the description at https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch. With this PR, result and input tensors must be on the same device and have the same "type kind". It's allowed to pass out tensors with complex dtypes for float inputs.

Ref. pytorch#42666
Pull Request resolved: pytorch#51977
Reviewed By: H-Huang
Differential Revision: D26725718
Pulled By: mruberry
fbshipit-source-id: 2acc2a311328268706ce27ce060fc88fc7416753
Summary: This PR modifies the behavior of the `linalg_solve_out` variant to match the description at https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch. With this PR, result and input tensors must be on the same device and have the same "type kind". It's allowed to pass out tensors with complex dtypes for float inputs. `linalg_solve_out` was broken for batched vector inputs and is now fixed.

Ref. pytorch#42666
Pull Request resolved: pytorch#51968
Reviewed By: H-Huang
Differential Revision: D26728825
Pulled By: mruberry
fbshipit-source-id: c06fe937e7f452193b23ba09ca6cfa2703488455
…solve (#54315)

Summary: This PR adds cusolver potrs and potrsBatched to the backend of torch.cholesky_solve and torch.linalg.cholesky_solve.

`cholesky_solve` heuristics:
- If magma is not installed, or batch_size is 1:
  - If batch_size > 1 and nrhs == 1, dispatch to `cusolverDn<T>potrsBatched`.
  - Otherwise, dispatch to `cusolverDnXpotrs` (64-bit) and `cusolverDn<T>potrs` (legacy).
- Otherwise, use magma.

Note: `cusolverDn<T>potrsBatched` only supports `nrhs == 1`. It is used for `nrhs == 1` batched matrices if magma is **not** installed.

See also #42666 #47953

Todo:
- [x] benchmark and heuristic

Pull Request resolved: #54315
Reviewed By: ngimel
Differential Revision: D27562225
Pulled By: mruberry
fbshipit-source-id: 323e5d60610abbbdc8369f5eb112d9fa01da40f6
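The user-facing call that the dispatch above serves can be sketched as follows (the SPD system below is a made-up example; the cusolver/magma choice is internal and only applies on CUDA):

```python
import torch

# Build a small symmetric positive-definite system A x = b.
A = torch.tensor([[4., 1.], [1., 3.]], dtype=torch.float64)
b = torch.tensor([[1.], [2.]], dtype=torch.float64)

L = torch.linalg.cholesky(A)      # lower-triangular Cholesky factor of A
x = torch.cholesky_solve(b, L)    # solve A x = b from the factor; on CUDA this hits potrs/potrsBatched/magma

# The solution satisfies the original system.
assert torch.allclose(A @ x, b)
```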
Summary: This PR adds `torch.linalg.eig` and `torch.linalg.eigvals` for NumPy compatibility. MAGMA uses a hybrid CPU-GPU algorithm and doesn't have a GPU interface for the non-symmetric eigendecomposition. This forces us to transfer inputs living in GPU memory to the CPU before calling MAGMA, and then transfer the results back to the GPU. That is rather slow for smaller matrices, and MAGMA is faster than the CPU path only for matrices larger than 3000x3000. Unfortunately, there is no cuSOLVER function for this operation. Autograd support for `torch.linalg.eig` will be added in a follow-up PR.

Ref #42666
Pull Request resolved: #52491
Reviewed By: anjali411
Differential Revision: D27563616
Pulled By: mruberry
fbshipit-source-id: b42bb98afcd2ed7625d30bdd71cfc74a7ea57bb5
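A small usage sketch of `torch.linalg.eig` (the rotation matrix is an illustrative input; its eigenvalues are ±i, so the outputs are complex even for a real input):

```python
import torch

# 90-degree rotation matrix: non-symmetric, with eigenvalues ±i.
A = torch.tensor([[0., -1.], [1., 0.]], dtype=torch.float64)

w, V = torch.linalg.eig(A)  # complex eigenvalues and eigenvectors

# Check the eigendecomposition A V = V diag(w) in complex arithmetic.
assert torch.allclose(A.to(V.dtype) @ V, V @ torch.diag(w))
```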
…== 1 on CUDA (#54676)

Summary: This PR adds the functionality to use cusolver potrs as the backend of cholesky_inverse for batch_size == 1 on CUDA.

Cusolver `potri` is **not** used, because:
- it only returns the upper or lower triangular matrix as a result; although the other half is zero, we may still need extra kernels to get the full Hermitian matrix
- it's no faster than cusolver potrs in most cases
- it doesn't have a batched version or a 64-bit version

`cholesky_inverse` dispatch heuristics:
- If magma is not installed, or batch_size is 1, dispatch to `cusolverDnXpotrs` (64-bit) and `cusolverDn<T>potrs` (legacy).
- Otherwise, use magma.

See also #42666 #47953

Pull Request resolved: #54676
Reviewed By: ngimel
Differential Revision: D27723805
Pulled By: mruberry
fbshipit-source-id: f65122812c9e56a781aabe4d87ed28b309abf93f
Summary: Currently `torch.linalg.matrix_rank` accepts only Python's float for the `tol=` argument. This behavior is not NumPy compatible, and this PR adds the ability to pass a Tensor for matrix-wise tolerances.

Ref. #42666
Pull Request resolved: #54157
Reviewed By: ezyang
Differential Revision: D27961548
Pulled By: mruberry
fbshipit-source-id: 47318eefa07a7876e6360dae089e5389b9939489
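A sketch of what per-matrix tolerances enable (assuming `tol=` accepts a tensor broadcastable over the batch dimensions, as the PR describes; the batch below is illustrative, and newer releases spell this argument `atol=`/`rtol=`):

```python
import torch

# Batch of two 3x3 matrices: the identity (rank 3) and all-ones (rank 1).
A = torch.stack([torch.eye(3, dtype=torch.float64),
                 torch.ones(3, 3, dtype=torch.float64)])

# One tolerance per matrix in the batch, rather than a single Python float.
tol = torch.tensor([1e-10, 1e-10], dtype=torch.float64)
ranks = torch.linalg.matrix_rank(A, tol=tol)

assert ranks.tolist() == [3, 1]
```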
… 11.3 (#57788)

Summary: This PR enables the usage of cusolver potrf batched as the backend of Cholesky decomposition (`torch.linalg.cholesky` and `torch.linalg.cholesky_ex`) when the CUDA version is greater than or equal to 11.3. Benchmark available at https://github.com/xwang233/code-snippet/tree/master/linalg/cholesky-new. Cusolver potrf batched performs better than magma potrf batched in most cases.

Cholesky dispatch heuristics before:
- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched

After, for cuda >= 11.3:
- batch size == 1: cusolver potrf
- batch size > 1: cusolver potrf batched

For cuda < 11.3 (not changed):
- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched

See also #42666 #47953 #53104 #53879

Pull Request resolved: #57788
Reviewed By: ngimel
Differential Revision: D28345530
Pulled By: mruberry
fbshipit-source-id: 3022cf73b2750e1953c0e00a9e8b093dfc551f61
Summary: We are ready to move to the new stage for our `torch.linalg` module, which is stable (or STABLE?).

Ref. pytorch#42666
Pull Request resolved: pytorch#58043
Reviewed By: ngimel
Differential Revision: D28356172
Pulled By: mruberry
fbshipit-source-id: e2c1effa79b9635b2ef0a820a03a0685105042bd
…da >= 11.3 U1 (#62003)

Summary: This PR adds the `cusolverDn<T>SyevjBatched` function to the backend of `torch.linalg.eigh` (eigenvalue solver for Hermitian matrices). Using the heuristics from #53040 (comment) and my local tests, the `syevj_batched` path is only used when `batch_size > 1` and `matrix_size <= 32`. This gives us a huge performance boost in those cases. Since there were known numerical issues with cusolver `syevj_batched` before CUDA 11.3 update 1, this PR only enables the dispatch when the CUDA version is no less than that.

See also #42666 #47953 #53040

Pull Request resolved: #62003
Reviewed By: heitorschueroff
Differential Revision: D30006316
Pulled By: ngimel
fbshipit-source-id: 3a65c5fc9adbbe776524f8957df5442c3d3aeb8e
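The `torch.linalg.eigh` call this backend serves, as a small CPU-side sketch (the symmetric matrix is illustrative; the syevjBatched path only kicks in for small batched inputs on CUDA):

```python
import torch

# A symmetric (Hermitian) matrix with eigenvalues 1 and 3.
A = torch.tensor([[2., 1.], [1., 2.]], dtype=torch.float64)

w, V = torch.linalg.eigh(A)  # eigenvalues returned in ascending order

assert torch.allclose(w, torch.tensor([1., 3.], dtype=torch.float64))
# The eigendecomposition satisfies A V = V diag(w).
assert torch.allclose(A @ V, V @ torch.diag(w))
```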
Closing in favor of individual issues.
This is a tracking issue for torch.linalg tasks for PyTorch 1.10. The goals for the 1.10 release are:
Tasks
- `matrix_exp` into `torch.linalg` #61648
- `tensordot` into `torch.linalg` #61649
- `{svd,pca}_lowrank` into `torch.linalg` #61650
- `lobpcg` into `torch.linalg` #61653
- `lu`, `lu_solve`, and `lu_unpack` into `torch.linalg` #61657
- `torch.cholesky_solve` into `torch.linalg` #61658

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @xwang233 @lezcano @rgommers @vincentqb @vishwakftw @ssnl