torch.linalg in PyTorch 1.10 tracker #42666

Closed
2 of 4 tasks
mattip opened this issue Aug 6, 2020 · 37 comments
Labels: high priority, module: linear algebra, tracker, triaged

Comments

@mattip
Collaborator

mattip commented Aug 6, 2020

This is a tracking issue for torch.linalg tasks for PyTorch 1.10. The goals for the 1.10 release are:

Tasks

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @IvanYashchuk @xwang233 @lezcano @rgommers @vincentqb @vishwakftw @ssnl

@mruberry mruberry added the module: linear algebra, module: numpy, and high priority labels Aug 6, 2020
@mruberry mruberry added the triaged label and removed the triage review label Aug 6, 2020
@rgommers
Collaborator

rgommers commented Aug 7, 2020

xref gh-42053, the correct way to implement aliases is still under discussion there.

@mruberry
Collaborator

mruberry commented Aug 7, 2020

xref gh-42053, the correct way to implement aliases is still under discussion there.

We ultimately decided on the approach taken in #42586. In the future we may implement an "alias" entry in native_functions.yaml, too.

@albanD
Collaborator

albanD commented Aug 24, 2020

I think an important thing to keep in mind for these new ops is to make sure we don't make the same mistake as with the current ones, and that we provide proper documentation for them.
In particular, the documentation should contain enough information for people who know the field to understand what happens (for example, high-level functions should mention how they are implemented).
Moreover, for less experienced people, it would be important to have information like:

  • What the requirements on the inputs are for this function to be valid
  • What the numerical properties of this function are in general
  • When one exists, a pointer to a more numerically stable alternative for problematic inputs

Finally, when the backward is not implemented for all cases that the function supports, this should be mentioned as well.

@mruberry mruberry changed the title from "ENH: fill out linalg namespace (tracking issue)" to "torch.linalg tracking issue" Sep 9, 2020
@rgommers
Collaborator

Regarding the task list here: I'm not too sure about deprecating so many functions. For norm it makes sense, because the torch.linalg.norm function is different from (and arguably nicer than) torch.norm. But for simple aliases, I'd at least do those last, because they only add value once the whole linalg namespace is complete and we can say "torch.linalg matches numpy.linalg". And for some heavily used functions I'd consider not doing the deprecation at all.

@rgommers
Collaborator

@mruberry, @ezyang and I had a chat about priorities:

  1. Implement functions that don’t yet exist, most-used ones first (e.g. kron, eigh - here is some usage data)
  2. Implement functions that exist in torch but need a change to match numpy when they go into torch.linalg
  3. Add aliases to linalg namespace for functions that exist and already match numpy
  4. Add the necessary deprecations to old functions in one go when the torch.linalg namespace is complete (TBD whether they should all be deprecated, see my comment above).

@mruberry
Collaborator

@rgommers I updated the rollup to reflect these priorities and remove all deprecation tasks (except deprecating torch.norm, as we discussed earlier). I appreciate your point about maybe being less aggressive with deprecations than this initially was. Let's review on a case-by-case basis.

@antocuni
Contributor

picking up torch.linalg.svd

@IvanYashchuk
Collaborator

Picking up kron, eigh, cond, tensorinv.

@antocuni
Contributor

I have a design question about torch.linalg.svd, but I think it's general enough that it's worth discussing here.
numpy.linalg.svd differs from torch.svd in two major ways:

  1. the numpy version has a parameter called full_matrices, while torch.svd has one called some which is basically its opposite, so this is easy to adapt

  2. the numpy signature is irregular, and the return type changes depending on the value of compute_uv. In particular, if compute_uv==True it returns a 3-tuple, else it returns a single array. On the other hand, torch.svd always returns a 3-tuple.

I see that the existing torch.linalg.* functions all provide a C++ API in the torch::linalg namespace, but point 2 makes it hard/impossible to provide a C++ signature which is consistent with the Python one. I see at least two solutions:

  1. implement torch.linalg.svd entirely in Python on top of torch.svd, and not offer a C++ API

  2. implement a C++ version of torch::linalg::svd which always returns a 3-tuple, and write a thin Python wrapper which discards the two extra elements in the compute_uv==False case (see the sketch below).
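A minimal Python sketch of what solution 2's wrapper could look like, layered over the existing torch.svd; the name `linalg_svd_sketch` and the exact return convention (returning V^H like NumPy) are illustrative assumptions, not a settled API:

```python
import torch

def linalg_svd_sketch(a, full_matrices=True, compute_uv=True):
    # Hypothetical wrapper: torch.svd's `some` is the opposite of NumPy's `full_matrices`.
    u, s, v = torch.svd(a, some=not full_matrices, compute_uv=compute_uv)
    if not compute_uv:
        # NumPy returns only the singular values in this case.
        return s
    # NumPy returns Vh (the conjugate transpose), while torch.svd returns V.
    return u, s, v.transpose(-2, -1).conj()
```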

Moreover, what are we going to do with torch.svd? Should we:

  1. keep it as is and implement torch.linalg.svd on top of it OR
  2. move the current implementation to torch.linalg.svd and re-implement the old torch.svd on top of it for backwards compatibility?

@IvanYashchuk
Collaborator

I'm in favor of this solution:

implement a C++ version of torch::linalg::svd which always returns a 3-tuple, and write a thin Python wrapper which discards the two extra elements in the compute_uv==False case.

I have done the same for ChainerX here.

@rgommers
Collaborator

Moreover, what are we going to do with torch.svd? Should we:

  1. keep it as is and implement torch.linalg.svd on top of it OR
  2. move the current implementation to torch.linalg.svd and re-implement the old torch.svd on top of it for backwards compatibility?

It seems to me that (1) is a little less work, but would be much less appealing if torch.svd gets deprecated in the future. That's not decided, but seems fairly likely to happen at some point. So (2) is probably preferred. @mruberry does that make sense to you?

@antocuni
Contributor

It seems to me that (1) is a little less work, but would be much less appealing if torch.svd gets deprecated in the future. That's not decided, but seems fairly likely to happen at some point. So (2) is probably preferred. @mruberry does that make sense to you?

I opened PR #45562 to discuss this more concretely. I'll wait for some feedback on it before continuing

@mruberry
Collaborator

mruberry commented Oct 1, 2020

@antocuni Sorry for the delay, working through a backlog now that the 1.7 branch is cut. I'll get to this ASAP (definitely before Monday).

@mruberry
Collaborator

mruberry commented Oct 5, 2020

Moreover, what are we going to do with torch.svd? Should we:

  1. keep it as is and implement torch.linalg.svd on top of it OR
  2. move the current implementation to torch.linalg.svd and re-implement the old torch.svd on top of it for backwards compatibility?

It seems to me that (1) is a little less work, but would be much less appealing if torch.svd gets deprecated in the future. That's not decided, but seems fairly likely to happen at some point. So (2) is probably preferred. @mruberry does that make sense to you?

Unfortunately I'm not very familiar with the code. I would do the simplest thing to implement torch.linalg.svd(), as discussed on the PR. We can always rearrange the functions internally later.

@mruberry mruberry changed the title from "torch.linalg tracking issue" to "Linear Algebra tracking issue" Oct 5, 2020
@mruberry
Collaborator

mruberry commented Oct 5, 2020

@muthuArivoli It's taken a while, but I was finally able to review linear algebra related issues. You mentioned you were interested in working on one. Would torch.inner(), torch.kron(), fixing torch.einsum, torch.linalg.cond(), or torch.linalg.matrix_rank() be interesting? Or maybe reviewing whether there's an alternative to a MAGMA-based implementation of a function like torch.svd() or torch.lstsq?

xsacha pushed a commit to xsacha/pytorch that referenced this issue Mar 31, 2021
Summary:
This PR modifies the behavior of `_out` variants to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch
With this PR result and input tensors must be on the same device and have the same "type kind".

I skipped `qr` and `eig` in this process as they require a bit more work.

Functions that can use the provided storage directly do so. If `result` is not empty and is not in the batched column-major format, or does not have the same type as the input, then we have to allocate a temporary tensor and copy it.
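From the user's side, the out= contract described above boils down to something like the following (a hedged usage sketch with torch.linalg.inv, not tied to this PR's internals):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
A = A @ A.T + 3 * torch.eye(3, dtype=torch.float64)  # well-conditioned, invertible input

out = torch.empty(3, 3, dtype=torch.float64)          # same device, compatible dtype as A
torch.linalg.inv(A, out=out)                          # the result is written into `out`
print(torch.allclose(A @ out, torch.eye(3, dtype=torch.float64)))  # True
```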

TODO:

- [x] Add more tests for same device and valid safe dtype
- [x] Move inv and solve changes to separate PRs pytorch#51968, pytorch#51977

Ref. pytorch#42666

Pull Request resolved: pytorch#51560

Reviewed By: albanD

Differential Revision: D26400734

Pulled By: heitorschueroff

fbshipit-source-id: a6201ed7e919c1670c6ff3ef60217d1dbfb72e67
xsacha pushed a commit to xsacha/pytorch that referenced this issue Mar 31, 2021
Summary:
This PR modifies the behavior of the `linalg_inv_out` variant to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch
With this PR result and input tensors must be on the same device and have the same "type kind".
It's allowed to pass out tensors with complex dtypes for float inputs.

Ref. pytorch#42666

Pull Request resolved: pytorch#51977

Reviewed By: H-Huang

Differential Revision: D26725718

Pulled By: mruberry

fbshipit-source-id: 2acc2a311328268706ce27ce060fc88fc7416753
xsacha pushed a commit to xsacha/pytorch that referenced this issue Mar 31, 2021
Summary:
This PR modifies the behavior of the `linalg_solve_out` variant to match the description here https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch
With this PR result and input tensors must be on the same device and have the same "type kind".
It's allowed to pass out tensors with complex dtypes for float inputs.

`linalg_solve_out` was broken for batched vector inputs and it's now fixed.
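For context, the batched-vector case mentioned above looks like this from Python (a usage sketch; `b` with one fewer dimension than `A` is treated as a batch of vectors):

```python
import torch

A = torch.randn(10, 4, 4, dtype=torch.float64)
b = torch.randn(10, 4, dtype=torch.float64)  # a batch of vector right-hand sides

x = torch.linalg.solve(A, b)                 # solves A[i] @ x[i] = b[i] for each i
print(x.shape)                               # torch.Size([10, 4])
```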

Ref. pytorch#42666

Pull Request resolved: pytorch#51968

Reviewed By: H-Huang

Differential Revision: D26728825

Pulled By: mruberry

fbshipit-source-id: c06fe937e7f452193b23ba09ca6cfa2703488455
facebook-github-bot pushed a commit that referenced this issue Apr 5, 2021
…solve (#54315)

Summary:
This PR adds cusolver potrs and potrsBatched to the backend of torch.cholesky_solve and torch.linalg.cholesky_solve.

`cholesky_solve` heuristics:

- If magma is not installed, or batch_size is 1:
  - If batch_size > 1 and nrhs == 1, dispatch to `cusolverDn<T>potrsBatched`,
  - Otherwise, dispatch to `cusolverDnXpotrs` (64 bit) and `cusolverDn<T>potrs` (legacy).
- Otherwise, use magma.

Note: `cusolverDn<T>potrsBatched` only supports `nrhs == 1`. It is used for `nrhs==1` batched matrix if magma is **not** installed.
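Whichever backend the heuristic picks, the user-facing call is the same; a small usage sketch:

```python
import torch

A = torch.randn(4, 4, dtype=torch.float64)
A = A @ A.T + 4 * torch.eye(4, dtype=torch.float64)  # symmetric positive definite
B = torch.randn(4, 2, dtype=torch.float64)           # two right-hand sides

L = torch.linalg.cholesky(A)       # lower-triangular Cholesky factor of A
X = torch.cholesky_solve(B, L)     # solves A @ X = B using the precomputed factor
print(torch.allclose(A @ X, B))    # True
```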

See also #42666 #47953

Todo:

- [x] benchmark and heuristic

Pull Request resolved: #54315

Reviewed By: ngimel

Differential Revision: D27562225

Pulled By: mruberry

fbshipit-source-id: 323e5d60610abbbdc8369f5eb112d9fa01da40f6
facebook-github-bot pushed a commit that referenced this issue Apr 6, 2021
Summary:
This PR adds `torch.linalg.eig`, and `torch.linalg.eigvals` for NumPy compatibility.

MAGMA uses a hybrid CPU-GPU algorithm and doesn't have a GPU interface for the non-symmetric eigendecomposition. This forces us to transfer inputs living in GPU memory to CPU before calling MAGMA, and then transfer the results back to GPU memory. That is rather slow for smaller matrices, and MAGMA is faster than the CPU path only for matrices larger than 3000x3000.
Unfortunately, there is no cuSOLVER function for this operation.

Autograd support for `torch.linalg.eig` will be added in a follow-up PR.
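A small usage sketch of the new functions (note that the outputs are complex-valued even for real inputs):

```python
import torch

A = torch.randn(5, 5)

w, V = torch.linalg.eig(A)        # eigenvalues and right eigenvectors
w_only = torch.linalg.eigvals(A)  # eigenvalues only

print(w.dtype, V.dtype)           # torch.complex64 torch.complex64 for float32 input
print(w.shape, V.shape)           # torch.Size([5]) torch.Size([5, 5])
```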

Ref #42666

Pull Request resolved: #52491

Reviewed By: anjali411

Differential Revision: D27563616

Pulled By: mruberry

fbshipit-source-id: b42bb98afcd2ed7625d30bdd71cfc74a7ea57bb5
facebook-github-bot pushed a commit that referenced this issue Apr 15, 2021
…== 1 on CUDA (#54676)

Summary:
This PR adds the functionality to use cusolver potrs as the backend of cholesky_inverse for batch_size == 1 on CUDA.

Cusolver `potri` is **not** used, because

- it only returns the upper or lower triangular matrix as a result. Although the other half is zero, we may still need extra kernels to get the full Hermitian matrix
- it's no faster than cusolver potrs in most cases
- it doesn't have a batched version or 64-bit version

`cholesky_inverse` dispatch heuristics:

- If magma is not installed, or batch_size is 1, dispatch to `cusolverDnXpotrs` (64 bit) and `cusolverDn<T>potrs` (legacy).
- Otherwise, use magma.
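For reference, the user-facing call this dispatch sits behind (a usage sketch):

```python
import torch

A = torch.randn(3, 3, dtype=torch.float64)
A = A @ A.T + 3 * torch.eye(3, dtype=torch.float64)  # symmetric positive definite

L = torch.linalg.cholesky(A)       # lower-triangular Cholesky factor
A_inv = torch.cholesky_inverse(L)  # inverse of A reconstructed from the factor
print(torch.allclose(A @ A_inv, torch.eye(3, dtype=torch.float64)))  # True
```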

See also #42666 #47953

Pull Request resolved: #54676

Reviewed By: ngimel

Differential Revision: D27723805

Pulled By: mruberry

fbshipit-source-id: f65122812c9e56a781aabe4d87ed28b309abf93f
facebook-github-bot pushed a commit that referenced this issue Apr 26, 2021
Summary:
Currently `torch.linalg.matrix_rank` accepts only Python's float for `tol=` argument. The current behavior is not NumPy compatible and this PR adds the possibility to pass Tensor for matrix-wise tolerances.
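A sketch of the behavior this PR adds: a per-matrix tolerance tensor broadcast over the batch dimensions. The `tol=` spelling is the one this PR targets; later releases expose the same idea through `atol`/`rtol`:

```python
import torch

A = torch.randn(4, 3, 5, 5)      # a (4, 3) batch of 5x5 matrices
tol = torch.full((4, 3), 1e-6)   # one tolerance per matrix in the batch

ranks = torch.linalg.matrix_rank(A, tol=tol)
print(ranks.shape)               # torch.Size([4, 3])
```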

Ref. #42666

Pull Request resolved: #54157

Reviewed By: ezyang

Differential Revision: D27961548

Pulled By: mruberry

fbshipit-source-id: 47318eefa07a7876e6360dae089e5389b9939489
facebook-github-bot pushed a commit that referenced this issue May 11, 2021
… 11.3 (#57788)

Summary:
This PR enables the usage of cusolver potrf batched as the backend of Cholesky decomposition (`torch.linalg.cholesky` and `torch.linalg.cholesky_ex`) when cuda version is greater than or equal to 11.3.

Benchmark available at https://github.com/xwang233/code-snippet/tree/master/linalg/cholesky-new. It is seen that cusolver potrf batched performs better than magma potrf batched in most cases.

## cholesky dispatch heuristics:

### before:

- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched

### after:

cuda >= 11.3:
- batch size == 1: cusolver potrf
- batch size > 1: cusolver potrf batched

cuda < 11.3 (not changed):
- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched
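The dispatch itself is internal; a batched call from Python looks like this (usage sketch — `cholesky_ex` additionally returns per-matrix error codes instead of raising):

```python
import torch

A = torch.randn(8, 64, 64)
A = A @ A.transpose(-2, -1) + 64 * torch.eye(64)  # a batch of SPD matrices

L = torch.linalg.cholesky(A)              # batched factorization, raises on failure
L_ex, info = torch.linalg.cholesky_ex(A)  # `info` is nonzero where factorization failed
print(int(info.sum()))                    # 0 for SPD inputs
```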

 ---

See also #42666 #47953 #53104 #53879

Pull Request resolved: #57788

Reviewed By: ngimel

Differential Revision: D28345530

Pulled By: mruberry

fbshipit-source-id: 3022cf73b2750e1953c0e00a9e8b093dfc551f61
facebook-github-bot pushed a commit that referenced this issue May 11, 2021
Summary:
We are ready to move to the new stage for our `torch.linalg` module, which is stable (or STABLE?).

Ref. #42666

Pull Request resolved: #58043

Reviewed By: ngimel

Differential Revision: D28356172

Pulled By: mruberry

fbshipit-source-id: e2c1effa79b9635b2ef0a820a03a0685105042bd
@mruberry mruberry added the tracker label and removed the module: numpy label Jun 17, 2021
@mruberry mruberry changed the title from "Linear Algebra tracking issue" to "torch.linalg in PyTorch 1.10 tracker" Jun 17, 2021
facebook-github-bot pushed a commit that referenced this issue Jul 30, 2021
…da >= 11.3 U1 (#62003)

Summary:
This PR adds the `cusolverDn<T>SyevjBatched` function to the backend of `torch.linalg.eigh` (eigenvalue solver for Hermitian matrices). Using the heuristics from #53040 (comment) and my local tests, the `syevj_batched` path is only used when `batch_size > 1` and `matrix_size <= 32`. This gives us a huge performance boost in those cases.

Since there were known numerical issues on cusolver `syevj_batched` before cuda 11.3 update 1, this PR only enables the dispatch when cuda version is no less than that.
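The regime this heuristic targets (many small Hermitian matrices) looks like this from the user side; the CUDA path is chosen automatically when available (usage sketch):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.randn(256, 16, 16, device=device)
A = A + A.transpose(-2, -1)   # make each matrix in the batch symmetric

w, V = torch.linalg.eigh(A)   # ascending eigenvalues, orthonormal eigenvectors
print(w.shape, V.shape)       # torch.Size([256, 16]) torch.Size([256, 16, 16])
```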

See also #42666 #47953 #53040

Pull Request resolved: #62003

Reviewed By: heitorschueroff

Differential Revision: D30006316

Pulled By: ngimel

fbshipit-source-id: 3a65c5fc9adbbe776524f8957df5442c3d3aeb8e
@mruberry
Collaborator

Closing in favor of individual issues
