
Add a note on the stability of linalg functions. #88313

Closed
wants to merge 6 commits into from

Conversation

lezcano
Collaborator

@lezcano lezcano commented Nov 2, 2022

Stack from ghstack (oldest at bottom):

This was long overdue, as it keeps coming up in issues.

Fixes #85950
Fixes #59720
Fixes #59782

cc @svekars @carljparker @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @lezcano

@pytorch-bot

pytorch-bot bot commented Nov 2, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88313

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit b5723ec:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

lezcano added a commit that referenced this pull request Nov 2, 2022
This was long overdue, as it keeps coming up in issues.

Fixes #85950
Fixes #59720
Fixes #59782

ghstack-source-id: b5e1751d9dbad0816b69bba78cd311d54b739d93
Pull Request resolved: #88313
@lezcano lezcano added module: docs Related to our documentation, both in docs/ and docblocks module: linear algebra Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul release notes: linalg_frontend release notes category labels Nov 2, 2022
Linear algebra (``torch.linalg``)
---------------------------------

Functions within ``torch.linalg`` are particularly problematic when dealing with extremal values.
Collaborator

"Particularly problematic" is a vague phrase. In general, the handling of extremal values is not necessarily a (numerical) "stability" issue, either.

it is left to the user to check that the inputs they provide to these functions do not contain any
special values.

A similar issue happens with extremal values. Some ``linalg`` functions are not defined for all inputs.
Collaborator

The previous paragraph already started the discussion of extremal values

@@ -51,6 +51,44 @@ datatype. E.g.:
a.norm() # produces tensor(inf)
a.double().norm() # produces tensor(1.4142e+20, dtype=torch.float64), representable in fp32

.. _Linear Algebra Stability:

Linear algebra (``torch.linalg``)
Collaborator

@mruberry mruberry Nov 4, 2022

Want to workshop this a little over Slack, @lezcano?

when the inputs have non-finite values like ``inf`` or ``NaN``. As such, neither does PyTorch.
The operations may return a tensor with non-finite values, or raise an exception, or even segfault.

Consider using :func:`torch.isfinite` before calling these functions if your inputs may contain these values.
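The suggestion above can be sketched as a simple guard. This is an illustrative snippet, not part of the proposed docs; the ``safe_inverse`` helper is hypothetical:

```python
import torch

def safe_inverse(A: torch.Tensor) -> torch.Tensor:
    # Hypothetical guard: reject non-finite inputs up front, since the
    # behaviour of linalg functions on inf/NaN inputs is unspecified.
    if not torch.isfinite(A).all():
        raise ValueError("input contains inf or NaN; the result would be undefined")
    return torch.linalg.inv(A)

A = torch.eye(3)
print(torch.allclose(safe_inverse(A), A))  # True: the identity is its own inverse
```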
Collaborator

This seems good, might be nice to elaborate slightly on what the user would do with the results of this call


Consider using :func:`torch.isfinite` before calling these functions if your inputs may contain these values.

Extremal values
Collaborator

"Ill-conditioned matrices"?

Collaborator Author

I prefer extremal values because, as you've seen, different operations may have different kinds of extremal values.

Collaborator Author

ah, right, I will have separate headings

If provided with ill-conditioned inputs, the results of these functions may vary when using the same input on different devices
or when using different backends via the keyword ``driver``.

Similarly, spectral operations like ``svd``, ``eig``, and ``eigh`` may return incorrect results (and their gradients may be infinite)
Collaborator

What about a separate heading: "Matrices with close eigenvalues"?

Collaborator

Do we have a suggestion for what the user can do in this case?

Collaborator Author

In the end, rather than making these their own heading, I made sure that the first words in the sentence describe the important set of operations that are described in the paragraph. Making these into their own heading did not look great when rendered.

being non-invertible (for example, a matrix that has a very small singular value), then these algorithms may silently return
incorrect results. These matrices are said to be `ill-conditioned <https://nhigham.com/2020/03/19/what-is-a-condition-number/>`_.
If provided with ill-conditioned inputs, the results of these functions may vary when using the same input on different devices
or when using different backends via the keyword ``driver``.
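One hedged way to check for the ill-conditioning this passage describes (the ``1e12`` threshold below is illustrative, not a PyTorch recommendation; a sensible cutoff is problem-dependent):

```python
import torch

# A well-conditioned matrix vs. a nearly rank-deficient one.
well = torch.eye(3, dtype=torch.float64)
ill = torch.tensor([[1.0, 1.0],
                    [1.0, 1.0 + 1e-15]], dtype=torch.float64)

print(torch.linalg.cond(well).item())        # 1.0: perfectly conditioned
print(torch.linalg.cond(ill).item() > 1e12)  # True: effectively singular
```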
Collaborator

Do we have a suggestion for how users can check for ill-conditioning and what to do if they encounter it?

Collaborator Author

I don't think there's a good general suggestion for what to do, as that may have to be tackled on a per-case basis (discard that input, regularise it, analyse the model to figure out whether getting those values makes sense at all...)

when their inputs have eigenvalues that are too close to each other as the approximate algorithms used to compute these decompositions
struggle to converge for these inputs.

Running the computation in ``float64`` (as NumPy does by default) often helps, but it does not solve the issue in all cases.
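A concrete sketch of what "often helps" means here, using the Hilbert matrix as a standard ill-conditioned example (the size ``n = 8`` is chosen for illustration):

```python
import torch

# Solve H x = b where H is an 8x8 Hilbert matrix (notoriously
# ill-conditioned). float32 loses most significant digits; float64
# recovers a far more accurate solution.
n = 8
i = torch.arange(n, dtype=torch.float64)
H = 1.0 / (i[:, None] + i[None, :] + 1.0)
x_true = torch.ones(n, dtype=torch.float64)
b = H @ x_true

x32 = torch.linalg.solve(H.float(), b.float()).double()
x64 = torch.linalg.solve(H, b)

err32 = (x32 - x_true).abs().max().item()
err64 = (x64 - x_true).abs().max().item()
print(err32 > err64)  # True: float64 is substantially more accurate here
```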
Collaborator

New section: "Float32 vs. Float64 precision"?

when their inputs have eigenvalues that are too close to each other as the approximate algorithms used to compute these decompositions
struggle to converge for these inputs.

Running the computation in ``float64`` (as NumPy does by default) often helps, but it does not solve the issue in all cases.
Collaborator

Let's be more precise than "helps." What this is trying to say is that matrices that are effectively "ill-conditioned" w.r.t. an operation in float32 may not be in double, and same for close eigenvalues, right?

Collaborator

When would we recommend users trying running in double to address an issue?

Collaborator Author

every time, unless the input matrix is singular in exact precision (e.g. torch.ones(3,3)).

struggle to converge for these inputs.

Running the computation in ``float64`` (as NumPy does by default) often helps, but it does not solve the issue in all cases.
Otherwise, analyzing the spectrum of the inputs via :func:`torch.linalg.svdvals` or their condition number via :func:`torch.linalg.cond`
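For instance, a hedged sketch of what inspecting the spectrum might look like (the ``1e-10`` cutoff is illustrative; as noted in the thread below, there is no universal threshold):

```python
import torch

# Inspect singular values to spot near rank-deficiency before calling a
# solver. svdvals returns the singular values in descending order.
A = torch.diag(torch.tensor([1.0, 1e-3, 1e-12], dtype=torch.float64))
s = torch.linalg.svdvals(A)
print(bool(s.min() / s.max() < 1e-10))  # True: A is numerically singular
```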
Collaborator

This recommendation is too vague -- what would users do with the results of these operations?

Collaborator Author

As above, I don't think there's a good general suggestion we can give here...

Collaborator

Maybe not worth mentioning, then

@lezcano
Collaborator Author

lezcano commented Nov 7, 2022

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 7, 2022
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@IvanYashchuk IvanYashchuk removed their request for review November 8, 2022 08:34
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
This was long overdue, as it keeps coming up in issues.

Fixes pytorch#85950
Fixes pytorch#59720
Fixes pytorch#59782

Pull Request resolved: pytorch#88313
Approved by: https://github.com/soumith, https://github.com/mruberry
@facebook-github-bot facebook-github-bot deleted the gh/Lezcano/158/head branch June 8, 2023 14:43