Add a note on the stability of linalg functions. #88313
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88313
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 1 Pending as of commit b5723ec. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Linear algebra (``torch.linalg``)
---------------------------------

Functions within ``torch.linalg`` are particularly problematic when dealing with extremal values.
"Particularly problematic" is a vague phrase. In general, the handling of extremal values is not necessarily a (numerical) "stability" issue, either.
it is left to the user to check that the inputs they provide to these functions do not contain any
special values.

A similar issue happens with extremal values. Some ``linalg`` functions are not defined for all inputs.
The previous paragraph already started the discussion of extremal values
@@ -51,6 +51,44 @@ datatype. E.g.:

    a.norm() # produces tensor(inf)
    a.double().norm() # produces tensor(1.4142e+20, dtype=torch.float64), representable in fp32
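The same overflow can be reproduced with NumPy, which may be useful as an illustration since the mechanism is identical (squaring ``1e20`` exceeds the float32 maximum of roughly ``3.4e38``); this is a sketch, not part of the PR's diff:

```python
import numpy as np

# Squaring 1e20 gives 1e40, which overflows float32, so the norm is inf;
# the same values are representable once the intermediate runs in float64.
a = np.full(2, 1e20, dtype=np.float32)
print(np.linalg.norm(a))                     # inf
print(np.linalg.norm(a.astype(np.float64)))  # ~1.4142e+20
```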
.. _Linear Algebra Stability:

Linear algebra (``torch.linalg``)
Want to workshop this a little over Slack, @lezcano?
when the inputs have non-finite values like ``inf`` or ``NaN``. As such, neither does PyTorch.
The operations may return a tensor with non-finite values, or raise an exception, or even segfault.

Consider using :func:`torch.isfinite` before calling these functions if your inputs may contain these values.
This seems good, might be nice to elaborate slightly on what the user would do with the results of this call
Consider using :func:`torch.isfinite` before calling these functions if your inputs may contain these values.
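Addressing the reviewer's point about what a user would do with the result of such a check, one common pattern is to fail fast with a clear error before the linalg call. A hypothetical sketch using NumPy (``safe_solve`` is an invented helper name, not a PyTorch API):

```python
import numpy as np

# Hypothetical guard: validate inputs before a linalg call so that
# non-finite values raise a clear error instead of producing undefined results.
def safe_solve(A, b):
    if not (np.isfinite(A).all() and np.isfinite(b).all()):
        raise ValueError("inputs contain inf/NaN; linalg results would be undefined")
    return np.linalg.solve(A, b)

A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([2.0, 8.0])
print(safe_solve(A, b))  # [1. 2.]
```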

Extremal values
"Ill-conditioned matrices"?
I prefer extremal values because, as you've seen, different operations may have different kinds of extremal values.
ah, right, I will have separate headings
If provided with ill-conditioned inputs, the results of these functions may vary when using the same input on different devices
or when using different backends via the keyword ``driver``.

Similarly, spectral operations like ``svd``, ``eig``, and ``eigh`` may return incorrect results (and their gradients may be infinite)
What about a separate heading: "Matrices with close eigenvalues"?
Do we have a suggestion for what the user can do in this case?
In the end, rather than making these their own heading, I made sure that the first words in the sentence describe the important set of operations that are described in the paragraph. Making these into their own heading did not look great when rendered.
being non-invertible (for example, a matrix that has a very small singular value), then these algorithms may silently return
incorrect results. These matrices are said to be `ill-conditioned <https://nhigham.com/2020/03/19/what-is-a-condition-number/>`_.
If provided with ill-conditioned inputs, the results of these functions may vary when using the same input on different devices
or when using different backends via the keyword ``driver``.
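To make "ill-conditioned" concrete, a short NumPy illustration (not part of the diff): a matrix that is nearly singular has a huge condition number, which signals that ``solve``/``inv`` results cannot be trusted, especially in float32.

```python
import numpy as np

# An almost-singular matrix: its rows differ by only 1e-10, so the
# condition number is enormous and linear solves amplify rounding error.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-10]])
print(np.linalg.cond(A))  # on the order of 4e10: severely ill-conditioned
```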
Do we have a suggestion for how users can check for ill-conditioning and what to do if they encounter it?
I don't think there's a good general suggestion for what to do, as that may have to be tackled on a per-case basis (discard that input, regularise it, analyse the model to figure out whether getting those values makes sense at all...)
when their inputs have eigenvalues that are too close to each other as the approximate algorithms used to compute these decompositions
struggle to converge for these inputs.
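One way a user could detect this situation ahead of time is to inspect the eigenvalue gaps; a NumPy sketch (an illustration of the idea, not an API recommendation from the diff):

```python
import numpy as np

# A symmetric matrix with two nearly degenerate eigenvalues: the first
# gap in the sorted spectrum is tiny, a warning sign for eigh gradients.
A = np.diag([1.0, 1.0 + 1e-13, 5.0])
w = np.linalg.eigvalsh(A)   # eigenvalues in ascending order
gaps = np.diff(w)
print(gaps)                 # first gap is ~1e-13, second is ~4
```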

Running the computation in ``float64`` (as NumPy does by default) often helps, but it does not solve the issue in all cases.
New section: "Float32 vs. Float64 precision"?
Running the computation in ``float64`` (as NumPy does by default) often helps, but it does not solve the issue in all cases.
Let's be more precise than "helps." What this is trying to say is that matrices that are effectively "ill-conditioned" w.r.t. an operation in float32 may not be in double, and same for close eigenvalues, right?
When would we recommend users trying running in double to address an issue?
Every time, unless the input matrix is singular in exact precision (e.g. torch.ones(3, 3)).
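The point being discussed (a matrix that is effectively ill-conditioned in float32 may be perfectly workable in float64) can be demonstrated with NumPy and a Hilbert matrix, a classic ill-conditioned example; this is an illustrative sketch, not code from the PR:

```python
import numpy as np

# Solve H x = b for a known x on an ill-conditioned Hilbert matrix.
# cond(H) is ~1e10, so float32 (eps ~1e-7) loses essentially all accuracy,
# while float64 (eps ~2e-16) still recovers x to several digits.
n = 8
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.ones(n)
b = H @ x_true

err32 = np.abs(np.linalg.solve(H.astype(np.float32), b.astype(np.float32)) - x_true).max()
err64 = np.abs(np.linalg.solve(H, b) - x_true).max()
print(err32, err64)  # float64 error is many orders of magnitude smaller
```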
Running the computation in ``float64`` (as NumPy does by default) often helps, but it does not solve the issue in all cases.
Otherwise, analyzing the spectrum of the inputs via :func:`torch.linalg.svdvals` or their condition number via :func:`torch.linalg.cond`
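As a concrete version of this suggestion, one thing a user could do with the spectrum is compare the singular-value ratio against the working precision; a NumPy sketch (``np.linalg.svd(..., compute_uv=False)`` plays the role of ``torch.linalg.svdvals`` here):

```python
import numpy as np

# Flag a matrix whose condition number (largest/smallest singular value)
# is too large relative to float32 machine epsilon to solve reliably.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
s = np.linalg.svd(A, compute_uv=False)      # singular values, descending
cond = s[0] / s[-1]
ok_in_float32 = cond < 1.0 / np.finfo(np.float32).eps
print(cond, ok_in_float32)                  # ~15, safe in float32
```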
This recommendation is too vague -- what would users do with the results of these operations?
As above, I don't think there's a good general suggestion we can give here...
Maybe not worth mentioning, then
@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This was long-due, as it keeps coming up in issues. Fixes pytorch#85950. Fixes pytorch#59720. Fixes pytorch#59782. Pull Request resolved: pytorch#88313. Approved by: https://github.com/soumith, https://github.com/mruberry
cc @svekars @carljparker @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @lezcano