[ONNX] Fix numerical errors in softmax when dim is not last dimension #37326
Conversation
💊 Dr. CI build summary as of commit 2a9c810: 💚 Looks good so far! There are no failures yet. 💚
Force-pushed from 63c4108 to ea8f934
Thanks @p12tic for the fix! Overall looks good, please see inline comments for some additional updates.
@p12tic could you take a look at the comments and update your PR? Thanks!
@BowenBao Yes, I will do so. This week my priorities were elsewhere, thus the delay.
Force-pushed from ea8f934 to 2a9c810
@BowenBao I've applied your suggestions to the PR. Thanks for suggesting them on the PR instead of making the changes yourself; I got to learn what max normalization in softmax is :-)
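For context, the max normalization mentioned above is the standard numerical-stability identity `softmax(x) == softmax(x - x.max())`: subtracting the per-slice maximum before exponentiating keeps `exp` from overflowing without changing the result, which is why implementations of ONNX `Softmax` can stay finite where an explicit `exp`/`reducesum`/`div` graph produces NaN.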
@p12tic Thanks for the fix! LGTM
@houseroad has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@houseroad merged this pull request in d16c823.
Pull Request resolved: pytorch#37326
Reviewed By: hl475
Differential Revision: D21389712
Pulled By: houseroad
fbshipit-source-id: 554fd1b98231a28984c30c7e7abd3c0643386ff7
Fixes #34585.
This PR improves the workaround for the mismatch in semantics between ONNX softmax and PyTorch softmax.

In PyTorch, the `dim` parameter specifies the dimension over which to normalize the values. ONNX, on the other hand, always coerces the input into a 2D tensor, and the `axis` parameter specifies which dimensions form the rows and the columns of that tensor. As a result, the two semantics agree only when the last dimension is being normalized (`dim == ndim - 1`).

Previously this was handled by recognizing the `dim == ndim - 1` case and using `softmax` for it. All other cases fell back to computing the result through explicit invocations of the `exp`, `reducesum`, and `div` operators. Unfortunately, this produces numeric errors when the input values are large: `exp` overflows to infinity in both the numerator and the denominator, and dividing the two yields NaN. This can be avoided by transposing the input tensor so that the dimension to normalize becomes the last one, letting us reuse ONNX softmax (see the sketches below).
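To make the failure mode concrete, here is a minimal sketch in plain PyTorch mirroring what the old `exp`/`reducesum`/`div` fallback graph computes (illustrative only, not the exporter code itself):

```python
import torch

x = torch.full((2, 3), 1000.0)  # large activations

# What the old fallback graph computes: exp / reducesum / div,
# with no max subtraction. exp(1000.) overflows to inf, and
# inf / inf produces NaN.
naive = x.exp() / x.exp().sum(dim=0, keepdim=True)
print(naive)                    # all NaN

# torch.softmax subtracts the per-slice max first, so it stays finite.
print(torch.softmax(x, dim=0))  # all 0.5
```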
A similar approach was applied to the `logsoftmax` function in #30433.
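For reference, here is a rough sketch of the transpose-based approach, written in the style of a `torch.onnx` symbolic function (the `g.op` calls emit ONNX `Transpose`/`Softmax` nodes; treat the exact signatures as illustrative rather than the merged implementation):

```python
def softmax(g, input, dim):
    # The rank must be known statically for this path.
    input_dim = input.type().dim()
    if dim < 0:
        dim += input_dim
    needs_transpose = dim != input_dim - 1
    if needs_transpose:
        # Swap the softmax dimension with the last one.
        axes = list(range(input_dim))
        axes[dim], axes[-1] = axes[-1], axes[dim]
        input = g.op("Transpose", input, perm_i=axes)
        dim = input_dim - 1
    # ONNX Softmax over the last axis now matches PyTorch semantics.
    ret = g.op("Softmax", input, axis_i=dim)
    if needs_transpose:
        # Swapping two axes is its own inverse, so the same
        # permutation restores the original layout.
        ret = g.op("Transpose", ret, perm_i=axes)
    return ret
```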