ENH: scipy.special.log_softmax could be 2**126 to 2**1022 times more accurate #19521
Comments
Thanks @JasonGross, that's pretty clever to take advantage of the fact that `exp(x - x_max)` is exactly 1 at the maximum. At the moment your implementation only handles 1-d arrays and lacks the `axis` keyword argument. It also doesn't handle non-finite values. Check the current signature and implementation of `log_softmax`. If you get your implementation to parity, feel free to submit a PR.
By taking advantage of the fact that `x - x_max` is going to be 0 at the maximum and that `exp(0)` is 1, we can use `log1p` instead of `log` to increase the accuracy of `log_softmax` at the maximum index by a factor of about `2**126` (for float32) or about `2**1022` (for float64). Fixes scipy#19521
In most situations the difference doesn't matter. It came up for me (in PyTorch) when I was overtraining a very small transformer and the loss capped out at lower confidence than it needed to. I'll submit a PR momentarily.
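To make the trick concrete, here is a small numerical comparison (my own illustrative values, not code from the PR): the naive `log(sum(exp(...)))` rounds the tiny correction at the maximum away, while splitting the maximum term off and using `log1p` on the remainder keeps it.

```python
import numpy as np

x = np.array([20.0, 0.0], dtype=np.float32)
shifted = x - x.max()                            # [0., -20.]; exp(0) == 1 at the max

naive = np.log(np.exp(shifted).sum())            # 1 + 2.1e-9 rounds to 1 in float32 -> 0.0
better = np.log1p(np.exp(shifted[1:]).sum())     # log1p(2.1e-9) ~ 2.1e-9 survives

print(naive, better)                             # 0.0 2.0611537e-09
```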
Is your feature request related to a problem? Please describe.

As I understand it, scipy implements `log_softmax(x)` as `x - np.max(x) - np.log(np.sum(np.exp(x - np.max(x))))`. However, when the largest value is much larger than the rest of the values (about 16 larger for float32, about 36 larger for float64), `log_softmax` returns `0` at the maximum value, when it could give a much more precise answer. This came up when a transformer I was training with cross-entropy loss on a classification task had loss dominated by `np.finfo(np.float32).eps`.
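For instance (an illustrative case of my own, not one from the original report), with a gap of 100 in float64 the current implementation returns exactly 0 at the maximum, even though a value of about -3.7e-44 is representable and correct:

```python
import numpy as np
from scipy.special import log_softmax

x = np.array([100.0, 0.0])
print(log_softmax(x))               # [   0. -100.] -- exactly 0.0 at the maximum
print(-np.log1p(np.exp(-100.0)))    # ~ -3.72e-44, the value it could return there
```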
Describe the solution you'd like.
Consider the following code, demonstrating a more accurate `log_softmax`, which outputs the numbers in the title of this issue:
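The original demonstration script is not reproduced in this excerpt; the snippet below is only a minimal 1-d sketch of the `log1p` approach described above (illustrative names, no `axis` support or non-finite handling).

```python
import numpy as np

def log_softmax_1d(x):
    """Sketch of a more accurate 1-d log_softmax using log1p.

    At the maximum index, x - x_max == 0 and exp(0) == 1, so the log of the
    sum can be written as log1p(sum of exp over the remaining entries),
    which keeps tiny corrections that plain log(1 + tiny) rounds away.
    """
    x = np.asarray(x)
    i_max = np.argmax(x)
    shifted = x - x[i_max]
    rest = np.delete(shifted, i_max)          # every entry except the maximum
    return shifted - np.log1p(np.sum(np.exp(rest)))

print(log_softmax_1d([100.0, 0.0]))           # ~ [-3.72e-44, -100.] instead of [0., -100.]
```

The accuracy figures below come from the original script, which compared the two formulations across dtypes, not from this sketch: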
For <class 'numpy.float32'>, diff in supported input accuracy is 2**-(102 - 15) = 2**-87; diff in output accuracy is np.log2(1.1920927533992653e-07) - np.log2(1.401298464324817e-45) = 126.0
For <class 'numpy.float64'>, diff in supported input accuracy is 2**-(744 - 35) = 2**-709; diff in output accuracy is np.log2(2.2204460492503128e-16) - np.log2(5e-324) = 1022.0
Describe alternatives you've considered.
No response
Additional context (e.g. screenshots, GIFs)
I originally posted this as a StackOverflow question.
Companion PyTorch issue: pytorch/pytorch#113708
Companion TensorFlow issue: tensorflow/tensorflow#62400