-
-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: special.ndtr: fix error in implementation #20695
base: main
Are you sure you want to change the base?
Conversation
See the original version from cephes
Thanks for the PR @ThibaultDECO! It looks this goes back to when scipy/Lib/special/cephes/ndtr.c Lines 394 to 398 in 52c64a9
scipy/Lib/special/cephes/const.c Line 84 in 52c64a9
which is the same as SciPy started with the cephes release from June, 1992, but the one on The cephes book states this: but we see in the code in SciPy that if Would you like to add a test case here that fails on |
Flagging loss of precision : 1 - 1e-30 != 1
indentation change
indentation
Should raise warning when loss of precision for normal cdf (in which case the survival function should be used)
I couldn't add a test because the error is later fixed by erf itself which calls erfc when x > 1. But the fix of this PR avoids calling too many functions unnecessarily. I have also added a warning flagging when ndtr loses precision and gives mathematically incorrect results, such as ndtr(25) returning 1.0, because the results of ndtr should be in (0,1), not [0,1], except in the special case of calling ndtr(np.inf). |
preferred style Co-authored-by: Lucas Colley <lucas.colley8@gmail.com>
Unnecessary import Co-authored-by: Lucas Colley <lucas.colley8@gmail.com>
Comment on why it should raise a warning
Import pytest
adding context for error handling
t = 1.0 - y; | ||
if (t == 1.0 && y != 0) { | ||
set_error("ndtr", SF_ERROR_LOSS, NULL); | ||
} | ||
return t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nature of floating point numbers is that they are not evenly spaced. They fan out logarithmically in such a way that they are very dense near zero, and thin out towards
I think an example will help clear things up. Take for instance z = 20
, then y = 0.5 * erfc(z)
will be 2.6979328058039506e-176
. But t = 1.0 - y
will be 1.0, because floating point numbers are not dense enough near 1.0 to represent a number so close to 1.0.
Please remove this change. Also, in general, one typically should not expect exact equality to hold when working with floating point numbers just because two expressions are mathematically identical (e.g. it's possible for x + (y + z)
to be different from (x + y) + z
, so one typically avoids exact comparisons.
If you're interested in learning more, I'd suggest checking out the wikipedia pages for Machine Epsilon and unit in the last place. What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg is another great resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank @steppi, I know. But :
-
It is not an issue that
ndtr(1e-18) == 0.5
, even though it doesndtr(1e-18)
->0.5 + erf(1e-18)
->0.5 + 1.1283791670955125e-18
->0.5
Indeed,sc.ndtri(sc.ndtr(1e-18)) == 0.0
which is ok. -
However, it IS an issue that
ndtr(25) == 1.0
. The difference between the two examples is that the codomain of a CDF is (0,1), not [0,1]. Indeed,sc.ndtri(sc.ndtr(25)) == Inf
which is NOT ok.
ndtr(1e-18) is imprecise because of floating-point arithmetic but 0.5 falls within the codomain of a cdf, so it is not an issue. But ndtr(25) raises a problem beyond simple accuracy of floating-point arithmetic : it gives values outside of the expected codomain of the function.
Actually, the current version of erf and of erfc DOES raise a warning in similar situations and start warning at around abs(x) == 27 when the result is outside the codomain. This change simply extends this to ndtr. You can test :
import pytest
import scipy.special as sc
with sc.errstate(all='warn'):
with pytest.warns(sc.SpecialFunctionWarning):
sc.erfc(27)
with sc.errstate(all='warn'):
with pytest.warns(sc.SpecialFunctionWarning):
sc.erfc(-27)
with sc.errstate(all='warn'):
with pytest.warns(sc.SpecialFunctionWarning):
sc.erf(27)
with sc.errstate(all='warn'):
with pytest.warns(sc.SpecialFunctionWarning):
sc.erf(-27)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you're saying, but from a floating point perspective, I think 1.0 is the correct value for ndtr(25)
since it is the closest double
to the actual real number answer. 1.0
isn't in the image of ndtr
when the domain is the real numbers, but it is when the domain is the set of double precision IEEE-754 floating point numbers, which we can think of as their own algebraic structure. The ndtr
in SciPy is an approximation to the "ideal" correctly rounded ndtr
with domain the IEEE-754 doubles, and I think the SF_ERROR_LOSS
is intended for situations where the relative error between the correctly rounded result and the computed result is undesirably large.
I don't think of 1.0
occupying a privileged place in floating point arithmetic compared to any other value which is not infinite, zero, or nan
, so if we need to warn for loss of precision even when producing a correctly rounded result here, it seems like it would also be necessary to produce such a warning for any floating point computation which produces an inexact result, even if correctly rounded. The warnings that occur for erf
and erfc
are due to intermediate underflow in calculations, not due to an inexact result being produced.
Reference issue
It fixes an error in the computation of the normal cumulative distribution function : see the original version from cephes (here or here).
What does this implement/fix?
It fixes an error in the computation of the normal cumulative distribution function : see the original version from cephes (here or here).
Additional information