
BUG: special.ndtr: fix error in implementation #20695

Open · wants to merge 10 commits into base: main
Conversation

ThibaultDECO

Reference issue

It fixes an error in the computation of the normal cumulative distribution function: see the original version from cephes (here or here).

What does this implement/fix?

It fixes an error in the computation of the normal cumulative distribution function: see the original version from cephes (here or here).

Additional information

See the original version from cephes
@github-actions bot added the scipy.special and C/C++ (Items related to the internal C/C++ code base) labels May 11, 2024
@steppi (Contributor) commented May 11, 2024

Thanks for the PR @ThibaultDECO! It looks like this goes back to when cephes was first added to SciPy in 2001.

x = a * SQRTH;
z = fabs(x);
if( z < SQRTH )
y = 0.5 + 0.5 * erf(x);

SQRTH is defined in cephes as

double SQRTH = 7.07106781186547524401E-1; /* sqrt(2)/2 */

which is the same as M_SQRT1_2.

SciPy started with the cephes release from June 1992, but the one on netlib is from November 2000, and the mistake must have been fixed in the interim.

The cephes book states this:

[image: excerpt from the cephes book]

but we see in the code in SciPy that if $x$ is the input, then $z = \left|x\frac{\sqrt{2}}{2}\right|$, so we should in fact be comparing if (z < 1) not if (z < M_SQRT1_2).
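To make the comparison concrete, here is a minimal Python transcription of the cephes logic with the corrected threshold (the function name and structure are illustrative sketches of the quoted C, not the actual code in the PR):

```python
import math

SQRTH = math.sqrt(2.0) / 2.0  # same value as M_SQRT1_2

def ndtr_sketch(a):
    """Normal CDF following the cephes structure quoted above."""
    x = a * SQRTH
    z = abs(x)
    if z < 1.0:  # corrected: compare against 1, not SQRTH
        return 0.5 + 0.5 * math.erf(x)
    # tail branch: erfc avoids catastrophic cancellation for large |x|
    y = 0.5 * math.erfc(z)
    return 1.0 - y if x > 0 else y

print(ndtr_sketch(1.0))  # ≈ 0.8413
```

With the old condition `z < SQRTH`, inputs with 1 ≤ |a| < √2 would take the erfc branch unnecessarily, even though erf is the appropriate (and cheaper) path there.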

Would you like to add a test case here that fails on main but passes after this fix is made?

@lucascolley lucascolley added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label May 11, 2024
@lucascolley lucascolley changed the title Update ndtr.h - fixing error for the normal cumulative distribution function BUG: special.ndtr: fix error in implementation May 11, 2024
Commits:
- Flagging loss of precision: 1 - 1e-30 != 1
- indentation change
- Should raise a warning on loss of precision for the normal cdf (in which case the survival function should be used)

Review thread on scipy/special/tests/test_ndtr.py (outdated)
@ThibaultDECO (Author)

I couldn't add a test because the error is later compensated for by erf itself, which calls erfc when its argument exceeds 1. But the fix in this PR avoids calling extra functions unnecessarily.

I have also added a warning flagging when ndtr loses precision and gives mathematically incorrect results, such as ndtr(25) returning 1.0: the results of ndtr should lie in (0, 1), not [0, 1], except in the special case of calling ndtr(np.inf).

Commits:
- preferred style (Co-authored-by: Lucas Colley <lucas.colley8@gmail.com>)

@lucascolley dismissed their stale review May 11, 2024 14:39: requested change made

ThibaultDECO and others added 4 commits May 11, 2024 16:42:
- Unnecessary import (Co-authored-by: Lucas Colley <lucas.colley8@gmail.com>)
- Comment on why it should raise a warning
Comment on lines +267 to +271
t = 1.0 - y;
if (t == 1.0 && y != 0) {
set_error("ndtr", SF_ERROR_LOSS, NULL);
}
return t;
@steppi (Contributor) commented May 12, 2024
The nature of floating point numbers is that they are not evenly spaced. They fan out logarithmically in such a way that they are very dense near zero, and thin out towards $\pm \infty$.

I think an example will help clear things up. Take for instance z = 20, then y = 0.5 * erfc(z) will be 2.6979328058039506e-176. But t = 1.0 - y will be 1.0, because floating point numbers are not dense enough near 1.0 to represent a number so close to 1.0.
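That example can be checked with the standard library alone (a small sketch):

```python
import math

y = 0.5 * math.erfc(20.0)   # tiny but nonzero, on the order of 1e-176
t = 1.0 - y                 # rounds back to exactly 1.0

# doubles adjacent to 1.0 are ~2.2e-16 apart, vastly coarser than y,
# so no double can represent 1.0 - y other than 1.0 itself
print(y > 0.0, t == 1.0, math.ulp(1.0))
```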

Please remove this change. Also, in general, one should not expect exact equality to hold when working with floating point numbers just because two expressions are mathematically identical (e.g. it's possible for x + (y + z) to differ from (x + y) + z), so one typically avoids exact comparisons.

If you're interested in learning more, I'd suggest checking out the Wikipedia pages for machine epsilon and unit in the last place. What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg is another great resource.

@ThibaultDECO (Author)

Thanks @steppi, I know. But:

  1. It is not an issue that ndtr(1e-18) == 0.5, even though the computation goes ndtr(1e-18) -> 0.5 + erf(1e-18) -> 0.5 + 1.1283791670955125e-18 -> 0.5. Indeed, sc.ndtri(sc.ndtr(1e-18)) == 0.0, which is ok.

  2. However, it IS an issue that ndtr(25) == 1.0. The difference between the two examples is that the codomain of a CDF is (0,1), not [0,1]. Indeed, sc.ndtri(sc.ndtr(25)) == Inf, which is NOT ok.

ndtr(1e-18) is imprecise because of floating-point arithmetic, but 0.5 falls within the codomain of a cdf, so it is not an issue. ndtr(25), however, raises a problem beyond simple floating-point accuracy: it gives values outside the expected codomain of the function.

Actually, the current versions of erf and erfc DO raise a warning in similar situations, starting at around abs(x) == 27, when the result is outside the codomain. This change simply extends that behavior to ndtr. You can test:

import pytest
import scipy.special as sc

# erf and erfc start warning at around abs(x) == 27
for func in (sc.erf, sc.erfc):
    for x in (27, -27):
        with sc.errstate(all='warn'):
            with pytest.warns(sc.SpecialFunctionWarning):
                func(x)
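The codomain point above can also be reproduced without SciPy (a stdlib sketch; `statistics.NormalDist` stands in for ndtr/ndtri here, which is an assumption about equivalent behavior, not SciPy's implementation):

```python
import math
from statistics import NormalDist, StatisticsError

nd = NormalDist()
p = nd.cdf(25.0)            # rounds to exactly 1.0 in double precision
print(p)                     # 1.0

# the survival function keeps the information the cdf loses
sf = 0.5 * math.erfc(25.0 / math.sqrt(2.0))
print(sf)                    # tiny but nonzero

try:
    nd.inv_cdf(p)            # p == 1.0 is outside the open interval (0, 1)
except StatisticsError as e:
    print("inv_cdf failed:", e)
```

Once the cdf has rounded to 1.0, the inverse is unrecoverable, which is the author's argument for warning the user to switch to the survival function.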

@steppi (Contributor) commented May 12, 2024

I understand what you're saying, but from a floating point perspective, I think 1.0 is the correct value for ndtr(25) since it is the closest double to the actual real number answer. 1.0 isn't in the image of ndtr when the domain is the real numbers, but it is when the domain is the set of double precision IEEE-754 floating point numbers, which we can think of as their own algebraic structure. The ndtr in SciPy is an approximation to the "ideal" correctly rounded ndtr with domain the IEEE-754 doubles, and I think the SF_ERROR_LOSS is intended for situations where the relative error between the correctly rounded result and the computed result is undesirably large.
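One way to see the "correctly rounded" claim: the distance between the true value of ndtr(25) and 1.0 is far below half an ulp at 1.0, so 1.0 is the nearest representable double (a stdlib sketch, using the erfc identity for the upper tail):

```python
import math

# true ndtr(25) = 1 - q, where q is the upper-tail probability
q = 0.5 * math.erfc(25.0 / math.sqrt(2.0))   # on the order of 1e-138

half_ulp = math.ulp(1.0) / 2.0               # ~1.1e-16

# q is astronomically smaller than half the spacing of doubles at 1.0,
# so the nearest representable double to 1 - q is exactly 1.0
print(q < half_ulp)
```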

I don't think of 1.0 occupying a privileged place in floating point arithmetic compared to any other value which is not infinite, zero, or nan, so if we need to warn for loss of precision even when producing a correctly rounded result here, it seems like it would also be necessary to produce such a warning for any floating point computation which produces an inexact result, even if correctly rounded. The warnings that occur for erf and erfc are due to intermediate underflow in calculations, not due to an inexact result being produced.
