
BUG: special.ndtr: fix error in implementation #20695

Open · wants to merge 10 commits into base: main
Conversation

ThibaultDECO

Reference issue

It fixes an error in the computation of the normal cumulative distribution function: see the original version from cephes (here or here).

What does this implement/fix?

It fixes an error in the computation of the normal cumulative distribution function: see the original version from cephes (here or here).

Additional information

See the original version from cephes
@github-actions bot added the scipy.special and C/C++ (Items related to the internal C/C++ code base) labels May 11, 2024
@steppi (Contributor) commented May 11, 2024

Thanks for the PR @ThibaultDECO! It looks like this goes back to when cephes was first added to SciPy in 2001.

x = a * SQRTH;
z = fabs(x);
if( z < SQRTH )
y = 0.5 + 0.5 * erf(x);

SQRTH is defined in cephes as

double SQRTH = 7.07106781186547524401E-1; /* sqrt(2)/2 */

which is the same as M_SQRT1_2.

SciPy started with the cephes release from June 1992, but the one on netlib is from November 2000, and the mistake must have been fixed in the interim.

The cephes book states this:

[image: excerpt from the cephes book]

but we see in the code in SciPy that if $x$ is the input, then $z = \left|x\frac{\sqrt{2}}{2}\right|$, so we should in fact be comparing if (z < 1) not if (z < M_SQRT1_2).
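To make the comparison concrete, here is a minimal Python transcription of the cephes logic with the corrected threshold (the function name and structure are illustrative sketches of the quoted C, not the actual code in the PR):

```python
import math

SQRTH = math.sqrt(2.0) / 2.0  # same value as M_SQRT1_2

def ndtr_sketch(a):
    """Normal CDF following the cephes structure quoted above."""
    x = a * SQRTH
    z = abs(x)
    if z < 1.0:  # corrected: compare against 1, not SQRTH
        return 0.5 + 0.5 * math.erf(x)
    # tail branch: erfc avoids catastrophic cancellation for large |x|
    y = 0.5 * math.erfc(z)
    return 1.0 - y if x > 0 else y

print(ndtr_sketch(1.0))  # ≈ 0.8413
```

With the old condition `z < SQRTH`, inputs with 1 ≤ |a| < √2 would take the erfc branch unnecessarily, even though erf is the appropriate (and cheaper) path there.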

Would you like to add a test case here that fails on main but passes after this fix is made?

@lucascolley lucascolley added the defect A clear bug or issue that prevents SciPy from being installed or used as expected label May 11, 2024
@lucascolley lucascolley changed the title Update ndtr.h - fixing error for the normal cumulative distribution function BUG: special.ndtr: fix error in implementation May 11, 2024
Commits:
- Flagging loss of precision: 1 - 1e-30 != 1
- indentation change
- Should raise a warning on loss of precision for the normal cdf (in which case the survival function should be used)

Review thread on scipy/special/tests/test_ndtr.py (outdated)
@ThibaultDECO (Author)

I couldn't add a test because the error is later compensated for by erf itself, which calls erfc when its argument exceeds 1. But the fix in this PR avoids calling extra functions unnecessarily.

I have also added a warning flagging when ndtr loses precision and gives mathematically incorrect results, such as ndtr(25) returning 1.0: the results of ndtr should lie in (0, 1), not [0, 1], except in the special case of calling ndtr(np.inf).

Commits:
- preferred style (Co-authored-by: Lucas Colley <lucas.colley8@gmail.com>)

@lucascolley dismissed their stale review May 11, 2024 14:39: requested change made

ThibaultDECO and others added 4 commits May 11, 2024 16:42:
- Unnecessary import (Co-authored-by: Lucas Colley <lucas.colley8@gmail.com>)
- Comment on why it should raise a warning
Comment on lines +267 to +271
t = 1.0 - y;
if (t == 1.0 && y != 0) {
set_error("ndtr", SF_ERROR_LOSS, NULL);
}
return t;
@steppi (Contributor) commented May 12, 2024
The nature of floating point numbers is that they are not evenly spaced. They fan out logarithmically in such a way that they are very dense near zero, and thin out towards $\pm \infty$.

I think an example will help clear things up. Take for instance z = 20, then y = 0.5 * erfc(z) will be 2.6979328058039506e-176. But t = 1.0 - y will be 1.0, because floating point numbers are not dense enough near 1.0 to represent a number so close to 1.0.
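That example can be checked with the standard library alone (a small sketch):

```python
import math

y = 0.5 * math.erfc(20.0)   # tiny but nonzero, on the order of 1e-176
t = 1.0 - y                 # rounds back to exactly 1.0

# doubles adjacent to 1.0 are ~2.2e-16 apart, vastly coarser than y,
# so no double can represent 1.0 - y other than 1.0 itself
print(y > 0.0, t == 1.0, math.ulp(1.0))
```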

Please remove this change. Also, in general, one should not expect exact equality to hold when working with floating point numbers just because two expressions are mathematically identical (e.g. it's possible for x + (y + z) to differ from (x + y) + z), so one typically avoids exact comparisons.

If you're interested in learning more, I'd suggest checking out the Wikipedia pages for machine epsilon and unit in the last place. What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg is another great resource.

@ThibaultDECO (Author)

Thanks @steppi, I know. But:

  1. It is not an issue that ndtr(1e-18) == 0.5, even though the computation goes ndtr(1e-18) -> 0.5 + erf(1e-18) -> 0.5 + 1.1283791670955125e-18 -> 0.5. Indeed, sc.ndtri(sc.ndtr(1e-18)) == 0.0, which is ok.

  2. However, it IS an issue that ndtr(25) == 1.0. The difference between the two examples is that the codomain of a CDF is (0,1), not [0,1]. Indeed, sc.ndtri(sc.ndtr(25)) == Inf, which is NOT ok.

ndtr(1e-18) is imprecise because of floating-point arithmetic, but 0.5 falls within the codomain of a cdf, so it is not an issue. ndtr(25), however, raises a problem beyond simple floating-point accuracy: it gives values outside the expected codomain of the function.

Actually, the current versions of erf and erfc DO raise a warning in similar situations, starting at around abs(x) == 27, when the result is outside the codomain. This change simply extends that behavior to ndtr. You can test:

import pytest
import scipy.special as sc

# erf and erfc start warning at around abs(x) == 27
for func in (sc.erf, sc.erfc):
    for x in (27, -27):
        with sc.errstate(all='warn'):
            with pytest.warns(sc.SpecialFunctionWarning):
                func(x)
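The codomain point above can also be reproduced without SciPy (a stdlib sketch; `statistics.NormalDist` stands in for ndtr/ndtri here, which is an assumption about equivalent behavior, not SciPy's implementation):

```python
import math
from statistics import NormalDist, StatisticsError

nd = NormalDist()
p = nd.cdf(25.0)            # rounds to exactly 1.0 in double precision
print(p)                     # 1.0

# the survival function keeps the information the cdf loses
sf = 0.5 * math.erfc(25.0 / math.sqrt(2.0))
print(sf)                    # tiny but nonzero

try:
    nd.inv_cdf(p)            # p == 1.0 is outside the open interval (0, 1)
except StatisticsError as e:
    print("inv_cdf failed:", e)
```

Once the cdf has rounded to 1.0, the inverse is unrecoverable, which is the author's argument for warning the user to switch to the survival function.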

@steppi (Contributor) commented May 12, 2024

I understand what you're saying, but from a floating point perspective, I think 1.0 is the correct value for ndtr(25) since it is the closest double to the actual real number answer. 1.0 isn't in the image of ndtr when the domain is the real numbers, but it is when the domain is the set of double precision IEEE-754 floating point numbers, which we can think of as their own algebraic structure. The ndtr in SciPy is an approximation to the "ideal" correctly rounded ndtr with domain the IEEE-754 doubles, and I think the SF_ERROR_LOSS is intended for situations where the relative error between the correctly rounded result and the computed result is undesirably large.
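One way to see the "correctly rounded" claim: the distance between the true value of ndtr(25) and 1.0 is far below half an ulp at 1.0, so 1.0 is the nearest representable double (a stdlib sketch, using the erfc identity for the upper tail):

```python
import math

# true ndtr(25) = 1 - q, where q is the upper-tail probability
q = 0.5 * math.erfc(25.0 / math.sqrt(2.0))   # on the order of 1e-138

half_ulp = math.ulp(1.0) / 2.0               # ~1.1e-16

# q is astronomically smaller than half the spacing of doubles at 1.0,
# so the nearest representable double to 1 - q is exactly 1.0
print(q < half_ulp)
```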

I don't think of 1.0 occupying a privileged place in floating point arithmetic compared to any other value which is not infinite, zero, or nan, so if we need to warn for loss of precision even when producing a correctly rounded result here, it seems like it would also be necessary to produce such a warning for any floating point computation which produces an inexact result, even if correctly rounded. The warnings that occur for erf and erfc are due to intermediate underflow in calculations, not due to an inexact result being produced.
