Improve precision of stats.norm.logcdf #188

Merged
merged 4 commits into scipy:master May 5, 2012

3 participants

@rgommers
SciPy member
rgommers commented Apr 1, 2012

This adds a log_ndtr function to scipy.special and uses it in stats.norm. See ticket 1614.

@pv pv and 1 other commented on an outdated diff Apr 1, 2012
scipy/special/cephes/ndtr.c
@@ -478,3 +477,17 @@ double erf(double x)
return( y );
}
+
+double log_ndtr(double a) {
+ double pi = 3.14159265358979323846264338327;
+ if (a > -10) {
@pv
SciPy member
pv added a note Apr 1, 2012

How was this cutoff chosen, and how big is the discontinuity across it? There probably should be a test case checking this.

@rgommers
SciPy member
rgommers added a note Apr 1, 2012

The discontinuity is about 0.0097. The pdf attached to the ticket has an error estimate; did you see it? It actually suggests that this approximation should be used with a cutoff closer to 0 than the current choice.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

norm = stats.norm()

x = np.linspace(-9.9, -10.1, 300)
y = norm.logcdf(x)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, 'b-')

plt.show()

print('Discontinuity around -10:', norm.logcdf(-10 + 1e-10) - norm.logcdf(-10 - 1e-10))
@pv
SciPy member
pv added a note Apr 1, 2012

The error made by choosing the cutoff is of order 1/(2 z^2), and ndtr runs out of floating point numbers around z ~ -37, so getting very good accuracy by adjusting the cutoff is unfortunately not possible here. Adding the correction terms from the sum should help, however.
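As a rough numerical check of this (a sketch, not part of the patch; log_ndtr_leading is a hypothetical helper, not scipy API), the leading-order asymptotic form log(phi(z)/(-z)) can be compared against a direct log(ndtr(z)), which is still representable at z = -10:

```python
import numpy as np
from scipy.special import ndtr

# Leading-order asymptotic form log(phi(z) / -z), i.e. A&S 7.1.23
# truncated after its first term (hypothetical helper, not scipy API).
def log_ndtr_leading(z):
    return -0.5 * np.log(2 * np.pi) - np.log(-z) - z**2 / 2

z = -10.0
# The first neglected correction is log(1 - 1/z**2) ~ -1/z**2, i.e. about
# -0.01 at z = -10, consistent with the ~0.0097 jump reported above.
err = log_ndtr_leading(z) - np.log(ndtr(z))
print(err)
```

The printed error is about 0.0098, which matches both the measured discontinuity and the 1/(2 z^2)-order estimate quoted here.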

@pv pv commented on an outdated diff Apr 1, 2012
scipy/stats/tests/test_distributions.py
mvsk = stats.powerlaw.stats(a, moments="mvsk")
assert_array_almost_equal(mvsk, exact_mvsk)
+def test_norm_logcdf():
+ """Test precision of the logcdf of the normal distribution.
+
+ This precision was enhanced in ticket 1614.
+ """
+ x = -np.asarray(range(0, 120, 4))
+ # Values from R
+ expected = [-0.69, -10.36, -35.01, -75.41, -131.70, -203.92, -292.10,
+ -396.25, -516.39, -652.50, -804.61, -972.70, -1156.79,
+ -1356.87, -1572.94, -1805.01, -2053.08, -2317.14, -2597.20,
+ -2893.25, -3205.30, -3533.35, -3877.40, -4237.44, -4613.48,
+ -5005.52, -5413.56, -5837.60, -6277.64, -6733.67]
+
+ assert_allclose(stats.norm().logcdf(x), expected, atol=0.01)
@pv
SciPy member
pv added a note Apr 1, 2012

More decimals could be useful to have here.

@pv
SciPy member
pv commented Apr 1, 2012

Ok, the relative accuracy of this is not so good, around 1e-4. I'd prefer at least sqrt(eps).

I think this patch must be revised to properly implement the asymptotic expansion of log(ndtr(z)) = log(.5*erfc(-z/sqrt(2))). The relevant formula is 7.1.23 in Abramowitz&Stegun, which gives:

log ndtr(z) = -0.5*log(2*pi) - log(-z) - z**2/2 + log(1 + sum_{m=1}^infty (-1)**m * 1*3*...*(2m-1) / z**(2*m))

The last log(1 + ...) could also be computed with log1p, which I think is in npymath. The number of terms should be limited so that the last term taken is smaller than the machine epsilon.

EDIT: many edits to this comment, due to an excessive number of typos in the formula :/
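For illustration, the expansion described in this comment can be sketched in pure Python (using numpy's log1p in place of npymath's; log_ndtr_asymptotic is a hypothetical name, not the C implementation that was merged):

```python
import numpy as np
from scipy.special import ndtr

def log_ndtr_asymptotic(z, max_terms=30):
    # A&S 7.1.23:
    # log ndtr(z) = -0.5*log(2*pi) - log(-z) - z**2/2
    #               + log1p(sum_{m>=1} (-1)**m * 1*3*...*(2m-1) / z**(2*m))
    leading = -0.5 * np.log(2 * np.pi) - np.log(-z) - z**2 / 2
    s = 0.0
    term = 1.0
    for m in range(1, max_terms + 1):
        term *= -(2 * m - 1) / z**2   # builds (-1)**m * (2m-1)!! / z**(2m)
        if abs(term) < np.finfo(float).eps:
            break                     # stop once below machine epsilon
        s += term
    return leading + np.log1p(s)

# Where ndtr(z) is still representable, the two should agree closely.
z = -20.0
print(log_ndtr_asymptotic(z), np.log(ndtr(z)))
```

At z = -20 this gives about -203.917, in line with the R reference values in the test above, and the truncation rule mirrors the "last term smaller than machine epsilon" criterion suggested here.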

@rgommers
SciPy member
rgommers commented Apr 9, 2012

Issues should be addressed now. Andrew reported relative accuracy of 1e-14, for me it's about 1e-11. Test precision bumped up to atol=1e-8.

@rgommers
SciPy member
rgommers commented May 5, 2012

@pv: are you OK with merging this?

@pv
SciPy member
pv commented May 5, 2012

I believe this is correct as it stands. +1

@rgommers rgommers merged commit 92f62aa into scipy:master May 5, 2012