Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Improve precision of stats.norm.logcdf #188

Merged
merged 4 commits into from

3 participants

@rgommers
Owner

This adds a log_ndtr function to scipy.special and uses it in stats.norm. See ticket 1614.

scipy/special/cephes/ndtr.c
@@ -478,3 +477,17 @@ double erf(double x)
return( y );
}
+
+double log_ndtr(double a) {
+ double pi = 3.14159265358979323846264338327;
+ if (a > -10) {
@pv Owner
pv added a note

How was this cutoff chosen, and how big is the discontinuity across it? There probably should be a test case checking this.

@rgommers Owner
rgommers added a note

The discontinuity is about 0.0097. The pdf attached to the ticket has an error estimate, did you see that? It actually suggests that this approximation should be used for a cutoff closer to 0 that the current choice.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

norm = stats.norm()

x = np.linspace(-9.9, -10.1, 300)
y = norm.logcdf(x)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(x, y, 'b-')

plt.show()

print 'Discontinuity around -10: ', norm.logcdf(-10+1e-10) - norm.logcdf(-10-1e-10)
@pv Owner
pv added a note

The error made by choosing the cutoff is of order 1/(2 z^2), and ndtr runs out of floating point numbers around z ~ -37, so getting very good accuracy by adjusting the cutoff is unfortunately not possible here. Adding the correction terms from the sum should help, however.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
scipy/stats/tests/test_distributions.py
((6 lines not shown))
mvsk = stats.powerlaw.stats(a, moments="mvsk")
assert_array_almost_equal(mvsk, exact_mvsk)
+def test_norm_logcdf():
+ """Test precision of the logcdf of the normal distribution.
+
+ This precision was enhanced in ticket 1614.
+ """
+ x = -np.asarray(range(0, 120, 4))
+ # Values from R
+ expected = [-0.69, -10.36, -35.01, -75.41, -131.70, -203.92, -292.10,
+ -396.25, -516.39, -652.50, -804.61, -972.70, -1156.79,
+ -1356.87, -1572.94, -1805.01, -2053.08, -2317.14, -2597.20,
+ -2893.25, -3205.30, -3533.35, -3877.40, -4237.44, -4613.48,
+ -5005.52, -5413.56, -5837.60, -6277.64, -6733.67]
+
+ assert_allclose(stats.norm().logcdf(x), expected, atol=0.01)
@pv Owner
pv added a note

More decimals could be useful to have here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@pv
Owner
pv commented

Ok, the relative accuracy of this is not so good, around 1e-4. I'd prefer at least sqrt(eps).

I think this patch must be revised to properly implement the asymptotic expansion of log(ndtr(z)) = log(.5*erfc(-z/sqrt(2))). The relevant formula is 7.1.23 in Abramowitz&Stegun, which gives:

log ndtr(z) = -.5*log(2*pi) - log(-z) - z**2/2 + log(1 + sum_{m=1}^infty (-1)**m 1*3*...*(2m-1)/z**(2*m))

The last log(1 + ...) could maybe also be taken by log1p which I think is in npymath. The number of terms should be limited so that the last term taken is smaller than the machine epsilon.

EDIT: many edits to this comment, due to exceeding number of typos in the formula :/

@rgommers
Owner

Issues should be addressed now. Andrew reported relative accuracy of 1e-14, for me it's about 1e-11. Test precision bumped up to atol=1e-8.

@rgommers
Owner

@pv: are you OK with merging this?

@pv
Owner
pv commented

I believe this is correct as is stands. +1

@rgommers rgommers merged commit 92f62aa into from
@ClemensFMN ClemensFMN referenced this pull request from a commit
Commit has since been removed from the repository and is no longer available.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Apr 1, 2012
  1. @scheinatadobe @rgommers
  2. @rgommers

    TST: add unit test for improved precision norm.logcdf(). Closes #1614.

    rgommers authored
    Also add Andrew Schein to THANKS.txt.
Commits on Apr 9, 2012
  1. @scheinatadobe @rgommers

    ENH: increase accuracy of Taylor expansion of logcdf of normal distri…

    scheinatadobe authored rgommers committed
    …bution.
    
    This addresses review comments on #1614.
  2. @rgommers

    TST: increase test accuracy for logcdf of normal distribution.

    rgommers authored
    Also some minor style changes to implementation.
This page is out of date. Refresh to see the latest.
View
1  THANKS.txt
@@ -99,6 +99,7 @@ Jacob Silterra for cwt-based peak finding in scipy.signal.
Denis Laxalde for the unified interface to minimizers in scipy.optimize.
David Fong for the sparse LSMR solver.
Andreas Hilboll for wrapping FITPACK's spgrid.f
+Andrew Schein for improving the numerical precision of norm.logcdf().
Institutions
View
6 scipy/special/_cephesmodule.c
@@ -174,6 +174,7 @@ static void * hankel2_data[] = { (void *)cbesh_wrap2, (void *)cbesh_wrap2,};
static void * hankel2e_data[] = { (void *)cbesh_wrap2_e, (void *)cbesh_wrap2_e,};
static void * ndtr_data[] = { (void *)ndtr, (void *)ndtr, };
+static void * log_ndtr_data[] = { (void *)log_ndtr, (void *)log_ndtr, };
static void * erfc_data[] = { (void *)erfc, (void *)erfc, };
static void * erf_data[] = { (void *)erf, (void *)erf, (void *)cerf_wrap, (void *)cerf_wrap};
static void * ndtri_data[] = { (void *)ndtri, (void *)ndtri, };
@@ -706,6 +707,11 @@ static void Cephes_InitOperators(PyObject *dictionary) {
PyDict_SetItemString(dictionary, "ndtr", f);
Py_DECREF(f);
+ f = PyUFunc_FromFuncAndData(cephes1_functions, log_ndtr_data, cephes_2_types, 2, 1, 1, PyUFunc_None, "log_ndtr", log_ndtr_doc, 0);
+ PyDict_SetItemString(dictionary, "log_ndtr", f);
+ Py_DECREF(f);
+
+
f = PyUFunc_FromFuncAndData(cephes1_functions, erfc_data, cephes_2_types, 2, 1, 1, PyUFunc_None, "erfc", erfc_doc, 0);
PyDict_SetItemString(dictionary, "erfc", f);
Py_DECREF(f);
View
1  scipy/special/cephes.h
@@ -127,6 +127,7 @@ extern double nbdtr ( int k, int n, double p );
extern double nbdtri ( int k, int n, double p );
extern double ndtr ( double a );
+extern double log_ndtr ( double a );
extern double erfc ( double a );
extern double erf ( double x );
extern double ndtri ( double y0 );
View
44 scipy/special/cephes/ndtr.c
@@ -144,7 +144,7 @@ Copyright 1984, 1987, 1988, 1992 by Stephen L. Moshier
Direct inquiries to 30 Frost Street, Cambridge, MA 02140
*/
-
+#include <float.h> /* DBL_EPSILON */
#include "mconf.h"
extern double SQRTH;
@@ -478,3 +478,45 @@ y = x * polevl( z, T, 4 ) / p1evl( z, U, 5 );
return( y );
}
+
+/*
+ * double log_ndtr(double a)
+ *
+ * For a > -20, use the existing ndtr technique and take a log.
+ * for a <= -20, we use the Taylor series approximation of erf to compute
+ * the log CDF directly. The Taylor series consists of two parts which we will name "left"
+ * and "right" accordingly. The right part involves a summation which we compute until the
+ * difference in terms falls below the machine-specific EPSILON.
+ *
+ * \Phi(z) &=&
+ * \frac{e^{-z^2/2}}{-z\sqrt{2\pi}} * [1 + \sum_{n=1}^{N-1} (-1)^n \frac{(2n-1)!!}{(z^2)^n}]
+ * + O(z^{-2N+2})
+ * = [\mbox{LHS}] * [\mbox{RHS}] + \mbox{error}.
+ *
+ */
+
+double log_ndtr(double a) {
+
+ double log_LHS, /* we compute the left hand side of the approx (LHS) in one shot */
+ last_total = 0, /* variable used to check for convergence */
+ right_hand_side = 1, /* includes first term from the RHS summation */
+ numerator = 1, /* numerator for RHS summand */
+ denom_factor = 1, /* use reciprocal for denominator to avoid division */
+ denom_cons = 1.0/(a*a); /* the precomputed division we use to adjust the denominator */
+ long sign = 1, i = 0;
+ if (a > -20 ) {
+ return log(ndtr(a));
+ }
+ log_LHS = -0.5*a*a - log(-a) - 0.5*log(2*M_PI);
+
+ while (fabs(last_total - right_hand_side) > DBL_EPSILON) {
+ i += 1;
+ last_total = right_hand_side;
+ sign = -sign;
+ denom_factor *= denom_cons;
+ numerator *= 2*i - 1;
+ right_hand_side += sign * numerator * denom_factor;
+
+ }
+ return log_LHS + log(right_hand_side);
+}
View
1  scipy/special/cephes_doc.h
@@ -122,6 +122,7 @@
#define nbdtrik_doc "k=nbdtrik(y,n,p) finds the argument k such that nbdtr(k,n,p)=y."
#define nbdtrin_doc "n=nbdtrin(k,y,p) finds the argument n such that nbdtr(k,n,p)=y."
#define ndtr_doc "y=ndtr(x) returns the area under the standard Gaussian probability \ndensity function, integrated from minus infinity to x:\n1/sqrt(2*pi) * integral(exp(-t**2 / 2),t=-inf..x)"
+#define log_ndtr_doc "y=log_ndtr(x) returns the log of the area under the standard Gaussian probability \ndensity function, integrated from minus infinity to x:\n1/sqrt(2*pi) * integral(exp(-t**2 / 2),t=-inf..x)"
#define ndtri_doc "x=ndtri(y) returns the argument x for which the area udnder the\nGaussian probability density function (integrated from minus infinity\nto x) is equal to y."
#define obl_ang1_doc "(s,sp)=obl_ang1(m,n,c,x) computes the oblate sheroidal angular function \nof the first kind and its derivative (with respect to x) for mode paramters\nm>=0 and n>=m, spheroidal parameter c and |x|<1.0."
#define obl_ang1_cv_doc "(s,sp)=obl_ang1_cv(m,n,c,cv,x) computes the oblate sheroidal angular function \nof the first kind and its derivative (with respect to x) for mode paramters\nm>=0 and n>=m, spheroidal parameter c and |x|<1.0. Requires pre-computed\ncharacteristic value."
View
2  scipy/stats/distributions.py
@@ -2072,7 +2072,7 @@ def _norm_logpdf(x):
def _norm_cdf(x):
return special.ndtr(x)
def _norm_logcdf(x):
- return log(special.ndtr(x))
+ return special.log_ndtr(x)
def _norm_ppf(q):
return special.ndtri(q)
class norm_gen(rv_continuous):
View
25 scipy/stats/tests/test_distributions.py
@@ -827,9 +827,9 @@ def test_tukeylambda_stats_ticket_1545():
def test_powerlaw_stats():
"""Test the powerlaw stats function.
-
+
This unit test is also a regression test for ticket 1548.
-
+
The exact values are:
mean:
mu = a / (a + 1)
@@ -858,10 +858,29 @@ def test_powerlaw_stats():
"""
cases = [(1.0, (0.5, 1./12 , 0.0, -1.2)),
(2.0, (2./3, 2./36, -0.56568542494924734, -0.6))]
- for a, exact_mvsk in cases:
+ for a, exact_mvsk in cases:
mvsk = stats.powerlaw.stats(a, moments="mvsk")
assert_array_almost_equal(mvsk, exact_mvsk)
+def test_norm_logcdf():
+ """Test precision of the logcdf of the normal distribution.
+
+ This precision was enhanced in ticket 1614.
+ """
+ x = -np.asarray(range(0, 120, 4))
+ # Values from R
+ expected = [-0.69314718, -10.36010149, -35.01343716, -75.41067300,
+ -131.69539607, -203.91715537, -292.09872100, -396.25241451,
+ -516.38564863, -652.50322759, -804.60844201, -972.70364403,
+ -1156.79057310, -1356.87055173, -1572.94460885, -1805.01356068,
+ -2053.07806561, -2317.13866238, -2597.19579746, -2893.24984493,
+ -3205.30112136, -3533.34989701, -3877.39640444, -4237.44084522,
+ -4613.48339520, -5005.52420869, -5413.56342187, -5837.60115548,
+ -6277.63751711, -6733.67260303]
+
+ assert_allclose(stats.norm().logcdf(x), expected, atol=1e-8)
+
+
if __name__ == "__main__":
run_module_suite()
Something went wrong with that request. Please try again.