# Improve precision of stats.norm.logcdf #188

Merged
merged 4 commits into from May 5, 2012

## Conversation

Projects
None yet
3 participants
Member

### rgommers commented Apr 1, 2012

 This adds a `log_ndtr` function to scipy.special and uses it in stats.norm. See ticket 1614.

``` ENH: increase precision of logcdf of the normal distribution. See #1614. ```
``` eda8a85 ```
``` TST: add unit test for improved precision norm.logcdf(). Closes #1614. ```
`Also add Andrew Schein to THANKS.txt.`
``` 70c214a ```

### pv reviewed Apr 1, 2012

 + +double log_ndtr(double a) { + double pi = 3.14159265358979323846264338327; + if (a > -10) {

#### pv Apr 1, 2012

Member

How was this cutoff chosen, and how big is the discontinuity across it? There probably should be a test case checking this.

#### rgommers Apr 1, 2012

Member

The discontinuity is about 0.0097. The pdf attached to the ticket has an error estimate, did you see that? It actually suggests that this approximation should be used for a cutoff closer to 0 that the current choice.

``````import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

norm = stats.norm()

x = np.linspace(-9.9, -10.1, 300)
y = norm.logcdf(x)

fig = plt.figure()
ax.plot(x, y, 'b-')

plt.show()

print 'Discontinuity around -10: ', norm.logcdf(-10+1e-10) - norm.logcdf(-10-1e-10)
``````

#### pv Apr 1, 2012

Member

The error made by choosing the cutoff is of order `1/(2 z^2)`, and ndtr runs out of floating point numbers around `z ~ -37`, so getting very good accuracy by adjusting the cutoff is unfortunately not possible here. Adding the correction terms from the sum should help, however.

### pv reviewed Apr 1, 2012

 + -2893.25, -3205.30, -3533.35, -3877.40, -4237.44, -4613.48, + -5005.52, -5413.56, -5837.60, -6277.64, -6733.67] + + assert_allclose(stats.norm().logcdf(x), expected, atol=0.01)

#### pv Apr 1, 2012

Member

More decimals could be useful to have here.

Member

### pv commented Apr 1, 2012

 Ok, the relative accuracy of this is not so good, around 1e-4. I'd prefer at least sqrt(eps). I think this patch must be revised to properly implement the asymptotic expansion of `log(ndtr(z)) = log(.5*erfc(-z/sqrt(2)))`. The relevant formula is 7.1.23 in Abramowitz&Stegun, which gives: ``````log ndtr(z) = -.5*log(2*pi) - log(-z) - z**2/2 + log(1 + sum_{m=1}^infty (-1)**m 1*3*...*(2m-1)/z**(2*m)) `````` The last log(1 + ...) could maybe also be taken by log1p which I think is in npymath. The number of terms should be limited so that the last term taken is smaller than the machine epsilon. EDIT: many edits to this comment, due to exceeding number of typos in the formula :/

``` ENH: increase accuracy of Taylor expansion of logcdf of normal distri… ```
```…bution.

``` d9b1b87 ```
``` TST: increase test accuracy for logcdf of normal distribution. ```
`Also some minor style changes to implementation.`
``` 84b6c3a ```
Member

### rgommers commented Apr 9, 2012

 Issues should be addressed now. Andrew reported relative accuracy of 1e-14, for me it's about 1e-11. Test precision bumped up to `atol=1e-8`.
Member

### rgommers commented May 5, 2012

 @pv: are you OK with merging this?
Member

### pv commented May 5, 2012

 I believe this is correct as is stands. +1

### rgommers added a commit that referenced this pull request May 5, 2012

``` Merge pull request #188 from rgommers/ticket-1614-norm-logcdf ```
`Improve precision of stats.norm.logcdf  Closes #1614.`
``` 92f62aa ```

Closed