BUG: stats: fix skewnorm.cdf losing precision at large x #8473

ev-br · 2018-02-24T19:55:48Z

Now that Owen's T function is available, can use it for the skew normal distribution.

closes #7746

josef-pkt · 2018-02-24T20:02:24Z

LGTM, it's good to have this now

person142 · 2018-02-24T20:18:23Z

Closes #5160 too, right?

WarrenWeckesser · 2018-02-24T20:21:09Z

Nice. The survival function _sf should now be defined as

def _sf(self, x, a):
    return _norm_cdf(-x) + 2*sc.owens_t(x, a)

so we get accurate values for large x

ev-br · 2018-02-24T20:27:38Z

@WarrenWeckesser I thought about it, but

In [9]: x = np.linspace(10, 50, 20)

In [10]: stats.skewnorm.sf(x, -1)
Out[10]: 
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.])

And I did not find a failing test case (admittedly, I did not try too hard). Do you have one?

WarrenWeckesser · 2018-02-24T20:45:21Z

Those values are all 0, but that just means the true values are smaller than ~1.11e-16. We should be able to do better than that.

Here's a comparison:

In [55]: def skewnorm_sf(x, a):
    ...:     x = np.asarray(x)
    ...:     return ndtr(-x) + 2*owens_t(x, a)
    ...: 

In [56]: x = np.arange(10.)

Here's the default sf, which is simply 1 - cdf, so it can't do better than ~1.11e-16:

In [57]: skewnorm.sf(x, -1)
Out[57]: 
array([  2.50000000e-01,   2.51714896e-02,   5.17568504e-04,
         1.82222470e-06,   1.00306752e-09,   8.22675261e-14,
         1.11022302e-16,   0.00000000e+00,   0.00000000e+00,
         0.00000000e+00])

Here's the version I suggested:

In [58]: skewnorm_sf(x, -1)
Out[58]: 
array([  2.50000000e-01,   2.51714896e-02,   5.17568504e-04,
         1.82222470e-06,   1.00306756e-09,   8.21691225e-14,
         9.73347976e-19,   1.63800341e-24,  -8.48025473e-30,
        -8.42594351e-34])

But that reveals a new problem: the survival function should not be negative! How accurate is the implementation of Owen's T with a call such as owens_t(10, -1)?

WarrenWeckesser · 2018-02-24T20:56:25Z

I said

We should be able to do better than that.

but apparently it isn't as easy as I hoped. For x large enough, the terms _norm_cdf(-x) and 2*owens_t(x, -1) have opposite signs and almost the same magnitude, so we get catastrophic loss of precision.

WarrenWeckesser · 2018-02-24T21:14:50Z

The indentation in the test function is wrong--it should be four spaces, not five.

WarrenWeckesser · 2018-02-24T21:47:43Z

The CDF implementation using owens_t also has problems. For example:

In [52]: skewnorm.cdf(-8, 1)
Out[52]: -8.4802547311258769e-30

With scipy 1.0.0, we get

In [5]: skewnorm.cdf(-8, 1)
Out[5]: 3.8700350444395389e-31

According to Wolfram Alpha, the correct answer is 3.870035046664392611e-31.

josef-pkt · 2018-02-24T21:53:31Z

problems in cdf and sf should be symmetric for opposite signs of x and a.

one possibility would be to recompute using super for edge cases where numerical integration (scipy 1.0) works better.

ev-br · 2018-02-24T23:04:48Z

The current version fixes a problem which was reported by a user for somewhat large arguments. At still larger arguments, cdf is exponentially small and the current version suffers from catastrophic cancellation. In that regime, one likely needs to figure out the asymptotic expansion of the Owen's T function. While this could be a fun project, and the answer is likely available from the literature (or possibly even implemented inside of scipy.special.owen_t), it is out of scope for this PR.

WarrenWeckesser · 2018-02-24T23:21:15Z

Evgeni, so what do you think is the next step? I don't think this pull request should be merged in its current state. It fixes one problem but creates a new problem.

person142 · 2018-02-25T00:19:08Z

A small thought: the test case skewnorm(x, 1) is the worst possible scenario because when a = 1 Owen's T function is 0.5*Phi(x)*(1 - Phi(x)). Of course this means there will also be problems around a = 1, but in this case you can use method T6 from Patefield and Tandy which decomposes Owen's T as 0.5*Phi(x)*(1 - Phi(x)) + [stuff] and cancel the common Phi(x) terms.

Haven't thought about whether there's cancellation in other regimes.

WarrenWeckesser · 2018-02-25T01:24:16Z

Haven't thought about whether there's cancellation in other regimes.

It is not hard to find examples where the CDF implementation using Owen's T can result in big errors even when a is not 1. In this session, I don't have this pull request installed, so I'll define a one-liner called skewnorm_cdf(x, a) that uses Owen's T to compute the CDF.

In [61]: import scipy

In [62]: scipy.__version__
Out[62]: '1.1.0.dev0+895a774'

In [63]: from scipy.stats import skewnorm

Here's the CDF implemented with Owen's T function:

In [64]: from scipy.special import ndtr, owens_t

In [65]: def skewnorm_cdf(x, a):
    ...:     return ndtr(x) - 2*owens_t(x, a)
    ...:

A couple cases where the error is big:

x = -4, a = 2. The correct value (according to Wolfram Alpha) is 8.1298399188811398e-21.

In [66]: skewnorm_cdf(-4, 2)
Out[66]: -1.0164395367051604e-19

In [67]: skewnorm.cdf(-4, 2)
Out[67]: 8.1298399223752232e-21

x = -2, a = 5. The correct value is 1.55326826787106273e-26.

In [105]: skewnorm_cdf(-2, 5)
Out[105]: -2.0816681711721685e-17

In [106]: skewnorm.cdf(-2, 5)
Out[106]: 1.5532682678674716e-26

WarrenWeckesser · 2018-02-25T17:20:57Z

An alternative to using Owen's T is to split the integral of the PDF at x=0, to ensure that quad "sees" the significant part of the PDF. If I define the following in skew_norm_gen, the problem shown in #7746 is fixed, and the other examples I gave above result in the "correct" values (relative error less than 1e-9).

    def _cdf_single(self, x, *args):
        if x < 0:
            cdf = integrate.quad(self._pdf, self.a, x, args=args)[0]
        else:
            t1 = integrate.quad(self._pdf, self.a, 0, args=args)[0]
            t2 = integrate.quad(self._pdf, 0, x, args=args)[0]
            cdf = t1 + t2
        if cdf > 1:
            # Presumably numerical noise, e.g. 1.0000000000000002
            cdf = 1.0
        return cdf

It would still be nice to get the CDF and SF functions working reliably with Owen's T, because it is much faster. For example,

In [19]: from scipy.special import ndtr, owens_t

In [20]: def skewnorm_cdf(x, a):
    ...:     return ndtr(x) - 2*owens_t(x, a)
    ...: 

In [21]: skewnorm.cdf(3, 2)  # Implemented using quad, split at x=0.
Out[21]: 0.99730020393729912

In [22]: skewnorm_cdf(3, 2)  # Implemented using Owen's T function.
Out[22]: 0.99730020393729946

In [23]: %timeit skewnorm.cdf(3, 2)
334 µs ± 1.97 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [24]: %timeit skewnorm_cdf(3, 2)   # Much faster to use Owen's T.
2.7 µs ± 19.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

WarrenWeckesser · 2018-02-25T18:28:55Z

@josef-pkt wrote

problems in cdf and sf should be symmetric for opposite signs of x and a.

Indeed, for the skew normal distribution, sf(x, a) = cdf(-x, -a). So we can implement _sf as

    def _sf(self, x, a):
        return self._cdf(-x, -a)

and focus on getting the CDF function right.

WarrenWeckesser · 2018-02-28T21:27:00Z

It might take a while for someone to resolve the problem of the catastrophic loss of precision, so in the meantime, I incorporated my comments above into an alternative pull request: #8501

ev-br · 2018-03-03T20:50:53Z

Closing this in favor of gh-8501

ev-br added the scipy.stats label Feb 24, 2018

WarrenWeckesser added the needs-work Items that are pending response from the author label Feb 24, 2018

BUG: stats: fix skewnorm.cdf losing precision at large x

f93866c

ev-br force-pushed the skewnorm_cdf branch from 0f25a09 to f93866c Compare February 24, 2018 22:57

WarrenWeckesser mentioned this pull request Feb 28, 2018

BUG: stats: Split the integral used to compute skewnorm.cdf. #8501

Merged

ev-br mentioned this pull request Mar 3, 2018

False CDF values for skew normal distribution #7746

Closed

ev-br closed this Mar 3, 2018

sethtroisi mentioned this pull request May 20, 2020

Implement Owen's T function and use it in stats.skewnorm #5160

Closed

WarrenWeckesser mentioned this pull request Jan 29, 2022

ENH: skewnorm.cdf is very slow. Consider a much more efficient approach #15473

Closed

zoj613 mentioned this pull request Jan 29, 2022

ENH: improve performance of the skewnorm cdf computation #15486

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: stats: fix skewnorm.cdf losing precision at large x #8473

BUG: stats: fix skewnorm.cdf losing precision at large x #8473

ev-br commented Feb 24, 2018

josef-pkt commented Feb 24, 2018

person142 commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

ev-br commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

josef-pkt commented Feb 24, 2018

ev-br commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

person142 commented Feb 25, 2018

WarrenWeckesser commented Feb 25, 2018

WarrenWeckesser commented Feb 25, 2018 •

edited

Loading

WarrenWeckesser commented Feb 25, 2018

WarrenWeckesser commented Feb 28, 2018

ev-br commented Mar 3, 2018

BUG: stats: fix skewnorm.cdf losing precision at large x #8473

BUG: stats: fix skewnorm.cdf losing precision at large x #8473

Conversation

ev-br commented Feb 24, 2018

josef-pkt commented Feb 24, 2018

person142 commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

ev-br commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

josef-pkt commented Feb 24, 2018

ev-br commented Feb 24, 2018

WarrenWeckesser commented Feb 24, 2018

person142 commented Feb 25, 2018

WarrenWeckesser commented Feb 25, 2018

WarrenWeckesser commented Feb 25, 2018 • edited Loading

WarrenWeckesser commented Feb 25, 2018

WarrenWeckesser commented Feb 28, 2018

ev-br commented Mar 3, 2018

WarrenWeckesser commented Feb 25, 2018 •

edited

Loading