Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: improve performance of the skewnorm cdf computation #15486

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
53 changes: 43 additions & 10 deletions scipy/stats/_continuous_distns.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import warnings
from collections.abc import Iterable
from functools import wraps
import math
import ctypes

import numpy as np
Expand All @@ -26,6 +27,7 @@
get_distribution_names, _kurtosis, _ncx2_cdf, _ncx2_log_pdf, _ncx2_pdf,
rv_continuous, _skew, _get_fixed_fit_value, _check_shape)
from ._ksstats import kolmogn, kolmognp, kolmogni
from ._multivariate import multivariate_normal
from ._constants import (_XMIN, _EULER, _ZETA3,
_SQRT_2_OVER_PI, _LOG_SQRT_2_OVER_PI)
import scipy.stats._boost as _boost
Expand Down Expand Up @@ -7065,17 +7067,48 @@ def _pdf(self, x, a):
return 2.*_norm_pdf(x)*_norm_cdf(a*x)

def _cdf_single(self, x, *args):
_a, _b = self._get_support(*args)
if x <= 0:
cdf = integrate.quad(self._pdf, _a, x, args=args)[0]
"""
This method implements the method discussed in [1]. As stated in the
paper, the cdf value is calculated by assuming the skewness is
non-negative. For negative values of skewness, we use the reflection
property: P(x;-shape) = 1 - P(-x;shape).
mdhaber marked this conversation as resolved.
Show resolved Hide resolved

As recommented in the conclusion section of [1],
mdhaber marked this conversation as resolved.
Show resolved Hide resolved
* when shape==1, we use ``norm_cdf(Q)`` which provide exact results
accross all Q.
mdhaber marked this conversation as resolved.
Show resolved Hide resolved
* we use equation 14 when Q < 0, which is an approximation for the
lower tail of the distribution.
Comment on lines +7079 to +7080
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i haven't read the paper yet, but is it more accurate or much faster than equation 3? Is it a good approximation for all values of the shape parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are several algorithms outlined in the paper that have differing accuracy and speed, depending on the sign and absolute values of the shape parameter. Unfortunately, It seems rather hard or impractical to account for all different values while choosing the "best" algorithm. Equation 14 is just as accurate as the current implementation but as seen in my previous comment here: #15486 (comment), the accuracy isn't equal. The hard part is knowing which algorithm to choose from given the combination of x and shape and whether or not that combination represents a point at the tail of the distribution.

* we use equation 3 elsewhere since it is accurate for the "central
mdhaber marked this conversation as resolved.
Show resolved Hide resolved
portion" of the distribution (10^-20 <= cdf <= 1 - 10^-20) and
upper tail (cdf > 1 - 10^-20).

References
----------
[1] Amsler, C., Papadopoulos, A. & Schmidt, P. Evaluating the cdf of
the Skew Normal distribution. Empir Econ 60, 3171–3202 (2021).
https://doi.org/10.1007/s00181-020-01868-6
"""
q = x
mdhaber marked this conversation as resolved.
Show resolved Hide resolved
(shape,) = args
orig_shape = shape

if orig_shape < 0:
shape = -shape
q = -q

if shape == 1:
# exact approximation across all values of q
mdhaber marked this conversation as resolved.
Show resolved Hide resolved
cdf = sc.ndtr(q) ** 2
elif q < 0:
z = (1 + shape ** 2) * q * q
cdf = math.exp(-0.5 * z) / (math.pi * shape * z)
mdhaber marked this conversation as resolved.
Show resolved Hide resolved
else:
t1 = integrate.quad(self._pdf, _a, 0, args=args)[0]
t2 = integrate.quad(self._pdf, 0, x, args=args)[0]
cdf = t1 + t2
if cdf > 1:
# Presumably numerical noise, e.g. 1.0000000000000002
cdf = 1.0
return cdf
q = np.c_[np.zeros_like(q), q]
mdhaber marked this conversation as resolved.
Show resolved Hide resolved
# construct the covariance of the bivariate standard normal
corr = -shape / math.sqrt(1 + shape ** 2)
cov = ((1, corr), (corr, 1))
cdf = 2 * multivariate_normal.cdf(q, cov=cov)
return (1 - cdf) if orig_shape < 0 else cdf

def _sf(self, x, a):
return self._cdf(-x, -a)
Expand Down