False CDF values for skew normal distribution #7746

Closed
pphilippos opened this issue Aug 18, 2017 · 3 comments
Labels
defect, scipy.stats

@pphilippos

When x is sufficiently large, skewnorm.cdf outputs 0 instead of 1.

Reproducing code example:

A short code example that reproduces the problem:

from scipy.stats import skewnorm as sk2
import numpy as np

# Print x and the CDF for shape parameter a = -1; the CDF should approach 1.
for x in np.linspace(-10, 50, 1000):
    print(str(x) + " " + str(sk2.cdf(x, -1)))

Error message:

No error messages. See the following graph for the issue:
[image: graph of the computed CDF]

Scipy/Numpy/Python version information:

Python 2.7.13
Scipy 0.19.1
Numpy 1.12.1

pphilippos changed the title from "False CDF values of the skewnorm distribution" to "False CDF values for skew normal distribution" Aug 18, 2017
@pv
Member

pv commented Aug 18, 2017

Correct. The technical problem is that numerical integration of the pdf starts to fail once the region where the pdf is non-negligible becomes too narrow compared to the integration interval. This can probably be fixed by just returning 1.0 when deep in the upper tail of the cdf.
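
The tail shortcut suggested above could look roughly like the following. This is only an illustrative sketch, not the change that was eventually made in SciPy: `cdf_clamped` and `TAIL_CUTOFF` are hypothetical names. The cutoff of 9 assumes the bound 1 - cdf(x, a) <= 2*(1 - Phi(x)), which follows from pdf(x, a) = 2*phi(x)*Phi(a*x) <= 2*phi(x) and is below double-precision resolution for x >= 9.

```python
import numpy as np
from scipy.stats import skewnorm

# Hypothetical cutoff: for x >= 9 the upper tail mass is at most
# 2 * Phi(-9), roughly 2.3e-19, which cannot be distinguished from 0
# when subtracted from 1.0 in double precision.
TAIL_CUTOFF = 9.0


def cdf_clamped(x, a):
    """Return skewnorm.cdf(x, a), but exactly 1.0 deep in the upper tail."""
    x = np.asarray(x, dtype=float)
    in_tail = x >= TAIL_CUTOFF
    # Only call the library CDF at points clipped to the cutoff, so the
    # problematic far-tail evaluation never happens.
    body = skewnorm.cdf(np.minimum(x, TAIL_CUTOFF), a)
    return np.where(in_tail, 1.0, body)
```

The lower tail is deliberately left alone: returning 0.0 there would discard small but representable probabilities.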

@ev-br
Member

ev-br commented Aug 18, 2017

ISTM the best fix would be to finish off gh-7120, which implements the Owen's T function, and add the explicit form of the _cdf. This seems to fix the loss of precision, cf https://github.com/ev-br/scipy/tree/pr/7120. The only relevant commit on top of gh-7120 is 86b43ba
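
For reference, the explicit form mentioned here is the standard identity cdf(x, a) = Phi(x) - 2*T(x, a), where T is Owen's T function. A minimal sketch follows, assuming a recent SciPy release that provides `scipy.special.owens_t` (it was not available when this issue was filed). `skewnorm_cdf_owens` is a hypothetical helper name, not the code in gh-7120.

```python
from scipy.special import ndtr, owens_t


def skewnorm_cdf_owens(x, a):
    """Standard skew-normal CDF via Owen's T: Phi(x) - 2 * T(x, a)."""
    # ndtr is the standard normal CDF Phi; owens_t(h, a) is Owen's T function.
    return ndtr(x) - 2.0 * owens_t(x, a)
```

At x = 0 the identity T(0, a) = arctan(a) / (2*pi) gives cdf(0, -1) = 0.75, and for large x with a = -1 the T term vanishes, so the result tends to 1 rather than dropping to 0.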

ev-br added the defect and scipy.stats labels Aug 18, 2017
WarrenWeckesser added a commit to WarrenWeckesser/scipy that referenced this issue Feb 28, 2018
The CDF is computed by integrating the PDF using scipy.integrate.quad.
To ensure that quad "sees" the peak of the PDF, the integral is split
at x=0.

The calculation of the survival function is improved by using the
symmetry it has with the CDF:  sf(x, a) = cdf(-x, -a).

Closes gh-7746.
WarrenWeckesser added a commit to WarrenWeckesser/scipy that referenced this issue Mar 1, 2018
The CDF is computed by integrating the PDF using scipy.integrate.quad.
To ensure that quad "sees" the peak of the PDF, the integral is split
at x=0.

The calculation of the survival function is improved by using the
symmetry it has with the CDF:  sf(x, a) = cdf(-x, -a).

Closes gh-7746.
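
The approach described in the commit message can be sketched as follows. This is a simplified standalone illustration, not the code from the referenced commit: `skewnorm_pdf`, `cdf_split` and `sf_by_symmetry` are hypothetical names, and only the standard (loc=0, scale=1) distribution is handled.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import ndtr


def skewnorm_pdf(t, a):
    """Standard skew-normal pdf: 2 * phi(t) * Phi(a * t)."""
    return 2.0 * np.exp(-0.5 * t * t) / np.sqrt(2.0 * np.pi) * ndtr(a * t)


def cdf_split(x, a):
    """Integrate the pdf up to x, splitting the integral at 0 when x > 0."""
    if x <= 0.0:
        return quad(skewnorm_pdf, -np.inf, x, args=(a,))[0]
    # Splitting at 0 puts the bulk of the pdf at the edge of both
    # sub-intervals, so quad cannot step over it on a wide interval.
    left = quad(skewnorm_pdf, -np.inf, 0.0, args=(a,))[0]
    right = quad(skewnorm_pdf, 0.0, x, args=(a,))[0]
    return left + right


def sf_by_symmetry(x, a):
    """Survival function via the symmetry sf(x, a) = cdf(-x, -a)."""
    return cdf_split(-x, -a)
```

With this split, `cdf_split(50.0, -1.0)` stays close to 1 instead of collapsing to 0, and the survival function is obtained without forming 1 - cdf.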
@ev-br
Member

ev-br commented Mar 3, 2018

Per the discussion in gh-8473: the naive formula contains the difference \Phi(x) - 2T(x,a), which suffers from loss of precision for some parameters. A proper fix likely involves coming up with a strategy for computing the full expression, instead of subtracting two terms. Meanwhile, gh-8501 has a workaround.
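
The loss of precision is easy to see for a = 1, where the exact CDF is known in closed form: the pdf 2*phi(x)*Phi(x) is the derivative of Phi(x)**2. The sketch below compares that exact value with the difference form Phi(x) - 2*T(x, a) in the lower tail. It assumes a recent SciPy release that provides `scipy.special.owens_t`; the variable names are illustrative only.

```python
from scipy.special import ndtr, owens_t

x, a = -10.0, 1.0

# Exact value for a = 1: cdf(x, 1) = Phi(x)**2, roughly 5.8e-47 at x = -10.
exact = ndtr(x) ** 2

# Difference form: two terms of size ~7.6e-24 that agree to far more
# digits than a double can hold, so their difference carries no
# significant digits of the true result.
naive = ndtr(x) - 2.0 * owens_t(x, a)

rel_err = abs(naive - exact) / exact
print("exact:", exact)
print("naive:", naive)
print("relative error of the difference form:", rel_err)
```

The exact value is tiny but perfectly representable, which is why a strategy that avoids the subtraction is needed for such parameters.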

rgommers added this to the 1.1.0 milestone Mar 31, 2018