Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: improve accuracy of ppf cdf roundtrip for gumbel_l #6228

Closed
pbrod opened this issue Jun 8, 2016 · 2 comments · Fixed by #6236
Closed

ENH: improve accuracy of ppf cdf roundtrip for gumbel_l #6228

pbrod opened this issue Jun 8, 2016 · 2 comments · Fixed by #6236
Labels
good first issue Good topic for first contributor pull requests, with a relatively straightforward solution scipy.stats
Milestone

Comments

@pbrod
Copy link
Contributor

pbrod commented Jun 8, 2016

The accuracy of the gumbel_l ppf and cdf functions is poor for large negative x as exemplified here:

>>> import numpy as np
>>> import scipy.stats as ss
>>> x = np.linspace(-100, -4)
>>> (ss.gumbel_l.ppf(ss.gumbel_l.cdf(x))-x)/x
array([             inf,              inf,              inf,
                inf,              inf,              inf,
                inf,              inf,              inf,
                inf,              inf,              inf,
                inf,              inf,              inf,
                inf,              inf,              inf,
                inf,              inf,              inf,
                inf,              inf,              inf,
                inf,              inf,              inf,
                inf,              inf,              inf,
                inf,              inf,  -1.52608162e-02,
     1.00926224e-04,   5.04405205e-04,  -1.22679122e-06,
    -3.82136386e-06,  -3.94631612e-07,  -5.56583419e-08,
    -9.00121598e-09,   3.31592421e-09,   4.64076847e-10,
    -7.81227413e-11,  -1.62509384e-11,  -2.56347474e-12,
     2.74780964e-13,  -9.17172674e-14,  -1.17775205e-14,
     2.68278550e-15,   6.66133815e-16])

By replacing the _cdf and _ppf methods with

def _cdf(self, x):
    return -expm1(-exp(x))

def _ppf(self, q):
    return log(-log1p(-q))

one will obtain full machine accuracy as shown here:

>>> (ss.gumbel_l.ppf(ss.gumbel_l.cdf(x))-x)/x
array([-0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0.,
         -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0.,
         -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0.,
         -0., -0., -0., -0., -0., -0., -0., -0., -0., -0., -0.])

One should also consider replacing the _logsf and _sf methods with

def _logsf(self, x):
    return -exp(x)

def _sf(self, x):
    return exp(-exp(x))

in order to improve the accuracy for those methods as well.

@ev-br ev-br added scipy.stats good first issue Good topic for first contributor pull requests, with a relatively straightforward solution labels Jun 9, 2016
@rgommers
Copy link
Member

rgommers commented Jun 9, 2016

Thanks @pbrod, nice set of related improvements in this and other recent issues!

@pbrod
Copy link
Contributor Author

pbrod commented Jun 10, 2016

Thanks.

@ev-br ev-br added this to the 0.18.0 milestone Jun 10, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good topic for first contributor pull requests, with a relatively straightforward solution scipy.stats
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants