
Add the limiting distributions to generalized Pareto distribution with shapes c=0 and c=-1 #3225

Merged

merged 5 commits into scipy:master from ev-br:genpareto

7 participants

@ev-br
Collaborator

closes gh-1295

The implementation here is similar to the original one by @pbrod, as listed in gh-1295.

Several points I'd like to flag for a review:

  • The exact properties of the c -> 0 limit. For example, I'm currently skipping the (otherwise failing) test for continuity of the ppf at small c --- I wrote the test without much thought, and I'm not sure whether its requirement is too stringent. The specific question, I guess, is whether the Box-Cox transform converges uniformly as lambda -> 0.
  • We might want to add a new ufunc for computing log(1 + a*x)/x with the correct behavior at x=0, instead of a private helper in genpareto (a rough sketch of such a helper is below, after this list).
  • _munp and entropy are not properly vectorized at the moment (neither in master, nor in this PR).
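For illustration, here is a rough sketch of what such a helper could look like; the name _log1pxdx and its calling convention are made up here, and the actual ufunc, if added, would live in scipy.special:

import numpy as np

def _log1pxdx(a, x):
    # hypothetical helper: log(1 + a*x) / x, with the x -> 0 limit equal to a
    a, x = np.broadcast_arrays(np.asarray(a, dtype=float), np.asarray(x, dtype=float))
    safe_x = np.where(x == 0, 1.0, x)      # avoid 0/0 in the generic branch
    return np.where(x == 0, a, np.log1p(a * x) / safe_x)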
@coveralls

Coverage Status

Coverage remained the same when pulling 4c9413d on ev-br:genpareto into dc7555b on scipy:master.

@josef-pkt
Collaborator

about ppf: I don't know the new boxcox, but in the old version we have a (1 + eps) - 1 type of cancellation, which becomes mostly noise for c < 1e-8

my guess is that it's a purely numerical problem that won't go away without a (Taylor) series expansion around c=0 or something like that

>>> q = np.linspace(0., 1., 30, endpoint=False)
>>> stats.genpareto.ppf(q, 1e-8) - stats.expon.ppf(q)
array([  0.00000000e+00,  -3.44033249e-09,   6.59031606e-09,
         4.65046893e-09,  -1.35370382e-09,   1.05718484e-09,
        -3.23048349e-09,   4.84360874e-09,  -4.33792879e-09,
        -9.90649074e-09,  -2.17329854e-09,   6.37755515e-09,
         2.47717025e-09,  -2.54236632e-11,  -5.40378220e-09,
         3.53435514e-09,   1.01246712e-08,   2.18065133e-09,
         3.03871706e-10,  -8.03573652e-10,   1.36105660e-09,
         6.01152550e-09,  -1.86942684e-09,   1.36590266e-08,
         3.83822663e-09,   2.70998719e-08,   2.38693882e-08,
         2.95770421e-08,   2.74037433e-08,   5.31425592e-08])
>>> stats.genpareto.ppf(q, 1e-10) - stats.expon.ppf(q)
array([  0.00000000e+00,   2.18604272e-07,   8.28155354e-07,
        -3.50620899e-07,   2.42895362e-07,  -7.31690011e-07,
         1.74405200e-07,  -1.50587615e-07,  -8.03698507e-07,
        -2.54155556e-07,  -5.57284811e-07,   6.72511370e-07,
        -9.07905710e-07,  -5.99545857e-07,  -3.82879611e-07,
         5.80850328e-07,  -8.11440367e-07,   8.23745690e-07,
         7.55255528e-07,  -2.22848179e-07,   2.35655171e-08,
        -3.27055382e-07,   1.97970718e-07,  -2.30590039e-07,
        -8.84340193e-07,   6.04415845e-07,   7.78821045e-07,
        -3.03489865e-07,  -8.60774676e-07,  -2.79924348e-07])
>>> stats.genpareto.ppf(q, 1e-14) - stats.expon.ppf(q)
array([ 0.        ,  0.01050737, -0.00237949,  0.00566179, -0.00987408,
       -0.00468587, -0.00109895,  0.00075036,  0.00070752, -0.00140358,
       -0.00578482,  0.00953527, -0.00012303,  0.00933194, -0.00688377,
       -0.00480891, -0.0071884 ,  0.00752147, -0.00590785, -0.00410139,
       -0.01059372, -0.00493194,  0.01051179,  0.01020716, -0.01071676,
        0.00680183,  0.00570288,  0.0066788 ,  0.00089398, -0.00391493])
@pbrod

For lmbda != 0, scipy.special.boxcox is implemented as (pow(x, lmbda) - 1.0) / lmbda, which is numerically unstable for small lmbda.

A better solution in the _ppf method is to replace the call to boxcox with the following:

def _ppf(self, q, c):
    x = -log(-log(q))
    return _lazywhere((x == x) & (c != 0), (x, c),
                      lambda x, c: -expm1(-c*x) / c, x)

Replacing it with the above, you get machine precision, as shown here:

In [36]: q = np.linspace(0., 1., 30, endpoint=False)
In [37]: c=1e-8;(np.abs(genpareto.cdf(genpareto.ppf(q, c),c) - q))
Out[37]:
array([ 0.00000000e+00, 0.00000000e+00, 1.38777878e-17,
0.00000000e+00, 0.00000000e+00, 2.77555756e-17,
0.00000000e+00, 5.55111512e-17, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 1.11022302e-16,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.11022302e-16, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00])

In [40]: c=1e-15;(np.abs(genpareto.cdf(genpareto.ppf(q, c),c) - q))
Out[40]:
array([ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 2.77555756e-17, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 1.11022302e-16, 0.00000000e+00,
0.00000000e+00, 1.11022302e-16, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.11022302e-16, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00])

@pv
Owner
pv commented

There's scipy.special.boxcox.

If there's something to improve, the improvement should be done in scipy.special.

scipy/stats/_continuous_distns.py
((43 lines not shown))
def _ppf(self, q, c):
- vals = 1.0/c * (pow(1-q, -c)-1)
- return vals
+ return -boxcox(1. - q, -c)
@ev-br Collaborator
ev-br added a note

@pv which is exactly what's used here

@ev-br
Collaborator

@pbrod would you be interested in fixing special.boxcox like you've shown?

@pv
Owner
pv commented

A suitable fix is probably to use, for |c log(x)| << 1, the expansion (x^c - 1)/c = (exp(c log(x)) - 1)/c = sum_{n=1}^inf c^{n-1} log(x)^n/n!.
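A hedged sketch of that suggestion (the truncation order is chosen arbitrarily); as the follow-up below notes, the fix that actually landed in gh-3229 uses expm1/log1p rather than a series:

import numpy as np

def boxcox_series(x, c, nterms=8):
    # truncated series for (x**c - 1)/c = sum_{n>=1} c**(n-1) * log(x)**n / n!,
    # only meaningful when |c * log(x)| is small; nterms is an arbitrary cutoff
    lx = np.log(np.asarray(x, dtype=float))
    term = lx.copy()              # n = 1 term: log(x)
    total = lx.copy()
    for n in range(2, nterms + 1):
        term = term * c * lx / n
        total = total + term
    return total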

@pbrod

Using scipy.special.boxcox is not a good idea for small q in genpareto._ppf. Even if you replace the (pow(x, lmbda) - 1.0) / lmbda part in scipy.special.boxcox with expm1(lmbda*log(x))/lmbda, you will still lose precision in genpareto.ppf for small q, because x = 1 - q.

@WarrenWeckesser
Collaborator

special.boxcox should be fixed, and since I'm the one responsible for the naive implementation, I'm happy to fix it. (I'm already experimenting with the series given by @pv.) However, @pbrod is correct: if q is near zero, then the battle for precision is lost as soon as you compute 1 - q, regardless of how boxcox is implemented.
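A small standalone illustration of that point (values picked arbitrarily; ((1 - q)**(-c) - 1)/c is genpareto's ppf):

import numpy as np

q, c = 1e-18, 0.5
# forming 1 - q first discards q entirely in double precision:
print(1.0 - q == 1.0)                    # True
print((pow(1.0 - q, -c) - 1.0) / c)      # 0.0 -- the information is already gone
# keeping q and going through log1p/expm1 retains it,
# since ((1 - q)**(-c) - 1)/c == expm1(-c*log1p(-q))/c:
print(np.expm1(-c * np.log1p(-q)) / c)   # ~1e-18, the correct value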

@pv
Owner
pv commented

If boxcox(1 - q, c) is common, it may make sense to extend the boxcox function to also support this, e.g. boxcox(-q, c, at_1=True).

@josef-pkt
Collaborator

q near 0 or near 1 is a bit of a different issue, because we can use isf or ppf depending on whether we are in the upper or lower tail, I think. (*)
(Otherwise I didn't pay enough attention to understand the numerics of the different solutions.)

(*) So far that's a choice made by the user. In some cases we run into problems when we do use ppf in the extreme upper tail, as in #3214;
in that case ppf is also used for the rvs.

If boxcox(1 - q, c) is common

I have no idea how common the extreme cases are; I think they're usually not common with actual data.

@ev-br
Collaborator

I think a good API would be to separate the offset explicitly: boxcox(x, lmbda, x0=0), computing ((x + x0)**lmbda - 1)/lmbda. That way a user can pick the right variant, and the implementation has a chance to do the right thing under the hood.

I don't think there's much boxcox in the scipy codebase so far, given that it was only introduced in #3150.

Overall, I think it's worth it to grow the collection of ufuncs for, loosely speaking, 'simple-looking combinations of elementary functions with all the corner cases taken care of' (xlogy, log1p, boxcox and so on).

@WarrenWeckesser
Collaborator

I updated boxcox and added the new function boxcox1p here: #3229

It was sufficient to express the function using expm1 and either log or log1p -- no need for the series expansion.

Consistent with the naming of expm1, log1p and xlog1py, I added the new function boxcox1p instead of adding additional arguments to boxcox.
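For readers following along, the identity behind that reformulation, sketched in plain Python (illustrative only, not the gh-3229 code, which is a ufunc in scipy.special):

import numpy as np

def boxcox_sketch(x, lmbda):
    # (x**lmbda - 1)/lmbda == expm1(lmbda*log(x))/lmbda, stable for small lmbda
    return np.log(x) if lmbda == 0 else np.expm1(lmbda * np.log(x)) / lmbda

def boxcox1p_sketch(x, lmbda):
    # ((1 + x)**lmbda - 1)/lmbda, which also keeps precision when x is tiny
    return np.log1p(x) if lmbda == 0 else np.expm1(lmbda * np.log1p(x)) / lmbda

# e.g. genpareto's ppf at tiny q can then be written as -boxcox1p_sketch(-q, -c)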

@ev-br
Collaborator

Incorporated the new boxcox, boxcox1p, added an explicit _isf method, and squashed it all into a single commit.

@josef-pkt
Collaborator

looks good to me

@coveralls

Coverage Status

Coverage remained the same when pulling 9c3c02f on ev-br:genpareto into 6b6b41a on scipy:master.

scipy/stats/_continuous_distns.py
((51 lines not shown))
def _munp(self, n, c):
- k = arange(0, n+1)
- val = (-1.0/c)**n * sum(comb(n, k)*(-1)**k / (1.0-c*k), axis=0)
- return where(c*n < 1, val, inf)
+ if c != 0:
+ k = arange(0, n+1)
+ val = (-1.0/c)**n * sum(comb(n, k)*(-1)**k / (1.0-c*k), axis=0)
+ return where(c*n < 1, val, inf)
+ else:
+ return gam(n+1)
@pbrod
pbrod added a note

Vectorization of _munp can be done like this

def _munp(self, n, c):
    def __munp(n, c):
        val = 0.0
        k = arange(0, n + 1)
        for ki, cnk in zip(k, comb(n, k)):
            val = val + cnk * (-1) ** ki / (1.0 - c * ki)
        return where(c * n < 1, val * (-1.0 / c) ** n, inf)
    munp = lambda c: __munp(n, c)
    return _lazywhere(c != 0, (c,), munp, gam(n + 1))
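A quick sanity check of those moments against the known closed forms (this assumes the merged version of this PR, since older scipy rejects c=0):

import numpy as np
from scipy import stats

# E[X**2] for genpareto is 2/((1 - c)(1 - 2c)) for c < 1/2, and 2 at c = 0
for c in [0.0, 0.1, 0.3]:
    m2 = stats.genpareto.moment(2, c)
    expected = 2.0 if c == 0 else 2.0 / ((1 - c) * (1 - 2 * c))
    assert np.allclose(m2, expected)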
scipy/stats/_continuous_distns.py
((29 lines not shown))
def _logpdf(self, x, c):
- return (-1.0-1.0/c) * np.log1p(c*x)
+ return -(c + 1.) * self._log1pcx(x, c)
@pbrod
pbrod added a note

genpareto.logpdf(inf, 0.1) returns nan; the correct answer is -inf. (This failure is due to np.log1p(inf) returning nan in numpy 1.8.0.)
genpareto.logpdf(1, -1) returns nan; the correct answer is 0.

@WarrenWeckesser Collaborator

@pbrod: I can't reproduce the np.log1p bug. Using both 1.7.1 and 1.8.0, np.log1p(np.inf) returns inf. Perhaps it is platform dependent? (I'm using Ubuntu 12.04 (64 bit).) Have you reported the problem on the numpy issue tracker (https://github.com/numpy/numpy/issues)?

@pbrod
pbrod added a note

I am using Windows 7 and the numpy 1.8.0 that comes with pythonxy, and yes, I have just reported it on the numpy tracker.

@ev-br Collaborator
ev-br added a note

@pbrod what does scipy.special.log1p(inf) return on your system? --- apparently, there's one from cephes/unity.c
(I cannot test it myself, since I'm on ubuntu precise at the moment)

@pbrod
pbrod added a note

However, scipy.special.log1p gives the correct answer on my system:

In [2]: scipy.special.log1p(inf)
Out[2]: inf

@ev-br Collaborator
ev-br added a note

Great! Would you be able to run the full stats test suite on your machine, using the updated PR?

@ev-br
Collaborator

In the last commit I've replaced the call to np.log1p with scipy.special.log1p in genpareto and elsewhere in _continuous_distns. All tests keep passing locally, but this needs to be tested on Windows (numpy/numpy#4225).

@coveralls

Coverage Status

Coverage remained the same when pulling 9fc4d7b on ev-br:genpareto into 6b6b41a on scipy:master.

@pbrod pbrod commented on the diff
scipy/stats/_continuous_distns.py
@@ -11,11 +11,11 @@
from scipy import special
from scipy import optimize
from scipy import integrate
-from scipy.special import (gammaln as gamln, gamma as gam)
+from scipy.special import (gammaln as gamln, gamma as gam, boxcox, boxcox1p)
@pbrod
pbrod added a note

Why not import log1p here?

@ev-br Collaborator
ev-br added a note

No strong preference here, mostly a matter of taste. I personally have a slight preference for being a little more explicit, and a bare log1p here I would definitely read as the numpy function. Can change it if there are strong opinions, though.

scipy/stats/_continuous_distns.py
((6 lines not shown))
from numpy import (where, arange, putmask, ravel, sum, shape,
log, sqrt, exp, arctanh, tan, sin, arcsin, arctan,
- tanh, cos, cosh, sinh, log1p, expm1)
+ tanh, cos, cosh, sinh, expm1)
@pbrod
pbrod added a note

Why not import expm1 from scipy.special also, since numpy.expm1(inf) returns nan on Windows?

@ev-br Collaborator
ev-br added a note

It does not seem to be directly related to this PR --- expm1 is only used in _cdf, and the argument does not go to positive inf. (BTW boxcox has been reimplemented in terms of expm1, is it affected? ping @WarrenWeckesser)
But I agree, it might be that expm1 & log1p have to be replaced everywhere in the scipy codebase. I'll open a ticket.

@WarrenWeckesser Collaborator

Why wouldn't the argument of _cdf go to inf? The result should be 1. If we use the buggy version of np.expm1, _cdf(inf, c) will return nan.

@josef-pkt Collaborator

Having a correct _cdf(inf, c) is a bonus and desired. However, IIRC, cdf wouldn't/shouldn't delegate to _cdf if x == self.b == inf; the wrapper code should return 1.

@ev-br Collaborator
ev-br added a note

@WarrenWeckesser indeed, I was wrong.
The last commit uses special.expm1

@WarrenWeckesser Collaborator

@josef-pkt: Good point. So it is not essential that _cdf handle inf, but it would be nice.

@coveralls

Coverage Status

Coverage remained the same when pulling 4bfdceb on ev-br:genpareto into 6b6b41a on scipy:master.

@pv pv added the PR label
@ev-br
Collaborator

Rebased, added the vectorized _munp implementation (vectorization beats readability, huh), and force-pushed the result. I believe I've incorporated all review comments.

@coveralls

Coverage Status

Coverage remained the same when pulling 6169f67 on ev-br:genpareto into 32cd96d on scipy:master.

@pbrod

It looks good, except that genpareto.logpdf(1, -1) still returns nan; the correct answer is 0.

@ev-br
Collaborator

@pbrod does the last commit fix it for you? [I cannot reproduce it locally, so I'm reduced to guessing and trying.] @pv this involves a small tweak in special.

@coveralls

Coverage Status

Changes Unknown when pulling d4a484d on ev-br:genpareto into scipy:master.

@WarrenWeckesser
Collaborator

This does not fix the problem that @pbrod pointed out. The problem is not in log1p; it is in _logpdf(). _log1pcx correctly returns inf, but then in _logpdf that value is multiplied by (c + 1.) (which is 0), and 0 * inf is nan.

You could refactor a bit, and maybe use xlog1py, but my initial impression is that somewhere in the code you'll need special cases for both c = 0 and c = -1.
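To spell out the 0 * inf point, a standalone illustration of why xlog1py sidesteps it (not the code in this PR; the fixed _logpdf is in the diff further down):

import numpy as np
from scipy import special

c, x = -1.0, 1.0
print((c + 1.0) * np.log1p(c * x))            # 0 * log1p(-1) = 0 * (-inf) -> nan
print(-special.xlog1py(c + 1.0, c * x) / c)   # xlog1py(0, -1) is defined as 0 -> 0.0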

@ev-br
Collaborator

Ah, yes, you're right. It's not just tweaking the plumbing: at c=-1, genpareto must reduce to a uniform distribution on [0, 1], which this implementation doesn't. Will revert the last commit and look at this a bit more.

@ev-br
Collaborator

In the last commit I'm using xlog1py (thanks @WarrenWeckesser) to special-case both c=0 and c=-1.

@coveralls

Coverage Status

Coverage remained the same when pulling 5c1ec5a on ev-br:genpareto into b54a499 on scipy:master.

@coveralls

Coverage Status

Coverage remained the same when pulling e4949fc on ev-br:genpareto into b54a499 on scipy:master.

ev-br added some commits
@ev-br ev-br changed the title from Add the limiting exponential distribution to generalized Pareto distribution with shape c=0 to Add the limiting distributions to generalized Pareto distribution with shapes c=0 and c=-1
@ev-br
Collaborator

Rebased, changed the title to better reflect the content of the PR (c=0 and c=-1). I believe I've addressed all the review comments.

@coveralls

Coverage Status

Coverage increased (+0.02%) when pulling 9718599 on ev-br:genpareto into 686537d on scipy:master.

@pv pv removed the PR label
@ev-br ev-br added the enhancement label
@ev-br ev-br added this to the 0.15.0 milestone
@ev-br
Collaborator

I think it'd be nice to have it in 0.15 if time permits.

@pbrod

I agree.

@argriffing argriffing merged commit 8b8531d into scipy:master

1 check passed

continuous-integration/travis-ci: The Travis CI build passed
@argriffing
Collaborator

Thanks for making these improvements! I especially agree with:

Overall, I think it's worth it to grow the collection of ufuncs for, loosely speaking, 'simple-looking combinations of elementary functions with all the corner cases taken care of'

@ev-br
Collaborator

@argriffing you've implemented a nice bunch of them recently, have you not :-)

@ev-br ev-br deleted the ev-br:genpareto branch
@argriffing
Collaborator

Yes I added some weird functions that have been graciously merged, but they are not yet ufunc-ified and they do not yet deal with all corner cases.

@ev-br
Collaborator

Ah, good to know! I think it'd be very useful to actually make them ufuncs and fix up all the corner cases

@argriffing
Collaborator

I think it'd be very useful to actually make them ufuncs and fix up all the corner cases

PR #3981

@argriffing
Collaborator

Adding boxcox inverse transform ufuncs to scipy.special could help clean up

def _logpdf(self, x, c):
    return _lazywhere((x == x) & (c != 0), (x, c),
        lambda x, c: -special.xlog1py(c+1., c*x) / c, -x)

as well as helping answer http://stackoverflow.com/questions/26391454
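A rough sketch of what such an inverse transform could look like (the name is made up and scipy.special did not provide this at the time): invert (x**lmbda - 1)/lmbda = y via x = exp(log1p(lmbda*y)/lmbda), with exp(y) as the lmbda -> 0 limit.

import numpy as np

def inv_boxcox_sketch(y, lmbda):
    # solve (x**lmbda - 1)/lmbda = y for x; the lmbda -> 0 limit is exp(y)
    return np.exp(y) if lmbda == 0 else np.exp(np.log1p(lmbda * y) / lmbda)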

@ev-br
Collaborator

Yeah, it could. Open an issue for this, so that it's more visible?

Commits on Jul 23, 2014
  1. @ev-br

    ENH: add the limit shape c=0 to genpareto distribution

    ev-br authored
    For c=0, genpareto is equivalent to the exponential
    distribution.
  2. @ev-br

    BUG: use special.log1p instead of np.log1p

    ev-br authored
    np.log1p(np.inf) is reported to produce nans on some platforms,
    see numpy issue gh-4225
  3. @ev-br
  4. @ev-br

    ENH: vectorize genpareto._munp

    ev-br authored
    implementation is by @pbrod
  5. @ev-br
113 scipy/stats/_continuous_distns.py
@@ -11,11 +11,11 @@
from scipy import special
from scipy import optimize
from scipy import integrate
-from scipy.special import (gammaln as gamln, gamma as gam)
+from scipy.special import (gammaln as gamln, gamma as gam, boxcox, boxcox1p)
from numpy import (where, arange, putmask, ravel, sum, shape,
log, sqrt, exp, arctanh, tan, sin, arcsin, arctan,
- tanh, cos, cosh, sinh, log1p, expm1)
+ tanh, cos, cosh, sinh)
from numpy import polyval, place, extract, any, asarray, nan, inf, pi
@@ -55,7 +55,7 @@ class kstwobign_gen(rv_continuous):
"""
def _cdf(self, x):
- return 1.0-special.kolmogorov(x)
+ return 1.0 - special.kolmogorov(x)
def _sf(self, x):
return special.kolmogorov(x)
@@ -941,10 +941,10 @@ def _logpdf(self, x):
return -x
def _cdf(self, x):
- return -expm1(-x)
+ return -special.expm1(-x)
def _ppf(self, q):
- return -log1p(-q)
+ return -special.log1p(-q)
def _sf(self, x):
return exp(-x)
@@ -985,17 +985,17 @@ def _pdf(self, x, a, c):
def _logpdf(self, x, a, c):
negxc = -x**c
- exm1c = -expm1(negxc)
+ exm1c = -special.expm1(negxc)
logp = (log(a) + log(c) + special.xlogy(a - 1.0, exm1c) +
negxc + special.xlogy(c - 1.0, x))
return logp
def _cdf(self, x, a, c):
- exm1c = -expm1(-x**c)
+ exm1c = -special.expm1(-x**c)
return exm1c**a
def _ppf(self, q, a, c):
- return (-log1p(-q**(1.0/a)))**asarray(1.0/c)
+ return (-special.log1p(-q**(1.0/a)))**asarray(1.0/c)
exponweib = exponweib_gen(a=0.0, name='exponweib')
@@ -1031,16 +1031,16 @@ def _logpdf(self, x, b):
return 1 + log(b) + (b-1.0)*log(x) + xb - exp(xb)
def _cdf(self, x, b):
- return -expm1(-expm1(x**b))
+ return -special.expm1(-special.expm1(x**b))
def _sf(self, x, b):
- return exp(-expm1(x**b))
+ return exp(-special.expm1(x**b))
def _isf(self, x, b):
- return (log1p(-log(x)))**(1./b)
+ return (special.log1p(-log(x)))**(1./b)
def _ppf(self, q, b):
- return pow(log1p(-log1p(-q)), 1.0/b)
+ return pow(special.log1p(-special.log1p(-q)), 1.0/b)
exponpow = exponpow_gen(a=0.0, name='exponpow')
@@ -1289,10 +1289,10 @@ def _logpdf(self, x, c):
return log(c) + (c-1)*log(x) - pow(x, c)
def _cdf(self, x, c):
- return -expm1(-pow(x, c))
+ return -special.expm1(-pow(x, c))
def _ppf(self, q, c):
- return pow(-log1p(-q), 1.0/c)
+ return pow(-special.log1p(-q), 1.0/c)
def _munp(self, n, c):
return special.gamma(1.0+n*1.0/c)
@@ -1367,7 +1367,7 @@ def _pdf(self, x, c):
return exp(self._logpdf(x, c))
def _logpdf(self, x, c):
- return log(c) - x - (c+1.0)*log1p(exp(-x))
+ return log(c) - x - (c+1.0)*special.log1p(exp(-x))
def _cdf(self, x, c):
Cx = (1+exp(-x))**(-c)
@@ -1400,42 +1400,65 @@ class genpareto_gen(rv_continuous):
genpareto.pdf(x, c) = (1 + c * x)**(-1 - 1/c)
- for ``c != 0``, and for ``x >= 0`` for all c,
- and ``x < 1/abs(c)`` for ``c < 0``.
+ for ``c >= 0`` ``x >= 0``, and
+ for ``c < 0`` ``0 <= x <= -1/c``
+
+ For ``c == 0``, `genpareto` reduces to the exponential
+ distribution, `expon`::
+
+ genpareto.pdf(x, c=0) = exp(-x)
+
+ For ``c == -1``, `genpareto` is uniform on ``[0, 1]``::
+
+ genpareto.cdf(x, c=-1) = x
%(example)s
"""
def _argcheck(self, c):
c = asarray(c)
- self.b = where(c < 0, 1.0/abs(c), inf)
- return where(c == 0, 0, 1)
+ self.b = _lazywhere(c < 0, (c,),
+ lambda c: -1. / c, np.inf)
+ return True
def _pdf(self, x, c):
- Px = pow(1+c*x, asarray(-1.0-1.0/c))
- return Px
+ return np.exp(self._logpdf(x, c))
def _logpdf(self, x, c):
- return (-1.0-1.0/c) * np.log1p(c*x)
+ return _lazywhere((x == x) & (c != 0), (x, c),
+ lambda x, c: -special.xlog1py(c+1., c*x) / c,
+ -x)
def _cdf(self, x, c):
- return 1.0 - pow(1+c*x, asarray(-1.0/c))
+ return -special.expm1(self._logsf(x, c))
+
+ def _sf(self, x, c):
+ return np.exp(self._logsf(x, c))
+
+ def _logsf(self, x, c):
+ return _lazywhere((x == x) & (c != 0), (x, c),
+ lambda x, c: -special.log1p(c*x) / c,
+ -x)
def _ppf(self, q, c):
- vals = 1.0/c * (pow(1-q, -c)-1)
- return vals
+ return -boxcox1p(-q, -c)
+
+ def _isf(self, q, c):
+ return -boxcox(q, -c)
def _munp(self, n, c):
- k = arange(0, n+1)
- val = (-1.0/c)**n * sum(comb(n, k)*(-1)**k / (1.0-c*k), axis=0)
- return where(c*n < 1, val, inf)
+ def __munp(n, c):
+ val = 0.0
+ k = arange(0, n + 1)
+ for ki, cnk in zip(k, comb(n, k)):
+ val = val + cnk * (-1) ** ki / (1.0 - c * ki)
+ return where(c * n < 1, val * (-1.0 / c) ** n, inf)
+ return _lazywhere(c != 0, (c,),
+ lambda c: __munp(n, c),
+ gam(n + 1))
def _entropy(self, c):
- if (c > 0):
- return 1+c
- else:
- self.b = -1.0 / c
- return rv_continuous._entropy(self, c)
+ return 1. + c
genpareto = genpareto_gen(a=0.0, name='genpareto')
@@ -1465,13 +1488,15 @@ class genexpon_gen(rv_continuous):
"""
def _pdf(self, x, a, b, c):
- return (a+b*(-expm1(-c*x)))*exp((-a-b)*x+b*(-expm1(-c*x))/c)
+ return (a + b*(-special.expm1(-c*x)))*exp((-a-b)*x +
+ b*(-special.expm1(-c*x))/c)
def _cdf(self, x, a, b, c):
- return -expm1((-a-b)*x + b*(-expm1(-c*x))/c)
+ return -special.expm1((-a-b)*x + b*(-special.expm1(-c*x))/c)
def _logpdf(self, x, a, b, c):
- return np.log(a+b*(-expm1(-c*x))) + (-a-b)*x+b*(-expm1(-c*x))/c
+ return np.log(a+b*(-special.expm1(-c*x))) + \
+ (-a-b)*x+b*(-special.expm1(-c*x))/c
genexpon = genexpon_gen(a=0.0, name='genexpon')
@@ -1505,7 +1530,7 @@ def _argcheck(self, c):
def _pdf(self, x, c):
cx = c*x
- logex2 = where((c == 0)*(x == x), 0.0, log1p(-cx))
+ logex2 = where((c == 0)*(x == x), 0.0, special.log1p(-cx))
logpex2 = where((c == 0)*(x == x), -x, logex2/c)
pex2 = exp(logpex2)
# Handle special cases
@@ -1514,12 +1539,12 @@ def _pdf(self, x, c):
return exp(logpdf)
def _cdf(self, x, c):
- loglogcdf = where((c == 0)*(x == x), -x, log1p(-c*x)/c)
+ loglogcdf = where((c == 0)*(x == x), -x, special.log1p(-c*x)/c)
return exp(-exp(loglogcdf))
def _ppf(self, q, c):
x = -log(-log(q))
- return where((c == 0)*(x == x), x, -expm1(-c*x)/c)
+ return where((c == 0)*(x == x), x, -special.expm1(-c*x)/c)
def _stats(self, c):
g = lambda n: gam(n*c+1)
@@ -1529,9 +1554,9 @@ def _stats(self, c):
g4 = g(4)
g2mg12 = where(abs(c) < 1e-7, (c*pi)**2.0/6.0, g2-g1**2.0)
gam2k = where(abs(c) < 1e-7, pi**2.0/6.0,
- expm1(gamln(2.0*c+1.0)-2*gamln(c+1.0))/c**2.0)
+ special.expm1(gamln(2.0*c+1.0)-2*gamln(c+1.0))/c**2.0)
eps = 1e-14
- gamk = where(abs(c) < eps, -_EULER, expm1(gamln(c+1))/c)
+ gamk = where(abs(c) < eps, -_EULER, special.expm1(gamln(c+1))/c)
m = where(c < -1.0, nan, -gamk)
v = where(c < -0.5, nan, g1**2.0*gam2k)
@@ -2016,7 +2041,7 @@ def _pdf(self, x):
return 2.0/pi/(1.0+x*x)
def _logpdf(self, x):
- return np.log(2.0/pi) - np.log1p(x*x)
+ return np.log(2.0/pi) - special.log1p(x*x)
def _cdf(self, x):
return 2.0/pi*arctan(x)
@@ -3739,9 +3764,9 @@ def _munp(self, n, b):
# wrong answer with formula, same as in continuous.pdf
# return gam(n+1)-special.gammainc(1+n, b)
if n == 1:
- return (1-(b+1)*exp(-b))/(-expm1(-b))
+ return (1-(b+1)*exp(-b))/(-special.expm1(-b))
elif n == 2:
- return 2*(1-0.5*(b*b+2*b+2)*exp(-b))/(-expm1(-b))
+ return 2*(1-0.5*(b*b+2*b+2)*exp(-b))/(-special.expm1(-b))
else:
# return generic for higher moments
# return rv_continuous._mom1_sc(self, n, b)
97 scipy/stats/tests/test_distributions.py
@@ -465,6 +465,103 @@ def test_stats(self):
assert_allclose(k, 6*(4.5**3 + 4.5**2 - 6*4.5 - 2)/(4.5*1.5*0.5))
+class TestGenpareto(TestCase):
+ def test_ab(self):
+ # c >= 0: a, b = [0, inf]
+ for c in [1., 0.]:
+ c = np.asarray(c)
+ stats.genpareto._argcheck(c) # ugh
+ assert_equal(stats.genpareto.a, 0.)
+ assert_(np.isposinf(stats.genpareto.b))
+
+ # c < 0: a=0, b=1/|c|
+ c = np.asarray(-2.)
+ stats.genpareto._argcheck(c)
+ assert_allclose([stats.genpareto.a, stats.genpareto.b], [0., 0.5])
+
+ def test_c0(self):
+ # with c=0, genpareto reduces to the exponential distribution
+ rv = stats.genpareto(c=0.)
+ x = np.linspace(0, 10., 30)
+ assert_allclose(rv.pdf(x), stats.expon.pdf(x))
+ assert_allclose(rv.cdf(x), stats.expon.cdf(x))
+ assert_allclose(rv.sf(x), stats.expon.sf(x))
+
+ q = np.linspace(0., 1., 10)
+ assert_allclose(rv.ppf(q), stats.expon.ppf(q))
+
+ def test_cm1(self):
+ # with c=-1, genpareto reduces to the uniform distr on [0, 1]
+ rv = stats.genpareto(c=-1.)
+ x = np.linspace(0, 10., 30)
+ assert_allclose(rv.pdf(x), stats.uniform.pdf(x))
+ assert_allclose(rv.cdf(x), stats.uniform.cdf(x))
+ assert_allclose(rv.sf(x), stats.uniform.sf(x))
+
+ q = np.linspace(0., 1., 10)
+ assert_allclose(rv.ppf(q), stats.uniform.ppf(q))
+
+ # logpdf(1., c=-1) should be zero
+ assert_allclose(rv.logpdf(1), 0)
+
+ def test_x_inf(self):
+ # make sure x=inf is handled gracefully
+ rv = stats.genpareto(c=0.1)
+ assert_allclose([rv.pdf(np.inf), rv.cdf(np.inf)], [0., 1.])
+ assert_(np.isneginf(rv.logpdf(np.inf)))
+
+ rv = stats.genpareto(c=0.)
+ assert_allclose([rv.pdf(np.inf), rv.cdf(np.inf)], [0., 1.])
+ assert_(np.isneginf(rv.logpdf(np.inf)))
+
+ rv = stats.genpareto(c=-1.)
+ assert_allclose([rv.pdf(np.inf), rv.cdf(np.inf)], [0., 1.])
+ assert_(np.isneginf(rv.logpdf(np.inf)))
+
+ def test_c_continuity(self):
+ # pdf is continuous at c=0, -1
+ x = np.linspace(0, 10, 30)
+ for c in [0, -1]:
+ pdf0 = stats.genpareto.pdf(x, c)
+ for dc in [1e-14, -1e-14]:
+ pdfc = stats.genpareto.pdf(x, c + dc)
+ assert_allclose(pdf0, pdfc, atol=1e-12)
+
+ cdf0 = stats.genpareto.cdf(x, c)
+ for dc in [1e-14, 1e-14]:
+ cdfc = stats.genpareto.cdf(x, c + dc)
+ assert_allclose(cdf0, cdfc, atol=1e-12)
+
+ def test_c_continuity_ppf(self):
+ q = np.r_[np.logspace(1e-12, 0.01, base=0.1),
+ np.linspace(0.01, 1, 30, endpoint=False),
+ 1. - np.logspace(1e-12, 0.01, base=0.1)]
+ for c in [0., -1.]:
+ ppf0 = stats.genpareto.ppf(q, c)
+ for dc in [1e-14, -1e-14]:
+ ppfc = stats.genpareto.ppf(q, c + dc)
+ assert_allclose(ppf0, ppfc, atol=1e-12)
+
+ def test_c_continuity_isf(self):
+ q = np.r_[np.logspace(1e-12, 0.01, base=0.1),
+ np.linspace(0.01, 1, 30, endpoint=False),
+ 1. - np.logspace(1e-12, 0.01, base=0.1)]
+ for c in [0., -1.]:
+ isf0 = stats.genpareto.isf(q, c)
+ for dc in [1e-14, -1e-14]:
+ isfc = stats.genpareto.isf(q, c + dc)
+ assert_allclose(isf0, isfc, atol=1e-12)
+
+ def test_cdf_ppf_roundtrip(self):
+ # this should pass with machine precision. hat tip @pbrod
+ q = np.r_[np.logspace(1e-12, 0.01, base=0.1),
+ np.linspace(0.01, 1, 30, endpoint=False),
+ 1. - np.logspace(1e-12, 0.01, base=0.1)]
+ for c in [1e-8, -1e-18, 1e-15, -1e-15]:
+ assert_allclose(stats.genpareto.cdf(stats.genpareto.ppf(q, c), c),
+ q, atol=1e-15)
+
+
class TestPearson3(TestCase):
def test_rvs(self):
vals = stats.pearson3.rvs(0.1, size=(2, 50))