Add the limiting distributions to generalized Pareto distribution with shapes c=0 and c=-1 #3225
Conversation
About ppf: I don't know the new boxcox, but in the old version we have 1 - eps - 1, which becomes mostly noise for c < 1e-8. My guess is that it's a purely numerical problem that won't go away without using a (Taylor) series expansion around c=0, or something like that.
scipy.special.boxcox, for lmbda != 0, is implemented as (pow(x, lmbda) - 1.0) / lmbda and is numerically unstable for small lmbda. A better solution in the _ppf method is to replace the call to boxcox with the following:
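A minimal sketch of the kind of stable form presumably meant here, written with expm1 and log1p (my reconstruction, not the exact code from the comment):

    import numpy as np

    def genpareto_ppf(q, c):
        # ((1 - q)**(-c) - 1) / c, rewritten so that no catastrophic
        # cancellation occurs for small c; c == 0 still needs its own
        # branch, where the limit is -log1p(-q).
        return np.expm1(-c * np.log1p(-q)) / c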
Replacing it with the above, you will get machine precision, as shown here:

    In [36]: q = np.linspace(0., 1., 30, endpoint=False)
    In [40]: c = 1e-15; np.abs(genpareto.cdf(genpareto.ppf(q, c), c) - q)
If there's something to improve, the improvement should be done in scipy.special.
 def _ppf(self, q, c):
-    vals = 1.0/c * (pow(1-q, -c)-1)
-    return vals
+    return -boxcox(1. - q, -c)
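For reference, the two forms agree algebraically, since boxcox(x, lmbda) = (x**lmbda - 1)/lmbda; a quick illustrative check:

    import numpy as np
    from scipy.special import boxcox

    q, c = 0.3, 0.5
    old = 1.0 / c * ((1 - q)**(-c) - 1)   # the expression being removed
    new = -boxcox(1 - q, -c)              # the replacement
    print(np.isclose(old, new))           # True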
@pv which is exactly what's used here
@pbrod would you be interested in fixing scipy.special.boxcox?
A suitable fix is probably to use expm1(lmbda*log(x))/lmbda for lmbda != 0.
Using scipy.special.boxcox is not a good idea for small q in genpareto._ppf. Even if you replace the (pow(x, lmbda) - 1.0) / lmbda part of scipy.special.boxcox with expm1(lmbda*log(x))/lmbda, you will still lose precision in genpareto.ppf for small q, because x = 1-q.
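To make the point concrete: for tiny q the digits of q are destroyed by the subtraction 1 - q before boxcox is even called, whereas an offset variant such as boxcox1p (which this PR ends up importing) can take -q directly. An illustrative check:

    import numpy as np
    from scipy.special import boxcox, boxcox1p

    q, c = 1e-20, 0.5
    print(1.0 - q == 1.0)         # True: q is lost in the subtraction
    print(-boxcox(1.0 - q, -c))   # 0.0, all information about q is gone
    print(-boxcox1p(-q, -c))      # ~1e-20, the correct small-q ppf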
q near 0 or near 1 is a bit of a different issue, because we can use isf or ppf depending on whether we are in the upper or lower tail, I think. (*)

(*) So far this is a choice by the user. We run into problems in some cases when we do use ppf in the extreme upper tail, as in #3214.
I have no idea how common the extreme cases are; I think they are usually not common with actual data.
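As an illustration of the tail point (a sketch, not from the thread):

    from scipy.stats import genpareto

    c = 0.1
    print(genpareto.ppf(1 - 1e-18, c))  # 1 - 1e-18 rounds to 1.0, so this is inf
    print(genpareto.isf(1e-18, c))      # stating the tail mass directly stays finite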
How common is the call boxcox(1 - q, c) in the Scipy codebase?
I think a good API would be to separate the offset explicitly. I don't think there's much boxcox in the scipy codebase so far, given that it was only introduced in #3150. Overall, I think it's worth it to grow the collection of ufuncs for, loosely speaking, 'simple-looking combinations of elementary functions with all the corner cases taken care of'.
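Presumably "separating the offset" means something along the lines of boxcox1p(x, lmbda) == boxcox(1 + x, lmbda) (my reading of the comment), so that genpareto can pass -q without ever forming 1 - q:

    from scipy.special import boxcox1p

    def ppf_via_boxcox1p(q, c):
        # hypothetical helper: boxcox1p(-q, -c) == boxcox(1 - q, -c),
        # but without the lossy 1 - q subtraction
        return -boxcox1p(-q, -c)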
I updated boxcox. It was sufficient to express the function using expm1, consistent with the fix suggested above.
Incorporated the new boxcox and boxcox1p.
Looks good to me.
        val = (-1.0/c)**n * sum(comb(n, k)*(-1)**k / (1.0-c*k), axis=0)
        return where(c*n < 1, val, inf)
    else:
        return gam(n+1)
Vectorization of _munp can be done like this:

    def _munp(self, n, c):
        def __munp(n, c):
            val = 0.0
            k = arange(0, n + 1)
            # accumulate the binomial sum term by term, so that c can be an array
            for ki, cnk in zip(k, comb(n, k)):
                val = val + cnk * (-1) ** ki / (1.0 - c * ki)
            # the n-th moment exists only where c*n < 1
            return where(c * n < 1, val * (-1.0 / c) ** n, inf)
        munp = lambda c: __munp(n, c)
        # evaluate munp only where c != 0; fall back to gam(n + 1) elsewhere
        return _lazywhere(c != 0, (c,), munp, gam(n + 1))
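For context, my reading of the sketch above: _lazywhere evaluates munp only where c != 0 and fills in gam(n + 1) = n! elsewhere, the n-th raw moment of the c = 0 (exponential) limit. A quick check through the public API, assuming this PR's special-casing:

    from scipy.stats import genpareto

    print(genpareto.moment(2, 0.0))  # 2.0 == gamma(3), the exponential limit
    print(genpareto.moment(2, 0.1))  # finite, since c*n = 0.2 < 1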
In the last commit I've replaced the _munp implementation with the vectorized version above.
@@ -11,11 +11,11 @@
 from scipy import special
 from scipy import optimize
 from scipy import integrate
-from scipy.special import (gammaln as gamln, gamma as gam)
+from scipy.special import (gammaln as gamln, gamma as gam, boxcox, boxcox1p)
Why not import log1p here?
No strong preference here, mostly a matter of taste. I personally have a slight preference for being a little more explicit, and here at least I would definitely expect log1p to be a numpy function. Can change it if there are strong opinions though.
This does not fix the problem that @pbrod pointed out; the problem is not in this code. You could refactor a bit, and maybe use xlog1py.
Ah, yes, you're right. It's not just tweaking the plumbing: at c=-1, genpareto must reduce to a uniform distribution on [0, 1], which this implementation does not. Will revert the last commit and look at this a bit more.
In the last commit I'm using xlog1py (thanks @WarrenWeckesser) to special-case both c=0 and c=-1.
For c=0, genpareto is equivalent to the exponential distribution (see the check below).
np.log1p(np.inf) is reported to produce nans on some platforms, see numpy issue gh-4225
implementation is by @pbrod
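A quick check of the c=0 claim above (assuming the special-casing added in this PR):

    import numpy as np
    from scipy.stats import genpareto, expon

    x = np.linspace(0, 5, 11)
    print(np.allclose(genpareto.pdf(x, 0.0), expon.pdf(x)))  # True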
Rebased, changed the title to better reflect the content of the PR (c=0 and c=-1). I believe I've addressed all the review comments.
I think it'd be nice to have it in 0.15 if time permits.
I agree.
Thanks for making these improvements! I especially agree with:
@argriffing you've implemented a nice bunch of them recently, have you not :-)
Yes, I added some weird functions that have been graciously merged, but they are not yet ufunc-ified and they do not yet deal with all corner cases.
Ah, good to know! I think it'd be very useful to actually make them ufuncs and fix up all the corner cases.
PR #3981
Adding boxcox inverse transform ufuncs to scipy.special could help clean up

    def _logpdf(self, x, c):
        return _lazywhere((x == x) & (c != 0), (x, c),
                          lambda x, c: -special.xlog1py(c+1., c*x) / c, -x)

as well as helping answer http://stackoverflow.com/questions/26391454
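A rough sketch of what such an inverse ufunc might compute; scipy.special had no inverse Box-Cox at the time of this thread, so the name and edge-case handling below are hypothetical:

    import numpy as np

    def inv_boxcox(y, lmbda):
        # Invert y = boxcox(x, lmbda):
        #   lmbda != 0: x = (1 + lmbda*y)**(1/lmbda), via log1p for stability
        #   lmbda == 0: x = exp(y)
        if lmbda == 0:
            return np.exp(y)
        return np.exp(np.log1p(lmbda * np.asarray(y, dtype=float)) / lmbda)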
Yeah, it could. Open an issue for this, so that it's more visible?
closes gh-1295
The implementation here is similar to the original one by @pbrod, as listed in gh-1295.
Several points I'd like to flag for review:

- ppf at small c --- I first wrote the test without much thinking, but in fact I'm not sure if the requirement of the test is not too stringent. The specific question here, I guess, is whether the Box-Cox transform is uniformly convergent at lambda -> 0.
- Whether it's worth having log(1 + a*x)/x with the correct behavior at x=0 in scipy.special, instead of a private helper in genpareto (see the sketch after this list).
- _munp and entropy are not properly vectorized at the moment (neither in master, nor in this PR).
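On the second point, a hedged sketch of what such a helper might look like (the name log1pxdx and the where-based guard are mine, not from the PR):

    import numpy as np

    def log1pxdx(a, x):
        # log(1 + a*x) / x, with the x -> 0 limit equal to a
        x = np.asarray(x, dtype=float)
        with np.errstate(divide='ignore', invalid='ignore'):
            out = np.log1p(a * x) / x
        return np.where(x == 0, a, out)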