See ticket 1545, and (closed) pull request #101 for more information.
BUG: stats: fixed calculation of the variance and kurtosis of the Tukey Lambda distribution (ticket 1545)
Looks like a lot of work. I just read through the changes, not all details yet.
I think the generic stats functions are not vectorized; only the distribution-specific _stats methods might work (by chance) for array arguments. I never checked this, but the few examples I tried all raise an exception in the generic wrapping code in stats (unintentionally, because array input simply doesn't work there).
I didn't think about this earlier, only when I saw the raise NotImplementedError. In general, vectorized functions don't raise an exception for invalid arguments; they return nan for those and still return the valid results for the rest.
However, some methods that are not vectorized in scipy.stats.distributions raise exceptions, mostly ValueError.
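To illustrate the nan-for-invalid convention described above, here is a minimal sketch. The helper `sqrt_or_nan` is hypothetical and not part of scipy; it only shows the masking pattern, not how scipy.stats implements it.

```python
import numpy as np

def sqrt_or_nan(x):
    """Hypothetical helper: sqrt(x) where x >= 0, nan elsewhere."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.full(x.shape, np.nan)   # start with nan everywhere
    valid = x >= 0                   # mask of arguments in the valid domain
    out[valid] = np.sqrt(x[valid])   # compute only where the input is valid
    return out

# The invalid entry becomes nan; the valid entries are still returned.
result = sqrt_or_nan([4.0, -1.0, 9.0])
```

One invalid element does not prevent the caller from getting results for the others, which is the behavior a raised exception would rule out.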
The docstring for the stats() method (e.g., norm.stats(), gamma.stats(), etc.) says:
Some statistics of the given RV
arg1, arg2, arg3,... : array_like
The shape parameter(s) for the distribution (see docstring of the
instance object for more information)
That says to me that the stats() method should handle arrays. Some do, and some don't:
In : dgamma.stats([1,2])
Out: (array([ 0., 0.]), array([ 2., 6.]))
In : gamma.stats([1,2])
ValueError Traceback (most recent call last)
/Users/warren/<ipython-input-47-aea54145098c> in <module>()
----> 1 gamma.stats([1,2])
/Users/warren/local_scipy/lib/python2.7/site-packages/scipy/stats/distributions.pyc in stats(self, *args, **kwds)
1642 mu = self._munp(1.0,*goodargs)
1643 mu2 = mu2p - mu*mu
-> 1644 if np.isinf(mu):
1645 #if mean is inf then var is also inf
1646 mu2 = np.inf
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I think that's a bug in gamma.stats(), and any other distribution that doesn't handle an array.
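The failure in the traceback can be reproduced outside scipy. The sketch below reuses the names `mu` and `mu2p` from the traceback, but the values are made up for illustration, and the `np.where` version is only one possible elementwise replacement, not the fix that was applied in scipy.

```python
import numpy as np

# Made-up moments: the second distribution has an infinite mean.
mu = np.array([1.0, np.inf])
mu2p = np.array([3.0, np.inf])

# Scalar-style branching on an array raises the ValueError seen above.
try:
    if np.isinf(mu):
        pass
    message = ""
except ValueError as exc:
    message = str(exc)

# Elementwise alternative: choose inf where the mean is inf,
# otherwise the usual central moment. inf - inf produces nan in the
# unused branch, so the resulting warning is silenced.
with np.errstate(invalid="ignore"):
    mu2 = np.where(np.isinf(mu), np.inf, mu2p - mu * mu)
```

With a scalar `mu`, `np.isinf(mu)` is a single boolean and the original `if` works, which is why the scalar case never showed the problem.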
Maybe raising NotImplementedError isn't the right response, but the fact that the function breaks when lambda > 98 is a limitation of the implementation, not of the function itself. The mathematical function is defined for lambda > 98, but this implementation doesn't compute it. That's what I had in mind by using the NotImplementedError, but maybe just putting in nan is fine, as long as this is explained in the docstring. (Better, of course, would be to make the function work!)
I think you can use gammaln for large values:
>>> import numpy as np; from scipy.special import gammaln
>>> reg = 200; (2.0 / reg**2) * (1.0 / (1 + 2*reg) - np.exp(2*gammaln(reg + 1) - gammaln(2*reg + 2)))
It looks like eventually it will be just a degenerate distribution with a point at zero, so we could also set the variance to zero for large enough lambda. There are other distributions where the approach to the degenerate distribution is not well defined.
To the vectorized stats question: I think this deserves a separate ticket. The stats method is partially (but incorrectly or incompletely) set up to handle array inputs.
However, I don't understand why dgamma.stats works; my impression is that all array inputs should eventually hit the "if np.isinf(mu):" in the generic code (I only tried gamma and lognorm). One possibility: because mu=0 in dgamma, maybe it's always a scalar at the isinf check.
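A small comparison of the direct gamma form with the gammaln form suggested above. The function names `var_gamma` and `var_gammaln` are hypothetical, and the gamma version is written here from the same formula for comparison only; it is not the code from the pull request.

```python
import numpy as np
from scipy.special import gamma, gammaln

def var_gamma(lam):
    """Variance formula using gamma directly (overflows for large lam)."""
    with np.errstate(all="ignore"):
        return (2.0 / lam**2) * (1.0 / (1 + 2*lam)
                                 - gamma(lam + 1)**2 / gamma(2*lam + 2))

def var_gammaln(lam):
    """Same formula, with the gamma ratio evaluated in log space."""
    log_ratio = 2*gammaln(lam + 1) - gammaln(2*lam + 2)
    return (2.0 / lam**2) * (1.0 / (1 + 2*lam) - np.exp(log_ratio))

moderate_direct = var_gamma(2.0)     # both forms agree here
moderate_log = var_gammaln(2.0)
large_direct = var_gamma(200.0)      # gamma overflows to inf, inf/inf is nan
large_log = var_gammaln(200.0)       # finite and very small
```

At lambda=200 the gamma ratio is negligible, so the gammaln result is essentially 2 / (lambda**2 * (1 + 2*lambda)), a tiny variance consistent with the distribution collapsing toward a point at zero.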
I experimented a bit with the gammaln version, and it seems to work just as well as the version I implemented for the same range, and it allows much larger arguments--thanks!
As the Wikipedia page points out, the formulas for variance and kurtosis can also be expressed in terms of beta functions, so I'm going to try replacing the use of the gamma (or gammaln) function with beta.
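The gamma ratio in the variance formula equals a beta function, since B(a, b) = Gamma(a)*Gamma(b)/Gamma(a+b) gives Gamma(lam+1)**2/Gamma(2*lam+2) = B(lam+1, lam+1). A minimal sketch of the variance in that form follows; `var_beta` is a hypothetical name and this is not the committed implementation.

```python
import numpy as np
from scipy.special import beta, gamma

def var_beta(lam):
    """Variance formula with the gamma ratio written as a beta function."""
    return (2.0 / lam**2) * (1.0 / (1 + 2*lam) - beta(lam + 1, lam + 1))

# The identity behind the substitution, checked at lam = 2.
identity_holds = np.isclose(beta(3.0, 3.0), gamma(3.0)**2 / gamma(6.0))

v1 = var_beta(1.0)      # lam=1 is uniform on [-1, 1], variance 1/3
v2 = var_beta(2.0)
v200 = var_beta(200.0)  # stays finite for large lam
```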
I'll also implement the calculation near lambda=0 with Pade approximations. The implementation using interpolation was an interesting experiment, but far less efficient than using Pade.
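A short sketch of why the region near lambda=0 needs separate treatment. It reuses the beta-function formula under the hypothetical name `var_beta` and does not reproduce the Pade coefficients from the pull request. At lambda=0 the Tukey Lambda distribution is the logistic distribution, whose variance is pi**2/3.

```python
import numpy as np
from scipy.special import beta

def var_beta(lam):
    """Closed-form variance; numerically unreliable very close to lam = 0."""
    return (2.0 / lam**2) * (1.0 / (1 + 2*lam) - beta(lam + 1, lam + 1))

limit = np.pi**2 / 3           # variance of the logistic distribution

small = var_beta(1e-3)         # still close to the limit
tiny = var_beta(1e-9)          # the two terms cancel to rounding noise,
                               # which is then multiplied by 2/lam**2
error_small = abs(small - limit)
error_tiny = abs(tiny - limit)
```

Both terms inside the parentheses are close to 1 for small lambda and their difference is of order lambda**2, so for tiny lambda the subtraction leaves only rounding error. A rational (Pade) approximation in lambda avoids that subtraction.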
So I'm basically going to rewrite the implementation. At least the tests won't have to change. :)
Rewrite the Tukey Lambda variance and kurtosis functions.
I just committed a rewrite of the variance and kurtosis functions. The formulas are implemented using the beta function, and values near lambda=0 are computed using Pade approximations. (I'll rebase before actually committing anything, so even if it looks good, don't hit the big green button yet.)
Looks nice, and thanks for including the mpmath code (I need to remember this as a recipe)
The changes are fine with me. My impression was that the variance is inf for lam<-0.5, but half an hour of searching the internet didn't turn up any clear statement (only that the variance for lam<-0.5 is "not finite"). Without any contrary evidence, the nan in this case is fine.
Thanks for reviewing the request, Josef.
Rebased and committed in 0cac0b5