fix moments for invgamma #2849

ev-br · 2013-09-10T09:46:41Z

Yesterday I've erroneously reported that gh-1866 has been fixed. This patch actually fixes it.

n-th moment only exists for n < a.

josef-pkt · 2013-09-10T11:21:55Z

my guess is that the first moments should be inf and not nan
http://en.wikipedia.org/wiki/Inverse-gamma_distribution
since it's only on the real line, the terms are positive. If a moment doesn't exist, then it is inf.

josef-pkt · 2013-09-10T11:25:11Z

also .stats has explicit formulas, see wikipedia

ev-br · 2013-09-10T13:51:04Z

OK, yes, makes sense: it's definitely a positive infinity. Same can be sensibly argued for skew/kurtosis (the higher the moment, the faster the divergence). Have just pushed the commit with explicit stats and slightly better tests.

josef-pkt · 2013-09-10T14:24:51Z

the pattern in other distributions is stats = inf inf nan nan
skew and kurtosis divide by the variance (to a power) so it's inf / inf in the limit

but I never checked the details there (what's skew and kurtosis in weird cases)
my guess is they should be nan
(standard definition of skewness and kurtosis doesn't define a special sequence to take a limit if variance is inf)

ev-br · 2013-09-10T15:10:47Z

Okay, the last commit follows the pattern (whether it's possible / is worth it to enforce it across the board is a separate question; but here it's easy).

josef-pkt · 2013-09-10T15:54:14Z

scipy/stats/distributions.py

-    def _munp(self, n, a):
-        return exp(gamln(a-n) - gamln(a))
+    def _stats(self, a, moments='mvsk'):
+        a1, a2, a3, a4 = a - 1., a -2., a - 3., a - 4.


one missing space a - 2 :)

josef-pkt · 2013-09-10T15:55:52Z

looks good now

WarrenWeckesser · 2013-09-10T18:17:47Z

Here's a problem:

In [44]: invgamma.stats([0.0, 0.5, 1.0], 1, 0.5)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-36163480f388> in <module>()
----> 1 invgamma.stats([0.0, 0.5, 1.0], 1, 0.5)

/home/warren/local_scipy/lib/python2.7/site-packages/scipy/stats/distributions.pyc in stats(self, *args, **kwds)
   1612                     mu = self._munp(1.0,*goodargs)
   1613                 out0 = default.copy()
-> 1614                 place(out0,cond,mu*scale+loc)
   1615                 output.append(out0)
   1616 

ValueError: operands could not be broadcast together with shapes (3) (2)

josef-pkt · 2013-09-10T18:24:37Z

@WarrenWeckesser IMO, this looks like a generic problem not of invgamma

Does this work with all distributions? I never systematically checked broadcasting in .stats, and if I remember correctly, there are several problems.

WarrenWeckesser · 2013-09-10T18:29:34Z

Is it a conscious design decision to raise warnings when an inf or nan result is generated?

E.g. (with numpy 1.7.1):

In [2]: invgamma.stats([1,1.5,2])
/home/warren/local_scipy/lib/python2.7/site-packages/scipy/stats/distributions.py:4217: RuntimeWarning: divide by zero encountered in true_divide
  mu1 = np.where(a > 1., 1. / a1, np.inf)
/home/warren/local_scipy/lib/python2.7/site-packages/scipy/stats/distributions.py:4218: RuntimeWarning: divide by zero encountered in true_divide
  mu2 = np.where(a > 2., 1. / a1 / a1 / a2, np.inf)
Out[2]: (array([ inf,   2.,   1.]), array([ inf,  inf,  inf]))

(A problem with this design is that the decision is being left to whatever numpy decides to do. The default behavior of numpy has changed over the years, and in fact it can depend on the platform.)

If the function is documented to return nan or inf, then my preference is to not generate warnings. (Perhaps I've been biased by too many spurious warnings being generated in older versions of scipy.) I don't know if @rgommers, @josef-pkt, or @pv agree with this. What do you guys think?

WarrenWeckesser · 2013-09-10T18:34:19Z

@josef-pkt: I haven't checked lately, but I'm pretty sure that not all the distributions' stats methods handle broadcasting. However, in this PR, @EvgeniBurovski is implementing broadcasting (+1!), so I would expect my example to also work.

josef-pkt · 2013-09-10T18:35:45Z

separate issue/bug in my reading:
https://github.com/EvgeniBurovski/scipy/blob/c6c62b9595ea5e6eb3c35b21b3b83ed45ffafe2c/scipy/stats/distributions.py#L1608
argsreduce is called after the call to .stats, which means .stats doesn't have the broadcasted array

ev-br · 2013-09-10T18:37:06Z

In [4]: stats.t.stats([0.0, 0.5, 1.0], 0.5)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<snip>

In [5]: stats.norm.stats([0.0, 0.5, 1.0])
Out[5]: (array([ 0. ,  0.5,  1. ]), array([ 1.,  1.,  1.]))

In [6]: stats.poisson.stats([0.0, 0.5, 1.0], 3)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-634115ac2298> in <module>()
----> 1 stats.poisson.stats([0.0, 0.5, 1.0], 3)
<snip>

But

In [9]: stats.bradford.stats([1, 2, 3], 3)
Out[9]: 
(array([ 3.44269504,  3.41023923,  3.38801419]),
 array([ 0.08267358,  0.08170378,  0.08078069]))

So, does not seem to be consistent. Also, all the errors I've seen so far (haven't done a systematic sweep) seem to come from the generic part of the .stats, so my first guess is it's similar to #2069

josef-pkt · 2013-09-10T18:37:56Z

If the function is documented to return nan or inf, then my preference is to not generate warnings.

I agree. We should silence warnings (or not create them in the first place) in these cases.
(a reason why I don't like np.where - it always evaluates both terms)

josef-pkt · 2013-09-10T18:42:26Z

@WarrenWeckesser can you open a new issue with some examples for vectorized/broadcasting in .stats?
The issue I mentioned above, doesn't bite in this case, because shape has the full size. So, there are some other fishy generic parts still wrong.

ev-br · 2013-09-10T18:44:53Z

@WarrenWeckesser (warnings) No, not a conscious decsion, just reliance on numpy. There's not much more to the warning-generating code than what's printed in the traceback. It's literally 1/(a-1) with a=1 which generates it.

Now, I would argue that relying on numpy is a good thing here. It might not be consistent across numpy installations, but it's consistent across code paths and scripts on any given machine. IMO all divisions by zero should be handled in a uniform way, so that a user can control them all at once.

pv · 2013-09-10T18:48:32Z

@WarrenWeckesser: IMHO, if inf/nan are expected possible results, there should be no warnings. Also, for the practical reasons you state, the warnings generated by Numpy are probably not very useful in any case.

josef-pkt · 2013-09-10T18:51:11Z

I don't understand the failure from reading the code
What does the private method return?
invgamma._stats(np.array([0.0, 0.5, 1.0]))

ev-br · 2013-09-10T18:53:49Z

@josef-pkt
In [16]: stats.invgamma._stats(np.array([0.0, 0.5, 1.0]))
Out[16]: (None, None, None, None)

josef-pkt · 2013-09-10T19:03:42Z

Out[16]: (None, None, None, None)
That's the old result. I don't see how mu1 and mu2 can not be arrays in this PR

ev-br · 2013-09-10T19:03:49Z

Here:

In [8]: stats.invgamma.stats(a=[0.5, 1.0])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-7acb79aaf511> in <module>()
----> 1 stats.invgamma.stats(a=[0.5, 1.0])

/home/br/virtualenvs/scipy-dev/lib/python2.7/site-packages/scipy/stats/distributions.pyc in stats(self, *args, **kwds)
   1621                         mu = self._munp(1.0,*goodargs)
   1622                     mu2 = mu2p - mu*mu
-> 1623                     if np.isinf(mu):
   1624                         #if mean is inf then var is also inf
   1625                         mu2 = np.inf

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
> /home/br/virtualenvs/scipy-dev/lib/python2.7/site-packages/scipy/stats/distributions.py(1623)stats()
   1622                     mu2 = mu2p - mu*mu
-> 1623                     if np.isinf(mu):
   1624                         #if mean is inf then var is also inf

ipdb> mu
array([  2.,  inf])

WarrenWeckesser · 2013-09-10T19:05:32Z

@EvgeniBurovski wrote

IMO all divisions by zero should be handled in a uniform way, so that a user can control them all at once.

Not if the division by zero is an implementation detail, which is the case here. For example, the formula for the mean, a / (a - 1) is only defined for a > 1. The moment diverges for a <= 1. That fact that your code evaluates this expression when a = 1 is an implementation detail whose behavior should not leak into the public API.

ev-br · 2013-09-10T19:09:24Z

@josef-pkt oops, scratch that.

In [7]: stats.invgamma._stats(a=np.array([0.5, 1.5]))
/home/br/virtualenvs/scipy-dev/lib/python2.7/site-packages/scipy/stats/distributions.py:4221: RuntimeWarning: invalid value encountered in sqrt
  g1 = np.where(a > 3., 4. * np.sqrt(a2) / a3, np.nan)
Out[7]: 
(array([ inf,   2.]),
 array([ inf,  inf]),
 array([ nan,  nan]),
 array([ nan,  nan]))

josef-pkt · 2013-09-10T19:18:20Z

This looks good, however I don't see where the generic code get's the wrong shape (broadcast error) in Warren's example. (scale or loc is (2,) ? sounds weird)

But this is not a problem of invgamma in this PR. the ._stats return looks good

josef-pkt · 2013-09-10T19:24:33Z

back to my earlier bug: argsreduce is after the call to ._stats
a=0 is invalid shape parameter in Warren's example
._stats get's the full a (3 elements). When combining with loc and scale in line 1614, loc and scale only have valid args which means 2 elements in this case.
solution: argsreduce needs to go before ._stats.
(But don't add this to this PR, in case we want to backport just the fix to invgamma.)

josef-pkt · 2013-09-10T20:01:12Z

argsreduce is used in all methods to screen out invalid arguments and to broadcast the valid arguments to have common shape before calling the private methods.
The private .__xxx methods then only get valid arguments (that satisfy ._argcheck) and all arguments have the same shape.

In this case a=0 is removed (we set nans in the generic part), the two valid arguments together with loc and scale are broadcasted together.

In this case, the usage of argsreduce after the call to ._stats is screwed up.

ev-br · 2013-09-10T20:05:58Z

@josef-pkt [yes, took me half an hour more (and a debugger) to figure that out :-(]. A matter for a separate issue/PR/ a slew of test?

josef-pkt · 2013-09-10T20:11:49Z

A matter for a separate issue/PR/ a slew of test?

Yes, but as in your example and what Warren mentioned, there are also other problems before stats really works vectorized.
In general, you/we/sombody could add more broadcasting tests that are still missing, and skip or mark as knownfailure the cases that don't work yet.

ev-br · 2013-09-11T09:30:14Z

The last commit avoids division-by-zero warnings.

rgommers · 2013-09-15T11:41:13Z

So separate PR needed for argsreduce and this is good to go now, correct?

josef-pkt · 2013-09-15T11:43:22Z

Yes, good to merge.
I will open an issue for vectorized .stats

fix moments for invgamma

BUG: fix moments for invgamma

37fd88a

n-th moment only exists for n < a.

MAINT: use explicit formulas for invgamma stats

778712c

MAINT: return nan instead inf for skew/kurtosis

c6c62b9

josef-pkt reviewed Sep 10, 2013
View reviewed changes

MAINT: avoid spurious warnings for invalid arguments

e1cf09a

ev-br mentioned this pull request Sep 12, 2013

TST: [stats] add tests for the moments #2857

Merged

rgommers added a commit that referenced this pull request Sep 15, 2013

Merge pull request #2849 from EvgeniBurovski/invgamma

d2e1cf0

fix moments for invgamma

rgommers merged commit d2e1cf0 into scipy:master Sep 15, 2013

rgommers mentioned this pull request Sep 15, 2013

stats for invgamma (Trac #1340) #1866

Closed

josef-pkt mentioned this pull request Sep 15, 2013

distribution._stats is not vectorized #2876

Closed

ev-br deleted the invgamma branch September 15, 2013 12:10

ev-br mentioned this pull request Sep 15, 2013

[stats] fix formulas for higher moments of R, Burr, Fisk and F distributions #2879

Merged

ev-br mentioned this pull request Jan 26, 2014

BUG/ENH: stats: make frozen distributions hold separate instances #3245

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix moments for invgamma #2849

fix moments for invgamma #2849

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt Sep 10, 2013

josef-pkt commented Sep 10, 2013

WarrenWeckesser commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

WarrenWeckesser commented Sep 10, 2013

WarrenWeckesser commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

pv commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

WarrenWeckesser commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 11, 2013

rgommers commented Sep 15, 2013

josef-pkt commented Sep 15, 2013

fix moments for invgamma #2849

fix moments for invgamma #2849

Conversation

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt Sep 10, 2013

Choose a reason for hiding this comment

josef-pkt commented Sep 10, 2013

WarrenWeckesser commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

WarrenWeckesser commented Sep 10, 2013

WarrenWeckesser commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

pv commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

WarrenWeckesser commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 10, 2013

josef-pkt commented Sep 10, 2013

ev-br commented Sep 11, 2013

rgommers commented Sep 15, 2013

josef-pkt commented Sep 15, 2013