bugs stats entropy #2765

Closed
josef-pkt opened this Issue Aug 21, 2013 · 8 comments

4 participants

@josef-pkt
SciPy member

Running the script at the bottom with Python 3.3 and scipy 0.12.0
shows the following differences between the reported entropy and the one calculated by numerical integration (the generic code).

This should have fewer false positives than the tests I did before with comparison to a large random sample (theoretical entropy versus sample entropy).

distributions raising an exception

[['reciprocal', (0.0062309367010521255, 1.0062309367010522)], 
['truncnorm', (-1.0978730080013919, 2.730675410903198)]]

distribution with more than 0.1 relative tolerance
the last value is what the distribution returns
the second-to-last value is from numerical integration

(not shown: a very large number of warnings)

['alpha', (3.570477051665046,), -1.036769456792257, nan]
['dgamma', (1.1023326088288166,), 1.7876451235296962, nan]
['erlang', (20,), 2.899928334598662, -16.100071665401337]
['exponpow', (2.697119160358469,), -0.023913603818474717, -0.028157783503781696]
['gamma', (1.9932305483800778,), 1.5748065290920485, 0.58157598071197558]
['genextreme', (-0.1,), 1.6262384754944788, nan]
['gumbel_l', (), 1.5634607039735562, 1.0608407169541685]
['gumbel_r', (), 1.5634607039735562, 1.0608407169541685]
['invgauss', (0.14546264555347513,), -1.5749344358698636, nan]
['ksone', (1000,), nan, nan]
['kstwobign', (), nan, nan]
['levy', (), 3.3244828015117034, -0.028980334288262362]
['levy_l', (), 3.3244828015117034, -0.028980334288262362]
['logistic', (), 1.984183489108535, 1.0]
['recipinvgauss', (0.6300426780936912,), 1.7750403907054204, nan]
['wald', (), 0.8769456078722748, nan]

Notes:

  • I wrote the script originally for scipy 0.9 and python 2.6 (no xlogy)
  • it uses an old copy of the distribution cases from the test suite that is importable from the statsmodels test suite.
  • the cont_entropy function is almost completely a copy of the generic code in stats.distributions
# -*- coding: utf-8 -*-
"""

Created on Wed Aug 21 12:06:48 2013
Author: Josef Perktold
"""

import numpy as np
from scipy import special, integrate, stats

# Shim for old scipy without special.xlogy; the small offset avoids log(0),
# at the cost of a tiny bias where y == 0.
def xlogy(x, y):
    return x * np.log(y + 1e-30)


def cont_entropy(self, *args):
    def integ(x):
        val = self._pdf(x, *args)
        return xlogy(val, val)

    entr = -integrate.quad(integ, self.a, self.b)[0]
    if not np.isnan(entr):
        return entr
    else:
        # retry with quantile-based limits if the integration ran into problems
        low, upp = self.ppf([0.001, 0.999], *args)
        upper = upp if np.isinf(self.b) else self.b
        lower = low if np.isinf(self.a) else self.a
        return -integrate.quad(integ, lower, upper)[0]

print(cont_entropy(stats.rayleigh))
print(stats.rayleigh._entropy())

from statsmodels.sandbox.distributions.tests.distparams import distcont, distdiscrete

res = []
res_raised = []
for distname, distargs in distcont[:]:
    # newer scipy uses invgauss instead of invnorm
    if distname == 'invnorm':
        distname = 'invgauss'
    distfn = getattr(stats, distname)

    try:
        e1 = cont_entropy(distfn, *distargs)
        e2 = distfn._entropy(*distargs)
        res.append([distname, distargs, e1, e2])
    except Exception:
        # print('exception in', distfn, distargs)
        res_raised.append([distname, distargs])

print("distributions raising an exception")
print(res_raised)

print("distribution with more than 0.1 relative tolerance")        
for d in res:
    if not np.allclose(d[-2], d[-1], rtol=0.1):
        print(d)
@josef-pkt
SciPy member

As a cross-reference: gh-1329 has a similar list with (possibly) remaining bugs for the higher moments and the stats/skew/kurtosis methods.

@WarrenWeckesser

I added the easy-fix label, because most of these are either already working or will be easy to fix.
Here's an update on the distributions @josef-pkt shows above:

Appear OK in master
* alpha
* dgamma
* genextreme
* invgauss
* levy
* levy_l
* wald
* recipinvgauss
* reciprocal
* truncnorm

Easy fix
* gamma: The formula in gamma._entropy is wrong (see the Wikipedia article on the gamma distribution). Fixing this will also fix erlang.
* gumbel_r: The entropy is just gam + 1 where gam is the Euler constant 0.577... (See the wikipedia article for the Gumbel distribution.) The code has this hardcoded as 1.06... I don't know where that number comes from. gumbel_l is just gumbel_r with x -> -x, so its entropy is the same.
* logistic: The entropy of the standard distribution is simply 2, but logistic._entropy returns 1.
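The closed forms above can be checked against a small numerical integration of -pdf*ln(pdf). This is a sketch written for this comment (num_entropy is a local helper, not scipy API); the gamma shape parameter is the one from the report above:

```python
import numpy as np
from scipy import integrate, special, stats

def num_entropy(dist, *args):
    # differential entropy by numerical integration of -pdf * ln(pdf);
    # special.xlogy returns 0 where the pdf is 0, so the tails are safe
    f = lambda x: -special.xlogy(dist.pdf(x, *args), dist.pdf(x, *args))
    lo, hi = dist.ppf(1e-12, *args), dist.ppf(1 - 1e-12, *args)
    return integrate.quad(f, lo, hi)[0]

a = 1.9932305483800778
# gamma(a), scale 1: a + ln(Gamma(a)) + (1 - a) * psi(a)
gamma_closed = a + special.gammaln(a) + (1 - a) * special.psi(a)
gumbel_closed = np.euler_gamma + 1   # same for gumbel_r and gumbel_l
logistic_closed = 2.0

print(gamma_closed, num_entropy(stats.gamma, a))
print(gumbel_closed, num_entropy(stats.gumbel_r))
print(logistic_closed, num_entropy(stats.logistic))
```

The gamma closed form agrees with the numerically integrated 1.5748... from the list in the report, confirming it is gamma._entropy that is off.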

Needs further investigation
* exponpow ("overflow encountered")
* ksone (Perhaps the shape parameter 1000 is just too big? Using 10 returns a finite number)
* kstwobign

For some of the distributions, there is a known explicit formula for the entropy that is not used (e.g. levy). An additional easy task would be to add an _entropy method that uses the explicit formula wherever possible. If anyone does this, note that _entropy should calculate the entropy for the standard distribution (loc=0, scale=1).
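For levy, for instance, the standard-distribution entropy has the closed form (1 + 3*gamma + ln(16*pi))/2 with gamma the Euler-Mascheroni constant (per the Wikipedia article on the Lévy distribution), which matches the numerically integrated value in the report. A sketch of the cross-check (the integral is split at 10 only to help quad with the heavy tail):

```python
import numpy as np
from scipy import integrate, special, stats

# closed-form differential entropy of the standard (loc=0, scale=1) Levy
# distribution: (1 + 3*gamma + ln(16*pi)) / 2
levy_closed = 0.5 * (1 + 3 * np.euler_gamma + np.log(16 * np.pi))

# cross-check by numerical integration; xlogy handles pdf == 0 at x = 0
f = lambda x: -special.xlogy(stats.levy.pdf(x), stats.levy.pdf(x))
levy_num = integrate.quad(f, 0, 10)[0] + integrate.quad(f, 10, np.inf)[0]

print(levy_closed, levy_num)  # both near 3.3245
```

An _entropy method returning levy_closed would replace the generic integration entirely for this distribution.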

reciprocal and truncnorm raise an exception in Josef's code because it bypasses a required call to _argcheck. In those distributions, _argcheck creates additional attributes that are used in the pdf and entropy calculations.
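To illustrate: going through the public entropy method, which runs _argcheck first and so sets those attributes, avoids the exception for both distributions (parameter values taken from the failing cases reported above; assumes a current scipy):

```python
import numpy as np
from scipy import stats

# shape parameters from the two failing cases in the report
h_tn = stats.truncnorm.entropy(-1.0978730080013919, 2.730675410903198)
h_rp = stats.reciprocal.entropy(0.0062309367010521255, 1.0062309367010522)

# both are finite once _argcheck has run
print(h_tn, h_rp)
```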

@josef-pkt
SciPy member

Thanks for updating this.

ksone and kstwobign can be ignored.
They are written specifically for the Kolmogorov-Smirnov tests and only implement sf and ppf, which are what the kstests use. IIRC, they fail all over the place. I looked a bit at the ksone case: a lower shape parameter helps, but it's still strange (the support looks more like (0, 1] in the examples I tried).
In any case it's a failure of the generic code when only cdf or sf is defined, not an entropy failure. I think we don't have any other distributions like these.
(I've never seen ksone used outside of kstest; kstwobign would be worth having available as a functioning distribution, but no closed-form solutions are available for any method.)

@rgommers
SciPy member

After merging gh-2774, this is what's left:

distributions raising an exception
[['reciprocal', (0.0062309367010521255, 1.0062309367010522)], 
 ['truncnorm', (-1.0978730080013919, 2.730675410903198)]]

distribution with more than 0.1 relative tolerance
['ksone', (1000,), nan, nan]
['kstwobign', (), nan, nan]
['ncf', (27, 27, 0.41578441799226107), 0.49078270223194753, -0.0]
@ev-br
SciPy member

It can probably be closed now: the truncnorm and reciprocal failures are explained by Warren above, ksone/kstwobign are special anyway, and ncf seems to have an accuracy issue, special-cased in https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py#L428

@josef-pkt
SciPy member

@EvgeniBurovski Can you open a new issue for numerical problems in ncf? I don't have an overview for which methods the numerical problems are serious.

Then we can close this.

As we get closer to being bug-free, I would prefer issues for individual distributions that still need attention (instead of these generic "entropy or moments don't work" issues).

@ev-br
SciPy member

@josef-pkt done. (not sure if any of these is high priority)

@rgommers
SciPy member

Thanks, closing.

@rgommers rgommers closed this Sep 15, 2013