ENH: Burr12 distribution to stats module #5654

maniteja123 · 2016-01-02T14:40:53Z

Add burr12 distribution as suggested in gh-5589.

Currently, I skipped the ci build since some tests are failing.
In _distr_params.py, the default params need to be greater than zero, else a Domain error is raised.

But for this distribution, for positive values of c and d, it seems cdp and pdf are giving negative values, thereby causing log_cdf and log_pdf to raise a RuntimeWarning: invalid value encountered in log.

I am not an expert in stats and have learnt about this distribution while doing this. But I anyways did create a PR so that I can ask for suggestions and reviews. Any leads are appreciated.

Thanks !

maniteja123 · 2016-01-02T14:46:40Z

The following is the traceback in the tests in stats

======================================================================
ERROR: test_continuous_basic.test_cont_basic(<scipy.stats._continuous_distns.burr12_gen object at 0xb2de6a8c>, (10, 4))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/maniteja/FOSS/scipy/scipy/stats/tests/common_tests.py", line 215, in check_meth_dtype
    val = meth(x, *arg)
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1634, in logpdf
    place(output, cond, self._logpdf(*goodargs) - log(scale))
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1545, in _logpdf
    return log(self._pdf(x, *args))
RuntimeWarning: divide by zero encountered in log

======================================================================
ERROR: test_continuous_basic.test_cont_basic(<scipy.stats._continuous_distns.burr12_gen object at 0xb2de6a8c>, (10, 4))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/decorators.py", line 147, in skipper_func
    return f(*args, **kwargs)
  File "/home/maniteja/FOSS/scipy/scipy/stats/tests/common_tests.py", line 246, in check_cmplx_deriv
    assert_allclose(deriv(distfn.logcdf, x, *arg), pdf/cdf, rtol=1e-5)
  File "/home/maniteja/FOSS/scipy/scipy/stats/tests/common_tests.py", line 233, in deriv
    return (f(x + h*1j, *arg)/h).imag
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1717, in logcdf
    place(output, cond, self._logcdf(*goodargs))
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 810, in _logcdf
    return log(self._cdf(x, *args))
RuntimeWarning: divide by zero encountered in log

----------------------------------------------------------------------
Ran 1308 tests in 40.970s


FAILED (KNOWNFAIL=2, SKIP=55, errors=2)

argriffing · 2016-01-02T14:52:04Z

The log pdf, survival functions, etc. could be written manually (ie. something like c*d*(x**(c-1.0))*((1+x**(c*1.0))**(-d-1.0)) -> log(c) + log(d) + xlogy(c-1, x) + xlog1py(-d-1, x**c))

maniteja123 · 2016-01-02T14:55:46Z

Thanks a lot @argriffing. I didn't think of writing them on my own since most of them use log(cdf) for log_cdf. Will do the changes and get back.

josef-pkt · 2016-01-02T15:04:22Z

it should also mention
"The Burr type 12 distribution is also sometimes referred to as the Singh-Maddala distribution" from NIST

IIRC, I have seen articles that use Singh-Maddala as name.

maniteja123 · 2016-01-02T15:45:36Z

I did use the special functions to define logpdf and logcdf functions.
The cdf is 1-(1+x**(c*1.0))**(-d**1.0) and it is throwing a

Type Error: ufunc 'log1p' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

if log1p is used for calculating logcdf. Which function needs to be used here ?

Also, one clarification regarding the fisk distribution. The present documentation says that it is a special case of burr with d = 1, which refers to the Burr Type 3 distribution currently implemented. But the wikipedia page says that it is special case of Burr Type 12 distribution.

argriffing · 2016-01-02T15:53:50Z

it is throwing

It would help to see more of the traceback, but I'll make a guess. I'm guessing that there is scipy stats testing code that tries to compute a derivative for whatever reason which is using a trick based on imaginary numbers that is interacting poorly with the log1p function that you are using. Maybe this is considered a scipy/numpy bug?

>>> from numpy import log1p
>>> log1p(1 + 1j)
(0.80471895621705025+0.46364760900080609j)
>>> 
>>> from scipy import special
>>> special.log1p(1 + 1j)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'log1p' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

maniteja123 · 2016-01-02T15:58:34Z

Sorry for lack of clarity. this is the total traceback

ERROR: test_continuous_basic.test_cont_basic(<scipy.stats._continuous_distns.burr12_gen object at 0xb3260a2c>, (10, 4))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/decorators.py", line 147, in skipper_func
    return f(*args, **kwargs)
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/tests/common_tests.py", line 246, in check_cmplx_deriv
    assert_allclose(deriv(distfn.logcdf, x, *arg), pdf/cdf, rtol=1e-5)
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/tests/common_tests.py", line 233, in deriv
    return (f(x + h*1j, *arg)/h).imag
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1715, in logcdf
    place(output, cond, self._logcdf(*goodargs))
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_continuous_distns.py", line 709, in _logcdf
    return special.log1p(-(1+x**(c*1.0))**(-d**1.0))
TypeError: ufunc 'log1p' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

This is what I have used for logcdf

    def _logcdf(self, x, c, d):
        return special.log1p(-(1+x**(c*1.0))**(-d**1.0))

argriffing · 2016-01-02T16:02:58Z

Yes the test framework is trying to use imaginary numbers to numerically approximate a derivative. I think the failure is a scipy bug or a missing feature so I've opened #5655.

ev-br · 2016-01-02T16:04:04Z

This test has an explicit list of exceptions (a loong one, unfortunately).

Edit: I feel I need to apologize for the original version of this comment, now edited: I should probably spell check better the result of the spell check and autocorrect helper on my phone.

argriffing · 2016-01-02T16:08:18Z

@maniteja123 You can work around the incompleteness of the log1p implementation by adding burr12 to the fails_cmplx set in https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py.

maniteja123 · 2016-01-02T16:16:11Z

Thanks @argriffing for the work around.

Could you please look into the relation from fisk distribution and burr distribution ?
Thanks

argriffing · 2016-01-02T16:26:32Z

The scipy fisk distribution has 1 shape parameter, but the fisk distribution as explained in https://en.wikipedia.org/wiki/Champernowne_distribution appears to have a couple of shape parameters. Unless its x0 is somehow actually a disguised scale parameter. My cynical guess is that "the fisk distribution" is not necessarily a single well-defined thing, but rather it could mean slightly different things to people in slightly different subfields.

maniteja123 · 2016-01-02T16:35:06Z

Oh, but can you look at this https://www.wikiwand.com/en/Log-logistic_distribution

f(x,c) = c * x**(c-1) * (1 + x**(c))**(-2)

while the documentation http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.fisk.html says

f(x,c) = c * x**(-c-1) * (1 + x**(-c))**(-2)

EDIT : Initially wrote the wrong formula.

and the pdf for burr type 12 is this

f(x,c,d) = c*d*(x**(c-1))*((1+x**(c))**(-d-1))

while pdf for burr type 3 is

f(x,c,d) = c * d * x**(-c-1) * (1+x**(-c))**(-d-1)

argriffing · 2016-01-02T16:54:56Z

while the documentation http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.fisk.html says
f(x,c) = c * x**(c-1) * (1 + x**(c))**(-2)

That's not what I see on that link...

btw I now see that in the champernowne link I posted above x0 is indeed a disguised scale parameter and not a shape parameter.

maniteja123 · 2016-01-02T17:02:11Z

Apologies, I got confused with so many formula. Have updated it.

And in the link you provided, http://www.wikiwand.com/en/Champernowne_distribution#/Distribution_of_income also shows the same formula as in https://www.wikiwand.com/en/Log-logistic_distribution

argriffing · 2016-01-02T17:20:10Z

@maniteja123 could you double check that
f(x,c) = c * x**(c-1) * (1 + x**(c))**(-2)
f(x,c) = c * x**(-c-1) * (1 + x**(-c))**(-2)
are not algebraically identical to each other?

maniteja123 · 2016-01-02T17:35:51Z

Ah, I see it now @argriffing , thanks..

f(x,c) = c * x**(c-1) * (1 + x**(c))**(-2)
f(x,c) = c * x**(-c-1) * (1 + x**(-c))**(-2)

If we put -c in second formula, we get the first one. So, basically the shape parameter is made negative for the other type of Burr distribution. But I am not sure what this means algebraically.

argriffing · 2016-01-02T17:47:12Z

No, they are just equal to each other. They are also equal to c/(x**(1+c) + 2*x + x**(1-c)).

maniteja123 · 2016-01-02T17:54:22Z

Oops, my bad. You are right indeed! I should go back to revise my basic math :)

What is meant was this.
f(x,-c) = - g(x,c)

f(x,c) = c * x**(c-1) * (1 + x**(c))**(-2)
g(x,c) = c * x**(-c-1) * (1 + x**(-c))**(-2)

Does, that mean that it is special case of both types of distributions ?

josef-pkt · 2016-01-02T17:54:32Z

In the two versions the c in front doesn't have a changed sign. So, either something drops out of the integration, or one of them doesn't integrate to one because the normalization constant is wrong.

argriffing · 2016-01-02T17:54:55Z

In other words, burr 3 and burr 12 are the same when d=1. You can see this in your comment
#5589 (comment)
"With respect to cdf, either of the burr distributions would suffice since they give the same formula for d=1."
For distributions that are plain enough that you don't need to care about measure theory or about the distinction between functions and distributions, if the cdfs are the same then the pdfs are too.

argriffing · 2016-01-02T17:55:37Z

Does, that mean that it is special case of both types of distributions ?

Yes.

maniteja123 · 2016-01-02T17:56:33Z

Yeah got it now. I did see them to be same from cdf but couldn't discern it from the pdf. Thanks a lot for clarifying.

maniteja123 · 2016-01-02T17:59:26Z

Now will finally mention that the Fisk distribution is special case of both the Burr III and Burr XII distributions. Also do I need to anything else for the documentation or testing purposes ?

maniteja123 · 2016-01-02T18:02:49Z

For distributions that are plain enough that you don't need to care about measure theory or about the distinction between functions and distributions, if the cdfs are the same then the pdfs are too.

Thanks for this. Didn't know it before.

ev-br · 2016-01-03T17:10:09Z

scipy/stats/__init__.py

@@ -30,6 +30,7 @@
   betaprime         -- Beta Prime
   bradford          -- Bradford
   burr              -- Burr
+   burr12            -- Burr12


Better expand it: Burr (type III) above, Burr (type XII)

Changed it.

ev-br · 2016-01-03T17:12:53Z

scipy/stats/_continuous_distns.py

+
+    """
+    def _pdf(self, x, c, d):
+        return c*d*(x**(c-1.0))*((1+x**(c*1.0))**(-d-1.0))


This can be replaced with np.exp(self._logpdf(x, c, d))

Oh, I thought that logpdf is in general evaluated using pdf but not the other way. Sorry. Will do it. Thanks.

Yes, this is the generic code path. But if you define _logpdf explicitly, it might make sense to also explicitly define _pdf

Okay got it. Thanks. Shall I revert this or let it be as it is then ? And the reason why I explicitly defined logpdf was because of the RuntimeWarning: divide by zero encountered in log error if I use np.log(self._pdf(x, c, d)).

OK, you defined the logpdf as Alex suggested. Now you can exponentiate that expression to get the pdf.

Sure will do it.

maniteja123 · 2016-01-03T18:55:08Z

@ev-br I have tried to answer most of the suggestions. Thank you so much for bearing patiently with me. Please do let me know if there is any need for specific tests other than the present tests. I have also ran the tests locally after removing unnecessary floating point operations. They did succeed.

On a side note, there is a hard coded value for the number of scipy stats continuous distributions in here. In case this is merged, probably other PR implementing new distributions might be needed to update because I had faced a failure before when testing after rebasing. Just telling though you already know :)

ev-br · 2016-01-03T19:03:35Z

scipy/stats/_continuous_distns.py

+    %(after_notes)s 
+
+    The Burr type 12 distribution is also sometimes referred to as
+    the Singh-Maddala distribution from NIST.


@josef-pkt has mentioned this in this comment. From a simple search this link came up. Would this be good ?

codecov-io · 2016-01-03T20:35:57Z

@@            master   #5654   diff @@
======================================
  Files          234     234       
  Stmts        43092   43111    +19
  Branches      8152    8152       
  Methods          0       0       
======================================
+ Hit          33408   33427    +19
  Partial       2604    2604       
  Missed        7080    7080

Review entire Coverage Diff as of 5574361

Powered by Codecov. Updated on successful CI builds.

closes scipygh-5589

ev-br · 2016-01-08T19:41:06Z

Would be good to squash the commits.

maniteja123 · 2016-01-09T03:47:01Z

Squashed the commits and rebased with master. Please let me know if anything else needs to be done.

ENH: Burr12 distribution to stats module

ev-br · 2016-01-10T14:14:26Z

Thanks Maniteja, merged

maniteja123 · 2016-01-11T04:16:44Z

Happy to contribute. Thank you Evgeni for patiently reviewing the PR and guiding me :)

maniteja123 · 2016-01-11T04:28:49Z

Also the experts @argriffing and @josef-pkt for helping with the statistical background to this novice.

ev-br reviewed Jan 3, 2016
View reviewed changes

ev-br added scipy.stats enhancement A new feature or improvement needs-work Items that are pending response from the author labels Jan 3, 2016

ev-br reviewed Jan 3, 2016
View reviewed changes

ENH: Add burr type 12 distribution to stats

da9ad1e

closes scipygh-5589

ev-br removed the needs-work Items that are pending response from the author label Jan 8, 2016

maniteja123 force-pushed the burr12 branch from 874e710 to da9ad1e Compare January 9, 2016 03:45

ev-br added a commit that referenced this pull request Jan 10, 2016

Merge pull request #5654 from maniteja123/burr12

5f7c7da

ENH: Burr12 distribution to stats module

ev-br merged commit 5f7c7da into scipy:master Jan 10, 2016

ev-br added this to the 0.18.0 milestone Jan 10, 2016

maniteja123 deleted the burr12 branch January 12, 2016 18:03

maniteja123 mentioned this pull request Apr 19, 2016

PDF Equation for Burr and Fisk distributions is wrong #6073

Closed

ENH: Burr12 distribution to stats module #5654

ENH: Burr12 distribution to stats module #5654

Conversation

maniteja123 commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

argriffing commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

josef-pkt commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

argriffing commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

argriffing commented Jan 2, 2016

ev-br commented Jan 2, 2016

argriffing commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

argriffing commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

argriffing commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

argriffing commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

argriffing commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

josef-pkt commented Jan 2, 2016

argriffing commented Jan 2, 2016

argriffing commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

maniteja123 commented Jan 2, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maniteja123 commented Jan 3, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-io commented Jan 3, 2016

ev-br commented Jan 8, 2016

maniteja123 commented Jan 9, 2016

ev-br commented Jan 10, 2016

maniteja123 commented Jan 11, 2016

maniteja123 commented Jan 11, 2016