Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Burr12 distribution to stats module #5654

Merged
merged 1 commit into from
Jan 10, 2016
Merged

Conversation

maniteja123
Copy link
Contributor

Add burr12 distribution as suggested in gh-5589.

Currently, I skipped the ci build since some tests are failing.
In _distr_params.py, the default params need to be greater than zero, else a Domain error is raised.

But for this distribution, for positive values of c and d, it seems cdp and pdf are giving negative values, thereby causing log_cdf and log_pdf to raise a RuntimeWarning: invalid value encountered in log.

I am not an expert in stats and have learnt about this distribution while doing this. But I anyways did create a PR so that I can ask for suggestions and reviews. Any leads are appreciated.

Thanks !

@maniteja123
Copy link
Contributor Author

The following is the traceback in the tests in stats

======================================================================
ERROR: test_continuous_basic.test_cont_basic(<scipy.stats._continuous_distns.burr12_gen object at 0xb2de6a8c>, (10, 4))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/maniteja/FOSS/scipy/scipy/stats/tests/common_tests.py", line 215, in check_meth_dtype
    val = meth(x, *arg)
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1634, in logpdf
    place(output, cond, self._logpdf(*goodargs) - log(scale))
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1545, in _logpdf
    return log(self._pdf(x, *args))
RuntimeWarning: divide by zero encountered in log

======================================================================
ERROR: test_continuous_basic.test_cont_basic(<scipy.stats._continuous_distns.burr12_gen object at 0xb2de6a8c>, (10, 4))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/decorators.py", line 147, in skipper_func
    return f(*args, **kwargs)
  File "/home/maniteja/FOSS/scipy/scipy/stats/tests/common_tests.py", line 246, in check_cmplx_deriv
    assert_allclose(deriv(distfn.logcdf, x, *arg), pdf/cdf, rtol=1e-5)
  File "/home/maniteja/FOSS/scipy/scipy/stats/tests/common_tests.py", line 233, in deriv
    return (f(x + h*1j, *arg)/h).imag
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1717, in logcdf
    place(output, cond, self._logcdf(*goodargs))
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 810, in _logcdf
    return log(self._cdf(x, *args))
RuntimeWarning: divide by zero encountered in log

----------------------------------------------------------------------
Ran 1308 tests in 40.970s


FAILED (KNOWNFAIL=2, SKIP=55, errors=2)

@argriffing
Copy link
Contributor

The log pdf, survival functions, etc. could be written manually (ie. something like c*d*(x**(c-1.0))*((1+x**(c*1.0))**(-d-1.0)) -> log(c) + log(d) + xlogy(c-1, x) + xlog1py(-d-1, x**c))

@maniteja123
Copy link
Contributor Author

Thanks a lot @argriffing. I didn't think of writing them on my own since most of them use log(cdf) for log_cdf. Will do the changes and get back.

@josef-pkt
Copy link
Member

it should also mention
"The Burr type 12 distribution is also sometimes referred to as the Singh-Maddala distribution" from NIST

IIRC, I have seen articles that use Singh-Maddala as name.

@maniteja123
Copy link
Contributor Author

I did use the special functions to define logpdf and logcdf functions.
The cdf is 1-(1+x**(c*1.0))**(-d**1.0) and it is throwing a

Type Error: ufunc 'log1p' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

if log1p is used for calculating logcdf. Which function needs to be used here ?

Also, one clarification regarding the fisk distribution. The present documentation says that it is a special case of burr with d = 1, which refers to the Burr Type 3 distribution currently implemented. But the wikipedia page says that it is special case of Burr Type 12 distribution.

@argriffing
Copy link
Contributor

it is throwing

It would help to see more of the traceback, but I'll make a guess. I'm guessing that there is scipy stats testing code that tries to compute a derivative for whatever reason which is using a trick based on imaginary numbers that is interacting poorly with the log1p function that you are using. Maybe this is considered a scipy/numpy bug?

>>> from numpy import log1p
>>> log1p(1 + 1j)
(0.80471895621705025+0.46364760900080609j)
>>> 
>>> from scipy import special
>>> special.log1p(1 + 1j)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'log1p' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

@maniteja123
Copy link
Contributor Author

Sorry for lack of clarity. this is the total traceback

ERROR: test_continuous_basic.test_cont_basic(<scipy.stats._continuous_distns.burr12_gen object at 0xb3260a2c>, (10, 4))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python2.7/dist-packages/numpy/testing/decorators.py", line 147, in skipper_func
    return f(*args, **kwargs)
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/tests/common_tests.py", line 246, in check_cmplx_deriv
    assert_allclose(deriv(distfn.logcdf, x, *arg), pdf/cdf, rtol=1e-5)
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/tests/common_tests.py", line 233, in deriv
    return (f(x + h*1j, *arg)/h).imag
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1715, in logcdf
    place(output, cond, self._logcdf(*goodargs))
  File "/home/maniteja/FOSS/scipy/build/testenv/lib/python2.7/site-packages/scipy/stats/_continuous_distns.py", line 709, in _logcdf
    return special.log1p(-(1+x**(c*1.0))**(-d**1.0))
TypeError: ufunc 'log1p' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

This is what I have used for logcdf

    def _logcdf(self, x, c, d):
        return special.log1p(-(1+x**(c*1.0))**(-d**1.0))

@argriffing
Copy link
Contributor

Yes the test framework is trying to use imaginary numbers to numerically approximate a derivative. I think the failure is a scipy bug or a missing feature so I've opened #5655.

@ev-br
Copy link
Member

ev-br commented Jan 2, 2016

This test has an explicit list of exceptions (a loong one, unfortunately).

Edit: I feel I need to apologize for the original version of this comment, now edited: I should probably spell check better the result of the spell check and autocorrect helper on my phone.

@argriffing
Copy link
Contributor

@maniteja123 You can work around the incompleteness of the log1p implementation by adding burr12 to the fails_cmplx set in https://github.com/scipy/scipy/blob/master/scipy/stats/tests/test_continuous_basic.py.

@maniteja123
Copy link
Contributor Author

Thanks @argriffing for the work around.

Could you please look into the relation from fisk distribution and burr distribution ?
Thanks

@argriffing
Copy link
Contributor

The scipy fisk distribution has 1 shape parameter, but the fisk distribution as explained in https://en.wikipedia.org/wiki/Champernowne_distribution appears to have a couple of shape parameters. Unless its x0 is somehow actually a disguised scale parameter. My cynical guess is that "the fisk distribution" is not necessarily a single well-defined thing, but rather it could mean slightly different things to people in slightly different subfields.

@maniteja123
Copy link
Contributor Author

Oh, but can you look at this https://www.wikiwand.com/en/Log-logistic_distribution

f(x,c) = c * x**(c-1) * (1 + x**(c))**(-2)

while the documentation http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.fisk.html says

f(x,c) = c * x**(-c-1) * (1 + x**(-c))**(-2)

EDIT : Initially wrote the wrong formula.

and the pdf for burr type 12 is this

f(x,c,d) = c*d*(x**(c-1))*((1+x**(c))**(-d-1))

while pdf for burr type 3 is

f(x,c,d) = c * d * x**(-c-1) * (1+x**(-c))**(-d-1)

@argriffing
Copy link
Contributor

while the documentation http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.fisk.html says
f(x,c) = c * x**(c-1) * (1 + x**(c))**(-2)

That's not what I see on that link...

btw I now see that in the champernowne link I posted above x0 is indeed a disguised scale parameter and not a shape parameter.

@maniteja123
Copy link
Contributor Author

Apologies, I got confused with so many formula. Have updated it.

And in the link you provided, http://www.wikiwand.com/en/Champernowne_distribution#/Distribution_of_income also shows the same formula as in https://www.wikiwand.com/en/Log-logistic_distribution

@argriffing
Copy link
Contributor

@maniteja123 could you double check that
f(x,c) = c * x**(c-1) * (1 + x**(c))**(-2)
f(x,c) = c * x**(-c-1) * (1 + x**(-c))**(-2)
are not algebraically identical to each other?

@maniteja123
Copy link
Contributor Author

Ah, I see it now @argriffing , thanks..

f(x,c) = c * x**(c-1) * (1 + x**(c))**(-2)
f(x,c) = c * x**(-c-1) * (1 + x**(-c))**(-2)

If we put -c in second formula, we get the first one. So, basically the shape parameter is made negative for the other type of Burr distribution. But I am not sure what this means algebraically.

@argriffing
Copy link
Contributor

No, they are just equal to each other. They are also equal to c/(x**(1+c) + 2*x + x**(1-c)).

@maniteja123
Copy link
Contributor Author

Oops, my bad. You are right indeed! I should go back to revise my basic math :)

What is meant was this.
f(x,-c) = - g(x,c)

f(x,c) = c * x**(c-1) * (1 + x**(c))**(-2)
g(x,c) = c * x**(-c-1) * (1 + x**(-c))**(-2)

Does, that mean that it is special case of both types of distributions ?

@josef-pkt
Copy link
Member

In the two versions the c in front doesn't have a changed sign. So, either something drops out of the integration, or one of them doesn't integrate to one because the normalization constant is wrong.

@argriffing
Copy link
Contributor

In other words, burr 3 and burr 12 are the same when d=1. You can see this in your comment
#5589 (comment)
"With respect to cdf, either of the burr distributions would suffice since they give the same formula for d=1."
For distributions that are plain enough that you don't need to care about measure theory or about the distinction between functions and distributions, if the cdfs are the same then the pdfs are too.

@argriffing
Copy link
Contributor

Does, that mean that it is special case of both types of distributions ?

Yes.

@maniteja123
Copy link
Contributor Author

Yeah got it now. I did see them to be same from cdf but couldn't discern it from the pdf. Thanks a lot for clarifying.

@maniteja123
Copy link
Contributor Author

Now will finally mention that the Fisk distribution is special case of both the Burr III and Burr XII distributions. Also do I need to anything else for the documentation or testing purposes ?

@maniteja123
Copy link
Contributor Author

For distributions that are plain enough that you don't need to care about measure theory or about the distinction between functions and distributions, if the cdfs are the same then the pdfs are too.

Thanks for this. Didn't know it before.

@@ -30,6 +30,7 @@
betaprime -- Beta Prime
bradford -- Bradford
burr -- Burr
burr12 -- Burr12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better expand it: Burr (type III) above, Burr (type XII)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it.


"""
def _pdf(self, x, c, d):
return c*d*(x**(c-1.0))*((1+x**(c*1.0))**(-d-1.0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be replaced with np.exp(self._logpdf(x, c, d))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I thought that logpdf is in general evaluated using pdf but not the other way. Sorry. Will do it. Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the generic code path. But if you define _logpdf explicitly, it might make sense to also explicitly define _pdf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay got it. Thanks. Shall I revert this or let it be as it is then ? And the reason why I explicitly defined logpdf was because of the RuntimeWarning: divide by zero encountered in log error if I use np.log(self._pdf(x, c, d)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you defined the logpdf as Alex suggested. Now you can exponentiate that expression to get the pdf.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure will do it.

@ev-br ev-br added scipy.stats enhancement A new feature or improvement needs-work Items that are pending response from the author labels Jan 3, 2016
@maniteja123
Copy link
Contributor Author

@ev-br I have tried to answer most of the suggestions. Thank you so much for bearing patiently with me. Please do let me know if there is any need for specific tests other than the present tests. I have also ran the tests locally after removing unnecessary floating point operations. They did succeed.

On a side note, there is a hard coded value for the number of scipy stats continuous distributions in here. In case this is merged, probably other PR implementing new distributions might be needed to update because I had faced a failure before when testing after rebasing. Just telling though you already know :)

%(after_notes)s

The Burr type 12 distribution is also sometimes referred to as
the Singh-Maddala distribution from NIST.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josef-pkt has mentioned this in this comment. From a simple search this link came up. Would this be good ?

@codecov-io
Copy link

@@            master   #5654   diff @@
======================================
  Files          234     234       
  Stmts        43092   43111    +19
  Branches      8152    8152       
  Methods          0       0       
======================================
+ Hit          33408   33427    +19
  Partial       2604    2604       
  Missed        7080    7080       

Review entire Coverage Diff as of 5574361

Powered by Codecov. Updated on successful CI builds.

@ev-br ev-br removed the needs-work Items that are pending response from the author label Jan 8, 2016
@ev-br
Copy link
Member

ev-br commented Jan 8, 2016

Would be good to squash the commits.

@maniteja123
Copy link
Contributor Author

Squashed the commits and rebased with master. Please let me know if anything else needs to be done.

ev-br added a commit that referenced this pull request Jan 10, 2016
ENH: Burr12 distribution to stats module
@ev-br ev-br merged commit 5f7c7da into scipy:master Jan 10, 2016
@ev-br
Copy link
Member

ev-br commented Jan 10, 2016

Thanks Maniteja, merged

@ev-br ev-br added this to the 0.18.0 milestone Jan 10, 2016
@maniteja123
Copy link
Contributor Author

Happy to contribute. Thank you Evgeni for patiently reviewing the PR and guiding me :)

@maniteja123
Copy link
Contributor Author

Also the experts @argriffing and @josef-pkt for helping with the statistical background to this novice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants