ENH: Burr12 distribution to stats module #5654
The following is the traceback in the tests in stats
|
The log pdf, survival functions, etc. could be written manually (i.e., something like |
Thanks a lot @argriffing. I didn't think of writing them on my own since most of them use |
it should also mention IIRC, I have seen articles that use Singh-Maddala as name. |
I did use the special functions to define
if log1p is used for calculating Also, one clarification regarding the fisk distribution. The present documentation says that it is a |
It would help to see more of the traceback, but I'll make a guess. I'm guessing that there is scipy stats testing code that tries to compute a derivative for whatever reason which is using a trick based on imaginary numbers that is interacting poorly with the log1p function that you are using. Maybe this is considered a scipy/numpy bug?
|
Sorry for lack of clarity. this is the total traceback
This is what I have used for
|
Yes the test framework is trying to use imaginary numbers to numerically approximate a derivative. I think the failure is a scipy bug or a missing feature so I've opened #5655. |
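For context, the complex-step trick mentioned above estimates f'(x) as Im f(x + ih) / h, so every function in the chain has to accept complex input. The sketch below is a plain-Python illustration and not the scipy test code: the helper names are hypothetical, and math.log1p is used only as a stand-in for any log1p that handles real arguments alone.

```python
import math

def burr12_cdf(x, c, d):
    # Burr XII CDF, 1 - (1 + x**c)**(-d). It uses only arithmetic and **,
    # so it also accepts a complex x.
    return 1 - (1 + x**c) ** (-d)

def burr12_pdf(x, c, d):
    # Burr XII PDF, the same expression as _pdf in this PR.
    return c * d * x ** (c - 1) * (1 + x**c) ** (-d - 1)

def complex_step_derivative(f, x, h=1e-20):
    # Hypothetical helper: f'(x) ~= Im(f(x + i*h)) / h.
    # There is no subtraction of nearby values, so h can be tiny.
    return f(complex(x, h)).imag / h

x, c, d = 1.5, 2.0, 3.0
approx_pdf = complex_step_derivative(lambda z: burr12_cdf(z, c, d), x)
exact_pdf = burr12_pdf(x, c, d)

# A real-only log1p rejects the complex argument, which breaks the trick.
try:
    math.log1p(complex(x, 1e-20))
    real_only_log1p_failed = False
except TypeError:
    real_only_log1p_failed = True
```

Here the derivative of the CDF recovers the PDF because the CDF is built from operations that extend to complex numbers. Any function in the chain that only accepts real input stops the check, which is consistent with the failure described in this thread.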
This test has an explicit list of exceptions (a loong one, unfortunately). Edit: I feel I need to apologize for the original version of this comment, now edited: I should probably spell check better the result of the spell check and autocorrect helper on my phone. |
@maniteja123 You can work around the incompleteness of the log1p implementation by adding burr12 to the |
Thanks @argriffing for the workaround. Could you please look into the relation between the fisk distribution and the burr distribution? |
The scipy fisk distribution has 1 shape parameter, but the fisk distribution as explained in https://en.wikipedia.org/wiki/Champernowne_distribution appears to have a couple of shape parameters. Unless its x0 is somehow actually a disguised scale parameter. My cynical guess is that "the fisk distribution" is not necessarily a single well-defined thing, but rather it could mean slightly different things to people in slightly different subfields. |
Oh, but can you look at this https://www.wikiwand.com/en/Log-logistic_distribution
while the documentation http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.fisk.html says
EDIT : Initially wrote the wrong formula. and the pdf for burr type 12 is this
while pdf for burr type 3 is
|
That's not what I see on that link... btw I now see that in the champernowne link I posted above x0 is indeed a disguised scale parameter and not a shape parameter. |
Apologies, I got confused with so many formulas. I have updated it. The link you provided, http://www.wikiwand.com/en/Champernowne_distribution#/Distribution_of_income, also shows the same formula as https://www.wikiwand.com/en/Log-logistic_distribution |
@maniteja123 could you double check that |
Ah, I see it now @argriffing , thanks..
If we put -c in second formula, we get the first one. So, basically the shape parameter is made negative for the other type of Burr distribution. But I am not sure what this means algebraically. |
No, they are just equal to each other. They are also equal to |
Oops, my bad. You are right indeed! I should go back and revise my basic math :) What I meant was this.
Does that mean that it is a special case of both types of distributions? |
In the two versions the |
In other words, burr 3 and burr 12 are the same when d=1. You can see this in your comment |
Yes. |
Yeah got it now. I did see them to be same from cdf but couldn't discern it from the pdf. Thanks a lot for clarifying. |
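The equivalence discussed above can be checked numerically. This is a minimal sketch that assumes the standard textbook densities for Burr III, Burr XII and Fisk (log-logistic); the formulas are written out by hand here and are not copied from the comments in this thread, and the function names are hypothetical.

```python
def burr3_pdf(x, c, d):
    # Burr type III density: c*d*x**(-c-1) / (1 + x**(-c))**(d+1)
    return c * d * x ** (-c - 1) * (1 + x ** (-c)) ** (-d - 1)

def burr12_pdf(x, c, d):
    # Burr type XII density: c*d*x**(c-1) / (1 + x**c)**(d+1)
    return c * d * x ** (c - 1) * (1 + x**c) ** (-d - 1)

def fisk_pdf(x, c):
    # Fisk (log-logistic) density: c*x**(c-1) / (1 + x**c)**2
    return c * x ** (c - 1) / (1 + x**c) ** 2

c = 2.5
points = [0.1, 0.5, 1.0, 2.0, 10.0]

# With d = 1 the two Burr densities coincide, and both equal the Fisk density.
max_gap_d1 = max(abs(burr3_pdf(x, c, 1.0) - burr12_pdf(x, c, 1.0)) for x in points)
max_gap_fisk = max(abs(burr12_pdf(x, c, 1.0) - fisk_pdf(x, c)) for x in points)

# With d != 1 the two Burr types are genuinely different distributions.
gap_d2 = abs(burr3_pdf(2.0, c, 2.0) - burr12_pdf(2.0, c, 2.0))
```

Algebraically, multiplying the numerator and denominator of the Burr III density by x**(2c) when d = 1 gives the Burr XII form, which is why the identity is easier to spot from the CDF than from the PDF.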
I will now finally mention that the Fisk distribution is a special case of both the Burr III and Burr XII distributions. Also, do I need to do anything else for documentation or testing purposes? |
Thanks for this. Didn't know it before. |
@@ -30,6 +30,7 @@
betaprime -- Beta Prime
bradford -- Bradford
burr -- Burr
burr12 -- Burr12
Better expand it: Burr (type III) above, Burr (type XII)
Changed it.
|
    """
    def _pdf(self, x, c, d):
        return c*d*(x**(c-1.0))*((1+x**(c*1.0))**(-d-1.0))
This can be replaced with np.exp(self._logpdf(x, c, d))
Oh, I thought that logpdf is in general evaluated using pdf but not the other way. Sorry. Will do it. Thanks.
Yes, this is the generic code path. But if you define _logpdf explicitly, it might make sense to also explicitly define _pdf
Okay, got it. Thanks. Shall I revert this or leave it as it is? The reason I explicitly defined logpdf was the RuntimeWarning: divide by zero encountered in log that I get if I use np.log(self._pdf(x, c, d)).
OK, you defined the logpdf as Alex suggested. Now you can exponentiate that expression to get the pdf.
Sure will do it.
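The suggestion in this review thread, defining the log density explicitly and exponentiating it to get the density, can be sketched as follows. This is a standalone illustration with hypothetical function names, not the code that was merged. It uses the math module, where log(0.0) raises ValueError; NumPy would instead emit the divide-by-zero RuntimeWarning quoted above.

```python
import math

def burr12_pdf_direct(x, c, d):
    # Direct formula, as in the original _pdf of this PR.
    return c * d * x ** (c - 1) * (1 + x**c) ** (-d - 1)

def burr12_logpdf(x, c, d):
    # log(c) + log(d) + (c - 1)*log(x) - (d + 1)*log(1 + x**c)
    return (math.log(c) + math.log(d) + (c - 1) * math.log(x)
            - (d + 1) * math.log1p(x**c))

def burr12_pdf(x, c, d):
    # Density obtained by exponentiating the explicit log density.
    return math.exp(burr12_logpdf(x, c, d))

c, d = 2.0, 3.0
far_tail = 1e100

# In the far tail the direct density underflows to zero,
# so taking its log afterwards fails.
underflowed = burr12_pdf_direct(far_tail, c, d)
try:
    math.log(underflowed)
    naive_log_failed = False
except ValueError:
    naive_log_failed = True

# The explicit log density stays finite at the same point.
tail_logpdf = burr12_logpdf(far_tail, c, d)

# Away from the tail both routes to the density agree.
gap = abs(burr12_pdf(1.5, c, d) - burr12_pdf_direct(1.5, c, d))
```

Computing the log density first keeps the far tail representable, and the density derived from it matches the direct formula wherever the latter does not underflow.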
@ev-br I have tried to address most of the suggestions. Thank you so much for bearing with me patiently. Please let me know if any specific tests are needed beyond the present ones. I have also run the tests locally after removing unnecessary floating point operations, and they passed. On a side note, there is a hard-coded value for the number of scipy stats continuous distributions in here. If this is merged, other PRs implementing new distributions will probably need to be updated, because I hit a failure when testing after rebasing. Just mentioning it, though you already know :) |
%(after_notes)s

The Burr type 12 distribution is also sometimes referred to as
the Singh-Maddala distribution from NIST.
Reference?
@josef-pkt has mentioned this in this comment. From a simple search this link came up. Would this be good ?
@@ master #5654 diff @@
======================================
Files 234 234
Stmts 43092 43111 +19
Branches 8152 8152
Methods 0 0
======================================
+ Hit 33408 33427 +19
Partial 2604 2604
Missed 7080 7080
|
Would be good to squash the commits. |
Squashed the commits and rebased with master. Please let me know if anything else needs to be done. |
ENH: Burr12 distribution to stats module
Thanks Maniteja, merged |
Happy to contribute. Thank you Evgeni for patiently reviewing the PR and guiding me :) |
Also the experts @argriffing and @josef-pkt for helping with the statistical background to this novice. |
Add burr12 distribution as suggested in gh-5589.
Currently, I skipped the ci build since some tests are failing.
In _distr_params.py, the default params need to be greater than zero, else a domain error is raised. But for this distribution, for positive values of c and d, it seems cdf and pdf are giving negative values, thereby causing log_cdf and log_pdf to raise a RuntimeWarning: invalid value encountered in log.
I am not an expert in stats and learnt about this distribution while doing this. But I created a PR anyway so that I can ask for suggestions and reviews. Any leads are appreciated.
Thanks !