refactor stats/distributions.py #2724
npt.assert_equal(distfn.logsf(x, *args), [0.0, -np.inf])

npt.assert_equal(distfn.ppf([0.0, 1.0], *args), x)
npt.assert_equal(distfn.isf([0.0, 1.0], *args), x[::-1])
I think this is wrong for discrete distributions. (I always have to draw graphs and look at the definition, and never remember exactly)
scipy 0.0
>>> stats.poisson.isf(1, 1)
-1.0
>>> stats.poisson.ppf(0, 1)
-1.0
>>>
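For comparison, here is a minimal sketch of the edge behaviour under discussion. It assumes a reasonably recent SciPy in which the values quoted above are unchanged; the choice of `uniform` as the continuous example and the variable names are illustrative and not part of the PR.

```python
import numpy as np
from scipy import stats

q = np.array([0.0, 1.0])

# Continuous case: the quantiles at 0 and 1 are the endpoints of the support,
# which is what the assertions in the diff rely on.
cont_ppf = stats.uniform.ppf(q)
cont_isf = stats.uniform.isf(q)

# Discrete case: the left edge comes out one step below the support,
# so the same assertions would not hold for rv_discrete.
disc_ppf = stats.poisson.ppf(q, 1)
disc_isf = stats.poisson.isf(q, 1)

print("uniform ppf:", cont_ppf, "isf:", cont_isf)
print("poisson ppf:", disc_ppf, "isf:", disc_isf)
```

The discrete results differ from the support endpoints at the lower edge, which is the asymmetry that blocks sharing a single edge-handling code path.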
@josef-pkt yes, indeed. That's why these tests are only run for the continuous distributions. In fact these tests can be removed: these were basically for me to see if there's a clean way of moving cdf and friends to rv_generic (and no, one of the blockers is the different behaviour at the edges of the support).
OK, I didn't realize it's only in the continuous tests.
I think the tests are good, I don't think the edge behavior is tested otherwise.
There is a lot of moving around, that's difficult to review. Needs a check of test coverage, because I would have thought that we cannot combine these for discrete and continuous. |
@josef-pkt That's precisely why I kept very atomic commits, almost one commit per function moved. Does looking at individual commits help? Entropy: deliberately breaking the generic |
But IIRC they have explicit implementation of entropy and don't use the generic path. a basic test for the changes of entropy would be to compare I will try to go over the changes later today or in the next days. |
@josef-pkt Is this sort of what you mean (line 463 of test_continuous_basic.py):
To help reviewing this, here's a gist having three versions of the functions moved: the first 'revision' is functions from the
two comments:
To the second point: Since you are working on this now, maybe you can pick
@josef-pkt Agree, these tests not failing might indicate a tautology rather than a bug. Nonetheless, none of them fail at the moment --- so that all distributions which do define _entropy seem to be in the clear. Avoiding the tautology is definitely doable (by walking the
Subtract two for
I'm very surprised that all the tests pass, I thought I saw some failures with random tests before (but maybe my information is outdated, there have been several bug-fixes since I looked at this.) Avoiding the tautology is not critical at all, more of a cleanup job to avoid unnecessary time used by the test suite.
@josef-pkt oops, my bad. It was only enabled for continuous distributions with Locally all tests pass (python2.7, numpy1.7.1). There are several floating-point warnings (
For the tautology: do you think it would make sense to open a separate issue for cleaning up the test suite?
Sounds too good to be true, without verifying I think I interpreted looks like all tautologies (except for vectorization) I don't know yet how to access the generic calculation. I never really looked at entropy very closely.
Yes,
I don't know why entropy follows a different pattern.
See gh-2765 for a quick try on the entropy bugs (after outsourcing the numerical integration code for _entropy)
I think it's time to finish reviewing and merge this. @EvgeniBurovski could you rebase it first? Also, with an interactive rebase you can remove the
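One possible way to avoid the tautology, sketched under assumptions: check whether a distribution overrides `_entropy`, and only then compare `entropy()` against an independent quadrature of -pdf * log(pdf). The helper names `overrides_entropy` and `entropy_by_quadrature` are hypothetical, and the identity check against `rv_continuous._entropy` is an assumption about how the generic path can be detected, not the approach taken in this PR.

```python
import numpy as np
from scipy import integrate, stats


def overrides_entropy(dist):
    """Return True if the distribution's class defines its own _entropy."""
    return type(dist)._entropy is not stats.rv_continuous._entropy


def entropy_by_quadrature(dist, *args):
    """Integrate -pdf * log(pdf) over the support, bypassing dist.entropy."""
    lower, upper = dist.support(*args)

    def integrand(x):
        p = dist.pdf(x, *args)
        # Treat 0 * log(0) as 0 where the density underflows.
        return -p * np.log(p) if p > 0 else 0.0

    value, _abserr = integrate.quad(integrand, lower, upper)
    return value


# Only distributions with an explicit _entropy give a non-tautological check.
for dist in (stats.norm, stats.expon):
    if overrides_entropy(dist):
        np.testing.assert_allclose(
            dist.entropy(), entropy_by_quadrature(dist), rtol=1e-6
        )
```

Distributions that fall back to the generic path would be skipped by this loop, since comparing the generic integration with itself proves nothing.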
Lift moment, _munp, __call__ and freeze to rv_generic.
The rv_continuous version had a guard for vecentropy with numargs == 0; kept it.
Changes between rv_continuous and rv_discrete were:
* rv_c had guards for nan**1.5; kept.
* rv_c had a shortcut for no valid arguments; kept.
Also trivial renamings: _veccdf vs _cdfvec etc.
The rv_discrete (rv_continuous) version used bitwise (logical) and. Kept the logical_and.
@rgommers done. I understand it's not easy to review, let me know if there's something I can do to simplify it (https://gist.github.com/EvgeniBurovski/6292751). Some individual commits here show leftovers from conflict merges, but the final code does not have any. Let me know if this is ok. The last commit (0fa9789) differs from the rest since it's not just copy-paste, it actually does change things a little (confines
No worries, I think this is reviewable commit by commit. And our test suite has expanded quite a bit recently, which also helps.
This all looks good. Moving the
Useful cleanup, some things had diverged where they shouldn't have. Thanks @EvgeniBurovski
refactor stats/distributions.py
Deduplicate a bit of code in stats/distributions.py: move exact duplicates from rv_continuous and rv_discrete to rv_generic, fix a couple of typos.
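The shape of this refactoring can be illustrated with a toy sketch. All class names below are hypothetical stand-ins, not SciPy code: the shared `moment` logic lives once in the base class, and each subclass supplies only its own `_munp` hook.

```python
class GenericDist:
    """Hypothetical stand-in for rv_generic: code shared by both kinds."""

    def moment(self, n):
        # Previously duplicated in both subclasses; now defined once here.
        if n < 0:
            raise ValueError("moment order must be non-negative")
        if n == 0:
            return 1.0
        return self._munp(n)

    def _munp(self, n):
        raise NotImplementedError


class ContinuousUniform(GenericDist):
    """Uniform on [0, 1]: E[X**n] = 1 / (n + 1)."""

    def _munp(self, n):
        return 1.0 / (n + 1)


class DiscreteCoin(GenericDist):
    """Fair coin on {0, 1}: the n-th moment is a finite sum."""

    def _munp(self, n):
        return sum(k**n * 0.5 for k in (0, 1))


print(ContinuousUniform().moment(2), DiscreteCoin().moment(3))
```

Only behaviour that is identical in both subclasses is lifted; anything that differs, such as the edge handling of ppf and isf discussed earlier in the thread, stays in the subclass.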