refactor stats/distributions.py #2724

ev-br · 2013-08-14T06:41:33Z

Deduplicate a bit of code in stats/distributions.py: move exact duplicates from rv_continuous and rv_discrete to rv_generic, fix a couple of typos.

josef-pkt · 2013-08-14T08:44:17Z

scipy/stats/tests/test_continuous_basic.py

+    npt.assert_equal(distfn.logsf(x, *args), [0.0, -np.inf])
+
+    npt.assert_equal(distfn.ppf([0.0, 1.0], *args), x)
+    npt.assert_equal(distfn.isf([0.0, 1.0], *args), x[::-1])


I think this is wrong for discrete distributions. (I always have to draw graphs and look at the definition, and never remember exactly)

scipy 0.0

>>> stats.poisson.isf(1, 1) -1.0 >>> stats.poisson.ppf(0, 1) -1.0 >>>

@josef-pkt yes, indeed. That's why these tests are only run for the continuous distributions. In fact these tests can be removed: these were basically for me to see if there's a clean way of moving cdf and friends to rv_generic (and no, one of the blockers is the different behaviour at the edges of the support)

OK, I didn't realize it's only in the continuous tests.

I think the tests are good, I don't think the edge behavior is tested otherwise.

josef-pkt · 2013-08-14T08:54:57Z

There is a lot of moving around, that's difficult to review.

Needs a check of test coverage, because I would have thought that we cannot combine these for discrete and continuous.
entropy doesn't have an generic tests AFAIR

ev-br · 2013-08-14T11:40:09Z

@josef-pkt That's precisely why I kept very atomic commits, almost one commit per function moved. Does looking at individual commits help?
Moreover, commit messages list (I think) all changes in functionality other than copy-paste.

Entropy: deliberately breaking the generic entropy (returning 42*output instead of output) produces four test failures in test_distributions (TestBinom, TestHypergeom, TestBernoulli, and TestRvDiscrete).

josef-pkt · 2013-08-14T11:51:35Z

Entropy: deliberately breaking the generic entropy (returning 42*output instead of output) produces four test failures in test_distributions (TestBinom, TestHypergeom, TestBernoulli, and TestRvDiscrete).

But IIRC they have explicit implementation of entropy and don't use the generic path.

a basic test for the changes of entropy would be to compare .entropy with the generic implementation .vecentropy or whatever that is called.

I will try to go over the changes later today or in the next days.

ev-br · 2013-08-14T12:47:30Z

@josef-pkt Is this sort of what you mean (line 463 of test_continuous_basic.py):

+def check_vecentropy(distfn, args):
+    npt.assert_equal( distfn.vecentropy(*args), distfn._entropy(*args) )

ev-br · 2013-08-21T11:59:43Z

To help reviewing this, here's a gist having three versions of the functions moved: the first 'revision' is functions from the rv_discrete of the current master, the second 'revision' is the same functions from rv_continuous (master as well), and the third revision is rv_generic from the PR.
https://gist.github.com/bburlo/6292751
Hope it provide at least some line-by-line diffs.

josef-pkt · 2013-08-21T12:28:46Z

@josef-pkt https://github.com/josef-pkt Is this sort of what you mean
(line 463 of test_continuous_basic.py):

+def check_vecentropy(distfn, args):

npt.assert_equal( distfn.vecentropy(_args), distfn._entropy(_args) )

Yes, that should check the generic (numerical integration) solution with
the explicit ones, when they are defined

two comments:

I expect that there are several bugs in specific distributions, but this
should be a better way of identifying the cases without too many false
positives
this is a "tautology" if a distribution doesn't specify _entropy, it
just calculates twice the same thing.

To the second point: Since you are working on this now, maybe you can pick
up an idea that I never implemented: Inspect whether the specific
distribution defines the ._xxx method *) and only then compare with the
generic implementation like in this case.
This would also be useful for the higher order moments k=3,4 and skew and
kurtosis in stats.
*) I asked the question on the dev (IIRC) mailing list, and Robert Kern
answered how to find which class in the class hierarchy defines a method.

ev-br · 2013-08-21T14:25:31Z

@josef-pkt Agree, these tests not failing might indicate a tautology rather than a bug. Nonetheless, none of them fail at the moment --- so that all distributions which do define _entropy seem to be in the clear.

Avoiding the tautology is definitely doable (by walking the __bases__ or maybe there's a smarter way), but is it critical for the tests really?

ev-br · 2013-08-21T14:27:52Z

$ grep 'def _entropy' distributions.py |wc -l
48

Subtract two for rv_discrete & rv_continuous.

josef-pkt · 2013-08-21T14:47:01Z

I'm very surprised that all the tests pass, I thought I saw some failures with random tests before (but maybe my information is outdated, there have been several bug-fixes since I looked at this.)

Avoiding the tautology is not critical at all, more of a cleanup job to avoid unnecessary time used by the test suite.

ev-br · 2013-08-21T15:22:01Z

@josef-pkt ooops, my bad. it was only enabled for continuous distributions with numargs==0. Have just pushed a commit with these tests enabled for all distributions (other than those marked as 'slow' --- these tests do not run as a part of 'usual' test suite).

Locally all tests pass (python2.7, numpy1.7.1). There are several floating-point warnings (foldnorm and logistic) --- both spurious in the sense you mentioned.

ev-br · 2013-08-21T15:26:11Z

for the tautology: do you think it would make sense to open a separate issue for cleaning up the test suite?

josef-pkt · 2013-08-21T15:43:31Z

Locally all tests pass

Sounds too good to be true, without verifying

I think I interpreted vecentropy wrongly, it just delegates to the _entropy method, not to the generic calculations like vecppf and veccdf
https://github.com/scipy/scipy/blob/master/scipy/stats/distributions.py#L1069

looks like all tautologies (except for vectorization)

I don't know yet how to access the generic calculation. I never really looked at entropy very closely.

josef-pkt · 2013-08-21T15:55:41Z

for the tautology: do you think it would make sense to open a separate
issue for cleaning up the test suite?

Yes,
In a quick search I only found gh-1795 but I thought there are several more
for improving the stats.distributions test suite

josef-pkt · 2013-08-21T16:01:11Z

I don't know why entropy follows a different pattern.
related: I was puzzled why entropy needs the vectorize workaround when ppf and cdf don't, mentioned here gh-2702
but reviewing that sounds like more than this PR needs to handle

josef-pkt · 2013-08-21T19:35:02Z

see gh-2765 for a quick try on the entropy bugs (after outsourcing the numerical integration code for _entropy)

rgommers · 2013-09-15T14:41:16Z

I think it's time to finish reviewing and merge this. @EvgeniBurovski could you rebase it first? Also, with an interactive rebase you can remove the TST: enable check_vecentropy commit instead of reverting it.

Lift moment, _munp, __call__ and freeze to rv_generic.

rv_continuous version had a guard for vecentropy w/ numargs ==0, kept it.

Changed betwen rv_continuous and rv_discrete were: * rv_c had guards for nan**1.5; kept. * rv_c had a shortcut for no valid arguments; kept.

Also trivial renamings: _veccdf vs _cdfvec etc.

rv_discrete(rv_continuous) version used bitwise(logical) and. Kept the logical_and.

…support

ev-br · 2013-09-15T16:30:35Z

@rgommers done. I understand it's not easy to review, let me know if there's something I can do to simplify it (https://gist.github.com/EvgeniBurovski/6292751)

some individual commits here show leftovers from conflict merges, but the final code does not have any. Let me know if this is ok.

The last commit (0fa9789) differs from the rest since it's not just copy-paste, it actually does change things a little (confines inspect to __init__s only)

rgommers · 2013-09-15T19:54:42Z

No worries, I think this is reviewable commit by commit. And our test suite has expanded quite a bit recently, which also helps.

rgommers · 2013-09-22T10:00:56Z

This all looks good. Moving the inspect makes sense, doesn't really help for speed but keeps the magic in one place. The leftover merge thinks are not that bad - they will make git bisect skip a commit here and there, but other than that it doesn't really hurt.

rgommers · 2013-09-22T10:03:17Z

Useful cleanup, some things had diverged where they shouldn't have. Thanks @EvgeniBurovski

refactor stats/distributions.py

josef-pkt reviewed Aug 14, 2013
View reviewed changes

ev-br mentioned this pull request Aug 19, 2013

BUG: keyword arguments in moment methods of distributions #2753

Closed

ev-br added 9 commits September 15, 2013 17:14

MAINT: remove a bit of code duplication; remove import numpy, math

d1c406d

Lift moment, _munp, __call__ and freeze to rv_generic.

MAINT: Lift entropy to rv_generic.

9c97b6f

rv_continuous version had a guard for vecentropy w/ numargs ==0, kept it.

MAINT: Lift the stats method to rv_generic.

8916f38

Changed betwen rv_continuous and rv_discrete were: * rv_c had guards for nan**1.5; kept. * rv_c had a shortcut for no valid arguments; kept.

MAINT: lift generic _ppf and friends to rv_generic

36cbbee

Also trivial renamings: _veccdf vs _cdfvec etc.

MAINT: lift _argcheck to rv_generic.

8a11d9a

rv_discrete(rv_continuous) version used bitwise(logical) and. Kept the logical_and.

TST: reenable a dlaplace chi-square test; add tests for the edges of …

899c586

…support

DOC: typos in the discrete logsf docstring

a262b4c

DOC: correct the docstring of _construct_argparser

54b849d

MAINT: remove inspect calls from regular code paths (.stats & .moment)

0fa9789

ev-br mentioned this pull request Sep 15, 2013

[stats] fix formulas for higher moments of dweibull distribution #2878

Merged

rgommers added a commit that referenced this pull request Sep 22, 2013

Merge pull request #2724 from EvgeniBurovski/stats_refactor

29f1330

refactor stats/distributions.py

rgommers merged commit 29f1330 into scipy:master Sep 22, 2013

ev-br deleted the stats_refactor branch August 12, 2015 14:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor stats/distributions.py #2724

refactor stats/distributions.py #2724

ev-br commented Aug 14, 2013

josef-pkt Aug 14, 2013

ev-br Aug 14, 2013

josef-pkt Aug 14, 2013

josef-pkt Aug 14, 2013

josef-pkt commented Aug 14, 2013

ev-br commented Aug 14, 2013

josef-pkt commented Aug 14, 2013

ev-br commented Aug 14, 2013

ev-br commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

ev-br commented Aug 21, 2013

ev-br commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

ev-br commented Aug 21, 2013

ev-br commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

rgommers commented Sep 15, 2013

ev-br commented Sep 15, 2013

rgommers commented Sep 15, 2013

rgommers commented Sep 22, 2013

rgommers commented Sep 22, 2013

refactor stats/distributions.py #2724

refactor stats/distributions.py #2724

Conversation

ev-br commented Aug 14, 2013

josef-pkt Aug 14, 2013

Choose a reason for hiding this comment

ev-br Aug 14, 2013

Choose a reason for hiding this comment

josef-pkt Aug 14, 2013

Choose a reason for hiding this comment

josef-pkt Aug 14, 2013

Choose a reason for hiding this comment

josef-pkt commented Aug 14, 2013

ev-br commented Aug 14, 2013

josef-pkt commented Aug 14, 2013

ev-br commented Aug 14, 2013

ev-br commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

ev-br commented Aug 21, 2013

ev-br commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

ev-br commented Aug 21, 2013

ev-br commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

josef-pkt commented Aug 21, 2013

rgommers commented Sep 15, 2013

ev-br commented Sep 15, 2013

rgommers commented Sep 15, 2013

rgommers commented Sep 22, 2013

rgommers commented Sep 22, 2013