Skip to content

MAINT: split distributions.py into three files #3134

Merged
merged 6 commits into from Dec 25, 2013

8 participants

@argriffing

This is a redo of #3128 with fewer commits.

@josef-pkt
SciPy member

can you check what git blame shows on your computer? github doesn't track moves or renames.

@pv
SciPy member
pv commented Dec 7, 2013

Putting it in a single commit is probably as good as it gets. git blame -M -C -C still shows alex wrote many lines, but I think we can live with that.

@argriffing

@josef-pkt I'm pretty bad with git, so I don't know what options to use for git blame. The git commit command gave the following interpretation:

 rename scipy/stats/{distributions.py => _continuous_distns.py} (61%)
 create mode 100644 scipy/stats/_discrete_distns.py
 create mode 100644 scipy/stats/_distn_infrastructure.py
 rewrite scipy/stats/distributions.py (99%)

According to some googling, I don't think that git can track blame across splits, although it can track renaming. In this case, I think that it treats distributions.py as renamed to _continuous_distns.py and the other code loses blame information. I am not good with git, so maybe there is a better way to do it.

@josef-pkt
SciPy member

@argriffing I use git gui which has Browse Branch Files that can be used the follow a line through it's history.

git is much better in this than other vcs, It can keep track of cut and paste as long as the two segments don't diverge much.
I'm usually careful with cut and paste in statsmodels, but I don't use blame very often for statsmodels so I don't really know how good git is doing with this.
changing lines after a clean cut and paste is no problem because that's just a new commit that doesn't disrupt that git can track the history.

@coveralls

Coverage Status

Coverage remained the same when pulling 7559f79 on argriffing:retry-split-distributions into fb3a591 on scipy:master.

@argriffing

@josef-pkt I don't have gui access to the machine I'm using for development, but I locally cloned the argriffing/scipy repo and checked out the specific retry-split-distribution branch, and much of the git-gui blame history seems to be available. Opposite of git's stdout output from my commit, the blame history seems almost full for _discrete_distns and _distn_infrastructure but not so great for _continuous_distns.

@argriffing

Because the diffs are hard to read, I'll add some extra commentary that might help reviewing.

The split was easier than I was afraid it might be, but two parts needed more attention than just copying and pasting.

First, the base class of continuous and discrete distributions calls globals() at one point when it needs to do something related to docstring generation (I think), and this broke when the files were split. I patched this by just letting that function take an extra optional parameter which I used to pass the globals() from the file with the derived class (discrete only).

Second, both the continuous and discrete distributions use a noncentral chi square distribution, so I put the logpdf, cdf, and pdf of this class into the third file and called them from the continuous-specific and discrete-specific classes.

@argriffing argriffing and 1 other commented on an outdated diff Dec 7, 2013
scipy/stats/_distn_infrastructure.py
+ raise SyntaxError(
+ 'shapes must be valid python identifiers')
+ else:
+ # find out the call signatures (_pdf, _cdf etc), deduce shape
+ # arguments
+ shapes_list = []
+ for name in names_to_inspect:
+ # look for names in instance methods, then global namespace
+ # the latter is needed for rv_discrete with explicit `values`
+ try:
+ meth = get_method_function(getattr(self, name))
+ except:
+ try:
+ meth = globals()[name]
+ except KeyError:
+ meth = morevars[name]
@argriffing
argriffing added a note Dec 7, 2013

I added this second nested try/except and added the optional argument morevars so that rv_discrete can pass its globals() into this base class function. This works around problems caused by the effect of file splitting on the dictionary returned by globals().

@ev-br
SciPy member
ev-br added a note Dec 7, 2013

IIRC adding this was an attempt to work around the problem with inspecting the signatures of _pdf and friends for discrete sample distribution, where global module-level functions are being attached as instancemethods.

@ev-br
SciPy member
ev-br added a note Dec 7, 2013

Indeed, the first try-except and passing around globals() can be avoided altogether by passing to _construct_argparser the methods themselves, rather than their names.
(No idea why I made it more complicated than it actually needs to be.) Can you have a look at ev-br@7840023? I tried sending a PR against your PR but I don't seem to be able to figure out how to do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@argriffing argriffing commented on an outdated diff Dec 7, 2013
scipy/stats/_distn_infrastructure.py
+
+# Both the continuous and discrete distributions depend on ncx2.
+# I think the function name is a clever abbreviation for noncentral chi squared.
+
+def _ncx2_log_pdf(x, df, nc):
+ a = asarray(df/2.0)
+ fac = (-nc/2.0 - x/2.0 + (a-1)*np.log(x) - a*np.log(2) -
+ special.gammaln(a))
+ return fac + np.nan_to_num(np.log(special.hyp0f1(a, nc * x/4.0)))
+
+def _ncx2_pdf(x, df, nc):
+ return np.exp(_ncx2_log_pdf(x, df, nc))
+
+def _ncx2_cdf(x, df, nc):
+ return special.chndtr(x, df, nc)
+
@argriffing
argriffing added a note Dec 7, 2013

I added these _ncx2_* functions to this base file because they are used by both the discrete and continuous distribution files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@argriffing

I added line comments to the parts that needed special attention when the distributions.py file was split.

@ev-br
SciPy member
ev-br commented Dec 7, 2013

\begin{bikeshedding}
I'd regard the generic code paths of rv_continuous / rv_discrete as infrastructure. Have you considered moving these two classes back to _infrastructure.py, and have _discrete_distns have specific distributions only?
Maybe it'll help with the history as well?
\end{bikeshedding}

@argriffing

@ev-br I don't know know much about distributions.py and I don't have any opinion about where exactly the boundary of the split should be, but if other people agree that the rv_continuous and rv_discrete classes should be moved to the common file before this PR is merged then I'll try to do it.

@argriffing

@ev-br manually incorporated your changes to remove the calls to globals()

@coveralls

Coverage Status

Coverage remained the same when pulling f5b823e on argriffing:retry-split-distributions into fb3a591 on scipy:master.

@rgommers
SciPy member

Overall looks good to me. Some comments:

  • argsreduce and rv_generic were not in the stats namespace before, now they are. I think they should remain private.
  • All three new files have a number of unused imports that can be cleaned up.
  • I agree that rv_continuous and rv_discrete belong with the infrastructure.
  • git blame -M -C -C still gives useful output. May be worth a comment in distributions.py though that this is the way to use git blame; git blame -Lxxx,+x (which I normally use) doesn't help anymore.
@rgommers
SciPy member

Removal of globals call makes sense.

All tests pass also with numpy 1.5.1

@coveralls

Coverage Status

Coverage remained the same when pulling 36282ee on argriffing:retry-split-distributions into fb3a591 on scipy:master.

@coveralls

Coverage Status

Coverage remained the same when pulling 329f102 on argriffing:retry-split-distributions into fb3a591 on scipy:master.

@coveralls

Coverage Status

Coverage remained the same when pulling 82d0a18 on argriffing:retry-split-distributions into fb3a591 on scipy:master.

@argriffing

@rgommers the PR is updated according to your comments

@coveralls

Coverage Status

Coverage remained the same when pulling 82d0a18 on argriffing:retry-split-distributions into fb3a591 on scipy:master.

@rgommers rgommers merged commit 9f48142 into scipy:master Dec 25, 2013

1 check passed

Details default The Travis CI build passed
@rgommers
SciPy member

This looks good to me now, merging. Thanks Alex.

@WarrenWeckesser

@argriffing: This was a much needed, long overdue refactoring job. Thanks!

@argriffing

This seems to have backwards incompatibly changed stats.distributions.__all__ somehow https://github.com/pymc-devs/pymc/issues/454.

@rgommers
SciPy member
rgommers commented Feb 7, 2014

Fixed again by gh-3292

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.