Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MAINT: split distributions.py into three files #3134

Merged
merged 6 commits into from Dec 25, 2013

Conversation

argriffing
Copy link
Contributor

This is a redo of #3128 with fewer commits.

@josef-pkt
Copy link
Member

can you check what git blame shows on your computer? github doesn't track moves or renames.

@pv
Copy link
Member

pv commented Dec 7, 2013

Putting it in a single commit is probably as good as it gets. git blame -M -C -C still shows alex wrote many lines, but I think we can live with that.

@argriffing
Copy link
Contributor Author

@josef-pkt I'm pretty bad with git, so I don't know what options to use for git blame. The git commit command gave the following interpretation:

 rename scipy/stats/{distributions.py => _continuous_distns.py} (61%)
 create mode 100644 scipy/stats/_discrete_distns.py
 create mode 100644 scipy/stats/_distn_infrastructure.py
 rewrite scipy/stats/distributions.py (99%)

According to some googling, I don't think that git can track blame across splits, although it can track renaming. In this case, I think that it treats distributions.py as renamed to _continuous_distns.py and the other code loses blame information. I am not good with git, so maybe there is a better way to do it.

@josef-pkt
Copy link
Member

@argriffing I use git gui which has Browse Branch Files that can be used the follow a line through it's history.

git is much better in this than other vcs, It can keep track of cut and paste as long as the two segments don't diverge much.
I'm usually careful with cut and paste in statsmodels, but I don't use blame very often for statsmodels so I don't really know how good git is doing with this.
changing lines after a clean cut and paste is no problem because that's just a new commit that doesn't disrupt that git can track the history.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 7559f79 on argriffing:retry-split-distributions into fb3a591 on scipy:master.

@argriffing
Copy link
Contributor Author

@josef-pkt I don't have gui access to the machine I'm using for development, but I locally cloned the argriffing/scipy repo and checked out the specific retry-split-distribution branch, and much of the git-gui blame history seems to be available. Opposite of git's stdout output from my commit, the blame history seems almost full for _discrete_distns and _distn_infrastructure but not so great for _continuous_distns.

@argriffing
Copy link
Contributor Author

Because the diffs are hard to read, I'll add some extra commentary that might help reviewing.

The split was easier than I was afraid it might be, but two parts needed more attention than just copying and pasting.

First, the base class of continuous and discrete distributions calls globals() at one point when it needs to do something related to docstring generation (I think), and this broke when the files were split. I patched this by just letting that function take an extra optional parameter which I used to pass the globals() from the file with the derived class (discrete only).

Second, both the continuous and discrete distributions use a noncentral chi square distribution, so I put the logpdf, cdf, and pdf of this class into the third file and called them from the continuous-specific and discrete-specific classes.

try:
meth = globals()[name]
except KeyError:
meth = morevars[name]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this second nested try/except and added the optional argument morevars so that rv_discrete can pass its globals() into this base class function. This works around problems caused by the effect of file splitting on the dictionary returned by globals().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC adding this was an attempt to work around the problem with inspecting the signatures of _pdf and friends for discrete sample distribution, where global module-level functions are being attached as instancemethods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the first try-except and passing around globals() can be avoided altogether by passing to _construct_argparser the methods themselves, rather than their names.
(No idea why I made it more complicated than it actually needs to be.) Can you have a look at ev-br@7840023? I tried sending a PR against your PR but I don't seem to be able to figure out how to do it.

@argriffing
Copy link
Contributor Author

I added line comments to the parts that needed special attention when the distributions.py file was split.

@ev-br
Copy link
Member

ev-br commented Dec 7, 2013

\begin{bikeshedding}
I'd regard the generic code paths of rv_continuous / rv_discrete as infrastructure. Have you considered moving these two classes back to _infrastructure.py, and have _discrete_distns have specific distributions only?
Maybe it'll help with the history as well?
\end{bikeshedding}

@argriffing
Copy link
Contributor Author

@ev-br I don't know know much about distributions.py and I don't have any opinion about where exactly the boundary of the split should be, but if other people agree that the rv_continuous and rv_discrete classes should be moved to the common file before this PR is merged then I'll try to do it.

@argriffing
Copy link
Contributor Author

@ev-br manually incorporated your changes to remove the calls to globals()

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling f5b823e on argriffing:retry-split-distributions into fb3a591 on scipy:master.

@rgommers
Copy link
Member

Overall looks good to me. Some comments:

  • argsreduce and rv_generic were not in the stats namespace before, now they are. I think they should remain private.
  • All three new files have a number of unused imports that can be cleaned up.
  • I agree that rv_continuous and rv_discrete belong with the infrastructure.
  • git blame -M -C -C still gives useful output. May be worth a comment in distributions.py though that this is the way to use git blame; git blame -Lxxx,+x (which I normally use) doesn't help anymore.

@rgommers
Copy link
Member

Removal of globals call makes sense.

All tests pass also with numpy 1.5.1

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 36282ee on argriffing:retry-split-distributions into fb3a591 on scipy:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 329f102 on argriffing:retry-split-distributions into fb3a591 on scipy:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 82d0a18 on argriffing:retry-split-distributions into fb3a591 on scipy:master.

@argriffing
Copy link
Contributor Author

@rgommers the PR is updated according to your comments

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 82d0a18 on argriffing:retry-split-distributions into fb3a591 on scipy:master.

rgommers added a commit that referenced this pull request Dec 25, 2013
MAINT: split distributions.py into three files
@rgommers rgommers merged commit 9f48142 into scipy:master Dec 25, 2013
@rgommers
Copy link
Member

This looks good to me now, merging. Thanks Alex.

@WarrenWeckesser
Copy link
Member

@argriffing: This was a much needed, long overdue refactoring job. Thanks!

@argriffing
Copy link
Contributor Author

This seems to have backwards incompatibly changed stats.distributions.__all__ somehow pymc-devs/pymc#454.

@rgommers
Copy link
Member

rgommers commented Feb 7, 2014

Fixed again by gh-3292

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants