Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Refactored so that there is no longer a need for 2to3 #1520

Merged
merged 31 commits into from Apr 10, 2014

Conversation

Projects
None yet
5 participants
@bashtage
Copy link
Contributor

commented Mar 26, 2014

The codebase has been refactored so that it can exist in a single, unified base.
The strategy closely followed six and pandas.compat, but does not involve any further
dependencies.

The compatability location has been renamed form compatnp, which housed compatnp.py3k, to
just compat, and the main py3k compatability files are in init.py so that they can
be directly accessed

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

oh no, why did you merge all commits.

It's difficult to see which are innocent changes, and which need closer review

https://github.com/scipy/scipy/pull/397/commits

from statsmodels.graphics.gofplots import qqplot, qqplot_2samples, qqline, ProbPlot
from statsmodels.graphics import api as graphics
from statsmodels.stats import api as stats
from statsmodels.emplike import api as emplike

This comment has been minimized.

Copy link
@jseabold

jseabold Mar 26, 2014

Member

Are these changes strictly necessary? Explicit relative imports are a little easier to read IMO.

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2014

They are all innocent :)

On Mar 26, 2014 7:33 PM, Josef Perktold notifications@github.com wrote:

oh no, why did you merge all commits.

It's difficult to see which are innocent changes, and which need closer review

https://github.com/scipy/scipy/pull/397/commits


Reply to this email directly or view it on GitHubhttps://github.com//pull/1520#issuecomment-38729131.

@jseabold

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

FWIW, pandas is the only package I contribute to with the squash everything and let god sort 'em out philosophy. I find it to be pretty much awful, but I (try to) pick my battles over there. I know other non-pandas devs are also surprised when they're asked to squash everything in a PR over there. "But version control and logs are...useful...?"

We tend to prefer more commits even if some are meandering to squashing all of them. That said, I don't know which case this falls into. I don't mind so much here just browsing so far, but Josef ends up doing the (vast) majority of the outside code review, so I usually defer to him.

@jseabold

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

If it's dire, I think you can recover the unsquashed commits with git reflog, but I'd do it on a temporary local branch and be very sure first.

@jseabold

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

This is a huge effort, though. Thanks for taking it on.

Right now I'm thinking about things like lrange, lmap, and cPickle. I'm almost never going to remember to use these, though I know travis wouldn't let me forget.

Would it be more desirable to just add a list call whenever we really need a list back vs. when xrange e.g. would do the job? I didn't look to see if this is already the case or if you replaced all map/range/etc.

For cPickle, should we alias pickle to cPickle in 2.7, so that it's the odd person out, or should we stick with the slower pickle in 2.x and not worry about special casing it. I think I vote for the latter.

@jseabold

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

Keep in mind that I'm pretty much exclusively using 2.7 for at least the near future.

@jseabold

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

I also wonder if we just want to use six or if there's a reason we haven't been?

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

There are two problems with squashing too much:

  • for immediate code review: I cannot review every line in detail, so I need to screen what will be most likely innocent, cosmetic, style changes, and what are the critical parts. I got quite good in spotting this, but it is impossible to look at some details if its hidden in a few hundred or thousand lines of changes.
  • history: If we have a problem that shows up later, we often need to go back to understand why or where it was introduced. If we don't have nice commits and commit messages, then this is a needle in a haystack. blame often helps but is difficult across code moves.

commits should be logical units, but if selective squashing is too difficult (I often don't try), then I rather have too many than too few commits. Since commits are bunched together in merges, many commits still don't pollute the main history line of master.

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

I also wonder if we just want to use six or if there's a reason we haven't been?

I don't think we gain anything using six. Most of what I've seen in this PR looks good, close to python 3. scipy has deleted most of the six module.

@jseabold

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

And for the above, I'm suggesting to just use range everywhere we're using xrange and just accept that it's more memory efficient in Python 3. Stick/carrot and all that.

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2014

There is surprisingly little surprising about porting to a single code base. The biggest su[prise was that it has to be done all at once - or has to involve an external dependence like six. This happens since running 2to3 on compat mungs (http://en.wikipedia.org/wiki/Mung) it.

I think the only challenges I came across were (a) circular import issues, which took a while but were ultimately easy to avoid and (b) print >> buf, something which I had never seen and for which google was no help. Thinking about it and seeing some other code examples (but no explantion) indicated that this was just an old shorthand for buf.write(str(something) + '\n') which is what redirecting print to buf would do.

FWIW, pandas is the only package I contribute to with the squash everything and let god sort 'em out philosophy. I find it to be pretty much awful, but I (try to) pick my battles over there. I know other non-pandas devs are also surprised when they're asked to squash everything in a PR over there. "But version control and logs are...useful...?"

We tend to prefer more commits even if some are meandering to squashing all of them. That said, I don't know which case this falls into. I don't mind so much here just browsing so far, but Josef ends up doing the (vast) majority of the outside code review, so I usually defer to him.

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2014

Code that uses list(zip()) and list(range()) will obviously work correctly without issue, although range should be explicitly imported from compat to ensure that xrange is used on 2.x and range is used on 3k.

In most cases with numerical code, the difference between range and xrange is small.

This is a huge effort, though. Thanks for taking it on.

Right now I'm thinking about things like lrange, lmap, and cPickle. I'm almost never going to
remember to use these, though I know travis wouldn't let me forget.

Would it be more desirable to just add a list call whenever we really need a list back vs. when xrange e.g. would do the job? I didn't look to see if this is already the case or if you replaced all map/range/etc.

For cPickle, should we alias pickle to cPickle in 2.7, so that it's the odd person out, or should we stick with the slower pickle in 2.x and not worry about special casing it. I think I vote for the latter.

@jseabold

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

print >> buf is a Wes McKinney-ism AFAIK. I've never seen anyone else use it.

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2014

I seriousness the commits were not done in any particular order, mostly working through the issues I saw in the 2to3 report until there were no more actual issues.

oh no, why did you merge all commits.

It's difficult to see which are innocent changes, and which need closer review

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2014

Given the rarity of that construct, it seems like it should be avoided.

print >> buf is a Wes McKinney-ism AFAIK. I've never seen anyone else use it.

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2014

You can blame @jreback for my commit style - I do sort of like it. Certainly more than my natural semi-random or too-frequent commit style.

FWIW, pandas is the only package I contribute to with the squash everything and let god sort 'em out philosophy. I find it to be pretty much awful, but I (try to) pick my battles over there. I know other non-pandas devs are also surprised when they're asked to squash everything in a PR over there. "But version control and logs are...useful...?"

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2014

Eay to roll back some, but then you have a mix of absolute import statsmodels.blah and from .blah import blerg. I find the all absolute version to be slightly easier to read.

Are these changes strictly necessary? Explicit relative imports are a little easier to read IMO.

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2014

The biggest difficulty with this PR is that it will require rebasing anything else in the queue since all non-trivial files were touched. This said, while going through the code, I notes many examples of

(a) non-imported code being called (although some of this was determined by PyCharm, which isn't 100% reliable)
(b) modules called which are not in requirements
(c) print functions in actual code serving as warnings

These should all be eliminated in the near future.

@jseabold

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

Thanks. We had an unwritten, though I believe spoken, rule of explicit
relative imports in api.py files and full paths in test suites and in the
code.

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Mar 26, 2014

The problem of unwritten rules is that they are ...

@jseabold

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

Sure, though it was 'spoken' about on the mailing list many moons ago. It's been mainly two of us for a while, so not a large need to write everything down. Maybe it should go in the dev docs or we can revisit these decisions.

@jseabold

This comment has been minimized.

Copy link
Member

commented Mar 26, 2014

I know many things that I knew long ago to be completely and utterly wrong these days.

@jreback

This comment has been minimized.

Copy link
Contributor

commented Mar 26, 2014

squash bashing should be directed to. @wesm :)

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Mar 27, 2014

about imports
using relative imports withing a directory and absolute imports across directories is by far the easiest way to stay out of circular import problems IMO

@josef-pkt

This comment has been minimized.

Copy link
Member

commented Mar 27, 2014

about lzip, lrange and similar:
I like them for now: They show that they were changed for py 2 py 3 compatibility, and can be changed again on an individual basis.

for example: dict(lzip(...)) doesn't need list, but I haven't checked and don't remember for python 2.6. In the long term I find np.array(list(zip(...)) easier to understand than using lzip (explicit is better than ...)

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Mar 27, 2014

Right now I'm thinking about things like lrange, lmap, and cPickle. I'm almost never going to remember to use these, though I know travis wouldn't let me forget.

I think using pickle everywhere is reasonable - in fact, using things that look more like Python3 probably makes sense. As for the l functions - lzip, lmap and lrange - the advantage of using these is that they avoid extra calls to list on 2.x - although in most application this is not a big deal.

@coveralls

This comment has been minimized.

Copy link

commented Mar 27, 2014

Coverage Status

Coverage remained the same when pulling 97351b3 on bashtage:remove-2to3 into e397e42 on statsmodels:master.

@coveralls

This comment has been minimized.

Copy link

commented Mar 28, 2014

Coverage Status

Coverage remained the same when pulling 5113bed on bashtage:remove-2to3 into e397e42 on statsmodels:master.

@coveralls

This comment has been minimized.

Copy link

commented Apr 1, 2014

Coverage Status

Changes Unknown when pulling b887e2f on bashtage:remove-2to3 into * on statsmodels:master*.

Kevin Sheppard added some commits Mar 28, 2014

Kevin Sheppard
Cleaned up old, out dated import code for packages that are now requi…
…red.

Removed a small amount of unnecessary compatability code for slogdet

Conflicts:
	statsmodels/iolib/tests/test_table_econpy.py
Kevin Sheppard
Fix missing parentheses in tuple comprehension and missing lmap
Conflicts:
	statsmodels/graphics/tests/test_tsaplots.py
Kevin Sheppard
Improved idiomatic python use by removing type comparisons and using …
…isinstance

Removed unneeded .sort() and replaced with sorted()
Replaced range with lrange where needed

Conflicts:
	statsmodels/genmod/generalized_estimating_equations.py
Kevin Sheppard
Refactored from compatnp to compat using the structure
compat for core Python compatibility
compat.numpy for numpy compatibility
compat.scipy for scipy compatibility

Other locations can be used for compatibility across version of other dependencies, e.g.
compat.pandas

Conflicts:
	statsmodels/base/model.py
Kevin Sheppard
Moved NumpyVersion to compat.scipy since it comes from scipy
Fixed import path for NumpyVersion from scipy to lib._version

Conflicts:
	statsmodels/compat/scipy/__init__.py
Kevin Sheppard
Moved compat.__init__ to compat.python
Imported from .python in __init__
Kevin Sheppard
Moved compat.__init__ to compat.python
Conflicts:
	statsmodels/compat/__init__.py
	statsmodels/compat/scipy/__init__.py
	statsmodels/compat/tests/test_collections.py
	statsmodels/compat/tests/test_itercompat.py

Conflicts:
	statsmodels/genmod/tests/test_gee.py
Kevin Sheppard
@josef-pkt

This comment has been minimized.

Copy link
Member

commented Apr 10, 2014

@bashtage @jseabold any objection to merging?

TravisCI should come back green soon. My latest 3.3 run also came back green (except unrelated failure)

@jseabold

This comment has been minimized.

Copy link
Member

commented Apr 10, 2014

Merge is fine with me. Will update other PRs as needed.

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2014

Seems find now. Had missed an absolute_import

@bashtage

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2014

from statsmodels.compat import range, ....
or
from statsmodels.compat.python import iterkeys

If the second, then importing compat.python into compat.init.py is redundant.

My preference is for the former since it is very common, and it occasionally saves splitting imports across two lines.

josef-pkt added a commit that referenced this pull request Apr 10, 2014

Merge pull request #1520 from bashtage/remove-2to3
REF: Refactored so that there is no longer a need for 2to3  closes #616

@josef-pkt josef-pkt merged commit 5931ec4 into statsmodels:master Apr 10, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details

@bashtage bashtage deleted the bashtage:remove-2to3 branch Apr 10, 2014

@josef-pkt josef-pkt added the PR label Apr 14, 2014

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014

Merge pull request statsmodels#1520 from bashtage/remove-2to3
REF: Refactored so that there is no longer a need for 2to3  closes statsmodels#616
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.