Python 3 compatibility. #2

Closed
wants to merge 20 commits into
from

Conversation

Projects
None yet
6 participants
Contributor

takluyver commented Mar 10, 2012

Updated: This adds compatibility with Python 3, and adapts the test suite from Manfred Moitzi's xlrd3 port. All tests are passing on Python 2 and 3.

xlrd/formula.py
@@ -716,7 +723,7 @@ def num2strg(num):
tAdd: (_arith_argdict, oNUM, opr.add, 30, '+'),
tSub: (_arith_argdict, oNUM, opr.sub, 30, '-'),
tMul: (_arith_argdict, oNUM, opr.mul, 40, '*'),
- tDiv: (_arith_argdict, oNUM, opr.div, 40, '/'),
+ tDiv: (_arith_argdict, oNUM, floordiv, 40, '/'),
@takluyver

takluyver Mar 10, 2012

Contributor

N.B. On reflection, I think truediv might be a better choice here. div behaves like floordiv if both arguments are integers, otherwise like truediv.

Contributor

takluyver commented Mar 10, 2012

I've copied across the tests from Manfred Moitzi's xlrd3 port, and got them passing on Python 2 and 3, with the exception of test_xfcell, which seems to relate to something Manfred added to his port.

To run the tests, install xlrd (e.g. in a virtualenv), cd into tests and run python run_tests.py.

Owner

cjw296 commented Mar 27, 2012

Hi Thomas,

Some comments from John in discussions with me:

The are some serious problems with this pull request. For example, code like

SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"

This is binary crud and needs to remain a str object in py2 and become a
bytes object in py3. John's solution:

SIGNATURE = BYTES_LITERAL("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")

py2: BYTES_LITERAL = lambda x: x
py3: BYTES_LITERAL = lambda x: x.encode('latin1')

I would likely use b instead of BYTES_LITERAL but John prefers it to be "in your
face" rather than unintrusive.

Also BYTES_LITERAL("") and BYTES_LITERAL("\x00") appear often enough to
warrant special-casing.

Of about 13 cases of \xNN in sheet.py, you've fixed only about 5.
Most of these you appear to have found not by applying general principles but from the change log of the now-abandoned xlrd3 port
(and those changes to xlrd3 were in response to an issue that John raised!).

Worse: 2 of the remaining cases will crash with a UnicodeEncodeError
because you've used cp1252 instead of latin1.

Can you explain why you used cp1252?!

Unfortunately, until all these problems are addresses, we can't really consider this pull request.

Contributor

takluyver commented Mar 27, 2012

Fixing the remainder of those cases is easy enough, I'll get on it. Using cp1252 was just an oversight.

I'd still argue in favour of using b() - this is becoming convention among projects which need to support Python < 2.6 as well as Python 3. E.g. the six compatibility library offers b() and u() helper functions.

Owner

cjw296 commented Mar 27, 2012

I'd argue in favour of whatever gets this pull request merged sooner, which in this case means BYTES_LITERAL and, if needed, UNICODE_LITERAL ;-)

If you could make those changes, I'll ping John and get him to re-review the branch...

Chris

Owner

cjw296 commented Mar 27, 2012

One other thing: can you merge as many of your commits as possible before you push them up to your branch? We don't really need a million commits for this work ;-)

Contributor

takluyver commented Mar 27, 2012

OK, I think I've dealt with most of those issues. I've gone through and identified likely bytes literals containing \x or \0. I've special-cased byte_0 and bytes_empty. I've tried to work out which empty strings should be bytes, although I've left some that I wasn't sure about, mainly in formula.py. The mistake with cp1252 is fixed.

I haven't renamed b to BYTES_LITERAL, as I think it's better to be consistent with other projects doing the same thing. If you still have a strong preference for the longer name, I can change it.

Contributor

takluyver commented Mar 27, 2012

That's not a million commits, I'm more used to this. ;-) I can rebase and squash them if you want, although you'll need to delete the branch and re-pull it if you've already checked out a copy.

Contributor

takluyver commented Mar 27, 2012

Just seen your message re b/BYTES_LITERAL : if you'd lean towards b as well, I'd like to ask John to at least consider the argument in favour of that. If he still says no, I'll change them.

Owner

cjw296 commented Mar 27, 2012

Looks good. I'll likely squash when I do the merge. Don't worry, I'll make sure you get attribution :-)

Now, any idea how I can get urls for:

  • a diff between your branch and master
  • a .zip download of the whole branch

Once I've got those, I'll give John a poke :-)

Owner

cjw296 commented Mar 27, 2012

I lean to whatever keeps John happy, so BYTES_LITERAL for me ;-)

Contributor

takluyver commented Mar 29, 2012

I think the argument for convention (every other project I've seen doing this uses b()) is significant. I would like to ask John to give it due consideration. This is the understanding about a BDFL: he gets to decide, but he has a responsibility to hear and acknowledge other opinions first.

Of course, if his preference is enough to convince you, you're free to pull this branch and replace b( with BYTES_LITERAL( yourself. But I don't intend to make the change without some real discussion of why it's worth breaking with convention.

Owner

cjw296 commented Apr 2, 2012

From @sjmachin : 'I chose BYTES_LITERAL a long while ago. The "convention" IMHO
is both twee and Gadarene. A one-letter function name', so, I'd suggest changing b( to BYTES_LITERAL( if you want the branch merged ;-)

Owner

cjw296 commented Apr 2, 2012

More feedback from @sjmachin :

  • please consider adding more unit tests to this branch
  • please make sure all unit tests pass with Python 2 as well as with Python 3
  • if you fix one instance of a problem, please find them all (grep is your friend, there are still some ord() instances that haven't been fixed)
Owner

cjw296 commented Apr 2, 2012

One final one from me, I'd suggest re-merging master into your branch to get the more recent commits...

Contributor

takluyver commented Apr 2, 2012

Thank-you, I've renamed b to BYTES_LITERAL, tidied up the remaining calls to ord, and merged master in.

The unit tests are all passing on Python 2 and Python 3.

Does xlrd already have any unit tests, perhaps not checked into the repository? I agree that adding more would be a good idea, but that seems like a separate concern from this.

Owner

cjw296 commented Apr 4, 2012

test_xfcell appears to be commented out in run_tests.py, can you get that working too please?

Contributor

takluyver commented Apr 4, 2012

I think that relates to something that Manfred added to his port, that's not in the main codebase. I'll remove the file.

Contributor

takluyver commented Apr 13, 2012

I've just rebased this - anyone who has a copy of the branch in their local repository should delete it and pull from my repository again.

Contributor

takluyver commented Apr 13, 2012

OK, I've updated things to use @sjmachin's compatibility functions from 5ee3e13, and checked that the tests still pass on Python 2 and 3.

Contributor

takluyver commented Apr 15, 2012

Can I politely suggest that we get this merged into master, now that the 0.7 branch has been split off? If there is going to be more refactoring, having tests will be useful to highlight anything that gets broken. And the longer it waits, the more merge conflicts there will be for someone to resolve.

Owner

cjw296 commented Apr 15, 2012

Nope, not yet. The current trunk/master is for the 0.8 release, which will feature xlsx but not Python 3 support.

Contributor

takluyver commented Apr 15, 2012

Any particular reason why not? As far as I can tell, this is all
working OK, and it shouldn't break anything on Python 2. Python 3
support can always be marked as experimental if you prefer.

@sjmachin sjmachin closed this in 531c14c Apr 15, 2012

@cjw296 cjw296 reopened this Apr 15, 2012

Owner

cjw296 commented Apr 15, 2012

Er, I meant #3 sigh

Owner

cjw296 commented Apr 15, 2012

@takluyver - give it time, please, one thing at a time..

Contributor

takluyver commented Apr 16, 2012

Chris, I am striving to be patient, but "one thing at a time" is not really a reason. It's not like I'm suggesting something outlandish here: it's perfectly normal for a major release to include more than one new feature.

This thing (Python 3 support) was largely complete before that thing (xlsx) started to be committed. This thing also includes automated tests that will help in development of that thing (or any thing - tests are always a good idea). John has already started duplicating some of my effort in master, which is a waste of his time. And while this branch sits unmerged, I have to keep resolving conflicts with changes in master, which is a waste of my time. These are reasons to merge it soon. Maybe there's a reason to hold off that trumps them all: if so, please describe it, because I can't see what it is.

My apologies if this seems harsh. I don't want to get into an argument with you, but this process is riling me. I have put in the time to do this the way you want, and now you seemingly want to delay it until some unspecified point in the future, without offering one good reason.

Contributor

takluyver commented Apr 16, 2012

In addition, if you've not already seen it: xlrd is among the 25 most requested packages for Python 3 support.

Contributor

takluyver commented May 12, 2012

Merged changes from master again.

Contributor

takluyver commented Sep 23, 2012

Would you consider this again now that 0.8 has been released?

Contributor

takluyver commented Oct 11, 2012

I've merged master back in, and all the tests still pass on Python 2.7 and 3.2.

Takluyver’s py3 branch works quite nice for me. It would really be nice if the pull request could be accepted soon.

robhuls commented Oct 26, 2012

It would be very nice if the pull is made. Now that the numpy/scipy/matplotlib stuff is working on py3, I see many people around me taking the plunge, including me, but I still need to go back to py2 for xlrd.

Owner

cjw296 commented Oct 26, 2012

@takluyver - sorry, I've lost a bit of perspective, can you let me know whether your changes use any external packages (2to3, six, etc) or if they just involve importing from a local compatibility module?

Also, can you let me know which of the following xlrd would be compatible with should the pull request get merged:

python 2.5
python 2.6
python 2.7
python 3.0
python 3.1
python 3.2
python 3.3

Contributor

takluyver commented Oct 26, 2012

Hi Chris,

It doesn't use any extra modules at runtime - all the compatibility stuff is added to timemachine. John already committed most of that, so it doesn't appear in this diff. Installation in Python 3 uses 2to3, which is part of the Python standard library.

I've aimed for compatibility with all of the versions you list, although I've only tested with 2.7 and 3.2. I haven't used 2.5 in years, so I could have accidentally used some feature that was new with 2.6 - but I think I have a pretty good idea what those features are. If testing shows any problems, they should be easy to fix.

Owner

cjw296 commented Oct 26, 2012

Ah, okay, I was worried 2to3 might have been used.

I'm not keen on any solution that involves a code translation as part of
install.

Personally, I would prefer us to target Python 2.7 and 3.3, and have the
code structured in such a way that it works fine under both without any
translation step.

Would this be possible from your prespective?

Contributor

takluyver commented Oct 26, 2012

It's certainly possible - if we needn't support Python 2.5, we can do away with the ugly BYTES_LITERAL bits, as real bytes literals exist in 2.6 and above. Shall I start working on that?

I'd still aim to support at least 3.2 as well, as it will remain the default python3 in Linux distros for some time, depending on release cycles.

Contributor

takluyver commented Oct 26, 2012

I have it working (passing the tests) without 2to3 in a new branch. Unfortunately, while we can get rid of BYTES_LITERAL, we do need UNICODE_LITERAL - I'd hoped that we could avoid that with a future statement, but that causes problems with struct.

https://github.com/takluyver/xlrd/tree/py3-native

If you prefer this route (dropping Python 2.5 support), let me know and I'll polish it up.

vadmium commented Oct 27, 2012

If your problems with struct are all because the format strings can’t be in Unicode, perhaps you could change them all to bytes strings. It seems even in Python 3 the struct module accepts bytes format strings, although I haven’t seen this mentioned in the documentation. I’ve been caught out before by using normal Unicode strings, when everything works smoother if you use bytes strings:

>>> struct.Struct("<H").format + "H"
TypeError: can't concat bytes to str
>>> struct.Struct("<H").format
b'<H'
>>> struct.Struct(b"<H").format + b"H"
b'<HH'
Contributor

takluyver commented Oct 28, 2012

Thanks, that's handy to know. Looking at the source code, it looks like that's a deliberate feature, not just an accident. I've submitted an issue to clarify whether that's safe to rely on.

I'm still waiting for confirmation that that's the route we'd like to take before I spend more time on it, though.

Contributor

takluyver commented Nov 30, 2012

I'm closing this in favour of PR #21, but it can always be reopened if we choose to.

@takluyver takluyver closed this Nov 30, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment