GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
Already on GitHub? Sign in to your account
Updated: This adds compatibility with Python 3, and adapts the test suite from Manfred Moitzi's xlrd3 port. All tests are passing on Python 2 and 3.
N.B. On reflection, I think truediv might be a better choice here. div behaves like floordiv if both arguments are integers, otherwise like truediv.
I've copied across the tests from Manfred Moitzi's xlrd3 port, and got them passing on Python 2 and 3, with the exception of test_xfcell, which seems to relate to something Manfred added to his port.
To run the tests, install xlrd (e.g. in a virtualenv), cd into tests and run python run_tests.py.
Some comments from John in discussions with me:
The are some serious problems with this pull request. For example, code like
SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"
This is binary crud and needs to remain a str object in py2 and become a
bytes object in py3. John's solution:
SIGNATURE = BYTES_LITERAL("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1")
py2: BYTES_LITERAL = lambda x: x
py3: BYTES_LITERAL = lambda x: x.encode('latin1')
I would likely use b instead of BYTES_LITERAL but John prefers it to be "in your
face" rather than unintrusive.
Also BYTES_LITERAL("") and BYTES_LITERAL("\x00") appear often enough to
Of about 13 cases of \xNN in sheet.py, you've fixed only about 5.
Most of these you appear to have found not by applying general principles but from the change log of the now-abandoned xlrd3 port
(and those changes to xlrd3 were in response to an issue that John raised!).
Worse: 2 of the remaining cases will crash with a UnicodeEncodeError
because you've used cp1252 instead of latin1.
Can you explain why you used cp1252?!
Unfortunately, until all these problems are addresses, we can't really consider this pull request.
Fixing the remainder of those cases is easy enough, I'll get on it. Using cp1252 was just an oversight.
I'd still argue in favour of using b() - this is becoming convention among projects which need to support Python < 2.6 as well as Python 3. E.g. the six compatibility library offers b() and u() helper functions.
I'd argue in favour of whatever gets this pull request merged sooner, which in this case means BYTES_LITERAL and, if needed, UNICODE_LITERAL ;-)
If you could make those changes, I'll ping John and get him to re-review the branch...
One other thing: can you merge as many of your commits as possible before you push them up to your branch? We don't really need a million commits for this work ;-)
OK, I think I've dealt with most of those issues. I've gone through and identified likely bytes literals containing \x or \0. I've special-cased byte_0 and bytes_empty. I've tried to work out which empty strings should be bytes, although I've left some that I wasn't sure about, mainly in formula.py. The mistake with cp1252 is fixed.
I haven't renamed b to BYTES_LITERAL, as I think it's better to be consistent with other projects doing the same thing. If you still have a strong preference for the longer name, I can change it.
That's not a million commits, I'm more used to this. ;-) I can rebase and squash them if you want, although you'll need to delete the branch and re-pull it if you've already checked out a copy.
Just seen your message re b/BYTES_LITERAL : if you'd lean towards b as well, I'd like to ask John to at least consider the argument in favour of that. If he still says no, I'll change them.
Looks good. I'll likely squash when I do the merge. Don't worry, I'll make sure you get attribution :-)
Now, any idea how I can get urls for:
Once I've got those, I'll give John a poke :-)
I lean to whatever keeps John happy, so BYTES_LITERAL for me ;-)
I think the argument for convention (every other project I've seen doing this uses b()) is significant. I would like to ask John to give it due consideration. This is the understanding about a BDFL: he gets to decide, but he has a responsibility to hear and acknowledge other opinions first.
Of course, if his preference is enough to convince you, you're free to pull this branch and replace b( with BYTES_LITERAL( yourself. But I don't intend to make the change without some real discussion of why it's worth breaking with convention.
From @sjmachin : 'I chose BYTES_LITERAL a long while ago. The "convention" IMHO
is both twee and Gadarene. A one-letter function name', so, I'd suggest changing b( to BYTES_LITERAL( if you want the branch merged ;-)
More feedback from @sjmachin :
One final one from me, I'd suggest re-merging master into your branch to get the more recent commits...
Thank-you, I've renamed b to BYTES_LITERAL, tidied up the remaining calls to ord, and merged master in.
The unit tests are all passing on Python 2 and Python 3.
Does xlrd already have any unit tests, perhaps not checked into the repository? I agree that adding more would be a good idea, but that seems like a separate concern from this.
test_xfcell appears to be commented out in run_tests.py, can you get that working too please?
I think that relates to something that Manfred added to his port, that's not in the main codebase. I'll remove the file.
Add trove classifiers for Python 2 and 3 compatibility.
Use truediv instead of floordiv in formula.
Add tests from xlrd3
Remove extraneous print from demo code.
Tweaks to tests.
Include changes to formula parsing from xlrd3
Initial changes for Python 3 compatibility.
Some more fixes for bytes literals
Add script to run tests.
Correction to tests for Python 2.
Use unicode in tests with Python 2.
bytes fixes in hex_char_dump()
More use of byte_0 and bytes_empty.
Use get_int_1byte in more places.
b() -> BYTES_LITERAL()
Remove non-corresponding xfcell tests.
I've just rebased this - anyone who has a copy of the branch in their local repository should delete it and pull from my repository again.
Update tests to new package structure.
Use sjmachin's Python 3 compatibility functions.
OK, I've updated things to use @sjmachin's compatibility functions from 5ee3e13, and checked that the tests still pass on Python 2 and 3.
Can I politely suggest that we get this merged into master, now that the 0.7 branch has been split off? If there is going to be more refactoring, having tests will be useful to highlight anything that gets broken. And the longer it waits, the more merge conflicts there will be for someone to resolve.
Nope, not yet. The current trunk/master is for the 0.8 release, which will feature xlsx but not Python 3 support.
Any particular reason why not? As far as I can tell, this is all
working OK, and it shouldn't break anything on Python 2. Python 3
support can always be marked as experimental if you prefer.
Er, I meant #3 sigh
@takluyver - give it time, please, one thing at a time..
Chris, I am striving to be patient, but "one thing at a time" is not really a reason. It's not like I'm suggesting something outlandish here: it's perfectly normal for a major release to include more than one new feature.
This thing (Python 3 support) was largely complete before that thing (xlsx) started to be committed. This thing also includes automated tests that will help in development of that thing (or any thing - tests are always a good idea). John has already started duplicating some of my effort in master, which is a waste of his time. And while this branch sits unmerged, I have to keep resolving conflicts with changes in master, which is a waste of my time. These are reasons to merge it soon. Maybe there's a reason to hold off that trumps them all: if so, please describe it, because I can't see what it is.
My apologies if this seems harsh. I don't want to get into an argument with you, but this process is riling me. I have put in the time to do this the way you want, and now you seemingly want to delay it until some unspecified point in the future, without offering one good reason.
In addition, if you've not already seen it: xlrd is among the 25 most requested packages for Python 3 support.
Merge branch 'master' of https://github.com/python-excel/xlrd into py3
Merged changes from master again.
Would you consider this again now that 0.8 has been released?
Merge branch 'master' into py3
I've merged master back in, and all the tests still pass on Python 2.7 and 3.2.
Takluyver’s py3 branch works quite nice for me. It would really be nice if the pull request could be accepted soon.
It would be very nice if the pull is made. Now that the numpy/scipy/matplotlib stuff is working on py3, I see many people around me taking the plunge, including me, but I still need to go back to py2 for xlrd.
@takluyver - sorry, I've lost a bit of perspective, can you let me know whether your changes use any external packages (2to3, six, etc) or if they just involve importing from a local compatibility module?
Also, can you let me know which of the following xlrd would be compatible with should the pull request get merged:
It doesn't use any extra modules at runtime - all the compatibility stuff is added to timemachine. John already committed most of that, so it doesn't appear in this diff. Installation in Python 3 uses 2to3, which is part of the Python standard library.
I've aimed for compatibility with all of the versions you list, although I've only tested with 2.7 and 3.2. I haven't used 2.5 in years, so I could have accidentally used some feature that was new with 2.6 - but I think I have a pretty good idea what those features are. If testing shows any problems, they should be easy to fix.
Ah, okay, I was worried 2to3 might have been used.
I'm not keen on any solution that involves a code translation as part of
Personally, I would prefer us to target Python 2.7 and 3.3, and have the
code structured in such a way that it works fine under both without any
Would this be possible from your prespective?
It's certainly possible - if we needn't support Python 2.5, we can do away with the ugly BYTES_LITERAL bits, as real bytes literals exist in 2.6 and above. Shall I start working on that?
I'd still aim to support at least 3.2 as well, as it will remain the default python3 in Linux distros for some time, depending on release cycles.
I have it working (passing the tests) without 2to3 in a new branch. Unfortunately, while we can get rid of BYTES_LITERAL, we do need UNICODE_LITERAL - I'd hoped that we could avoid that with a future statement, but that causes problems with struct.
If you prefer this route (dropping Python 2.5 support), let me know and I'll polish it up.
If your problems with struct are all because the format strings can’t be in Unicode, perhaps you could change them all to bytes strings. It seems even in Python 3 the struct module accepts bytes format strings, although I haven’t seen this mentioned in the documentation. I’ve been caught out before by using normal Unicode strings, when everything works smoother if you use bytes strings:
>>> struct.Struct("<H").format + "H"
TypeError: can't concat bytes to str
>>> struct.Struct(b"<H").format + b"H"
Thanks, that's handy to know. Looking at the source code, it looks like that's a deliberate feature, not just an accident. I've submitted an issue to clarify whether that's safe to rely on.
I'm still waiting for confirmation that that's the route we'd like to take before I spend more time on it, though.
I'm closing this in favour of PR #21, but it can always be reopened if we choose to.