The new code in fromstring now passes the appropriate arguments to startswith, depending on whether or not the input is a bytes object. I also added a test case that passes UTF-8 encoded data and provides the encoding via the parser argument.
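A minimal Python 3 sketch of the idea: in Python 3, `str.startswith()` and `bytes.startswith()` raise a TypeError when given a prefix of the other type, so the prefixes have to match the input's type. The helper name `looks_like_full_html` and the exact prefix check are assumptions for illustration, not lxml's actual code.

```python
def looks_like_full_html(html):
    """Hypothetical helper: True if `html` starts with <html or <!doctype.

    Accepts str or bytes. The prefixes are converted to the input's type,
    because str.startswith() and bytes.startswith() reject the other type.
    """
    prefixes = ('<html', '<!doctype')
    if isinstance(html, bytes):
        # Build bytes prefixes with .encode() rather than b'' literals
        prefixes = tuple(p.encode('ascii') for p in prefixes)
    # Skip leading whitespace, then compare the first 9 characters
    # (the length of '<!doctype') case-insensitively
    start = html.lstrip()[:len('<!doctype')].lower()
    # startswith() accepts a tuple and returns True if any prefix matches
    return start.startswith(prefixes)
```

Without the type check, a call such as `b'<html>'.startswith('<html')` fails with a TypeError under Python 3, which is the kind of failure the change avoids for bytes input.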
I only tried the test case with Python 2.7 and 3.2. I hope that earlier versions, as I have read, simply ignore the b in front of string literals. The output at the end will also differ from the actual output in Python 3, because the b prefix is missing. But as far as I can see, this also happens in selftest.py.
Changed lxml.html.fromstring to be able to handle input given as a bytes
object in python3. See also issue #33.
Merge branch 'master' of https://github.com/canaaerus/lxml
The b'' string prefix is not available in Py2.4.
Added test case for parsing a bytes object with lxml.html.fromstring
Sorry, I can't accept code that doesn't work in Py2.4/5 and that contains a test that (IIUC) fails in Py3.
But why did you close the pull request?
Because I had only just read your comment on my commit. I wish I had been notified about it somehow...
Please see my comment on the issue. To be honest, this feels like a terrible way to do a discussion.
Replaced 'b' prefix with .encode() to make it work in earlier versions
Adjusted the test to run under Python 2 too, but had to omit testing
with actual non-ASCII characters.
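A sketch of the `.encode()` approach, written for Python 3 only. For ASCII-only text, `'...'.encode('ascii')` produces the bytes that a `b'...'` literal would, without syntax that Python 2.4/2.5 cannot parse. Non-ASCII data is harder to write portably: the `u''` prefix is missing in Python 3.0 to 3.2, and in Python 2 a plain `'...'` literal is a byte string, so `'\u00e9'` is not an escape there and `'é'.encode('utf-8')` first decodes implicitly as ASCII and fails. The variable names below are illustrative only.

```python
# ASCII-only test data: .encode() yields bytes in Python 3, no b'' needed.
ASCII_DOC = '<html><body><p>Hello</p></body></html>'.encode('ascii')

# Non-ASCII test data (Python 3 only): the \u escape keeps the source
# file ASCII, and .encode('utf-8') turns U+00E9 into the bytes C3 A9.
UTF8_DOC = '<html><body><p>Caf\u00e9</p></body></html>'.encode('utf-8')

# Decoding with the declared encoding recovers the original text, while a
# wrong guess (latin-1 here) silently yields mojibake instead of an error.
# This is the point of passing the encoding explicitly.
CORRECT = UTF8_DOC.decode('utf-8')
WRONG = UTF8_DOC.decode('latin-1')
```

With the wrong encoding, the two bytes of `é` come back as the two characters `Ã©`, so the mistake stays invisible unless the test checks actual non-ASCII content.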
Reverted overwritten changes.
Further reverted whitespace changes.
Ok, I hope it works now in all necessary versions of python.
Thanks. However, there's way too much code churn in your changes now. It's even hard to see if you really managed to undo all the accidental changes. They look like a broken merge or something.
Could you try to remove the commits that introduced and then reverted all the whitespace changes etc.?
In the “Files Changed” view you can see that the reverted changes are OK. But if these things mess up the commit history, I’ll try to remove all the commits and push only a single clean one, although I don’t know yet how to do this with git…
The mess happened because I first made the changes to my local (outdated) copy of lxml and then copied it into the git tree, which of course was a mistake. When I tried to restore the current version, I still messed up the whitespace at first.