Allow bytes as input to lxml.html.fromstring, thereby fixing issue #33. #72

Closed (wants to merge 8 commits)


The new code in `fromstring` now passes the appropriate argument type to `startswith`, depending on whether the input is a bytes object or not. I also added a test case that passes UTF-8-encoded data and supplies the encoding via the `parser` argument.
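The idea described above can be sketched in modern Python 3. On Python 3, `bytes.startswith` raises `TypeError` when given a `str` prefix, so the prefix has to match the type of the input. The helper name `looks_like_full_html`, the 50-character window and the two prefixes are assumptions for illustration; this is not the patch's or lxml's actual code.

```python
def looks_like_full_html(html):
    """Hypothetical helper: does the document start with <html or <!doctype?

    Chooses bytes or str prefixes to match the type of ``html``, because on
    Python 3 ``b'...'.startswith('...')`` raises TypeError.
    """
    if isinstance(html, bytes):
        prefixes = (b'<html', b'<!doctype')
    else:
        prefixes = ('<html', '<!doctype')
    # Inspect only the beginning, ignoring leading whitespace and case.
    head = html[:50].lstrip().lower()
    # startswith accepts a tuple of prefixes of the same type as head.
    return head.startswith(prefixes)


print(looks_like_full_html(b'  <HTML><body></body></html>'))
print(looks_like_full_html('<p>fragment</p>'))
```

The first call prints `True` and the second `False`; passing a mismatched prefix type (for example `b'<html>'.startswith('<html')`) is exactly the case that fails on Python 3.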

I have only tried the test case with Python 2.7 and 3.2. From what I have read, earlier versions simply ignore the `b` in front of string literals, so hopefully they work too. The expected output at the end of the test will also differ from the actual output in Python 3, because it lacks the `b` prefix. As far as I can tell, the same thing happens in `selftest.py`.
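One way to keep a doctest's expected output identical across Python versions is to avoid echoing the raw bytes repr (which gains a `b` prefix on Python 3) and print the decoded text instead. This is a minimal sketch under that assumption; the function names are hypothetical and the check does not involve lxml itself.

```python
import doctest


def bytes_doctest_example():
    """
    Hypothetical doctest showing a version-neutral way to check bytes.

    >>> data = '<p>hello</p>'.encode('utf-8')

    Echoing data directly would show b'<p>hello</p>' on Python 3 but
    '<p>hello</p>' on Python 2, so decode it before printing instead:

    >>> print(data.decode('utf-8'))
    <p>hello</p>
    """


def run_examples():
    """Run the doctest above and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner()
    for test in finder.find(bytes_doctest_example, 'bytes_doctest_example'):
        runner.run(test)
    results = runner.summarize(verbose=False)
    return results.failed, results.attempted
```

Calling `run_examples()` runs both examples; since the printed text carries no `b` prefix, the same expected output would hold on either major version.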

The b'' string prefix is not available in Py2.4.

canaaerus closed this Oct 15, 2012

scoder (Owner) commented Oct 15, 2012

Sorry, I can't accept code that doesn't work in Py2.4/5 and that contains a test that (IIUC) fails in Py3.

But why did you close the pull request?

Because I had only just read your comment on my commit. I wish I had been notified about it somehow...
Please see my comment on the issue. To be honest, this feels like a terrible way to hold a discussion.

canaaerus reopened this Oct 27, 2012

OK, I hope it now works in all the required versions of Python.

scoder (Owner) commented Oct 27, 2012

Thanks. However, there's way too much code churn in your changes now. It's even hard to see if you really managed to undo all the accidental changes. They look like a broken merge or something.

Could you try to remove the changes that introduced and then reverted all the whitespace modifications, etc.?

In the “Files changed” view you can see that the reverted changes are OK. But if they would mess up the commit history, I’ll try to remove all the commits and push only a single clean one, although I don’t know yet how to do that with git…
The mess was caused by my first making the changes in my local (outdated) copy of lxml and then copying the files into the git tree, which was of course a mistake. When I tried to restore the current version, I initially messed up the whitespace again.

canaaerus closed this Oct 28, 2012
