There are three commits in this pull request. The first two should be fairly non-controversial. The third may warrant some discussion.
The first commit fixes an unrelated test which was failing for me.
The second commit includes unit tests for html5parser.py.
The third commit addresses what I think is a bug. When the code in html5parser generates dummy wrapper elements (e.g. a <div> to wrap multiple fragments) it currently generates non-namespaced elements. By default the html5lib parser generates namespaced elements, so this is a bit of a mismatch. This patch namespaces these generated wrapper elements when the parser is configured to namespace its output.
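To make the mismatch concrete, here is a minimal sketch of the idea behind the third commit. It is not the actual patch: `make_wrapper` and its `namespace_html_elements` flag are hypothetical names invented for illustration, and the standard library's `xml.etree.ElementTree` stands in for lxml so the snippet runs without extra dependencies. It assumes the namespaced tags use Clark notation with the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def make_wrapper(tag, children, namespace_html_elements=True):
    """Hypothetical helper: build a dummy wrapper element around fragments.

    When namespace_html_elements is true (standing in for a parser that
    namespaces its output), the wrapper gets the XHTML namespace too, so it
    matches the elements it wraps.
    """
    if namespace_html_elements:
        # Clark notation: {namespace-uri}localname
        tag = "{%s}%s" % (XHTML_NAMESPACE, tag)
    wrapper = ET.Element(tag)
    wrapper.extend(children)
    return wrapper


# Two namespaced fragments, as a namespacing parser would produce them.
fragments = [
    ET.Element("{%s}p" % XHTML_NAMESPACE),
    ET.Element("{%s}p" % XHTML_NAMESPACE),
]
wrapped = make_wrapper("div", fragments)
plain = make_wrapper("div", [], namespace_html_elements=False)
print(wrapped.tag)
print(plain.tag)
```

With the flag on, the wrapper and its children share a namespace, so a namespace-aware lookup for `div` elements finds the wrapper as well; with the flag off, the wrapper stays a bare `div`, which is what the code did before this patch.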
I've only tested this under Python 2.6. I suspect the unit tests will need a little work before they run under py3k.
I have not tested the unit tests for the XHTMLParser. From what I can tell, html5lib.XHTMLParser is obsolete. My version of html5lib (0.90) does not include it in any case.
Unit tests for lxml.html.html5parser
Fixes so that unit tests run under python 3.1
Note however that while there is a python3 version of html5lib,
it appears to be unmaintained, so the worth of all this is
Add XHTML namespace to wrapper elements if parser is namespacing gene…
(The default parser generates namespaced elements.)
Okay, here's another try.
The first commit, dairiki@49a49e6, contains a re-do of the unit tests.
The second commit, dairiki@4dc02d1, contains fixes for html5parser.py under python 3. Note, however, that there does not currently seem to be a maintained version of html5lib for py3k, so this is of questionable worth.
The third commit, dairiki@3bb3bd8, adds namespaces to the wrapper elements generated by html5parser (see above for more).
I've now tested this under Python 2.5, 2.6 and 3.1.
(A number of tests, mostly in lxml.tests.test_elementtree and lxml.tests.test_io, are failing for me under Python 3.1. Is that expected?)
I merged the first two commits for now. Thanks a lot for those, that's really helpful!
The third commit needs some more consideration.
Should be superseded by recent changes in html5parser.