As promised, here are unit tests for lxml.html.html5parser #43

Closed

@dairiki
dairiki commented Mar 31, 2012

There are three commits in this pull request. The first two should be fairly non-controversial. The third may warrant some discussion.

The first commit fixes an unrelated test which was failing for me.

The second commit includes unit tests for html5parser.py.

The third commit addresses what I think is a bug. When the code in html5parser generates dummy wrapper elements (e.g. a <div> to wrap multiple fragments) it currently generates non-namespaced elements. By default the html5lib parser generates namespaced elements, so this is a bit of a mismatch. This patch namespaces these generated wrapper elements when the parser is configured to namespace its output.
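A minimal sketch of the mismatch described above, using the standard library's `xml.etree.ElementTree` rather than lxml so that it runs without extra dependencies. The helper names `wrapper_tag` and `wrap_fragments` and the `namespace_html_elements` flag are hypothetical and are not taken from the patch; the flag stands in for whatever setting tells the parser to namespace its output (html5lib exposes this as the `namespaceHTMLElements` constructor option, as far as I know).

```python
import xml.etree.ElementTree as ET

XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml"


def wrapper_tag(name, namespace_html_elements=True):
    """Return the tag for a generated wrapper element (hypothetical helper).

    When the parser namespaces its output, the wrapper gets the XHTML
    namespace in Clark notation; otherwise the bare tag name is used.
    """
    if namespace_html_elements:
        return "{%s}%s" % (XHTML_NAMESPACE, name)
    return name


def wrap_fragments(children, namespace_html_elements=True):
    """Wrap parsed fragments in a <div> that matches their namespacing."""
    wrapper = ET.Element(wrapper_tag("div", namespace_html_elements))
    wrapper.extend(children)
    return wrapper


def parsed_children():
    # Stand-in for what a namespacing parser would return for "<p/><p/>".
    return [ET.Element("{%s}p" % XHTML_NAMESPACE) for _ in range(2)]


# Wrapper and children agree: both live in the XHTML namespace.
consistent = wrap_fragments(parsed_children(), namespace_html_elements=True)

# The behaviour described as a bug: a bare <div> around namespaced children.
mismatched = wrap_fragments(parsed_children(), namespace_html_elements=False)
```

With the mismatched wrapper, a lookup for the wrapper by its namespaced tag fails even though its children are namespaced, which is the inconsistency the third commit is meant to remove.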

Other Notes

I've only tested this under Python 2.6. I suspect the unit tests will need a little bodywork before they run under py3k.

I have not tested the unit tests for the XHTMLParser. From what I can tell, html5lib.XHTMLParser is obsolete. My version of html5lib (0.90) does not include it in any case.

dairiki added some commits Mar 31, 2012
@dairiki dairiki Unit tests for lxml.html.html5parser 49a49e6
@dairiki dairiki Fixes so that unit tests run under python 3.1
Note however that while there is a python3 version of html5lib,
it appears to be unmaintained, so the worth of all this is
questionable.

References:
  http://code.google.com/p/html5lib/issues/detail?id=144
  http://code.google.com/p/html5lib/source/browse/#hg%2Fpython3
4dc02d1
@dairiki dairiki Add XHTML namespace to wrapper elements if parser is namespacing generated elements

(The default parser generates namespaced elements.)
3bb3bd8
@dairiki
dairiki commented Apr 1, 2012

Okay, here's another try.

The first commit, dairiki@49a49e6, contains a re-do of the unit tests.

The second commit, dairiki@4dc02d1, contains fixes for html5parser.py under python 3. Note, however, that there does not currently seem to be a maintained version of html5lib for py3k, so this is of questionable worth.

The third commit, dairiki@3bb3bd8, adds namespaces to the wrapper elements generated by html5parser (see my first comment for details).

I've now tested this under Python 2.5, 2.6, and 3.1.

(A number of tests, mostly in lxml.tests.test_elementtree and lxml.tests.test_io, are failing for me under Python 3.1. Is that expected?)

@scoder
Member
scoder commented Apr 21, 2012

I merged the first two commits for now. Thanks a lot for those, that's really helpful!

The third commit needs some more consideration.

@scoder
Member
scoder commented Sep 4, 2015

Should be superseded by recent changes in html5parser.

@scoder scoder closed this Sep 4, 2015