This simple script fails when the document is parsed with html5lib:
import html5lib
import lxml.html.clean
parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("lxml"), namespaceHTMLElements=False)
# tree = lxml.html.document_fromstring(html)
tree = parser.parse("<html><body><!-- a comment --></body></html>")
cleaner = lxml.html.clean.Cleaner()
cleaner(tree)
The problem is that lxml.html.document_fromstring returns an element of type lxml.html.HtmlElement, but HTMLParser.parse returns an object of type lxml.etree._ElementTree.
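The type difference can be reproduced with lxml alone. The following is only a minimal sketch: it builds the tree with lxml.etree.ElementTree as a stand-in for what HTMLParser.parse returns (an assumption, so that html5lib is not needed), and it does not run the Cleaner. Whether passing the root element to the Cleaner avoids the failure is not verified here.

```python
import lxml.etree
import lxml.html

html = "<html><body><!-- a comment --></body></html>"

# document_fromstring returns the root element directly.
element = lxml.html.document_fromstring(html)
print(type(element).__name__)  # HtmlElement

# Stand-in for the html5lib result (assumption): wrap the element in a tree
# object, which is the kind of object HTMLParser.parse is reported to return.
tree = lxml.etree.ElementTree(element)
print(type(tree).__name__)  # _ElementTree

# getroot() unwraps the tree and gives back its root element.
root = tree.getroot()
print(root.tag)  # html
```

If the cleaner expects an element rather than a tree, calling getroot() on the html5lib result before passing it on may be worth trying.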