lxml trees parsed by html5lib can not be used with lxml.clean #102

@tahajahangir

This simple script fails when the tree is parsed by html5lib:

import html5lib
import lxml.html.clean

parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("lxml"), namespaceHTMLElements=False)
# tree = lxml.html.document_fromstring(html)
tree = parser.parse("<html><body><!-- a comment --></body></html>")

cleaner = lxml.html.clean.Cleaner()
cleaner(tree)

The problem is that lxml.html.document_fromstring returns an element of type lxml.html.HtmlElement, but HTMLParser.parse returns an object of type lxml.etree._ElementTree.
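To make the type difference concrete, here is a minimal sketch that uses only lxml. It assumes that a plain lxml.etree tree is a fair stand-in for what html5lib's "lxml" tree builder produces; html5lib itself is not involved. The drop_tree check is an illustration of how the two kinds of node differ and may or may not be what makes the Cleaner fail, since no traceback is included above.

```python
import lxml.etree
import lxml.html

HTML = "<html><body><!-- a comment --></body></html>"

# lxml.html's own parser returns the root element directly,
# built from lxml.html's HTML-aware element classes.
html_root = lxml.html.document_fromstring(HTML)

# Assumption: a plain lxml.etree tree stands in for the html5lib result.
# It is an _ElementTree wrapper whose nodes are generic etree classes.
plain_tree = lxml.etree.ElementTree(lxml.etree.fromstring(HTML))
plain_root = plain_tree.getroot()

# Grab the comment node from each tree.
html_comment = next(html_root.iter(lxml.etree.Comment))
plain_comment = next(plain_root.iter(lxml.etree.Comment))

print(isinstance(html_root, lxml.html.HtmlElement))   # HTML-aware root
print(isinstance(plain_root, lxml.html.HtmlElement))  # generic root
print(hasattr(html_comment, "drop_tree"))             # lxml.html helper method
print(hasattr(plain_comment, "drop_tree"))            # absent on generic nodes
```

Calling getroot() on the _ElementTree unwraps the root element, but the nodes underneath remain generic lxml.etree objects and do not gain the lxml.html helper methods.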
