Sanitizer and lxml tree walker: TypeError: unhashable type #68

@gsnedders

http://code.google.com/p/html5lib/issues/detail?id=210

Reported by r.kintzi, Aug 14, 2012

What steps will reproduce the problem?

from html5lib import HTMLParser
from html5lib.treebuilders import getTreeBuilder
from html5lib.treewalkers import getTreeWalker
from html5lib.filters.sanitizer import Filter as Sanitizer
html = "<html><body><h1>Header"

parser = HTMLParser(tree = getTreeBuilder("lxml"),
        namespaceHTMLElements = False)
doc = parser.parse(html)
root = doc.getroot()
body = doc.xpath('/html/body')
walker = getTreeWalker('lxml')
stream = walker(body)
stream = Sanitizer(stream)
for token in stream:
    print token

What is the expected output? What do you see instead?

I do not know exactly what should be printed. Instead, an exception is raised:

$ python t.py
{'namespace': u'None', 'type': 'Characters', 'data': u'<body>'}
Traceback (most recent call last):
  File "t.py", line 17, in <module>
    for token in stream:
  File "/home/radek/.virtualenvs/blog/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/filters/sanitizer.py", line 7, in __iter__
    token = self.sanitize_token(token)
  File "/home/radek/.virtualenvs/blog/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/sanitizer.py", line 171, in sanitize_token
    token["data"][::-1] 
TypeError: unhashable type

Please provide any additional information below.

The faulty token is:

{'namespace': u'None', 'type': 'StartTag', 'name': u'h1', 'data': {}}
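
For context, the failing expression can be reproduced without html5lib. The sketch below assumes, based on the traceback, that the sanitizer reverses `token["data"]` with a slice. That works for a list of (name, value) pairs but not for the dict carried by the token above. `slicing_fails` and `normalize_attrs` are hypothetical helpers for illustration, not part of html5lib.

```python
# Sketch only: mimics the failing expression from the traceback.
# The token is copied from the report; the helpers are hypothetical.
token = {'namespace': 'None', 'type': 'StartTag', 'name': 'h1', 'data': {}}


def slicing_fails(attrs):
    """Return True if attrs[::-1] raises, as it does for a dict."""
    try:
        attrs[::-1]
    except (TypeError, KeyError):
        # Python 2.7 and 3.x before 3.12 raise TypeError (unhashable slice);
        # 3.12 and later raise KeyError because slice objects are hashable.
        return True
    return False


def normalize_attrs(attrs):
    """Hypothetical workaround: turn a dict into (name, value) pairs."""
    if isinstance(attrs, dict):
        return sorted(attrs.items())
    return list(attrs)


# The dict from the faulty token cannot be sliced ...
print(slicing_fails(token['data']))
# ... but the same attributes as a list of pairs can.
print(slicing_fails(normalize_attrs(token['data'])))
```

Converting the attributes to a list before they reach the sanitizer is one possible workaround. Whether the tree walker or the sanitizer should change is a question for the maintainers.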
