Closed
Description
http://code.google.com/p/html5lib/issues/detail?id=210
Reported by r.kintzi, Aug 14, 2012
What steps will reproduce the problem?
    from html5lib import HTMLParser
    from html5lib.treebuilders import getTreeBuilder
    from html5lib.treewalkers import getTreeWalker
    from html5lib.filters.sanitizer import Filter as Sanitizer

    html = "<html><body><h1>Header"
    parser = HTMLParser(tree = getTreeBuilder("lxml"), namespaceHTMLElements = False)
    doc = parser.parse(html)
    root = doc.getroot()
    body = doc.xpath('/html/body')
    walker = getTreeWalker('lxml')
    stream = walker(body)
    stream = Sanitizer(stream)
    for token in stream:
        print token

What is the expected output? What do you see instead?
I do not know exactly what should be printed. Instead, an exception is raised:
    $ python t.py
    {'namespace': u'None', 'type': 'Characters', 'data': u'<body>'}
    Traceback (most recent call last):
      File "t.py", line 17, in <module>
        for token in stream:
      File "/home/radek/.virtualenvs/blog/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/filters/sanitizer.py", line 7, in __iter__
        token = self.sanitize_token(token)
      File "/home/radek/.virtualenvs/blog/local/lib/python2.7/site-packages/html5lib-0.95-py2.7.egg/html5lib/sanitizer.py", line 171, in sanitize_token
        token["data"][::-1]
    TypeError: unhashable type
Please provide any additional information below.
The faulty token is:
    {'namespace': u'None', 'type': 'StartTag', 'name': u'h1', 'data': {}}
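
The traceback and the token together suggest the cause: the sanitizer slices token["data"] with [::-1], which works for a list of (name, value) pairs but not for the dict that the lxml tree walker emits here. The snippet below is only a minimal sketch of that mismatch in plain Python, without html5lib. The helper name attribute_pairs is hypothetical and is not part of the library; it shows one possible way to accept both shapes, not the fix that was actually applied.

```python
# A token shaped like the one reported above ("data" is a dict).
token = {"namespace": "None", "type": "StartTag", "name": "h1", "data": {}}

# Slicing a dict fails. Python 2 and older Python 3 releases raise TypeError
# because slice objects are unhashable; newer Python 3 releases, where slices
# are hashable, raise KeyError instead.
try:
    token["data"][::-1]
    slicing_error = None
except (TypeError, KeyError) as exc:
    slicing_error = type(exc).__name__


def attribute_pairs(data):
    """Return attributes as a reversed list of (name, value) pairs.

    Hypothetical helper: accepts either a dict or a list of pairs, so the
    reversal no longer depends on which shape the tree walker produced.
    """
    if isinstance(data, dict):
        return list(data.items())[::-1]
    return list(data)[::-1]


pairs_from_dict = attribute_pairs(token["data"])
pairs_from_list = attribute_pairs([("class", "a"), ("id", "b")])
```

With a helper like this, the empty dict from the faulty token yields an empty list of pairs instead of an exception.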