HTMLParser fails to handle charref in attribute value #41975
Comments
The HTML spec describes two ways to encode an attribute: http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.2

>>> from HTMLParser import *
>>> class P(HTMLParser):
... def handle_starttag(self, tag, attrs):
... print attrs
...
>>> P().feed("<tag attr=\"&\">")
[('attr', '&')]
>>> P().feed("<tag attr=\"&\">")
[('attr', '&')]

It seems that each string should produce the same
Maybe the charrefs were lost in the SF -> Roundup transition?
unescape() already converts named, decimal and hexadecimal entities, so this can be closed.
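To illustrate the three reference forms named in the comment above, here is a minimal sketch using `html.unescape` from the Python 3 standard library. Using the module-level function is an assumption about what a reader would reach for today; the comment itself refers to the parser's own unescape() as it existed at the time.

```python
from html import unescape

# Named, decimal, and hexadecimal references to the same character (&).
forms = ["&amp;", "&#38;", "&#x26;"]

# Each form should decode to a literal ampersand.
decoded = [unescape(s) for s in forms]
print(decoded)
```

All three spellings are expected to decode to the same single character, which is the behavior the original report asks for in attribute values.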
New changeset 3c3009f63700 by Ezio Melotti in branch '2.7':
New changeset 16ed15ff0d7c by Ezio Melotti in branch '3.2':
New changeset 426f7a2b1826 by Ezio Melotti in branch 'default':
There was actually a bug with entities in unquoted attribute values. I fixed it and added tests for all the cases (quoted and unquoted).
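The quoted and unquoted cases mentioned above can be checked with a short sketch against Python 3's `html.parser`. The class name `AttrCollector`, the helper `parse_attrs`, and the exact input strings are hypothetical, chosen for illustration; they are not the tests added in the changesets.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attrs list of every start tag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(attrs)


def parse_attrs(markup):
    """Feed markup to a fresh parser and return the collected attrs."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


# Same attribute value written as named, decimal, and hex references,
# first inside double quotes, then with no quotes at all.
quoted = parse_attrs('<tag attr="&amp;"><tag attr="&#38;"><tag attr="&#x26;">')
unquoted = parse_attrs('<tag attr=&amp;><tag attr=&#38;><tag attr=&#x26;>')

print(quoted)
print(unquoted)
```

If the fix behaves as described, every start tag in both runs should report the attribute value as a plain `&`.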
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.