HTMLParser handling of non-numeric charrefs broken #64487
In Python 2.7, HTMLParser.py lines 185-199 (similar lines still exist in Python 3.4): if you feed a broken charref, that is, a non-numeric one, the parser passes whatever string happens to be at the start of rawdata to handle_data(). For example:
import sys
from HTMLParser import HTMLParser
p = HTMLParser()
p.handle_data = lambda x: sys.stdout.write(x)
p.feed('<p>&#foo;</p>')
This prints '<p', which is clearly wrong. I think the intention of the code is to pass '&#', which seems saner.
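For anyone who wants to check this on Python 3, here is a minimal sketch of the same reproduction. The names `TextCollector` and `collect_text` are made up for illustration, and it assumes `convert_charrefs=False` (available since 3.4) so that the parser takes the branch that examines `&#...` references itself. On an affected build the collected text should start with `<p`; on a fixed build the stray `<p` should be gone.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: records every chunk passed to handle_data()."""

    def __init__(self):
        # Assumption: convert_charrefs=False keeps the parser on the code
        # path that inspects "&#..." references (the path in this report).
        super().__init__(convert_charrefs=False)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def collect_text(markup):
    """Feed markup to a fresh parser and return all text data joined."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush anything still held in the parser's buffer
    return "".join(parser.chunks)


text = collect_text("<p>&#foo;</p>")
print(repr(text))
# The tag text "<p" must never show up as character data.
print("bug present" if text.startswith("<p") else "looks fixed")
```

The chunks are joined before checking because the parser may deliver the text in more than one `handle_data()` call.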
Thanks for the report, this is indeed a bug.
Here's a patch against 2.7.
New changeset 0d50b5851f38 by Ezio Melotti in branch '2.7'
New changeset 32097f193892 by Ezio Melotti in branch '3.3'
New changeset 92b3928bfde1 by Ezio Melotti in branch 'default'
This is now fixed, thanks for the report!
I created bpo-20623 to track this.