HTMLParser silently stops parsing with malformed attributes #56838
Given the input '<x><y z=""o"" /></x>', HTMLParser only detects the opening x tag and then stops parsing. Ideally this should behave like the case '<x><y z="""" /></x>', which raises an error and can then continue parsing the closing x tag.
A workaround is to call close() after feed(), which I suppose I should have done anyway. However, this does not resolve the issue that the two cases behave so differently. The code that causes the difference is lines 351-355 of parser.py, which also has a misleading comment stating that it detects the / in a /> ending (which is actually done at line 334).
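For anyone who wants to check how their own Python version behaves, here is a minimal sketch of the reproduction with the close() workaround applied. It assumes Python 3, where the module is `html.parser`. `TagRecorder` and `record_tags` are hypothetical names made up for this example; they are not part of the standard library.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper: records every start and end tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def record_tags(markup):
    """Feed the markup, then call close() so buffered input is flushed."""
    parser = TagRecorder()
    parser.feed(markup)
    parser.close()  # the workaround mentioned above
    return parser.events


if __name__ == "__main__":
    # The malformed input from the report, then the comparison case.
    print(record_tags('<x><y z=""o"" /></x>'))
    print(record_tags('<x><y z="""" /></x>'))
```

On a version that contains the fix, both calls should report the closing x tag. The exact attribute list the parser produces for the malformed y tag may differ between versions, so the sketch records tag names only.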
I think <x><y z=""o"" /></x> should be parsed as <x><y z="" /></x>, and the o"" should be ignored. Currently the parser doesn't handle extraneous data in the start tag very well, because the locatestarttagend_tolerant regex looks for (more or less) well-formed attributes.
I haven't found anything in the HTML5 spec, but I haven't looked closely.
New changeset 3c3009f63700 by Ezio Melotti in branch '2.7'
New changeset 16ed15ff0d7c by Ezio Melotti in branch '3.2'
New changeset 426f7a2b1826 by Ezio Melotti in branch 'default'
Fixed, thanks for the report!