[3.9] gh-135661: Fix CDATA section parsing in HTMLParser (GH-135665) (GH-137774) #139661
3.9 seems to have two additional tests at the end of test_htmlparser:
test_invalid_keyword_error_exception
test_invalid_keyword_error_pass
These are missing in 3.10+ and the former is currently failing:
======================================================================
FAIL: test_invalid_keyword_error_exception (test.test_htmlparser.AttributesTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython/Lib/test/test_htmlparser.py", line 1118, in test_invalid_keyword_error_exception
    parser.feed('<![invalid>')
AssertionError: InvalidMarkupException not raised
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be poked with soft cushions!
Let me check how they got to be removed from 3.10 and if this isn't problematic, we'll do the same here.
@serhiy-storchaka the additional tests were added in #32256. Do you think they are valid?
Originally, the HTML parser used the code from These tests were correct for the existing code, but are no longer correct for the new code. If we decide to accept these changes in 3.9, then the tests should be removed (there is no replacement, because the tested code no longer exists). If we decide to give up on the backport, they remain.
"] ]>" and "]] >" no longer end the CDATA section.
Make CDATA section parsing context-dependent.
Add private method HTMLParser._set_support_cdata() to change the context.
If called with True, "<![CDATA[" starts a CDATA section which ends with "]]>".
If called with False, "<![CDATA[" starts a bogus comment which ends with ">".
(cherry picked from commit 0cbbfc4)
(cherry picked from commit dcf2476)
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
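The two modes described in the commit message can be illustrated outside the parser. The sketch below is not the code from the patch: `scan_cdata_like` and its return shape are hypothetical names chosen for illustration, and it only models the termination rule (exact "]]>" in CDATA mode, first ">" in bogus-comment mode), not the rest of HTMLParser's state handling.

```python
CDATA_OPEN = "<![CDATA["


def scan_cdata_like(text, start, support_cdata):
    """Scan markup that begins with '<![CDATA[' at index `start`.

    Hypothetical helper, written only to illustrate the rule in the
    commit message. Returns (kind, content, end_index), or None if the
    terminator has not been seen yet (more input would be needed).
    """
    if not text.startswith(CDATA_OPEN, start):
        raise ValueError("expected '<![CDATA[' at the given position")

    if support_cdata:
        # CDATA section: only the exact sequence "]]>" ends it, so
        # "] ]>" and "]] >" are treated as ordinary content.
        kind = "cdata"
        body_start = start + len(CDATA_OPEN)
        terminator = "]]>"
    else:
        # Bogus comment: everything after "<!" up to the first ">".
        kind = "bogus comment"
        body_start = start + len("<!")
        terminator = ">"

    end = text.find(terminator, body_start)
    if end == -1:
        return None  # incomplete input
    return kind, text[body_start:end], end + len(terminator)


sample = "<![CDATA[a ] ]> b ]] > c ]]> d"
print(scan_cdata_like(sample, 0, support_cdata=True))
print(scan_cdata_like(sample, 0, support_cdata=False))
```

With `support_cdata=True` the content runs through both near-miss sequences and stops at the real "]]>"; with `support_cdata=False` the same input is cut at the first ">" and the remainder is left for the caller to parse as ordinary text.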