Unclosed script/style tag handling Fixes #1614 #1615
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Explicitly handle unclosed <script> and <style> tags which previously
would result in O(n^2) work to lex as Error tokens per character up to
the end of the line or end of file (whichever comes first).
Now we try lexing the rest of the line as Javascript/CSS if there's no
closing script/style tag. We recover on the next line in the root state
if there is a newline, otherwise just keep parsing as Javascript/CSS.
This is similar to how the error handling in lexer.py works except we
get Javascript or CSS tokens instead of Error tokens. And we get to the
end of the line much faster since we don't apply an O(n) regex for every
character in the line.
I added a new test suite for html lexer (there wasn't one except for
coverage in test_examplefiles.py) including a trivial happy-path case
and several cases around <script> and <style> fragments, including
regression coverage that fails on the old logic.