Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a catastrophic backtracking bug in JavaLexer #1594

Merged
merged 2 commits into from Nov 9, 2020

Conversation

@kurtmckee
Copy link
Contributor

@kurtmckee kurtmckee commented Nov 9, 2020

This closes #1586, which triggered the bug.

  1. Pygments guessed from an input string that it should use SspLexer to lex the file.
  2. The SspLexer is a delegating lexer that relies on XmlLexer and JspRootLexer.
  3. JspRootLexer relies on regular expressions from the JavaLexer.
  4. The JavaLexer has a catastrophic backtracking bug in its string literal regular expression.

This patch addresses the problem by replacing the string literal regex with a sub-state for string literal parsing.

@Anteru Anteru merged commit fb40b71 into pygments:master Nov 9, 2020
13 checks passed
@Anteru
Copy link
Collaborator

@Anteru Anteru commented Nov 9, 2020

Thanks! High time we add pytest-time, I guess :)

@Anteru Anteru added this to the 2.7.3 milestone Nov 9, 2020
@kurtmckee
Copy link
Contributor Author

@kurtmckee kurtmckee commented Nov 9, 2020

Thanks! High time we add pytest-time, I guess :)

If it's okay with you and Georg, I'll work on a patch to add pytest-timeout (which will attempt to hard-stop tests that run too long) and to update existing backtrack-checking tests to use the timeout.

Let me know if that's something you'd like me to tackle. =)

@Anteru
Copy link
Collaborator

@Anteru Anteru commented Nov 9, 2020

I have no objection, and given Georg didn't red flag it, I'd say please go ahead. It's fairly obvious to me by now that backtracking needs robust testing, and I don't see how we can do it without timeouts.

@kurtmckee
Copy link
Contributor Author

@kurtmckee kurtmckee commented Nov 9, 2020

@birkenfeld
Copy link
Contributor

@birkenfeld birkenfeld commented Nov 9, 2020

Sure, I'd just like a graceful behavior if the timeout plugin isn't installed.

@kurtmckee
Copy link
Contributor Author

@kurtmckee kurtmckee commented Nov 9, 2020

@kurtmckee
Copy link
Contributor Author

@kurtmckee kurtmckee commented Mar 3, 2021

By the way, I just tested the pytest timeout plugin. It supports a thread-based and a signal-based method.

On Windows, which only supports thread-based timeouts, the plugin is unable to interrupt catastrophic backtracking like this:

import re

def test_catastrophic_backtracking():
    re.search(r'\d+\d+a', '0' * 10_000)

The thread-based timeouts can successfully interrupt operations like time.sleep(60) but cannot successfully interrupt catastrophic backtracking. For this reason I'm not pursuing using the pytest-timeout plugin.

@kurtmckee kurtmckee deleted the java-backtracking branch Mar 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

3 participants