Denial of service with malformed file #1586
The sample input file causes Pygments to guess that it should be parsed by the SspLexer. The SspLexer is a delegating lexer that uses two lexers: XmlLexer (which does not choke on the input file) and JspRootLexer. JspRootLexer includes regex patterns from the JavaLexer (which also does not choke on the input file). However, when the JspRootLexer hands things off to the JavaLexer, there appears to be a mismatch in the quotes, and the JavaLexer runs into catastrophic backtracking in the string literal regex. I used this code to determine where in the file the JspRootLexer chokes; it happens at line 115, right after these tokens:
The code I used was:

    import pygments.lexers.templates

    # Read the reporter's sample file as raw bytes.
    with open('timeout-9a00111e78b5cd0979a370fc9a5cd22e39a249e4.txt', 'rb') as f:
        data = f.read()

    lexer = pygments.lexers.templates.JspRootLexer()
    # Print each (index, token type, value) triple and drop into the
    # debugger once the lexer reaches offset 3705.
    for i, t, v in lexer.get_tokens_unprocessed(str(data)):
        print((i, t, v))
        if i == 3705:
            breakpoint()

After stepping forward in the code for a while, I discovered that everything was hanging at the string literal regex in the JavaLexer. I've exploded that regex from a single-line regex into a new regex state named "string", which resolves the catastrophic backtracking and allows the code provided by the reporter to run without hanging. I'm working on a unit test for this, and then I can submit a PR to close this issue.
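To illustrate the idea of replacing one string-literal regex with a dedicated state, here is a minimal standalone sketch using only the standard `re` module. Everything in it is an assumption made for illustration: `NAIVE_STRING` is a hypothetical pattern of the kind that can backtrack badly, not necessarily the one in JavaLexer, and `STRING_RULES` and `scan_string` are made-up names that only mimic how a lexer state consumes input one rule at a time.

```python
import re

# Hypothetical single-regex string literal, for illustration only. The
# alternatives overlap: a backslash can be consumed as part of an escape or
# by [^"], so an unterminated run of backslashes can be split in
# exponentially many ways before the match finally fails.
NAIVE_STRING = re.compile(r'"(\\\\|\\"|[^"])*"')

# State-style rules (hypothetical). Each rule starts on a different kind of
# character, so the scanner commits to one rule per step and never goes back.
STRING_RULES = [
    (re.compile(r'[^\\"]+'), "text"),           # run of ordinary characters
    (re.compile(r'\\.', re.DOTALL), "escape"),  # backslash plus one character
    (re.compile(r'\\'), "text"),                # lone backslash at end of input
    (re.compile(r'"'), "close"),                # closing quote ends the state
]


def scan_string(text, pos=0):
    """Scan a double-quoted literal that starts at text[pos].

    Returns the index just past the closing quote, or None if there is no
    opening quote at pos or the literal is unterminated. Each step consumes
    at least one character, so the work is linear in the input length.
    """
    if pos >= len(text) or text[pos] != '"':
        return None
    pos += 1  # enter the "string" state
    while pos < len(text):
        for pattern, kind in STRING_RULES:
            m = pattern.match(text, pos)
            if m:
                pos = m.end()
                if kind == "close":
                    return pos
                break
        else:
            return None  # defensive: the rules above cover every character
    return None  # input ended before a closing quote


if __name__ == "__main__":
    print(scan_string('"abc" rest'))        # 5
    # 200 backslashes and no closing quote: fails fast with the state rules.
    print(scan_string('"' + '\\' * 200))    # None
    # NAIVE_STRING.match on that same input would be impractically slow.
```

The design point is that the state rules never have two ways to consume the same character, so a malformed literal is rejected after a single pass instead of after trying every possible split.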
* JavaLexer: Demonstrate a catastrophic backtracking bug
* JavaLexer: Fix a catastrophic backtracking bug

Closes #1586
Thanks a lot for the fix!
Running the following code with the latest git version of Pygments on the attached input results in 100% CPU consumption for an arbitrarily long time:
timeout-9a00111e78b5cd0979a370fc9a5cd22e39a249e4.txt