Denial of service with malformed file #1586
Running the following code with the latest git version of Pygments on the attached input results in 100% CPU consumption for an arbitrarily long time:
    import sys

    import pygments
    import pygments.formatters
    import pygments.lexers

    # The path of the input file is taken from the first command-line argument.
    with open(sys.argv[1], 'rb') as f:
        data = f.read()

    lexer = pygments.lexers.guess_lexer(str(data))
    pygments.highlight(str(data), lexer, pygments.formatters.HtmlFormatter())
The sample input file causes Pygments to guess that this should be parsed by the SspLexer.
The SspLexer is a delegating lexer that uses two other lexers: XmlLexer (which does not choke on the input file) and JspRootLexer. JspRootLexer includes regex patterns from the JavaLexer (which, on its own, also does not choke on the input file). However, when the JspRootLexer hands the input off to the JavaLexer, there appears to be a mismatch in the quotes, and the JavaLexer runs into catastrophic backtracking in the string literal regex.
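To illustrate what catastrophic backtracking in a string literal regex can look like, here is a minimal sketch using only the standard `re` module. Both patterns are hypothetical and written for this illustration; they are not claimed to be the exact regex used by the JavaLexer. The first one is ambiguous because a pair of backslashes can be consumed either by `\\\\` or by `[^"]` twice, so an unterminated literal full of backslashes gives the engine a rapidly growing number of ways to fail.

```python
import re
import time

# Hypothetical patterns, for illustration only (not taken from Pygments).
# AMBIGUOUS: a backslash can be matched by more than one alternative.
AMBIGUOUS = re.compile(r'"(\\\\|\\"|[^"])*"')
# UNAMBIGUOUS: every character can be matched by exactly one alternative.
UNAMBIGUOUS = re.compile(r'"(\\.|[^"\\])*"')


def unterminated_literal(n_backslashes):
    """Build an opening quote followed by backslashes and no closing quote."""
    return '"' + '\\' * n_backslashes


def time_match(pattern, text):
    """Return (match, seconds) for a single anchored match attempt."""
    start = time.perf_counter()
    match = pattern.match(text)
    return match, time.perf_counter() - start


# Keep the input small here: the ambiguous pattern slows down quickly
# as more backslashes are added.
small = unterminated_literal(20)
ambiguous_result, ambiguous_seconds = time_match(AMBIGUOUS, small)
unambiguous_result, unambiguous_seconds = time_match(UNAMBIGUOUS, small)
```

Both attempts fail to match, as they should, but the time taken by the ambiguous pattern grows exponentially with the number of backslashes, while the unambiguous one stays roughly linear. Raising the count passed to `unterminated_literal` by a few characters at a time makes the difference visible.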
I used this code to determine where in the file the JspRootLexer is choking up, and it's happening at line 115, right after these tokens:
The code I used was:
    import pygments.lexers.templates

    with open('timeout-9a00111e78b5cd0979a370fc9a5cd22e39a249e4.txt', 'rb') as f:
        data = f.read()

    lexer = pygments.lexers.templates.JspRootLexer()
    for i, t, v in lexer.get_tokens_unprocessed(str(data)):
        print((i, t, v))
        if i == 3705:
            breakpoint()
After stepping forward in the code for a while, I discovered that everything was hanging at
I've expanded that single-line regex into a new lexer state named "string", which resolves the catastrophic backtracking and allows the code provided by the reporter to run without hanging.
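The idea behind moving a string literal into its own state can be sketched without Pygments. The scanner below is a simplified, hypothetical stand-in for a lexer state: the names `STRING_RULES` and `scan_string` are invented for this example, and the rules are an assumption about what such a state could contain, not the actual change. Each rule consumes a small, disjoint piece of input, so there is never more than one way to proceed and nothing to backtrack over.

```python
import re

# Hypothetical rules for a "string" state. The three patterns cannot match
# at the same position, so the scanner always makes forward progress.
STRING_RULES = [
    (re.compile(r'[^\\"]+'), "text"),            # run of ordinary characters
    (re.compile(r'\\.', re.DOTALL), "escape"),   # backslash plus one character
    (re.compile(r'"'), "end"),                   # closing quote leaves the state
]


def scan_string(text, pos=0):
    """Scan a double-quoted literal that starts at text[pos].

    Returns (tokens, end_pos, terminated), where terminated is False if the
    input ended before a closing quote was found.
    """
    if text[pos:pos + 1] != '"':
        raise ValueError("expected an opening quote")
    tokens = [("start", '"')]
    pos += 1
    while pos < len(text):
        for regex, kind in STRING_RULES:
            match = regex.match(text, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            # No rule matched (a lone backslash at the very end of the
            # input): emit an error token and skip one character.
            tokens.append(("error", text[pos]))
            pos += 1
            continue
        if kind == "end":
            return tokens, pos, True
    return tokens, pos, False
```

For example, `scan_string('"a\\"b" rest')` stops right after the closing quote, and an unterminated literal made of thousands of backslashes is consumed in a single linear pass instead of hanging.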
I'm working on a unit test for this, and then I can submit a PR to close this issue.