String matching regex in JsonLexer causes catastrophic backtracking #1065

@Anteru

(Original issue 1361 created by howardchris on 2017-07-11T08:45:49.801114+00:00)

The attached text file contains JSON data, downloaded from a request to google.co.uk (the text isn't actually valid JSON, but that's beside the point).

When using the JsonLexer to parse this file, the process hangs for a very long time, and one CPU core is maxed out.

The reason is catastrophic backtracking in the regex engine, caused by the regex used to match strings:

#!python
r'"(\\\\|\\"|[^"])*"'

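I believe the problem is that the alternatives inside the group are not mutually exclusive: the last alternative, [^"], can also match a backslash character. A run of backslashes can therefore be consumed either as escaped-backslash pairs or one character at a time, and when the overall match fails the engine tries every combination before giving up.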
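To make the overlap concrete, here is a rough sketch (not part of Pygments; count_parses and its single_excludes parameter are made up for illustration) that counts how many ways a string body can be split into the three alternatives. It only models the alternation, not the regex engine itself, but the count gives an idea of how many paths a backtracking engine may have to try before it fails.

```python
def count_parses(body, single_excludes):
    # Hypothetical helper: count the ways `body` can be split into
    #   - two backslashes,
    #   - a backslash followed by a double quote, or
    #   - one character that is not in `single_excludes`.
    ways = [0] * (len(body) + 1)
    ways[0] = 1  # the empty prefix can be split in exactly one way
    for i in range(len(body)):
        if ways[i] == 0:
            continue
        # Two-character alternatives: escaped backslash or escaped quote.
        if body[i:i + 2] in ('\\\\', '\\"'):
            ways[i + 2] += ways[i]
        # Single-character alternative (the character class).
        if body[i] not in single_excludes:
            ways[i + 1] += ways[i]
    return ways[len(body)]


for k in (2, 10, 20, 40):
    body = '\\' * k  # a body made of k backslashes
    original = count_parses(body, '"')      # class excludes only the quote
    stricter = count_parses(body, '"\\')    # class also excludes the backslash
    print(k, original, stricter)
```

For a body made only of backslashes, the count for the original alternation follows the Fibonacci sequence, while a class that also excludes the backslash leaves at most one split.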

The solution I have found is to make the alternatives mutually exclusive, by adding a backslash character to the final negated character class so that it no longer matches a backslash:

#!python
r'"(\\\\|\\"|[^"\\])*"'

Note that this particular regex is actually in two places (in both the simplevalue and objectvalue sections), and so both regexes should be updated.

A good description of the problem can be found here:

http://www.regular-expressions.info/catastrophic.html

After making these changes, the data in the attached file can be formatted in ~1s on my machine.
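The slowdown should be reproducible without the attachment. The snippet below is a minimal sketch using only the standard re and time modules. The input is an assumed pathological case (an opening quote followed by backslashes and no closing quote), not the contents of the attached file, and time_match is a hypothetical helper.

```python
import re
import time

# The two patterns quoted in this issue.
ORIGINAL = re.compile(r'"(\\\\|\\"|[^"])*"')
PROPOSED = re.compile(r'"(\\\\|\\"|[^"\\])*"')


def time_match(pattern, text):
    # Hypothetical helper: return the match result and the elapsed seconds.
    start = time.perf_counter()
    result = pattern.match(text)
    return result, time.perf_counter() - start


# Assumed worst case: the string is never closed, so the match must fail
# and the engine has to exhaust every way of consuming the backslashes.
pathological = '"' + '\\' * 24

for name, pattern in (('original', ORIGINAL), ('proposed', PROPOSED)):
    result, elapsed = time_match(pattern, pathological)
    print(f'{name}: match={result!r} elapsed={elapsed:.4f}s')
```

Both patterns reject this input, but the original one should take noticeably longer, and its running time should grow very quickly as more backslashes are added, so increase the count with care.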

Labels

S-major (severity: major), T-bug (type: a bug), X-imported (imported from Bitbucket)
