Template parsing (lexing) can have quadratic time complexity #857
Parsing the following template should have linear complexity and be done in reasonable time.
Instead, it has quadratic complexity: parsing takes a few seconds at 20000 and a minute at 100k.
In general, it appears that any large segment of whitespace (" \t\n") without Jinja tags that consume whitespace degrades template parsing performance significantly.
I investigated the cause of the slow parsing, and it lies in two pathological regular expressions used in the Jinja lexer: "root" and TOKEN_RAW_BEGIN (the latter is not illustrated in the examples but suffers from the same issue). In the extremely simplified version of the root lexer regex shown below, notice that whitespace may be consumed either by
The Jinja lexer does this to implement the whitespace stripping feature on the left side of tags, but this approach is extremely inefficient with built-in Python
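To make the backtracking concrete, here is a minimal sketch using only the standard `re` module. The pattern `ROOT_RE` and the helper `time_match` are hypothetical stand-ins written for illustration, not the actual Jinja lexer regex. The assumption is that whitespace before a tag can be matched both by a lazy prefix group and by a `\s*` in front of the stripping variant of the tag, so for every prefix length the engine rescans the remaining whitespace.

```python
import re
import time

# Hypothetical, heavily simplified stand-in for a lexer "root" pattern.
# Whitespace can be consumed by the lazy prefix group (.*?) or by the
# \s* that precedes the whitespace-stripping tag variant "{%-".
ROOT_RE = re.compile(r"(.*?)(?:\s*(\{%-)|(\{%))", re.S)


def time_match(n):
    """Return the seconds taken to match n newlines followed by a plain tag."""
    source = "\n" * n + "{% if x %}"
    start = time.perf_counter()
    match = ROOT_RE.match(source)
    elapsed = time.perf_counter() - start
    # The plain "{%" tag does not strip, so the prefix keeps all whitespace.
    assert match is not None and match.group(1) == "\n" * n
    return elapsed


if __name__ == "__main__":
    # If the behavior is quadratic, doubling n should roughly quadruple the time.
    for n in (1000, 2000, 4000, 8000):
        print(n, round(time_match(n), 4))
```

At every candidate prefix length, `\s*` first consumes all the remaining whitespace, fails to find `{%-`, and gives the characters back one at a time before the prefix grows by a single character and the process repeats.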
I created a fix for this issue by re-implementing the way whitespace is stripped on the left side of tags (including the lstrip_blocks environment flag). The stripping is now done outside of regular expressions, which in my opinion simply are not the right tool for this task. I didn't change the implementation of stripping whitespace to the right of tags, because there the "-" character is matched before any whitespace, which changes the state of the regex automaton before any backtracking needs to be done.
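The idea of stripping outside the regex can be sketched as follows. This is an illustration under stated assumptions, not the code from the actual fix: `TAG_RE` and `split_at_tag` are hypothetical names, the pattern handles only `{%` tags, and the `lstrip_blocks` branch assumes the flag removes spaces and tabs between the start of a line and a block tag.

```python
import re

# Hypothetical simplified pattern: the optional "-" of the tag is captured,
# but no whitespace is matched in front of the tag.
TAG_RE = re.compile(r"(.*?)(\{%-?)", re.S)


def split_at_tag(source, lstrip_blocks=False):
    """Return (text_before_tag, tag_start), or None when no tag is found.

    Left-side whitespace stripping is applied after the regex match.
    """
    match = TAG_RE.match(source)
    if match is None:
        return None
    text, tag = match.groups()
    if tag.endswith("-"):
        # "{%-" removes all whitespace at the end of the preceding text.
        text = text.rstrip()
    elif lstrip_blocks:
        # Assumed semantics: drop spaces and tabs only when nothing else
        # stands between the start of the line and the tag.
        line_start = text.rfind("\n") + 1
        if not text[line_start:].strip(" \t"):
            text = text[:line_start]
    return text, tag
```

Because the pattern no longer offers two ways to consume the same whitespace, the match is a single left-to-right scan, and the stripping is one `rstrip` or one `rfind` over the captured text.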
The most significant implication is that Jinja is extremely slow when parsing templates that contain segments of whitespace tens of thousands of characters long. Such templates should be rare, but they may occur. The application where I discovered this issue has almost 100k Jinja templates created by semi-technical users, and a few of them contained such large segments of whitespace, most likely added by accident.
The more frequent implication is that Jinja template parsing is suboptimal, as most Jinja templates contain blocks of whitespace tens or hundreds of characters long. The performance impact of quadratic complexity at such a small scale won't be that dramatic, but I measured an average improvement of over 2x in template parsing speed after applying the fix to the above-mentioned set of almost 100k different templates. I will post a histogram in the PR description.