Expected Behavior
Parsing the following template should have linear complexity and be done in a reasonable amount of time.
from jinja2 import Environment
Environment().from_string(' ' * 20000)
Actual Behavior
It has quadratic complexity: parsing takes a few seconds at 20,000 characters of whitespace and about a minute at 100,000.
In general, it appears that any large segment of whitespace (" \t\n") without Jinja tags that consume whitespace will degrade template parsing performance significantly.
# These parse quickly thanks to the "-" whitespace-stripping modifier
Environment().from_string('{{ 0 -}}' + ' ' * 20000)
Environment().from_string(' ' * 20000 + '{{- 0 }}')
# These are very slow
Environment().from_string('{{ 0 }}' + ' ' * 20000)
Environment().from_string(' ' * 20000 + '{{ 0 }}')
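To make the growth visible, here is a quick timing sketch (a hypothetical benchmark, not part of the original report; the absolute numbers depend on hardware, but the duration should roughly quadruple each time the whitespace length doubles):

import timeit
from jinja2 import Environment

env = Environment()
# Time a single parse per input size; quadratic behavior shows up as the
# duration roughly quadrupling whenever the input size doubles.
for n in (10000, 20000, 40000):
    seconds = timeit.timeit(lambda: env.from_string(' ' * n + '{{ 0 }}'), number=1)
    print(n, round(seconds, 2))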
Cause
I investigated the cause of the slow parsing and it lies in two pathological regular expressions used in the Jinja lexer: "root" and TOKEN_RAW_BEGIN (not illustrated in the examples, but it suffers from the same issue). In the extremely simplified version of the root lexer regex shown below, notice that whitespace may be consumed either by (.*?) or by \s*, with the latter having priority due to being greedy, but only when the block of whitespace is followed by a variable (or block/comment) opening tag with the "-" modifier (which forces left-stripping of whitespace). This is a pathological case for the built-in Python re module and has quadratic complexity due to the backtracking done after consuming every single whitespace character (don't quote me on that though).
r'(.*?)(?:(?P<variable_begin>\s*{{-|{{))'
The Jinja lexer does this to implement the whitespace-stripping feature on the left side of tags, but the approach is extremely inefficient with the built-in Python re module. (Note that the third-party regex library doesn't suffer from this, but it is slower in general.)
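The pathological behavior can be reproduced with the simplified pattern alone (a standalone sketch I put together to demonstrate it; timings depend on hardware):

import re
import time

# Simplified root regex: (.*?) and \s* compete for the same whitespace, so
# for every position tried by (.*?) the engine re-scans the remaining
# whitespace run with \s* before giving up, which is quadratic overall.
pattern = re.compile(r'(.*?)(?:(?P<variable_begin>\s*{{-|{{))')

for n in (5000, 10000, 20000):
    start = time.time()
    pattern.search(' ' * n + '{{ 0 }}')
    print(n, round(time.time() - start, 2))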
Proposed fix
I created a fix for this issue by re-implementing the way stripping whitespace on the left side of tags is done (including the lstrip_blocks environment flag), moving it outside of regular expressions, which in my opinion are simply not the right tool for this task. I didn't change the implementation of stripping whitespace to the right of tags, as the "-" character is matched before any whitespace, changing the state of the regex automaton before any backtracking needs to be done.
#858
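To illustrate the general idea (a minimal hypothetical sketch, not the actual patch in #858; the names tag_re and tokenize are mine): the regex matches only the plain tag delimiters, and the left-stripping happens in Python code afterwards, so no backtracking over the whitespace is needed:

import re

# Tag regex without any \s* prefix; preceding whitespace stays in group 1.
tag_re = re.compile(r'(.*?)({{-|{{)', re.S)

def tokenize(source):
    pos = 0
    while True:
        match = tag_re.match(source, pos)
        if match is None:
            break
        text, tag = match.group(1), match.group(2)
        if tag == '{{-':
            # Strip the whitespace in code instead of inside the regex.
            text = text.rstrip(' \t\n')
        if text:
            yield ('data', text)
        yield ('variable_begin', tag)
        pos = match.end()
    if pos < len(source):
        yield ('data', source[pos:])

On ' ' * 20000 + '{{- 0 }}' this consumes the whitespace with a single rstrip call rather than one regex backtracking step per character.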
Implications
The most significant implication is Jinja being extremely slow when parsing templates with whitespace segments tens of thousands of characters long. Such templates should be rare, but they do occur. The application where I discovered this issue has almost 100k Jinja templates created by semi-technical users, and a few of them contained such large segments of whitespace, most likely added by accident.
The more frequent implication is Jinja template parsing being suboptimal in general, as most Jinja templates will contain blocks of whitespace tens or hundreds of characters long. The performance impact of quadratic complexity at such a small scale isn't dramatic, but I measured over 2x average improvement in template parsing speed after applying the fix to the almost 100k different templates mentioned above. I will post a histogram in the PR description.
Environment
Python version: tested with 2.7 and 3.6; apparently any Python version
Jinja version: 2.8.1 and current master; apparently any Jinja version from the last few years