Fix lexer being extremely slow on large amounts of whitespace. Fixes #857 #858
Fix for #857, where parsing whitespace in templates usually has quadratic time complexity.
It addresses two pathological regular expressions used in the Jinja lexer: "root" and TOKEN_RAW_BEGIN (the latter is not illustrated in the examples but suffers from the same issue). In the extremely simplified version of the root lexer regex shown below, notice that whitespace may be consumed either by
The Jinja lexer does that to implement the whitespace stripping feature on the left side of tags, but this approach is extremely inefficient with built-in Python
My approach uses
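To make the quadratic behavior described above concrete, here is a minimal sketch. The pattern `SIMPLIFIED_ROOT` and the helper `time_match` are hypothetical illustrations, not the actual Jinja lexer regex or the code from this PR. The sketch only assumes a pattern in which leading whitespace can be matched either by a lazy `.*?` group or by a `\s*` inside the tag alternative.

```python
import re
import time

# Hypothetical, heavily simplified stand-in for a lexer "root" pattern.
# Whitespace in front of "{%-" can be matched by `.*?` or by `\s*`.
SIMPLIFIED_ROOT = re.compile(r"(.*?)(?:\s*\{%-|\{%)", re.S)


def time_match(n):
    """Match against n spaces with no tag; return (match, seconds)."""
    source = " " * n
    start = time.perf_counter()
    match = SIMPLIFIED_ROOT.match(source)
    return match, time.perf_counter() - start


if __name__ == "__main__":
    # For each length the lazy group can take, `\s*` rescans the rest of
    # the whitespace before failing, so the work grows roughly with n**2.
    for n in (1000, 2000, 4000):
        match, seconds = time_match(n)
        print(n, match, f"{seconds:.4f}s")
```

On a whitespace-only input, doubling `n` should roughly quadruple the time, although the exact numbers depend on the machine and the Python version.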
Testing and performance measurement
I tested the fix on almost 100k different Jinja2 templates, created by users with different technical skills and used for various purposes (from templates without any Jinja tags to huge personalized email templates). Parsing all templates produced exactly the same parse tree both before and after my fix. The Jinja2 test suite of course passes as well.
The table below shows the parsing times of the templates I tested the fix with. Each parsing attempt, before or after the fix, was added to a bucket based on the rounded log2 of the parsing time. You can see the 4 extremely slow cases being fixed and a general improvement in parsing time.
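The bucketing described above can be sketched as follows. This is a rough reconstruction under stated assumptions, not the script used for the measurements: the function names are hypothetical, and the unit (seconds) is assumed, since the description does not state it.

```python
import math
from collections import Counter


def bucket(seconds):
    """Return the rounded log2 of a positive parsing time (unit assumed)."""
    if seconds <= 0:
        raise ValueError("parsing time must be positive")
    return round(math.log2(seconds))


def histogram(times):
    """Count how many parsing times fall into each log2 bucket."""
    return Counter(bucket(t) for t in times)
```

Comparing `histogram(times_before)` with `histogram(times_after)` gives one row per bucket, so outliers show up as entries in the highest buckets.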
I didn't add any test case. I could add a test that parses something like (
Sorry for letting this sit so long! I've marked this for 2.11. Unfortunately, there have been some other changes to the lexer and
Oh, hey there. I'm afraid I only barely remember the changes I made almost a year and a half ago, but I can help if needed. I understand you already rebased this (without pushing, I guess), so I won't dive into it unless you say you'd like me to.
BTW, I would be quite happy if this got merged (or otherwise fixed). Our project has been using a fork with my fix ever since I made it, and being able to update Jinja would be nice.