extremely poor performance on certain markdown file #1617
I do see Error tokens showing up for '\n' on its own line, which I think is because the rule here doesn't match a newline: https://github.com/pygments/pygments/blob/master/pygments/lexers/markup.py#L601 That seems like a separate issue. If I add a rule to match r'\n', most of the errors go away, but that doesn't speed up processing at all.
I think tables are a red herring. I pulled out the table rules and performance didn't change. However, this rule looks expensive:
I don't know why, but if I comment it out, the file is processed in 400ms, down from 21s. Note that this file doesn't contain the character "~" anywhere. It looks like the leading ([^~]*) group is the culprit here. Changing the rule to this gets the same improvement in performance (400ms, down from 21s):
I don't think we need that group, since there's already a catch-all single-character rule that will eat up text characters that don't match. Also, why the heavy use of bygroups throughout?
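A simplified model may help explain the timings reported above. Assume the lexer tries the rule at every position and, whenever it fails, the catch-all rule advances by a single character. In a file with no "~", a leading ([^~]*) group then scans all the way to the end of the input (the negated class also matches newlines) and backtracks before failing, and it does this at every position, so the total work grows roughly quadratically with file size. This is a sketch, not Pygments' actual lexing loop: SLOW and FAST are hypothetical stand-ins for the rule with and without the leading group, and the table text is made up.

```python
import re
import time

# Hypothetical stand-ins for the strikethrough rule discussed in this issue:
# the same "~~...~~" pattern with and without the leading ([^~]*) group.
SLOW = re.compile(r'([^~]*)(~~[^~]+~~)')
FAST = re.compile(r'(~~[^~]+~~)')


def scan(pattern, text):
    """Simplified lexer model: try `pattern` at every position and, when it
    fails, let a catch-all rule consume exactly one character."""
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        pos = m.end() if m else pos + 1


def timed(pattern, text):
    """Wall-clock seconds spent scanning `text` with `pattern`."""
    start = time.perf_counter()
    scan(pattern, text)
    return time.perf_counter() - start


# Made-up table rows with no "~" anywhere, like the file in this issue.
text = "| key | value |\n" * 300

slow, fast = timed(SLOW, text), timed(FAST, text)
print(f"with ([^~]*): {slow:.3f}s  without: {fast:.3f}s")
```

Doubling the number of rows should roughly quadruple the SLOW time while only roughly doubling the FAST time, which is consistent with the gap between 21s and 400ms on a large file.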
Note that if I change the rule as suggested above, the output of pygmentize doesn't change and all test cases still pass. It just runs in 400ms instead of 21s.
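The unchanged output can be illustrated with a toy lexer, assuming a single-character catch-all that emits Text. On plain text without other inline markup, the catch-all produces the same characters the ([^~]*) group used to capture, so once adjacent tokens of the same type are merged, both rule variants yield the same token stream. The rules, token names, and the lex/merge helpers are hypothetical illustrations, not Pygments' MarkdownLexer.

```python
import re
from itertools import groupby

TEXT, DELETED = "Text", "Generic.Deleted"

# Hypothetical strikethrough rules: (compiled pattern, token type per group).
SLOW_RULE = (re.compile(r'([^~]*)(~~[^~]+~~)'), (TEXT, DELETED))
FAST_RULE = (re.compile(r'(~~[^~]+~~)'), (DELETED,))


def merge(tokens):
    """Join adjacent tokens of the same type so token streams can be compared."""
    return [(ttype, "".join(value for _, value in group))
            for ttype, group in groupby(tokens, key=lambda tok: tok[0])]


def lex(text, rule):
    """Toy lexer: try `rule` at each position; otherwise fall back to a
    catch-all that emits a single character as Text."""
    pattern, group_types = rule
    tokens = []
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        if m:
            # One token per non-empty capture group, roughly what bygroups does.
            for ttype, value in zip(group_types, m.groups()):
                if value:
                    tokens.append((ttype, value))
            pos = m.end()
        else:
            tokens.append((TEXT, text[pos]))
            pos += 1
    return merge(tokens)


sample = "plain text ~~struck~~ more text"
print(lex(sample, SLOW_RULE) == lex(sample, FAST_RULE))  # True
print(lex(sample, FAST_RULE))
```

For this sample, both variants give [("Text", "plain text "), ("Generic.Deleted", "~~struck~~"), ("Text", " more text")].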
This is fixed by #1623
This particular Markdown file takes a long time (21 seconds on my laptop) to parse:
https://github.com/date-fns/date-fns/blob/a9fc0c7b715883349555bfb94daa1059430eda52/src/locale/en-US/snapshot.md
I've seen slow-ish parsing performance on Markdown with tables before; however, I'm not certain that tables are causing the issue.