Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue when rendering a lot of space #505

Closed
krasnoukhov opened this issue May 15, 2018 · 8 comments
Closed

Performance issue when rendering a lot of space #505

krasnoukhov opened this issue May 15, 2018 · 8 comments
Assignees
Labels

Comments

@krasnoukhov
Copy link

@krasnoukhov krasnoukhov commented May 15, 2018

Here's a pretty comprehensive example:

note = "1"+" "*20000+"2"
Benchmark.measure { Kramdown::Document.new(note) }.real

This takes around 3 seconds on my machine. A DOS vulnerability potentially?

@gettalong gettalong self-assigned this May 15, 2018
@gettalong gettalong added the bug label May 15, 2018
@gettalong
Copy link
Owner

@gettalong gettalong commented May 15, 2018

Thanks for reporting and the test. The culprit is the setext header parser, will have to investigate a bit more why its regexp is slow.

@gettalong
Copy link
Owner

@gettalong gettalong commented May 15, 2018

The problem exists not only for the setext header parser but at least also for the atx header parser, i.e. these two regexp have bad behaviour when used on very long lines:

SETEXT_HEADER_START = /^(#{OPT_SPACE}[^ \t].*?)#{HEADER_ID}[ \t]*?\n(-|=)+\s*?\n/
ATX_HEADER_MATCH = /^(\#{1,6})(.+?(?:\\#)?)\s*?#*#{HEADER_ID}\s*?\n/

I have tried a few things but haven't yet come up with a viable solution.

@krasnoukhov
Copy link
Author

@krasnoukhov krasnoukhov commented May 15, 2018

I don't have much experience with optimizing the regexes either. Can't put my finger on what's slow

@gettalong
Copy link
Owner

@gettalong gettalong commented May 15, 2018

When removing everything after the first occurence of \n in each regexp, they get fast. So I guess that the regexp has to backtrack much more because of them. However, I haven't come across a solution yet.

@glebm
Copy link
Contributor

@glebm glebm commented May 17, 2018

Some of the instances of *? are unnecessary, e.g. \s*?\n can be replaced with [ \t\r\f\v]*\n in both regexes (see https://ruby-doc.org/core-2.5.1/Regexp.html#class-Regexp-label-Character+Classes).

IIRC, *? can be expensive. I am not completely sure about this though.

@gettalong
Copy link
Owner

@gettalong gettalong commented May 17, 2018

Thanks but I have already tried that. And also changing lazy to possessive quantifiers or vice versa. Didn't find a combination that was fast and didn't break the tests...

glebm added a commit to glebm/kramdown that referenced this issue May 17, 2018
@glebm
Copy link
Contributor

@glebm glebm commented May 17, 2018

Splitting the parsing into two regexes seems to work: #509 Seemed to work but really just worked around the previous benchmark.

Here are the benchmarks that take the parsing further:

"setext":

ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("1#{" "*20000}2\n==\n")}'

"atx":

ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("## 1#{" "*20000}2")}'

Update: Parsing semi-manually seems to work: #513

glebm added a commit to glebm/kramdown that referenced this issue May 18, 2018
glebm added a commit to glebm/kramdown that referenced this issue May 18, 2018
Fixes gettalong#505

Benchmarks:

"setext":

```bash
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("1#{" "*20000}2\n==\n")}'
```

"atx":

```bash
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("## 1#{" "*20000}2")}'
```
glebm added a commit to glebm/kramdown that referenced this issue May 18, 2018
Fixes gettalong#505

Benchmarks:

"setext":

```bash
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("1#{" "*20000}2\n==\n")}'
```

"atx":

```bash
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("## 1#{" "*20000}2")}'
```
glebm added a commit to glebm/kramdown that referenced this issue May 18, 2018
Fixes gettalong#505

Benchmarks:

"setext":

```bash
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("1#{" "*20000}2\n==\n")}'
```

"atx":

```bash
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("## 1#{" "*20000}2")}'
```
@krasnoukhov
Copy link
Author

@krasnoukhov krasnoukhov commented May 18, 2018

Wow @glebm nice work! I’ll give a try tomorrow

glebm added a commit to glebm/kramdown that referenced this issue May 18, 2018
Fixes gettalong#505

Benchmarks:

"setext":

```bash
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("1#{" "*20000}2\n==\n")}'
```

"atx":

```bash
ruby -rbenchmark -Ilib -rkramdown -e 'p Benchmark.measure{Kramdown::Document.new("## 1#{" "*20000}2")}'
```
@gettalong gettalong closed this in 135f1fd May 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

3 participants