
Bad performance with larger documents #48

Open
osmarks opened this issue Dec 30, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

osmarks commented Dec 30, 2020

I didn't really want to post this without also providing a good fix, but it's causing me problems, and I don't understand this library's large amount of code well enough to do much about it myself.

It seems that this library parses and renders Markdown quite slowly, especially on larger documents. I first noticed this while testing with ~10KB documents, but have mostly been testing with ~100KB ones, for which rendering takes about 4 seconds, while a Python CommonMark implementation manages roughly 400ms and a JavaScript one ~50ms. This is a problem for an application I want to use the library in; one fix would be to switch to cmark bindings, but those don't provide options to customize parsing, so I would have to reimplement a lot of logic.
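
For reference, numbers like these can be reproduced with something along these lines (a minimal sketch; it assumes the package is imported as markdown and that calling markdown(doc) with default options returns the rendered HTML, using the out2.txt attached below):

import times
import markdown   # nim-markdown; assumed API: markdown(doc: string): string

let text = readFile("out2.txt")     # test input file (attached below)
let start = epochTime()
discard markdown(text)              # parse + render to HTML
echo "rendered in ", epochTime() - start, " s"

Debug and -d:release builds differ a lot in speed, so any comparison should use -d:release.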

By switching most of the internals away from doubly linked lists I was able to get a roughly 25% improvement (a very rough figure, as I just ran time over it a few times with a big test file as input), but I think bigger changes would be needed to make a significant difference: it needs to avoid allocating as many objects (maybe use an ADT or something instead, sketched below) and to handle chunks of unformatted text better. My changes also break quite a few of the test cases (mostly only spacing in lists, as far as I can tell): https://gist.github.com/osmarks/c7d8db89896047368d6512f6284cd7c2
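
By an ADT I mean something like value-type variant objects stored in a seq, rather than many individually allocated ref nodes chained together; a rough Nim sketch (hypothetical types, not the library's actual ones):

type
  TokenKind = enum                  # hypothetical node kinds, for illustration only
    tkText, tkCodeSpan, tkEmphasis
  Token = object
    case kind: TokenKind
    of tkText, tkCodeSpan:
      span: Slice[int]              # index into the source string instead of copying text out
    of tkEmphasis:
      children: seq[Token]          # children stored contiguously, not as linked refs

Storing nodes by value in a seq replaces one heap allocation per node with occasional bulk growth and keeps traversal cache-friendly; slicing into the source string would also help with long runs of unformatted text.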

Here is a test input file (without any formatting) and profiling output from it:

out2.txt
profile_results.txt

soasme added the enhancement (New feature or request) label on Dec 31, 2020

soasme (Owner) commented Dec 31, 2020

Performance is indeed a problem. I intend to solve it in 2021. ⌨️

ghost commented Mar 5, 2021

@osmarks can you recheck performance after #50?

osmarks (Author) commented Mar 7, 2021

This does seem to improve performance a bit on my test case, but it still takes about 3 seconds for some reason. I'm just using cmark-gfm for now.

soasme (Owner) commented Mar 19, 2021

@osmarks Although there is still plenty to optimize, I'm satisfied with the current result (as of v0.8.4):

$ nim c -d:release src/markdown.nim && time ./src/markdown < /tmp/out2.txt
real	0m1.390s
user	0m0.156s
sys	0m0.013s

The nimprof profile result also shows a significant reduction in the number of instruction calls: 3277 in the profile_results.txt you provided vs. 323 in the run against v0.8.4:

total executions of each stack trace:
Entry: 1/131 Calls: 13/323 = 4.02% [sum: 13; 13/323 = 4.02%]
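
For anyone wanting to reproduce such a profile: Nim's built-in stack-trace profiler writes a profile_results.txt in this format when the program is compiled with profiling enabled, roughly:

$ nim c --profiler:on --stackTrace:on src/markdown.nim
$ ./src/markdown < /tmp/out2.txt    # profile_results.txt is written when the run finishes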

LMK if the performance improvement works on your side. Meanwhile, I'll keep working on improving the performance as tracked in #51.

Cheers.
