
Performance of parse() is 15 times worse in releases after 2.1.3 #2297

Closed
tsmic opened this issue Nov 22, 2021 · 7 comments · Fixed by #2302
Labels: category: lists · has PR (The issue has a Pull Request associated) · L0 - security (A security vulnerability within the Marked library is discovered) · released

Comments

tsmic commented Nov 22, 2021

Marked version: >2.1.3

Describe the bug
The speed of a parse() call with a large markdown source was about 15 times better in v2.1.3 than in the releases after it. The maximum size of markdown source that can be processed in the browser is correspondingly much smaller.

To Reproduce
Steps to reproduce the behavior:

const startTime = Date.now();
marked.parse(mdData800kB);
console.log('time in milliseconds', Date.now() - startTime);

Output:
time in milliseconds 30000

Expected behavior

time in milliseconds 2000
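A self-contained version of this benchmark might look like the sketch below. The `fakeParse` function and the repeated `sample` are stand-ins so the sketch runs on its own; in the actual measurement, `marked.parse` from the installed version and a real ~800 kB document were used.

```javascript
// Minimal timing harness. `parse` is any parse function;
// in the real test it was marked.parse.
function timeParse(parse, markdown) {
  const startTime = Date.now();
  parse(markdown);
  return Date.now() - startTime; // elapsed milliseconds
}

// Build a large input by repeating a small sample, as in the report.
const sample = '# Heading\n\n- item one\n- item two\n\nSome paragraph text.\n\n';
const bigInput = sample.repeat(10000); // roughly 0.5 MB of markdown

// Stand-in parser so the sketch runs without marked installed.
const fakeParse = (src) => src.split('\n').length;

const ms = timeParse(fakeParse, bigInput);
console.log('time in milliseconds', ms);
```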

UziTech (Member) commented Nov 22, 2021

Our benchmarks show that performance stayed about the same. If you have any insight into what might be causing your slowdown, that would be helpful; otherwise I don't see any reason to keep this issue open, as there isn't anything we can do about your issue.

calculuschild (Contributor) commented Nov 22, 2021

Version 3.0 had major reworking of Lists as well as minor changes to Tables.

It's possible your document uses some pattern of lists or tables that our new approach doesn't handle well, but unless we can pinpoint what that pattern is, we won't be able to fix this.

Perhaps you could narrow it down to a minimal example where the drastic slowdown is still measurable, and we can dig into where the bottleneck is.

tsmic (Author) commented Nov 23, 2021

I don't think some specific pattern in my data triggered the slowness. To confirm, I measured execution times with markdown data like I used to have (generated mostly using graphql-markdown) and with randomly chosen markdown data (the README.md of the yazl package).

To get comparable numbers, I ran the test several times with the markdown data extended to 100 times its original size (by concatenating it in a loop before the test).

The average difference in speed (with both inputs) between versions 2.1.3 and 4.0.4 was more than tenfold when measured with Node 12; the 15x difference was measured in the Chrome browser. The measurement is rather simple, and the results are consistent.

Could it be possible that the data used in your benchmarks has not been big enough or lacks common patterns?

calculuschild (Contributor) commented Nov 23, 2021

My understanding is that our testing uses the entire CommonMark (and GitHub Flavored Markdown) spec, repeated 1000 times. We should be covering nearly every important pattern, but it's possible that skews our results toward entries that have many examples in the specs, or toward very short items, i.e. lists with only 2 or 3 entries.

Testing the yazl README.md I do see a slowdown of ~5x between 2.1.3 and 3.0.0 if I append 25 copies of the document end-to-end. Link here

If I isolate just lists, I see a slowdown of about 8x. If I isolate just headers and paragraphs, I don't really see a slowdown.

This leads me to believe lists did slow down quite a bit, but they are a small part of the benchmarks, so the regression wasn't noticed.

@UziTech Would it be possible / helpful to have a secondary benchmark that gives a metric for specific token categories, like we do with our spec tests? Then when we make changes to e.g. lists, it would be clear where the slowdown is occurring. Maybe even add CLI flags so developers can run benchmarks focused on just the token type they are working on at the time?
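Such a per-category benchmark could be sketched as a small runner that times each token category separately. The category samples and the `parse` argument below are illustrative stand-ins, not marked's actual spec fixtures or benchmark harness.

```javascript
// Sketch of a per-category benchmark runner: times a parse function
// against samples grouped by token type, so a regression in e.g. lists
// is visible even when the aggregate number barely moves.
function benchmarkByCategory(parse, categories, repeats = 25) {
  const results = {};
  for (const [name, sample] of Object.entries(categories)) {
    const input = sample.repeat(repeats); // a long input exposes scaling issues
    const start = Date.now();
    parse(input);
    results[name] = Date.now() - start; // milliseconds per category
  }
  return results;
}

// Illustrative category samples (not marked's actual spec examples).
const categories = {
  lists: '- one\n- two\n- three\n\n',
  headers: '# Title\n\n## Subtitle\n\n',
  paragraphs: 'Just a paragraph of plain text.\n\n',
};

// A trivial stand-in parse function keeps the sketch self-contained.
const results = benchmarkByCategory((src) => src.length, categories);
console.log(results);
```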

calculuschild (Contributor) commented
On further investigation, it looks like the list slowdown is tied specifically to long documents. There's some O(n^2) scaling going on, and based on some profiling tests I have a pretty good idea where it's happening. I might be able to put together a tweak.
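One way this kind of quadratic behavior can arise, and be detected deterministically rather than with flaky wall-clock timings, is a tokenizer-style loop that re-slices the remaining source on every item. The toy sketch below is an illustration of the measurement technique, not marked's actual code: it counts characters touched, and doubling the input roughly quadruples the count when scaling is O(n^2).

```javascript
// Toy illustration of O(n^2) scaling: a loop that conceptually copies
// the remaining source once per line. `work` counts characters touched
// instead of wall-clock time, so the measurement is deterministic.
function tokenizeCountingWork(src) {
  let work = 0;
  let rest = src;
  while (rest.length > 0) {
    const nl = rest.indexOf('\n');
    const end = nl === -1 ? rest.length : nl + 1;
    rest = rest.slice(end); // re-materializes the remainder: O(n) per line
    work += end + rest.length; // chars consumed + chars carried forward
  }
  return work;
}

const line = '- list item\n';
const small = line.repeat(1000);
const large = line.repeat(2000); // doubled input

const ratio = tokenizeCountingWork(large) / tokenizeCountingWork(small);
// Linear scaling would give a ratio of ~2; ~4 indicates O(n^2).
console.log('work ratio after doubling input:', ratio.toFixed(2));
```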

UziTech (Member) commented Nov 24, 2021

@calculuschild I would not be opposed to more benchmarks. I would love some way to monitor the speed in PRs, but I know GitHub Actions servers are not very consistent when it comes to speed.

UziTech added the labels category: lists, L0 - security, and has PR, and removed the label need more info — Nov 24, 2021
github-actions bot commented Dec 2, 2021

🎉 This issue has been resolved in version 4.0.6 🎉

The release is available on:

Your semantic-release bot 📦🚀
