
Performance of parse() is 15 times worse in releases after 2.1.3 #2297

Closed
tsmic opened this issue Nov 22, 2021 · 7 comments · Fixed by #2302
Labels: category: lists · has PR (The issue has a Pull Request associated) · L0 - security (A security vulnerability within the Marked library is discovered) · released

Comments

tsmic commented Nov 22, 2021

Marked version: >2.1.3

Describe the bug
The speed of a parse() call with a large markdown source was about 15 times better in v2.1.3 than in the releases after it. The maximum size of markdown source that can be processed in the browser is correspondingly much smaller.

To Reproduce
Steps to reproduce the behavior:

const startTime = Date.now();
marked.parse(mdData800kB);
console.log('time in milliseconds', Date.now() - startTime);

Output:
time in milliseconds 30000

Expected behavior

time in milliseconds 2000
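A self-contained version of this benchmark might look like the sketch below. The `fakeParse` function and the repeated `sample` are stand-ins so the sketch runs on its own; in the actual measurement, `marked.parse` from the installed version and a real ~800 kB document were used.

```javascript
// Minimal timing harness. `parse` is any parse function;
// in the real test it was marked.parse.
function timeParse(parse, markdown) {
  const startTime = Date.now();
  parse(markdown);
  return Date.now() - startTime; // elapsed milliseconds
}

// Build a large input by repeating a small sample, as in the report.
const sample = '# Heading\n\n- item one\n- item two\n\nSome paragraph text.\n\n';
const bigInput = sample.repeat(10000); // roughly 0.5 MB of markdown

// Stand-in parser so the sketch runs without marked installed.
const fakeParse = (src) => src.split('\n').length;

const ms = timeParse(fakeParse, bigInput);
console.log('time in milliseconds', ms);
```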

UziTech (Member) commented Nov 22, 2021

Our benchmarks show that performance stayed about the same. If you have any insight into what might be causing your slowdown, that would be helpful; otherwise I don't see any reason to keep this issue open, as there isn't anything we can do about your issue.

calculuschild (Contributor) commented Nov 22, 2021

Version 3.0 had major reworking of Lists as well as minor changes to Tables.

It's possible your document uses some pattern of lists or tables that our new approach doesn't handle well, but unless we can pinpoint what that pattern is, we won't be able to fix this.

Perhaps you could narrow it down to a minimal example where the drastic slowdown is still measurable, and we can dig into where the bottleneck is.

tsmic (Author) commented Nov 23, 2021

I don't think some specific pattern in my data triggered the slowness. To confirm, I measured execution times with markdown data like I used to have (generated mostly using graphql-markdown) and with randomly chosen markdown data (the README.md of the yazl package).

To get comparable numbers, I ran the test several times with the markdown data extended to 100 times its original size (by concatenating it in a loop before the test).

The average difference in speed (with both inputs) between versions 2.1.3 and 4.0.4 was more than tenfold when measured with Node 12; the 15x difference was measured in the Chrome browser. The measurement is rather simple, and the results are consistent.

Could it be possible that the data used in your benchmarks has not been big enough or lacks common patterns?

calculuschild (Contributor) commented Nov 23, 2021

My understanding is that our testing uses the entire CommonMark (and GitHub Flavored Markdown) spec, repeated 1000 times. We should be covering nearly every important pattern, but it's possible that skews our results toward entries that have many examples in the specs, or toward very short items, i.e. lists with only 2 or 3 entries.

Testing the yazl README.md I do see a slowdown of ~5x between 2.1.3 and 3.0.0 if I append 25 copies of the document end-to-end. Link here

If I isolate just lists, I see a slowdown of about 8x. If I isolate just headers and paragraphs, I don't really see a slowdown.

This leads me to believe lists did slow down quite a bit, but they are a small part of the benchmarks, so the regression wasn't noticed.

@UziTech Would it be possible / helpful to have a secondary benchmark that gives a metric for specific token categories, like we do with our spec tests? Then when we make changes to e.g. lists, it would be clear where the slowdown is occurring. Maybe even add CLI flags so developers can run benchmarks focused on just the token type they are working on at the time?
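Such a per-category benchmark could be sketched as a small runner that times each token category separately. The category samples and the `parse` argument below are illustrative stand-ins, not marked's actual spec fixtures or benchmark harness.

```javascript
// Sketch of a per-category benchmark runner: times a parse function
// against samples grouped by token type, so a regression in e.g. lists
// is visible even when the aggregate number barely moves.
function benchmarkByCategory(parse, categories, repeats = 25) {
  const results = {};
  for (const [name, sample] of Object.entries(categories)) {
    const input = sample.repeat(repeats); // a long input exposes scaling issues
    const start = Date.now();
    parse(input);
    results[name] = Date.now() - start; // milliseconds per category
  }
  return results;
}

// Illustrative category samples (not marked's actual spec examples).
const categories = {
  lists: '- one\n- two\n- three\n\n',
  headers: '# Title\n\n## Subtitle\n\n',
  paragraphs: 'Just a paragraph of plain text.\n\n',
};

// A trivial stand-in parse function keeps the sketch self-contained.
const results = benchmarkByCategory((src) => src.length, categories);
console.log(results);
```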

calculuschild (Contributor) commented
On further investigation, it looks like the list slowdown is tied specifically to long documents. There's some O(n^2) scaling going on, and based on some profiling tests I have a pretty good idea where it's happening. I might be able to put together a tweak.
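One way this kind of quadratic behavior can arise, and be detected deterministically rather than with flaky wall-clock timings, is a tokenizer-style loop that re-slices the remaining source on every item. The toy sketch below is an illustration of the measurement technique, not marked's actual code: it counts characters touched, and doubling the input roughly quadruples the count when scaling is O(n^2).

```javascript
// Toy illustration of O(n^2) scaling: a loop that conceptually copies
// the remaining source once per line. `work` counts characters touched
// instead of wall-clock time, so the measurement is deterministic.
function tokenizeCountingWork(src) {
  let work = 0;
  let rest = src;
  while (rest.length > 0) {
    const nl = rest.indexOf('\n');
    const end = nl === -1 ? rest.length : nl + 1;
    rest = rest.slice(end); // re-materializes the remainder: O(n) per line
    work += end + rest.length; // chars consumed + chars carried forward
  }
  return work;
}

const line = '- list item\n';
const small = line.repeat(1000);
const large = line.repeat(2000); // doubled input

const ratio = tokenizeCountingWork(large) / tokenizeCountingWork(small);
// Linear scaling would give a ratio of ~2; ~4 indicates O(n^2).
console.log('work ratio after doubling input:', ratio.toFixed(2));
```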

UziTech (Member) commented Nov 24, 2021

@calculuschild I would not be opposed to more benchmarks. I would love some way to monitor the speed in PRs, but I know GitHub Actions servers are not very consistent when it comes to speed.

UziTech added the labels category: lists, L0 - security, and has PR, and removed the label need more info — Nov 24, 2021
github-actions bot commented Dec 2, 2021

🎉 This issue has been resolved in version 4.0.6 🎉

The release is available on:

Your semantic-release bot 📦🚀
