refac: improve lexing inline elements step's performance #3146

SpencerWhitehead7 · 2023-12-28T04:57:33Z

Marked version: 11.1.0

Markdown flavor: n/a

Description

Improves performance of the lexing step, specifically lexing inline elements. Prompted by this issue #2355

What was attempted

Create a benchmark file bench-100.md - this file copy pasted x 100 https://raw.githubusercontent.com/thephpleague/commonmark/2.2/tests/benchmark/sample.md (I saw that file linked in the issue and it looked good to me)

master:

npm run build
create a cpu profile with node --cpu-prof bin/marked.js -i bench-100.md -o bench-100.html

Apply my changes:

npm run build
create an updated cpu profile with node --cpu-prof bin/marked.js -i bench-100.md -o bench-100.html

Load the two profiles in https://www.speedscope.app/

before: 358ms doing marked stuff, 334ms lexing
after: 242ms doing marked stuff, 218ms lexing
the difference is even bigger with larger files
I tried to attach the actual cpu profiles, but github wouldn't let me. I have screenshots of the speedscope though
before:

after:

The output file is identical, all the tests still pass, and as far as I can tell from reading the code, there shouldn't be any differences in behavior.

The reason this change speeds it up is because changing from while loop + shift to for loop is that while + shift is an n^2 operation; shift shuffles every remaining item in the array down by one, an n time operation, which is done once per item in the array (n * n). Running through the array with a for loop is just n time.

This optimization makes no difference in the automated benchmark. I haven’t read all the benchmark test cases, but I think they’re mostly tiny single item parses, which don’t accumulate a big queue to burn though. The benchmark file has lots of inline elements, so the optimization shows up strongly there.

I think a markdown file that had a good, statistically accurate representation of real world use patterns for all the different markdown features would be super helpful for benchmarking, although it seemed like overkill for this PR.

I’m not sure, and I don’t know if I have the time to do them if they do exist, but I think there might be other architecture level opportunities to improve performance, as opposed to micro level optimizations on things like the regexes. Is there interest in changes like that, even if they meant significant changes to the existing classes?

Contributor

Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
no tests required for this PR.
If submitting new feature, it has been documented in the appropriate places.

Committer

In most cases, this should be a different person than the contributor.

CI is green (no forced merge required).
Squash and Merge PR following conventional commit guidelines.

vercel · 2023-12-28T04:57:37Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
marked-website	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Dec 29, 2023 4:27pm

calculuschild

Such a small change, but if it passes the tests this looks like an awesome improvement, even if it is mainly visible only on larger documents.

I think we are generally open to any performance boosts even if they require some retooling. As long as we remain compliant with the specs and extensions remain functional, we can always make a major version bump if signatures are significantly changed.

UziTech · 2023-12-29T05:42:23Z

The only issue I can see is that this.inlineQueue will be an empty array at the end of the while loop and not at the end of the for loop. This could cause issues with some advanced uses of marked

SpencerWhitehead7 · 2023-12-29T07:07:52Z

As far as I could tell, nothing reads the array after the end of the while loop. I don't see Lexer using it anywhere else, or it being accessed elsewhere, and the lexer just returns the array of tokens at the end.

If there's cases I'm missing, it could be reset to an empty array after the for loop runs, or reversed and then popped instead of shifted.

UziTech · 2023-12-29T15:25:25Z

I don't think it will break anything in marked if it isn't reset but if someone uses the lexer with new marked.Lexer() than it could cause issues when they reuse the same lexer.

I think we can just reset it to an empty array after the for loop

SpencerWhitehead7 · 2023-12-29T16:27:02Z

Oh, that makes sense. Thank you for explaining the case so clearly. I added reseting it to an empty array.

## [11.1.1](v11.1.0...v11.1.1) (2023-12-31) ### Bug Fixes * improve lexing inline elements step's performance ([#3146](#3146)) ([4f87b2a](4f87b2a))

refac: improve lexing inline elements step's performance

3761cdc

vercel bot deployed to Preview December 28, 2023 04:58 View deployment

calculuschild approved these changes Dec 28, 2023

View reviewed changes

refac: clear out queue after tokens have been inlined

fdc3525

vercel bot deployed to Preview December 29, 2023 16:27 View deployment

UziTech approved these changes Dec 29, 2023

View reviewed changes

UziTech merged commit 4f87b2a into markedjs:master Dec 31, 2023
8 checks passed

github-actions bot pushed a commit that referenced this pull request Dec 31, 2023

chore(release): 11.1.1 [skip ci]

05cbf5f

## [11.1.1](v11.1.0...v11.1.1) (2023-12-31) ### Bug Fixes * improve lexing inline elements step's performance ([#3146](#3146)) ([4f87b2a](4f87b2a))

SpencerWhitehead7 deleted the improve-performance--inline-element-lexing branch January 6, 2024 03:36

mnathsnyk mentioned this pull request Mar 15, 2024

[Snyk] Upgrade marked from 0.3.5 to 12.0.0 mnathsnyk/nodejs-goof#5

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refac: improve lexing inline elements step's performance #3146

refac: improve lexing inline elements step's performance #3146

SpencerWhitehead7 commented Dec 28, 2023 •

edited

vercel bot commented Dec 28, 2023 •

edited

calculuschild left a comment

UziTech commented Dec 29, 2023

SpencerWhitehead7 commented Dec 29, 2023

UziTech commented Dec 29, 2023

SpencerWhitehead7 commented Dec 29, 2023

refac: improve lexing inline elements step's performance #3146

refac: improve lexing inline elements step's performance #3146

Conversation

SpencerWhitehead7 commented Dec 28, 2023 • edited

Description

What was attempted

Contributor

Committer

vercel bot commented Dec 28, 2023 • edited

calculuschild left a comment

Choose a reason for hiding this comment

UziTech commented Dec 29, 2023

SpencerWhitehead7 commented Dec 29, 2023

UziTech commented Dec 29, 2023

SpencerWhitehead7 commented Dec 29, 2023

SpencerWhitehead7 commented Dec 28, 2023 •

edited

vercel bot commented Dec 28, 2023 •

edited