Static lexer #1909
It doesn't seem to speed up the current benchmarks but I see how it could if #1872 was merged and |
src/Lexer.js
@@ -75,6 +78,8 @@ module.exports = class Lexer {
    this.tokenizer.rules = rules;
  }

  static staticLexer
I don't think this static property is necessary since `Lexer.staticLexer` will be `undefined` without this line until it is set anyways.
Fixed.
I wonder if this will actually be any faster since we will still have to run |
The current version of #1872 is passing this as a parameter, but it doesn't seem to make a difference on my machine. I have some other ideas that seemed to speed up parsing, but repeated calls got quite slow with the setup overhead. Maybe I will post them in a separate PR so you can judge.
Made the fixes requested plus a small tweak so Also removed unneeded Babel extension, and it now passes the Linter. Even if performance isn't improved here, I think it will streamline any future updates that include heavier setup steps in the constructor. |
this.tokens = [];
this.tokens.links = Object.create(null);
this.options = options || defaults;
this.options.tokenizer = this.options.tokenizer || new Tokenizer();
this.tokenizer = this.options.tokenizer;
if (!this.tokenizer) {
If they pass a different tokenizer in the options wouldn't we want to use that one here?
That still happens, if my logic is correct.
Instead of this with every loop:
- Check for `this.options.tokenizer`
- If it doesn't exist, assign it the default value `new Tokenizer`
- Assign the result to `this.tokenizer`
Do this:
- Check for `this.tokenizer`
- If it doesn't exist, assign it a value, preferably `this.options.tokenizer`, but if not, default to `new Tokenizer`
The end result is the same: `this.tokenizer == this.options.tokenizer || new Token`, but we have saved an object write on every loop in the second case (`this.tokenizer = this.options.tokenizer`).
Instead of "saving" the optional Tokenizer to `this.options.tokenizer` to be reapplied every loop, we "save" it where it actually belongs, in `this.tokenizer`, and only have to apply it the first time around.
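To make the two sequences above concrete, here is a minimal sketch. The `Tokenizer` class is an empty stand-in and `applyEager` / `applyLazy` are hypothetical helper names, not functions in marked; the sketch only shows that both variants agree on the first call.

```javascript
// Empty stand-in for marked's Tokenizer; only object identity matters here.
class Tokenizer {}

// Variant A (hypothetical name): default the option, then copy it on every call.
function applyEager(lexer, options) {
  options.tokenizer = options.tokenizer || new Tokenizer();
  lexer.tokenizer = options.tokenizer; // this write happens on every call
}

// Variant B (hypothetical name): only assign when nothing is set yet.
function applyLazy(lexer, options) {
  if (!lexer.tokenizer) {
    lexer.tokenizer = options.tokenizer || new Tokenizer();
  }
}

const custom = new Tokenizer();
const eagerLexer = {};
const lazyLexer = {};
applyEager(eagerLexer, { tokenizer: custom });
applyLazy(lazyLexer, { tokenizer: custom });

// On the first call both variants pick up the user-supplied tokenizer.
console.log(eagerLexer.tokenizer === custom, lazyLexer.tokenizer === custom);
```

Note that variant B deliberately skips the assignment on later calls, so it will not see a tokenizer that is passed in afterwards.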
In fact... I wonder if we even need this separate "setOptions" function run every time at all. I just put it there to appease the unit tests. Once the tokenizer is defined and its options are set the first time, normal use would generally not be swapping options on the fly, right? Wouldn't 99% of use cases just set options one time and then run `marked(string)` with those same options every time?
The only real reason I see the need to assign options on the fly is for unit testing in development, because it needs to update options as it steps through the spec tests for GFM, Pedantic, etc. in a random order. We could instead modify the unit test script to create a new marked object with each test, and thereby have the correct options as it goes, but I don't see a need for random option switching in production code, and especially not repeated re-assignment of the same options over and over. (I could be missing some reason we need to allow hot-swapping options in production, just can't think of any.)
What happens in this situation?
marked("# test");
marked.use({
  tokenizer: {
    heading() {
      // no heading
    }
  }
});
marked("# test");
With this PR, `this.tokenizer` wouldn't be updated and the new heading tokenizer is ignored.
This PR:
<h1 id="test">test</h1>
<h1 id="test">test</h1>
latest release:
<h1 id="test">test</h1>
<p># test</p>
I see what you are doing here but I don't see any benefits. It doesn't seem any faster, especially since we still need to set the options. If we want it to be faster we should make marked a class that you can use like Currently the way marked sets the options they are global, so with this PR the only way to update the tokenizer would be to restart the node process.
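A toy reproduction of the stale-tokenizer problem described above, for anyone who wants to see it in isolation. Everything here is hypothetical scaffolding (`defaults`, `use`, `CachedLexer`, `FreshLexer`); it is not marked's real code, only a sketch of a cached instance ignoring a later change to global options.

```javascript
// Global options object, standing in for marked's shared defaults.
const defaults = { tokenizer: { name: "default" } };

// Stand-in for marked.use(): replaces the globally configured tokenizer.
function use(extension) {
  defaults.tokenizer = extension.tokenizer;
}

// Caches its tokenizer on first use and never looks at the options again.
class CachedLexer {
  static lex(options) {
    if (!CachedLexer.instance) {
      CachedLexer.instance = { tokenizer: options.tokenizer };
    }
    return CachedLexer.instance.tokenizer.name;
  }
}

// Reads the tokenizer from the options on every call.
class FreshLexer {
  static lex(options) {
    return options.tokenizer.name;
  }
}

const before = [CachedLexer.lex(defaults), FreshLexer.lex(defaults)];
use({ tokenizer: { name: "custom" } });
const after = [CachedLexer.lex(defaults), FreshLexer.lex(defaults)];

// before: both report "default"; after: only FreshLexer reports "custom".
console.log(before, after);
```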
Ok, so we do need to support hot-swapping options in production. This is good to know.
This is the key point I was trying to make in the OP. In its current state, this doesn't do a whole lot, especially if we have to re-apply the same options over and over whether they are changed or not. However, if we want to make future changes that add a lot of overhead to the constructor, such as
This is a good point, and re-emphasises to me that this PR is a good place to optimize. We don't really see speed gains against the current master because we haven't gone far enough :P. If we are needlessly resetting the same options over and over with each call to |
Maybe we can add a way to do We could change marked to:
const {Marked} = require("marked");
const marked = new Marked(options);
marked("# markdown");
Or add this way of doing it and keep the old way, but just tell people that if they want speed they need to do it this way? It seems like this would have all of these benefits (plus some ESM benefits) and not alienate users who don't want to change and don't care about speed.
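As a rough illustration of the instance-based idea, here is a self-contained sketch. This `Marked` class, its `heading` option, and its `parse()` method are assumptions made up for the example and are not the API proposed or shipped by marked; the toy parser only recognizes a single `# ` heading line.

```javascript
// Hypothetical class: options live on the instance instead of in a global.
class Marked {
  constructor(options = {}) {
    // Setup runs once per instance, not once per parse call.
    this.options = { heading: true, ...options };
  }

  // Toy parser: one "# " heading line, otherwise a paragraph.
  parse(src) {
    const match = /^# (.+)$/.exec(src);
    if (this.options.heading && match) {
      return `<h1>${match[1]}</h1>`;
    }
    return `<p>${src}</p>`;
  }
}

// Two instances with different options can coexist in one process.
const withHeadings = new Marked();
const withoutHeadings = new Marked({ heading: false });

console.log(withHeadings.parse("# test"));    // <h1>test</h1>
console.log(withoutHeadings.parse("# test")); // <p># test</p>
```

With this shape, changing options means creating a new instance, so nothing cached inside an existing instance can go stale.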
Sure, I would not be opposed to that. If it opens up better options for #1872 or similar changes down the line I'm all for it. FYI I just tried the approach of using |
Description
PR #1872 is running into some optimization issues, and part of that is due to the way the benchmark is run. With each individual call to marked(markdown), we create a brand new Lexer, Tokenizer, Parser, Renderer, etc., and doing this for each of the thousands of calls in the benchmark makes any setup code take up a significant portion of the execution time. For example, `bind()`ing functions in a constructor, which should be faster than `call()` because it only needs to happen once, is happening with every Marked execution.

I added a `static` property `staticLexer` to Lexer.js, and instead of creating a new Lexer in each call to `static Lexer.lex()`, it returns the existing static `staticLexer` property and just applies any changed options. I did a similar thing with Tokenizer.js. I don't see any reason to be building up entire separate trees of objects every time someone wants to compile a line of markdown; we only use one at a time, so let's "singleton" it?

The only issue is that the linter doesn't like it because `static property` isn't a formal part of JS yet, even though Babel can handle it for us, so we might need to use something like https://github.com/babel/babel/tree/main/eslint/babel-eslint-parser

It seems to give a bit of a speed boost to the current master, but it's really hard to tell with the current benchmarking setup since results vary wildly. I'd appreciate a second opinion.
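For reference, a stripped-down sketch of the singleton idea, not the actual diff. The `constructed` counter, the `setOptions` body, and the line-splitting `lex` are placeholders I made up; the point is only that the constructor (including the one-time `bind()`) runs once across many calls to the static `Lexer.lex()`. The static property is created by plain assignment, so no class-field syntax is needed.

```javascript
// Counts how many Lexer instances get built (illustration only).
let constructed = 0;

class Lexer {
  constructor(options) {
    constructed += 1;
    this.setOptions(options);
    // Stand-in for heavier one-time setup, such as binding methods.
    this.lex = this.lex.bind(this);
  }

  setOptions(options) {
    this.options = options || {};
  }

  // Placeholder tokenization: one token per input line.
  lex(src) {
    return src.split("\n").map((line) => ({ type: "line", raw: line }));
  }

  // Reuse one shared instance and only re-apply the options.
  static lex(src, options) {
    if (!Lexer.staticLexer) {
      Lexer.staticLexer = new Lexer(options);
    } else {
      Lexer.staticLexer.setOptions(options);
    }
    return Lexer.staticLexer.lex(src);
  }
}

let tokens = [];
for (let i = 0; i < 1000; i++) {
  tokens = Lexer.lex("# a\nb", { gfm: true });
}

// The constructor ran once even though lex() was called 1000 times.
console.log(constructed, tokens.length);
```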