
perf(lexer): use SIMD char search when parsing numbers #3250

Closed
Wants to merge 2 commits into main from don/perf/lexer-simd-numbers

Conversation

DonIsaac
Collaborator

What This PR Does

Use memchr_iter to find and remove underscores when parsing number literals.

I'm seeing some performance improvements when running benchmarks on my machine, but I want confirmation from the CI benchmark job.
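The PR's idea can be sketched roughly as follows. The real change uses `memchr::memchr_iter` to locate underscores with SIMD; this stand-in uses a plain scalar scan so the sketch has no external dependencies, and the function name is illustrative rather than oxc's actual API:

```rust
use std::borrow::Cow;

/// Strip `_` separators from a number literal before numeric conversion.
/// (Sketch only: the PR finds the `_` positions via `memchr_iter` instead
/// of this scalar scan.)
fn without_underscores(s: &str) -> Cow<'_, str> {
    // Fast path: most literals contain no `_` at all, so borrow as-is.
    if !s.as_bytes().contains(&b'_') {
        return Cow::Borrowed(s);
    }
    // Slow path: copy every character except the separators.
    Cow::Owned(s.chars().filter(|&c| c != '_').collect())
}
```

The `Cow` return avoids allocating in the common no-underscore case.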

@DonIsaac DonIsaac added A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance labels May 12, 2024


codspeed-hq bot commented May 12, 2024

CodSpeed Performance Report

Merging #3250 will not alter performance

Comparing don/perf/lexer-simd-numbers (487a8c7) with main (eefb66f)

Summary

✅ 27 untouched benchmarks

@DonIsaac
Collaborator Author

Looks like it doesn't help, so I'm closing this PR.

@DonIsaac DonIsaac closed this May 12, 2024
@Boshen Boshen requested a review from overlookmotel May 13, 2024 03:14
@Boshen
Member

Boshen commented May 13, 2024

The benchmark probably doesn't have any relevant data, assigning @overlookmotel for a quick look.

@Boshen Boshen reopened this May 13, 2024
@overlookmotel
Collaborator

overlookmotel commented May 13, 2024

I can see 3 reasons why this appears to be slightly hurting performance rather than improving it:

  1. Probably a majority of numbers in typical JS source code are short integers e.g. in for (let i = 0; i < 5; i++). Using SIMD will be more costly than just iterating through characters in these common cases.
  2. memchr may be overkill here, because (apart from BigInts), JS numbers can only be a few chars long. memchr is optimized for searching for uncommon needles in large haystacks, but here most commonly you'd have a _ every 3 chars e.g. oneMillion = 1_000_000.
  3. There's likely a cost to initializing a Memchr. It could possibly be sped up by creating a single finder and reusing it each time, the way the memchr finder for end of multi-line comments is here:

/// `memchr` Finder for end of multi-line comments. Created lazily when first used.
multi_line_comment_end_finder: Option<memchr::memmem::Finder<'static>>,
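The reuse pattern that field enables could look something like this. `String` stands in for `memchr::memmem::Finder<'static>` so the sketch needs no external crate, and the field and method names here are illustrative, not oxc's real ones:

```rust
// Sketch of a lazily-created, reusable finder.
struct Lexer {
    // Stand-in for `Option<memchr::memmem::Finder<'static>>`.
    underscore_finder: Option<String>,
}

impl Lexer {
    fn underscore_finder(&mut self) -> &String {
        // Build the finder only the first time it's needed, then reuse it
        // on every later call, avoiding repeated initialization cost.
        self.underscore_finder
            .get_or_insert_with(|| String::from("_"))
    }
}
```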

But... if we want to optimize this, can I suggest a different approach?

The lexer already iterates through the characters of the source code comprising the number, and deals with _s. At present we do the same check again later, which is unnecessary. We could avoid checking again for _ when converting the number to f64 with something like this:

  1. Add more Kinds e.g. Kind::DecimalWithUnderscores, Kind::BinaryWithUnderscores.
  2. If lexer finds an underscore while lexing the number, it returns Kind::DecimalWithUnderscores instead of Kind::Decimal (in these functions).
  3. Then during conversion to f64, in the majority of cases it'll be a plain Kind::Decimal, and the check for _s can be bypassed entirely, because we already know there isn't one.
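The three steps above could be sketched like this (`Kind` and the function names are illustrative, not oxc's real lexer API). The lexer records whether it saw a `_` while scanning, so the later f64 conversion can skip the underscore check entirely in the common case:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Kind {
    Decimal,
    DecimalWithUnderscores,
}

fn lex_number(src: &str) -> Kind {
    // The real lexer would notice `_` while already walking the digits;
    // this sketch just scans the slice.
    if src.as_bytes().contains(&b'_') {
        Kind::DecimalWithUnderscores
    } else {
        Kind::Decimal
    }
}

fn to_f64(src: &str, kind: Kind) -> f64 {
    match kind {
        // Majority case: no underscores, so parse directly.
        Kind::Decimal => src.parse().unwrap(),
        // Strip `_` only when the lexer already proved one exists.
        Kind::DecimalWithUnderscores => src
            .chars()
            .filter(|&c| c != '_')
            .collect::<String>()
            .parse()
            .unwrap(),
    }
}
```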

@DonIsaac What do you think of that approach?

Also, can I ask out of interest: What was your motivation for getting into this? Were you using Oxc on some input with lots of _s in integers, and found it was slow? Or pursuing optimization for optimization's sake? (either way, thank you!)

@DonIsaac
Collaborator Author

@overlookmotel Your suggestions look sound. I like the idea of underscore checking during lexing (as per suggestion 2). Since we already need to iterate over numbers there, we could skip the use of memchr entirely.

As for suggestion 1, I like the idea of storing underscore data within the number's Kind, but adding a WithUnderscores variant for each number kind feels verbose to me. What do you think about doing something like using bitflags! for numeric kinds, or some other similar approach?

My motivation for getting into this is that I've spent time optimizing other parts of Oxc before, but never the most important part: the lexer and parser. It's so heavily optimized it's hard to find anything that needs improving. I'm not experiencing any problems with underscores, it's just optimization for optimization's sake 😄

@overlookmotel
Collaborator

overlookmotel commented May 13, 2024

@DonIsaac Thanks for coming back.

just optimization for optimization's sake

A man after my own heart!

adding a WithUnderscores variant for each number kind feels verbose to me

Yes, you're right, it is verbose and unwieldy. But there is a perf advantage to keeping Kind simple (simple to the compiler, not us!). At present it's just a single byte, and that's as tight as you can get performance-wise.

What we can do is add explicit discriminants to Kind enum, and make sure there are patterns in the discriminant values so it's actually behaving like a bit flags set under the hood. e.g. make it so that converting any Kind::*WithUnderscores to the corresponding Kind::* is just "subtract 8 from the discriminant" (1 CPU operation).
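A hypothetical discriminant layout (made-up values, not oxc's real ones) would put every `*WithUnderscores` variant exactly 8 above its plain counterpart, so clearing the "has underscores" flag is a single subtraction:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
#[repr(u8)]
enum Kind {
    Decimal = 0,
    Binary = 1,
    Octal = 2,
    Hex = 3,
    DecimalWithUnderscores = 8,
    BinaryWithUnderscores = 9,
    OctalWithUnderscores = 10,
    HexWithUnderscores = 11,
}

impl Kind {
    fn strip_underscores_flag(self) -> Kind {
        if (self as u8) >= 8 {
            // One CPU operation: subtract 8 from the discriminant byte.
            // Sound because every discriminant >= 8 has a counterpart
            // exactly 8 lower.
            unsafe { std::mem::transmute::<u8, Kind>(self as u8 - 8) }
        } else {
            self
        }
    }
}
```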

Personally, my view is that as the lexer is the most primitive level of the stack, and everything else depends on it, it's acceptable to be quite nasty code, as long as it's as fast as possible. But that's just my opinion, feel free to disagree.

It's so heavily optimized it's hard to find anything that needs improving.

I actually see quite a lot of opportunity for further optimization in the lexer. For a start, everywhere we use the Chars iterator, we should iterate over bytes instead. Chars iterator is really slow. #2352 got some heavy perf improvements from doing this (along with other tricks). Secondly, many of the simpler matchers (e.g. when you find a =, is the token =, == or ===?) could be made branchless to avoid the penalty of incorrect branch prediction. I had intended to finish off that work on the lexer, and I expect there's another 5% to 10% gain on parser benchmarks there for the taking, but I've got sidetracked to other things.
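One way the `=` / `==` / `===` decision could be made branchless (a sketch, not oxc's real lexer code) is to count the run of leading `=` bytes with boolean arithmetic instead of an if/else chain the branch predictor can miss:

```rust
// Returns how many `=` bytes start the slice, capped at 3
// (the longest token, `===`).
fn eq_token_len(rest: &[u8]) -> usize {
    let a = (rest.first() == Some(&b'=')) as usize;
    // `&` keeps the run contiguous: a later `=` only counts if all the
    // earlier ones matched.
    let b = a & ((rest.get(1) == Some(&b'=')) as usize);
    let c = b & ((rest.get(2) == Some(&b'=')) as usize);
    a + b + c
}
```

Each comparison becomes a 0-or-1 value, so the token length falls out of plain additions with no unpredictable branching.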

If by any chance you'd be interested in taking up that baton, I'd be very happy to talk it through with you.

@@ -96,3 +96,42 @@ pub fn parse_big_int(s: &str, kind: Kind) -> Result<BigInt, &'static str> {
};
BigInt::parse_bytes(s.as_bytes(), radix).ok_or("invalid bigint")
}

#[inline]
Member

This is a deoptimization, I think: it will bust the CPU instruction cache by loading so many instructions. #[cold] should be the better option here.

Remember: data-oriented design is thinking about the statistics.

Collaborator

@overlookmotel overlookmotel May 14, 2024

Sorry to contradict you Boshen, but I don't think #[cold] is the right option here either.

#[cold] is for telling the branch predictor "this function is rarely called", and so to assume another arm in a match is more likely to be the one which executes (or same in an if/else branch). But in this case, without_underscores is always called, so #[cold] won't do anything.

If you want to make sure the function doesn't get inlined, use #[inline(never)]. Or just don't use any attribute, and let the compiler decide whether to inline or not. The latter is usually the best option.

So I agree with you that #[inline] should be removed here, but probably not replaced with anything else.
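The distinction can be illustrated like this (function names are made up for the example). `#[cold]` hints that a call is unlikely, which only helps when the function sits behind a branch, such as an error path; `#[inline(never)]` is the attribute that actually forbids inlining, even for a function called on every token:

```rust
// Rarely reached, behind a branch: `#[cold]` tells the optimizer to
// assume the other arm is the likely one.
#[cold]
#[inline(never)]
fn report_error(msg: &str) -> String {
    format!("number parse error: {msg}")
}

// Called unconditionally, so `#[cold]` would be pointless here;
// `#[inline(never)]` (or no attribute at all) is the right tool if the
// goal is only to keep the function out of line.
#[inline(never)]
fn strip_underscores(s: &str) -> String {
    s.chars().filter(|&c| c != '_').collect()
}
```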

FYI, I think I've over-used #[cold] in the lexer, and could probably remove some of them. I know more now than I did in Jan!

@Boshen
Member

Boshen commented May 14, 2024

Numbers are pretty short, and underscores are rare, so this routine may need to be done differently. Feel free to close the PR again 😅
