perf(lexer): dedupe numeric separator check #3283

DonIsaac · 2024-05-15T00:44:15Z

What This PR Does

Updates numeric literal token lexing to record when separator characters (_) are found in a new Token flag. This then gets passed to parse_int and parse_float, removing the need for a second _ check in those two functions.
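As a rough illustration of the idea described above (not the PR's actual code), the sketch below has the lexer set a flag when it sees `_`, and the parse step strips separators only when that flag is set. The `Token` fields, `lex_decimal`, and this `parse_int` signature are simplified assumptions made for the example.

```rust
use std::borrow::Cow;

/// Hypothetical, trimmed-down token: only the fields this sketch needs.
#[derive(Debug, Clone, Copy, Default)]
struct Token {
    start: u32,
    end: u32,
    /// Set by the lexer when it sees `_` inside a numeric literal.
    has_separator: bool,
}

/// Lex a run of decimal digits and `_` beginning at `start`,
/// recording whether any separator was seen along the way.
fn lex_decimal(source: &str, start: usize) -> Token {
    let mut token = Token { start: start as u32, ..Token::default() };
    let mut end = start;
    for &b in &source.as_bytes()[start..] {
        match b {
            b'0'..=b'9' => {}
            b'_' => token.has_separator = true,
            _ => break,
        }
        end += 1;
    }
    token.end = end as u32;
    token
}

/// Parse the literal. The text is rescanned and copied only when the
/// lexer already reported a separator; otherwise it is parsed in place.
fn parse_int(source: &str, token: Token) -> Option<u64> {
    let text = &source[token.start as usize..token.end as usize];
    let digits: Cow<str> = if token.has_separator {
        Cow::Owned(text.replace('_', ""))
    } else {
        Cow::Borrowed(text)
    };
    digits.parse().ok()
}
```

The point of the flag is that the common case (no separators) never pays for a second scan of the literal.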

When run locally, I see no change to lexer benchmarks and minor improvements to codegen benchmarks. For some reason, semantic and source map benches seem to be doing slightly worse.

Note that I attempted to implement this with bitflags! (making escaped and is_on_newline flags as well) and this caused performance degradation. My best guess is that it turned reads on these flags from a mov to a mov + a binary and.
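To make the guess about `mov` versus `mov` + `and` concrete, here is a minimal sketch of the two representations. `PackedFlags` is a hand-rolled stand-in for what `bitflags!` would generate, and all names are assumptions for illustration; the actual generated assembly depends on the compiler and target.

```rust
/// Flags stored as separate `bool` fields: reading one is a plain byte load.
#[derive(Default, Clone, Copy)]
struct BoolFlags {
    is_on_newline: bool,
    escaped: bool,
    has_separator: bool,
}

/// The same flags packed into a single byte.
#[derive(Default, Clone, Copy)]
struct PackedFlags(u8);

impl PackedFlags {
    const IS_ON_NEWLINE: u8 = 1 << 0;
    const ESCAPED: u8 = 1 << 1;
    const HAS_SEPARATOR: u8 = 1 << 2;

    /// Setting a flag is a read-modify-write of the shared byte.
    fn set(&mut self, flag: u8) {
        self.0 |= flag;
    }

    /// Reading a flag needs a load followed by a mask.
    fn contains(self, flag: u8) -> bool {
        self.0 & flag != 0
    }
}
```

The packed form saves two bytes here, but every read goes through `contains`, which is the extra masking step the comment above speculates about.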

codspeed-hq · 2024-05-15T00:50:32Z

CodSpeed Performance Report

Merging #3283 will improve performances by 5.83%

Comparing don/perf/lexer-dedupe-sep-check (7fd434e) with main (dad47a5)

Summary

⚡ 6 improvements
✅ 21 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`don/perf/lexer-dedupe-sep-check`	Change
⚡	`lexer[RadixUIAdoptionSection.jsx]`	90.2 µs	86 µs	+4.87%
⚡	`lexer[antd.js]`	113.5 ms	107.8 ms	+5.31%
⚡	`lexer[cal.com.tsx]`	28.2 ms	26.7 ms	+5.41%
⚡	`lexer[checker.ts]`	66.6 ms	63.6 ms	+4.75%
⚡	`lexer[pdf.mjs]`	18.6 ms	17.6 ms	+5.83%
⚡	`semantic[pdf.mjs]`	141.6 ms	136.5 ms	+3.7%

Boshen

We can always move the underscore string out if the token gets filled up with other data.

oxc/crates/oxc_parser/src/lexer/mod.rs

Lines 91 to 96 in dad47a5

    
    /// Data store for escaped strings, indexed by [Token::start] when [Token::escaped] is true
    pub escaped_strings: FxHashMap<u32, &'a str>,

    /// Data store for escaped templates, indexed by [Token::start] when [Token::escaped] is true
    /// `None` is saved when the string contains an invalid escape sequence.
    pub escaped_templates: FxHashMap<u32, Option<&'a str>>,

overlookmotel · 2024-05-15T07:48:57Z

@DonIsaac Very nice work!

I didn't get a chance to review this before it was merged (everything happened in the middle of the night in my timezone, and I was asleep). Can I make a request for a follow-on?

The one thing I'd like to pick up on is the use of unsafe. I understand why you've made parse_float_without_underscores_unchecked an unsafe function to draw attention to the precondition. But personally I think we should reserve unsafe for cases where the preconditions MUST be met or it's UB.

In my mind, there's quite a big difference between:

(a) this function may provide nonsense output or panic if you give it the wrong input, and
(b) this function may do basically anything if you give it the wrong input, and it may cause errors or malfunctions in unrelated parts of the codebase (UB).

In the potential-UB cases, the person using the API should take it slow, rigorously check all the preconditions, and document how they can prove the preconditions are satisfied. In my view we want to keep this to an absolute minimum otherwise it's like "crying wolf" - people will get used to calling unsafe functions, and will get blasé and just write // SAFETY: All conditions are satisfied, even for the cases where they really should be being careful.
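The distinction between (a) and (b) can be sketched with two small functions. Both are hypothetical examples written for this illustration, not functions from the oxc codebase.

```rust
/// Case (a): a safe function with a documented precondition.
/// Passing input that still contains `_` produces a nonsense number
/// (or trips the debug assertion), but it can never cause UB.
fn parse_digits_assume_clean(s: &str) -> u64 {
    debug_assert!(!s.contains('_'), "caller should strip separators first");
    s.bytes()
        .fold(0u64, |acc, b| acc * 10 + u64::from(b.wrapping_sub(b'0')))
}

/// Case (b): a genuinely unsafe function.
///
/// # Safety
/// `index` must be less than `bytes.len()`. Violating this reads out of
/// bounds, which is undefined behavior.
unsafe fn byte_at_unchecked(bytes: &[u8], index: usize) -> u8 {
    // SAFETY: the caller guarantees `index < bytes.len()`.
    unsafe { *bytes.get_unchecked(index) }
}
```

Under the policy argued for above, only the second function would carry the `unsafe` keyword; the first communicates its precondition through its name, its doc comment, and a debug assertion.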

I would also like to institute a policy where any PR using unsafe can only be merged after a more rigorous than usual code review, by multiple reviewers. Again, we'd need to keep this to a minimum, or it'd be impractical.

So... I've probably explained this in more detail than I needed to (sorry, I have a tendency to talk too much!). But the short version is: If you can understand my logic and agree, would you mind doing a follow-on PR to remove the usage of unsafe here?

overlookmotel · 2024-05-15T07:58:54Z

Second thing:

I don't understand the debug_assert! in set_has_separator. Why the addition of || self.kind == Kind::default()?

overlookmotel · 2024-05-15T08:00:58Z

Some side notes:

Lexer benchmarks don't really matter. The lexer is not exposed to users as a stand-alone API, so it's the parser benchmarks which actually matter. The lexer benchmarks are only there for our internal use, to give greater visibility when working on the lexer. The flame graphs on CodSpeed for the lexer are much easier to interpret than the parser's.

In the case of this PR, the real gain is the +1% on the parser. In one sense 1% isn't much, but I think it's significant. Multiple incremental gains of 1% add up fast.

I am mystified as to why we're seeing 5% gain on lexer benchmarks here. If anything, you'd expect the lexer to be slightly slower because it's doing a little bit more work, in order to save work later for the parser.

The only thing I can think of is that changing one of the padding bytes in Token from u8 to bool somehow improves codegen. Would you like to replace the other 2 padding bytes with bools too, and see what happens?
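For readers unfamiliar with the padding trick being discussed, the sketch below shows how a spare padding byte can be turned into a `bool` flag without changing the size of the struct. The field layout is an assumption made for illustration and is not copied from oxc's `Token`.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(u8)]
enum Kind {
    Eof,
    Decimal,
}

/// Hypothetical "before" layout: one spare byte kept as explicit `u8` padding.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct TokenWithU8Padding {
    start: u32,
    end: u32,
    kind: Kind,
    is_on_new_line: bool,
    escaped: bool,
    _padding1: u8,
    _padding2: u32,
}

/// Hypothetical "after" layout: the spare byte now carries the new flag.
#[allow(dead_code)]
#[derive(Clone, Copy)]
struct TokenWithBoolFlag {
    start: u32,
    end: u32,
    kind: Kind,
    is_on_new_line: bool,
    escaped: bool,
    has_separator: bool,
    _padding2: u32,
}

// Compile-time check that adding the flag did not grow the token.
const _: () = assert!(size_of::<TokenWithU8Padding>() == size_of::<TokenWithBoolFlag>());
```

Whether a `bool` in that slot produces better machine code than a `u8` is exactly the open question in this thread; the sketch only shows that the swap is free in terms of size.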

Or do you have any idea what's going on here?

DonIsaac · 2024-05-15T15:12:40Z

@overlookmotel

1. Yes, I will make a separate PR removing unsafe
2. That check is needed since self.token is currently under construction and has a kind of Kind::Eof. It won't have a numeric kind until the token kind is set by the same methods consuming set_has_separator
3. I'm really not sure; these improvements baffled me as well. I think your hypothesis is solid; I can't think of anything else that may affect it. I'll try making _padding2 a [bool; 4] and see what effects that has.
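The explanation of the assertion can be sketched as follows. This is a simplified reconstruction based only on the description in this thread: the `Kind` variants, `is_number`, and the exact assertion condition are assumptions, with `Kind::Eof` as the default kind as stated above.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Kind {
    /// A token under construction still has this default kind.
    #[default]
    Eof,
    Decimal,
    Float,
    Ident,
}

impl Kind {
    fn is_number(self) -> bool {
        matches!(self, Kind::Decimal | Kind::Float)
    }
}

#[derive(Debug, Default)]
struct Token {
    kind: Kind,
    has_separator: bool,
}

impl Token {
    fn set_has_separator(&mut self) {
        // The flag only makes sense on numeric tokens. While the lexer is
        // still reading the literal, the kind has not been assigned yet and
        // is still the default, so that case has to be allowed as well.
        debug_assert!(self.kind.is_number() || self.kind == Kind::default());
        self.has_separator = true;
    }
}
```

In this sketch the flag is set first and the numeric kind is assigned afterwards, which mirrors the ordering described in the answer above.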

overlookmotel · 2024-05-15T15:39:15Z

> Yes, I will make a separate PR removing unsafe

Thank you!

> That check is needed since self.token is currently under construction and has a kind of Kind::Eof. It won't have a numeric kind until the token kind is set by the same methods consuming set_has_separator

Ah ha that makes sense now.

> I'm really not sure; these improvements baffled me as well. I think your hypothesis is solid; I can't think of anything else that may affect it. I'll try making _padding2 a [bool; 4] and see what effects that has.

I don't know if my hypothesis is solid or not! It's very mysterious.

This is all real micro-opt stuff. It only makes a difference because creating tokens is the majority of what the lexer does, so it's the definition of a hot path.

overlookmotel · 2024-05-15T15:58:28Z

I think I may have solved the mystery! Please see #3289 (comment)

@overlookmotel

## What This PR Does

- perf(lexer): use bit shifting when parsing hex, octal, and binary integers instead of `mul_add`-ing on `f64`s. Check out the difference in assembly generated [here](https://godbolt.org/z/zMEKaeYzh)
- perf(lexer): skip redundant utf8 check when parsing BigInts
- refactor(lexer): remove `unsafe` usage (as per @overlookmotel's request [here](#3283 (comment)))
- test(lexer): add numeric parsing unit tests

I don't expect this PR to have a large performance improvement, since the most common case (`Kind::Decimal`) is not affected. We could speed up the decimal case as well, however, by splitting `Kind::Decimal` into `Kind::DecimalFloat` and `Kind::DecimalInt` when the lexer encounters a `.`
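The first bullet can be illustrated with a small comparison for the hexadecimal case. This is a sketch of the general technique, not the code from the follow-up PR; the function names are invented, and the integer version assumes at most 16 hex digits so the value fits in a `u64`.

```rust
/// Value of a single ASCII hex digit. Panics on anything else.
fn hex_value(b: u8) -> u8 {
    match b {
        b'0'..=b'9' => b - b'0',
        b'a'..=b'f' => b - b'a' + 10,
        b'A'..=b'F' => b - b'A' + 10,
        _ => panic!("not a hex digit"),
    }
}

/// Float-based accumulation: one multiply-add on `f64` per digit.
fn parse_hex_mul_add(digits: &str) -> f64 {
    let mut result = 0f64;
    for b in digits.bytes() {
        result = result.mul_add(16.0, f64::from(hex_value(b)));
    }
    result
}

/// Integer accumulation: shift in 4 bits per digit and convert to `f64`
/// once at the end. Assumes the literal has at most 16 hex digits.
fn parse_hex_shift(digits: &str) -> f64 {
    let mut result = 0u64;
    for b in digits.bytes() {
        result = (result << 4) | u64::from(hex_value(b));
    }
    result as f64
}
```

The same pattern applies to octal (shift by 3) and binary (shift by 1), since each radix is a power of two.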

perf(lexer): dedupe numeric separator check

fdef56d

DonIsaac added A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance labels May 15, 2024

DonIsaac requested review from overlookmotel and Boshen May 15, 2024 00:44

DonIsaac self-assigned this May 15, 2024

style: run cargo fmt

9b2f64e

fix(lexer): move set_has_separator to before illegal char check

c11e783

Boshen approved these changes May 15, 2024

Boshen reviewed May 15, 2024

Apply suggestions from code review

7fd434e

Co-authored-by: Boshen <boshenc@gmail.com>

DonIsaac enabled auto-merge (squash) May 15, 2024 01:46

DonIsaac merged commit 508dae6 into main May 15, 2024
29 checks passed

DonIsaac deleted the don/perf/lexer-dedupe-sep-check branch May 15, 2024 01:48

This was referenced May 15, 2024

Fast path for parsing decimal literals #3288

Open

Change type of padding in lexer Token? #3289

Open

DonIsaac mentioned this pull request May 15, 2024

perf(lexer): use bitshifting when parsing known integers #3296

Merged
