Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(lexer): dedupe numeric separator check #3283

Merged
merged 4 commits into from
May 15, 2024

Conversation

DonIsaac
Copy link
Collaborator

@DonIsaac DonIsaac commented May 15, 2024

What This PR Does

Updates numeric literal token lexing to record when separator characters (_) are found in a new Token flag. This then gets passed to parse_int and parse_float, removing the need for a second _ check in those two functions.

When run locally, I see no change to lexer benchmarks and minor improvements to codegen benchmarks. For some reason, semantic and source map benches seem to be doing slightly worse.

Note that I attempted to implement this with bitflags! (making escaped and is_on_newline flags as well) and this caused performance degradation. My best guess is that it turned reads on these flags from a mov to a mov + a binary and.

@DonIsaac DonIsaac added A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance labels May 15, 2024
@DonIsaac DonIsaac self-assigned this May 15, 2024
Copy link

graphite-app bot commented May 15, 2024

Your org has enabled the Graphite merge queue for merging into main

Add the label “merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. Or use the label “hotfix” to add to the merge queue as a hot fix.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

Copy link

codspeed-hq bot commented May 15, 2024

CodSpeed Performance Report

Merging #3283 will improve performances by 5.83%

Comparing don/perf/lexer-dedupe-sep-check (7fd434e) with main (dad47a5)

Summary

⚡ 6 improvements
✅ 21 untouched benchmarks

Benchmarks breakdown

Benchmark main don/perf/lexer-dedupe-sep-check Change
lexer[RadixUIAdoptionSection.jsx] 90.2 µs 86 µs +4.87%
lexer[antd.js] 113.5 ms 107.8 ms +5.31%
lexer[cal.com.tsx] 28.2 ms 26.7 ms +5.41%
lexer[checker.ts] 66.6 ms 63.6 ms +4.75%
lexer[pdf.mjs] 18.6 ms 17.6 ms +5.83%
semantic[pdf.mjs] 141.6 ms 136.5 ms +3.7%

Copy link
Member

@Boshen Boshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always move the underscore string out if the token gets filled up with other data.

/// Data store for escaped strings, indexed by [Token::start] when [Token::escaped] is true
pub escaped_strings: FxHashMap<u32, &'a str>,
/// Data store for escaped templates, indexed by [Token::start] when [Token::escaped] is true
/// `None` is saved when the string contains an invalid escape sequence.
pub escaped_templates: FxHashMap<u32, Option<&'a str>>,

crates/oxc_parser/src/lexer/number.rs Outdated Show resolved Hide resolved
crates/oxc_parser/src/lexer/number.rs Outdated Show resolved Hide resolved
Co-authored-by: Boshen <boshenc@gmail.com>
@DonIsaac DonIsaac enabled auto-merge (squash) May 15, 2024 01:46
@DonIsaac DonIsaac merged commit 508dae6 into main May 15, 2024
29 checks passed
@DonIsaac DonIsaac deleted the don/perf/lexer-dedupe-sep-check branch May 15, 2024 01:48
@overlookmotel
Copy link
Collaborator

@DonIsaac Very nice work!

I didn't get a chance to review this before it was merged (everything happened in the middle of the night in my timezone, and I was asleep). Can I make a request for a follow-on?

The one thing I'd like to pick up on is the use of unsafe. I understand why you've made parse_float_without_underscores_unchecked an unsafe function to draw attention to the precondition. But personally I think we should reserve unsafe for cases where the preconditions MUST be met or it's UB.

In my mind, there's quite a big difference between:

(a) this function may provide nonsense output or panic if you give it the wrong input, and
(b) this function may do basically anything if you give it the wrong input, and it may cause errors or malfunctions in unrelated parts of the codebase (UB).

In the potential-UB cases, the person using the API should take it slow, rigorously check all the preconditions, and document how they can prove the preconditions are satisfied. In my view we want to keep this to an absolute minimum otherwise it's like "crying wolf" - people will get used to calling unsafe functions, and will get blasé and just write // SAFETY: All conditions are satisfied, even for the cases where they really should be being careful.

I would also like to institute a policy where any PR using unsafe can only be merged after a more rigorous than usual code review, by multiple reviewers. Again, we'd need to keep this to a minimum, or it'd be impractical.

So... I've probably explained this in more detail than I needed to (sorry, I have a tendency to talk too much!). But the short version is: If you can understand my logic and agree, would you mind doing a follow-on PR to remove the usage of unsafe here?

@overlookmotel
Copy link
Collaborator

Second thing:

I don't understand the debug_assert! in set_has_separator. Why the addition of || self.kind == Kind::default()?

@overlookmotel
Copy link
Collaborator

overlookmotel commented May 15, 2024

Some side notes:

Lexer benchmarks don't really matter. The lexer is not exposed to user as a stand-alone API, so it's the parser benchmark which actually matter. The lexer benchmark is only there for our internal use to give greater visibility when working on the lexer. The flame graphs on CodSpeed for lexer are much easier to interpret than the parser's.

In the case of this PR, the real gain is the +1% on the parser. In one sense 1% isn't much, but I think it's significant. Multiple incremental gains of 1% add up fast.

I am mystified as to why we're seeing 5% gain on lexer benchmarks here. If anything, you'd expect the lexer to be slightly slower because it's doing a little bit more work, in order to save work later for the parser.

The only thing I can think of is that changing one of the padding bytes in Token from u8 to bool somehow improves codegen. Would you like to replace the other 2 padding bytes with bools too, and see what happens?

Or do you have any idea what's going on here?

@DonIsaac
Copy link
Collaborator Author

@overlookmotel

  1. Yes, I will make a separate PR removing unsafe
  2. That check is needed since self.token is currently under construction and has a kind of Kind::Eof. It won't have a numeric kind until the token kind is set by the same methods consuming set_has_separator
  3. I'm really not sure; these improvements baffled me as well. I think your hypothesis is solid; I can't think of anything else that may affect it. I'll try making _padding2 a [bool; 4] and see what effects that has.

@overlookmotel
Copy link
Collaborator

  1. Yes, I will make a separate PR removing unsafe

Thank you!

  1. That check is needed since self.token is currently under construction and has a kind of Kind::Eof. It won't have a numeric kind until the token kind is set by the same methods consuming set_has_separator

Ah ha that makes sense now.

  1. I'm really not sure; these improvements baffled me as well. I think your hypothesis is solid; I can't think of anything else that may affect it. I'll try making _padding2 a [bool; 4] and see what effects that has.

I don't know if my hypothesis is solid or not! It's very mysterious.

This is all real micro-opt stuff. It only makes a difference because creating tokens is the majority of what the lexer does, so it's the definition of a hot path.

@overlookmotel
Copy link
Collaborator

I think I may have solved the mystery! Please see #3289 (comment)

Brooooooklyn pushed a commit to toeverything/AFFiNE that referenced this pull request May 16, 2024
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [oxlint](https://oxc-project.github.io) ([source](https://togithub.com/oxc-project/oxc/tree/HEAD/npm/oxlint)) | [`0.3.2` -> `0.3.5`](https://renovatebot.com/diffs/npm/oxlint/0.3.2/0.3.5) | [![age](https://developer.mend.io/api/mc/badges/age/npm/oxlint/0.3.5?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/oxlint/0.3.5?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/oxlint/0.3.2/0.3.5?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/oxlint/0.3.2/0.3.5?slim=true)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>oxc-project/oxc (oxlint)</summary>

### [`v0.3.5`](https://togithub.com/oxc-project/oxc/releases/tag/oxlint_v0.3.5): oxlint v0.3.5

[Compare Source](https://togithub.com/oxc-project/oxc/compare/7193d75e9444ae8c2ba757b3bc64459abd0e128a...754d9f4c98aab052cf6b2756f7af12557042708d)

#### What's Changed

-   feat(linter): add use-isnan fixer for (in)equality operations by [@&#8203;DonIsaac](https://togithub.com/DonIsaac) in [oxc-project/oxc#3284
-   feat(linter/eslint): Implement fixer for unicode-bom rule by [@&#8203;jelly](https://togithub.com/jelly) in [oxc-project/oxc#3259
-   fix(linter/no-direct-mutation-state): false positive when class is declared inside a `CallExpression` by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3294
-   fix(parser): parse `DecoratorCallExpression` when `Arguments` contains `MemberExpression` by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3265
-   perf(ast): inline all `ASTBuilder` methods by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3295
-   perf(lexer): dedupe numeric separator check by [@&#8203;DonIsaac](https://togithub.com/DonIsaac) in [oxc-project/oxc#3283
-   perf(linter): rewrite react/require-render-return by [@&#8203;mysteryven](https://togithub.com/mysteryven) in [oxc-project/oxc#3276

#### New Contributors

-   [@&#8203;g-plane](https://togithub.com/g-plane) made their first contribution in [oxc-project/oxc#3268

**Full Changelog**: oxc-project/oxc@oxlint_v0.3.4...oxlint_v0.3.5

### [`v0.3.4`](https://togithub.com/oxc-project/oxc/releases/tag/oxlint_v0.3.4): oxlint v0.3.4

[Compare Source](https://togithub.com/oxc-project/oxc/compare/6149e49ef79a22004e36820c81afcb0c755fcc81...7193d75e9444ae8c2ba757b3bc64459abd0e128a)

#### What's Changed

-   [feat(linter): move react/rules_of_hooks to nursery](https://togithub.com/oxc-project/oxc/commit/6edcae86cda8922ea8f9e5eae91290018e1b1637)
-   feat(linter/eslint): Implement max-classes-per-file by [@&#8203;jelly](https://togithub.com/jelly) in [oxc-project/oxc#3241
-

**Full Changelog**: oxc-project/oxc@oxlint_v0.3.3...oxlint_v0.3.4

***

### From v0.3.3

#### What's Changed

##### Features

-   add `--symlinks` to allow symbolic walking by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3244
-   add `--format github` for github check annotation by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3191
-   change the category of all react-perf rules to perf by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#3243
-   remove deprecated eslint v9 rules `no-return-await` and `no-mixed-operators` by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3188
-   move prefer-node-protocol to restriction by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3171

##### New Rules

-   react/rules-of-hooks by [@&#8203;rzvxa](https://togithub.com/rzvxa) in [oxc-project/oxc#3071
-   eslint/radix by [@&#8203;KubaJastrz](https://togithub.com/KubaJastrz) in [oxc-project/oxc#3167
-   eslint/no-new-native-nonconstructor by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3187
-   eslint/unicode-bom by [@&#8203;jelly](https://togithub.com/jelly) in [oxc-project/oxc#3239
-   eslint/no-empty-function rule by [@&#8203;jelly](https://togithub.com/jelly) in [oxc-project/oxc#3181
-   eslint-plugin-next/no-duplicate-head by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3174
-   eslint-plugin-next/no-page-custom-font by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#3185
-   eslint-plugin-next/no-styled-jsx-in-document by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#3184
-   unicorn/no-anonymous-default-export by [@&#8203;1zumii](https://togithub.com/1zumii) in [oxc-project/oxc#3220

##### Bug Fixes

-   improve `prefer-string-starts-ends-with` rule by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#3176
-   import/export: improve multiple exports error message by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#3160
-   import/named: handle `import { default as foo }` by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3255
-   shorten eslint/eqeqeq rule error message's span by [@&#8203;mysteryven](https://togithub.com/mysteryven) in [oxc-project/oxc#3193
-   fix(parser): correctly parse cls.fn<C> = x by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#3208

#### New Contributors

-   [@&#8203;KubaJastrz](https://togithub.com/KubaJastrz) made their first contribution in [oxc-project/oxc#3167
-   [@&#8203;1zumii](https://togithub.com/1zumii) made their first contribution in [oxc-project/oxc#3220

**Full Changelog**: oxc-project/oxc@oxlint_v0.3.2...oxlint_v0.3.3

### [`v0.3.3`](https://togithub.com/oxc-project/oxc/releases/tag/oxlint_v0.3.3): oxlint v0.3.3

[Compare Source](https://togithub.com/oxc-project/oxc/compare/a7940868c6e66d16814ebef5c8dbbfd9b948a0cd...6149e49ef79a22004e36820c81afcb0c755fcc81)

#### What's Changed

##### Features

-   add `--symlinks` to allow symbolic walking by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3244
-   add `--format github` for github check annotation by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3191
-   change the category of all react-perf rules to perf by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#3243
-   remove deprecated eslint v9 rules `no-return-await` and `no-mixed-operators` by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3188
-   move prefer-node-protocol to restriction by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3171

##### New Rules

-   react/rules-of-hooks by [@&#8203;rzvxa](https://togithub.com/rzvxa) in [oxc-project/oxc#3071
-   eslint/radix by [@&#8203;KubaJastrz](https://togithub.com/KubaJastrz) in [oxc-project/oxc#3167
-   eslint/no-new-native-nonconstructor by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3187
-   eslint/unicode-bom by [@&#8203;jelly](https://togithub.com/jelly) in [oxc-project/oxc#3239
-   eslint/no-empty-function rule by [@&#8203;jelly](https://togithub.com/jelly) in [oxc-project/oxc#3181
-   eslint-plugin-next/no-duplicate-head by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3174
-   eslint-plugin-next/no-page-custom-font by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#3185
-   eslint-plugin-next/no-styled-jsx-in-document by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#3184
-   unicorn/no-anonymous-default-export by [@&#8203;1zumii](https://togithub.com/1zumii) in [oxc-project/oxc#3220

##### Bug Fixes

-   improve `prefer-string-starts-ends-with` rule by [@&#8203;camc314](https://togithub.com/camc314) in [oxc-project/oxc#3176
-   import/export: improve multiple exports error message by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#3160
-   import/named: handle `import { default as foo }` by [@&#8203;Boshen](https://togithub.com/Boshen) in [oxc-project/oxc#3255
-   shorten eslint/eqeqeq rule error message's span by [@&#8203;mysteryven](https://togithub.com/mysteryven) in [oxc-project/oxc#3193
-   fix(parser): correctly parse cls.fn<C> = x by [@&#8203;Dunqing](https://togithub.com/Dunqing) in [oxc-project/oxc#3208

#### New Contributors

-   [@&#8203;KubaJastrz](https://togithub.com/KubaJastrz) made their first contribution in [oxc-project/oxc#3167
-   [@&#8203;1zumii](https://togithub.com/1zumii) made their first contribution in [oxc-project/oxc#3220

**Full Changelog**: oxc-project/oxc@oxlint_v0.3.2...oxlint_v0.3.3

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/toeverything/AFFiNE).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zNTEuMiIsInVwZGF0ZWRJblZlciI6IjM3LjM2My41IiwidGFyZ2V0QnJhbmNoIjoiY2FuYXJ5IiwibGFiZWxzIjpbImRlcGVuZGVuY2llcyJdfQ==-->
Boshen pushed a commit that referenced this pull request May 17, 2024
## What This PR Does

- perf(lexer): use bit shifting when parsing hex, octal, and binary
integers instead of `mul_add`-ing on `f64`s. Check out the difference in
assembly generated [here](https://godbolt.org/z/zMEKaeYzh)
- perf(lexer): skip redundant utf8 check when parsing BigInts
- refactor(lexer): remove `unsafe` usage (as per @overlookmotel's
request
[here](#3283 (comment)))
- test(lexer): add numeric parsing unit tests

I don't expect this PR to have a large performance improvement, since
the most common case (`Kind::Decimal`) is not affected. We could do
this, however, by splitting `Kind::Decimal` into `Kind::DecimalFloat`
and `Kind::DecimalInt` when the lexer encounters a `.`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants