refactor(parser): macro for ASCII byte handlers #2066
NB: This approach is also slightly better in terms of performance. It produces a small speed-up (~0.3%) because the assertions now cover more cases, so the compiler can optimize a few other bits of code.
CodSpeed Performance Report: Merging #2066 will not alter performance.
I also tried another approach, replacing But for reasons I don't understand, this didn't work at all. It did result in |
Neat! FYI the code used to be |
Here's the original source: https://github.com/ratel-rust/ratel-core/blob/e55a1310ba69a3f5ce2a9a6eef643feced02ac08/ratel/src/lexer/mod.rs#L60 You may find other optimization techniques in there, but I didn't delve deeper. |
Great! Glad we found a compromise where we're both happy with the readability/safety balance.
Is the maintainer of Ratel involved in OXC at all? Presumably he chose the |
Ratel predates oxc by 8 years, rslint by 4 years and biome by 3 years. oxc has only been in the open for 1 year. The later project copied some code from the predecessor. (FYI I never looked at swc's code while implementing oxc.)
Unfortunately he stopped working in public by the looks of his github profile. But he has many really good crates that you can learn from, e.g. the I think he sup |
Thanks for the notes. I've been looking at Ratel, and there may be some other optimizations we can port across, though they do make the code a bit more C-style.
As discussed on oxc-project#2046, it wasn't ideal to have `unsafe { lexer.consume_ascii_char() }` in every byte handler. It also wasn't great to have a safe function `consume_ascii_char()` which could cause UB if called incorrectly (so wasn't really safe at all).

This PR achieves the same objective as oxc-project#2046, but uses a macro to define byte handlers for ASCII chars, which builds in the assertion that the next char is guaranteed to be ASCII.

Before oxc-project#2046:

```rs
const SPS: ByteHandler = |lexer| {
    lexer.consume_char();
    Kind::WhiteSpace
};
```

After this PR:

```rs
ascii_byte_handler!(SPS(lexer) {
    lexer.consume_char();
    Kind::WhiteSpace
});
```

i.e. the bodies of the handlers are unchanged from how they were before oxc-project#2046.

This expands to:

```rs
const SPS: ByteHandler = |lexer| {
    unsafe {
        let s = lexer.current.chars.as_str();
        assert_unchecked!(!s.is_empty());
        assert_unchecked!(s.as_bytes()[0] < 128);
    }
    lexer.consume_char();
    Kind::WhiteSpace
};
```

But due to the assertions the macro inserts, `consume_char()` is now optimized for ASCII characters, and reduces to a single instruction. So the `consume_ascii_char()` function introduced by oxc-project#2046 is unnecessary, and can be removed again.

The "boundary of unsafe" is moved to a new function `handle_byte()` which `read_next_token()` calls. `read_next_token()` is responsible for upholding the safety invariants, which include ensuring that the `ascii_byte_handler!()` macro is not being misused (that last part is strictly speaking a bit of a cheat, but...).

I am not a fan of macros, as they're not great for readability. But in this case I don't think it's *too* bad, because:

1. The macro is well-documented.
2. It's not too clever (only one syntax is accepted).
3. It's used repetitively in a clear pattern, and once you've understood one, you understand them all.

What do you think? Does this strike a reasonable balance between readability and safety?
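For readers who want to try the pattern in isolation, here is a minimal, self-contained sketch of how such a macro and the `handle_byte()` boundary could fit together. It is not the code from this PR: the simplified `Lexer` (holding `chars` directly rather than under `current`), the two-variant `Kind`, the `OTHER` handler, the `handler_for()` lookup, and the body of `assert_unchecked!` are all assumptions made for illustration. Only the handler shape and the two assertions follow the expansion shown above.

```rust
use std::str::Chars;

// Hypothetical, cut-down token kinds for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    WhiteSpace,
    Other,
}

// Simplified stand-in for the real lexer; `chars` lives directly on it here.
struct Lexer<'a> {
    chars: Chars<'a>,
}

impl<'a> Lexer<'a> {
    fn new(source: &'a str) -> Self {
        Self { chars: source.chars() }
    }

    // General-purpose consume: decodes one (possibly multi-byte) char.
    fn consume_char(&mut self) -> char {
        self.chars.next().unwrap()
    }
}

type ByteHandler = fn(&mut Lexer<'_>) -> Kind;

// Assumed definition: checks the condition in debug builds and tells the
// compiler it always holds in release builds. Must be used inside `unsafe`.
macro_rules! assert_unchecked {
    ($cond:expr) => {
        if cfg!(debug_assertions) {
            assert!($cond);
        } else if !($cond) {
            std::hint::unreachable_unchecked();
        }
    };
}

// Defines a handler for an ASCII byte. The inserted assertions let the
// compiler specialize the body for a non-empty source starting with ASCII.
macro_rules! ascii_byte_handler {
    ($id:ident($lex:ident) $body:block) => {
        const $id: ByteHandler = |$lex| {
            unsafe {
                let s = $lex.chars.as_str();
                assert_unchecked!(!s.is_empty());
                assert_unchecked!(s.as_bytes()[0] < 128);
            }
            $body
        };
    };
}

ascii_byte_handler!(SPS(lexer) {
    lexer.consume_char();
    Kind::WhiteSpace
});

// Plain handler with no ASCII assumption; safe for any next char.
const OTHER: ByteHandler = |lexer| {
    lexer.consume_char();
    Kind::Other
};

// Hypothetical lookup; a real lexer would index a 256-entry table.
fn handler_for(byte: u8) -> ByteHandler {
    match byte {
        b' ' | b'\t' => SPS,
        _ => OTHER,
    }
}

/// # Safety
/// `byte` must be the next byte of `lexer`'s remaining source, and every
/// handler built with `ascii_byte_handler!` must only be mapped to ASCII bytes.
unsafe fn handle_byte(byte: u8, lexer: &mut Lexer<'_>) -> Kind {
    handler_for(byte)(lexer)
}

fn read_next_token(lexer: &mut Lexer<'_>) -> Option<Kind> {
    // Peek the next byte without consuming; `None` at end of input.
    let byte = *lexer.chars.as_str().as_bytes().first()?;
    // SAFETY: `byte` was just read from the front of the remaining source,
    // and `handler_for` maps only ASCII bytes to ASCII handlers.
    Some(unsafe { handle_byte(byte, lexer) })
}
```

In this sketch, calling `read_next_token()` repeatedly on a lexer built from `" a"` yields `Some(Kind::WhiteSpace)`, then `Some(Kind::Other)`, then `None`. The handler bodies stay free of `unsafe`, and the only place that has to justify the invariants is the single call in `read_next_token()`.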