Optimize `std::str::Chars::next` and `std::str::Chars::next_back` #142038
ping @scottmcm?
So you know, you can make diff views in godbolt: https://godbolt.org/z/Thn1bf9qG
The general structure here does make sense to me, but overall I feel like it removed a bunch of helpers and constants unnecessarily. Not having `utf8_first_byte`, sure, but this ends up repeating the `X << 6 | (Y & 0x3F)` in a bunch of places, so keeping the `utf8_acc_cont_byte` to do that would make sense to me. The standard library is always compiled with optimizations, and the MIR inliner will inline it, so there's no reason to avoid the function call. Having the `u32::from` in there would also make the two functions more similar, since now the forward one is using `as u32` in a different line instead, with no obvious reason why they should differ.
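For readers following along, here is a minimal sketch of the kind of helper the review comment is asking to keep. The names `CONT_MASK` and `utf8_acc_cont_byte` mirror the ones mentioned in the discussion, but the bodies below are an illustration written for this thread, and `decode_three` is a hypothetical caller that is not part of the standard library.

```rust
/// Mask selecting the six payload bits of a UTF-8 continuation byte.
const CONT_MASK: u8 = 0b0011_1111;

/// Folds one continuation byte into the accumulator `acc`.
/// This is the `X << 6 | (Y & 0x3F)` pattern, written once.
#[inline]
fn utf8_acc_cont_byte(acc: u32, byte: u8) -> u32 {
    (acc << 6) | u32::from(byte & CONT_MASK)
}

/// Hypothetical caller: decodes a three-byte sequence with the helper.
/// No validation is performed; the input is assumed to be well formed.
fn decode_three(b0: u8, b1: u8, b2: u8) -> u32 {
    // A three-byte leading byte (1110xxxx) carries four payload bits.
    let init = u32::from(b0 & 0x0F);
    utf8_acc_cont_byte(utf8_acc_cont_byte(init, b1), b2)
}
```

With the helper in place, both the forward and the reverse decoder can share the same widening (`u32::from`) and the same mask, which is the consistency the comment is after.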
There are only 0x110000 possible code points (`0..=0x10FFFF`), so we can exhaustively test all of them.
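A minimal sketch of what such an exhaustive check could look like from outside the standard library, using only the public `Chars` API. The function name `check_all_scalar_values` is made up for this example and is not the test added in the pull request; it also covers only Unicode scalar values, since surrogates cannot be stored in a `char`.

```rust
/// Round-trips every Unicode scalar value through `Chars::next` and
/// `Chars::next_back`. Returns the number of scalar values checked.
fn check_all_scalar_values() -> u32 {
    let mut checked = 0;
    let mut buf = [0u8; 4];
    for cp in 0..=u32::from(char::MAX) {
        // Surrogates (0xD800..=0xDFFF) are not valid `char`s, so skip them.
        let c = match char::from_u32(cp) {
            Some(c) => c,
            None => continue,
        };
        let s: &str = c.encode_utf8(&mut buf);
        // Forward and backward iteration must both yield the same char.
        assert_eq!(s.chars().next(), Some(c));
        assert_eq!(s.chars().next_back(), Some(c));
        // The encoded string holds exactly one char.
        assert_eq!(s.chars().count(), 1);
        checked += 1;
    }
    checked
}
```

The loop visits 0x110000 values and skips the 0x800 surrogates, so the whole space is small enough to run on every test invocation.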
By reordering some operations, we can expose some opportunities for CSE. Also convert the series of nested `if` branches to early returns, which IMO makes the code clearer.
Comparison of assembly before and after for `next_code_point`: https://godbolt.org/z/9Te84YzhK
Comparison of assembly before and after for `next_code_point_reverse`: https://godbolt.org/z/fTx1a7oz1
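To illustrate the early-return shape described in the commit message, here is a small standalone decoder. It is a sketch written for this thread under the assumption of well-formed input, not the code from the pull request; the name `next_code_point_sketch` and the choice to treat a missing continuation byte as 0 are both assumptions of the example.

```rust
/// Decodes the next code point from `bytes`, assuming well-formed UTF-8
/// (or WTF-8) input. Illustrative only; not the pull request's code.
fn next_code_point_sketch<I: Iterator<Item = u8>>(bytes: &mut I) -> Option<u32> {
    let b0 = u32::from(bytes.next()?);
    if b0 < 0x80 {
        return Some(b0); // 1 byte: 0xxxxxxx
    }
    // Payload of the next continuation byte. A missing byte is treated
    // as 0 here, which is an assumption made to keep the sketch short.
    let mut cont = || u32::from(bytes.next().unwrap_or(0) & 0x3F);
    let b1 = cont();
    if b0 < 0xE0 {
        return Some(((b0 & 0x1F) << 6) | b1); // 2 bytes: 110xxxxx
    }
    let b2 = cont();
    if b0 < 0xF0 {
        return Some(((b0 & 0x0F) << 12) | (b1 << 6) | b2); // 3 bytes: 1110xxxx
    }
    let b3 = cont();
    Some(((b0 & 0x07) << 18) | (b1 << 12) | (b2 << 6) | b3) // 4 bytes: 11110xxx
}
```

Each branch returns as soon as the sequence length is known, so the longer cases read as a straight line instead of nesting one level deeper per byte.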
26b614c to 54a699b
I could not get LLVM to produce the |
r? @scottmcm
Requested reviewer is already assigned to this pull request. Please choose another assignee.
Before/after for `next`: https://godbolt.org/z/9Te84YzhK
Before/after for `next_back`: https://godbolt.org/z/fTx1a7oz1

`std::sys_common::wtf8::Wtf8CodePoints` will also benefit from this, since it uses the same `next_code_point` and `next_code_point_reverse` functions internally.

I also added tests for all codepoints in the range `0..=char::MAX` (including surrogates that can only appear in WTF-8), so the new implementations have been exhaustively tested.
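Since surrogates cannot be represented as `char`, covering them needs a hand-written encoder. The following is a rough sketch of how a surrogate-inclusive check of a reverse decoder might be set up. All three functions (`encode_wtf8`, `last_code_point_sketch`, `check_all_code_points`) are hypothetical names invented for this example; they do not reproduce the tests or the decoder from the pull request.

```rust
/// Encodes any code point in 0..=0x10FFFF, surrogates included, using the
/// generalized UTF-8 layout. Returns the buffer and the encoded length.
fn encode_wtf8(cp: u32) -> ([u8; 4], usize) {
    let tail = |shift: u32| 0x80 | ((cp >> shift) & 0x3F) as u8;
    if cp < 0x80 {
        ([cp as u8, 0, 0, 0], 1)
    } else if cp < 0x800 {
        ([0xC0 | (cp >> 6) as u8, tail(0), 0, 0], 2)
    } else if cp < 0x1_0000 {
        ([0xE0 | (cp >> 12) as u8, tail(6), tail(0), 0], 3)
    } else {
        ([0xF0 | (cp >> 18) as u8, tail(12), tail(6), tail(0)], 4)
    }
}

/// Decodes the last code point of `bytes` by walking backwards.
/// Assumes well-formed input; returns `None` if no leading byte is found.
fn last_code_point_sketch(bytes: &[u8]) -> Option<u32> {
    let mut it = bytes.iter().rev();
    let last = *it.next()?;
    if last < 0x80 {
        return Some(u32::from(last));
    }
    let mut acc = u32::from(last & 0x3F);
    let mut shift: u32 = 6;
    // At most three more bytes can belong to the same sequence.
    for &b in it.take(3) {
        if b & 0xC0 != 0x80 {
            // Leading byte: its payload mask depends on the sequence width.
            let width = shift / 6 + 1;
            return Some((u32::from(b & (0x7F >> width)) << shift) | acc);
        }
        acc |= u32::from(b & 0x3F) << shift;
        shift += 6;
    }
    None
}

/// Checks every code point, including surrogates. Returns the count checked.
fn check_all_code_points() -> u32 {
    let mut checked = 0;
    for cp in 0..=0x10FFFF_u32 {
        let (buf, len) = encode_wtf8(cp);
        assert_eq!(last_code_point_sketch(&buf[..len]), Some(cp));
        // For scalar values, cross-check the encoder against std.
        if let Some(c) = char::from_u32(cp) {
            let mut tmp = [0u8; 4];
            assert_eq!(c.encode_utf8(&mut tmp).as_bytes(), &buf[..len]);
        }
        checked += 1;
    }
    checked
}
```

Cross-checking the encoder against `char::encode_utf8` for scalar values keeps the hand-written part honest, so only the surrogate range relies on the custom encoding alone.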