Improve `char::is_ascii_*` codegen #67585
This PR is an attempt to fix #65127
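A couple of warnings:
1. the generated code might be further improved (in LLVM and/or MIR) by emitting better comparison sequences; in particular, this would improve the performance of "complex" checks such as those in `is_ascii_punctuation`
2. the second commit is currently marked "DO NOT MERGE", because it regresses SIMD on `u8` slices; this could likely be fixed by improving the computation/usage of demanded bits in LLVM
An alternative approach to removing the code duplication might be to use macros, but currently most of the duplication is in the doc comments, so just keeping the redundancy may be OK.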
Some benchmark numbers:
Please do not take these measurements as absolute, as they suffer from 2 limitations:
These methods explicitly check if a char is in a specific ASCII range, therefore the `is_ascii()` check is not needed, but LLVM seems to be unable to remove it. WARNING: this change improves the performance on ASCII `char`s, but complex checks such as `is_ascii_punctuation` become slower on non-ASCII `char`s.
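To illustrate the change described above, here is a hedged sketch of the two shapes of check. The function names are hypothetical and the bodies are assumptions about the general form of the code, not the PR's exact diff. The punctuation variant shows why non-ASCII input can get slower: without the up-front guard, a non-ASCII `char` has to fail every range in turn.

```rust
/// Guarded shape: test for ASCII first, then do a byte-level range check.
/// (Hypothetical free function; assumed to mirror the pre-PR structure.)
pub fn is_ascii_digit_guarded(c: char) -> bool {
    c.is_ascii() && (c as u8).is_ascii_digit()
}

/// Direct shape: match the `char` against the range itself.
/// The range already implies ASCII, so no separate guard is needed.
pub fn is_ascii_digit_direct(c: char) -> bool {
    matches!(c, '0'..='9')
}

/// A "complex" check made of four disjoint ranges. A non-ASCII `char`
/// falls through all four arms, where the guarded form would have
/// rejected it after a single comparison.
pub fn is_ascii_punctuation_direct(c: char) -> bool {
    matches!(c, '!'..='/' | ':'..='@' | '['..='`' | '{'..='~')
}
```

Both digit functions return the same result for every `char`; the difference is only in the comparison sequence the compiler has to emit.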
There is a way to improve performance even further, at the cost of uglier code. If you need to check both uppercase and lowercase, you can clear bit 5 (0x20, counting from bit 0) of the character code to map lowercase letters to uppercase and then do a single range check on the uppercase value.
So for example,
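A minimal sketch of the bit-clearing trick, written as a free function with a hypothetical name (`is_ascii_alphabetic_folded`). It is an assumption about how the idea could be applied, not the actual libcore code:

```rust
/// Checks for an ASCII letter with one mask and one range check.
/// Hypothetical helper for illustration only.
pub fn is_ascii_alphabetic_folded(c: char) -> bool {
    // Clearing 0x20 maps 'a'..='z' (0x61..=0x7A) onto 'A'..='Z' (0x41..=0x5A).
    // Other code points are also altered, but none of them land inside
    // 'A'..='Z' unless they were already an ASCII letter.
    let folded = (c as u32) & !0x20;
    // wrapping_sub turns the two-sided range check into a single
    // unsigned comparison: values below 'A' wrap around to huge numbers.
    folded.wrapping_sub('A' as u32) < 26
}
```

The function agrees with `char::is_ascii_alphabetic` on every `char`, since any code point at or above 0x80 stays at or above 0x80 after the mask.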
…r=Amanieu

Improve `char::is_ascii_*` codegen

This PR is an attempt to fix rust-lang#65127

A couple of warnings:
1. the generated code might be further improved (in LLVM and/or MIR) by emitting better comparison sequences; in particular, this would improve the performance of "complex" checks such as those in `is_ascii_punctuation`
2. the second commit is currently marked "DO NOT MERGE", because it regresses SIMD on `u8` slices; this could likely be fixed by improving the computation/usage of demanded bits in LLVM

An alternative approach to remove the code duplication might be the use of macros, but currently most of the duplication is actually in the doc comments, so maybe just keeping the redundancy could be ok
Rollup of 8 pull requests

Successful merges:
- #67585 (Improve `char::is_ascii_*` codegen)
- #68914 (Speed up `SipHasher128`.)
- #68994 (rustbuild: include channel in sanitizers installed name)
- #69032 (ICE in nightly-2020-02-08: handle TerminatorKind::Yield in librustc_mir::transform::promote_consts::Validator method)
- #69034 (parser: Remove `Parser::prev_token_kind`)
- #69042 (Remove backtrace header text)
- #69059 (Remove a few unused objects)
- #69089 (Properly use the darwin archive format on Apple targets)

Failed merges:

r? @ghost