Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.Sign up
Optimize Ord trait implementation for bool #66881
Casting the booleans to
example::boolean_cmp: mov ecx, edi xor ecx, esi test esi, esi mov eax, 255 cmove eax, ecx test edi, edi cmovne eax, ecx ret
example::boolean_cmp: mov eax, edi sub al, sil ret
Old LLVM-MCA statistics:
New LLVM-MCA statistics:
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon.
If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.
Please see the contribution instructions for more information.
Click to expand the log.
I am using bool's implementation of Ord at several places in my code when matching some_new_boolean_state.cmp(&some_old_boolean_state); but also I'm sure this comparison must take place many times when (as mentioned above) sorting, or when modifying a BTreeMap or BinaryHeap.
That said if this doesn't seem worth the hassle, I'll just use a custom bool_cmp in all relevant places in my code, and that'll work for me.
…-ord-optimization, r=sfackler Optimize Ord trait implementation for bool Casting the booleans to `i8`s and converting their difference into `Ordering` generates better assembly than casting them to `u8`s and comparing them. Fixes rust-lang#66780 #### Comparison([Godbolt link](https://rust.godbolt.org/z/PjBpvF)) ##### Old assembly: ```asm example::boolean_cmp: mov ecx, edi xor ecx, esi test esi, esi mov eax, 255 cmove eax, ecx test edi, edi cmovne eax, ecx ret ``` ##### New assembly: ```asm example::boolean_cmp: mov eax, edi sub al, sil ret ``` ##### Old LLVM-MCA statistics: ``` Iterations: 100 Instructions: 800 Total Cycles: 234 Total uOps: 1000 Dispatch Width: 6 uOps Per Cycle: 4.27 IPC: 3.42 Block RThroughput: 1.7 ``` ##### New LLVM-MCA statistics: ``` Iterations: 100 Instructions: 300 Total Cycles: 110 Total uOps: 500 Dispatch Width: 6 uOps Per Cycle: 4.55 IPC: 2.73 Block RThroughput: 1.0 ```
Rollup of 6 pull requests Successful merges: - #66881 (Optimize Ord trait implementation for bool) - #67015 (Fix constant propagation for scalar pairs) - #67074 (Add options to --extern flag.) - #67164 (Ensure that panicking in constants eventually errors) - #67174 (Remove `checked_add` in `Layout::repeat`) - #67205 (Make `publish_toolstate.sh` executable) Failed merges: r? @ghost