Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Ord trait implementation for bool #66881

Merged

Conversation

@krishna-veerareddy
Copy link

krishna-veerareddy commented Nov 29, 2019

Casting the booleans to i8s and converting their difference into Ordering generates better assembly than casting them to u8s and comparing them.

Fixes #66780

Comparison(Godbolt link)

Old assembly:
example::boolean_cmp:
        mov     ecx, edi
        xor     ecx, esi
        test    esi, esi
        mov     eax, 255
        cmove   eax, ecx
        test    edi, edi
        cmovne  eax, ecx
        ret
New assembly:
example::boolean_cmp:
        mov     eax, edi
        sub     al, sil
        ret
Old LLVM-MCA statistics:
Iterations:        100
Instructions:      800
Total Cycles:      234
Total uOps:        1000

Dispatch Width:    6
uOps Per Cycle:    4.27
IPC:               3.42
Block RThroughput: 1.7
New LLVM-MCA statistics:
Iterations:        100
Instructions:      300
Total Cycles:      110
Total uOps:        500

Dispatch Width:    6
uOps Per Cycle:    4.55
IPC:               2.73
Block RThroughput: 1.0
Casting the booleans to `i8`s and converting their difference
into `Ordering` generates better assembly than casting them to
`u8`s and comparing them.
@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Nov 29, 2019

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Nov 29, 2019

In what context is the performance of bool's Ord implementation important?

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

rust-highfive commented Nov 29, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-11-29T22:36:03.0272189Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-11-29T22:36:03.0455235Z ##[command]git config gc.auto 0
2019-11-29T22:36:03.0543804Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-11-29T22:36:03.0613388Z ##[command]git config --get-all http.proxy
2019-11-29T22:36:03.0743604Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/66881/merge:refs/remotes/pull/66881/merge
---
2019-11-29T22:41:28.4914510Z    Compiling serde_json v1.0.40
2019-11-29T22:41:29.9993582Z    Compiling tidy v0.1.0 (/checkout/src/tools/tidy)
2019-11-29T22:41:40.2863524Z     Finished release [optimized] target(s) in 1m 21s
2019-11-29T22:41:40.2973192Z tidy check
2019-11-29T22:41:41.1372764Z tidy error: /checkout/src/libcore/cmp.rs:1137: undocumented unsafe
2019-11-29T22:41:42.7887837Z some tidy checks failed
2019-11-29T22:41:42.7888011Z Found 486 error codes
2019-11-29T22:41:42.7888051Z Found 0 error codes with no tests
2019-11-29T22:41:42.7888090Z Done!
2019-11-29T22:41:42.7888090Z Done!
2019-11-29T22:41:42.7888113Z 
2019-11-29T22:41:42.7888134Z 
2019-11-29T22:41:42.7891452Z command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor"
2019-11-29T22:41:42.7891578Z 
2019-11-29T22:41:42.7891625Z 
2019-11-29T22:41:42.7896350Z failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
2019-11-29T22:41:42.7896803Z Build completed unsuccessfully in 0:01:24
2019-11-29T22:41:42.7896803Z Build completed unsuccessfully in 0:01:24
2019-11-29T22:41:42.7959975Z == clock drift check ==
2019-11-29T22:41:42.7965124Z   local time: Fri Nov 29 22:41:42 UTC 2019
2019-11-29T22:41:42.8806777Z   network time: Fri, 29 Nov 2019 22:41:42 GMT
2019-11-29T22:41:42.8811448Z == end clock drift check ==
2019-11-29T22:41:44.2093575Z 
2019-11-29T22:41:44.2191949Z ##[error]Bash exited with code '1'.
2019-11-29T22:41:44.2216266Z ##[section]Starting: Checkout
2019-11-29T22:41:44.2217676Z ==============================================================================
2019-11-29T22:41:44.2217719Z Task         : Get sources
2019-11-29T22:41:44.2217771Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@krishna-veerareddy

This comment has been minimized.

Copy link
Author

krishna-veerareddy commented Nov 29, 2019

@sfackler You are right. I don't think the performance of bool's Ord implementation matters too much and is a micro-optimization so I am okay with closing the PR unless @carado thinks otherwise.

@moshg

This comment has been minimized.

Copy link

moshg commented Nov 30, 2019

FWIW using slice::sort_by_key where key is bool, we can split a slice into the part where a predicate holds and the other part it doesn't hold, though sort_by_key is not ideal API for this purpose.
And I'm curious what is mentioned in #64082 (comment).

@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Dec 2, 2019

If you're concerned about performance when partitioning a list, it seems like the first step would be to not use a comparison based sort at all in favor of a single pass shuffle.

@carado

This comment has been minimized.

Copy link

carado commented Dec 2, 2019

I am using bool's implementation of Ord at several places in my code when matching some_new_boolean_state.cmp(&some_old_boolean_state); but also I'm sure this comparison must take place many times when (as mentioned above) sorting, or when modifying a BTreeMap or BinaryHeap.
In those cases the impl Ord for bool matters in places where you're sorting on a struct which has multiple fields, some of which are bool and some aren't.

That said if this doesn't seem worth the hassle, I'll just use a custom bool_cmp in all relevant places in my code, and that'll work for me.

Krishna Sai Veera Reddy
-1 => Less,
0 => Equal,
1 => Greater,
// SAFETY: Unreachable code

This comment has been minimized.

Copy link
@sfackler

sfackler Dec 10, 2019

Member

This doesn't seem like it's really doing anything. The unsafe code is calling unreachable_unchecked - of course it's unreachable code!

This comment has been minimized.

Copy link
@krishna-veerareddy

krishna-veerareddy Dec 10, 2019

Author

I had to document the unsafe block using the safety comment for the PR checks to pass. I can add a better comment but that's the reason for that comment. Specifically the tidy check fails with an undocumented unsafe.

This comment has been minimized.

Copy link
@sfackler

sfackler Dec 10, 2019

Member

If we're going to require documentation for unsafe blocks, it may as well be useful documentation. e.g. "SAFETY: bool as i8 returns 0 or 1, so the subtraction can't end up with anything else."

@krishna-veerareddy krishna-veerareddy force-pushed the krishna-veerareddy:issue-66780-bool-ord-optimization branch from c033780 to 1f07aa5 Dec 10, 2019
@krishna-veerareddy krishna-veerareddy requested a review from sfackler Dec 10, 2019
@sfackler

This comment has been minimized.

Copy link
Member

sfackler commented Dec 11, 2019

@bors r+

@bors

This comment has been minimized.

Copy link
Contributor

bors commented Dec 11, 2019

📌 Commit 1f07aa5 has been approved by sfackler

Centril added a commit to Centril/rust that referenced this pull request Dec 11, 2019
…-ord-optimization, r=sfackler

Optimize Ord trait implementation for bool

Casting the booleans to `i8`s and converting their difference into `Ordering` generates better assembly than casting them to `u8`s and comparing them.

Fixes rust-lang#66780

#### Comparison([Godbolt link](https://rust.godbolt.org/z/PjBpvF))

##### Old assembly:
```asm
example::boolean_cmp:
        mov     ecx, edi
        xor     ecx, esi
        test    esi, esi
        mov     eax, 255
        cmove   eax, ecx
        test    edi, edi
        cmovne  eax, ecx
        ret
```

##### New assembly:
```asm
example::boolean_cmp:
        mov     eax, edi
        sub     al, sil
        ret
```

##### Old LLVM-MCA statistics:
```
Iterations:        100
Instructions:      800
Total Cycles:      234
Total uOps:        1000

Dispatch Width:    6
uOps Per Cycle:    4.27
IPC:               3.42
Block RThroughput: 1.7
```

##### New LLVM-MCA statistics:
```
Iterations:        100
Instructions:      300
Total Cycles:      110
Total uOps:        500

Dispatch Width:    6
uOps Per Cycle:    4.55
IPC:               2.73
Block RThroughput: 1.0
```
bors added a commit that referenced this pull request Dec 11, 2019
Rollup of 6 pull requests

Successful merges:

 - #66881 (Optimize Ord trait implementation for bool)
 - #67015 (Fix constant propagation for scalar pairs)
 - #67074 (Add options to --extern flag.)
 - #67164 (Ensure that panicking in constants eventually errors)
 - #67174 (Remove `checked_add` in `Layout::repeat`)
 - #67205 (Make `publish_toolstate.sh` executable)

Failed merges:

r? @ghost
@bors bors merged commit 1f07aa5 into rust-lang:master Dec 11, 2019
4 checks passed
4 checks passed
pr #20191210.41 succeeded
Details
pr (Linux mingw-check) Linux mingw-check succeeded
Details
pr (Linux x86_64-gnu-llvm-7) Linux x86_64-gnu-llvm-7 succeeded
Details
pr (Linux x86_64-gnu-tools) Linux x86_64-gnu-tools succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants
You can’t perform that action at this time.