Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add trailing_zeros and leading_zeros to non zero types #79114

Merged
merged 1 commit into from
Nov 18, 2020

Conversation

andjo403
Copy link
Contributor

as a way towards being able to use the optimized intrinsics ctlz_nonzero and cttz_nonzero from stable.

have not crated any tracking issue if this is not a solution that is wanted

@rust-highfive
Copy link
Collaborator

r? @m-ou-se

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 16, 2020
@scottmcm
Copy link
Member

I'm a fan 👍 This is important for avoiding dead ASM for the default x64 target, but can even improve the generated code on newer target-cpus too because of the tighter value range. An example: #70835 (comment)

Copy link
Member

@m-ou-se m-ou-se left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a good addition!

Maybe it's good to explain the existence of these functions in the doc comments? Maybe something like On many architectures, this function can perform better than `leading_zeros()` on the underlying integer type, as special handling of zero can be avoided., or something in that direction.

library/core/src/num/nonzero.rs Outdated Show resolved Hide resolved
library/core/src/num/nonzero.rs Outdated Show resolved Hide resolved
library/core/tests/nonzero.rs Outdated Show resolved Hide resolved
@m-ou-se m-ou-se added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Nov 17, 2020
@andjo403 andjo403 force-pushed the nonzero_leading_trailing_zeros branch from 7cda148 to 02a6aad Compare November 17, 2020 18:24
@andjo403
Copy link
Contributor Author

thanks for the comments have addressed them and pushed up the changes now

@andjo403 andjo403 force-pushed the nonzero_leading_trailing_zeros branch from 02a6aad to e8864a0 Compare November 17, 2020 18:29
Copy link
Member

@m-ou-se m-ou-se left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! One more small comment:

library/core/src/num/nonzero.rs Outdated Show resolved Hide resolved
@andjo403 andjo403 force-pushed the nonzero_leading_trailing_zeros branch from e8864a0 to 9bbc4c1 Compare November 17, 2020 18:55
@andjo403
Copy link
Contributor Author

thanks again and fixed

@m-ou-se
Copy link
Member

m-ou-se commented Nov 17, 2020

Thanks!

@bors r+ rollup=always

@bors
Copy link
Contributor

bors commented Nov 17, 2020

📌 Commit 9bbc4c1 has been approved by m-ou-se

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 17, 2020
m-ou-se added a commit to m-ou-se/rust that referenced this pull request Nov 18, 2020
…eros, r=m-ou-se

add trailing_zeros and leading_zeros to non zero types

as a way towards being able to use the optimized intrinsics ctlz_nonzero and cttz_nonzero from stable.

have not crated any tracking issue if this is not a solution that is wanted
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 18, 2020
Rollup of 11 pull requests

Successful merges:

 - rust-lang#78361 (Updated the list of white-listed target features for x86)
 - rust-lang#78785 (linux: try to use libc getrandom to allow interposition)
 - rust-lang#78999 (stability: More precise location for deprecation lint on macros)
 - rust-lang#79039 (Tighten the bounds on atomic Ordering in std::sys::unix::weak::Weak)
 - rust-lang#79079 (Turn top-level comments into module docs in MIR visitor)
 - rust-lang#79114 (add trailing_zeros and leading_zeros to non zero types)
 - rust-lang#79131 (Enable AVX512 *epi64 variants by updating stdarch)
 - rust-lang#79133 (bootstrap: use the same version number for rustc and cargo)
 - rust-lang#79145 (Fix handling of panic calls)
 - rust-lang#79151 (Fix typo in `std::io::Write` docs)
 - rust-lang#79158 (type is too big -> values of the type are too big)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 126d88b into rust-lang:master Nov 18, 2020
@rustbot rustbot added this to the 1.50.0 milestone Nov 18, 2020
@andjo403 andjo403 deleted the nonzero_leading_trailing_zeros branch November 18, 2020 20:39
@leonardo-m
Copy link

leonardo-m commented Nov 19, 2020

I've tried this in my codebase, looking at the generated asm, and I've seen that in 100% of the cases, thanks sometimes to inlining, LLVM was able to infer that the input isn't zero, so no performance change has happened.

@m-ou-se
Copy link
Member

m-ou-se commented Nov 19, 2020

In cases where the non-zero value are is created somewhat close to the place where leading_zeros/trailing_zeros is applied, it will definitely optimize this nicely. But just using NonZeroI32 by itself without the context that creates the value, doesn't result in this optimization (yet?): https://godbolt.org/z/bEoPE9

@scottmcm
Copy link
Member

@leonardo-m As another example, note that LLVM is currently not capable of optimizing this even when there's an explicit check for zero in the code before doing the uN::leading_zeros: https://rust.godbolt.org/z/4EhnK4

@leonardo-m
Copy link

But just using NonZeroI32 by itself without the context that creates the value, doesn't result in this optimization (yet?): https://godbolt.org/z/bEoPE9

Using "-C opt-level=2 -C target-cpu=native" it seems to optimize it well.

@leonardo-m
Copy link

As another example, note that LLVM is currently not capable of optimizing this even when there's an explicit check for zero in the code before doing the uN::leading_zeros: https://rust.godbolt.org/z/4EhnK4

With "-C opt-level=3 -C target-cpu=native -Z mir-opt-level=3" it seems to optimize well.

@m-ou-se
Copy link
Member

m-ou-se commented Nov 21, 2020

@leonardo-m With that option it makes use of an instruction that does not need a special case for zero. It still doesn't make use of the fact that NonZero* cannot be zero.

@leonardo-m
Copy link

Oh, fun, thank you. It's still the same LLVM limit, I guess:
#54868

@scottmcm
Copy link
Member

With "-C opt-level=3 -C target-cpu=native -Z mir-opt-level=3" it seems to optimize well.

native there isn't doing anything different from haswell in my example -- the choice was those two was just to pick up instruction sets with and without BMI1.

Very interesting that mir-opts affect this, though...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants