Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new intrinsic is_constant and optimize pow #114390

Closed
wants to merge 11 commits into from

Conversation

Centri3
Copy link
Member

@Centri3 Centri3 commented Aug 2, 2023

This was suggested in #114254. This basically lets LLVM know that it can evaluate the next few statements at compile-time, and thus can completely remove them. This allows us to select a path at compile-time, and with optimizations on, the other will be entirely removed. This will likely be a bit slower in debug mode but this should be ok.

I've done this for the integer's pow methods as was done in #114254 and it works great!

We should probably restrict the types that can be passed to it, though, llvm.is.constant can be passed any type but some ICE. (So far, I know that all ZSTs cause an ICE)

cc @rust-lang/lang; I highly doubt this shouldn't be insta-const, but I think discussion should be warranted anyway.

@rustbot
Copy link
Collaborator

rustbot commented Aug 2, 2023

r? @Mark-Simulacrum

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 2, 2023
@rustbot
Copy link
Collaborator

rustbot commented Aug 2, 2023

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

@oli-obk
Copy link
Contributor

oli-obk commented Aug 2, 2023

I highly doubt this shouldn't be insta-const, but I think discussion should be warranted anyway.

Make it insta-const and just always return false during const eval. It doesn't matter at const eval time 😅

@rust-log-analyzer

This comment has been minimized.

@oli-obk
Copy link
Contributor

oli-obk commented Aug 2, 2023

This basically lets LLVM know that it can evaluate the next few statements at compile-time, and thus can completely remove them.

That's not really it. This let's llvm know that we have a code path it will be able to optimize better if the arg is const, than the code path taken if the arg is not const.

This is a hint, similar to how we have size_of checks in generic code to optimize zst code paths.

@Centri3
Copy link
Member Author

Centri3 commented Aug 2, 2023

That's not really it. This let's llvm know that we have a code path it will be able to optimize better if the arg is const, than the code path taken if the arg is not const.

Yeah that's true, I was referring to the behavior though. In practice it'll do basically that, folding the next statements to a constant then removing the branch altogether

@rustbot
Copy link
Collaborator

rustbot commented Aug 2, 2023

The Miri subtree was changed

cc @rust-lang/miri

@rustbot
Copy link
Collaborator

rustbot commented Aug 2, 2023

Some changes occurred in src/tools/cargo

cc @ehuss

@Centri3
Copy link
Member Author

Centri3 commented Aug 2, 2023

?? How'd that happen?

@rust-log-analyzer

This comment has been minimized.

src/tools/miri/src/shims/intrinsics/mod.rs Outdated Show resolved Hide resolved
src/tools/miri/src/shims/intrinsics/mod.rs Outdated Show resolved Hide resolved
@RalfJung
Copy link
Member

RalfJung commented Aug 3, 2023

I'm not sure if this is correct. I cannot test MIRI locally as there are tons of compile errors, so maybe somebody else can verify it?

./x.py build --stage 0 miri should always work. If it doesn't, something is wrong either with your PR or your setup. test should also work but you'll have to add a testcase first, of course. If that raises strange error, find build/ -name "miri*" | xargs rm -rf helps (but this can only really happen if you've tested Miri before so is unlikely to be your problem).

@RalfJung
Copy link
Member

RalfJung commented Aug 3, 2023

You still have a cargo submodule update for cargo in your branch, please remove that. Looks like it is in an isolated commit so a git rebase --interactive origin/master should let you remove that one commit.

Not sure how that ended up in your branch, my guess would be a rebase got wrong. Please never rebase unless necessary.

src/tools/cargo Outdated Show resolved Hide resolved
Comment on lines +37 to +38
let mut saw_false = false;

Copy link
Member

@RalfJung RalfJung Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut saw_false = false;
let mut saw_false = false;

just a nit to make it visually more clear that these are all together testing one thing

EDIT: hm I tried to remove the empty line, not sure if github supports that...

@Centri3
Copy link
Member Author

Centri3 commented Sep 5, 2023

PR CI uses an older LLVM and has debug assertions enabled. Those could be causing differences.

I just ran it with rust.debug-assertions set to true yet no dice. I'll try to run with CI's version of LLVM and if that doesn't work, I'll open a zulip thread.

@davidtwco davidtwco assigned oli-obk and unassigned davidtwco Sep 6, 2023
@oli-obk
Copy link
Contributor

oli-obk commented Sep 7, 2023

You can use // min-llvm-version: 16 to ensure the tests are only run on the builders with llvm 16.

It's ok for opts to differ between llvm versions, so if older llvm ends up returning false from is_constant in cases that later llvm knows to be true, that's ok

@bors
Copy link
Contributor

bors commented Sep 22, 2023

☔ The latest upstream changes (presumably #114776) made this pull request unmergeable. Please resolve the merge conflicts.

@Dylan-DPC
Copy link
Member

@Centri3 any updates on this?

@Centri3
Copy link
Member Author

Centri3 commented Nov 14, 2023

No, I can add that comment and rebase but I won't have the time to debug it if it doesn't work

@bors
Copy link
Contributor

bors commented Nov 18, 2023

☔ The latest upstream changes (presumably #117525) made this pull request unmergeable. Please resolve the merge conflicts.

@NCGThompson
Copy link
Contributor

This could be useful for integer division too. While it is generally best to leave these kinds of optimizations to the compiler, the current version of LLVM produces sub-optimal division code in some cases. When the integer width is two machine words (u128 on 64-bit machines), LLVM only does the divide by constant optimization if the odd factor of the divisor can exactly divide 2bits - 1. Otherwise, it calls a function. When the integer width is four machine words (u128 on 32-bit machines), LLVM divides using subtraction, even when the divisor is a constant.

@RalfJung
Copy link
Member

Are there LLVM issues tracking these missed optimizations?

@NCGThompson
Copy link
Contributor

Are there LLVM issues tracking these missed optimizations?

There are a few related issues. While these are all open as of writing, some seem to be partly out of date:

In Rust:

Many use 10 or 5 as an example. Since rustc 1.70 (corresponding to llvm-16), dividing by 10 and 5 is optimized when the dividend is 2 machine words. The issues seem to all be about dividing two machine words, while there are no issues specific to more than two words. @RalfJung, If you are thinking about posting an issue yourself, consider emphasizing 4 word division.

Here is a Compiler Explorer link to Rust code showing division by small constants. It is currently set to i686. To change the target back to the default, go to Overrides > Target architecture, and select the blank option at the top of the drop down menu. Here is a link to edit the llvm IR. Note the differences between versions 15 and 16.

I am not certain, but I have a hunch that the difference between divisors that are optimized for 2 words and divisors that aren't has to do with the divisibility of 2n - 1. I think so because of this paragraph on page 1 of T. Granlund, P. Montmery (1994):

Multiple authors [2, 11, 15] present al-
gorithms for division by constants, but only when the
divisor divides 2k − 1 for some small k. Magenheimer
et al [16, §7] give the foundation of a more general
approach, which Alverson [1] implements on the Tera
Computing System. Compiler writers are only begin-
ning to become aware of the general technique.

For reference:

  • 232 - 1 = 3 * 5 * 17 * 257 * 65537
  • 264 - 1 = 3 * 5 * 17 * 257 * 641 * 65537 * 6700417
  • 2128 - 1 = 3 * 5 * 17 * 257 * 641 * 65537 * 274177 * 6700417 * 67280421310721

@RalfJung
Copy link
Member

RalfJung commented Dec 17, 2023

If you are thinking about posting an issue yourself, consider emphasizing 4 word division.

No I just wanted to be sure we don't carry work-arounds for LLVM issues that LLVM isn't aware of. Also now I can ping @nikic to get an estimate of whether it makes more sense to prioritize improving LLVM optimizations vs providing an is_compile_time_known intrinsic. :)

Though to be clear, nothing is blocking such an is_compile_time_known, it just needs someone to finish the work that was started in this PR.

@nikic
Copy link
Contributor

nikic commented Dec 17, 2023

@RalfJung The above discussion about divisions is irrelevant to this PR -- you obviously shouldn't use is_compile_time_known() to optimize divisions in the frontend. As for the 2.pow issue being actually solved here, using is_compile_time_known() is a very reasonable approach to that problem. I mean, we could add a loop idiom recognition pattern to LLVM to handle this, but it seems like a pretty silly thing to do, and will be fragile if what rustc emits and what LLVM matches for diverges.

@mati865
Copy link
Contributor

mati865 commented Dec 21, 2023

LLVM 16 is the oldest supported version now: #117947
@Centri3 if you have the time to do the rebase it could be worth giving a try.

Comment on lines +2506 to +2515
/// ```rust
/// if !is_val_statically_known(0) { unreachable_unchecked(); }
/// ```
///
/// This also means that the following code's behavior is unspecified; it
/// may panic, or it may not:
///
/// ```rust,no_run
/// assert_eq!(is_val_statically_known(0), black_box(is_val_statically_known(0)))
/// ```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both doc tests fail when ran with ./x test core --stage 1 --doc -- is_val_statically_known. The doc tests are never ran with --stage 0 because of #[cfg(not(bootstrap())].

You need to add unsafe, use, and feature statements to make them compile. If you think they should be omitted from the docs, then you can prepend them with #.

Here is an example that passes the tests:

  /// ```rust
  /// #![feature(is_val_statically_known)]
  /// #![feature(core_intrinsics)]
  /// use std::hint::unreachable_unchecked;
  /// use std::intrinsics::is_val_statically_known;
  ///
  /// unsafe {
  ///    if !is_val_statically_known(0) { unreachable_unchecked(); }
  /// }
  /// ```
  ///
  /// This also means that the following code's behavior is unspecified; it
  /// may panic, or it may not:
  ///
  /// ```rust,no_run
  /// #![feature(is_val_statically_known)]
  /// #![feature(core_intrinsics)]
  /// use std::hint::black_box;
  /// use std::intrinsics::is_val_statically_known;
  ///
  /// unsafe {
  ///     assert_eq!(is_val_statically_known(0), black_box(is_val_statically_known(0)));
  /// }
  /// ```

@NCGThompson
Copy link
Contributor

The GCC backend could implement this with __builtin_constant_p. If given a single variable as an argument, it will behave similarly to llvm.is.constant. In fact, I think I remember that llvm.is.constant was added so Clang could support __builtin_constant_p. Other backends could just interpret is_val_statically_known as false.

I don't understand the work flow for the other backends so I don't know if the intrinsic should be added in this PR or if it needs to be done in the backends separate repos.

};
// So it panics. Have to use `overflowing_mul` to efficiently set the
// result to 0 if not.
#[cfg(debug_assertions)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is incorrect to use #[cfg(debug_assertions)] here. In general, we can't assume debug_assertions matches overflow_checks even though they go together by default. This is especially true because this function has the #[rustc_inherit_overflow_checks].

Without #[cfg(debug_assertions)], it is still incomplete because the greatest possible u32 is much greater than the number of bits in an integer.

To correctly check for overflow, test to see if the final result is less than 1. If that is the case use _ = <Self>::MAX * <Self>::MAX; to force a panic iff overflow_checks are enabled.

Although it isn't necessary, you can make the code less verbose by using saturating_mul instead of overflowing_mul and by using (1 as Self).checked_shl().unwrap_or_default() instead of (1 << num_shl) * fine as Self.

@NCGThompson
Copy link
Contributor

No, I can add that comment and rebase but I won't have the time to debug it if it doesn't work

@Centri3 I'd be happy to take over if you won't be able to finish it.

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 11, 2024
Add Benchmarks for int_pow Methods.

There is quite a bit of room for improvement in performance of the `int_pow` family of methods. I added benchmarks for those functions. In particular, there are benchmarks for small compile-time bases to measure the effect of  rust-lang#114390. ~~I added a lot (245), but all but 22 of them are marked with `#[ignore]`. There are a lot of macros, and I would appreciate feedback on how to simplify them.~~

~~To run benches relevant to rust-lang#114390, use `./x bench core --stage 1 -- pow_base_const --include-ignored`.~~
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 12, 2024
Add Benchmarks for int_pow Methods.

There is quite a bit of room for improvement in performance of the `int_pow` family of methods. I added benchmarks for those functions. In particular, there are benchmarks for small compile-time bases to measure the effect of  rust-lang#114390. ~~I added a lot (245), but all but 22 of them are marked with `#[ignore]`. There are a lot of macros, and I would appreciate feedback on how to simplify them.~~

~~To run benches relevant to rust-lang#114390, use `./x bench core --stage 1 -- pow_base_const --include-ignored`.~~
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 20, 2024
…li-obk

Replacement of rust-lang#114390: Add new intrinsic `is_var_statically_known` and optimize pow for powers of two

This adds a new intrinsic `is_val_statically_known` that lowers to [``@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic).` It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See rust-lang#114390 for more discussion.

While I have extended the scope of the power of two optimization from rust-lang#114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests.

Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage).

Fixes rust-lang#47234
Resolves rust-lang#114390
`@Centri3`
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 23, 2024
…li-obk

Replacement of rust-lang#114390: Add new intrinsic `is_var_statically_known` and optimize pow for powers of two

This adds a new intrinsic `is_val_statically_known` that lowers to [``@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic).` It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See rust-lang#114390 for more discussion.

While I have extended the scope of the power of two optimization from rust-lang#114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests.

Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage).

Fixes rust-lang#47234
Resolves rust-lang#114390
`@Centri3`
@bors bors closed this in 039d887 Jan 25, 2024
bjorn3 pushed a commit to bjorn3/rust that referenced this pull request Jan 26, 2024
…li-obk

Replacement of rust-lang#114390: Add new intrinsic `is_var_statically_known` and optimize pow for powers of two

This adds a new intrinsic `is_val_statically_known` that lowers to [``@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic).` It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See rust-lang#114390 for more discussion.

While I have extended the scope of the power of two optimization from rust-lang#114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests.

Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage).

Fixes rust-lang#47234
Resolves rust-lang#114390
`@Centri3`
GuillaumeGomez pushed a commit to GuillaumeGomez/rust that referenced this pull request Feb 21, 2024
…li-obk

Replacement of rust-lang#114390: Add new intrinsic `is_var_statically_known` and optimize pow for powers of two

This adds a new intrinsic `is_val_statically_known` that lowers to [``@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic).` It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See rust-lang#114390 for more discussion.

While I have extended the scope of the power of two optimization from rust-lang#114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests.

Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage).

Fixes rust-lang#47234
Resolves rust-lang#114390
`@Centri3`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet