Add new intrinsic `is_constant` and optimize `pow` #114390

Centri3 · 2023-08-02T21:03:48Z

This was suggested in #114254. This basically lets LLVM know that it can evaluate the next few statements at compile-time, and thus can completely remove them. This allows us to select a path at compile-time, and with optimizations on, the other will be entirely removed. This will likely be a bit slower in debug mode but this should be ok.

I've done this for the integer's pow methods as was done in #114254 and it works great!

We should probably restrict the types that can be passed to it, though, llvm.is.constant can be passed any type but some ICE. (So far, I know that all ZSTs cause an ICE)

cc @rust-lang/lang; I highly doubt this shouldn't be insta-const, but I think discussion should be warranted anyway.

rustbot · 2023-08-02T21:03:54Z

r? @Mark-Simulacrum

(rustbot has picked a reviewer for you, use r? to override)

rustbot · 2023-08-02T21:03:57Z

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

oli-obk · 2023-08-02T21:06:23Z

I highly doubt this shouldn't be insta-const, but I think discussion should be warranted anyway.

Make it insta-const and just always return false during const eval. It doesn't matter at const eval time 😅

library/core/src/intrinsics.rs

compiler/rustc_const_eval/src/interpret/intrinsics.rs

oli-obk · 2023-08-02T21:13:37Z

This basically lets LLVM know that it can evaluate the next few statements at compile-time, and thus can completely remove them.

That's not really it. This let's llvm know that we have a code path it will be able to optimize better if the arg is const, than the code path taken if the arg is not const.

This is a hint, similar to how we have size_of checks in generic code to optimize zst code paths.

compiler/rustc_hir_analysis/src/check/intrinsic.rs

library/core/src/intrinsics.rs

Centri3 · 2023-08-02T22:26:59Z

That's not really it. This let's llvm know that we have a code path it will be able to optimize better if the arg is const, than the code path taken if the arg is not const.

Yeah that's true, I was referring to the behavior though. In practice it'll do basically that, folding the next statements to a constant then removing the branch altogether

rustbot · 2023-08-02T23:20:08Z

The Miri subtree was changed

cc @rust-lang/miri

src/tools/miri/src/shims/intrinsics/mod.rs

rustbot · 2023-08-02T23:28:07Z

Some changes occurred in src/tools/cargo

cc @ehuss

Centri3 · 2023-08-02T23:28:47Z

?? How'd that happen?

compiler/rustc_codegen_llvm/src/context.rs

src/tools/miri/src/shims/intrinsics/mod.rs

RalfJung · 2023-08-03T06:37:10Z

I'm not sure if this is correct. I cannot test MIRI locally as there are tons of compile errors, so maybe somebody else can verify it?

./x.py build --stage 0 miri should always work. If it doesn't, something is wrong either with your PR or your setup. test should also work but you'll have to add a testcase first, of course. If that raises strange error, find build/ -name "miri*" | xargs rm -rf helps (but this can only really happen if you've tested Miri before so is unlikely to be your problem).

compiler/rustc_const_eval/src/interpret/intrinsics.rs

RalfJung · 2023-08-03T06:46:56Z

You still have a cargo submodule update for cargo in your branch, please remove that. Looks like it is in an isolated commit so a git rebase --interactive origin/master should let you remove that one commit.

Not sure how that ended up in your branch, my guess would be a rebase got wrong. Please never rebase unless necessary.

src/tools/miri/src/shims/intrinsics/mod.rs

src/tools/cargo

library/core/src/intrinsics.rs

library/core/src/num/int_macros.rs

RalfJung · 2023-09-05T13:35:23Z

src/tools/miri/tests/pass/intrinsics.rs

+    let mut saw_false = false;
+


Suggested change

let mut saw_false = false;

let mut saw_false = false;

just a nit to make it visually more clear that these are all together testing one thing

EDIT: hm I tried to remove the empty line, not sure if github supports that...

Centri3 · 2023-09-05T14:19:24Z

PR CI uses an older LLVM and has debug assertions enabled. Those could be causing differences.

I just ran it with rust.debug-assertions set to true yet no dice. I'll try to run with CI's version of LLVM and if that doesn't work, I'll open a zulip thread.

oli-obk · 2023-09-07T08:11:49Z

You can use // min-llvm-version: 16 to ensure the tests are only run on the builders with llvm 16.

It's ok for opts to differ between llvm versions, so if older llvm ends up returning false from is_constant in cases that later llvm knows to be true, that's ok

bors · 2023-09-22T08:58:37Z

☔ The latest upstream changes (presumably #114776) made this pull request unmergeable. Please resolve the merge conflicts.

Dylan-DPC · 2023-11-05T06:02:52Z

@Centri3 any updates on this?

Centri3 · 2023-11-14T18:09:28Z

No, I can add that comment and rebase but I won't have the time to debug it if it doesn't work

bors · 2023-11-18T15:10:06Z

☔ The latest upstream changes (presumably #117525) made this pull request unmergeable. Please resolve the merge conflicts.

NCGThompson · 2023-12-17T05:24:54Z

This could be useful for integer division too. While it is generally best to leave these kinds of optimizations to the compiler, the current version of LLVM produces sub-optimal division code in some cases. When the integer width is two machine words (u128 on 64-bit machines), LLVM only does the divide by constant optimization if the odd factor of the divisor can exactly divide 2^bits - 1. Otherwise, it calls a function. When the integer width is four machine words (u128 on 32-bit machines), LLVM divides using subtraction, even when the divisor is a constant.

RalfJung · 2023-12-17T07:33:01Z

Are there LLVM issues tracking these missed optimizations?

NCGThompson · 2023-12-17T18:06:00Z

Are there LLVM issues tracking these missed optimizations?

There are a few related issues. While these are all open as of writing, some seem to be partly out of date:

In Rust:

Fast algorithm for u128 (and i128) divided by small constant #54867

Many use 10 or 5 as an example. Since rustc 1.70 (corresponding to llvm-16), dividing by 10 and 5 is optimized when the dividend is 2 machine words. The issues seem to all be about dividing two machine words, while there are no issues specific to more than two words. @RalfJung, If you are thinking about posting an issue yourself, consider emphasizing 4 word division.

Here is a Compiler Explorer link to Rust code showing division by small constants. It is currently set to i686. To change the target back to the default, go to Overrides > Target architecture, and select the blank option at the top of the drop down menu. Here is a link to edit the llvm IR. Note the differences between versions 15 and 16.

I am not certain, but I have a hunch that the difference between divisors that are optimized for 2 words and divisors that aren't has to do with the divisibility of 2ⁿ - 1. I think so because of this paragraph on page 1 of T. Granlund, P. Montmery (1994):

Multiple authors [2, 11, 15] present al-
gorithms for division by constants, but only when the
divisor divides 2k − 1 for some small k. Magenheimer
et al [16, §7] give the foundation of a more general
approach, which Alverson [1] implements on the Tera
Computing System. Compiler writers are only begin-
ning to become aware of the general technique.

For reference:

2³² - 1 = 3 * 5 * 17 * 257 * 65537
2⁶⁴ - 1 = 3 * 5 * 17 * 257 * 641 * 65537 * 6700417
2¹²⁸ - 1 = 3 * 5 * 17 * 257 * 641 * 65537 * 274177 * 6700417 * 67280421310721

RalfJung · 2023-12-17T18:21:51Z

If you are thinking about posting an issue yourself, consider emphasizing 4 word division.

No I just wanted to be sure we don't carry work-arounds for LLVM issues that LLVM isn't aware of. Also now I can ping @nikic to get an estimate of whether it makes more sense to prioritize improving LLVM optimizations vs providing an is_compile_time_known intrinsic. :)

Though to be clear, nothing is blocking such an is_compile_time_known, it just needs someone to finish the work that was started in this PR.

nikic · 2023-12-17T20:02:54Z

@RalfJung The above discussion about divisions is irrelevant to this PR -- you obviously shouldn't use is_compile_time_known() to optimize divisions in the frontend. As for the 2.pow issue being actually solved here, using is_compile_time_known() is a very reasonable approach to that problem. I mean, we could add a loop idiom recognition pattern to LLVM to handle this, but it seems like a pretty silly thing to do, and will be fragile if what rustc emits and what LLVM matches for diverges.

mati865 · 2023-12-21T22:11:43Z

LLVM 16 is the oldest supported version now: #117947
@Centri3 if you have the time to do the rebase it could be worth giving a try.

NCGThompson · 2023-12-30T20:46:48Z

library/core/src/intrinsics.rs

+    /// ```rust
+    /// if !is_val_statically_known(0) { unreachable_unchecked(); }
+    /// ```
+    ///
+    /// This also means that the following code's behavior is unspecified; it
+    /// may panic, or it may not:
+    ///
+    /// ```rust,no_run
+    /// assert_eq!(is_val_statically_known(0), black_box(is_val_statically_known(0)))
+    /// ```


Both doc tests fail when ran with ./x test core --stage 1 --doc -- is_val_statically_known. The doc tests are never ran with --stage 0 because of #[cfg(not(bootstrap())].

You need to add unsafe, use, and feature statements to make them compile. If you think they should be omitted from the docs, then you can prepend them with #.

Here is an example that passes the tests:

/// ```rust /// #![feature(is_val_statically_known)] /// #![feature(core_intrinsics)] /// use std::hint::unreachable_unchecked; /// use std::intrinsics::is_val_statically_known; /// /// unsafe { /// if !is_val_statically_known(0) { unreachable_unchecked(); } /// } /// ``` /// /// This also means that the following code's behavior is unspecified; it /// may panic, or it may not: /// /// ```rust,no_run /// #![feature(is_val_statically_known)] /// #![feature(core_intrinsics)] /// use std::hint::black_box; /// use std::intrinsics::is_val_statically_known; /// /// unsafe { /// assert_eq!(is_val_statically_known(0), black_box(is_val_statically_known(0))); /// } /// ```

NCGThompson · 2024-01-06T02:10:45Z

The GCC backend could implement this with __builtin_constant_p. If given a single variable as an argument, it will behave similarly to llvm.is.constant. In fact, I think I remember that llvm.is.constant was added so Clang could support __builtin_constant_p. Other backends could just interpret is_val_statically_known as false.

I don't understand the work flow for the other backends so I don't know if the intrinsic should be added in this PR or if it needs to be done in the backends separate repos.

NCGThompson · 2024-01-06T21:11:04Z

library/core/src/num/int_macros.rs

+                };
+                // So it panics. Have to use `overflowing_mul` to efficiently set the
+                // result to 0 if not.
+                #[cfg(debug_assertions)]


is incorrect to use #[cfg(debug_assertions)] here. In general, we can't assume debug_assertions matches overflow_checks even though they go together by default. This is especially true because this function has the #[rustc_inherit_overflow_checks].

Without #[cfg(debug_assertions)], it is still incomplete because the greatest possible u32 is much greater than the number of bits in an integer.

To correctly check for overflow, test to see if the final result is less than 1. If that is the case use _ = <Self>::MAX * <Self>::MAX; to force a panic iff overflow_checks are enabled.

Although it isn't necessary, you can make the code less verbose by using saturating_mul instead of overflowing_mul and by using (1 as Self).checked_shl().unwrap_or_default() instead of (1 << num_shl) * fine as Self.

NCGThompson · 2024-01-09T22:06:26Z

No, I can add that comment and rebase but I won't have the time to debug it if it doesn't work

@Centri3 I'd be happy to take over if you won't be able to finish it.

Add Benchmarks for int_pow Methods. There is quite a bit of room for improvement in performance of the `int_pow` family of methods. I added benchmarks for those functions. In particular, there are benchmarks for small compile-time bases to measure the effect of rust-lang#114390. ~~I added a lot (245), but all but 22 of them are marked with `#[ignore]`. There are a lot of macros, and I would appreciate feedback on how to simplify them.~~ ~~To run benches relevant to rust-lang#114390, use `./x bench core --stage 1 -- pow_base_const --include-ignored`.~~

…li-obk Replacement of rust-lang#114390: Add new intrinsic `is_var_statically_known` and optimize pow for powers of two This adds a new intrinsic `is_val_statically_known` that lowers to [``@llvm.is.constant.*`](https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic).` It also applies the intrinsic in the int_pow methods to recognize and optimize the idiom `2isize.pow(x)`. See rust-lang#114390 for more discussion. While I have extended the scope of the power of two optimization from rust-lang#114390, I haven't added any new uses for the intrinsic. That can be done in later pull requests. Note: When testing or using the library, be sure to use `--stage 1` or higher. Otherwise, the intrinsic will be a noop and the doctests will be skipped. If you are trying out edits, you may be interested in [`--keep-stage 0`](https://rustc-dev-guide.rust-lang.org/building/suggested.html#faster-builds-with---keep-stage). Fixes rust-lang#47234 Resolves rust-lang#114390 `@Centri3`

rustbot assigned Mark-Simulacrum Aug 2, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Aug 2, 2023

RalfJung reviewed Aug 2, 2023

View reviewed changes

library/core/src/intrinsics.rs Outdated Show resolved Hide resolved

RalfJung reviewed Aug 2, 2023

View reviewed changes

compiler/rustc_const_eval/src/interpret/intrinsics.rs Outdated Show resolved Hide resolved

This comment has been minimized.

Sign in to view

RalfJung reviewed Aug 2, 2023

View reviewed changes

compiler/rustc_hir_analysis/src/check/intrinsic.rs Outdated Show resolved Hide resolved

RalfJung reviewed Aug 2, 2023

View reviewed changes

library/core/src/intrinsics.rs Outdated Show resolved Hide resolved

rust-cloud-vms bot force-pushed the llvm-is-constant branch from 723ab6d to e317dbd Compare August 2, 2023 23:20

Centri3 commented Aug 2, 2023

View reviewed changes

src/tools/miri/src/shims/intrinsics/mod.rs Outdated Show resolved Hide resolved

Centri3 force-pushed the llvm-is-constant branch from e317dbd to acbf90a Compare August 2, 2023 23:23

This comment has been minimized.

Sign in to view

Centri3 commented Aug 2, 2023

View reviewed changes

compiler/rustc_codegen_llvm/src/context.rs Outdated Show resolved Hide resolved

kevinji reviewed Aug 3, 2023

View reviewed changes

src/tools/miri/src/shims/intrinsics/mod.rs Outdated Show resolved Hide resolved

src/tools/miri/src/shims/intrinsics/mod.rs Outdated Show resolved Hide resolved

RalfJung reviewed Aug 3, 2023

View reviewed changes

compiler/rustc_const_eval/src/interpret/intrinsics.rs Outdated Show resolved Hide resolved

RalfJung reviewed Aug 3, 2023

View reviewed changes

src/tools/miri/src/shims/intrinsics/mod.rs Outdated Show resolved Hide resolved

oli-obk reviewed Aug 3, 2023

View reviewed changes

src/tools/cargo Outdated Show resolved Hide resolved

RalfJung reviewed Aug 3, 2023

View reviewed changes

library/core/src/intrinsics.rs Show resolved Hide resolved

RalfJung reviewed Aug 3, 2023

View reviewed changes

library/core/src/num/int_macros.rs Outdated Show resolved Hide resolved

RalfJung reviewed Sep 5, 2023

View reviewed changes

davidtwco assigned oli-obk and unassigned davidtwco Sep 6, 2023

nikic mentioned this pull request Dec 15, 2023

2.pow(n) does not optimize well #47234

Closed

NCGThompson mentioned this pull request Dec 30, 2023

Add Benchmarks for int_pow Methods. #119430

Merged

NCGThompson suggested changes Dec 30, 2023

View reviewed changes

NCGThompson reviewed Jan 6, 2024

View reviewed changes

NCGThompson mentioned this pull request Jan 12, 2024

Replacement of #114390: Add new intrinsic is_var_statically_known and optimize pow for powers of two #119911

Merged

bors closed this in 039d887 Jan 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add new intrinsic `is_constant` and optimize `pow` #114390

Add new intrinsic `is_constant` and optimize `pow` #114390

Centri3 commented Aug 2, 2023

rustbot commented Aug 2, 2023

rustbot commented Aug 2, 2023

oli-obk commented Aug 2, 2023 •

edited

Loading

This comment has been minimized.

oli-obk commented Aug 2, 2023 •

edited

Loading

Centri3 commented Aug 2, 2023

rustbot commented Aug 2, 2023

rustbot commented Aug 2, 2023

Centri3 commented Aug 2, 2023

This comment has been minimized.

RalfJung commented Aug 3, 2023

RalfJung commented Aug 3, 2023 •

edited

Loading

RalfJung Sep 5, 2023 •

edited

Loading

Centri3 commented Sep 5, 2023

oli-obk commented Sep 7, 2023

bors commented Sep 22, 2023

Dylan-DPC commented Nov 5, 2023

Centri3 commented Nov 14, 2023

bors commented Nov 18, 2023

NCGThompson commented Dec 17, 2023

RalfJung commented Dec 17, 2023

NCGThompson commented Dec 17, 2023

RalfJung commented Dec 17, 2023 •

edited

Loading

nikic commented Dec 17, 2023

mati865 commented Dec 21, 2023 •

edited

Loading

NCGThompson Dec 30, 2023

NCGThompson commented Jan 6, 2024

NCGThompson Jan 6, 2024

NCGThompson commented Jan 9, 2024

Add new intrinsic is_constant and optimize pow #114390

Add new intrinsic is_constant and optimize pow #114390

Conversation

Centri3 commented Aug 2, 2023

rustbot commented Aug 2, 2023

rustbot commented Aug 2, 2023

oli-obk commented Aug 2, 2023 • edited Loading

This comment has been minimized.

oli-obk commented Aug 2, 2023 • edited Loading

Centri3 commented Aug 2, 2023

rustbot commented Aug 2, 2023

rustbot commented Aug 2, 2023

Centri3 commented Aug 2, 2023

This comment has been minimized.

RalfJung commented Aug 3, 2023

RalfJung commented Aug 3, 2023 • edited Loading

RalfJung Sep 5, 2023 • edited Loading

Choose a reason for hiding this comment

Centri3 commented Sep 5, 2023

oli-obk commented Sep 7, 2023

bors commented Sep 22, 2023

Dylan-DPC commented Nov 5, 2023

Centri3 commented Nov 14, 2023

bors commented Nov 18, 2023

NCGThompson commented Dec 17, 2023

RalfJung commented Dec 17, 2023

NCGThompson commented Dec 17, 2023

RalfJung commented Dec 17, 2023 • edited Loading

nikic commented Dec 17, 2023

mati865 commented Dec 21, 2023 • edited Loading

NCGThompson Dec 30, 2023

Choose a reason for hiding this comment

NCGThompson commented Jan 6, 2024

NCGThompson Jan 6, 2024

Choose a reason for hiding this comment

NCGThompson commented Jan 9, 2024

Add new intrinsic `is_constant` and optimize `pow` #114390

Add new intrinsic `is_constant` and optimize `pow` #114390

oli-obk commented Aug 2, 2023 •

edited

Loading

oli-obk commented Aug 2, 2023 •

edited

Loading

RalfJung commented Aug 3, 2023 •

edited

Loading

RalfJung Sep 5, 2023 •

edited

Loading

RalfJung commented Dec 17, 2023 •

edited

Loading

mati865 commented Dec 21, 2023 •

edited

Loading