Always const-evaluate the GCD in `slice::align_to_offsets` #111296

Sp00ph · 2023-05-06T17:47:48Z

Use an inline const-block to force the compiler to calculate the GCD at compile time, even in debug mode. This shouldn't affect the behavior of the program at all, but it drastically cuts down on the number of instructions emitted with optimizations disabled.

With the current implementation, a single slice::align_to instantiation (specifically <[u8]>::align_to::<u128>()) generates 676 instructions (on x86-64). Forcing the GCD computation to be const cuts it down to 327 instructions, so just over 50% less. This is obviously not representative of actual runtime gains, but I still see it as a significant win as long as it doesn't degrade compile times.

Not having to worry about LLVM const-evaluating the GCD function also allows it to use the textbook recursive euclidean algorithm instead of a much more complicated iterative implementation with multiple unsafe-blocks.

rustbot · 2023-05-06T17:47:55Z

r? @Mark-Simulacrum

(rustbot has picked a reviewer for you, use r? to override)

rustbot · 2023-05-06T17:47:58Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

Mark-Simulacrum · 2023-05-07T15:45:44Z

@bors try @rust-timer queue

bors · 2023-05-07T15:45:52Z

⌛ Trying commit b5ee324 with merge 2a2eb58ca22fcee5651c02f204da3818f000d4fc...

Mark-Simulacrum · 2023-05-07T15:46:12Z

r=me with perf results OK

cc @scottmcm, I think you've worked on the GCD code and may be interested.

bors · 2023-05-07T17:55:22Z

☀️ Try build successful - checks-actions
Build commit: 2a2eb58ca22fcee5651c02f204da3818f000d4fc (2a2eb58ca22fcee5651c02f204da3818f000d4fc)

scottmcm · 2023-05-07T18:07:10Z

library/core/src/slice/mod.rs

+        let gcd: usize = const { gcd(mem::size_of::<T>(), mem::size_of::<U>()) };
        let ts: usize = mem::size_of::<U>() / gcd;
        let us: usize = mem::size_of::<T>() / gcd;


Maybe push even a bit more into the const block?

Suggested change

let gcd: usize = const { gcd(mem::size_of::<T>(), mem::size_of::<U>()) };

let ts: usize = mem::size_of::<U>() / gcd;

let us: usize = mem::size_of::<T>() / gcd;

let (ts, us) = const {

let gcd: usize = gcd(mem::size_of::<T>(), mem::size_of::<U>());

(mem::size_of::<U>() / gcd, mem::size_of::<T>() / gcd)

};

Doing that results in more instructions being emitted in debug mode for some reason.

You can also compute ts * us in const too.

I'm not convinced we should move more operations into the const-block if it increases the instruction count though.

Doing that results in more instructions being emitted in debug mode for some reason.

Caveat: small changes in instruction count in a debug build is a subpar proxy for estimating performance of such a build. LLVM can still select instruction sequences that execute in fewer cycles, but require more instructions in a debug mode, so long as it is able to match on IR sequences that trigger those instruction sequences. A prime example of that is e.g. dividing by a constant that produces a strength reduction that involves a multiplication and a shift. These two operations are still way faster to execute compared to a full division. Something similar may be happening here.

More importantly, though, this is all LLVM version dependent. It might do better in versions V and V+1, worse in versions V+2 and V+3 and even better in V+4... In this case I'd say, as long as the difference in machine instruction count isn’t like 10-20% or more, then we should strive to ensure that rustc does as much work as possible before handing it off to LLVM’s consideration.

EDIT: The actually correct thing would be to actually put the generated code through llvm-mca and have it compute the actual execution cost in cycles.

EDIT (part 2): Another thing that might be happening is that moving code into the const block enables LLVM to use FastIsel or somesuch mechanism which might have been not used before for whatever reason, which is just much faster in terms of generating code (even if the results aren't as good execution-time wise.)

If both T and U are ZSTs then doing the divisions in the const-block causes a compile time divide-by-zero error. align_to has an early return if at least one of the types is a ZST, so the divisor will never be zero at runtime, but it looks like the const-block still gets evaluated even though it's technically dead code.

Got it, sounds like we have no other choice right now, then. We could look into moving the check for ts == 0 || us == 0 into the const block too and bail in some other reasonable way, but it probably something that warrants a separate PR anyhow.

scottmcm · 2023-05-07T18:09:45Z

Nope, wasn't me. Maybe @nagisa ?

Great change, though! A really nice demonstration of a const block being worth it.

nagisa · 2023-05-07T18:34:50Z

Yeah, I “own” anything align_to-related. This is a great change and exactly what I was anticipating this to look like in the future. As @scottmcm pointed out you can still put more stuff into const and it should be possible to make the entire align_to_offsets a const fn (though it probably won't change much, since the argument to that function will always be non-const AFAIR…)

nagisa · 2023-05-07T18:37:31Z

library/core/src/slice/mod.rs

-            }
-            a << k
+        const fn gcd(a: usize, b: usize) -> usize {
+            if b == 0 { a } else { gcd(b, a % b) }


Doesn’t this need to check for a == 0 too? There's no guarantee that either of the types won’t be ZST (in isolation.) Or, add a comment.

If a == 0 then it returns gcd(b, 0 % b) which will then return b.

rust-timer · 2023-05-07T19:51:15Z

Finished benchmarking commit (2a2eb58ca22fcee5651c02f204da3818f000d4fc): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.6%	[2.6%, 2.6%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-2.9%	[-3.9%, -2.0%]	2
All ❌✅ (primary)	-	-	0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 653.81s -> 653.024s (-0.12%)

nagisa · 2023-05-08T09:08:33Z

@bors r=nagisa,Mark-Simulacrum

bors · 2023-05-08T09:08:35Z

📌 Commit b5ee324 has been approved by nagisa,Mark-Simulacrum

It is now in the queue for this repository.

bors · 2023-05-08T23:47:42Z

⌛ Testing commit b5ee324 with merge 90c02c1...

bors · 2023-05-09T02:41:39Z

☀️ Test successful - checks-actions
Approved by: nagisa,Mark-Simulacrum
Pushing 90c02c1 to master...

rust-timer · 2023-05-09T03:56:06Z

Finished benchmarking commit (90c02c1): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.3%	[-0.3%, -0.2%]	3
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-1.6%	[-1.9%, -1.3%]	2
All ❌✅ (primary)	-	-	0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.5%	[0.8%, 6.4%]	11
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	3.5%	[0.8%, 6.4%]	11

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 658.017s -> 656.527s (-0.23%)

Always const-eval the gcd in slice::align_to_offsets

b5ee324

rustbot assigned Mark-Simulacrum May 6, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 6, 2023

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 7, 2023

This comment has been minimized.

Sign in to view

scottmcm reviewed May 7, 2023

View reviewed changes

nagisa reviewed May 7, 2023

View reviewed changes

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 7, 2023

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 8, 2023

bors added the merged-by-bors This PR was explicitly merged by bors. label May 9, 2023

bors merged commit 90c02c1 into rust-lang:master May 9, 2023

rustbot added this to the 1.71.0 milestone May 9, 2023

Sp00ph deleted the const_gcd branch May 9, 2023 12:55

Always const-evaluate the GCD in slice::align_to_offsets #111296

Always const-evaluate the GCD in slice::align_to_offsets #111296

Uh oh!

Conversation

Sp00ph commented May 6, 2023

Uh oh!

rustbot commented May 6, 2023

Uh oh!

rustbot commented May 6, 2023

Uh oh!

Mark-Simulacrum commented May 7, 2023

Uh oh!

This comment has been minimized.

bors commented May 7, 2023

Uh oh!

Mark-Simulacrum commented May 7, 2023

Uh oh!

bors commented May 7, 2023

Uh oh!

This comment has been minimized.

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nagisa May 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

scottmcm commented May 7, 2023

Uh oh!

nagisa commented May 7, 2023

Uh oh!

nagisa May 7, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rust-timer commented May 7, 2023

Overall result: no relevant changes - no action needed

Uh oh!

nagisa commented May 8, 2023

Uh oh!

bors commented May 8, 2023

Uh oh!

bors commented May 8, 2023

Uh oh!

bors commented May 9, 2023

Uh oh!

rust-timer commented May 9, 2023

Overall result: ✅ improvements - no action needed

Uh oh!

Uh oh!

Always const-evaluate the GCD in `slice::align_to_offsets` #111296

Always const-evaluate the GCD in `slice::align_to_offsets` #111296

nagisa May 7, 2023 •

edited

Loading

nagisa May 7, 2023 •

edited

Loading