Skip to content

Always const-evaluate the GCD in slice::align_to_offsets #111296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 9, 2023

Conversation

Sp00ph
Copy link
Member

@Sp00ph Sp00ph commented May 6, 2023

Use an inline const-block to force the compiler to calculate the GCD at compile time, even in debug mode. This shouldn't affect the behavior of the program at all, but it drastically cuts down on the number of instructions emitted with optimizations disabled.

With the current implementation, a single slice::align_to instantiation (specifically <[u8]>::align_to::<u128>()) generates 676 instructions (on x86-64). Forcing the GCD computation to be const cuts it down to 327 instructions, so just over 50% less. This is obviously not representative of actual runtime gains, but I still see it as a significant win as long as it doesn't degrade compile times.

Not having to worry about LLVM const-evaluating the GCD function also allows it to use the textbook recursive euclidean algorithm instead of a much more complicated iterative implementation with multiple unsafe-blocks.

@rustbot
Copy link
Collaborator

rustbot commented May 6, 2023

r? @Mark-Simulacrum

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels May 6, 2023
@rustbot
Copy link
Collaborator

rustbot commented May 6, 2023

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@Mark-Simulacrum
Copy link
Member

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 7, 2023
@bors
Copy link
Collaborator

bors commented May 7, 2023

⌛ Trying commit b5ee324 with merge 2a2eb58ca22fcee5651c02f204da3818f000d4fc...

@Mark-Simulacrum
Copy link
Member

r=me with perf results OK

cc @scottmcm, I think you've worked on the GCD code and may be interested.

@bors
Copy link
Collaborator

bors commented May 7, 2023

☀️ Try build successful - checks-actions
Build commit: 2a2eb58ca22fcee5651c02f204da3818f000d4fc (2a2eb58ca22fcee5651c02f204da3818f000d4fc)

@rust-timer

This comment has been minimized.

Comment on lines +3483 to 3485
let gcd: usize = const { gcd(mem::size_of::<T>(), mem::size_of::<U>()) };
let ts: usize = mem::size_of::<U>() / gcd;
let us: usize = mem::size_of::<T>() / gcd;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe push even a bit more into the const block?

Suggested change
let gcd: usize = const { gcd(mem::size_of::<T>(), mem::size_of::<U>()) };
let ts: usize = mem::size_of::<U>() / gcd;
let us: usize = mem::size_of::<T>() / gcd;
let (ts, us) = const {
let gcd: usize = gcd(mem::size_of::<T>(), mem::size_of::<U>());
(mem::size_of::<U>() / gcd, mem::size_of::<T>() / gcd)
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing that results in more instructions being emitted in debug mode for some reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also compute ts * us in const too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced we should move more operations into the const-block if it increases the instruction count though.

Copy link
Member

@nagisa nagisa May 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing that results in more instructions being emitted in debug mode for some reason.

Caveat: small changes in instruction count in a debug build is a subpar proxy for estimating performance of such a build. LLVM can still select instruction sequences that execute in fewer cycles, but require more instructions in a debug mode, so long as it is able to match on IR sequences that trigger those instruction sequences. A prime example of that is e.g. dividing by a constant that produces a strength reduction that involves a multiplication and a shift. These two operations are still way faster to execute compared to a full division. Something similar may be happening here.

More importantly, though, this is all LLVM version dependent. It might do better in versions V and V+1, worse in versions V+2 and V+3 and even better in V+4... In this case I'd say, as long as the difference in machine instruction count isn’t like 10-20% or more, then we should strive to ensure that rustc does as much work as possible before handing it off to LLVM’s consideration.

EDIT: The actually correct thing would be to actually put the generated code through llvm-mca and have it compute the actual execution cost in cycles.

EDIT (part 2): Another thing that might be happening is that moving code into the const block enables LLVM to use FastIsel or somesuch mechanism which might have been not used before for whatever reason, which is just much faster in terms of generating code (even if the results aren't as good execution-time wise.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both T and U are ZSTs then doing the divisions in the const-block causes a compile time divide-by-zero error. align_to has an early return if at least one of the types is a ZST, so the divisor will never be zero at runtime, but it looks like the const-block still gets evaluated even though it's technically dead code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, sounds like we have no other choice right now, then. We could look into moving the check for ts == 0 || us == 0 into the const block too and bail in some other reasonable way, but it probably something that warrants a separate PR anyhow.

@scottmcm
Copy link
Member

scottmcm commented May 7, 2023

Nope, wasn't me. Maybe @nagisa ?

Great change, though! A really nice demonstration of a const block being worth it.

@nagisa
Copy link
Member

nagisa commented May 7, 2023

Yeah, I “own” anything align_to-related. This is a great change and exactly what I was anticipating this to look like in the future. As @scottmcm pointed out you can still put more stuff into const and it should be possible to make the entire align_to_offsets a const fn (though it probably won't change much, since the argument to that function will always be non-const AFAIR…)

}
a << k
const fn gcd(a: usize, b: usize) -> usize {
if b == 0 { a } else { gcd(b, a % b) }
Copy link
Member

@nagisa nagisa May 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn’t this need to check for a == 0 too? There's no guarantee that either of the types won’t be ZST (in isolation.) Or, add a comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a == 0 then it returns gcd(b, 0 % b) which will then return b.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (2a2eb58ca22fcee5651c02f204da3818f000d4fc): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.6% [2.6%, 2.6%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.9% [-3.9%, -2.0%] 2
All ❌✅ (primary) - - 0

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 653.81s -> 653.024s (-0.12%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 7, 2023
@nagisa
Copy link
Member

nagisa commented May 8, 2023

@bors r=nagisa,Mark-Simulacrum

@bors
Copy link
Collaborator

bors commented May 8, 2023

📌 Commit b5ee324 has been approved by nagisa,Mark-Simulacrum

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 8, 2023
@bors
Copy link
Collaborator

bors commented May 8, 2023

⌛ Testing commit b5ee324 with merge 90c02c1...

@bors
Copy link
Collaborator

bors commented May 9, 2023

☀️ Test successful - checks-actions
Approved by: nagisa,Mark-Simulacrum
Pushing 90c02c1 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label May 9, 2023
@bors bors merged commit 90c02c1 into rust-lang:master May 9, 2023
@rustbot rustbot added this to the 1.71.0 milestone May 9, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (90c02c1): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.3% [-0.3%, -0.2%] 3
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-1.6% [-1.9%, -1.3%] 2
All ❌✅ (primary) - - 0

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.5% [0.8%, 6.4%] 11
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 3.5% [0.8%, 6.4%] 11

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 658.017s -> 656.527s (-0.23%)

@Sp00ph Sp00ph deleted the const_gcd branch May 9, 2023 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants