Limit the non-incr. CGU count to the core count. #113555

Closed
wants to merge 1 commit into from

Conversation

nnethercote
Contributor

r? @ghost

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 10, 2023
@nnethercote nnethercote marked this pull request as draft July 10, 2023 21:19
@nnethercote
Contributor Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 10, 2023
@bors
Collaborator

bors commented Jul 10, 2023

⌛ Trying commit 3577b1c with merge f962a86298e2d1bf8ad386c29945702b57b8692f...

@bors
Collaborator

bors commented Jul 10, 2023

☀️ Try build successful - checks-actions
Build commit: f962a86298e2d1bf8ad386c29945702b57b8692f (f962a86298e2d1bf8ad386c29945702b57b8692f)

@rust-timer
Collaborator

Finished benchmarking commit (f962a86298e2d1bf8ad386c29945702b57b8692f): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -3.4% | [-16.8%, -0.6%] | 24 |
| Improvements ✅ (secondary) | -0.7% | [-0.7%, -0.7%] | 1 |
| All ❌✅ (primary) | -3.4% | [-16.8%, -0.6%] | 24 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | 20.2% | [2.8%, 43.5%] | 18 |
| Regressions ❌ (secondary) | 1.6% | [1.6%, 1.6%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 20.2% | [2.8%, 43.5%] | 18 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -3.7% | [-16.3%, -1.0%] | 21 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -3.7% | [-16.3%, -1.0%] | 21 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|:--|:--:|:--:|:--:|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -5.4% | [-11.9%, -1.5%] | 25 |
| Improvements ✅ (secondary) | -1.0% | [-1.5%, -0.5%] | 2 |
| All ❌✅ (primary) | -5.4% | [-11.9%, -1.5%] | 25 |

Bootstrap: 656.951s -> 656.632s (-0.05%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 10, 2023
@nnethercote
Contributor Author

An explanation: in general, fewer codegen units (CGUs) means fewer cycles, smaller max-rss, and smaller binary size, but longer walltimes due to less parallelism. That trade-off assumes the machine can actually provide the parallelism.

@Kobzol asked me: "I wonder if we oversubscribe though - do we still generate 16 threads in parallel if you only have 2 cores?"

The answer is yes: rustc doesn’t consider the number of cores when choosing the CGU limit. So I thought that limiting the number of CGUs (in the non-incremental case) to the core count would help: cycles, max-rss, and binary size would improve, and walltimes would be no worse because the available parallelism is unchanged. (The perf CI machine has 12 cores.)
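
For readers who want the idea in code, here is a minimal sketch of the clamping described above. It is not the code from this PR or from rustc: `cgu_limit` and `detected_cores` are hypothetical names, and the default of 16 is just the figure mentioned in this thread. The only real API used is `std::thread::available_parallelism`.

```rust
use std::thread;

/// Hypothetical helper (not rustc's actual code): clamp a default CGU
/// limit to the number of cores, never going below one CGU.
fn cgu_limit(default_limit: usize, cores: usize) -> usize {
    default_limit.min(cores).max(1)
}

/// Hypothetical helper: ask the standard library how much parallelism is
/// available, falling back to 1 if the query fails.
fn detected_cores() -> usize {
    thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1)
}
```

Under these assumptions, `cgu_limit(16, detected_cores())` would give 12 on a 12-core machine like the perf CI box and 2 on a two-core machine, while a 32-core machine would keep the default of 16.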

But walltimes and max-rss were both a lot worse in some cases. Surprise!

I think the walltime result is because with 12 cores and 16 CGUs we actually don’t oversubscribe. rustc never gets more than seven or eight LLVM opt threads running in parallel (e.g. the first LLVM opt thread finishes well before the 12th can start) so we get the usual “fewer, larger CGUs result in longer walltimes” result. On a machine with just two or four cores we probably do oversubscribe and so this change might be more beneficial.
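
The staggered-start effect can be illustrated with a toy model. This is only a sketch under loud assumptions: every CGU is assumed to take the same `opt_time` to optimize, and CGUs are assumed to become ready at a fixed `start_gap`, neither of which is true of real rustc builds. The function name `peak_parallelism` and all the numbers are hypothetical.

```rust
/// Toy model (hypothetical, not measured): CGU `i` starts optimizing at
/// `i * start_gap` and runs for `opt_time`. Returns the largest number of
/// CGUs that are ever being optimized at the same moment.
fn peak_parallelism(num_cgus: usize, start_gap: u64, opt_time: u64) -> usize {
    // Start time of each CGU's optimization job.
    let starts: Vec<u64> = (0..num_cgus as u64).map(|i| i * start_gap).collect();

    // The peak is always reached at some job's start time, so it is enough
    // to count the jobs still running at each start time.
    starts
        .iter()
        .map(|&t| {
            starts
                .iter()
                .filter(|&&s| s <= t && t < s + opt_time)
                .count()
        })
        .max()
        .unwrap_or(0)
}
```

With 16 CGUs, a gap of 1 time unit, and 8 units of work each, `peak_parallelism(16, 1, 8)` is 8, which is below 12 cores, so in this model nothing is oversubscribed and cutting the CGU count only makes each job longer. If all CGUs were ready at once (`start_gap` of 0), the peak would be all 16.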

As for max-rss: maybe we end up with more CGUs compiling in parallel? Not sure.

@joshtriplett
Member

This may also make builds non-reproducible, though, since the number of CGUs can affect the build output.

@nnethercote
Contributor Author

> This may also make builds non-reproducible, though, since the number of CGUs can affect the build output.

True. That combined with the bad perf results means this has no future.

@nnethercote nnethercote deleted the cgus-cores branch July 11, 2023 23:37
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.