Limit the non-incr. CGU count to the core count. #113555
@bors try @rust-timer queue
⌛ Trying commit 3577b1c with merge f962a86298e2d1bf8ad386c29945702b57b8692f...
☀️ Try build successful - checks-actions
Finished benchmarking commit (f962a86298e2d1bf8ad386c29945702b57b8692f): comparison URL.
Overall result: ✅ improvements - no action needed
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
@bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 656.951s -> 656.632s (-0.05%)
An explanation: in general, fewer CGUs means fewer cycles, smaller max-rss, and smaller binary size, but larger walltimes due to less parallelism. That assumes the machine can actually provide the parallelism, though.

@Kobzol asked me: "I wonder if we oversubscribe though - do we still generate 16 threads in parallel if you only have 2 cores?" The answer is yes: rustc doesn't consider the number of cores when choosing the CGU limit.

So I thought that limiting the number of CGUs (in the non-incremental case) to the core count would help. Cycles, max-rss, and binary size would be better, and walltimes would be no worse because the available parallelism is unchanged. (The perf CI machine has 12 cores.)

But walltimes and max-rss were both a lot worse in some cases. Surprise!

I think the walltime result is because with 12 cores and 16 CGUs we don't actually oversubscribe. rustc never gets more than seven or eight LLVM opt threads running in parallel (e.g. the first LLVM opt thread finishes well before the 12th can start), so we get the usual "fewer, larger CGUs result in longer walltimes" result. On a machine with just two or four cores we probably do oversubscribe, so this change might be more beneficial there.

As for max-rss: maybe we end up with more CGUs compiling in parallel? Not sure.
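For readers who want the idea in concrete form, here is a minimal sketch of the clamping described above. It is not the actual diff from this PR: the names `DEFAULT_MAX_CGUS`, `detected_cores`, and `capped_cgu_limit` are hypothetical, the value 16 is simply the figure quoted in the discussion, and `std::thread::available_parallelism` is used as a stand-in for whatever core-count query rustc would really use.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical default cap on non-incremental CGUs (assumption: 16 is the
/// figure quoted in the discussion, not read from rustc's source).
const DEFAULT_MAX_CGUS: usize = 16;

/// Parallelism available to this process, falling back to 1 if it cannot be
/// determined. Note that this may differ from the physical core count
/// (for example under CPU affinity masks or container limits).
fn detected_cores() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}

/// Clamp the CGU limit to the core count, but never go below one CGU.
fn capped_cgu_limit(default_limit: usize, cores: usize) -> usize {
    default_limit.min(cores).max(1)
}
```

Under these assumptions, `capped_cgu_limit(DEFAULT_MAX_CGUS, 12)` gives 12 on a machine like the perf CI box, while a two-core machine would get 2. Because the result depends on `detected_cores()`, the same source could be split differently on different machines.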
This may also make builds non-reproducible, though, since the number of CGUs can affect the build output.
True. That combined with the bad perf results means this has no future.
r? @ghost