Introduce a minimum CGU size in non-incremental builds. #112448

Merged
merged 3 commits into from Jun 14, 2023

nnethercote
Contributor

Because tiny CGUs slow down compilation and result in worse generated code.

r? @wesleywiser

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 9, 2023
@nnethercote
Contributor Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 9, 2023
@bors
Contributor

bors commented Jun 9, 2023

⌛ Trying commit d7903614aaba46d87a0288e6eea386edb08d4c30 with merge 01c64cbb288ae7a44bb43912291d2b7b6e47da7d...

@nnethercote
Contributor Author

nnethercote commented Jun 9, 2023

I got some tests/run-make/sepcomp-* failures locally. I've now fixed them, but I'll wait for the perf run to complete before uploading the relevant changes.

@bors
Contributor

bors commented Jun 9, 2023

☀️ Try build successful - checks-actions
Build commit: 01c64cbb288ae7a44bb43912291d2b7b6e47da7d (01c64cbb288ae7a44bb43912291d2b7b6e47da7d)

@rust-timer
Collaborator

Finished benchmarking commit (01c64cbb288ae7a44bb43912291d2b7b6e47da7d): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.2% | [2.0%, 2.4%] | 6 |
| Regressions ❌ (secondary) | 1.3% | [0.4%, 2.2%] | 31 |
| Improvements ✅ (primary) | -1.9% | [-7.6%, -0.3%] | 20 |
| Improvements ✅ (secondary) | -8.4% | [-46.0%, -0.2%] | 21 |
| All ❌✅ (primary) | -1.0% | [-7.6%, 2.4%] | 26 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.9% | [0.9%, 2.8%] | 2 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.7% | [-7.5%, -0.4%] | 17 |
| Improvements ✅ (secondary) | -5.1% | [-18.6%, -1.5%] | 15 |
| All ❌✅ (primary) | -2.2% | [-7.5%, 2.8%] | 19 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.8% | [1.6%, 2.1%] | 6 |
| Regressions ❌ (secondary) | 1.7% | [1.3%, 2.1%] | 12 |
| Improvements ✅ (primary) | -3.8% | [-8.0%, -1.5%] | 15 |
| Improvements ✅ (secondary) | -11.2% | [-47.0%, -1.6%] | 17 |
| All ❌✅ (primary) | -2.2% | [-8.0%, 2.1%] | 21 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.8% | [0.5%, 2.8%] | 16 |
| Regressions ❌ (secondary) | 2.5% | [0.2%, 2.8%] | 71 |
| Improvements ✅ (primary) | -2.1% | [-13.5%, -0.1%] | 18 |
| Improvements ✅ (secondary) | -7.1% | [-48.2%, -0.0%] | 18 |
| All ❌✅ (primary) | -0.3% | [-13.5%, 2.8%] | 34 |

Bootstrap: 649.387s -> 650.164s (0.12%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 9, 2023
@nnethercote
Contributor Author

Hmm. This should be a slam dunk, based on logic and local measurements. The results here are great (or "no change") across the board for almost every metric (even the obscure ones) except walltime, where it's very mixed, with a few more regressions than improvements. Which is puzzling, because this really should be a clear win, especially for cases like unicode-normalization, libc, and html5ever, where a bunch of tiny CGUs are eliminated.

Also, lots of changes for incremental cases, which shouldn't be happening because this doesn't change incremental compilation at all. Including lots of changes to binary size for incremental cases, which makes no sense. (In comparison, the "object file" measurements look as expected. What's the difference between binary size (a.k.a. "linked artifact") and "object file"?)

@nnethercote
Contributor Author

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 9, 2023
@bors
Contributor

bors commented Jun 9, 2023

⌛ Trying commit 45b08a53a870b118a159e65b710f0a8f3dfb1126 with merge 6532381723f53907d56ef7c7f21d2c3a77f7340a...

@bors
Contributor

bors commented Jun 9, 2023

☀️ Try build successful - checks-actions
Build commit: 6532381723f53907d56ef7c7f21d2c3a77f7340a (6532381723f53907d56ef7c7f21d2c3a77f7340a)

@rust-timer
Collaborator

Finished benchmarking commit (6532381723f53907d56ef7c7f21d2c3a77f7340a): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.8% | [0.8%, 2.5%] | 9 |
| Regressions ❌ (secondary) | 1.1% | [0.3%, 2.2%] | 41 |
| Improvements ✅ (primary) | -1.9% | [-7.2%, -0.3%] | 20 |
| Improvements ✅ (secondary) | -8.4% | [-46.0%, -0.4%] | 21 |
| All ❌✅ (primary) | -0.7% | [-7.2%, 2.5%] | 29 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.4% | [0.9%, 2.0%] | 2 |
| Regressions ❌ (secondary) | 3.2% | [1.5%, 4.9%] | 3 |
| Improvements ✅ (primary) | -3.2% | [-6.0%, -1.3%] | 8 |
| Improvements ✅ (secondary) | -6.8% | [-19.9%, -2.1%] | 9 |
| All ❌✅ (primary) | -2.3% | [-6.0%, 2.0%] | 10 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.8% | [1.6%, 2.0%] | 5 |
| Regressions ❌ (secondary) | 1.6% | [1.2%, 2.0%] | 12 |
| Improvements ✅ (primary) | -3.5% | [-8.6%, -0.7%] | 17 |
| Improvements ✅ (secondary) | -9.6% | [-46.5%, -1.7%] | 21 |
| All ❌✅ (primary) | -2.3% | [-8.6%, 2.0%] | 22 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.8% | [0.5%, 2.8%] | 16 |
| Regressions ❌ (secondary) | 2.5% | [0.2%, 2.8%] | 71 |
| Improvements ✅ (primary) | -1.8% | [-13.5%, -0.1%] | 22 |
| Improvements ✅ (secondary) | -7.2% | [-48.3%, -0.0%] | 18 |
| All ❌✅ (primary) | -0.3% | [-13.5%, 2.8%] | 38 |

Bootstrap: 647.29s -> 647.896s (0.09%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 9, 2023
@nikic
Contributor

nikic commented Jun 9, 2023

@nnethercote Could the fact that the CGUs now get sorted by size even if there are less than the limit make a difference?

@nnethercote
Contributor Author

> @nnethercote Could the fact that the CGUs now get sorted by size even if there are less than the limit make a difference?

I had a similar thought, but this runs shortly afterwards, sorting the CGUs by name. Also, I did some debug logging and the CGU contents were identical before and after. Even worse, I'm getting big discrepancies between instruction counts measurements from Cachegrind vs perf. It's all very puzzling, I'll have to investigate more next week.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 13, 2023
@bors
Contributor

bors commented Jun 13, 2023

⌛ Trying commit 9a516e047a6016474be5811bb66323cb5370e7e5 with merge 244b1111491e0e0b819b9a9aedb433964d59c0dd...

@bors
Contributor

bors commented Jun 13, 2023

☀️ Try build successful - checks-actions
Build commit: 244b1111491e0e0b819b9a9aedb433964d59c0dd (244b1111491e0e0b819b9a9aedb433964d59c0dd)

@rust-timer
Collaborator

Finished benchmarking commit (244b1111491e0e0b819b9a9aedb433964d59c0dd): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.2% | [-7.7%, -0.3%] | 19 |
| Improvements ✅ (secondary) | -8.5% | [-46.0%, -0.5%] | 22 |
| All ❌✅ (primary) | -2.2% | [-7.7%, -0.3%] | 19 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.2% | [2.2%, 2.2%] | 1 |
| Improvements ✅ (primary) | -3.5% | [-6.7%, -1.3%] | 10 |
| Improvements ✅ (secondary) | -7.2% | [-18.7%, -1.9%] | 8 |
| All ❌✅ (primary) | -3.5% | [-6.7%, -1.3%] | 10 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 4.0% | [4.0%, 4.2%] | 3 |
| Improvements ✅ (primary) | -3.9% | [-8.8%, -0.6%] | 16 |
| Improvements ✅ (secondary) | -10.9% | [-46.6%, -1.9%] | 18 |
| All ❌✅ (primary) | -3.9% | [-8.8%, -0.6%] | 16 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.9% | [-13.3%, -0.1%] | 22 |
| Improvements ✅ (secondary) | -7.2% | [-48.3%, -0.0%] | 18 |
| All ❌✅ (primary) | -1.9% | [-13.3%, -0.1%] | 22 |

Bootstrap: 649.134s -> 650.1s (0.15%)

@rustbot rustbot removed S-waiting-on-perf Status: Waiting on a perf run to be completed. perf-regression Performance regression. labels Jun 13, 2023
compiler/rustc_session/src/session.rs (outdated review thread, resolved)
Comment on lines 35 to 36
split into. (The exact number will depend on the size and structure of the
crate's source code.) It takes an integer greater than 0.
Member

I could see the parenthetical statement being interpreted as "we may create more CGUs than the value you specified". Maybe this formulation is more clear on that point?

Suggested change, from:

> split into. (The exact number will depend on the size and structure of the
> crate's source code.) It takes an integer greater than 0.

to:

> split into. (Fewer code generation units may be created depending on the
> size and structure of the crate's source code.) It takes an integer
> greater than 0.

Contributor Author

I'll remove the parenthetical. The important part of the change is specifying that it's a maximum.

@wesleywiser
Member

Another interesting result from your final commit, adding up all of the CGU sizes shows:

| Crate | Before | After |
|---|---|---|
| html5ever | 19,138 | 17,584 |
| tt-muncher | 9,097 | 9,075 |

My guess is that this is due to fewer duplicated inline functions in different CGUs. This might be another statistic worth reporting on.

For example, we go from this:
```
FINAL (4059 items, total_size=232342; 16 CGUs, max_size=39608,
min_size=5468, max_size/min_size=7.2):
- CGU[0] regex.f2ff11e98f8b05c7-cgu.0 (318 items, size=39608):
  - fn ...
  - fn ...
```
to this:
```
FINAL
- unique items: 2726 (1459 root + 1267 inlined), unique size: 201214 (146046 root + 55168 inlined)
- placed items: 4059 (1459 root + 2600 inlined), placed size: 232342 (146046 root + 86296 inlined)
- placed/unique items ratio: 1.49, placed/unique size ratio: 1.15
- CGUs: 16, mean size: 14521.4, sizes: [39608, 31122, 20318, 20236, 16268, 13777, 12310, 10531, 10205, 9810, 9250, 9065 (x2), 7785, 7524, 5468]

- CGU[0]
  - regex.f2ff11e98f8b05c7-cgu.0, size: 39608
  - items: 318, mean size: 124.6, sizes: [28395, 3418, 558, 485, 259, 228, 176, 166, 146, 118, 117 (x3), 114 (x5), 113 (x3), 101, 84, 82, 77, 76, 72, 71 (x2), 66, 65, 62, 61, 59 (x2), 57, 55, 54 (x2), 53 (x4), 52 (x5), 51 (x4), 50, 48, 47, 46, 45 (x3), 44, 43 (x5), 42, 40, 38 (x4), 37, 35, 34 (x2), 32 (x2), 31, 30, 28 (x2), 27 (x2), 26 (x3), 24 (x2), 23 (x3), 22 (x2), 21, 20, 16 (x4), 15, 13 (x7), 12 (x3), 11 (x6), 10, 9 (x2), 8 (x4), 7 (x8), 6 (x38), 5 (x21), 4 (x7), 3 (x45), 2 (x63), 1 (x13)]
  - fn ...
  - fn ...
```
This is a lot more information, distinguishing between root items and
inlined items, showing how much duplication there is of inlined items,
plus the full range of sizes for CGUs and items within CGUs. All of
which is really helpful when analyzing this stuff and trying different
CGU formation algorithms.
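
The ratio and mean figures in the new output follow directly from the counts and sizes it prints. As a quick check, here is a minimal Rust sketch that recomputes them from the `regex` numbers shown above. The helper names `ratio` and `mean` are illustrative only and are not taken from the compiler.

```rust
/// Ratio of placed to unique (items or size). Values above 1.0 mean that
/// inlined items were duplicated across CGUs.
fn ratio(placed: u64, unique: u64) -> f64 {
    placed as f64 / unique as f64
}

/// Arithmetic mean of a list of sizes.
fn mean(sizes: &[u64]) -> f64 {
    sizes.iter().sum::<u64>() as f64 / sizes.len() as f64
}

fn main() {
    // CGU sizes copied from the example output, with "9065 (x2)" expanded.
    let cgu_sizes: [u64; 16] = [
        39608, 31122, 20318, 20236, 16268, 13777, 12310, 10531,
        10205, 9810, 9250, 9065, 9065, 7785, 7524, 5468,
    ];

    // placed items / unique items
    println!("items ratio: {:.2}", ratio(4059, 2726)); // 1.49
    // placed size / unique size
    println!("size ratio:  {:.2}", ratio(232342, 201214)); // 1.15
    // mean CGU size
    println!("mean size:   {:.1}", mean(&cgu_sizes)); // 14521.4

    // max_size/min_size, as reported by the old output format
    let max = *cgu_sizes.iter().max().unwrap();
    let min = *cgu_sizes.iter().min().unwrap();
    println!("max/min:     {:.1}", ratio(max, min)); // 7.2
}
```

The 1.15 size ratio says that roughly 15% of the placed size is duplicated inlined code, which is the kind of overhead the new output makes visible.
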

This commit introduces a minimum CGU size in non-incremental builds,
because tiny CGUs make compilation less efficient *and* result in worse
generated code.

We don't do this when the number of CGUs is explicitly given, because
there are times when the requested number is very important, as
described in some comments within the commit. So the commit also
introduces a `CodegenUnits` type that distinguishes between default
values and user-specified values.
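
As a rough illustration of the idea, here is a minimal sketch of a type that records whether the CGU count came from the user or from the compiler's default. It assumes a simple two-variant enum. The variant names, the method names, and the example value 16 are assumptions made for this sketch and may not match the commit.

```rust
/// Hypothetical sketch: the requested number of codegen units, tagged with
/// where the number came from.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CodegenUnits {
    /// Explicitly requested by the user, e.g. via `-C codegen-units=N`.
    User(usize),
    /// Chosen by the compiler because the user specified nothing.
    Default(usize),
}

impl CodegenUnits {
    /// The numeric limit, regardless of its origin.
    fn as_usize(self) -> usize {
        match self {
            CodegenUnits::User(n) | CodegenUnits::Default(n) => n,
        }
    }

    /// Tiny CGUs are only merged when the count was not explicitly given,
    /// since a user-specified count may matter for its own sake.
    fn allows_min_size_merging(self) -> bool {
        matches!(self, CodegenUnits::Default(_))
    }
}

fn main() {
    let from_flag = CodegenUnits::User(4);
    let from_default = CodegenUnits::Default(16);

    println!("{:?}: merge tiny CGUs? {}", from_flag, from_flag.allows_min_size_merging());
    println!("{:?}: merge tiny CGUs? {}", from_default, from_default.allows_min_size_merging());
    println!("limit in both cases: {} and {}", from_flag.as_usize(), from_default.as_usize());
}
```

Carrying the origin alongside the number lets the partitioning code make this decision locally, without a separate "was this set by the user?" flag threaded through the session.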

This change has a roughly neutral effect on walltimes across the
rustc-perf benchmarks; there are some speedups and some slowdowns. But
it has significant wins for most other metrics on numerous benchmarks,
including instruction counts, cycles, binary size, and max-rss. It also
reduces parallelism, which is good for reducing jobserver competition
when multiple rustc processes are running at the same time. It's smaller
benchmarks that benefit the most; larger benchmarks already have CGUs
that are all larger than the minimum size.

Here are some example before/after CGU sizes for opt builds.

- html5ever
  - CGUs: 16, mean size: 1196.1, sizes: [3908, 2992, 1706, 1652, 1572,
    1136, 1045, 948, 946, 938, 579, 471, 443, 327, 286, 189]
  - CGUs: 4, mean size: 4396.0, sizes: [6706, 3908, 3490, 3480]

- libc
  - CGUs: 12, mean size: 35.3, sizes: [163, 93, 58, 53, 37, 8, 2 (x6)]
  - CGUs: 1, mean size: 424.0, sizes: [424]

- tt-muncher
  - CGUs: 5, mean size: 1819.4, sizes: [8508, 350, 198, 34, 7]
  - CGUs: 1, mean size: 9075.0, sizes: [9075]

Note that CGUs of size 100,000+ aren't unusual in larger programs.
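
The effect on the size lists above can be approximated with a small simulation. The sketch below repeatedly folds the smallest CGU into the second-smallest until every CGU meets a minimum size or only one CGU is left. It is an assumption-laden model rather than the compiler's code: the function name `merge_tiny_cgus` is hypothetical, the threshold of 1000 is an arbitrary illustrative value, and sizes are simply added, so the de-duplication of inlined items that shrinks the real totals is not modelled.

```rust
/// Hypothetical model: merge the smallest CGU into the second-smallest
/// until all CGUs are at least `min_size`, or only one CGU remains.
/// Returns the resulting sizes in descending order.
fn merge_tiny_cgus(mut sizes: Vec<u64>, min_size: u64) -> Vec<u64> {
    while sizes.len() > 1 {
        // Keep the list sorted largest-first so the smallest is at the end.
        sizes.sort_unstable_by(|a, b| b.cmp(a));
        let smallest = *sizes.last().unwrap();
        if smallest >= min_size {
            break;
        }
        sizes.pop();
        // The new last element is the former second-smallest CGU.
        *sizes.last_mut().unwrap() += smallest;
    }
    sizes
}

fn main() {
    // Arbitrary threshold chosen for illustration only.
    let min_size = 1000;

    // "Before" sizes for tt-muncher and libc from the lists above.
    let tt_muncher = vec![8508, 350, 198, 34, 7];
    let libc = vec![163, 93, 58, 53, 37, 8, 2, 2, 2, 2, 2, 2];

    println!("tt-muncher: {:?}", merge_tiny_cgus(tt_muncher, min_size)); // [9097]
    println!("libc:       {:?}", merge_tiny_cgus(libc, min_size)); // [424]
}
```

For tt-muncher the model yields a single CGU of size 9097, the plain sum of the "before" sizes, whereas the measured "after" size is 9075. The gap is consistent with fewer inlined functions being duplicated once the CGUs are merged.
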
@nnethercote
Contributor Author

> My guess is that this is due to fewer duplicated inline functions in different CGUs. This might be another statistic worth reporting on.

Yes, that's definitely a big part of the improvement here. It's hard to report on because merging currently happens before inlining, so you can't make this measurement without two versions of the compiler, one doing the tiny CGU merging and one not.

@nnethercote
Contributor Author

I have addressed the review comments. I also made some very small tweaks to the debug printing, but nothing needing additional review. Thanks!

@bors r=wesleywiser

@bors
Contributor

bors commented Jun 14, 2023

📌 Commit 7c3ce02 has been approved by wesleywiser

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 14, 2023
@bors
Contributor

bors commented Jun 14, 2023

⌛ Testing commit 7c3ce02 with merge fa8762b...

@bors
Contributor

bors commented Jun 14, 2023

☀️ Test successful - checks-actions
Approved by: wesleywiser
Pushing fa8762b to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jun 14, 2023
@bors bors merged commit fa8762b into rust-lang:master Jun 14, 2023
12 checks passed
@rustbot rustbot added this to the 1.72.0 milestone Jun 14, 2023
@nnethercote nnethercote deleted the no-tiny-cgus branch June 14, 2023 06:52
@rust-timer
Collaborator

Finished benchmarking commit (fa8762b): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -2.1% | [-7.8%, -0.3%] | 19 |
| Improvements ✅ (secondary) | -8.5% | [-45.9%, -0.4%] | 22 |
| All ❌✅ (primary) | -2.1% | [-7.8%, -0.3%] | 19 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -3.6% | [-6.4%, -1.8%] | 10 |
| Improvements ✅ (secondary) | -7.1% | [-18.9%, -2.4%] | 9 |
| All ❌✅ (primary) | -3.6% | [-6.4%, -1.8%] | 10 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 5.0% | [4.4%, 5.3%] | 3 |
| Improvements ✅ (primary) | -3.6% | [-8.6%, -0.8%] | 18 |
| Improvements ✅ (secondary) | -11.6% | [-46.6%, -2.1%] | 17 |
| All ❌✅ (primary) | -3.6% | [-8.6%, -0.8%] | 18 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.9% | [-13.3%, -0.1%] | 22 |
| Improvements ✅ (secondary) | -7.3% | [-48.2%, -0.0%] | 18 |
| All ❌✅ (primary) | -1.9% | [-13.3%, -0.1%] | 22 |

Bootstrap: 648.99s -> 648.277s (-0.11%)

Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-trait-system-refactor The Rustc Trait System Refactor Initiative
7 participants