Base CGU estimates on basic block count, not statement count. #113047

Closed

Conversation

nnethercote
Contributor

r? @ghost

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 26, 2023
@nnethercote nnethercote marked this pull request as draft June 26, 2023 07:51
@nnethercote
Contributor Author

@bors try @rust-timer queue


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 26, 2023
@bors
Contributor

bors commented Jun 26, 2023

⌛ Trying commit 64ee4e1 with merge 57da7fc899d24430772acd1c0df9059d593714e0...

@bors
Contributor

bors commented Jun 26, 2023

☀️ Try build successful - checks-actions
Build commit: 57da7fc899d24430772acd1c0df9059d593714e0


@rust-timer
Collaborator

Finished benchmarking commit (57da7fc899d24430772acd1c0df9059d593714e0): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

                             mean    range            count
Regressions ❌ (primary)      1.7%    [0.9%, 3.0%]     3
Regressions ❌ (secondary)    -       -                0
Improvements ✅ (primary)     -0.8%   [-2.9%, -0.4%]   58
Improvements ✅ (secondary)   -0.8%   [-1.4%, -0.4%]   24
All ❌✅ (primary)             -0.6%   [-2.9%, 3.0%]    61

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                             mean    range             count
Regressions ❌ (primary)      4.0%    [1.7%, 7.4%]      4
Regressions ❌ (secondary)    2.7%    [2.0%, 3.0%]      5
Improvements ✅ (primary)     -5.1%   [-10.7%, -1.5%]   5
Improvements ✅ (secondary)   -2.5%   [-3.1%, -2.1%]    3
All ❌✅ (primary)             -1.0%   [-10.7%, 7.4%]    9

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                             mean    range            count
Regressions ❌ (primary)      2.0%    [1.1%, 2.9%]     2
Regressions ❌ (secondary)    -       -                0
Improvements ✅ (primary)     -3.1%   [-3.4%, -2.8%]   2
Improvements ✅ (secondary)   -       -                0
All ❌✅ (primary)             -0.5%   [-3.4%, 2.9%]    4

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

                             mean    range            count
Regressions ❌ (primary)      0.5%    [0.0%, 1.7%]     32
Regressions ❌ (secondary)    0.9%    [0.2%, 1.5%]     2
Improvements ✅ (primary)     -0.5%   [-0.7%, -0.4%]   4
Improvements ✅ (secondary)   -1.1%   [-1.1%, -1.1%]   1
All ❌✅ (primary)             0.4%    [-0.7%, 1.7%]    36

Bootstrap: 662.496s -> 666.596s (0.62%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 26, 2023
@klensy
Contributor

klensy commented Jun 26, 2023

Interesting: the run https://github.com/rust-lang/rust/actions/runs/5375275677 is still spinning, even though it should have completed a while ago.

https://github.com/rust-lang/rust/actions/runs/5375275677/jobs/9751558060#step:25:1784

thread '[codegen-units] tests/codegen-units/partitioning/incremental-merging.rs' panicked at 'explicit panic', src/tools/compiletest/src/runtest.rs:2906:13
Build completed unsuccessfully in 0:12:58
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    [codegen-units] tests/codegen-units/partitioning/incremental-merging.rs

@Kobzol
Contributor

Kobzol commented Jun 26, 2023

Seems like some kind of GH bug; all the workflow jobs have actually completed, but the workflow run isn't marked as completed yet.

@nnethercote
Contributor Author

@klensy: I got a "PR run failed" email saying three jobs had failed, but when I visited here they were still spinning.

@nnethercote
Contributor Author

Some explanation: after much effort, I concluded that any attempt to improve CGU formation is basically doomed while the CGU size estimates aren't very good. There's no point doing anything clever if the estimates are regularly off by a factor of 2 or more in both directions (underestimating and overestimating).

So I wrote code to compare the CGU estimates to the actual compile times, and computed an error value based on the mean absolute deviation. The goal was not to come up with a perfect measure of error, but something reasonably good that would let me evaluate alternative CGU size estimation functions.
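
The error measure was roughly the following (a minimal sketch with hypothetical names, not the actual analysis code; it normalizes both the estimates and the measured times so each sums to 1, then takes the mean absolute deviation between the two distributions):

```rust
/// Hypothetical sketch of the error measure: normalize the per-CGU size
/// estimates and the measured compile times so each sums to 1, then take
/// the mean absolute deviation between the two distributions.
fn estimate_error(estimates: &[f64], times: &[f64]) -> f64 {
    assert_eq!(estimates.len(), times.len());
    let est_total: f64 = estimates.iter().sum();
    let time_total: f64 = times.iter().sum();
    let n = estimates.len() as f64;
    estimates
        .iter()
        .zip(times)
        .map(|(e, t)| (e / est_total - t / time_total).abs())
        .sum::<f64>()
        / n
}

fn main() {
    // Three CGUs whose estimates don't quite match their measured times.
    let err = estimate_error(&[100.0, 50.0, 25.0], &[2.0, 1.8, 0.4]);
    println!("error = {err:.3}");
}
```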

I tried quite a few different estimation functions, and found that basing the estimates on basic block counts rather than statement counts was a big win. This makes a certain amount of sense, because basic blocks are pretty easy for compilers to handle: a basic block with 20 statements is not much harder to compile than one with 5, because most of the complexity comes from the control flow between basic blocks. (Picture me waving my hands right now.)
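
As a toy illustration of why the two estimates can diverge (simplified structs, not rustc's actual MIR types):

```rust
// Simplified stand-ins for a MIR body; not rustc's actual types.
struct BasicBlock {
    statements: usize,
}

struct Body {
    blocks: Vec<BasicBlock>,
}

// Old-style estimate: total number of statements.
fn statement_count_estimate(body: &Body) -> usize {
    body.blocks.iter().map(|bb| bb.statements).sum()
}

// New-style estimate: number of basic blocks.
fn basic_block_count_estimate(body: &Body) -> usize {
    body.blocks.len()
}

fn main() {
    // Same number of statements, very different control-flow complexity.
    let straight_line = Body { blocks: vec![BasicBlock { statements: 40 }] };
    let branchy = Body {
        blocks: (0..20).map(|_| BasicBlock { statements: 2 }).collect(),
    };
    assert_eq!(statement_count_estimate(&straight_line), 40);
    assert_eq!(statement_count_estimate(&branchy), 40);
    assert_eq!(basic_block_count_estimate(&straight_line), 1);
    assert_eq!(basic_block_count_estimate(&branchy), 20);
}
```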

Here is the data showing the error measure for the old (statement-based) and new (basic-block-based) estimate functions. The error for the new function is better in every case except two (marked with *), where it is marginally worse. Some errors drop to zero because the number of CGUs dropped to one; with a single CGU there can be no discrepancy between estimates, so the error is trivially zero.

Debug
                             old    new
bitmaps-3.1.0                0.58   0.00
cargo-0.60.0                 0.38   0.13
cexpr-0.6.0                  0.39   0.13
clap-3.1.6                   0.29   0.14
cranelift-codegen-0.82.1     0.34   0.10
diesel-1.4.8                 0.35   0.00
exa-0.10.1                   0.39   0.22
html5ever-0.26.0             0.27   0.15
image-0.24.1                 0.26   0.05
regex-1.5.5                  0.57   0.42
ripgrep-13.0.0               0.14   0.17 *
serde-1.0.136                0.42   0.00
stm32f4-0.14.0               0.20   0.07
syn-1.0.89                   0.38   0.14
ucd                          0.66   0.57
unicode-normalization-0.1.19 0.54   0.00
webrender-2022               0.28   0.12

Opt
                             old    new
bitmaps-3.1.0                0.46   0.00
cargo-0.60.0                 0.22   0.18
cexpr-0.6.0                  0.48   0.19
clap-3.1.6                   0.29   0.19
cranelift-codegen-0.82.1     0.33   0.33
diesel-1.4.8                 0.57   0.00
exa-0.10.1                   0.00   0.00
html5ever-0.26.0             0.26   0.11
image-0.24.1                 0.14   0.16 *
regex-1.5.5                  0.44   0.35
ripgrep-13.0.0               0.27   0.19
serde-1.0.136                0.61   0.00
stm32f4-0.14.0               0.11   0.11
syn-1.0.89                   0.46   0.19
ucd                          0.86   0.39
unicode-normalization-0.1.19 0.72   0.00
webrender-2022               0.26   0.23

This PR implements the new estimator, but the results are worse than I'd hoped.

  • Instruction counts are great! But they're not really relevant for this PR, because of the parallelism in the backend.
  • Walltimes look slightly worse, though not that many benchmarks are affected. If you select "Show non-relevant results" then the results seem better overall, with lots of small improvements.
  • max-rss is a wash.
  • Binary size is a little worse. Not sure why; I would have thought the wins and losses would balance out.

I think the reason why we didn't see bigger wall time improvement is that even with the improved estimate, it only takes one bad estimate to throw the whole thing off, especially if that is a bad underestimate -- you can end up with a single "long pole" CGU that takes much longer than the other CGUs to compile.
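
As a toy illustration of that effect (made-up numbers and a simple greedy scheduler, not the backend's actual scheduling):

```rust
// Greedily schedule per-CGU compile times onto worker threads and report the
// resulting wall time (makespan).
fn makespan(mut times: Vec<f64>, workers: usize) -> f64 {
    times.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let mut loads = vec![0.0f64; workers];
    for t in times {
        // Give each CGU to the currently least-loaded worker.
        let mut best = 0;
        for i in 1..loads.len() {
            if loads[i] < loads[best] {
                best = i;
            }
        }
        loads[best] += t;
    }
    loads.into_iter().fold(0.0, f64::max)
}

fn main() {
    // 16 equal CGUs on 8 workers: wall time is 2 units.
    println!("{}", makespan(vec![1.0; 16], 8));
    // Same total work, but one badly underestimated CGU turned into a 5-unit
    // "long pole": wall time is now 5 units, dominated by that single CGU.
    let mut skewed = vec![11.0 / 15.0; 15];
    skewed.push(5.0);
    println!("{}", makespan(skewed, 8));
}
```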

Anyway, I thought this was worth explaining, and I'd be interested to hear if others think the perf results make this worth merging.

cc @rust-lang/wg-compiler-performance @wesleywiser @mw

@michaelwoerister
Member

Thanks for the explanation, that's very interesting!

So I wrote code to compare the CGU estimates to the actual compile times

Is that end-to-end wall time?

I could also imagine that inline functions play a non-trivial role here:

  • They complicate the logic around merging CGUs, because the same inline function can appear in two CGUs, and when the CGUs are merged those copies collapse into one - i.e. sizeof(CGU1 + CGU2) < sizeof(CGU1) + sizeof(CGU2) (see the sketch after this list).
  • I suspect that the cost of performing the inlining is non-trivial inside of LLVM. But it's also hard to tell beforehand how much inlining will even be done.
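
To illustrate the first point, here is a toy sketch (made-up item sets, with unit cost per item rather than real size estimates) of why the merged size is sub-additive when inline items are shared:

```rust
use std::collections::HashSet;

// A merged CGU keeps only one copy of each item, so shared inline items are
// deduplicated when two CGUs are merged.
fn merged_size(cgu1: &HashSet<&str>, cgu2: &HashSet<&str>) -> usize {
    cgu1.union(cgu2).count()
}

fn main() {
    let cgu1: HashSet<&str> = ["foo", "inline_helper"].into_iter().collect();
    let cgu2: HashSet<&str> = ["bar", "inline_helper"].into_iter().collect();
    // 2 + 2 items, but the merged CGU contains only 3:
    // sizeof(CGU1 + CGU2) < sizeof(CGU1) + sizeof(CGU2).
    assert_eq!(merged_size(&cgu1, &cgu2), 3);
    assert!(merged_size(&cgu1, &cgu2) < cgu1.len() + cgu2.len());
}
```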

It would be interesting to know if the cost estimates for debug builds (which do almost no inlining) are more accurate than for release builds (which do inlining, but probably also other super-linear optimizations).

What I'd really like to see at some point is the effect of throwing some machine learning on the problem. That seems a good fit for dealing with such a fuzzy, statistical problem. It would be a nice research project 🙂

@nnethercote
Contributor Author

Is that end-to-end wall time?

It is walltime for the "gen" phase (generating the LLVM IR) plus the "opt" phase (LLVM doing its work) plus the "lto" phase (opt builds only).

What I'd really like to see at some point is the effect of throwing some machine learning on the problem. That seems a good fit for dealing with such a fuzzy, statistical problem. It would be a nice research project 🙂

Perhaps, though I wonder if overfitting to specific machines would be a problem.

@michaelwoerister
Member

Perhaps, though I wonder if overfitting to specific machines would be a problem.

Yes, that's true. On the other hand, I think we already have some overfitting to the existing benchmarks and benchmark machines going on.

FWIW, the change in this PR looks like an overall improvement to me.

@nnethercote
Contributor Author

I will close this. I have been collecting more MIR stats for others with data science expertise to take a look at. Hopefully that will lead to a better estimation function.

@nnethercote nnethercote deleted the cgu-bb-estimate branch July 23, 2023 22:19