Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BOLT] Use CDSort and CDSplit #119418

Merged
merged 1 commit into from
Mar 16, 2024
Merged

[BOLT] Use CDSort and CDSplit #119418

merged 1 commit into from
Mar 16, 2024

Conversation

aaupov
Copy link
Contributor

@aaupov aaupov commented Dec 29, 2023

CDSort and CDSplit are the most recent versions of function ordering and function splitting algorithms with some improvements over the previous baseline (ext-tsp and two-way splitting).

CDSort and CDSplit are the most recent versions of function ordering and function splitting algorithms with some improvements over the previous baseline (ext-tsp and two-way splitting).
@rustbot
Copy link
Collaborator

rustbot commented Dec 29, 2023

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Mark-Simulacrum (or someone else) soon.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 29, 2023
@aaupov
Copy link
Contributor Author

aaupov commented Dec 29, 2023

CC @Kobzol

@Kobzol
Copy link
Contributor

Kobzol commented Dec 29, 2023

Hi! I already tried cdsort, it provided no performance benefit. I haven't tried it with .arg("-split-strategy=cdsplit") though.

I think that we'd need to bump our LLVM version for this, as it's 17.0.4 currently, and that doesn't know about cdsort, AFAIK.

@aaupov
Copy link
Contributor Author

aaupov commented Dec 29, 2023

Hi! I already tried cdsort, it provided no performance benefit.

Interesting, do you have a link with perf stats? We observed some improvements on large binaries in icache and itlb misses.

I haven't tried it with .arg("-split-strategy=cdsplit") though.

CDSplit is separate from cdsort and could be beneficial with hfsort+.

I think that we'd need to bump our LLVM version for this, as it's 17.0.4 currently, and that doesn't know about cdsort, AFAIK.

Sorry about not checking the LLVM version used. CDSplit is also very recent and was only upstreamed a month ago. If you don't use trunk BOLT, it will probably be in the next release.

@Kobzol
Copy link
Contributor

Kobzol commented Dec 29, 2023

I think that we have talked about this on Discord 😅

The previous attempt was this commit, the results are here.

There were a few cycles wins, nothing too big though. We could switch to cdsort once we bump LLVM to 18.

@Kobzol
Copy link
Contributor

Kobzol commented Dec 29, 2023

Looking at the results again, there were some nice max RSS improvements, and bootstrap time was also improved by 5s, which isn't trivial. Maybe I was too harsh to cdsort, it looks like an actual improvement.

@Kobzol
Copy link
Contributor

Kobzol commented Dec 29, 2023

I would suggest to first try only .arg("-split-strategy=cdsplit") in this PR, I can start a perf. run for that (assuming that we don't need LLVM 18 for it).

@aaupov
Copy link
Contributor Author

aaupov commented Dec 29, 2023

I would suggest to first try only .arg("-split-strategy=cdsplit") in this PR, I can start a perf. run for that (assuming that we don't need LLVM 18 for it).

We do need trunk (or LLVM 18 rc) BOLT for CDSplit.

@Kobzol
Copy link
Contributor

Kobzol commented Dec 29, 2023

I would suggest waiting for LLVM 18 RC, and then we can try both options.

@Mark-Simulacrum
Copy link
Member

r? @Kobzol

@rustbot rustbot assigned Kobzol and unassigned Mark-Simulacrum Dec 31, 2023
@Kobzol
Copy link
Contributor

Kobzol commented Feb 5, 2024

LLVM 18-rc1 has been tagged, so perhaps we can try it. The change needs to be made here.

@mati865
Copy link
Contributor

mati865 commented Feb 5, 2024

I think it'd also require PassWrapper changes from nikic@be8c330

@Kobzol
Copy link
Contributor

Kobzol commented Mar 10, 2024

@aaupov We now have LLVM 18 as the host compiler on CI. I tried the new CDSort/CDSpltit flags here and they provide nice max RSS reductions and also some cycle wins.

Could you please rebase so that I can start another perf. run?

@Kobzol
Copy link
Contributor

Kobzol commented Mar 16, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 16, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 16, 2024
[BOLT] Use CDSort and CDSplit

CDSort and CDSplit are the most recent versions of function ordering and function splitting algorithms with some improvements over the previous baseline (ext-tsp and two-way splitting).
@bors
Copy link
Contributor

bors commented Mar 16, 2024

⌛ Trying commit 13ef6d4 with merge 4bd76ed...

@bors
Copy link
Contributor

bors commented Mar 16, 2024

☀️ Try build successful - checks-actions
Build commit: 4bd76ed (4bd76edee1242728ed5e9f1e80e056678b9fc3fb)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4bd76ed): comparison URL.

Overall result: ✅ improvements - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.1% [-0.1%, -0.1%] 1
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-3.0% [-7.2%, -1.2%] 55
Improvements ✅
(secondary)
-3.3% [-7.8%, -0.5%] 114
All ❌✅ (primary) -3.0% [-7.2%, -1.2%] 55

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 670.527s -> 667.725s (-0.42%)
Artifact size: 311.59 MiB -> 312.70 MiB (0.36%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 16, 2024
@Kobzol
Copy link
Contributor

Kobzol commented Mar 16, 2024

A slight LLVM/rustc artifact size increase, but also very nice Max-RSS improvements. Thank you! Let us know if you have any other useful BOLT flags up your sleeve :)

@bors r+

@bors
Copy link
Contributor

bors commented Mar 16, 2024

📌 Commit 13ef6d4 has been approved by Kobzol

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 16, 2024
@bors
Copy link
Contributor

bors commented Mar 16, 2024

⌛ Testing commit 13ef6d4 with merge 9023f90...

@bors
Copy link
Contributor

bors commented Mar 16, 2024

☀️ Test successful - checks-actions
Approved by: Kobzol
Pushing 9023f90 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 16, 2024
@bors bors merged commit 9023f90 into rust-lang:master Mar 16, 2024
12 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Mar 16, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (9023f90): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-2.5% [-4.6%, -0.8%] 37
Improvements ✅
(secondary)
-3.9% [-8.1%, -0.5%] 39
All ❌✅ (primary) -2.5% [-4.6%, -0.8%] 37

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-0.8% [-0.8%, -0.8%] 1
Improvements ✅
(secondary)
-0.9% [-0.9%, -0.9%] 1
All ❌✅ (primary) -0.8% [-0.8%, -0.8%] 1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 671.684s -> 667.951s (-0.56%)
Artifact size: 311.57 MiB -> 312.73 MiB (0.37%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants