saethlin
Member

@saethlin saethlin commented Aug 27, 2025

I noticed in a side project that a function which just compares two [u64; 2] values for equality is not cross-crate-inlinable. That surprised me because I didn't think the code contained a function call, but of course our array comparisons are lowered to an intrinsic. An intrinsic call doesn't stop a function from being a leaf, so it makes sense to add this as an exception to the "only leaves" cross-crate-inline heuristic.
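For illustration, the kind of function described above might look like the following minimal sketch. The names `Key` and `keys_match` are hypothetical and are not taken from the side project.

```rust
/// Hypothetical 128-bit key stored as two 64-bit words.
pub type Key = [u64; 2];

/// Compares two keys for equality. The source contains no explicit
/// function call; per the description above, it is the array `==`
/// that ends up lowered to an intrinsic call.
pub fn keys_match(a: &Key, b: &Key) -> bool {
    a == b
}
```

Calling `keys_match(&[1, 2], &[1, 2])` returns `true`, and any differing word makes it return `false`.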

This is the useful compare link: https://perf.rust-lang.org/compare.html?start=7cb1a81145a739c4fd858abe3c624ce8e6e5f9cd&end=c3f0a64dbf9fba4722dacf8e39d2fe00069c995e&stat=instructions%3Au. It is useful because it disables CGU merging in both commits, so effects where changes in the sysroot perturb partitioning downstream are excluded. Perturbations to what is and isn't cross-crate-inlinable in the sysroot have chaotic effects on which items end up in which CGUs after merging. It looks like, before this PR, some of the CGUs dirtied by the patch in eza incr-unchanged happened by sheer luck to be merged together, and with this PR they are not.

The perf runs on this PR point to a nice runtime performance improvement.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Aug 27, 2025
@saethlin
Member Author

@bors try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Aug 27, 2025
Ignore intrinsic calls in cross-crate-inlining cost model
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 27, 2025

Comment on lines 139 to 140
    if let Some((fn_def_id, _)) = func.const_fn_def() {
        if self.tcx.intrinsic(fn_def_id).is_some() {
Member

Nit: this would benefit from combining into one if using either let-chaining or is_some_and.
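
As a rough sketch of what this nit suggests, the two nested checks can be folded into one expression with `Option::is_some_and`. Everything below other than the method names `const_fn_def` and `intrinsic` is a hypothetical stand-in: `DefId`, `Tcx`, and `Operand` are toy types rather than the compiler's own, and returning a `bool` is a placeholder for whatever the real code does inside the inner `if`.

```rust
/// Hypothetical stand-in for a definition id.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct DefId(pub u32);

/// Hypothetical stand-in for the type context; it only records which ids are intrinsics.
pub struct Tcx {
    pub intrinsics: Vec<DefId>,
}

impl Tcx {
    /// Mirrors the shape of the `intrinsic` call in the reviewed lines: `Some` for intrinsics.
    pub fn intrinsic(&self, def_id: DefId) -> Option<&'static str> {
        self.intrinsics.contains(&def_id).then_some("intrinsic")
    }
}

/// Hypothetical stand-in for the callee operand of a call.
pub struct Operand {
    pub callee: Option<DefId>,
}

impl Operand {
    /// Mirrors the shape of `const_fn_def`; the second tuple field is unused here.
    pub fn const_fn_def(&self) -> Option<(DefId, ())> {
        self.callee.map(|id| (id, ()))
    }
}

/// Nested form, shaped like the reviewed lines.
pub fn is_intrinsic_call_nested(tcx: &Tcx, func: &Operand) -> bool {
    if let Some((fn_def_id, _)) = func.const_fn_def() {
        if tcx.intrinsic(fn_def_id).is_some() {
            return true;
        }
    }
    false
}

/// Combined form using `Option::is_some_and`, one of the two options in the nit.
pub fn is_intrinsic_call_combined(tcx: &Tcx, func: &Operand) -> bool {
    func.const_fn_def()
        .is_some_and(|(fn_def_id, _)| tcx.intrinsic(fn_def_id).is_some())
}
```

The let-chain alternative would put both conditions in a single `if let ... && ...` header. It is left out of the sketch because let chains need a recent compiler and edition, whereas `is_some_and` works on older stable toolchains.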

@rust-bors

rust-bors bot commented Aug 27, 2025

☀️ Try build successful (CI)
Build commit: e8d1f9d (e8d1f9d5716f4389b8330b02fb30ec690c68624a, parent: 160e7623e8cbbf1feab2b6e2a24733a98c7bde9c)

@rust-timer
Collaborator

Finished benchmarking commit (e8d1f9d): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.1% | [0.4%, 2.6%] | 8 |
| Regressions ❌ (secondary) | 0.3% | [0.1%, 0.5%] | 4 |
| Improvements ✅ (primary) | -0.5% | [-0.7%, -0.3%] | 4 |
| Improvements ✅ (secondary) | -0.3% | [-0.6%, -0.0%] | 15 |
| All ❌✅ (primary) | 0.6% | [-0.7%, 2.6%] | 12 |

Max RSS (memory usage)

Results (primary -1.1%, secondary -3.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.0% | [1.0%, 1.0%] | 1 |
| Regressions ❌ (secondary) | 3.0% | [0.8%, 5.1%] | 2 |
| Improvements ✅ (primary) | -2.2% | [-3.6%, -0.8%] | 2 |
| Improvements ✅ (secondary) | -4.0% | [-5.9%, -1.8%] | 13 |
| All ❌✅ (primary) | -1.1% | [-3.6%, 1.0%] | 3 |

Cycles

Results (primary 0.8%, secondary 0.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.7% | [2.1%, 3.2%] | 2 |
| Regressions ❌ (secondary) | 3.0% | [2.1%, 4.0%] | 4 |
| Improvements ✅ (primary) | -2.9% | [-2.9%, -2.9%] | 1 |
| Improvements ✅ (secondary) | -4.2% | [-6.0%, -2.5%] | 2 |
| All ❌✅ (primary) | 0.8% | [-2.9%, 3.2%] | 3 |

Binary size

Results (primary 0.1%, secondary -0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 1.0%] | 32 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.3%] | 10 |
| Improvements ✅ (primary) | -0.1% | [-0.2%, -0.0%] | 14 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.1%] | 37 |
| All ❌✅ (primary) | 0.1% | [-0.2%, 1.0%] | 46 |

Bootstrap: 466.645s -> 466.461s (-0.04%)
Artifact size: 391.15 MiB -> 391.41 MiB (0.07%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Aug 27, 2025
@saethlin
Member Author

@bors try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Aug 27, 2025
Ignore intrinsic calls in cross-crate-inlining cost model
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 27, 2025

@saethlin
Member Author

@bors try cancel

@rust-bors

rust-bors bot commented Aug 27, 2025

Try build cancelled. Cancelled workflows:

@saethlin
Member Author

@bors try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Aug 27, 2025
Ignore intrinsic calls in cross-crate-inlining cost model

@rust-bors

rust-bors bot commented Aug 27, 2025

☀️ Try build successful (CI)
Build commit: 0f272e5 (0f272e5b0ae53eac2844ed412fcedfbe9ecf3a9d, parent: 3c91be712d3d84f6345cd22eae34c47b3a22a3d3)

@rust-timer
Collaborator

Finished benchmarking commit (0f272e5): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.1% | [0.1%, 2.8%] | 9 |
| Regressions ❌ (secondary) | 1.8% | [0.1%, 3.0%] | 10 |
| Improvements ✅ (primary) | -0.5% | [-0.6%, -0.3%] | 4 |
| Improvements ✅ (secondary) | -0.3% | [-0.6%, -0.1%] | 14 |
| All ❌✅ (primary) | 0.6% | [-0.6%, 2.8%] | 13 |

Max RSS (memory usage)

Results (primary -0.0%, secondary -2.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 3.9% | [1.6%, 6.2%] | 2 |
| Regressions ❌ (secondary) | 4.2% | [3.3%, 5.1%] | 3 |
| Improvements ✅ (primary) | -3.9% | [-4.7%, -3.1%] | 2 |
| Improvements ✅ (secondary) | -4.1% | [-6.7%, -1.7%] | 13 |
| All ❌✅ (primary) | -0.0% | [-4.7%, 6.2%] | 4 |

Cycles

Results (primary 2.5%, secondary 0.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.5% | [2.3%, 2.7%] | 2 |
| Regressions ❌ (secondary) | 3.1% | [2.1%, 4.3%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -3.6% | [-5.5%, -2.0%] | 4 |
| All ❌✅ (primary) | 2.5% | [2.3%, 2.7%] | 2 |

Binary size

Results (primary 0.1%, secondary -0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 1.1%] | 32 |
| Regressions ❌ (secondary) | 0.2% | [0.1%, 0.3%] | 10 |
| Improvements ✅ (primary) | -0.1% | [-0.2%, -0.0%] | 15 |
| Improvements ✅ (secondary) | -0.2% | [-0.2%, -0.1%] | 37 |
| All ❌✅ (primary) | 0.1% | [-0.2%, 1.1%] | 47 |

Bootstrap: 468.329s -> 467.725s (-0.13%)
Artifact size: 391.15 MiB -> 391.41 MiB (0.07%)

@rust-timer
Collaborator

Finished benchmarking commit (c3f0a64): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.1% | [0.1%, 4.0%] | 25 |
| Regressions ❌ (secondary) | 12.8% | [0.1%, 59.7%] | 12 |
| Improvements ✅ (primary) | -5.7% | [-84.8%, -0.1%] | 19 |
| Improvements ✅ (secondary) | -0.3% | [-0.7%, -0.0%] | 31 |
| All ❌✅ (primary) | -1.8% | [-84.8%, 4.0%] | 44 |

Max RSS (memory usage)

Results (primary -2.0%, secondary -0.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.3% | [1.5%, 7.2%] | 2 |
| Regressions ❌ (secondary) | 2.0% | [2.0%, 2.0%] | 2 |
| Improvements ✅ (primary) | -8.3% | [-12.7%, -3.9%] | 2 |
| Improvements ✅ (secondary) | -3.2% | [-5.1%, -1.3%] | 2 |
| All ❌✅ (primary) | -2.0% | [-12.7%, 7.2%] | 4 |

Cycles

Results (primary -6.7%, secondary 12.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.7% | [1.6%, 3.8%] | 6 |
| Regressions ❌ (secondary) | 20.1% | [4.4%, 48.0%] | 7 |
| Improvements ✅ (primary) | -16.1% | [-84.4%, -1.5%] | 6 |
| Improvements ✅ (secondary) | -3.8% | [-5.0%, -2.0%] | 3 |
| All ❌✅ (primary) | -6.7% | [-84.4%, 3.8%] | 12 |

Binary size

Results (primary 1.3%, secondary 0.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.6% | [0.0%, 15.8%] | 60 |
| Regressions ❌ (secondary) | 1.5% | [0.1%, 3.8%] | 16 |
| Improvements ✅ (primary) | -0.1% | [-0.2%, -0.0%] | 17 |
| Improvements ✅ (secondary) | -0.1% | [-0.2%, -0.1%] | 37 |
| All ❌✅ (primary) | 1.3% | [-0.2%, 15.8%] | 77 |

Bootstrap: 465.874s -> 466.593s (0.15%)
Artifact size: 388.34 MiB -> 388.32 MiB (-0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 3, 2025
@saethlin
Member Author

saethlin commented Sep 3, 2025

This is the useful compare link: https://perf.rust-lang.org/compare.html?start=7cb1a81145a739c4fd858abe3c624ce8e6e5f9cd&end=c3f0a64dbf9fba4722dacf8e39d2fe00069c995e&stat=instructions%3Au

It is useful because it disables CGU merging in both commits, so effects where changes in the sysroot perturb partitioning downstream are excluded.

@saethlin saethlin force-pushed the ignore-intrinsic-calls branch from 5dc2b2e to 53bb74b Compare September 6, 2025 00:34
@saethlin saethlin force-pushed the ignore-intrinsic-calls branch from 53bb74b to ab91a63 Compare September 6, 2025 00:44
@saethlin
Member Author

saethlin commented Sep 6, 2025

@Kobzol I figured out this case, see the updated PR description

@saethlin saethlin marked this pull request as ready for review September 6, 2025 00:45
@rustbot
Collaborator

rustbot commented Sep 6, 2025

r? @jieyouxu

rustbot has assigned @jieyouxu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 6, 2025
@rustbot
Collaborator

rustbot commented Sep 6, 2025

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

@cjgillot
Contributor

cjgillot commented Sep 7, 2025

Thanks a lot for working on this. I wonder how much we should extend this to simple wrapper functions that do almost nothing except calling another function.
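
To make the "simple wrapper" case concrete, here is a minimal sketch. The functions `checksum` and `checksum_str` are hypothetical examples, not code from the compiler or from this PR.

```rust
/// Hypothetical function that does the real work.
pub fn checksum(bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.rotate_left(5) ^ u32::from(b))
}

/// A simple wrapper: it does almost nothing except call `checksum`.
/// Because its body contains a call to a non-intrinsic function, it is
/// not a leaf, so an "only leaves" heuristic would not pick it up.
pub fn checksum_str(s: &str) -> u32 {
    checksum(s.as_bytes())
}
```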

For perf triage: the perf run results in CGU shuffling, with wild changes in perf. Without this CGU effect, this PR is a net improvement.

@bors r+

@bors
Collaborator

bors commented Sep 7, 2025

📌 Commit ab91a63 has been approved by cjgillot

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 7, 2025
@cjgillot cjgillot added the perf-regression-triaged The performance regression has been triaged. label Sep 7, 2025
@saethlin
Member Author

saethlin commented Sep 7, 2025

> I wonder how much we should extend this to simple wrapper functions that do almost nothing except calling another function.

I tried that before in #116898, and at a glance it has the same CGU shuffling problem and needs the same comparison trick I used here. Also that PR is 2 years old so the perf might be completely different now.

@jieyouxu jieyouxu assigned cjgillot and unassigned jieyouxu Sep 7, 2025
@bors
Collaborator

bors commented Sep 8, 2025

⌛ Testing commit ab91a63 with merge a09fbe2...

@bors
Collaborator

bors commented Sep 8, 2025

☀️ Test successful - checks-actions
Approved by: cjgillot
Pushing a09fbe2 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 8, 2025
@bors bors merged commit a09fbe2 into rust-lang:master Sep 8, 2025
11 checks passed
@rustbot rustbot added this to the 1.91.0 milestone Sep 8, 2025
Contributor

github-actions bot commented Sep 8, 2025

What is this? This is an experimental post-merge analysis report that shows differences in test outcomes between the merged PR and its parent PR.

Comparing 2f3f27b (parent) -> a09fbe2 (this PR)

Test differences

1 doctest diff was found. Doctest diffs are ignored, as they are noisy.

Test dashboard

Run

cargo run --manifest-path src/ci/citool/Cargo.toml -- \
    test-dashboard a09fbe2c8372643a27a8082236120f95ed4e6bba --output-dir test-dashboard

And then open test-dashboard/index.html in your browser to see an overview of all executed tests.

Job duration changes

  1. pr-check-1: 1348.2s -> 1538.3s (14.1%)
  2. x86_64-gnu-tools: 3372.4s -> 3747.5s (11.1%)
  3. dist-aarch64-apple: 6537.0s -> 7224.9s (10.5%)
  4. x86_64-gnu-llvm-19-2: 6619.3s -> 5980.8s (-9.6%)
  5. pr-check-2: 2256.5s -> 2440.5s (8.2%)
  6. x86_64-gnu-llvm-19: 2528.2s -> 2728.7s (7.9%)
  7. aarch64-msvc-1: 6585.9s -> 7091.6s (7.7%)
  8. x86_64-rust-for-linux: 3032.3s -> 2810.3s (-7.3%)
  9. aarch64-apple: 5716.5s -> 6132.0s (7.3%)
  10. aarch64-msvc-2: 4939.7s -> 5244.2s (6.2%)
How to interpret the job duration changes?

Job durations can vary a lot, based on the actual runner instance
that executed the job, system noise, invalidated caches, etc. The table above is provided
mostly for t-infra members, for simpler debugging of potential CI slow-downs.

@saethlin saethlin deleted the ignore-intrinsic-calls branch September 8, 2025 06:33
@rust-timer
Collaborator

Finished benchmarking commit (a09fbe2): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Our benchmarks found a performance regression caused by this PR.
This might be an actual regression, but it can also be just noise.

Next Steps:

  • If the regression was expected or you think it can be justified,
    please write a comment with sufficient written justification, and add
    @rustbot label: +perf-regression-triaged to it, to mark the regression as triaged.
  • If you think that you know of a way to resolve the regression, try to create
    a new PR with a fix for the regression.
  • If you do not understand the regression or you think that it is just noise,
    you can ask the @rust-lang/wg-compiler-performance working group for help (members of this group
    were already notified of this PR).

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.0% | [0.3%, 2.4%] | 10 |
| Regressions ❌ (secondary) | 1.9% | [0.2%, 3.0%] | 9 |
| Improvements ✅ (primary) | -0.5% | [-0.7%, -0.4%] | 5 |
| Improvements ✅ (secondary) | -0.4% | [-0.6%, -0.1%] | 14 |
| All ❌✅ (primary) | 0.5% | [-0.7%, 2.4%] | 15 |

Max RSS (memory usage)

Results (primary 2.1%, secondary 3.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 7.8% | [7.8%, 7.8%] | 1 |
| Regressions ❌ (secondary) | 4.0% | [1.5%, 6.8%] | 11 |
| Improvements ✅ (primary) | -3.5% | [-3.5%, -3.5%] | 1 |
| Improvements ✅ (secondary) | -2.3% | [-2.8%, -1.8%] | 2 |
| All ❌✅ (primary) | 2.1% | [-3.5%, 7.8%] | 2 |

Cycles

Results (primary 1.6%, secondary 2.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.6% | [1.3%, 2.0%] | 3 |
| Regressions ❌ (secondary) | 3.9% | [2.7%, 4.8%] | 6 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -4.2% | [-4.2%, -4.2%] | 1 |
| All ❌✅ (primary) | 1.6% | [1.3%, 2.0%] | 3 |

Binary size

Results (primary 0.0%, secondary -0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.0%, 0.9%] | 27 |
| Regressions ❌ (secondary) | 0.1% | [0.1%, 0.3%] | 10 |
| Improvements ✅ (primary) | -0.1% | [-0.2%, -0.0%] | 25 |
| Improvements ✅ (secondary) | -0.1% | [-0.2%, -0.1%] | 40 |
| All ❌✅ (primary) | 0.0% | [-0.2%, 0.9%] | 52 |

Bootstrap: 468.032s -> 468.082s (0.01%)
Artifact size: 387.45 MiB -> 387.72 MiB (0.07%)

Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.