Overall an interesting week performance wise, with small improvements to a vast
number of benchmarks seeming to outweigh an isolated set of (slightly) larger
regressions. It included a number of PRs regressed instruction counts but did
not matter for cycle times, plus one mysterious regression to check_match
and
mir_borrowck
from reworking constructor splitting (see report on PR 116391 for
details), and an awesome broad set of improvements from automatically inlining
small functions across crates (see report on PR 116505 for details).
Triage done by @pnkfelix. Revision range: 84d44dd1..b9832e72
Summary:
(instructions:u) | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) |
3.0% | [0.3%, 12.2%] | 7 |
Regressions ❌ (secondary) |
0.7% | [0.3%, 1.2%] | 15 |
Improvements ✅ (primary) |
-1.1% | [-17.9%, -0.2%] | 131 |
Improvements ✅ (secondary) |
-2.4% | [-39.6%, -0.2%] | 121 |
All ❌✅ (primary) | -0.9% | [-17.9%, 12.2%] | 138 |
4 Regressions, 1 Improvements, 4 Mixed; 3 of them in rollups 84 artifact comparisons made in total
Rollup of 7 pull requests #116605 (Comparison Link)
(instructions:u) | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) |
0.4% | [0.2%, 0.6%] | 7 |
Regressions ❌ (secondary) |
0.3% | [0.3%, 0.4%] | 3 |
Improvements ✅ (primary) |
- | - | 0 |
Improvements ✅ (secondary) |
- | - | 0 |
All ❌✅ (primary) | 0.4% | [0.2%, 0.6%] | 7 |
- solely rustdoc regression
- believed to be caused by PR 109422 "rustdoc-search: add impl disambiguator to duplicate assoc items"
- already marked as triaged
Optimize librustc_driver.so
with BOLT #116352 (Comparison Link)
(instructions:u) | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) |
2.3% | [0.2%, 5.7%] | 10 |
Regressions ❌ (secondary) |
1.9% | [0.3%, 5.0%] | 60 |
Improvements ✅ (primary) |
- | - | 0 |
Improvements ✅ (secondary) |
-0.3% | [-0.3%, -0.3%] | 4 |
All ❌✅ (primary) | 2.3% | [0.2%, 5.7%] | 10 |
- primary instruction-count regressions were restricted to helloworld and html5ever
- As noted in comment by Kobzol, the instruction counts regressed for many benchmarks, but the cycle counts solely improved, significantly so, and bootstrap time improved (628.052s -> 623.517s (-0.72%)).
- already marked as triaged
Rollup of 3 pull requests #116742 (Comparison Link)
(instructions:u) | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) |
0.3% | [0.3%, 0.4%] | 3 |
Regressions ❌ (secondary) |
- | - | 0 |
Improvements ✅ (primary) |
- | - | 0 |
Improvements ✅ (secondary) |
- | - | 0 |
All ❌✅ (primary) | 0.3% | [0.3%, 0.4%] | 3 |
- Regressions are solely to bitmaps full scenarios.
- Looks like a blip (i.e. noise) based on the graph over time.
- marking as triaged.
don't UB on dangling ptr deref, instead check inbounds on projections #114330 (Comparison Link)
(instructions:u) | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) |
- | - | 0 |
Regressions ❌ (secondary) |
0.7% | [0.5%, 1.0%] | 17 |
Improvements ✅ (primary) |
- | - | 0 |
Improvements ✅ (secondary) |
- | - | 0 |
All ❌✅ (primary) | - | - | 0 |
- From skimming the PR, one can see that the PR author (RalfJung) iterated on this to identify a solution that would minimize regressions.
- As noted by the PR author, only secondary benchmarks were affected.
- Also, while instruction-counts regressed, the cycle-counts did not, at least not enough to pass our noise threshold.
- marking as triaged.
optimize zipping over array iterators #115515 (Comparison Link)
(instructions:u) | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) |
- | - | 0 |
Regressions ❌ (secondary) |
- | - | 0 |
Improvements ✅ (primary) |
-0.3% | [-0.4%, -0.2%] | 3 |
Improvements ✅ (secondary) |
- | - | 0 |
All ❌✅ (primary) | -0.3% | [-0.4%, -0.2%] | 3 |
- A small win from a PR addressing user-filed performance regression, namely issue #115339, "Performance regression of array::IntoIter vs slice::Iter"
Also consider call and yield as MIR SSA. #113915 (Comparison Link)
(instructions:u) | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) |
3.9% | [3.9%, 3.9%] | 1 |
Regressions ❌ (secondary) |
0.1% | [0.1%, 0.1%] | 2 |
Improvements ✅ (primary) |
-0.4% | [-0.9%, -0.2%] | 26 |
Improvements ✅ (secondary) |
-0.4% | [-0.6%, -0.3%] | 5 |
All ❌✅ (primary) | -0.2% | [-0.9%, 3.9%] | 27 |
- The try perf run had sole primary regression of unicode-normalization-0.1.19 opt-full (1.19%), while the perf run against master had sole primary regression of exa-0.10.1 opt-full (3.90%).
- The exa regression has persisted forward (i.e. it is not transient noise).
- It was already been marked as triaged, as the performance changes were deemed a wash, apart from object code sizes which saw "small but clear" improvement.
Rollup of 5 pull requests #116640 (Comparison Link)
(instructions:u) | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) |
- | - | 0 |
Regressions ❌ (secondary) |
1.1% | [1.1%, 1.1%] | 1 |
Improvements ✅ (primary) |
-0.3% | [-0.4%, -0.2%] | 4 |
Improvements ✅ (secondary) |
-0.4% | [-0.5%, -0.4%] | 6 |
All ❌✅ (primary) | -0.3% | [-0.4%, -0.2%] | 4 |
- sole regression was to secondary benchmark coercions debug incr-patched: add static arr item
- Looks like a blip (i.e. noise) based on the graph over time.
- marking as triaged
exhaustiveness: Rework constructor splitting #116391 (Comparison Link)
(instructions:u) | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) |
0.2% | [0.2%, 0.3%] | 4 |
Regressions ❌ (secondary) |
3.9% | [0.5%, 5.8%] | 9 |
Improvements ✅ (primary) |
-0.4% | [-0.4%, -0.4%] | 1 |
Improvements ✅ (secondary) |
- | - | 0 |
All ❌✅ (primary) | 0.1% | [-0.4%, 0.3%] | 5 |
- the primary regressions were to cranelift-codegen-0.82.1 and cargo-0.60.0 in various incremental settings (mostly check builds)
- the large (>5%) secondary regressions are all to match-stress.
- the above cases were regressions for instruction-counts, but the cycle-counts didn't get marked as regressed in any of the same cases.
- in all cases, the performance loss from these regressions was subsequently recovered (or masked) by PR 116505 "Automatically enable cross-crate inlining for small functions". (I don't know if that's actually related or just an awesome change that bought so much performance that it masked this problem).
- Since the match-stress one was relatively large, I looked at the
self-profile results in the details
which indicates a change in the delta(time) for match-stress might be due to new overheads in
check_match
andmir_borrowck
. - But this is strange; I cannot tell how this PR could have affected codegen, which would be the only way I could imagine those functions being impacted.
- Not marking as triaged for now; this mystery might be worth looking into a bit more. (But then again, the only significant regression was to a secondary stress test, so maybe its not worth spending time on.)
Automatically enable cross-crate inlining for small functions #116505 (Comparison Link)
(instructions:u) | mean | range | count |
---|---|---|---|
Regressions ❌ (primary) |
2.3% | [0.3%, 13.0%] | 8 |
Regressions ❌ (secondary) |
0.5% | [0.2%, 0.8%] | 2 |
Improvements ✅ (primary) |
-1.2% | [-18.1%, -0.1%] | 148 |
Improvements ✅ (secondary) |
-2.2% | [-39.8%, -0.2%] | 209 |
All ❌✅ (primary) | -1.0% | [-18.1%, 13.0%] | 156 |
- Already marked as triaged
- This was clearly awesome and amazing (all the more amazing if you review the history)
- 'Nuff said.
- #116742 Rollup of 3 pull requests
- #116640 Rollup of 5 pull requests
- #116492 Rollup of 7 pull requests
- #116391 exhaustiveness: Rework constructor splitting
- #116183 Always preserve DebugInfo in DeadStoreElimination.
- #115762 Explain revealing of opaque types in layout_of ParamEnv
- #115751 some inspect improvements
- #115740 Cache reachable_set on disk
- #115252 Represent MIR composite debuginfo as projections instead of aggregates
- #115082 Fix races conditions with
SyntaxContext
decoding - #115025 Make subtyping explicit in MIR
- #114892 Remove conditional use of
Sharded
from query caches - #114481 Rollup of 9 pull requests
- #114459 Do not run ConstProp on mir_for_ctfe.
- #114330 don't UB on dangling ptr deref, instead check inbounds on projections
- #114321 get auto traits for parallel rustc
- #114023 Warn on inductive cycle in coherence leading to impls being considered not overlapping
- #114004 Add
riscv64gc-unknown-hermit
target - #113858 Always const-prop scalars and scalar pairs
- #113758 Turn copy into moves during DSE.
- #113485 Bump version to 1.73
- #113370 Rollup of 8 pull requests
- #113320 Add some extra information to opaque type cycle errors
- #113306 Update debuginfo test runner to provide more useful output
- #113304 Upgrade to indexmap 2.0.0
- #113270 perform TokenStream replacement in-place when possible in expand_macro
- #113057 Rollup of 2 pull requests
- #112963 Stop bubbling out hidden types from the eval obligation queries
- #112882 Rewrite
UnDerefer
- #112420 Rollup of 4 pull requests