2023-10-18 Triage Log

Overall an interesting week performance-wise, with small improvements to a vast number of benchmarks seeming to outweigh an isolated set of (slightly) larger regressions. It included a number of PRs that regressed instruction counts but did not affect cycle times, plus one mysterious regression to check_match and mir_borrowck from reworking constructor splitting (see report on PR 116391 for details), and an awesome broad set of improvements from automatically inlining small functions across crates (see report on PR 116505 for details).

Triage done by @pnkfelix. Revision range: 84d44dd1..b9832e72

Summary:

(instructions:u) mean range count
Regressions ❌ (primary) 3.0% [0.3%, 12.2%] 7
Regressions ❌ (secondary) 0.7% [0.3%, 1.2%] 15
Improvements ✅ (primary) -1.1% [-17.9%, -0.2%] 131
Improvements ✅ (secondary) -2.4% [-39.6%, -0.2%] 121
All ❌✅ (primary) -0.9% [-17.9%, 12.2%] 138
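
As a rough sanity check on the summary table above, the "All (primary)" row can be approximated as the count-weighted mean of the primary regression and improvement rows. The sketch below assumes that row is a plain mean over all affected primary benchmarks; the helper name `combined_mean` is hypothetical, and because the per-row means are already rounded, the result is only approximate.

```python
def combined_mean(groups):
    """Count-weighted mean over (mean_percent, count) groups.

    Hypothetical helper: assumes the combined row is a plain mean
    over every benchmark in the listed groups.
    """
    total = sum(count for _, count in groups)
    weighted = sum(mean * count for mean, count in groups)
    return weighted / total, total


# Primary rows from the summary table: (mean %, count).
primary = [(3.0, 7), (-1.1, 131)]
mean, count = combined_mean(primary)
print(f"{mean:.1f}% over {count} benchmarks")  # -0.9% over 138 benchmarks
```

The same check can be applied to the per-PR tables in the rest of the report, with the same caveat about rounded inputs.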

4 Regressions, 1 Improvement, 4 Mixed; 3 of them in rollups

84 artifact comparisons made in total

Regressions

Rollup of 7 pull requests #116605 (Comparison Link)

(instructions:u) mean range count
Regressions ❌ (primary) 0.4% [0.2%, 0.6%] 7
Regressions ❌ (secondary) 0.3% [0.3%, 0.4%] 3
Improvements ✅ (primary) - - 0
Improvements ✅ (secondary) - - 0
All ❌✅ (primary) 0.4% [0.2%, 0.6%] 7
  • solely rustdoc regression
  • believed to be caused by PR 109422 "rustdoc-search: add impl disambiguator to duplicate assoc items"
  • already marked as triaged

Optimize librustc_driver.so with BOLT #116352 (Comparison Link)

(instructions:u) mean range count
Regressions ❌ (primary) 2.3% [0.2%, 5.7%] 10
Regressions ❌ (secondary) 1.9% [0.3%, 5.0%] 60
Improvements ✅ (primary) - - 0
Improvements ✅ (secondary) -0.3% [-0.3%, -0.3%] 4
All ❌✅ (primary) 2.3% [0.2%, 5.7%] 10
  • primary instruction-count regressions were restricted to helloworld and html5ever
  • As noted in comment by Kobzol, the instruction counts regressed for many benchmarks, but the cycle counts solely improved, significantly so, and bootstrap time improved (628.052s -> 623.517s (-0.72%)).
  • already marked as triaged
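
The bootstrap-time figure quoted above is a relative change against the earlier measurement. A minimal sketch of that arithmetic, using only the two timings from the report (the function name `percent_change` is an illustrative choice, not part of any tooling):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Bootstrap times in seconds, as quoted in the report.
print(f"{percent_change(628.052, 623.517):.2f}%")  # -0.72%
```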

Rollup of 3 pull requests #116742 (Comparison Link)

(instructions:u) mean range count
Regressions ❌ (primary) 0.3% [0.3%, 0.4%] 3
Regressions ❌ (secondary) - - 0
Improvements ✅ (primary) - - 0
Improvements ✅ (secondary) - - 0
All ❌✅ (primary) 0.3% [0.3%, 0.4%] 3
  • Regressions are solely to bitmaps full scenarios.
  • Looks like a blip (i.e. noise) based on the graph over time.
  • marking as triaged.

don't UB on dangling ptr deref, instead check inbounds on projections #114330 (Comparison Link)

(instructions:u) mean range count
Regressions ❌ (primary) - - 0
Regressions ❌ (secondary) 0.7% [0.5%, 1.0%] 17
Improvements ✅ (primary) - - 0
Improvements ✅ (secondary) - - 0
All ❌✅ (primary) - - 0
  • From skimming the PR, one can see that the PR author (RalfJung) iterated on this to identify a solution that would minimize regressions.
  • As noted by the PR author, only secondary benchmarks were affected.
  • Also, while instruction-counts regressed, the cycle-counts did not, at least not enough to pass our noise threshold.
  • marking as triaged.

Improvements

optimize zipping over array iterators #115515 (Comparison Link)

(instructions:u) mean range count
Regressions ❌ (primary) - - 0
Regressions ❌ (secondary) - - 0
Improvements ✅ (primary) -0.3% [-0.4%, -0.2%] 3
Improvements ✅ (secondary) - - 0
All ❌✅ (primary) -0.3% [-0.4%, -0.2%] 3
  • A small win from a PR addressing a user-filed performance regression, namely issue #115339, "Performance regression of array::IntoIter vs slice::Iter"

Mixed

Also consider call and yield as MIR SSA. #113915 (Comparison Link)

(instructions:u) mean range count
Regressions ❌ (primary) 3.9% [3.9%, 3.9%] 1
Regressions ❌ (secondary) 0.1% [0.1%, 0.1%] 2
Improvements ✅ (primary) -0.4% [-0.9%, -0.2%] 26
Improvements ✅ (secondary) -0.4% [-0.6%, -0.3%] 5
All ❌✅ (primary) -0.2% [-0.9%, 3.9%] 27
  • The try perf run had a sole primary regression of unicode-normalization-0.1.19 opt-full (1.19%), while the perf run against master had a sole primary regression of exa-0.10.1 opt-full (3.90%).
  • The exa regression has persisted forward (i.e. it is not transient noise).
  • It had already been marked as triaged, as the performance changes were deemed a wash, apart from object code sizes, which saw a "small but clear" improvement.

Rollup of 5 pull requests #116640 (Comparison Link)

(instructions:u) mean range count
Regressions ❌ (primary) - - 0
Regressions ❌ (secondary) 1.1% [1.1%, 1.1%] 1
Improvements ✅ (primary) -0.3% [-0.4%, -0.2%] 4
Improvements ✅ (secondary) -0.4% [-0.5%, -0.4%] 6
All ❌✅ (primary) -0.3% [-0.4%, -0.2%] 4
  • sole regression was to secondary benchmark coercions debug incr-patched: add static arr item
  • Looks like a blip (i.e. noise) based on the graph over time.
  • marking as triaged

exhaustiveness: Rework constructor splitting #116391 (Comparison Link)

(instructions:u) mean range count
Regressions ❌ (primary) 0.2% [0.2%, 0.3%] 4
Regressions ❌ (secondary) 3.9% [0.5%, 5.8%] 9
Improvements ✅ (primary) -0.4% [-0.4%, -0.4%] 1
Improvements ✅ (secondary) - - 0
All ❌✅ (primary) 0.1% [-0.4%, 0.3%] 5
  • the primary regressions were to cranelift-codegen-0.82.1 and cargo-0.60.0 in various incremental settings (mostly check builds)
  • the large (>5%) secondary regressions are all to match-stress.
  • the above cases were regressions for instruction-counts, but the cycle-counts didn't get marked as regressed in any of the same cases.
  • in all cases, the performance loss from these regressions was subsequently recovered (or masked) by PR 116505 "Automatically enable cross-crate inlining for small functions". (I don't know if that's actually related or just an awesome change that bought so much performance that it masked this problem).
  • Since the match-stress regression was relatively large, I looked at the self-profile results in the details, which indicate that the change in delta(time) for match-stress might be due to new overheads in check_match and mir_borrowck.
  • But this is strange; I cannot tell how this PR could have affected codegen, which would be the only way I could imagine those functions being impacted.
  • Not marking as triaged for now; this mystery might be worth looking into a bit more. (But then again, the only significant regression was to a secondary stress test, so maybe it's not worth spending time on.)

Automatically enable cross-crate inlining for small functions #116505 (Comparison Link)

(instructions:u) mean range count
Regressions ❌ (primary) 2.3% [0.3%, 13.0%] 8
Regressions ❌ (secondary) 0.5% [0.2%, 0.8%] 2
Improvements ✅ (primary) -1.2% [-18.1%, -0.1%] 148
Improvements ✅ (secondary) -2.2% [-39.8%, -0.2%] 209
All ❌✅ (primary) -1.0% [-18.1%, 13.0%] 156
  • Already marked as triaged
  • This was clearly awesome and amazing (all the more amazing if you review the history)
  • 'Nuff said.

Untriaged Pull Requests