Specialize array cloning for Copy types #90755

scottmcm · 2021-11-10T05:45:06Z

After PR 86041, the optimizer no longer merges these loads at the LLVM IR level, which might be part of the perf loss. (I'll run perf to see whether this makes a difference.)

Also I added a codegen test so this hopefully won't regress in future -- it passes on stable and with my change here, but not on the 2021-11-09 nightly.

Example on current nightly: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=1f52d46fb8fc3ca3ac9f097390085ffa

type T = u8;
const N: usize = 3;

pub fn demo_clone(x: &[T; N]) -> [T; N] {
    x.clone()
}

pub fn demo_copy(x: &[T; N]) -> [T; N] {
    *x
}

; playground::demo_clone
; Function Attrs: mustprogress nofree nosync nounwind nonlazybind uwtable willreturn
define i24 @_ZN10playground10demo_clone17h98a4f11453d1a753E([3 x i8]* noalias nocapture readonly align 1 dereferenceable(3) %x) unnamed_addr #0 personality i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {
start:
  %0 = getelementptr [3 x i8], [3 x i8]* %x, i64 0, i64 0
  %1 = getelementptr inbounds [3 x i8], [3 x i8]* %x, i64 0, i64 1
  %.val.i.i.i.i.i.i.i.i.i = load i8, i8* %0, align 1, !alias.scope !2, !noalias !9
  %2 = getelementptr inbounds [3 x i8], [3 x i8]* %x, i64 0, i64 2
  %.val.i.i.i.i.i.1.i.i.i.i = load i8, i8* %1, align 1, !alias.scope !2, !noalias !20
  %.val.i.i.i.i.i.2.i.i.i.i = load i8, i8* %2, align 1, !alias.scope !2, !noalias !23
  %array.sroa.6.0.insert.ext.i.i.i.i = zext i8 %.val.i.i.i.i.i.2.i.i.i.i to i32
  %array.sroa.6.0.insert.shift.i.i.i.i = shl nuw nsw i32 %array.sroa.6.0.insert.ext.i.i.i.i, 16
  %array.sroa.5.0.insert.ext.i.i.i.i = zext i8 %.val.i.i.i.i.i.1.i.i.i.i to i32
  %array.sroa.5.0.insert.shift.i.i.i.i = shl nuw nsw i32 %array.sroa.5.0.insert.ext.i.i.i.i, 8
  %array.sroa.0.0.insert.ext.i.i.i.i = zext i8 %.val.i.i.i.i.i.i.i.i.i to i32
  %array.sroa.5.0.insert.insert.i.i.i.i = or i32 %array.sroa.5.0.insert.shift.i.i.i.i, %array.sroa.0.0.insert.ext.i.i.i.i
  %array.sroa.0.0.insert.insert.i.i.i.i = or i32 %array.sroa.5.0.insert.insert.i.i.i.i, %array.sroa.6.0.insert.shift.i.i.i.i
  %.sroa.4.0.extract.trunc.i.i.i.i = trunc i32 %array.sroa.0.0.insert.insert.i.i.i.i to i24
  ret i24 %.sroa.4.0.extract.trunc.i.i.i.i
}

; playground::demo_copy
; Function Attrs: mustprogress nofree norecurse nosync nounwind nonlazybind readonly uwtable willreturn
define i24 @_ZN10playground9demo_copy17h7817453f9291d746E([3 x i8]* noalias nocapture readonly align 1 dereferenceable(3) %x) unnamed_addr #1 {
start:
  %.sroa.0.0..sroa_cast = bitcast [3 x i8]* %x to i24*
  %.sroa.0.0.copyload = load i24, i24* %.sroa.0.0..sroa_cast, align 1
  ret i24 %.sroa.0.0.copyload
}
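The two functions above differ only in how the array is duplicated: element by element through `Clone`, or as a single copy of the whole value. A minimal sketch of that distinction follows. The function names `clone_elementwise` and `clone_bitwise` are hypothetical and are not the code in this PR. Real specialization inside `core` relies on unstable features to pick the `Copy` path automatically, whereas this stable sketch makes the caller choose.

```rust
// Hypothetical helper names, for illustration only; not the PR's implementation.

/// General path: works for any `T: Clone` by cloning each element in turn.
pub fn clone_elementwise<T: Clone, const N: usize>(array: &[T; N]) -> [T; N] {
    // `std::array::from_fn` builds the array by calling the closure for each index.
    std::array::from_fn(|i| array[i].clone())
}

/// `Copy` path: the whole array is itself `Copy`, so a dereference duplicates it.
pub fn clone_bitwise<T: Copy, const N: usize>(array: &[T; N]) -> [T; N] {
    *array
}
```

For a `Copy` element type such as `u8`, both helpers return the same array; only `clone_elementwise` accepts a non-`Copy` element type such as `String`.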

rust-highfive · 2021-11-10T05:45:09Z

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

scottmcm · 2021-11-10T06:01:49Z

@bors try @rust-timer queue

rust-timer · 2021-11-10T06:01:50Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-11-10T06:01:57Z

⌛ Trying commit cc7d801 with merge 87df9f91778c7252dc2e7eddbe858af73d6d444c...

bors · 2021-11-10T07:38:54Z

☀️ Try build successful - checks-actions
Build commit: 87df9f91778c7252dc2e7eddbe858af73d6d444c (87df9f91778c7252dc2e7eddbe858af73d6d444c)

rust-timer · 2021-11-10T07:38:56Z

Queued 87df9f91778c7252dc2e7eddbe858af73d6d444c with parent 8b09ba6, future comparison URL.

rust-timer · 2021-11-10T09:57:56Z

Finished benchmarking commit (87df9f91778c7252dc2e7eddbe858af73d6d444c): comparison url.

Summary: This change led to moderate relevant mixed results 🤷 in compiler performance.

Moderate improvement in instruction counts (up to -1.1% on full builds of cranelift-codegen)
Small regression in instruction counts (up to 0.7% on incr-unchanged builds of wf-projection-stress-65510)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

scottmcm · 2021-11-10T10:15:15Z

Hmm, this improvement in full-opt cranelift-codegen recovers what was lost in #86041 (comment), but it's overall mixed. Dunno how people might feel about that.

jackh726 · 2021-11-10T12:04:13Z

Maybe try adding #[inline] annotations

library/core/src/array/mod.rs

scottmcm · 2021-11-10T19:57:55Z

@bors try @rust-timer queue

rust-timer · 2021-11-10T19:57:56Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-11-10T19:58:02Z

⌛ Trying commit 5b115fc with merge 619d3f7524949f70494dc855c8252f8bd77376d2...

bors · 2021-11-10T21:48:25Z

☀️ Try build successful - checks-actions
Build commit: 619d3f7524949f70494dc855c8252f8bd77376d2 (619d3f7524949f70494dc855c8252f8bd77376d2)

rust-timer · 2021-11-10T21:48:27Z

Queued 619d3f7524949f70494dc855c8252f8bd77376d2 with parent 68ca579, future comparison URL.

rust-timer · 2021-11-10T23:37:56Z

Finished benchmarking commit (619d3f7524949f70494dc855c8252f8bd77376d2): comparison url.

Summary: This change led to moderate relevant mixed results 🤷 in compiler performance.

Moderate improvement in instruction counts (up to -1.0% on full builds of cranelift-codegen)
Small regression in instruction counts (up to 0.7% on incr-unchanged builds of wf-projection-stress-65510)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR led to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

jackh726 · 2021-11-10T23:42:33Z

Okay so no difference with or without inline annotation.

r=me, whether you remove those or not

scottmcm · 2021-11-10T23:54:24Z

The method on impl Clone has the #[inline]s, so I might as well leave them on the things it calls in this PR.

@bors r=jackh726

bors · 2021-11-10T23:54:25Z

📌 Commit 5b115fc has been approved by jackh726

bors · 2021-11-11T09:13:25Z

⌛ Testing commit 5b115fc with merge 62efba8...

bors · 2021-11-11T12:07:47Z

☀️ Test successful - checks-actions
Approved by: jackh726
Pushing 62efba8 to master...

jackh726 · 2021-11-11T12:35:02Z

Targeted perf fix with mostly wins and a few small regressions.

rust-timer · 2021-11-11T14:49:00Z

Finished benchmarking commit (62efba8): comparison url.

Summary: This change led to small relevant mixed results 🤷 in compiler performance.

Small improvement in instruction counts (up to -0.9% on full builds of cranelift-codegen)
Small regression in instruction counts (up to 0.7% on incr-unchanged builds of wf-projection-stress-65510)

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression

rust-highfive assigned m-ou-se Nov 10, 2021

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 10, 2021

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 10, 2021

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Nov 10, 2021

the8472 reviewed Nov 10, 2021

library/core/src/array/mod.rs

Moar #[inline]

5b115fc

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 10, 2021

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 10, 2021

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 10, 2021

bors added the merged-by-bors This PR was explicitly merged by bors. label Nov 11, 2021

bors merged commit 62efba8 into rust-lang:master Nov 11, 2021

rustbot added this to the 1.58.0 milestone Nov 11, 2021

jackh726 added the perf-regression-triaged The performance regression has been triaged. label Nov 11, 2021

scottmcm deleted the spec-array-clone branch November 13, 2021 23:49
