
Nested array initialization is not optimized #58625

Open
upsuper opened this issue Feb 21, 2019 · 7 comments
Labels
A-mir-opt - Area: MIR optimizations
I-slow - Issue: Problems and improvements with respect to performance of generated code.
WG-llvm - Working group: LLVM backend code generation

Comments

upsuper (Contributor) commented Feb 21, 2019

See the following code:

pub fn foo() -> [[i32; 1000]; 1000] {
    [[0; 1000]; 1000]
}

pub fn bar() -> [i32; 1_000_000] {
    [0; 1_000_000]
}

Ideally they should generate the same code, since [[i32; 1000]; 1000] is essentially just [i32; 1_000_000]. However, the first function does one memset to initialize a [i32; 1000] on the stack and then repeatedly memcpys it to build the big array, while the second uses a single memset.

The code generated for the first is far larger than for the second, and I would also expect it to be much slower given that it repeatedly invokes memcpy.
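
For comparison, a formulation that does reduce to a single memset is zero-initializing the whole value at once. The function foo_flat below is a hypothetical illustration added here, not code from the issue; the expectation (under that assumption) is that it lowers to one memset over the 4,000,000-byte return slot, just like bar:

pub fn foo_flat() -> [[i32; 1000]; 1000] {
    // The all-zero bit pattern is a valid value for a nested i32 array,
    // so zeroed() is sound here; the whole return slot can be filled in
    // one pass rather than memset-ing one row and memcpy-ing it 1000 times.
    unsafe { std::mem::zeroed() }
}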

upsuper (Contributor, Author) commented Feb 21, 2019

Similarly, if I do [[unsafe { std::mem::uninitialized() }; 1000]; 1000], the initialization of the inner array is skipped, but the uninitialized array contents are still memcpy'd to initialize the outer array.
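
Spelled out as a full function, the case being described is presumably something like the following (baz is a name introduced here for illustration; std::mem::uninitialized was later deprecated in favour of MaybeUninit):

pub fn baz() -> [[i32; 1000]; 1000] {
    // The inner array is left uninitialized, but the outer repeat expression
    // still copies that uninitialized row 1000 times instead of skipping the
    // initialization of the whole value.
    [[unsafe { std::mem::uninitialized() }; 1000]; 1000]
}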

@Centril Centril added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Feb 21, 2019
@scottmcm scottmcm added the WG-llvm Working group: LLVM backend code generation label Feb 22, 2019
@oli-obk oli-obk self-assigned this Feb 22, 2019
@oli-obk oli-obk removed their assignment Mar 12, 2019
nikic (Contributor) commented Mar 12, 2021

No changes: https://godbolt.org/z/o1xxd1

Also for the uninitialized case: https://godbolt.org/z/EEh9ae
This one is surprising to me.

@nikic nikic self-assigned this Mar 12, 2021
nikic (Contributor) commented Mar 12, 2021

The uninitialized case seems to be due to a bug in the location-aware MSSA clobber walker, which currently bails when the starting access is a phi.

nikic (Contributor) commented Mar 12, 2021

Upstream patch: https://reviews.llvm.org/D98557

nikic (Contributor) commented Mar 13, 2021

Upstream fix: llvm/llvm-project@b2f933a

Assuming no phase ordering issues, we should be generating optimal code for both the zero-initialization and the uninitialized case after this.

@oli-obk oli-obk added the A-mir-opt Area: MIR optimizations label Mar 13, 2021
nikic (Contributor) commented Aug 22, 2021

After #87570 the uninitialized case results in:

define void @_ZN5test23foo17he3396acf5b2d9080E([1000 x [1000 x i32]]* noalias nocapture sret([1000 x [1000 x i32]]) dereferenceable(4000000) %0) unnamed_addr #0 {
start:
  %1 = getelementptr inbounds [1000 x [1000 x i32]], [1000 x [1000 x i32]]* %0, i64 0, i64 0
  %2 = getelementptr inbounds [1000 x [1000 x i32]], [1000 x [1000 x i32]]* %0, i64 0, i64 1000
  br label %repeat_loop_body2

repeat_loop_body2:                                ; preds = %repeat_loop_body2, %start
  %3 = phi [1000 x i32]* [ %1, %start ], [ %4, %repeat_loop_body2 ]
  %4 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 25
  %.not5.24 = icmp eq [1000 x i32]* %4, %2
  br i1 %.not5.24, label %repeat_loop_next3, label %repeat_loop_body2

repeat_loop_next3:                                ; preds = %repeat_loop_body2
  ret void
}

The memcpy is eliminated, but a dead loop is left behind: the loop only becomes dead once MemCpyOpt removes the memcpy, and the LoopDeletion pass runs before MemCpyOpt, so it never gets a chance to clean it up.
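
Once that dead loop is deleted, nothing in the function has any effect, so it should collapse to a plain return. A hand-written sketch of the expected result (not actual compiler output):

define void @_ZN5test23foo17he3396acf5b2d9080E([1000 x [1000 x i32]]* noalias nocapture sret([1000 x [1000 x i32]]) dereferenceable(4000000) %0) unnamed_addr #0 {
start:
  ; The uninitialized case needs no stores at all once the loop is gone.
  ret void
}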

And the zero-initialized results in:

define void @_ZN5test23foo17he3396acf5b2d9080E([1000 x [1000 x i32]]* noalias nocapture sret([1000 x [1000 x i32]]) dereferenceable(4000000) %0) unnamed_addr #0 {
start:
  %1 = getelementptr inbounds [1000 x [1000 x i32]], [1000 x [1000 x i32]]* %0, i64 0, i64 0
  %2 = getelementptr inbounds [1000 x [1000 x i32]], [1000 x [1000 x i32]]* %0, i64 0, i64 1000
  br label %repeat_loop_body

repeat_loop_body:                                 ; preds = %repeat_loop_body, %start
  %3 = phi [1000 x i32]* [ %1, %start ], [ %53, %repeat_loop_body ]
  %4 = bitcast [1000 x i32]* %3 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %4, i8 0, i64 4000, i1 false)
  %5 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 1
  %6 = bitcast [1000 x i32]* %5 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %6, i8 0, i64 4000, i1 false)
  %7 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 2
  %8 = bitcast [1000 x i32]* %7 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %8, i8 0, i64 4000, i1 false)
  %9 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 3
  %10 = bitcast [1000 x i32]* %9 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %10, i8 0, i64 4000, i1 false)
  %11 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 4
  %12 = bitcast [1000 x i32]* %11 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %12, i8 0, i64 4000, i1 false)
  %13 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 5
  %14 = bitcast [1000 x i32]* %13 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %14, i8 0, i64 4000, i1 false)
  %15 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 6
  %16 = bitcast [1000 x i32]* %15 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %16, i8 0, i64 4000, i1 false)
  %17 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 7
  %18 = bitcast [1000 x i32]* %17 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %18, i8 0, i64 4000, i1 false)
  %19 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 8
  %20 = bitcast [1000 x i32]* %19 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %20, i8 0, i64 4000, i1 false)
  %21 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 9
  %22 = bitcast [1000 x i32]* %21 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %22, i8 0, i64 4000, i1 false)
  %23 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 10
  %24 = bitcast [1000 x i32]* %23 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %24, i8 0, i64 4000, i1 false)
  %25 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 11
  %26 = bitcast [1000 x i32]* %25 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %26, i8 0, i64 4000, i1 false)
  %27 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 12
  %28 = bitcast [1000 x i32]* %27 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %28, i8 0, i64 4000, i1 false)
  %29 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 13
  %30 = bitcast [1000 x i32]* %29 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %30, i8 0, i64 4000, i1 false)
  %31 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 14
  %32 = bitcast [1000 x i32]* %31 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %32, i8 0, i64 4000, i1 false)
  %33 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 15
  %34 = bitcast [1000 x i32]* %33 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %34, i8 0, i64 4000, i1 false)
  %35 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 16
  %36 = bitcast [1000 x i32]* %35 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %36, i8 0, i64 4000, i1 false)
  %37 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 17
  %38 = bitcast [1000 x i32]* %37 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %38, i8 0, i64 4000, i1 false)
  %39 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 18
  %40 = bitcast [1000 x i32]* %39 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %40, i8 0, i64 4000, i1 false)
  %41 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 19
  %42 = bitcast [1000 x i32]* %41 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %42, i8 0, i64 4000, i1 false)
  %43 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 20
  %44 = bitcast [1000 x i32]* %43 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %44, i8 0, i64 4000, i1 false)
  %45 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 21
  %46 = bitcast [1000 x i32]* %45 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %46, i8 0, i64 4000, i1 false)
  %47 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 22
  %48 = bitcast [1000 x i32]* %47 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %48, i8 0, i64 4000, i1 false)
  %49 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 23
  %50 = bitcast [1000 x i32]* %49 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %50, i8 0, i64 4000, i1 false)
  %51 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 24
  %52 = bitcast [1000 x i32]* %51 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000) %52, i8 0, i64 4000, i1 false)
  %53 = getelementptr inbounds [1000 x i32], [1000 x i32]* %3, i64 25
  %.not.24 = icmp eq [1000 x i32]* %53, %2
  br i1 %.not.24, label %repeat_loop_next, label %repeat_loop_body

repeat_loop_next:                                 ; preds = %repeat_loop_body
  ret void
}

The memcpy is gone here as well, but we still perform memsets in a loop, instead of one large memset. I believe LoopIdiomRecognize would handle this, but it also runs before MemCpyOpt.
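
For reference, the hoped-for output for the zero-initialized case is a single memset covering all 4,000,000 bytes of the return slot. This is a hand-written sketch in the same style as the dumps above, not actual compiler output:

define void @_ZN5test23foo17he3396acf5b2d9080E([1000 x [1000 x i32]]* noalias nocapture sret([1000 x [1000 x i32]]) dereferenceable(4000000) %0) unnamed_addr #0 {
start:
  ; One memset over the whole array instead of 1000 row-sized memsets in a loop.
  %1 = bitcast [1000 x [1000 x i32]]* %0 to i8*
  call void @llvm.memset.p0i8.i64(i8* noundef nonnull align 4 dereferenceable(4000000) %1, i8 0, i64 4000000, i1 false)
  ret void
}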

nikic (Contributor) commented Aug 30, 2022

The uninit case looks good now: https://godbolt.org/z/vfo9PM1ca
The init case still has memsets that could be combined: https://godbolt.org/z/eba6GT5xq

@nikic nikic removed their assignment Apr 3, 2023