Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary instructions generated #75978

Closed
bugadani opened this issue Aug 27, 2020 · 9 comments · Fixed by #109895
Closed

Unnecessary instructions generated #75978

bugadani opened this issue Aug 27, 2020 · 9 comments · Fixed by #109895
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@bugadani
Copy link
Contributor

bugadani commented Aug 27, 2020

I'm trying to figure out why certain iterator-based algorithms get const-optimized, while others don't and I stumbled upon this case.

example (updated 2021.03.05)

Compiling fn1 results in the expected machine code, but fn2 has extra stack reserve/free instructions that are not necessary, since the function's contents are optimized away completely. The stack size equals the size of the array.

@tesuji
Copy link
Contributor

tesuji commented Aug 28, 2020

Two observations:

@nikic
Copy link
Contributor

nikic commented Sep 20, 2020

This would very likely be fixed by https://reviews.llvm.org/D87972.

@camelid camelid added the A-mir-opt Area: MIR optimizations label Oct 20, 2020
@bugadani
Copy link
Contributor Author

bugadani commented Mar 5, 2021

Fixed by #81451

@bugadani bugadani closed this as completed Mar 5, 2021
@mati865
Copy link
Contributor

mati865 commented Mar 5, 2021

There should be codegen test for this issue.

@bugadani
Copy link
Contributor Author

bugadani commented Mar 5, 2021

Yeah, right. I can add those for the issues I closed.

@bugadani
Copy link
Contributor Author

bugadani commented Mar 5, 2021

Actually, I'm reopening this one as the issue stands on ARM: https://rust.godbolt.org/z/6fEdoq

@bugadani bugadani reopened this Mar 5, 2021
@nikic
Copy link
Contributor

nikic commented Mar 5, 2021

Looks like a phase ordering problem. The remaining IR would be eliminated by GVN + InstCombine. Currently it only gets eliminated by DAGCombine, thus you still see the prologue/epilogue on ARM.

@nikic nikic self-assigned this Mar 5, 2021
@nikic
Copy link
Contributor

nikic commented Mar 7, 2021

The problem here is that this loop only gets unrolled during full loop unrolling, while we need it to be unrolled during simple unrolling to still have a chance to optimize. For the IR at that point, SCEV cannot determine the loop trip count:

; Preheader:
start:
  %s = alloca [7 x i32], align 4
  %0 = bitcast [7 x i32]* %s to i8*
  call void @llvm.lifetime.start.p0i8(i64 28, i8* nonnull %0)
  %1 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 0
  store i32 1, i32* %1, align 4
  %2 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 1
  store i32 2, i32* %2, align 4 
  %3 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 2
  store i32 3, i32* %3, align 4
  %4 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 3
  store i32 4, i32* %4, align 4
  %5 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 4
  store i32 5, i32* %5, align 4
  %6 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 5
  store i32 6, i32* %6, align 4
  %7 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 6
  store i32 7, i32* %7, align 4
  %8 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 0
  %9 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 7
  %10 = getelementptr inbounds [7 x i32], [7 x i32]* %s, i64 0, i64 1
  br label %bb5

; Loop:
bb5:                                              ; preds = %start, %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit" 
  %11 = phi i32* [ %10, %start ], [ %14, %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit" ]
  %sum.020 = phi i32 [ 0, %start ], [ %13, %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit" ]
  %iter.sroa.0.019 = phi i32* [ %8, %start ], [ %iter.sroa.0.218, %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit" ]
  %_12.i = icmp eq i32* %11, %9
  br i1 %_12.i, label %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit", label %bb3.i

bb3.i:                                            ; preds = %bb5
  %12 = getelementptr inbounds i32, i32* %iter.sroa.0.019, i64 2
  %.val.i = load i32, i32* %11, align 4, !alias.scope !2
  br label %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit"

"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit": ; preds = %bb5, %bb3.i
  %iter.sroa.0.218 = phi i32* [ %12, %bb3.i ], [ %11, %bb5 ]
  %.0.i = phi i32 [ %.val.i, %bb3.i ], [ 1, %bb5 ]
  %13 = add i32 %.0.i, %sum.020
  %_12.i7 = icmp eq i32* %iter.sroa.0.218, %9
  %14 = getelementptr inbounds i32, i32* %iter.sroa.0.218, i64 1
  br i1 %_12.i7, label %bb4, label %bb5

; Exit blocks
bb4:                                              ; preds = %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit"
  %.lcssa = phi i32 [ %13, %"_ZN4core6option15Option$LT$T$GT$6map_or17h8c48b1da3da3943fE.exit" ] 
  call void @llvm.lifetime.end.p0i8(i64 28, i8* nonnull %0)
  ret i32 %.lcssa

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. labels Apr 3, 2023
@nikic
Copy link
Contributor

nikic commented Apr 3, 2023

Fixed by the LLVM 16 upgrade.

@Nilstrieb Nilstrieb added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Apr 5, 2023
JohnTitor added a commit to JohnTitor/rust that referenced this issue Apr 11, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 12, 2023
@bors bors closed this as completed in 73f40d4 Apr 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-mir-opt Area: MIR optimizations E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants