[Pipeline] Eliminate dead loops introduced by InstCombine #69073

Closed. dtcxzyw wants to merge 2 commits.

dtcxzyw (Member) commented Oct 14, 2023

This PR adds a LoopDeletionPass at the end of the vectorization pipeline. It will eliminate dead loops introduced by InstCombine.

Fixes #68282.

llvmbot (Collaborator) commented Oct 14, 2023

@llvm/pr-subscribers-llvm-transforms

Author: Yingwei Zheng (dtcxzyw)

Full diff: https://github.com/llvm/llvm-project/pull/69073.diff

7 Files Affected:

  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+8-4)
  • (modified) llvm/test/Other/new-pm-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-lto-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll (+1)
  • (modified) llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll (+1)
  • (added) llvm/test/Transforms/PhaseOrdering/X86/pr68282.ll (+59)
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 600f8d43caaf216..011d0119d18ade1 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1280,10 +1280,14 @@ void PassBuilder::addVectorPasses(OptimizationLevel Level,
   //      divide result.
   //   2. It helps to clean up some loop-invariant code created by the loop
   //      unroll pass when IsFullLTO=false.
-  FPM.addPass(createFunctionToLoopPassAdaptor(
-      LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
-               /*AllowSpeculation=*/true),
-      /*UseMemorySSA=*/true, /*UseBlockFrequencyInfo=*/false));
+  //   3. It deletes dead loops exposed by instcombine.
+  LoopPassManager LPM;
+  LPM.addPass(LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap,
+                       /*AllowSpeculation=*/true));
+  LPM.addPass(LoopDeletionPass());
+  FPM.addPass(createFunctionToLoopPassAdaptor(std::move(LPM),
+                                              /*UseMemorySSA=*/true,
+                                              /*UseBlockFrequencyInfo=*/false));
 
   // Now that we've vectorized and unrolled loops, we may have more refined
   // alignment information, try to re-derive it here.
diff --git a/llvm/test/Other/new-pm-defaults.ll b/llvm/test/Other/new-pm-defaults.ll
index 098db2b959a29ec..68af447faef6293 100644
--- a/llvm/test/Other/new-pm-defaults.ll
+++ b/llvm/test/Other/new-pm-defaults.ll
@@ -265,6 +265,7 @@
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LICMPass
+; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
 ; CHECK-O-NEXT: Running pass: LoopSinkPass
 ; CHECK-O-NEXT: Running pass: InstSimplifyPass
diff --git a/llvm/test/Other/new-pm-lto-defaults.ll b/llvm/test/Other/new-pm-lto-defaults.ll
index d451d2897f673cd..f627a0079fd6fae 100644
--- a/llvm/test/Other/new-pm-lto-defaults.ll
+++ b/llvm/test/Other/new-pm-lto-defaults.ll
@@ -134,6 +134,7 @@
 ; CHECK-O23SZ-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O23SZ-NEXT: Running pass: LCSSAPass
 ; CHECK-O23SZ-NEXT: Running pass: LICMPass
+; CHECK-O23SZ-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O23SZ-NEXT: Running pass: AlignmentFromAssumptionsPass on foo
 ; CHECK-EP-Peephole-NEXT: Running pass: NoOpFunctionPass on foo
 ; CHECK-O23SZ-NEXT: Running pass: JumpThreadingPass on foo
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
index 1103c8f984fa296..28b68dc80b1cabd 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-defaults.ll
@@ -192,6 +192,7 @@
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LICMPass
+; CHECK-POSTLINK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopSinkPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstSimplifyPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
index 05d165ae399f903..b6ae0690f76797c 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll
@@ -178,6 +178,7 @@
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LICMPass
+; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
 ; CHECK-O-NEXT: Running pass: LoopSinkPass
 ; CHECK-O-NEXT: Running pass: InstSimplifyPass
diff --git a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
index 2393ba9dbde4b64..5954feccacc5f6c 100644
--- a/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll
@@ -185,6 +185,7 @@
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LICMPass
+; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
 ; CHECK-O-NEXT: Running pass: LoopSinkPass
 ; CHECK-O-NEXT: Running pass: InstSimplifyPass
diff --git a/llvm/test/Transforms/PhaseOrdering/X86/pr68282.ll b/llvm/test/Transforms/PhaseOrdering/X86/pr68282.ll
new file mode 100644
index 000000000000000..6f1e3dfa7ad3436
--- /dev/null
+++ b/llvm/test/Transforms/PhaseOrdering/X86/pr68282.ll
@@ -0,0 +1,59 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt < %s -passes='default<O3>' -mcpu=generic -S | FileCheck %s
+
+; C version of test case
+; int pr68282() {
+;     int outputValue = 0;
+;     for (int x = 0; x < 1024 * 1024; ++x) {
+;         outputValue += outputValue + 0;
+;         outputValue += outputValue + 1;
+;         outputValue += outputValue + 2;
+;         outputValue += outputValue + 3;
+;         outputValue += outputValue + 4;
+;         outputValue += outputValue + 5;
+;         outputValue += outputValue + 6;
+;         outputValue += outputValue + 7;
+;     }
+;     return outputValue;
+; }
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define i32 @pr68282() {
+; CHECK-LABEL: define i32 @pr68282(
+; CHECK-SAME: ) local_unnamed_addr #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    ret i32 -134744073
+;
+entry:
+  br label %for.cond
+
+for.cond:
+  %outputValue.0 = phi i32 [ 0, %entry ], [ %add15, %for.body ]
+  %x.0 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
+  %cmp = icmp slt i32 %x.0, 1048576
+  br i1 %cmp, label %for.body, label %for.cond.cleanup
+
+for.cond.cleanup:
+  ret i32 %outputValue.0
+
+for.body:
+  %add1 = add nsw i32 %outputValue.0, %outputValue.0
+  %add2 = add nsw i32 %add1, 1
+  %add3 = add nsw i32 %add1, %add2
+  %add4 = add nsw i32 %add3, 2
+  %add5 = add nsw i32 %add3, %add4
+  %add6 = add nsw i32 %add5, 3
+  %add7 = add nsw i32 %add5, %add6
+  %add8 = add nsw i32 %add7, 4
+  %add9 = add nsw i32 %add7, %add8
+  %add10 = add nsw i32 %add9, 5
+  %add11 = add nsw i32 %add9, %add10
+  %add12 = add nsw i32 %add11, 6
+  %add13 = add nsw i32 %add11, %add12
+  %add14 = add nsw i32 %add13, 7
+  %add15 = add nsw i32 %add13, %add14
+  %inc = add nsw i32 %x.0, 1
+  br label %for.cond
+}

dtcxzyw (Member, Author) commented Oct 14, 2023

@nikic I would like to evaluate the compile-time impact of this PR. Could you please add my fork https://github.com/dtcxzyw/llvm-project to http://llvm-compile-time-tracker.com/?

nikic (Contributor) commented Oct 14, 2023

> @nikic I would like to evaluate the compile-time impact of this PR. Could you please add my fork https://github.com/dtcxzyw/llvm-project to http://llvm-compile-time-tracker.com/?

Done, here are the results: https://llvm-compile-time-tracker.com/compare.php?from=ed2c80151d6be0e8c2480a68d301d2c8a447ef13&to=060c47fa33edcf27e647c6c11cd1fd2e589bc272&stat=instructions:u

nikic (Contributor) left a comment

Architecturally, trying to exploit optimization opportunities that are only opened up by runtime unrolling is pretty much always the wrong thing to do. At that point the optimization happens mostly by accident, not by intent, and only on some subtargets.

In this case, what you want to do instead is allow calculating the exit value in the loop earlier. Possibly in SCEV/IndVars. A possibly relevant observation here is that the exit value reaches a fixed point after a small number of iterations.
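
The fixed-point observation can be checked with a small standalone sketch. It is not part of the PR and not how SCEV or IndVars would compute the exit value; it only replays the loop body from the `pr68282` test case. The helper names (`loopBody`, `runLoop`, `tripsToFixedPoint`) are hypothetical, and unsigned 32-bit arithmetic is an assumption made here so that wraparound is well defined (the original C code uses signed `int`).

```cpp
#include <cstdint>

// One trip through the loop body of pr68282: eight steps of v = v + v + k.
// Assumption: unsigned arithmetic stands in for the signed `int` of the
// original so that overflow wraps instead of being undefined behavior.
std::uint32_t loopBody(std::uint32_t v) {
  for (std::uint32_t k = 0; k < 8; ++k)
    v = v + v + k;
  return v;
}

// Run the loop body `trips` times, starting from 0 like `outputValue`.
std::uint32_t runLoop(std::uint32_t trips) {
  std::uint32_t v = 0;
  for (std::uint32_t i = 0; i < trips; ++i)
    v = loopBody(v);
  return v;
}

// Count how many trips it takes before the body no longer changes the value.
// Returns `limit` if no fixed point is found within `limit` trips.
std::uint32_t tripsToFixedPoint(std::uint32_t limit) {
  std::uint32_t v = 0;
  for (std::uint32_t i = 0; i < limit; ++i) {
    std::uint32_t next = loopBody(v);
    if (next == v)
      return i;
    v = next;
  }
  return limit;
}
```

Each trip maps `v` to `256 * v + 247` modulo 2^32, so the value settles at `0xF7F7F7F7` after 4 trips: `tripsToFixedPoint(1024 * 1024)` returns 4, and `runLoop(4)` equals `runLoop(1024 * 1024)`. Reinterpreted as a signed 32-bit integer, that value is -134744073, the constant in the test's `CHECK-NEXT: ret i32` line.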

dtcxzyw closed this Nov 12, 2023

Successfully merging this pull request may close these issues.

Clang fails to remove unnecessary loop