-
Notifications
You must be signed in to change notification settings - Fork 15.5k
[IndVarSimplify] Remove sinkunusedInvariants #169250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This sinkunused invariants from preheader to exit does not sink the invariant memory instructions. Hence the liveness range of the memory instructions is extended till the exit block creating a lot of register spills. This performance regression was not very visible before because the sinkunused invariants code had a bug which double iterated skipping a lot of potential instructions that could be sunk. Tried sinking the invariant memory instructions, but that causes performance regression in small cases. This effort is to see if removing this code has a big performance impact.
|
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-amdgpu Author: Vigneshwar Jayakumar (VigneshwarJ) ChangesThis sinkunused invariants from preheader to exit does not sink the invariant memory instructions. Hence the liveness range of the memory instructions is extended till the exit block creating a lot of register spills. Tried sinking the invariant memory instructions, but that causes performance regression in small cases (#157559). This effort is to see if removing this code has a big performance impact. Patch is 40.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169250.diff 23 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index b46527eb1057b..e22591f9298b4 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -162,7 +162,6 @@ class IndVarSimplify {
const SCEV *ExitCount,
PHINode *IndVar, SCEVExpander &Rewriter);
- bool sinkUnusedInvariants(Loop *L);
public:
IndVarSimplify(LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
@@ -1093,85 +1092,6 @@ linearFunctionTestReplace(Loop *L, BasicBlock *ExitingBB,
return true;
}
-//===----------------------------------------------------------------------===//
-// sinkUnusedInvariants. A late subpass to cleanup loop preheaders.
-//===----------------------------------------------------------------------===//
-
-/// If there's a single exit block, sink any loop-invariant values that
-/// were defined in the preheader but not used inside the loop into the
-/// exit block to reduce register pressure in the loop.
-bool IndVarSimplify::sinkUnusedInvariants(Loop *L) {
- BasicBlock *ExitBlock = L->getExitBlock();
- if (!ExitBlock) return false;
-
- BasicBlock *Preheader = L->getLoopPreheader();
- if (!Preheader) return false;
-
- bool MadeAnyChanges = false;
- for (Instruction &I : llvm::make_early_inc_range(llvm::reverse(*Preheader))) {
-
- // Skip BB Terminator.
- if (Preheader->getTerminator() == &I)
- continue;
-
- // New instructions were inserted at the end of the preheader.
- if (isa<PHINode>(I))
- break;
-
- // Don't move instructions which might have side effects, since the side
- // effects need to complete before instructions inside the loop. Also don't
- // move instructions which might read memory, since the loop may modify
- // memory. Note that it's okay if the instruction might have undefined
- // behavior: LoopSimplify guarantees that the preheader dominates the exit
- // block.
- if (I.mayHaveSideEffects() || I.mayReadFromMemory())
- continue;
-
- // Skip debug or pseudo instructions.
- if (I.isDebugOrPseudoInst())
- continue;
-
- // Skip eh pad instructions.
- if (I.isEHPad())
- continue;
-
- // Don't sink alloca: we never want to sink static alloca's out of the
- // entry block, and correctly sinking dynamic alloca's requires
- // checks for stacksave/stackrestore intrinsics.
- // FIXME: Refactor this check somehow?
- if (isa<AllocaInst>(&I))
- continue;
-
- // Determine if there is a use in or before the loop (direct or
- // otherwise).
- bool UsedInLoop = false;
- for (Use &U : I.uses()) {
- Instruction *User = cast<Instruction>(U.getUser());
- BasicBlock *UseBB = User->getParent();
- if (PHINode *P = dyn_cast<PHINode>(User)) {
- unsigned i =
- PHINode::getIncomingValueNumForOperand(U.getOperandNo());
- UseBB = P->getIncomingBlock(i);
- }
- if (UseBB == Preheader || L->contains(UseBB)) {
- UsedInLoop = true;
- break;
- }
- }
-
- // If there is, the def must remain in the preheader.
- if (UsedInLoop)
- continue;
-
- // Otherwise, sink it to the exit block.
- I.moveBefore(ExitBlock->getFirstInsertionPt());
- SE->forgetValue(&I);
- MadeAnyChanges = true;
- }
-
- return MadeAnyChanges;
-}
-
static void replaceExitCond(BranchInst *BI, Value *NewCond,
SmallVectorImpl<WeakTrackingVH> &DeadInsts) {
auto *OldCond = BI->getCondition();
@@ -2079,9 +1999,6 @@ bool IndVarSimplify::run(Loop *L) {
// The Rewriter may not be used from this point on.
- // Loop-invariant instructions in the preheader that aren't used in the
- // loop may be sunk below the loop to reduce register pressure.
- Changed |= sinkUnusedInvariants(L);
// rewriteFirstIterationLoopExitValues does not rely on the computation of
// trip count and therefore can further simplify exit values in addition to
diff --git a/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll b/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll
index 08dcf1d7a0091..8e932e0c00d4f 100644
--- a/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll
+++ b/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll
@@ -7,11 +7,11 @@ define void @f(ptr addrspace(7) %arg) {
; CHECK-LABEL: define void @f
; CHECK-SAME: (ptr addrspace(7) [[ARG:%.*]]) {
; CHECK-NEXT: bb:
+; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr addrspace(7) [[ARG]], i32 8
; CHECK-NEXT: br label [[BB1:%.*]]
; CHECK: bb1:
; CHECK-NEXT: br i1 false, label [[BB2:%.*]], label [[BB1]]
; CHECK: bb2:
-; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i8, ptr addrspace(7) [[ARG]], i32 8
; CHECK-NEXT: br label [[BB3:%.*]]
; CHECK: bb3:
; CHECK-NEXT: [[I4:%.*]] = load i32, ptr addrspace(7) [[SCEVGEP]], align 4
diff --git a/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll b/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll
index 2003b1a72206d..3c6535da486aa 100644
--- a/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll
+++ b/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll
@@ -4,33 +4,31 @@
define i32 @remove_loop(i32 %size) #0 {
; CHECK-V8M-LABEL: @remove_loop(
-; CHECK-V8M-SAME: i32 [[SIZE:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-V8M-NEXT: entry:
-; CHECK-V8M-NEXT: br label %[[WHILE_COND:.*]]
-; CHECK-V8M: while.cond:
-; CHECK-V8M-NEXT: br i1 false, label %[[WHILE_COND]], label %[[WHILE_END:.*]]
-; CHECK-V8M: while.end:
-; CHECK-V8M-NEXT: [[TMP0:%.*]] = add i32 [[SIZE]], 31
+; CHECK-V8M-NEXT: [[TMP0:%.*]] = add i32 [[SIZE:%.*]], 31
; CHECK-V8M-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[SIZE]], i32 31)
; CHECK-V8M-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[UMIN]]
; CHECK-V8M-NEXT: [[TMP2:%.*]] = lshr i32 [[TMP1]], 5
; CHECK-V8M-NEXT: [[TMP3:%.*]] = shl nuw i32 [[TMP2]], 5
; CHECK-V8M-NEXT: [[TMP4:%.*]] = sub i32 [[SIZE]], [[TMP3]]
+; CHECK-V8M-NEXT: br label [[WHILE_COND:%.*]]
+; CHECK-V8M: while.cond:
+; CHECK-V8M-NEXT: br i1 false, label [[WHILE_COND]], label [[WHILE_END:%.*]]
+; CHECK-V8M: while.end:
; CHECK-V8M-NEXT: ret i32 [[TMP4]]
;
; CHECK-V8A-LABEL: @remove_loop(
-; CHECK-V8A-SAME: i32 [[SIZE:%.*]]) #[[ATTR0:[0-9]+]] {
; CHECK-V8A-NEXT: entry:
-; CHECK-V8A-NEXT: br label %[[WHILE_COND:.*]]
-; CHECK-V8A: while.cond:
-; CHECK-V8A-NEXT: br i1 false, label %[[WHILE_COND]], label %[[WHILE_END:.*]]
-; CHECK-V8A: while.end:
-; CHECK-V8A-NEXT: [[TMP0:%.*]] = add i32 [[SIZE]], 31
+; CHECK-V8A-NEXT: [[TMP0:%.*]] = add i32 [[SIZE:%.*]], 31
; CHECK-V8A-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[SIZE]], i32 31)
; CHECK-V8A-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[UMIN]]
; CHECK-V8A-NEXT: [[TMP2:%.*]] = lshr i32 [[TMP1]], 5
; CHECK-V8A-NEXT: [[TMP3:%.*]] = shl nuw i32 [[TMP2]], 5
; CHECK-V8A-NEXT: [[TMP4:%.*]] = sub i32 [[SIZE]], [[TMP3]]
+; CHECK-V8A-NEXT: br label [[WHILE_COND:%.*]]
+; CHECK-V8A: while.cond:
+; CHECK-V8A-NEXT: br i1 false, label [[WHILE_COND]], label [[WHILE_END:%.*]]
+; CHECK-V8A: while.end:
; CHECK-V8A-NEXT: ret i32 [[TMP4]]
;
entry:
diff --git a/llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll b/llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll
index 2261423766792..382f026e7de6a 100644
--- a/llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll
+++ b/llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll
@@ -77,6 +77,8 @@ define dso_local arm_aapcscc void @test(ptr nocapture %pDest, ptr nocapture read
; CHECK-NEXT: [[CMP2780:%.*]] = icmp ugt i32 [[ADD25]], [[J_0_LCSSA]]
; CHECK-NEXT: br i1 [[CMP2780]], label [[FOR_BODY29_PREHEADER:%.*]], label [[FOR_END40]]
; CHECK: for.body29.preheader:
+; CHECK-NEXT: [[TMP10:%.*]] = sub nsw i32 [[ADD25]], [[J_0_LCSSA]]
+; CHECK-NEXT: [[SCEVGEP93:%.*]] = getelementptr i16, ptr [[PSRCB_ADDR_1_LCSSA]], i32 [[TMP10]]
; CHECK-NEXT: br label [[FOR_BODY29:%.*]]
; CHECK: for.body29:
; CHECK-NEXT: [[J_184:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY29]] ], [ [[J_0_LCSSA]], [[FOR_BODY29_PREHEADER]] ]
@@ -100,8 +102,6 @@ define dso_local arm_aapcscc void @test(ptr nocapture %pDest, ptr nocapture read
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[ADD25]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END40_LOOPEXIT:%.*]], label [[FOR_BODY29]]
; CHECK: for.end40.loopexit:
-; CHECK-NEXT: [[TMP10:%.*]] = sub nsw i32 [[ADD25]], [[J_0_LCSSA]]
-; CHECK-NEXT: [[SCEVGEP93:%.*]] = getelementptr i16, ptr [[PSRCB_ADDR_1_LCSSA]], i32 [[TMP10]]
; CHECK-NEXT: [[SCEVGEP:%.*]] = getelementptr i16, ptr [[PSRCA_ADDR_1_LCSSA]], i32 [[TMP10]]
; CHECK-NEXT: [[SCEVGEP94:%.*]] = getelementptr i32, ptr [[PDEST_ADDR_1_LCSSA]], i32 [[TMP10]]
; CHECK-NEXT: br label [[FOR_END40]]
diff --git a/llvm/test/Transforms/IndVarSimplify/X86/inner-loop-by-latch-cond.ll b/llvm/test/Transforms/IndVarSimplify/X86/inner-loop-by-latch-cond.ll
index 0fa6e34cf186e..0eb9debce8177 100644
--- a/llvm/test/Transforms/IndVarSimplify/X86/inner-loop-by-latch-cond.ll
+++ b/llvm/test/Transforms/IndVarSimplify/X86/inner-loop-by-latch-cond.ll
@@ -14,6 +14,7 @@ define void @test(i64 %a) {
; CHECK: outer_header:
; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[OUTER_LATCH:%.*]] ], [ 21, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[I:%.*]] = phi i64 [ 20, [[ENTRY]] ], [ [[I_NEXT:%.*]], [[OUTER_LATCH]] ]
+; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1
; CHECK-NEXT: br label [[INNER_HEADER:%.*]]
; CHECK: inner_header:
; CHECK-NEXT: [[J:%.*]] = phi i64 [ 1, [[OUTER_HEADER]] ], [ [[J_NEXT:%.*]], [[INNER_HEADER]] ]
@@ -22,7 +23,6 @@ define void @test(i64 %a) {
; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[J_NEXT]], [[INDVARS_IV]]
; CHECK-NEXT: br i1 [[EXITCOND]], label [[INNER_HEADER]], label [[OUTER_LATCH]]
; CHECK: outer_latch:
-; CHECK-NEXT: [[I_NEXT]] = add nuw nsw i64 [[I]], 1
; CHECK-NEXT: [[COND2:%.*]] = icmp ne i64 [[I_NEXT]], 40
; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
; CHECK-NEXT: br i1 [[COND2]], label [[OUTER_HEADER]], label [[RETURN:%.*]]
diff --git a/llvm/test/Transforms/IndVarSimplify/exit-count-select.ll b/llvm/test/Transforms/IndVarSimplify/exit-count-select.ll
index 1592b84480e3f..50c8e0fb7aabf 100644
--- a/llvm/test/Transforms/IndVarSimplify/exit-count-select.ll
+++ b/llvm/test/Transforms/IndVarSimplify/exit-count-select.ll
@@ -4,12 +4,12 @@
define i32 @logical_and_2ops(i32 %n, i32 %m) {
; CHECK-LABEL: @logical_and_2ops(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = freeze i32 [[M:%.*]]
+; CHECK-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[N:%.*]])
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: br i1 false, label [[LOOP]], label [[EXIT:%.*]]
; CHECK: exit:
-; CHECK-NEXT: [[TMP0:%.*]] = freeze i32 [[M:%.*]]
-; CHECK-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[N:%.*]])
; CHECK-NEXT: ret i32 [[UMIN]]
;
entry:
@@ -28,12 +28,12 @@ exit:
define i32 @logical_or_2ops(i32 %n, i32 %m) {
; CHECK-LABEL: @logical_or_2ops(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = freeze i32 [[M:%.*]]
+; CHECK-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[N:%.*]])
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[LOOP]]
; CHECK: exit:
-; CHECK-NEXT: [[TMP0:%.*]] = freeze i32 [[M:%.*]]
-; CHECK-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[N:%.*]])
; CHECK-NEXT: ret i32 [[UMIN]]
;
entry:
@@ -52,14 +52,14 @@ exit:
define i32 @logical_and_3ops(i32 %n, i32 %m, i32 %k) {
; CHECK-LABEL: @logical_and_3ops(
; CHECK-NEXT: entry:
-; CHECK-NEXT: br label [[LOOP:%.*]]
-; CHECK: loop:
-; CHECK-NEXT: br i1 false, label [[LOOP]], label [[EXIT:%.*]]
-; CHECK: exit:
; CHECK-NEXT: [[TMP0:%.*]] = freeze i32 [[K:%.*]]
; CHECK-NEXT: [[TMP1:%.*]] = freeze i32 [[M:%.*]]
; CHECK-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[TMP1]])
; CHECK-NEXT: [[UMIN1:%.*]] = call i32 @llvm.umin.i32(i32 [[UMIN]], i32 [[N:%.*]])
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: br i1 false, label [[LOOP]], label [[EXIT:%.*]]
+; CHECK: exit:
; CHECK-NEXT: ret i32 [[UMIN1]]
;
entry:
@@ -80,14 +80,14 @@ exit:
define i32 @logical_or_3ops(i32 %n, i32 %m, i32 %k) {
; CHECK-LABEL: @logical_or_3ops(
; CHECK-NEXT: entry:
-; CHECK-NEXT: br label [[LOOP:%.*]]
-; CHECK: loop:
-; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[LOOP]]
-; CHECK: exit:
; CHECK-NEXT: [[TMP0:%.*]] = freeze i32 [[K:%.*]]
; CHECK-NEXT: [[TMP1:%.*]] = freeze i32 [[M:%.*]]
; CHECK-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[TMP1]])
; CHECK-NEXT: [[UMIN1:%.*]] = call i32 @llvm.umin.i32(i32 [[UMIN]], i32 [[N:%.*]])
+; CHECK-NEXT: br label [[LOOP:%.*]]
+; CHECK: loop:
+; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[LOOP]]
+; CHECK: exit:
; CHECK-NEXT: ret i32 [[UMIN1]]
;
entry:
diff --git a/llvm/test/Transforms/IndVarSimplify/finite-exit-comparisons.ll b/llvm/test/Transforms/IndVarSimplify/finite-exit-comparisons.ll
index e006d9f6696ca..f798eb281f51a 100644
--- a/llvm/test/Transforms/IndVarSimplify/finite-exit-comparisons.ll
+++ b/llvm/test/Transforms/IndVarSimplify/finite-exit-comparisons.ll
@@ -932,6 +932,9 @@ for.end: ; preds = %for.body, %entry
define i16 @ult_multiuse_profit(i16 %n.raw, i8 %start) mustprogress {
; CHECK-LABEL: @ult_multiuse_profit(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[START:%.*]], 1
+; CHECK-NEXT: [[TMP1:%.*]] = zext i8 [[TMP2]] to i16
+; CHECK-NEXT: [[UMAX:%.*]] = call i16 @llvm.umax.i16(i16 [[TMP1]], i16 254)
; CHECK-NEXT: [[TMP0:%.*]] = trunc i16 254 to i8
; CHECK-NEXT: br label [[FOR_BODY:%.*]]
; CHECK: for.body:
@@ -940,9 +943,6 @@ define i16 @ult_multiuse_profit(i16 %n.raw, i8 %start) mustprogress {
; CHECK-NEXT: [[CMP:%.*]] = icmp ult i8 [[IV_NEXT]], [[TMP0]]
; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
; CHECK: for.end:
-; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[START:%.*]], 1
-; CHECK-NEXT: [[TMP2:%.*]] = zext i8 [[TMP1]] to i16
-; CHECK-NEXT: [[UMAX:%.*]] = call i16 @llvm.umax.i16(i16 [[TMP2]], i16 254)
; CHECK-NEXT: ret i16 [[UMAX]]
;
entry:
diff --git a/llvm/test/Transforms/IndVarSimplify/pr116483.ll b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
index 093e25a3caa81..e9e0d22bf960a 100644
--- a/llvm/test/Transforms/IndVarSimplify/pr116483.ll
+++ b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
@@ -4,16 +4,16 @@
define i32 @test() {
; CHECK-LABEL: define i32 @test() {
; CHECK-NEXT: [[ENTRY:.*:]]
-; CHECK-NEXT: br label %[[LOOP_BODY:.*]]
-; CHECK: [[LOOP_BODY]]:
-; CHECK-NEXT: br i1 true, label %[[EXIT:.*]], label %[[LOOP_BODY]]
-; CHECK: [[EXIT]]:
; CHECK-NEXT: [[XOR:%.*]] = xor i32 0, 3
; CHECK-NEXT: [[MUL:%.*]] = mul i32 [[XOR]], 329
; CHECK-NEXT: [[CONV:%.*]] = trunc i32 [[MUL]] to i16
; CHECK-NEXT: [[SEXT:%.*]] = shl i16 [[CONV]], 8
; CHECK-NEXT: [[CONV1:%.*]] = ashr i16 [[SEXT]], 8
; CHECK-NEXT: [[CONV3:%.*]] = zext i16 [[CONV1]] to i32
+; CHECK-NEXT: br label %[[LOOP_BODY:.*]]
+; CHECK: [[LOOP_BODY]]:
+; CHECK-NEXT: br i1 true, label %[[EXIT:.*]], label %[[LOOP_BODY]]
+; CHECK: [[EXIT]]:
; CHECK-NEXT: ret i32 [[CONV3]]
;
entry:
diff --git a/llvm/test/Transforms/IndVarSimplify/pr24783.ll b/llvm/test/Transforms/IndVarSimplify/pr24783.ll
index c521bcaf59d49..37ecf42ea0fd3 100644
--- a/llvm/test/Transforms/IndVarSimplify/pr24783.ll
+++ b/llvm/test/Transforms/IndVarSimplify/pr24783.ll
@@ -7,11 +7,11 @@ target triple = "powerpc64-unknown-linux-gnu"
define void @f(ptr %end.s, ptr %loc, i32 %p) {
; CHECK-LABEL: @f(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[END:%.*]] = getelementptr inbounds i32, ptr [[END_S:%.*]], i32 [[P:%.*]]
; CHECK-NEXT: br label [[WHILE_BODY_I:%.*]]
; CHECK: while.body.i:
; CHECK-NEXT: br i1 true, label [[LOOP_EXIT:%.*]], label [[WHILE_BODY_I]]
; CHECK: loop.exit:
-; CHECK-NEXT: [[END:%.*]] = getelementptr inbounds i32, ptr [[END_S:%.*]], i32 [[P:%.*]]
; CHECK-NEXT: store ptr [[END]], ptr [[LOC:%.*]], align 8
; CHECK-NEXT: ret void
;
diff --git a/llvm/test/Transforms/IndVarSimplify/pr39673.ll b/llvm/test/Transforms/IndVarSimplify/pr39673.ll
index 7b093b34b91ad..3cee1ab7be881 100644
--- a/llvm/test/Transforms/IndVarSimplify/pr39673.ll
+++ b/llvm/test/Transforms/IndVarSimplify/pr39673.ll
@@ -148,6 +148,7 @@ loop2.end: ; preds = %loop2
define i16 @neg_loop_carried(i16 %arg) {
; CHECK-LABEL: @neg_loop_carried(
; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = add i16 [[ARG:%.*]], 2
; CHECK-NEXT: br label [[LOOP1:%.*]]
; CHECK: loop1:
; CHECK-NEXT: [[L1:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[L1_ADD:%.*]], [[LOOP1]] ]
@@ -155,7 +156,6 @@ define i16 @neg_loop_carried(i16 %arg) {
; CHECK-NEXT: [[CMP1:%.*]] = icmp ult i16 [[L1_ADD]], 2
; CHECK-NEXT: br i1 [[CMP1]], label [[LOOP1]], label [[LOOP2_PREHEADER:%.*]]
; CHECK: loop2.preheader:
-; CHECK-NEXT: [[TMP0:%.*]] = add i16 [[ARG:%.*]], 2
; CHECK-NEXT: br label [[LOOP2:%.*]]
; CHECK: loop2:
; CHECK-NEXT: [[K2:%.*]] = phi i16 [ [[K2_ADD:%.*]], [[LOOP2]] ], [ [[TMP0]], [[LOOP2_PREHEADER]] ]
diff --git a/llvm/test/Transforms/IndVarSimplify/pr63763.ll b/llvm/test/Transforms/IndVarSimplify/pr63763.ll
index 427db1e67410a..a5fde67d6140a 100644
--- a/llvm/test/Transforms/IndVarSimplify/pr63763.ll
+++ b/llvm/test/Transforms/IndVarSimplify/pr63763.ll
@@ -16,13 +16,13 @@ define i32 @test(i1 %c) {
; CHECK-NEXT: [[CONV2:%.*]] = ashr exact i32 [[SEXT]], 24
; CHECK-NEXT: [[INVARIANT_OP:%.*]] = sub nsw i32 7, [[CONV2]]
; CHECK-NEXT: call void @use(i32 [[INVARIANT_OP]])
+; CHECK-NEXT: [[SEXT_US:%.*]] = shl i32 [[SEL]], 24
+; CHECK-NEXT: [[CONV2_US:%.*]] = ashr exact i32 [[SEXT_US]], 24
+; CHECK-NEXT: [[INVARIANT_OP_US:%.*]] = sub nsw i32 7, [[CONV2_US]]
; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:
; CHECK-NEXT: br i1 true, label [[EXIT:%.*]], label [[LOOP]]
; CHECK: exit:
-; CHECK-NEXT: [[SEXT_US:%.*]] = shl i32 [[SEL]], 24
-; CHECK-NEXT: [[CONV2_US:%.*]] = ashr exact i32 [[SEXT_US]], 24
-; CHECK-NEXT: [[INVARIANT_OP_US:%.*]] = sub nsw i32 7, [[CONV2_US]]
; CHECK-NEXT: ret i32 [[INVARIANT_OP_US]]
;
entry:
diff --git a/llvm/test/Transforms/IndVarSimplify/replace-loop-exit-folds.ll b/llvm/test/Transforms/IndVarSimplify/replace-loop-exit-folds.ll
index b3162de0f2245..7cdc98a6c4f7c 100644
--- a/llvm/test/Transforms/IndVarSimplify/replace-loop-exit-folds.ll
+++ b/llvm/test/Transforms/IndVarSimplify/replace-loop-exit-folds.ll
@@ -4,22 +4,21 @@
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
define i32 @remove_loop(i32 %size) {
-; CHECK-LABEL: define i32 @remove_loop(
-; CHECK-SAME: i32 [[SIZE:%.*]]) {
-; CHECK-NEXT: [[ENTRY:.*]]:
-; CHECK-NEXT: br label %[[WHILE_COND:.*]]
-; CHECK: [[WHILE_COND]]:
-; CHECK-NEXT: [[SIZE_ADDR_0:%.*]] = phi i32 [ [[SIZE]], %[[ENTRY]] ], [ [[SUB:%.*]], %[[WHILE_COND]] ]
-; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i32 [[SIZE_ADDR_0]], 31
-; CHECK-NEXT: [[SUB]] = add i32 [[SIZE_ADDR_0]], -32
-; CHECK-NEXT: br i1 [[CMP]], label %[[WHILE_COND]], label %[[WHILE_END:.*]]
-; CHECK: [[WHILE_END]]:
-; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[SIZE]], 31
+; CHECK-LABEL: @remove_loop(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = add i32 [[SIZE:%.*]], 31
; CHECK-NEXT: [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[SIZE]], i32 31)
; CHECK-NEXT: [[TMP1:%.*]] = sub i32 [[TMP0]], [[UMIN]]
; CHECK-NEXT: [[TMP2:%.*]] = lshr i32 [[TMP1]], 5
; CHECK-NEXT: [[TMP3:%.*]] = shl nuw i32 [[TMP2]], 5
; CHECK-NEXT: [[TMP4:%.*]] = sub i32 [[SIZE]], [[TMP3]]
+; CHECK-NEXT: br label [...
[truncated]
|
|
The above mentioned PR #157559 is reverted due to increased stack size and stack overflow when the stack size is very low. This was due to way the MSSAU was updated. I have a fix for that, but there were issues raised for performance regression with that change. So I wanted to see if removing this optimization would be a good way. Else I will raise a PR again for the #157559 again with those fixes. |
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
a5d6e38 to
f7c8e8f
Compare
|
Pinging for review, if this is not a viable solution, This the reland PR with the fix for stack overflow on smaller stack size. #170204 |
| @@ -1,19 +0,0 @@ | |||
| ; RUN: opt < %s -passes=indvars -S | FileCheck %s | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably shouldn't be deleting tests?
Probably more worthwhile to investigate those directly? Just ripping out an optimization that sounds obviously useful doesn't seem like the right strategy |
|
That's a fair point. I tried fixing this by sinking invariant memory instructions too (#157559), but that caused perf regressions in small cases which are non-trivial. (this issue #166671). Also, this logic is arguably out of place in Even with the sinking invariant load change, LICM conservatively doesn't sink invariant load if there's any memdef after( even if they don't alias) , hence still causing extended liverange define void @test_loop_invariant_sinking(ptr %ptr, ptr %ptr2, i32 %N) {
; CHECK-LABEL: define void @test_loop_invariant_sinking(
; CHECK-SAME: ptr [[PTR:%.*]], ptr [[PTR2:%.*]], i32 [[N:%.*]]) {
; CHECK-NEXT: [[ENTRY:.*]]:
; CHECK-NEXT: [[VAL1:%.*]] = load i32, ptr [[PTR]], align 4
; CHECK-NEXT: store i32 20, ptr [[PTR2]], align 4
; CHECK-NEXT: br label %[[LOOP:.*]]
; CHECK: [[LOOP]]:
; CHECK-NEXT: [[I:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[I_NEXT:%.*]], %[[LOOP]] ]
; CHECK-NEXT: [[I_NEXT]] = add i32 [[I]], 1
; CHECK-NEXT: [[COND:%.*]] = icmp slt i32 [[I_NEXT]], [[N]]
; CHECK-NEXT: br i1 [[COND]], label %[[LOOP]], label %[[EXIT:.*]]
; CHECK: [[EXIT]]:
; CHECK-NEXT: [[INV0:%.*]] = add i32 [[VAL1]], 10
; CHECK-NEXT: [[VAL2:%.*]] = load i32, ptr [[PTR]], align 4
; CHECK-NEXT: [[INV2:%.*]] = add i32 [[VAL2]], 20
; CHECK-NEXT: [[INV3:%.*]] = mul i32 [[INV2]], [[INV0]]
; CHECK-NEXT: ret void
;
entry:
; Preheader
%val1 = load i32, ptr %ptr
%inv0 = add i32 %val1, 10
store i32 20, ptr %ptr2
%val2 = load i32, ptr %ptr
%inv2 = add i32 %val2, 20
%inv3 = mul i32 %inv2, %inv0
br label %loop
loop:
%i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
%i.next = add i32 %i, 1
%cond = icmp slt i32 %i.next, %N
br i1 %cond, label %loop, label %exit
exit:
ret void
}Given that this logic was broken for years, and "fixing" it causes significant regressions that are difficult to fix without introducing others, I believe the code is currently a net negative. Removing this misplaced and problematic logic is cleaner than maintaining a fragile implementation that fights against register allocation. In any case if we want to keep this optimization, this #170204 has the reland patch. |
This sinkunused invariants from preheader to exit does not sink the invariant memory instructions. Hence the liveness range of the memory instructions is extended till the exit block creating a lot of register spills.
This performance regression was not very visible before because the sinkunused invariants code had a bug which double iterated skipping a lot of potential instructions that could be sunk.
Tried sinking the invariant memory instructions, but that causes performance regression in small cases (#157559). This effort is to see if removing this code has a big performance impact.