[IndVarSimplify] Remove sinkunusedInvariants #169250

VigneshwarJ · 2025-11-23T22:35:29Z

This sinkunused invariants from preheader to exit does not sink the invariant memory instructions. Hence the liveness range of the memory instructions is extended till the exit block creating a lot of register spills.
This performance regression was not very visible before because the sinkunused invariants code had a bug which double iterated skipping a lot of potential instructions that could be sunk.

Tried sinking the invariant memory instructions, but that causes performance regression in small cases (#157559). This effort is to see if removing this code has a big performance impact.

This sinkunused invariants from preheader to exit does not sink the invariant memory instructions. Hence the liveness range of the memory instructions is extended till the exit block creating a lot of register spills. This performance regression was not very visible before because the sinkunused invariants code had a bug which double iterated skipping a lot of potential instructions that could be sunk. Tried sinking the invariant memory instructions, but that causes performance regression in small cases. This effort is to see if removing this code has a big performance impact.

llvmbot · 2025-11-23T22:35:57Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-amdgpu

Author: Vigneshwar Jayakumar (VigneshwarJ)

Changes

This sinkunused invariants from preheader to exit does not sink the invariant memory instructions. Hence the liveness range of the memory instructions is extended till the exit block creating a lot of register spills.
This performance regression was not very visible before because the sinkunused invariants code had a bug which double iterated skipping a lot of potential instructions that could be sunk.

Tried sinking the invariant memory instructions, but that causes performance regression in small cases (#157559). This effort is to see if removing this code has a big performance impact.

Patch is 40.74 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169250.diff

23 Files Affected:

(modified) llvm/lib/Transforms/Scalar/IndVarSimplify.cpp (-83)
(modified) llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll (+1-1)
(modified) llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll (+10-12)
(modified) llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll (+2-2)
(modified) llvm/test/Transforms/IndVarSimplify/X86/inner-loop-by-latch-cond.ll (+1-1)
(modified) llvm/test/Transforms/IndVarSimplify/exit-count-select.ll (+12-12)
(modified) llvm/test/Transforms/IndVarSimplify/finite-exit-comparisons.ll (+3-3)
(modified) llvm/test/Transforms/IndVarSimplify/pr116483.ll (+4-4)
(modified) llvm/test/Transforms/IndVarSimplify/pr24783.ll (+1-1)
(modified) llvm/test/Transforms/IndVarSimplify/pr39673.ll (+1-1)
(modified) llvm/test/Transforms/IndVarSimplify/pr63763.ll (+3-3)
(modified) llvm/test/Transforms/IndVarSimplify/replace-loop-exit-folds.ll (+10-11)
(modified) llvm/test/Transforms/IndVarSimplify/rewrite-loop-exit-values-phi.ll (+4-4)
(modified) llvm/test/Transforms/IndVarSimplify/scev-expander-preserve-lcssa.ll (+7-7)
(modified) llvm/test/Transforms/IndVarSimplify/scev-invalidation.ll (+2-2)
(modified) llvm/test/Transforms/IndVarSimplify/sentinel.ll (+7-7)
(removed) llvm/test/Transforms/IndVarSimplify/sink-from-preheader.ll (-32)
(removed) llvm/test/Transforms/IndVarSimplify/sink-trapping.ll (-19)
(modified) llvm/test/Transforms/IndVarSimplify/zext-nuw.ll (+1-1)
(modified) llvm/test/Transforms/LoopDeletion/invalidate-scev-after-hoisting.ll (+1-1)
(modified) llvm/test/Transforms/LoopDistribute/laa-invalidation.ll (+1-1)
(modified) llvm/test/Transforms/LoopUnroll/unroll-cleanup.ll (+7-7)
(modified) llvm/test/Transforms/PhaseOrdering/ARM/arm_mean_q7.ll (+6-8)

diff --git a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
index b46527eb1057b..e22591f9298b4 100644
--- a/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
+++ b/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp
@@ -162,7 +162,6 @@ class IndVarSimplify {
                                  const SCEV *ExitCount,
                                  PHINode *IndVar, SCEVExpander &Rewriter);
 
-  bool sinkUnusedInvariants(Loop *L);
 
 public:
   IndVarSimplify(LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
@@ -1093,85 +1092,6 @@ linearFunctionTestReplace(Loop *L, BasicBlock *ExitingBB,
   return true;
 }
 
-//===----------------------------------------------------------------------===//
-//  sinkUnusedInvariants. A late subpass to cleanup loop preheaders.
-//===----------------------------------------------------------------------===//
-
-/// If there's a single exit block, sink any loop-invariant values that
-/// were defined in the preheader but not used inside the loop into the
-/// exit block to reduce register pressure in the loop.
-bool IndVarSimplify::sinkUnusedInvariants(Loop *L) {
-  BasicBlock *ExitBlock = L->getExitBlock();
-  if (!ExitBlock) return false;
-
-  BasicBlock *Preheader = L->getLoopPreheader();
-  if (!Preheader) return false;
-
-  bool MadeAnyChanges = false;
-  for (Instruction &I : llvm::make_early_inc_range(llvm::reverse(*Preheader))) {
-
-    // Skip BB Terminator.
-    if (Preheader->getTerminator() == &I)
-      continue;
-
-    // New instructions were inserted at the end of the preheader.
-    if (isa<PHINode>(I))
-      break;
-
-    // Don't move instructions which might have side effects, since the side
-    // effects need to complete before instructions inside the loop.  Also don't
-    // move instructions which might read memory, since the loop may modify
-    // memory. Note that it's okay if the instruction might have undefined
-    // behavior: LoopSimplify guarantees that the preheader dominates the exit
-    // block.
-    if (I.mayHaveSideEffects() || I.mayReadFromMemory())
-      continue;
-
-    // Skip debug or pseudo instructions.
-    if (I.isDebugOrPseudoInst())
-      continue;
-
-    // Skip eh pad instructions.
-    if (I.isEHPad())
-      continue;
-
-    // Don't sink alloca: we never want to sink static alloca's out of the
-    // entry block, and correctly sinking dynamic alloca's requires
-    // checks for stacksave/stackrestore intrinsics.
-    // FIXME: Refactor this check somehow?
-    if (isa<AllocaInst>(&I))
-      continue;
-
-    // Determine if there is a use in or before the loop (direct or
-    // otherwise).
-    bool UsedInLoop = false;
-    for (Use &U : I.uses()) {
-      Instruction *User = cast<Instruction>(U.getUser());
-      BasicBlock *UseBB = User->getParent();
-      if (PHINode *P = dyn_cast<PHINode>(User)) {
-        unsigned i =
-          PHINode::getIncomingValueNumForOperand(U.getOperandNo());
-        UseBB = P->getIncomingBlock(i);
-      }
-      if (UseBB == Preheader || L->contains(UseBB)) {
-        UsedInLoop = true;
-        break;
-      }
-    }
-
-    // If there is, the def must remain in the preheader.
-    if (UsedInLoop)
-      continue;
-
-    // Otherwise, sink it to the exit block.
-    I.moveBefore(ExitBlock->getFirstInsertionPt());
-    SE->forgetValue(&I);
-    MadeAnyChanges = true;
-  }
-
-  return MadeAnyChanges;
-}
-
 static void replaceExitCond(BranchInst *BI, Value *NewCond,
                             SmallVectorImpl<WeakTrackingVH> &DeadInsts) {
   auto *OldCond = BI->getCondition();
@@ -2079,9 +1999,6 @@ bool IndVarSimplify::run(Loop *L) {
 
   // The Rewriter may not be used from this point on.
 
-  // Loop-invariant instructions in the preheader that aren't used in the
-  // loop may be sunk below the loop to reduce register pressure.
-  Changed |= sinkUnusedInvariants(L);
 
   // rewriteFirstIterationLoopExitValues does not rely on the computation of
   // trip count and therefore can further simplify exit values in addition to
diff --git a/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll b/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll
index 08dcf1d7a0091..8e932e0c00d4f 100644
--- a/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll
+++ b/llvm/test/Transforms/IndVarSimplify/AMDGPU/addrspace-7-doesnt-crash.ll
@@ -7,11 +7,11 @@ define void @f(ptr addrspace(7) %arg) {
 ; CHECK-LABEL: define void @f
 ; CHECK-SAME: (ptr addrspace(7) [[ARG:%.*]]) {
 ; CHECK-NEXT:  bb:
+; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr addrspace(7) [[ARG]], i32 8
 ; CHECK-NEXT:    br label [[BB1:%.*]]
 ; CHECK:       bb1:
 ; CHECK-NEXT:    br i1 false, label [[BB2:%.*]], label [[BB1]]
 ; CHECK:       bb2:
-; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i8, ptr addrspace(7) [[ARG]], i32 8
 ; CHECK-NEXT:    br label [[BB3:%.*]]
 ; CHECK:       bb3:
 ; CHECK-NEXT:    [[I4:%.*]] = load i32, ptr addrspace(7) [[SCEVGEP]], align 4
diff --git a/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll b/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll
index 2003b1a72206d..3c6535da486aa 100644
--- a/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll
+++ b/llvm/test/Transforms/IndVarSimplify/ARM/code-size.ll
@@ -4,33 +4,31 @@
 
 define i32 @remove_loop(i32 %size) #0 {
 ; CHECK-V8M-LABEL: @remove_loop(
-; CHECK-V8M-SAME: i32 [[SIZE:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-V8M-NEXT:  entry:
-; CHECK-V8M-NEXT:    br label %[[WHILE_COND:.*]]
-; CHECK-V8M:       while.cond:
-; CHECK-V8M-NEXT:    br i1 false, label %[[WHILE_COND]], label %[[WHILE_END:.*]]
-; CHECK-V8M:       while.end:
-; CHECK-V8M-NEXT:    [[TMP0:%.*]] = add i32 [[SIZE]], 31
+; CHECK-V8M-NEXT:    [[TMP0:%.*]] = add i32 [[SIZE:%.*]], 31
 ; CHECK-V8M-NEXT:    [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[SIZE]], i32 31)
 ; CHECK-V8M-NEXT:    [[TMP1:%.*]] = sub i32 [[TMP0]], [[UMIN]]
 ; CHECK-V8M-NEXT:    [[TMP2:%.*]] = lshr i32 [[TMP1]], 5
 ; CHECK-V8M-NEXT:    [[TMP3:%.*]] = shl nuw i32 [[TMP2]], 5
 ; CHECK-V8M-NEXT:    [[TMP4:%.*]] = sub i32 [[SIZE]], [[TMP3]]
+; CHECK-V8M-NEXT:    br label [[WHILE_COND:%.*]]
+; CHECK-V8M:       while.cond:
+; CHECK-V8M-NEXT:    br i1 false, label [[WHILE_COND]], label [[WHILE_END:%.*]]
+; CHECK-V8M:       while.end:
 ; CHECK-V8M-NEXT:    ret i32 [[TMP4]]
 ;
 ; CHECK-V8A-LABEL: @remove_loop(
-; CHECK-V8A-SAME: i32 [[SIZE:%.*]]) #[[ATTR0:[0-9]+]] {
 ; CHECK-V8A-NEXT:  entry:
-; CHECK-V8A-NEXT:    br label %[[WHILE_COND:.*]]
-; CHECK-V8A:       while.cond:
-; CHECK-V8A-NEXT:    br i1 false, label %[[WHILE_COND]], label %[[WHILE_END:.*]]
-; CHECK-V8A:       while.end:
-; CHECK-V8A-NEXT:    [[TMP0:%.*]] = add i32 [[SIZE]], 31
+; CHECK-V8A-NEXT:    [[TMP0:%.*]] = add i32 [[SIZE:%.*]], 31
 ; CHECK-V8A-NEXT:    [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[SIZE]], i32 31)
 ; CHECK-V8A-NEXT:    [[TMP1:%.*]] = sub i32 [[TMP0]], [[UMIN]]
 ; CHECK-V8A-NEXT:    [[TMP2:%.*]] = lshr i32 [[TMP1]], 5
 ; CHECK-V8A-NEXT:    [[TMP3:%.*]] = shl nuw i32 [[TMP2]], 5
 ; CHECK-V8A-NEXT:    [[TMP4:%.*]] = sub i32 [[SIZE]], [[TMP3]]
+; CHECK-V8A-NEXT:    br label [[WHILE_COND:%.*]]
+; CHECK-V8A:       while.cond:
+; CHECK-V8A-NEXT:    br i1 false, label [[WHILE_COND]], label [[WHILE_END:%.*]]
+; CHECK-V8A:       while.end:
 ; CHECK-V8A-NEXT:    ret i32 [[TMP4]]
 ;
 entry:
diff --git a/llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll b/llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll
index 2261423766792..382f026e7de6a 100644
--- a/llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll
+++ b/llvm/test/Transforms/IndVarSimplify/ARM/indvar-unroll-imm-cost.ll
@@ -77,6 +77,8 @@ define dso_local arm_aapcscc void @test(ptr nocapture %pDest, ptr nocapture read
 ; CHECK-NEXT:    [[CMP2780:%.*]] = icmp ugt i32 [[ADD25]], [[J_0_LCSSA]]
 ; CHECK-NEXT:    br i1 [[CMP2780]], label [[FOR_BODY29_PREHEADER:%.*]], label [[FOR_END40]]
 ; CHECK:       for.body29.preheader:
+; CHECK-NEXT:    [[TMP10:%.*]] = sub nsw i32 [[ADD25]], [[J_0_LCSSA]]
+; CHECK-NEXT:    [[SCEVGEP93:%.*]] = getelementptr i16, ptr [[PSRCB_ADDR_1_LCSSA]], i32 [[TMP10]]
 ; CHECK-NEXT:    br label [[FOR_BODY29:%.*]]
 ; CHECK:       for.body29:
 ; CHECK-NEXT:    [[J_184:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY29]] ], [ [[J_0_LCSSA]], [[FOR_BODY29_PREHEADER]] ]
@@ -100,8 +102,6 @@ define dso_local arm_aapcscc void @test(ptr nocapture %pDest, ptr nocapture read
 ; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[ADD25]]
 ; CHECK-NEXT:    br i1 [[EXITCOND]], label [[FOR_END40_LOOPEXIT:%.*]], label [[FOR_BODY29]]
 ; CHECK:       for.end40.loopexit:
-; CHECK-NEXT:    [[TMP10:%.*]] = sub nsw i32 [[ADD25]], [[J_0_LCSSA]]
-; CHECK-NEXT:    [[SCEVGEP93:%.*]] = getelementptr i16, ptr [[PSRCB_ADDR_1_LCSSA]], i32 [[TMP10]]
 ; CHECK-NEXT:    [[SCEVGEP:%.*]] = getelementptr i16, ptr [[PSRCA_ADDR_1_LCSSA]], i32 [[TMP10]]
 ; CHECK-NEXT:    [[SCEVGEP94:%.*]] = getelementptr i32, ptr [[PDEST_ADDR_1_LCSSA]], i32 [[TMP10]]
 ; CHECK-NEXT:    br label [[FOR_END40]]
diff --git a/llvm/test/Transforms/IndVarSimplify/X86/inner-loop-by-latch-cond.ll b/llvm/test/Transforms/IndVarSimplify/X86/inner-loop-by-latch-cond.ll
index 0fa6e34cf186e..0eb9debce8177 100644
--- a/llvm/test/Transforms/IndVarSimplify/X86/inner-loop-by-latch-cond.ll
+++ b/llvm/test/Transforms/IndVarSimplify/X86/inner-loop-by-latch-cond.ll
@@ -14,6 +14,7 @@ define void @test(i64 %a) {
 ; CHECK:       outer_header:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[OUTER_LATCH:%.*]] ], [ 21, [[ENTRY:%.*]] ]
 ; CHECK-NEXT:    [[I:%.*]] = phi i64 [ 20, [[ENTRY]] ], [ [[I_NEXT:%.*]], [[OUTER_LATCH]] ]
+; CHECK-NEXT:    [[I_NEXT]] = add nuw nsw i64 [[I]], 1
 ; CHECK-NEXT:    br label [[INNER_HEADER:%.*]]
 ; CHECK:       inner_header:
 ; CHECK-NEXT:    [[J:%.*]] = phi i64 [ 1, [[OUTER_HEADER]] ], [ [[J_NEXT:%.*]], [[INNER_HEADER]] ]
@@ -22,7 +23,6 @@ define void @test(i64 %a) {
 ; CHECK-NEXT:    [[EXITCOND:%.*]] = icmp ne i64 [[J_NEXT]], [[INDVARS_IV]]
 ; CHECK-NEXT:    br i1 [[EXITCOND]], label [[INNER_HEADER]], label [[OUTER_LATCH]]
 ; CHECK:       outer_latch:
-; CHECK-NEXT:    [[I_NEXT]] = add nuw nsw i64 [[I]], 1
 ; CHECK-NEXT:    [[COND2:%.*]] = icmp ne i64 [[I_NEXT]], 40
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT:    br i1 [[COND2]], label [[OUTER_HEADER]], label [[RETURN:%.*]]
diff --git a/llvm/test/Transforms/IndVarSimplify/exit-count-select.ll b/llvm/test/Transforms/IndVarSimplify/exit-count-select.ll
index 1592b84480e3f..50c8e0fb7aabf 100644
--- a/llvm/test/Transforms/IndVarSimplify/exit-count-select.ll
+++ b/llvm/test/Transforms/IndVarSimplify/exit-count-select.ll
@@ -4,12 +4,12 @@
 define i32 @logical_and_2ops(i32 %n, i32 %m) {
 ; CHECK-LABEL: @logical_and_2ops(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = freeze i32 [[M:%.*]]
+; CHECK-NEXT:    [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[N:%.*]])
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    br i1 false, label [[LOOP]], label [[EXIT:%.*]]
 ; CHECK:       exit:
-; CHECK-NEXT:    [[TMP0:%.*]] = freeze i32 [[M:%.*]]
-; CHECK-NEXT:    [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[N:%.*]])
 ; CHECK-NEXT:    ret i32 [[UMIN]]
 ;
 entry:
@@ -28,12 +28,12 @@ exit:
 define i32 @logical_or_2ops(i32 %n, i32 %m) {
 ; CHECK-LABEL: @logical_or_2ops(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = freeze i32 [[M:%.*]]
+; CHECK-NEXT:    [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[N:%.*]])
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[LOOP]]
 ; CHECK:       exit:
-; CHECK-NEXT:    [[TMP0:%.*]] = freeze i32 [[M:%.*]]
-; CHECK-NEXT:    [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[N:%.*]])
 ; CHECK-NEXT:    ret i32 [[UMIN]]
 ;
 entry:
@@ -52,14 +52,14 @@ exit:
 define i32 @logical_and_3ops(i32 %n, i32 %m, i32 %k) {
 ; CHECK-LABEL: @logical_and_3ops(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br label [[LOOP:%.*]]
-; CHECK:       loop:
-; CHECK-NEXT:    br i1 false, label [[LOOP]], label [[EXIT:%.*]]
-; CHECK:       exit:
 ; CHECK-NEXT:    [[TMP0:%.*]] = freeze i32 [[K:%.*]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = freeze i32 [[M:%.*]]
 ; CHECK-NEXT:    [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[TMP1]])
 ; CHECK-NEXT:    [[UMIN1:%.*]] = call i32 @llvm.umin.i32(i32 [[UMIN]], i32 [[N:%.*]])
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 false, label [[LOOP]], label [[EXIT:%.*]]
+; CHECK:       exit:
 ; CHECK-NEXT:    ret i32 [[UMIN1]]
 ;
 entry:
@@ -80,14 +80,14 @@ exit:
 define i32 @logical_or_3ops(i32 %n, i32 %m, i32 %k) {
 ; CHECK-LABEL: @logical_or_3ops(
 ; CHECK-NEXT:  entry:
-; CHECK-NEXT:    br label [[LOOP:%.*]]
-; CHECK:       loop:
-; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[LOOP]]
-; CHECK:       exit:
 ; CHECK-NEXT:    [[TMP0:%.*]] = freeze i32 [[K:%.*]]
 ; CHECK-NEXT:    [[TMP1:%.*]] = freeze i32 [[M:%.*]]
 ; CHECK-NEXT:    [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[TMP0]], i32 [[TMP1]])
 ; CHECK-NEXT:    [[UMIN1:%.*]] = call i32 @llvm.umin.i32(i32 [[UMIN]], i32 [[N:%.*]])
+; CHECK-NEXT:    br label [[LOOP:%.*]]
+; CHECK:       loop:
+; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[LOOP]]
+; CHECK:       exit:
 ; CHECK-NEXT:    ret i32 [[UMIN1]]
 ;
 entry:
diff --git a/llvm/test/Transforms/IndVarSimplify/finite-exit-comparisons.ll b/llvm/test/Transforms/IndVarSimplify/finite-exit-comparisons.ll
index e006d9f6696ca..f798eb281f51a 100644
--- a/llvm/test/Transforms/IndVarSimplify/finite-exit-comparisons.ll
+++ b/llvm/test/Transforms/IndVarSimplify/finite-exit-comparisons.ll
@@ -932,6 +932,9 @@ for.end:                                          ; preds = %for.body, %entry
 define i16 @ult_multiuse_profit(i16 %n.raw, i8 %start) mustprogress {
 ; CHECK-LABEL: @ult_multiuse_profit(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP2:%.*]] = add i8 [[START:%.*]], 1
+; CHECK-NEXT:    [[TMP1:%.*]] = zext i8 [[TMP2]] to i16
+; CHECK-NEXT:    [[UMAX:%.*]] = call i16 @llvm.umax.i16(i16 [[TMP1]], i16 254)
 ; CHECK-NEXT:    [[TMP0:%.*]] = trunc i16 254 to i8
 ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
 ; CHECK:       for.body:
@@ -940,9 +943,6 @@ define i16 @ult_multiuse_profit(i16 %n.raw, i8 %start) mustprogress {
 ; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i8 [[IV_NEXT]], [[TMP0]]
 ; CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
 ; CHECK:       for.end:
-; CHECK-NEXT:    [[TMP1:%.*]] = add i8 [[START:%.*]], 1
-; CHECK-NEXT:    [[TMP2:%.*]] = zext i8 [[TMP1]] to i16
-; CHECK-NEXT:    [[UMAX:%.*]] = call i16 @llvm.umax.i16(i16 [[TMP2]], i16 254)
 ; CHECK-NEXT:    ret i16 [[UMAX]]
 ;
 entry:
diff --git a/llvm/test/Transforms/IndVarSimplify/pr116483.ll b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
index 093e25a3caa81..e9e0d22bf960a 100644
--- a/llvm/test/Transforms/IndVarSimplify/pr116483.ll
+++ b/llvm/test/Transforms/IndVarSimplify/pr116483.ll
@@ -4,16 +4,16 @@
 define i32 @test() {
 ; CHECK-LABEL: define i32 @test() {
 ; CHECK-NEXT:  [[ENTRY:.*:]]
-; CHECK-NEXT:    br label %[[LOOP_BODY:.*]]
-; CHECK:       [[LOOP_BODY]]:
-; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[LOOP_BODY]]
-; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    [[XOR:%.*]] = xor i32 0, 3
 ; CHECK-NEXT:    [[MUL:%.*]] = mul i32 [[XOR]], 329
 ; CHECK-NEXT:    [[CONV:%.*]] = trunc i32 [[MUL]] to i16
 ; CHECK-NEXT:    [[SEXT:%.*]] = shl i16 [[CONV]], 8
 ; CHECK-NEXT:    [[CONV1:%.*]] = ashr i16 [[SEXT]], 8
 ; CHECK-NEXT:    [[CONV3:%.*]] = zext i16 [[CONV1]] to i32
+; CHECK-NEXT:    br label %[[LOOP_BODY:.*]]
+; CHECK:       [[LOOP_BODY]]:
+; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[LOOP_BODY]]
+; CHECK:       [[EXIT]]:
 ; CHECK-NEXT:    ret i32 [[CONV3]]
 ;
 entry:
diff --git a/llvm/test/Transforms/IndVarSimplify/pr24783.ll b/llvm/test/Transforms/IndVarSimplify/pr24783.ll
index c521bcaf59d49..37ecf42ea0fd3 100644
--- a/llvm/test/Transforms/IndVarSimplify/pr24783.ll
+++ b/llvm/test/Transforms/IndVarSimplify/pr24783.ll
@@ -7,11 +7,11 @@ target triple = "powerpc64-unknown-linux-gnu"
 define void @f(ptr %end.s, ptr %loc, i32 %p) {
 ; CHECK-LABEL: @f(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[END:%.*]] = getelementptr inbounds i32, ptr [[END_S:%.*]], i32 [[P:%.*]]
 ; CHECK-NEXT:    br label [[WHILE_BODY_I:%.*]]
 ; CHECK:       while.body.i:
 ; CHECK-NEXT:    br i1 true, label [[LOOP_EXIT:%.*]], label [[WHILE_BODY_I]]
 ; CHECK:       loop.exit:
-; CHECK-NEXT:    [[END:%.*]] = getelementptr inbounds i32, ptr [[END_S:%.*]], i32 [[P:%.*]]
 ; CHECK-NEXT:    store ptr [[END]], ptr [[LOC:%.*]], align 8
 ; CHECK-NEXT:    ret void
 ;
diff --git a/llvm/test/Transforms/IndVarSimplify/pr39673.ll b/llvm/test/Transforms/IndVarSimplify/pr39673.ll
index 7b093b34b91ad..3cee1ab7be881 100644
--- a/llvm/test/Transforms/IndVarSimplify/pr39673.ll
+++ b/llvm/test/Transforms/IndVarSimplify/pr39673.ll
@@ -148,6 +148,7 @@ loop2.end:                                       ; preds = %loop2
 define i16 @neg_loop_carried(i16 %arg) {
 ; CHECK-LABEL: @neg_loop_carried(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = add i16 [[ARG:%.*]], 2
 ; CHECK-NEXT:    br label [[LOOP1:%.*]]
 ; CHECK:       loop1:
 ; CHECK-NEXT:    [[L1:%.*]] = phi i16 [ 0, [[ENTRY:%.*]] ], [ [[L1_ADD:%.*]], [[LOOP1]] ]
@@ -155,7 +156,6 @@ define i16 @neg_loop_carried(i16 %arg) {
 ; CHECK-NEXT:    [[CMP1:%.*]] = icmp ult i16 [[L1_ADD]], 2
 ; CHECK-NEXT:    br i1 [[CMP1]], label [[LOOP1]], label [[LOOP2_PREHEADER:%.*]]
 ; CHECK:       loop2.preheader:
-; CHECK-NEXT:    [[TMP0:%.*]] = add i16 [[ARG:%.*]], 2
 ; CHECK-NEXT:    br label [[LOOP2:%.*]]
 ; CHECK:       loop2:
 ; CHECK-NEXT:    [[K2:%.*]] = phi i16 [ [[K2_ADD:%.*]], [[LOOP2]] ], [ [[TMP0]], [[LOOP2_PREHEADER]] ]
diff --git a/llvm/test/Transforms/IndVarSimplify/pr63763.ll b/llvm/test/Transforms/IndVarSimplify/pr63763.ll
index 427db1e67410a..a5fde67d6140a 100644
--- a/llvm/test/Transforms/IndVarSimplify/pr63763.ll
+++ b/llvm/test/Transforms/IndVarSimplify/pr63763.ll
@@ -16,13 +16,13 @@ define i32 @test(i1 %c) {
 ; CHECK-NEXT:    [[CONV2:%.*]] = ashr exact i32 [[SEXT]], 24
 ; CHECK-NEXT:    [[INVARIANT_OP:%.*]] = sub nsw i32 7, [[CONV2]]
 ; CHECK-NEXT:    call void @use(i32 [[INVARIANT_OP]])
+; CHECK-NEXT:    [[SEXT_US:%.*]] = shl i32 [[SEL]], 24
+; CHECK-NEXT:    [[CONV2_US:%.*]] = ashr exact i32 [[SEXT_US]], 24
+; CHECK-NEXT:    [[INVARIANT_OP_US:%.*]] = sub nsw i32 7, [[CONV2_US]]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       loop:
 ; CHECK-NEXT:    br i1 true, label [[EXIT:%.*]], label [[LOOP]]
 ; CHECK:       exit:
-; CHECK-NEXT:    [[SEXT_US:%.*]] = shl i32 [[SEL]], 24
-; CHECK-NEXT:    [[CONV2_US:%.*]] = ashr exact i32 [[SEXT_US]], 24
-; CHECK-NEXT:    [[INVARIANT_OP_US:%.*]] = sub nsw i32 7, [[CONV2_US]]
 ; CHECK-NEXT:    ret i32 [[INVARIANT_OP_US]]
 ;
 entry:
diff --git a/llvm/test/Transforms/IndVarSimplify/replace-loop-exit-folds.ll b/llvm/test/Transforms/IndVarSimplify/replace-loop-exit-folds.ll
index b3162de0f2245..7cdc98a6c4f7c 100644
--- a/llvm/test/Transforms/IndVarSimplify/replace-loop-exit-folds.ll
+++ b/llvm/test/Transforms/IndVarSimplify/replace-loop-exit-folds.ll
@@ -4,22 +4,21 @@
 target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
 
 define i32 @remove_loop(i32 %size) {
-; CHECK-LABEL: define i32 @remove_loop(
-; CHECK-SAME: i32 [[SIZE:%.*]]) {
-; CHECK-NEXT:  [[ENTRY:.*]]:
-; CHECK-NEXT:    br label %[[WHILE_COND:.*]]
-; CHECK:       [[WHILE_COND]]:
-; CHECK-NEXT:    [[SIZE_ADDR_0:%.*]] = phi i32 [ [[SIZE]], %[[ENTRY]] ], [ [[SUB:%.*]], %[[WHILE_COND]] ]
-; CHECK-NEXT:    [[CMP:%.*]] = icmp ugt i32 [[SIZE_ADDR_0]], 31
-; CHECK-NEXT:    [[SUB]] = add i32 [[SIZE_ADDR_0]], -32
-; CHECK-NEXT:    br i1 [[CMP]], label %[[WHILE_COND]], label %[[WHILE_END:.*]]
-; CHECK:       [[WHILE_END]]:
-; CHECK-NEXT:    [[TMP0:%.*]] = add i32 [[SIZE]], 31
+; CHECK-LABEL: @remove_loop(
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    [[TMP0:%.*]] = add i32 [[SIZE:%.*]], 31
 ; CHECK-NEXT:    [[UMIN:%.*]] = call i32 @llvm.umin.i32(i32 [[SIZE]], i32 31)
 ; CHECK-NEXT:    [[TMP1:%.*]] = sub i32 [[TMP0]], [[UMIN]]
 ; CHECK-NEXT:    [[TMP2:%.*]] = lshr i32 [[TMP1]], 5
 ; CHECK-NEXT:    [[TMP3:%.*]] = shl nuw i32 [[TMP2]], 5
 ; CHECK-NEXT:    [[TMP4:%.*]] = sub i32 [[SIZE]], [[TMP3]]
+; CHECK-NEXT:    br label [...
[truncated]

VigneshwarJ · 2025-11-23T22:39:57Z

The above mentioned PR #157559 is reverted due to increased stack size and stack overflow when the stack size is very low. This was due to way the MSSAU was updated. I have a fix for that, but there were issues raised for performance regression with that change. So I wanted to see if removing this optimization would be a good way. Else I will raise a PR again for the #157559 again with those fixes.

github-actions · 2025-11-23T23:38:23Z

✅ With the latest revision this PR passed the C/C++ code formatter.

VigneshwarJ · 2025-12-04T18:20:20Z

Pinging for review, if this is not a viable solution, This the reland PR with the fix for stack overflow on smaller stack size. #170204

arsenm · 2025-12-04T18:52:50Z

llvm/test/Transforms/IndVarSimplify/sink-trapping.ll

@@ -1,19 +0,0 @@
-; RUN: opt < %s -passes=indvars -S | FileCheck %s


Probably shouldn't be deleting tests?

arsenm · 2025-12-04T18:55:30Z

Tried sinking the invariant memory instructions, but that causes performance regression in small cases

Probably more worthwhile to investigate those directly? Just ripping out an optimization that sounds obviously useful doesn't seem like the right strategy

VigneshwarJ · 2025-12-04T23:45:10Z

That's a fair point.
However, this sinking logic was broken for a long time (double iteration skipped a lot of instructions) until fixed recently in #135205. That fix exposed a major issue: it doesn't sink invariant memory instructions, which extends their liveness range to the exit block and causes significant register spills.

I tried fixing this by sinking invariant memory instructions too (#157559), but that caused perf regressions in small cases which are non-trivial. (this issue #166671).

Also, this logic is arguably out of place in IndVarSimplify and "fixing" it is fragile/regressive.

Even with the sinking invariant load change, LICM conservatively doesn't sink invariant load if there's any memdef after( even if they don't alias) , hence still causing extended liverange

define void @test_loop_invariant_sinking(ptr %ptr, ptr %ptr2, i32 %N) {
; CHECK-LABEL: define void @test_loop_invariant_sinking(
; CHECK-SAME: ptr [[PTR:%.*]], ptr [[PTR2:%.*]], i32 [[N:%.*]]) {
; CHECK-NEXT:  [[ENTRY:.*]]:
; CHECK-NEXT:    [[VAL1:%.*]] = load i32, ptr [[PTR]], align 4
; CHECK-NEXT:    store i32 20, ptr [[PTR2]], align 4
; CHECK-NEXT:    br label %[[LOOP:.*]]
; CHECK:       [[LOOP]]:
; CHECK-NEXT:    [[I:%.*]] = phi i32 [ 0, %[[ENTRY]] ], [ [[I_NEXT:%.*]], %[[LOOP]] ]
; CHECK-NEXT:    [[I_NEXT]] = add i32 [[I]], 1
; CHECK-NEXT:    [[COND:%.*]] = icmp slt i32 [[I_NEXT]], [[N]]
; CHECK-NEXT:    br i1 [[COND]], label %[[LOOP]], label %[[EXIT:.*]]
; CHECK:       [[EXIT]]:
; CHECK-NEXT:    [[INV0:%.*]] = add i32 [[VAL1]], 10
; CHECK-NEXT:    [[VAL2:%.*]] = load i32, ptr [[PTR]], align 4
; CHECK-NEXT:    [[INV2:%.*]] = add i32 [[VAL2]], 20
; CHECK-NEXT:    [[INV3:%.*]] = mul i32 [[INV2]], [[INV0]]
; CHECK-NEXT:    ret void
;
entry:
  ; Preheader
  %val1 = load i32, ptr %ptr
  %inv0 = add i32 %val1, 10
  store i32 20, ptr %ptr2
  %val2 = load i32, ptr %ptr
  %inv2 = add i32 %val2, 20
  %inv3 = mul i32 %inv2, %inv0
  br label %loop

loop:
  %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %i.next = add i32 %i, 1
  %cond = icmp slt i32 %i.next, %N
  br i1 %cond, label %loop, label %exit

exit:
  ret void
}

Given that this logic was broken for years, and "fixing" it causes significant regressions that are difficult to fix without introducing others, I believe the code is currently a net negative. Removing this misplaced and problematic logic is cleaner than maintaining a fragile implementation that fights against register allocation.

In any case if we want to keep this optimization, this #170204 has the reland patch.

llvmbot added backend:AMDGPU llvm:transforms labels Nov 23, 2025

VigneshwarJ requested a review from nikic November 23, 2025 22:35

VigneshwarJ requested a review from srpande November 23, 2025 22:40

Merge branch 'main' into invariantsreove

f7c8e8f

VigneshwarJ force-pushed the invariantsreove branch from a5d6e38 to f7c8e8f Compare November 23, 2025 23:47

arsenm reviewed Dec 4, 2025

View reviewed changes

VigneshwarJ requested review from arsenm and bcahoon December 4, 2025 23:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[IndVarSimplify] Remove sinkunusedInvariants #169250

[IndVarSimplify] Remove sinkunusedInvariants #169250

Uh oh!

VigneshwarJ commented Nov 23, 2025

Uh oh!

llvmbot commented Nov 23, 2025 •

edited

Loading

Uh oh!

VigneshwarJ commented Nov 23, 2025

Uh oh!

github-actions bot commented Nov 23, 2025 •

edited

Loading

Uh oh!

VigneshwarJ commented Dec 4, 2025

Uh oh!

arsenm Dec 4, 2025

Uh oh!

arsenm commented Dec 4, 2025

Uh oh!

VigneshwarJ commented Dec 4, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -1,19 +0,0 @@
		; RUN: opt < %s -passes=indvars -S \| FileCheck %s

[IndVarSimplify] Remove sinkunusedInvariants #169250

Are you sure you want to change the base?

[IndVarSimplify] Remove sinkunusedInvariants #169250

Uh oh!

Conversation

VigneshwarJ commented Nov 23, 2025

Uh oh!

llvmbot commented Nov 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

VigneshwarJ commented Nov 23, 2025

Uh oh!

github-actions bot commented Nov 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

VigneshwarJ commented Dec 4, 2025

Uh oh!

arsenm Dec 4, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm commented Dec 4, 2025

Uh oh!

VigneshwarJ commented Dec 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

llvmbot commented Nov 23, 2025 •

edited

Loading

github-actions bot commented Nov 23, 2025 •

edited

Loading

VigneshwarJ commented Dec 4, 2025 •

edited

Loading