Conversation

@mark-sed (Contributor) commented Oct 9, 2025

This patch adds a profitability check to loop rotation: rotate the loop if doing so makes the latch exit count computable.

This form benefits runtime loop unrolling as well as loop vectorization, which requires the loop to be bottom-tested.

I have tried different approaches to improving runtime unrolling (#146540 and #148243), none of which seemed the right way to go.

After discussion with @annamthomas, we now think this additional heuristic of checking for a computable latch exit count is a stronger condition worth adding -- a good canonical form to rotate a loop into. This form helps with runtime loop unrolling (see the test), and LoopVectorization also requires rotation to make the loop bottom-tested (an unconditional latch -> rotate the loop to make the latch conditional). Other passes prefer this bottom-tested form as well (a rough analogue of the loop shape is sketched after the list below):

  1. EarlyExitVectorization:
    PSE.getSE()->getPredicatedExitCount(TheLoop, LatchBB, &Predicates))) {
  2. LoopConstrainer:
    if (isa<SCEVCouldNotCompute>(MaxBETakenCount)) {
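
For illustration, here is a rough C++ analogue of the IR test case in the patch below; the function and variable names are illustrative only. The header exit (n == 0) has a SCEV-computable count, while the latch exit (*p == 0) is data-dependent and does not:

    #include <stdint.h>

    // Top-tested loop: the header exit (n == 0) is computable by SCEV,
    // but the latch exit (*p == 0) depends on loaded data and is not.
    // Rotation moves the computable test down to the latch, so the loop
    // becomes bottom-tested with a computable latch exit count.
    void test(uint64_t n, int *p) {
      while (n != 0) {   // header exit: trip count computable
        ++n;
        if (*p == 0)     // latch exit: not computable
          return;
      }
    }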

@llvmbot (Member) commented Oct 9, 2025

@llvm/pr-subscribers-llvm-transforms

Author: Marek Sedláček (mark-sed)


Review request: @fhahn @nikic @davemgreen


Full diff: https://github.com/llvm/llvm-project/pull/162654.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/LoopRotationUtils.cpp (+15-1)
  • (added) llvm/test/Transforms/LoopUnroll/X86/runtime-unroll-after-rotate-if-computable.ll (+122)
diff --git a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
index 0c8d6fa47b9ae..2abc56d95a393 100644
--- a/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopRotationUtils.cpp
@@ -200,6 +200,19 @@ static bool profitableToRotateLoopExitingLatch(Loop *L) {
   return false;
 }
 
+// Check whether rotating the loop would make the latch exit count
+// computable. This form is beneficial to runtime loop unrolling as well as
+// loop vectorization, which requires the loop to be bottom-tested.
+static bool rotationMakesLoopComputable(Loop *L, ScalarEvolution *SE) {
+  BasicBlock *Header = L->getHeader();
+  BranchInst *BI = dyn_cast<BranchInst>(Header->getTerminator());
+  assert(BI && BI->isConditional() && "need header with conditional exit");
+  if (SE && isa<SCEVCouldNotCompute>(SE->getExitCount(L, L->getLoopLatch())) &&
+      !isa<SCEVCouldNotCompute>(SE->getExitCount(L, Header)))
+    return true;
+  return false;
+}
+
 static void updateBranchWeights(BranchInst &PreHeaderBI, BranchInst &LoopBI,
                                 bool HasConditionalPreHeader,
                                 bool SuccsSwapped) {
@@ -364,7 +377,8 @@ bool LoopRotate::rotateLoop(Loop *L, bool SimplifiedLatch) {
   // Rotate if the loop latch was just simplified. Or if it makes the loop exit
   // count computable. Or if we think it will be profitable.
   if (L->isLoopExiting(OrigLatch) && !SimplifiedLatch && IsUtilMode == false &&
-      !profitableToRotateLoopExitingLatch(L))
+      !profitableToRotateLoopExitingLatch(L) &&
+      !rotationMakesLoopComputable(L, SE))
     return Rotated;
 
   // Check size of original header and reject loop if it is very big or we can't
diff --git a/llvm/test/Transforms/LoopUnroll/X86/runtime-unroll-after-rotate-if-computable.ll b/llvm/test/Transforms/LoopUnroll/X86/runtime-unroll-after-rotate-if-computable.ll
new file mode 100644
index 0000000000000..2a408fbb364da
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/X86/runtime-unroll-after-rotate-if-computable.ll
@@ -0,0 +1,122 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt --passes='loop(loop-rotate),loop-unroll' -unroll-runtime=true -unroll-runtime-other-exit-predictable=1 -S %s | FileCheck %s
+; RUN: opt --passes='loop-unroll' -unroll-runtime=true -unroll-runtime-other-exit-predictable=1 -S %s | FileCheck %s -check-prefix=NO-ROTATE
+
+target triple = "x86_64-unknown-linux-gnu"
+
+; Test that loop gets unrolled if rotated (becomes computable after rotation).
+define void @test(i64 %0, ptr %1) {
+; CHECK-LABEL: define void @test(
+; CHECK-SAME: i64 [[TMP0:%.*]], ptr [[TMP1:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[B1:%.*]] = icmp eq i64 [[TMP0]], 0
+; CHECK-NEXT:    br i1 [[B1]], label %[[AFTER:.*]], label %[[BODY_LR_PH:.*]]
+; CHECK:       [[BODY_LR_PH]]:
+; CHECK-NEXT:    [[TMP5:%.*]] = sub i64 0, [[TMP0]]
+; CHECK-NEXT:    [[TMP2:%.*]] = freeze i64 [[TMP5]]
+; CHECK-NEXT:    [[TMP3:%.*]] = add i64 [[TMP2]], -1
+; CHECK-NEXT:    [[XTRAITER:%.*]] = and i64 [[TMP2]], 7
+; CHECK-NEXT:    [[LCMP_MOD:%.*]] = icmp ne i64 [[XTRAITER]], 0
+; CHECK-NEXT:    br i1 [[LCMP_MOD]], label %[[BODY_PROL_PREHEADER:.*]], label %[[BODY_PROL_LOOPEXIT:.*]]
+; CHECK:       [[BODY_PROL_PREHEADER]]:
+; CHECK-NEXT:    br label %[[BODY_PROL:.*]]
+; CHECK:       [[BODY_PROL]]:
+; CHECK-NEXT:    [[A2_PROL:%.*]] = phi i64 [ [[TMP0]], %[[BODY_PROL_PREHEADER]] ], [ [[A_PROL:%.*]], %[[HEADER_PROL:.*]] ]
+; CHECK-NEXT:    [[PROL_ITER:%.*]] = phi i64 [ 0, %[[BODY_PROL_PREHEADER]] ], [ [[PROL_ITER_NEXT:%.*]], %[[HEADER_PROL]] ]
+; CHECK-NEXT:    [[C_PROL:%.*]] = add i64 [[A2_PROL]], 1
+; CHECK-NEXT:    [[D_PROL:%.*]] = load i32, ptr [[TMP1]], align 4
+; CHECK-NEXT:    [[E_PROL:%.*]] = icmp eq i32 [[D_PROL]], 0
+; CHECK-NEXT:    br i1 [[E_PROL]], label %[[END_LOOPEXIT3:.*]], label %[[HEADER_PROL]]
+; CHECK:       [[HEADER_PROL]]:
+; CHECK-NEXT:    [[A_PROL]] = phi i64 [ [[C_PROL]], %[[BODY_PROL]] ]
+; CHECK-NEXT:    [[B_PROL:%.*]] = icmp eq i64 [[A_PROL]], 0
+; CHECK-NEXT:    [[PROL_ITER_NEXT]] = add i64 [[PROL_ITER]], 1
+; CHECK-NEXT:    [[PROL_ITER_CMP:%.*]] = icmp ne i64 [[PROL_ITER_NEXT]], [[XTRAITER]]
+; CHECK-NEXT:    br i1 [[PROL_ITER_CMP]], label %[[BODY_PROL]], label %[[BODY_PROL_LOOPEXIT_UNR_LCSSA:.*]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[BODY_PROL_LOOPEXIT_UNR_LCSSA]]:
+; CHECK-NEXT:    [[A2_UNR_PH:%.*]] = phi i64 [ [[A_PROL]], %[[HEADER_PROL]] ]
+; CHECK-NEXT:    br label %[[BODY_PROL_LOOPEXIT]]
+; CHECK:       [[BODY_PROL_LOOPEXIT]]:
+; CHECK-NEXT:    [[A2_UNR:%.*]] = phi i64 [ [[TMP0]], %[[BODY_LR_PH]] ], [ [[A2_UNR_PH]], %[[BODY_PROL_LOOPEXIT_UNR_LCSSA]] ]
+; CHECK-NEXT:    [[TMP6:%.*]] = icmp ult i64 [[TMP3]], 7
+; CHECK-NEXT:    br i1 [[TMP6]], label %[[HEADER_AFTER_CRIT_EDGE:.*]], label %[[BODY_LR_PH_NEW:.*]]
+; CHECK:       [[BODY_LR_PH_NEW]]:
+; CHECK-NEXT:    br label %[[BODY:.*]]
+; CHECK:       [[HEADER:.*]]:
+; CHECK-NEXT:    br i1 false, label %[[END_LOOPEXIT:.*]], label %[[HEADER_1:.*]]
+; CHECK:       [[HEADER_1]]:
+; CHECK-NEXT:    br i1 false, label %[[END_LOOPEXIT]], label %[[HEADER_2:.*]]
+; CHECK:       [[HEADER_2]]:
+; CHECK-NEXT:    br i1 false, label %[[END_LOOPEXIT]], label %[[HEADER_3:.*]]
+; CHECK:       [[HEADER_3]]:
+; CHECK-NEXT:    br i1 false, label %[[END_LOOPEXIT]], label %[[HEADER_4:.*]]
+; CHECK:       [[HEADER_4]]:
+; CHECK-NEXT:    br i1 false, label %[[END_LOOPEXIT]], label %[[HEADER_5:.*]]
+; CHECK:       [[HEADER_5]]:
+; CHECK-NEXT:    br i1 false, label %[[END_LOOPEXIT]], label %[[HEADER_6:.*]]
+; CHECK:       [[HEADER_6]]:
+; CHECK-NEXT:    [[C_7:%.*]] = add i64 [[A2:%.*]], 8
+; CHECK-NEXT:    br i1 false, label %[[END_LOOPEXIT]], label %[[HEADER_7:.*]]
+; CHECK:       [[HEADER_7]]:
+; CHECK-NEXT:    [[B_7:%.*]] = icmp eq i64 [[C_7]], 0
+; CHECK-NEXT:    br i1 [[B_7]], label %[[HEADER_AFTER_CRIT_EDGE_UNR_LCSSA:.*]], label %[[BODY]]
+; CHECK:       [[BODY]]:
+; CHECK-NEXT:    [[A2]] = phi i64 [ [[A2_UNR]], %[[BODY_LR_PH_NEW]] ], [ [[C_7]], %[[HEADER_7]] ]
+; CHECK-NEXT:    [[D:%.*]] = load i32, ptr [[TMP1]], align 4
+; CHECK-NEXT:    [[E:%.*]] = icmp eq i32 [[D]], 0
+; CHECK-NEXT:    br i1 [[E]], label %[[END_LOOPEXIT]], label %[[HEADER]]
+; CHECK:       [[END_LOOPEXIT]]:
+; CHECK-NEXT:    br label %[[END:.*]]
+; CHECK:       [[END_LOOPEXIT3]]:
+; CHECK-NEXT:    br label %[[END]]
+; CHECK:       [[END]]:
+; CHECK-NEXT:    ret void
+; CHECK:       [[HEADER_AFTER_CRIT_EDGE_UNR_LCSSA]]:
+; CHECK-NEXT:    br label %[[HEADER_AFTER_CRIT_EDGE]]
+; CHECK:       [[HEADER_AFTER_CRIT_EDGE]]:
+; CHECK-NEXT:    br label %[[AFTER]]
+; CHECK:       [[AFTER]]:
+; CHECK-NEXT:    ret void
+;
+; NO-ROTATE-LABEL: define void @test(
+; NO-ROTATE-SAME: i64 [[TMP0:%.*]], ptr [[TMP1:%.*]]) {
+; NO-ROTATE-NEXT:  [[ENTRY:.*]]:
+; NO-ROTATE-NEXT:    br label %[[HEADER:.*]]
+; NO-ROTATE:       [[HEADER]]:
+; NO-ROTATE-NEXT:    [[A_PROL:%.*]] = phi i64 [ [[TMP0]], %[[ENTRY]] ], [ [[C:%.*]], %[[BODY:.*]] ]
+; NO-ROTATE-NEXT:    [[B_PROL:%.*]] = icmp eq i64 [[A_PROL]], 0
+; NO-ROTATE-NEXT:    br i1 [[B_PROL]], label %[[AFTER:.*]], label %[[BODY]]
+; NO-ROTATE:       [[BODY]]:
+; NO-ROTATE-NEXT:    [[C]] = add i64 [[A_PROL]], 1
+; NO-ROTATE-NEXT:    [[D:%.*]] = load i32, ptr [[TMP1]], align 4
+; NO-ROTATE-NEXT:    [[E:%.*]] = icmp eq i32 [[D]], 0
+; NO-ROTATE-NEXT:    br i1 [[E]], label %[[END:.*]], label %[[HEADER]]
+; NO-ROTATE:       [[END]]:
+; NO-ROTATE-NEXT:    ret void
+; NO-ROTATE:       [[AFTER]]:
+; NO-ROTATE-NEXT:    ret void
+;
+entry:
+  br label %header
+
+header:
+  %a = phi i64 [ %0, %entry ], [ %c, %body ]
+  %b = icmp eq i64 %a, 0
+  br i1 %b, label %after, label %body
+
+body:
+  %c = add i64 %a, 1
+  %d = load i32, ptr %1, align 4
+  %e = icmp eq i32 %d, 0
+  br i1 %e, label %end, label %header
+
+end:
+  ret void
+
+after:
+  ret void
+}
+;.
+; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]]}
+; CHECK: [[META1]] = !{!"llvm.loop.unroll.disable"}
+;.

@annamthomas (Contributor) commented

This form helps with runtime loop unrolling (see the test), and LoopVectorization also requires rotation to make the loop bottom-tested (an unconditional latch -> rotate the loop to make the latch conditional). Other passes prefer this bottom-tested form as well

Just to clarify, the main reason for loop rotation has always been to make the loop bottom-tested. What is being proposed here is a stronger heuristic: make the bottom-tested loop also countable (if possible).

As shown in the test case, this helps loop unrolling. It can also potentially let us vectorize more early-exit loops (as seen in the code referenced in the description); those consumers share the guard pattern sketched below.
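
A minimal sketch of that guard pattern, assuming a simplified helper (hasComputableLatchCount is a hypothetical name; the real checks live in the code quoted in the description):

    #include "llvm/Analysis/LoopInfo.h"
    #include "llvm/Analysis/ScalarEvolution.h"

    using namespace llvm;

    // Passes that need a trip count ask ScalarEvolution for the latch exit
    // count and bail out when it is SCEVCouldNotCompute. Rotation that makes
    // this count computable therefore unblocks them.
    static bool hasComputableLatchCount(Loop *L, ScalarEvolution &SE) {
      BasicBlock *Latch = L->getLoopLatch();
      if (!Latch)
        return false; // no unique latch block
      return !isa<SCEVCouldNotCompute>(SE.getExitCount(L, Latch));
    }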

@nikic (Contributor) commented Oct 9, 2025

It looks like this has a large impact on compile time: https://llvm-compile-time-tracker.com/compare.php?from=5841319aca0f2596cc00ab83d54ec07c9b70da3c&to=08496a966850e06788de4bbf75e521ee5ed1363c&stat=instructions:u

The clang thin link is up +0.5%.

Don't know whether that's due to the transform itself or whether it's because this enables expensive followup transforms.

@mark-sed (Contributor, Author) commented

Don't know whether that's due to the transform itself or whether it's because this enables expensive followup transforms.

@nikic That was unexpected. Do you have any recommendation on how to handle this?
We can see that this enables potential speedups from unrolling and vectorization where they were not possible before, and I am not sure what trade-off of compile time for runtime performance is justifiable.

One option I could propose is enabling this only at higher optimization levels.
Alternatively, it could be controlled by an off-by-default flag (a sketch of that variant follows).
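
A minimal sketch of the off-by-default flag variant, assuming a hypothetical option name (loop-rotate-if-computable is illustrative, not part of the patch):

    #include "llvm/Support/CommandLine.h"

    using namespace llvm;

    // Hypothetical off-by-default flag gating the new heuristic; the name
    // is illustrative only.
    static cl::opt<bool> RotateIfComputable(
        "loop-rotate-if-computable", cl::init(false), cl::Hidden,
        cl::desc("Rotate loops when rotation makes the latch exit count "
                 "computable"));

    // The bail-out in LoopRotate::rotateLoop would then also check the flag:
    //   !(RotateIfComputable && rotationMakesLoopComputable(L, SE))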
