Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoopPeel] LCSSA form is destroyed by LoopPeel, preserve it #78696

Merged
merged 3 commits into from
Feb 21, 2024

Conversation

vedantparanjape-amd
Copy link
Member

LoopPeel util takes in the PreserveLCSSA argument, but the transformation doesn't seem to preserve the LCSSA form. But, it passes on the same argument to simplifyLoop which checks if the loop is in a valid LCSSA form, when (PreserveLCSSA = true).

This causes an assert in simplifyLoop when (PreserveLCSSA = true), as LoopPeel doesn't preserve LCSSA, and thus crashes at the following assert.

assert(L->isRecursivelyLCSSAForm(*DT, *LI) &&
"Requested to preserve LCSSA, but it's already broken.");

This patch fixes #77118, it tries to form LCSSA at the end of the optimization if (PreserveLCSSA = true) and the loop is not in LCSSA form. It also adds a test case of LoopUnrolling which crashed earlier.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 19, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Vedant Paranjape (vedantparanjape-amd)

Changes

LoopPeel util takes in the PreserveLCSSA argument, but the transformation doesn't seem to preserve the LCSSA form. But, it passes on the same argument to simplifyLoop which checks if the loop is in a valid LCSSA form, when (PreserveLCSSA = true).

This causes an assert in simplifyLoop when (PreserveLCSSA = true), as LoopPeel doesn't preserve LCSSA, and thus crashes at the following assert.

assert(L->isRecursivelyLCSSAForm(*DT, *LI) &&
"Requested to preserve LCSSA, but it's already broken.");

This patch fixes llvm#77118, it tries to form LCSSA at the end of the optimization if (PreserveLCSSA = true) and the loop is not in LCSSA form. It also adds a test case of LoopUnrolling which crashed earlier.


Full diff: https://github.com/llvm/llvm-project/pull/78696.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/LoopPeel.cpp (+2)
  • (added) llvm/test/Transforms/LoopUnroll/gh-issue77118-broken-lcssa-form.ll (+117)
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index f76fa3bb6c61144..3e0545656d493c6 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -1054,6 +1054,8 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
   assert(DT.verify(DominatorTree::VerificationLevel::Fast));
 #endif
 
+  if (PreserveLCSSA && !L->isRecursivelyLCSSAForm(DT, *LI))
+    formLCSSARecursively(*L, DT, LI, SE);
   // FIXME: Incrementally update loop-simplify
   simplifyLoop(L, &DT, LI, SE, AC, nullptr, PreserveLCSSA);
 
diff --git a/llvm/test/Transforms/LoopUnroll/gh-issue77118-broken-lcssa-form.ll b/llvm/test/Transforms/LoopUnroll/gh-issue77118-broken-lcssa-form.ll
new file mode 100644
index 000000000000000..fe0607e59421c0e
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/gh-issue77118-broken-lcssa-form.ll
@@ -0,0 +1,117 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes=loop-unroll < %s | FileCheck %s
+source_filename = "./reduced.ll"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @wombat() gc "statepoint-example" !prof !0 {
+; CHECK-LABEL: define void @wombat(
+; CHECK-SAME: ) gc "statepoint-example" !prof [[PROF0:![0-9]+]] {
+; CHECK-NEXT:  bb:
+; CHECK-NEXT:    br label [[BB1:%.*]]
+; CHECK:       bb1.loopexit.loopexit:
+; CHECK-NEXT:    br label [[BB1_LOOPEXIT:%.*]]
+; CHECK:       bb1.loopexit:
+; CHECK-NEXT:    br label [[BB1]]
+; CHECK:       bb1:
+; CHECK-NEXT:    [[PHI:%.*]] = phi i32 [ 1, [[BB:%.*]] ], [ 0, [[BB1_LOOPEXIT]] ]
+; CHECK-NEXT:    br label [[BB2_PEEL_BEGIN:%.*]]
+; CHECK:       bb2.peel.begin:
+; CHECK-NEXT:    br label [[BB2_PEEL:%.*]]
+; CHECK:       bb2.peel:
+; CHECK-NEXT:    br label [[BB4_PEEL:%.*]]
+; CHECK:       bb4.peel:
+; CHECK-NEXT:    [[TRUNC_PEEL:%.*]] = trunc i64 0 to i32
+; CHECK-NEXT:    br i1 true, label [[BB9_PEEL:%.*]], label [[BB7_LOOPEXIT2:%.*]]
+; CHECK:       bb9.peel:
+; CHECK-NEXT:    [[ADD_PEEL:%.*]] = add i32 1, [[PHI]]
+; CHECK-NEXT:    br i1 true, label [[BB9_1_PEEL:%.*]], label [[BB7_LOOPEXIT2]]
+; CHECK:       bb9.1.peel:
+; CHECK-NEXT:    br i1 false, label [[BB12_PREHEADER:%.*]], label [[BB2_PEEL_NEXT:%.*]]
+; CHECK:       bb2.peel.next:
+; CHECK-NEXT:    br label [[BB2_PEEL_NEXT1:%.*]]
+; CHECK:       bb2.peel.next1:
+; CHECK-NEXT:    br label [[BB1_PEEL_NEWPH:%.*]]
+; CHECK:       bb1.peel.newph:
+; CHECK-NEXT:    [[TMP0:%.*]] = add nuw nsw i32 [[PHI]], 1
+; CHECK-NEXT:    br label [[BB2:%.*]]
+; CHECK:       bb2:
+; CHECK-NEXT:    br label [[BB4:%.*]]
+; CHECK:       bb4:
+; CHECK-NEXT:    br i1 true, label [[BB9:%.*]], label [[BB7_LOOPEXIT:%.*]]
+; CHECK:       bb7.loopexit:
+; CHECK-NEXT:    [[DOTLCSSA:%.*]] = phi i32 [ [[TMP0]], [[BB4]] ], [ [[TMP0]], [[BB9]] ]
+; CHECK-NEXT:    br label [[BB7:%.*]]
+; CHECK:       bb7.loopexit2:
+; CHECK-NEXT:    [[PHI8_PH3:%.*]] = phi i32 [ [[ADD_PEEL]], [[BB9_PEEL]] ], [ [[TRUNC_PEEL]], [[BB4_PEEL]] ]
+; CHECK-NEXT:    br label [[BB7]]
+; CHECK:       bb7:
+; CHECK-NEXT:    [[PHI8:%.*]] = phi i32 [ [[DOTLCSSA]], [[BB7_LOOPEXIT]] ], [ [[PHI8_PH3]], [[BB7_LOOPEXIT2]] ]
+; CHECK-NEXT:    ret void
+; CHECK:       bb9:
+; CHECK-NEXT:    br i1 true, label [[BB9_1:%.*]], label [[BB7_LOOPEXIT]]
+; CHECK:       bb9.1:
+; CHECK-NEXT:    br i1 false, label [[BB12_PREHEADER_LOOPEXIT:%.*]], label [[BB2]], !llvm.loop [[LOOP1:![0-9]+]]
+; CHECK:       bb12.preheader.loopexit:
+; CHECK-NEXT:    br label [[BB12_PREHEADER]]
+; CHECK:       bb12.preheader:
+; CHECK-NEXT:    br label [[BB12_PEEL_BEGIN:%.*]]
+; CHECK:       bb12.peel.begin:
+; CHECK-NEXT:    br label [[BB12_PEEL:%.*]]
+; CHECK:       bb12.peel:
+; CHECK-NEXT:    br i1 false, label [[BB1_LOOPEXIT]], label [[BB12_PEEL_NEXT:%.*]], !prof [[PROF3:![0-9]+]]
+; CHECK:       bb12.peel.next:
+; CHECK-NEXT:    br label [[BB12_PEEL_NEXT4:%.*]]
+; CHECK:       bb12.peel.next4:
+; CHECK-NEXT:    br label [[BB12_PREHEADER_PEEL_NEWPH:%.*]]
+; CHECK:       bb12.preheader.peel.newph:
+; CHECK-NEXT:    br label [[BB12:%.*]]
+; CHECK:       bb12:
+; CHECK-NEXT:    br i1 false, label [[BB1_LOOPEXIT_LOOPEXIT:%.*]], label [[BB12]], !prof [[PROF4:![0-9]+]], !llvm.loop [[LOOP5:![0-9]+]]
+;
+bb:
+  br label %bb1
+
+bb1:                                              ; preds = %bb12, %bb
+  %phi = phi i32 [ 1, %bb ], [ 0, %bb12 ]
+  br label %bb2
+
+bb2:                                              ; preds = %bb11, %bb1
+  %phi3 = phi i64 [ 0, %bb1 ], [ %sext, %bb11 ]
+  br label %bb4
+
+bb4:                                              ; preds = %bb9, %bb2
+  %phi5 = phi i64 [ %phi3, %bb2 ], [ %sext, %bb9 ]
+  %phi6 = phi i32 [ 1, %bb2 ], [ %add10, %bb9 ]
+  %trunc = trunc i64 %phi5 to i32
+  br i1 true, label %bb9, label %bb7
+
+bb7:                                              ; preds = %bb4
+  %phi8 = phi i32 [ %trunc, %bb4 ]
+  ret void
+
+bb9:                                              ; preds = %bb4
+  %add = add i32 1, %phi
+  %sext = sext i32 %add to i64
+  %add10 = add i32 %phi6, 1
+  %icmp = icmp ugt i32 %add10, 2
+  br i1 %icmp, label %bb11, label %bb4
+
+bb11:                                             ; preds = %bb9
+  br i1 false, label %bb12, label %bb2
+
+bb12:                                             ; preds = %bb12, %bb11
+  br i1 false, label %bb1, label %bb12, !prof !1
+}
+
+!0 = !{!"function_entry_count", i64 32768}
+!1 = !{!"branch_weights", i32 11, i32 1}
+;.
+; CHECK: [[PROF0]] = !{!"function_entry_count", i64 32768}
+; CHECK: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]]}
+; CHECK: [[META2]] = !{!"llvm.loop.peeled.count", i32 1}
+; CHECK: [[PROF3]] = !{!"branch_weights", i32 11, i32 1}
+; CHECK: [[PROF4]] = !{!"branch_weights", i32 11, i32 11}
+; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META6:![0-9]+]]}
+; CHECK: [[META6]] = !{!"llvm.loop.unroll.disable"}
+;.

@vedantparanjape-amd
Copy link
Member Author

ping ! @nikic @fhahn

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain in what way LCSSA form gets broken? How does the IR look like after peeling but before simplification?

Just calling formLCSSARecursively() is of course correct, but it's better to understand why LCSSA gets broken and whether we can fix it incrementally instead.

; RUN: opt -S -passes=loop-unroll < %s | FileCheck %s
source_filename = "./reduced.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
target triple = "x86_64-unknown-linux-gnu"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove source_filename, datalayout and triple.

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
target triple = "x86_64-unknown-linux-gnu"

define void @wombat() gc "statepoint-example" !prof !0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove gc. Also can you replace the profile metadata with an -unroll-peel-count option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With -unroll-peel-count=2, running llvm-reduce produces this smaller test case:

define void @wombat() {
bb:
  br label %bb1

bb1:                                              ; preds = %bb11, %bb
  %phi = phi i32 [ 1, %bb ], [ 0, %bb11 ]
  br label %bb2

bb2:                                              ; preds = %bb11, %bb1
  %phi3 = phi i64 [ 0, %bb1 ], [ %sext, %bb11 ]
  br label %bb4

bb4:                                              ; preds = %bb9, %bb2
  %phi5 = phi i64 [ %phi3, %bb2 ], [ %sext, %bb9 ]
  %trunc = trunc i64 %phi5 to i32
  br i1 true, label %bb9, label %bb7

bb7:                                              ; preds = %bb4
  %phi8 = phi i32 [ %trunc, %bb4 ]
  ret void

bb9:                                              ; preds = %bb4
  %sext = sext i32 %phi to i64
  br i1 false, label %bb11, label %bb4

bb11:                                             ; preds = %bb9
  br i1 false, label %bb1, label %bb2
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though you'll probably want to run this test case through loop-simplify,lcssa to produce the output that will actually be used after loop canonicalizations. Please also give basic blocks more meaningful names, like bb -> entry, bb1 -> loop1, bb2 -> loop2, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reduced testcase, doesn't crash the code. Since it somehow removes the bb12 loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a crash with the reduced test case: https://llvm.godbolt.org/z/84Wb9cTnf

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a crash with the reduced test case: https://llvm.godbolt.org/z/84Wb9cTnf

my bad, I didn't pass -unroll-peel-count=2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic that testcase seems to reproduce some different error, when run through loop-simplify,lcssa and then when it is run through LCSSA the crash is something different. When I dump the IR from loop-unroll just before the crash and try to run it through LCSSA it crashes as follows: https://godbolt.org/z/o5WM4hsjo

@vedantparanjape-amd
Copy link
Member Author

vedantparanjape-amd commented Jan 22, 2024

Can you please explain in what way LCSSA form gets broken? How does the IR look like after peeling but before simplification?

Just calling formLCSSARecursively() is of course correct, but it's better to understand why LCSSA gets broken and whether we can fix it incrementally instead.

red

This is the CFG and below is the diff at the end of LoopPeel before and after LCSSA form is regenerated.

image

LCSSA verifier fails as bb7 uses %0 which is defined in bb1.peel.newph, and bb1.peel.newph doesn't dominate bb7

@Nirhar
Copy link
Contributor

Nirhar commented Jan 31, 2024

@vedantparanjape-amd , going by the definition of LCSSA, I see that %0 is not used anywhere outside of the loop headed by bb1 except by the phi node in bb7, which I believe is permitted(going by this example in llvm-docs Link). Therefore in my opinion the code at the end of LoopPeel is already in LCSSA.

If that is the case, I believe that this test case hit some corner condition which was not considered by isRecursivelyInLCSSAForm and thus the patch must modify that.

@nikic @vedantparanjape-amd Please correct me if I am wrong, I am trying to refine my understanding of LCSSA

@vedantparanjape-amd
Copy link
Member Author

vedantparanjape-amd commented Feb 1, 2024

@vedantparanjape-amd , going by the definition of LCSSA, I see that %0 is not used anywhere outside of the loop headed by bb1 except by the phi node in bb7, which I believe is permitted(going by this example in llvm-docs Link). Therefore in my opinion the code at the end of LoopPeel is already in LCSSA.

If that is the case, I believe that this test case hit some corner condition which was not considered by isRecursivelyInLCSSAForm and thus the patch must modify that.

@nikic @vedantparanjape-amd Please correct me if I am wrong, I am trying to refine my understanding of LCSSA

Yeah, this seems correct. The assumptions with which it has been implemented seem to be wrong in this case. Quoting the code in isBlockInLCSSAForm and LLVM documents.

// For practical purposes, we consider that the use in a PHI
// occurs in the respective predecessor block. For more info,
// see the `phi` doc in LangRef and the LCSSA doc.
if (const PHINode *P = dyn_cast<PHINode>(UI))
  UserBB = P->getIncomingBlock(U);

Note that an exit block is outside of a loop, so how can such a phi "close"
the value inside the loop since it uses it outside of it ? First of all,
for phi nodes, as
`mentioned in the LangRef <https://llvm.org/docs/LangRef.html#id311>`_:
"the use of each incoming value is deemed to occur on the edge from the
corresponding predecessor block to the current block". Now, an
edge to an exit block is considered outside of the loop because
if we take that edge, it leads us clearly out of the loop.

However, an edge doesn't actually contain any IR, so in source code,
we have to choose a convention of whether the use happens in
the current block or in the respective predecessor. For LCSSA's purpose,
we consider the use happens in the latter (so as to consider the
use inside) [#point-of-use-phis]_.

So, the function starts with BB = bb1.peel.newph, UserBB = bb7 and UI = %phi8, now it checks if UI is a PhiNode, if it is, it assigns UserBB = IncomingBlock(UI) = bb7.loopexit, which is still fine. In the next if condition that returns false, it checks if BB != UserBB, which fails due to the wrong assumption that Phi occurs in the respective predecessor block only.

Actual call chain is bb1.peel.newph -> bb4 -> bb7.loopexit -> bb7 or bb1.peel.newph -> bb4 -> bb9-> bb7.loopexit -> bb7`.

I think we need to relax the criteria of use being in the predecessor block, it is not necessary to be a immediate predecessor. It's strange that before doing Loop Peeling this issue is not triggered.

This case is something like this bb1.peel.newph -> loop (<bb2>) -> loop-exit of bb2 (<bb7.loopexit>) -> bb7. both the basic blocks bb1.peel.newph and bb7 are not part of the inner loop.

CC @baziotis and @igor-laevsky (you seem to be author of the changes there)

@vedantparanjape-amd
Copy link
Member Author

vedantparanjape-amd commented Feb 1, 2024

Can you please explain in what way LCSSA form gets broken? How does the IR look like after peeling but before simplification?

Just calling formLCSSARecursively() is of course correct, but it's better to understand why LCSSA gets broken and whether we can fix it incrementally instead.

Actually the issue seems something else imo, the Peeled Loop seems bogus, it just adds blocks with unconditional branch instructions, the loop it seems to evaluate is Loop at depth 2 containing: %bb12<header><latch><exiting> which is a loop with no blocks and also infinite. Maybe we need to stop loop peeling in such cases. I am not sure why does LLVM try to peel and unroll and unconditional loop, that's going to infinitely loop. The crash happens when it tries to unroll the bb12 loop, this can be confirmed by these debug-only logs:

Loop Unroll: F[wombat] Loop %bb4
  Loop Size = 6
  Exiting block %bb4: TripCount=0, TripMultiple=1, BreakoutTrip=1
  Exiting block %bb9: TripCount=2, TripMultiple=0, BreakoutTrip=0
COMPLETELY UNROLLING loop %bb4 with trip count 2!
Loop Unroll: F[wombat] Loop %bb2
  Loop Size = 7
Loop size: 7
Peel Count: 1
PEELING loop %bb2 with iteration count 1!
Loop Unroll: F[wombat] Loop %bb12
  Loop Size = 3
Loop size: 3
Peel Count: 1
PEELING loop %bb12 with iteration count 1!
opt: /home/vparanja/llvm-project-upstream/llvm/lib/Transforms/Utils/LoopSimplify.cpp:712: bool llvm::simplifyLoop(llvm::Loop*, llvm::DominatorTree*, llvm::LoopInfo*, llvm::ScalarEvolution*, llvm::AssumptionCache*, llvm::MemorySSAUpdater*, bool): Assertion `L->isRecursivelyLCSSAForm(*DT, *LI) && "Requested to preserve LCSSA, but it's already broken."' failed.

@Nirhar
Copy link
Contributor

Nirhar commented Feb 1, 2024

Can you please explain in what way LCSSA form gets broken? How does the IR look like after peeling but before simplification?
Just calling formLCSSARecursively() is of course correct, but it's better to understand why LCSSA gets broken and whether we can fix it incrementally instead.

Actually the issue seems something else imo, the Peeled Loop seems bogus, it just adds blocks with unconditional branch instructions, the loop it seems to evaluate is Loop at depth 2 containing: %bb12<header><latch><exiting> which is a loop with no blocks and also infinite. Maybe we need to stop loop peeling in such cases.

I think this is just because the loop bb12 contains only a branch, however the issue still persists if the loop was a meaning full one as in this example: https://godbolt.org/z/a6cYb4Ta6

@vedantparanjape-amd
Copy link
Member Author

vedantparanjape-amd commented Feb 1, 2024

Can you please explain in what way LCSSA form gets broken? How does the IR look like after peeling but before simplification?
Just calling formLCSSARecursively() is of course correct, but it's better to understand why LCSSA gets broken and whether we can fix it incrementally instead.

Actually the issue seems something else imo, the Peeled Loop seems bogus, it just adds blocks with unconditional branch instructions, the loop it seems to evaluate is Loop at depth 2 containing: %bb12<header><latch><exiting> which is a loop with no blocks and also infinite. Maybe we need to stop loop peeling in such cases.

I think this is just because the loop bb12 contains only a branch, however the issue still persists if the loop was a meaning full one as in this example: https://godbolt.org/z/a6cYb4Ta6

If you add one more block to the loop, it doesn't crash: https://godbolt.org/z/h3hM7MMnq. It seems to be a issue with LoopUnrolling.

Ignoring my comment above about isBlockInLCSSAForm, I think there's nothing wrong with the function because bb7.loopexit and bb7 are not part of any loops, hence adding the the lcssa Phi seems correct to me.

@vedantparanjape-amd
Copy link
Member Author

diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
index 7cfeb019af97..937a31e0a532 100644
--- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
@@ -1269,11 +1269,37 @@ tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI, ScalarEvolution &SE,
              << " iterations";
     });

+    dbgs() << "========================================\n";
+    L->dump();
+    dbgs() << "========================================\n";
+    L->getParentLoop()->dump();
+    dbgs() << "========================================\n";
+    dbgs() << "Islcssa before PL: " << L->isRecursivelyLCSSAForm(DT, *LI) << "\n";
+    dbgs() << "Islcssa (parent) before PL: " << L->getParentLoop()->isRecursivelyLCSSAForm(DT, *LI) << "\n";
+    dbgs() << "========================================\n";
+
     ValueToValueMapTy VMap;
     if (peelLoop(L, PP.PeelCount, LI, &SE, DT, &AC, PreserveLCSSA, VMap)) {
+      L->dump();
+      dbgs() << "========================================\n";
+      L->getParentLoop()->dump();
+      dbgs() << "========================================\n";
+      dbgs() << "Islcssa after PL: " << L->isRecursivelyLCSSAForm(DT, *LI) << "\n";
+      dbgs() << "Islcssa (parent) after PL: " << L->getParentLoop()->isRecursivelyLCSSAForm(DT, *LI) << "\n";
+      dbgs() << "========================================\n";
+
       simplifyLoopAfterUnroll(L, true, LI, &SE, &DT, &AC, &TTI);
       // If the loop was peeled, we already "used up" the profile information
       // we had, so we don't want to unroll or peel again.
+
+      L->dump();
+      dbgs() << "========================================\n";
+      L->getParentLoop()->dump();
+      dbgs() << "========================================\n";
+      dbgs() << "Islcssa after SLAU: " << L->isRecursivelyLCSSAForm(DT, *LI) << "\n";
+      dbgs() << "Islcssa (parent) after SLAU: " << L->getParentLoop()->isRecursivelyLCSSAForm(DT, *LI) << "\n";
+      dbgs() << "========================================\n";
+
       if (PP.PeelProfiledIterations)
         L->setLoopAlreadyUnrolled();
       return LoopUnrollResult::PartiallyUnrolled;
diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index ee6f7b35750a..17f7a5afe13b 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -218,11 +218,33 @@ void llvm::simplifyLoopAfterUnroll(Loop *L, bool SimplifyIVs, LoopInfo *LI,
                                    const TargetTransformInfo *TTI) {
   using namespace llvm::PatternMatch;

+  dbgs() << "+++ bb before +++\n";
+  L->getHeader()->getParent()->dump();
+  dbgs() << "+++ bb before +++\n";
+  L->dump();
+  dbgs() << "========================================\n";
+  L->getParentLoop()->dump();
+  dbgs() << "========================================\n";
+  dbgs() << "Islcssa (start) inside SLAU: " << L->isRecursivelyLCSSAForm(*DT, *LI) << "\n";
+  dbgs() << "Islcssa (start) (parent) after SLAU: " << L->getParentLoop()->isRecursivelyLCSSAForm(*DT, *LI) << "\n";
+  dbgs() << "========================================\n";
+
   // Simplify any new induction variables in the partially unrolled loop.
   if (SE && SimplifyIVs) {
     SmallVector<WeakTrackingVH, 16> DeadInsts;
     simplifyLoopIVs(L, SE, DT, LI, TTI, DeadInsts);

+    dbgs() << "+++ bb after +++\n";
+    L->getHeader()->getParent()->dump();
+    dbgs() << "+++ bb after +++\n";
+    L->dump();
+    dbgs() << "========================================\n";
+    L->getParentLoop()->dump();
+    dbgs() << "========================================\n";
+    dbgs() << "Islcssa (middle-in) inside SLAU: " << L->isRecursivelyLCSSAForm(*DT, *LI) << "\n";
+    dbgs() << "Islcssa (middle-in) (parent) after SLAU: " << L->getParentLoop()->isRecursivelyLCSSAForm(*DT, *LI) << "\n";
+    dbgs() << "========================================\n";
+
     // Aggressively clean up dead instructions that simplifyLoopIVs already
     // identified. Any remaining should be cleaned up below.
     while (!DeadInsts.empty()) {

Applying this patch, shows that simplifyLoopAfterUnroll destroys the LCSSA form of the outermost loop. I have attached the results of this patch below

+++ bb before +++
define void @wombat() gc "statepoint-example" !prof !0 {
bb:
  br label %bb1

bb1.loopexit:                                     ; preds = %bb12
  br label %bb1

bb1:                                              ; preds = %bb1.loopexit, %bb
  %phi = phi i32 [ 1, %bb ], [ 0, %bb1.loopexit ]
  br label %bb2.peel.begin

bb2.peel.begin:                                   ; preds = %bb1
  br label %bb2.peel

bb2.peel:                                         ; preds = %bb2.peel.begin
  br label %bb4.peel

bb4.peel:                                         ; preds = %bb2.peel
  %trunc.peel = trunc i64 0 to i32
  br i1 true, label %bb9.peel, label %bb7.loopexit2

bb9.peel:                                         ; preds = %bb4.peel
  %add.peel = add i32 1, %phi
  br i1 true, label %bb9.1.peel, label %bb7.loopexit2

bb9.1.peel:                                       ; preds = %bb9.peel
  %add.1.peel = add i32 1, %phi
  %sext.1.peel = sext i32 %add.1.peel to i64
  br i1 false, label %bb12.preheader, label %bb2.peel.next

bb2.peel.next:                                    ; preds = %bb9.1.peel
  br label %bb2.peel.next1

bb2.peel.next1:                                   ; preds = %bb2.peel.next
  br label %bb1.peel.newph

bb1.peel.newph:                                   ; preds = %bb2.peel.next1
  br label %bb2

bb2:                                              ; preds = %bb9.1, %bb1.peel.newph
  %phi3 = phi i64 [ %sext.1.peel, %bb1.peel.newph ], [ %sext.1, %bb9.1 ]
  br label %bb4

bb4:                                              ; preds = %bb2
  %trunc = trunc i64 %phi3 to i32
  br i1 true, label %bb9, label %bb7.loopexit

bb7.loopexit:                                     ; preds = %bb4, %bb9
  %phi8.ph = phi i32 [ %add, %bb9 ], [ %trunc, %bb4 ]
  br label %bb7

bb7.loopexit2:                                    ; preds = %bb4.peel, %bb9.peel
  %phi8.ph3 = phi i32 [ %add.peel, %bb9.peel ], [ %trunc.peel, %bb4.peel ]
  br label %bb7

bb7:                                              ; preds = %bb7.loopexit2, %bb7.loopexit
  %phi8 = phi i32 [ %phi8.ph, %bb7.loopexit ], [ %phi8.ph3, %bb7.loopexit2 ]
  ret void

bb9:                                              ; preds = %bb4
  %add = add i32 1, %phi
  br i1 true, label %bb9.1, label %bb7.loopexit

bb9.1:                                            ; preds = %bb9
  %add.1 = add i32 1, %phi
  %sext.1 = sext i32 %add.1 to i64
  br i1 false, label %bb12.preheader.loopexit, label %bb2, !llvm.loop !1

bb12.preheader.loopexit:                          ; preds = %bb9.1
  br label %bb12.preheader

bb12.preheader:                                   ; preds = %bb12.preheader.loopexit, %bb9.1.peel
  br label %bb12

bb12:                                             ; preds = %bb12.preheader, %bb12
  br i1 false, label %bb1.loopexit, label %bb12, !prof !3
}

+++ bb before +++
Parallel Loop at depth 2 containing: %bb2<header>,%bb4<exiting>,%bb9<exiting>,%bb9.1<latch><exiting>
========================================
Loop at depth 1 containing: %bb1<header>,%bb2,%bb4<exiting>,%bb9<exiting>,%bb12,%bb12.preheader,%bb1.loopexit<latch>,%bb9.1,%bb2.peel.begin,%bb2.peel.next,%bb1.peel.newph,%bb2.peel,%bb4.peel<exiting>,%bb9.peel<exiting>,%bb9.1.peel,%bb2.peel.next1,%bb12.preheader.loopexit
    Parallel Loop at depth 2 containing: %bb2<header>,%bb4<exiting>,%bb9<exiting>,%bb9.1<latch><exiting>
    Loop at depth 2 containing: %bb12<header><latch><exiting>
========================================
Islcssa (start) inside SLAU: 1
Islcssa (start) (parent) after SLAU: 1
========================================
+++ bb after +++
define void @wombat() gc "statepoint-example" !prof !0 {
bb:
  br label %bb1

bb1.loopexit:                                     ; preds = %bb12
  br label %bb1

bb1:                                              ; preds = %bb1.loopexit, %bb
  %phi = phi i32 [ 1, %bb ], [ 0, %bb1.loopexit ]
  br label %bb2.peel.begin

bb2.peel.begin:                                   ; preds = %bb1
  br label %bb2.peel

bb2.peel:                                         ; preds = %bb2.peel.begin
  br label %bb4.peel

bb4.peel:                                         ; preds = %bb2.peel
  %trunc.peel = trunc i64 0 to i32
  br i1 true, label %bb9.peel, label %bb7.loopexit2

bb9.peel:                                         ; preds = %bb4.peel
  %add.peel = add i32 1, %phi
  br i1 true, label %bb9.1.peel, label %bb7.loopexit2

bb9.1.peel:                                       ; preds = %bb9.peel
  %add.1.peel = add i32 1, %phi
  %sext.1.peel = sext i32 %add.1.peel to i64
  br i1 false, label %bb12.preheader, label %bb2.peel.next

bb2.peel.next:                                    ; preds = %bb9.1.peel
  br label %bb2.peel.next1

bb2.peel.next1:                                   ; preds = %bb2.peel.next
  br label %bb1.peel.newph

bb1.peel.newph:                                   ; preds = %bb2.peel.next1
  %0 = add nuw nsw i32 %phi, 1
  br label %bb2

bb2:                                              ; preds = %bb9.1, %bb1.peel.newph
  %phi3 = phi i64 [ %sext.1.peel, %bb1.peel.newph ], [ %sext.1, %bb9.1 ]
  br label %bb4

bb4:                                              ; preds = %bb2
  %trunc = trunc i64 %phi3 to i32
  br i1 true, label %bb9, label %bb7.loopexit

bb7.loopexit:                                     ; preds = %bb4, %bb9
  %phi8.ph = phi i32 [ %add, %bb9 ], [ %trunc, %bb4 ]
  br label %bb7

bb7.loopexit2:                                    ; preds = %bb4.peel, %bb9.peel
  %phi8.ph3 = phi i32 [ %add.peel, %bb9.peel ], [ %trunc.peel, %bb4.peel ]
  br label %bb7

bb7:                                              ; preds = %bb7.loopexit2, %bb7.loopexit
  %phi8 = phi i32 [ %0, %bb7.loopexit ], [ %phi8.ph3, %bb7.loopexit2 ]
  ret void

bb9:                                              ; preds = %bb4
  %add = add i32 1, %phi
  br i1 true, label %bb9.1, label %bb7.loopexit

bb9.1:                                            ; preds = %bb9
  %add.1 = add i32 1, %phi
  %sext.1 = sext i32 %add.1 to i64
  br i1 false, label %bb12.preheader.loopexit, label %bb2, !llvm.loop !1

bb12.preheader.loopexit:                          ; preds = %bb9.1
  br label %bb12.preheader

bb12.preheader:                                   ; preds = %bb12.preheader.loopexit, %bb9.1.peel
  br label %bb12

bb12:                                             ; preds = %bb12.preheader, %bb12
  br i1 false, label %bb1.loopexit, label %bb12, !prof !3
}

+++ bb after +++
Parallel Loop at depth 2 containing: %bb2<header>,%bb4<exiting>,%bb9<exiting>,%bb9.1<latch><exiting>
========================================
Loop at depth 1 containing: %bb1<header>,%bb2,%bb4<exiting>,%bb9<exiting>,%bb12,%bb12.preheader,%bb1.loopexit<latch>,%bb9.1,%bb2.peel.begin,%bb2.peel.next,%bb1.peel.newph,%bb2.peel,%bb4.peel<exiting>,%bb9.peel<exiting>,%bb9.1.peel,%bb2.peel.next1,%bb12.preheader.loopexit
    Parallel Loop at depth 2 containing: %bb2<header>,%bb4<exiting>,%bb9<exiting>,%bb9.1<latch><exiting>
    Loop at depth 2 containing: %bb12<header><latch><exiting>
========================================
Islcssa (middle-in) inside SLAU: 1
Islcssa (middle-in) (parent) after SLAU: 0
========================================
Parallel Loop at depth 2 containing: %bb2<header>,%bb4<exiting>,%bb9<exiting>,%bb9.1<latch><exiting>
========================================
Loop at depth 1 containing: %bb1<header>,%bb2,%bb4<exiting>,%bb9<exiting>,%bb12,%bb12.preheader,%bb1.loopexit<latch>,%bb9.1,%bb2.peel.begin,%bb2.peel.next,%bb1.peel.newph,%bb2.peel,%bb4.peel<exiting>,%bb9.peel<exiting>,%bb9.1.peel,%bb2.peel.next1,%bb12.preheader.loopexit
    Parallel Loop at depth 2 containing: %bb2<header>,%bb4<exiting>,%bb9<exiting>,%bb9.1<latch><exiting>
    Loop at depth 2 containing: %bb12<header><latch><exiting>
========================================
Islcssa (middle) inside SLAU: 1
Islcssa (middle) (parent) after SLAU: 0
========================================
Parallel Loop at depth 2 containing: %bb2<header>,%bb4<exiting>,%bb9<exiting>,%bb9.1<latch><exiting>
========================================
Loop at depth 1 containing: %bb1<header>,%bb2,%bb4<exiting>,%bb9<exiting>,%bb12,%bb12.preheader,%bb1.loopexit<latch>,%bb9.1,%bb2.peel.begin,%bb2.peel.next,%bb1.peel.newph,%bb2.peel,%bb4.peel<exiting>,%bb9.peel<exiting>,%bb9.1.peel,%bb2.peel.next1,%bb12.preheader.loopexit
    Parallel Loop at depth 2 containing: %bb2<header>,%bb4<exiting>,%bb9<exiting>,%bb9.1<latch><exiting>
    Loop at depth 2 containing: %bb12<header><latch><exiting>
========================================
Islcssa after SLAU: 1
Islcssa (parent) after SLAU: 0
========================================
Loop Unroll: F[wombat] Loop %bb12
  Loop Size = 3
PEELING loop %bb12 with iteration count 1!
========================================
Loop at depth 2 containing: %bb12<header><latch><exiting>
========================================
Loop at depth 1 containing: %bb1<header>,%bb2,%bb4<exiting>,%bb9<exiting>,%bb12,%bb12.preheader,%bb1.loopexit<latch>,%bb9.1,%bb2.peel.begin,%bb2.peel.next,%bb1.peel.newph,%bb2.peel,%bb4.peel<exiting>,%bb9.peel<exiting>,%bb9.1.peel,%bb2.peel.next1,%bb12.preheader.loopexit
    Parallel Loop at depth 2 containing: %bb2<header>,%bb4<exiting>,%bb9<exiting>,%bb9.1<latch><exiting>
    Loop at depth 2 containing: %bb12<header><latch><exiting>
========================================
Islcssa before PL: 1
Islcssa (parent) before PL: 0
========================================
opt: /home/vparanja/llvm-project-upstream/llvm/lib/Transforms/Utils/LoopSimplify.cpp:712: bool llvm::simplifyLoop(llvm::Loop*, llvm::DominatorTree*, llvm::LoopInfo*, llvm::ScalarEvolution*, llvm::AssumptionCache*, llvm::MemorySSAUpdater*, bool): Assertion `L->isRecursivelyLCSSAForm(*DT, *LI) && "Requested to preserve LCSSA, but it's already broken."' failed.

@nikic the issue is with simplifyLoopIVs function call, it seems to move the incoming values to Phi node from successor to predecessor. Is this expected behaviour ? If yes, I think we can simply add something to reform LCSSA after this call. Here's the diff of before and after: https://godbolt.org/z/9jo115Tas (look at bb1.peel.newph and bb7)

@Nirhar
Copy link
Contributor

Nirhar commented Feb 10, 2024

@nikic a gentle ping for your inputs!

@vedantparanjape-amd
Copy link
Member Author

ping !

@annamthomas
Copy link
Contributor

Quick question - I had seen a bug previously (and had a fix in phabricator, which I cannot access now). The issue is that canPeel doesn't check for LCSSA form, where as in fact peeling requires loop to be in LCSSA form. Can you check in your case if the loop already was in LCSSA form before peeling? What I described might be an entirely different issue (I could only trigger it from llvm unit tests at the time).

@vedantparanjape-amd
Copy link
Member Author

Quick question - I had seen a bug previously (and had a fix in phabricator, which I cannot access now). The issue is that canPeel doesn't check for LCSSA form, where as in fact peeling requires loop to be in LCSSA form. Can you check in your case if the loop already was in LCSSA form before peeling? What I described might be an entirely different issue (I could only trigger it from llvm unit tests at the time).

So, the issue is in SimplifyAfterUnroll call, it breaks the LCSSA form of the parent loop. The order of call is as follows:

LoopPeel -> LCCSA preserved
SimplifyLoopAfterUnroll -> LCSSA of parent loop destroyed
LoopPeel -> since LCSSA of parent loop was destroyed, this crashes.

To answer your question, during the 2nd call to LoopPeel, it is not in LCSSA form.

@vedantparanjape-amd
Copy link
Member Author

@nikic ping!

@nikic
Copy link
Contributor

nikic commented Feb 16, 2024

@vedantparanjape-amd If the LCSSA of the parent loop is already destroyed by the time we enter LoopPeel, then I don't think trying to recover it inside LoopPeel is the right thing to do. We should fix this at the point where LCSSA is broken.

Do I understand correctly that the only relation LoopPeel has to anything here is that it contains this LCSSA assertion? It looks like adding

  for (Loop *L : LI)
    assert(L->isRecursivelyLCSSAForm(DT, LI));

to the end of unrolling triggers the assertion even without peeling.


For the record, here is the test case with cleaned up block naming:

target triple = "x86_64-unknown-linux-gnu"

define void @test() !prof !0 {
entry:
  br label %loop1

loop1:
  %phi = phi i32 [ 1, %entry ], [ 0, %loop1.latch ]
  br label %loop2

loop2:
  %phi3 = phi i64 [ 0, %loop1 ], [ %sext, %loop2.latch ]
  br label %loop3

loop3:
  %phi5 = phi i64 [ %phi3, %loop2 ], [ %sext, %loop3.latch ]
  %phi6 = phi i32 [ 1, %loop2 ], [ %add10, %loop3.latch ]
  %trunc = trunc i64 %phi5 to i32
  br i1 true, label %loop3.latch, label %exit

loop3.latch:
  %add = add i32 1, %phi
  %sext = sext i32 %add to i64
  %add10 = add i32 %phi6, 1
  %icmp = icmp ugt i32 %add10, 2
  br i1 %icmp, label %loop2.latch, label %loop3

loop2.latch:
  br i1 false, label %loop4.preheader, label %loop2

loop4.preheader:
  br label %loop4

loop4:
  br i1 false, label %loop1.latch, label %loop4

loop1.latch:
  br label %loop1

exit:
  %phi8 = phi i32 [ %trunc, %loop3 ]
  ret void
}

!0 = !{!"function_entry_count", i64 32768}
!1 = !{!"branch_weights", i32 1, i32 1}

@Nirhar
Copy link
Contributor

Nirhar commented Feb 16, 2024

Hey @nikic! @vedantparanjape-amd has identified the point where LCSSA is broken in this comment here: #78696 (comment)

TLDR; LCSSA is broken in the function

void llvm::simplifyLoopAfterUnroll(Loop *L, bool SimplifyIVs, LoopInfo *LI,
,
which gets called after the first peel.

In particular the LCSSA is broken at this point:

simplifyLoopIVs(L, SE, DT, LI, TTI, DeadInsts);
,

which does the transformation as shown in the last diagram of this comment:
#78696 (comment), breaking the LCSSA.

Now we are unaware if such a transformation is expected of simplifyLoopIV and if it is also expected to maintain LCSSA.

@nikic
Copy link
Contributor

nikic commented Feb 16, 2024

Now we are unaware if such a transformation is expected of simplifyLoopIV and if it is also expected to maintain LCSSA.

simplifyLoopIV is supposed to maintain LCSSA form.

@vedantparanjape-amd
Copy link
Member Author

@vedantparanjape-amd If the LCSSA of the parent loop is already destroyed by the time we enter LoopPeel, then I don't think trying to recover it inside LoopPeel is the right thing to do. We should fix this at the point where LCSSA is broken.

Do I understand correctly that the only relation LoopPeel has to anything here is that it contains this LCSSA assertion? It looks like adding

Yeah. that's correct ! It has no relation as such with LoopPeel.

  for (Loop *L : LI)
    assert(L->isRecursivelyLCSSAForm(DT, LI));

to the end of unrolling triggers the assertion even without peeling.

For the record, here is the test case with cleaned up block naming:

target triple = "x86_64-unknown-linux-gnu"

define void @test() !prof !0 {
entry:
  br label %loop1

loop1:
  %phi = phi i32 [ 1, %entry ], [ 0, %loop1.latch ]
  br label %loop2

loop2:
  %phi3 = phi i64 [ 0, %loop1 ], [ %sext, %loop2.latch ]
  br label %loop3

loop3:
  %phi5 = phi i64 [ %phi3, %loop2 ], [ %sext, %loop3.latch ]
  %phi6 = phi i32 [ 1, %loop2 ], [ %add10, %loop3.latch ]
  %trunc = trunc i64 %phi5 to i32
  br i1 true, label %loop3.latch, label %exit

loop3.latch:
  %add = add i32 1, %phi
  %sext = sext i32 %add to i64
  %add10 = add i32 %phi6, 1
  %icmp = icmp ugt i32 %add10, 2
  br i1 %icmp, label %loop2.latch, label %loop3

loop2.latch:
  br i1 false, label %loop4.preheader, label %loop2

loop4.preheader:
  br label %loop4

loop4:
  br i1 false, label %loop1.latch, label %loop4

loop1.latch:
  br label %loop1

exit:
  %phi8 = phi i32 [ %trunc, %loop3 ]
  ret void
}

!0 = !{!"function_entry_count", i64 32768}
!1 = !{!"branch_weights", i32 1, i32 1}

Thanks for the cleaned up testcase.

@vedantparanjape-amd
Copy link
Member Author

Now we are unaware if such a transformation is expected of simplifyLoopIV and if it is also expected to maintain LCSSA.

simplifyLoopIV is supposed to maintain LCSSA form.

So, should we update the simplifyLoopIV or just call formLCSSAform after simplifyLoopIV ?

@nikic
Copy link
Contributor

nikic commented Feb 19, 2024

Now we are unaware if such a transformation is expected of simplifyLoopIV and if it is also expected to maintain LCSSA.

simplifyLoopIV is supposed to maintain LCSSA form.

So, should we update the simplifyLoopIV or just call formLCSSAform after simplifyLoopIV ?

We should fix simplifyLoopIV. Note that it already contains various replacementPreservesLCSSAForm checks already.

@vedantparanjape-amd
Copy link
Member Author

Now we are unaware if such a transformation is expected of simplifyLoopIV and if it is also expected to maintain LCSSA.

simplifyLoopIV is supposed to maintain LCSSA form.

So, should we update the simplifyLoopIV or just call formLCSSAform after simplifyLoopIV ?

We should fix simplifyLoopIV. Note that it already contains various replacementPreservesLCSSAForm checks already.

I think you mean to say simplifyLoopAfterUnroll has various replacementPreservesLCSSAForm checks, not the SimplifyLoopIV.

Copy link

github-actions bot commented Feb 20, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@vedantparanjape-amd
Copy link
Member Author

Now we are unaware if such a transformation is expected of simplifyLoopIV and if it is also expected to maintain LCSSA.

simplifyLoopIV is supposed to maintain LCSSA form.

So, should we update the simplifyLoopIV or just call formLCSSAform after simplifyLoopIV ?

We should fix simplifyLoopIV. Note that it already contains various replacementPreservesLCSSAForm checks already.

I pushed a fix, it adds a replacementPreservesLCSSAForm check to simplifyLoopIV. It passes the testcase as well. @nikic can you please check once, thanks !

<< " with loop invariant: " << *S
<< " as it breaks LCSSA form " << '\n');
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SCEVExpander is also supposed to preserve LCSSA form (if PreserveLCSSA is enabled, which it is by default), so I think the actual fix needs to be in SCEVExpander. It's not possible to abort a transform after using SCEVExpander.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's call to fixupLCSSAFormFor in SCEVExpander::expand, which in turn calls formLCSSAForInstructions. But it never reaches till that point, it simply exits before that point here as UseLoop == DefLoop and DefLoop->contains(UseLoop) is true.

if (!DefLoop || UseLoop == DefLoop || DefLoop->contains(UseLoop))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikic I studied the code, I don't think SCEVExpander is the right place. As we don't pass the Instruction to be replaced by the expanded value to the SCEVExpander at any point. I think it the duty of the function that calls the SCEVExpander to maintain LCSSA form, because SCEVExpander doesn't break LCSSA as such, but when we replace its use by the expanded value, LCSSA form is broken !!

Either we need a expanded value which doesn't make changes to the IR so that we can safely check for the LCSSA, or we need to somehow pass Instruction to be replaced to SCEVExpander and then do the check inside ? (I don't think we should do this).

What do you think, would be the ideal way to make this change ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I get the problem now. SCEVExpander insertion point is the loop-invariant location, but the actual use will be inside an exit block.

I think the options to fix this would be:

  1. Make a check that LCSSA is not broken before calling Rewriter, under the assumption that the instruction will be inserted at the insertion point. (This may have some false positives if instruction reuse occurs, but I think it's fine.)
  2. Fix up LCSSA form in this special case after the transform. So if !replacementPreservesLCSSAForm insert a phi node.

@vedantparanjape-amd
Copy link
Member Author

@nikic I implemented a fix based on our discussion, I implemented the 2nd one, because it makes sure optimization doesn't bail out. Can you review it once, thanks !

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new approach looks reasonable to me.

llvm/lib/Transforms/Utils/SimplifyIndVar.cpp Outdated Show resolved Hide resolved
llvm/lib/Transforms/Utils/SimplifyIndVar.cpp Outdated Show resolved Hide resolved
In LoopUnroll, peelLoop is called on the loop. After the loop is peeled
it calls simplifyLoopAfterUnroll on the loop. This call to
simplifyLoopAfterUnroll doesn't preserve the LCSSA form of the parent
loop and thus during the next call to peelLoop the LCSSA form is already
broken.

LoopPeel util takes in the PreserveLCSSA argument and it passes
on the same argument to simplifyLoop which checks if the loop is in a
valid LCSSA form, when (PreserveLCSSA = true).

This causes an assert in simplifyLoop when (PreserveLCSSA = true), as
during the last call LCSSA for the loop wasn't preserved, and thus
crashes at the following assert.

assert(L->isRecursivelyLCSSAForm(*DT, *LI) &&
           "Requested to preserve LCSSA, but it's already broken.");

Upon debugging, it is evident that simplifyLoopIVs call inside
simplifyLoopAfterUnroll breaks the LCSSA form. This patch fixes
llvm#77118, it checks if the replacement of IV Users with Loop Invariant
preserves the LCSSA form. If it does not, it emits the required LCSSA
Phi instructions.
@vedantparanjape-amd
Copy link
Member Author

@nikic did the requested changes, is it okay to merge now ?

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for tracking this down all the way!

@vedantparanjape-amd vedantparanjape-amd removed the request for review from fhahn February 21, 2024 11:58
@vedantparanjape-amd vedantparanjape-amd merged commit e209178 into llvm:main Feb 21, 2024
3 of 4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants