-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[LoopPeel] LCSSA form is destroyed by LoopPeel, preserve it #78696
Conversation
@llvm/pr-subscribers-llvm-transforms Author: Vedant Paranjape (vedantparanjape-amd) ChangesLoopPeel util takes in the PreserveLCSSA argument, but the transformation doesn't seem to preserve the LCSSA form. But, it passes on the same argument to simplifyLoop which checks if the loop is in a valid LCSSA form, when (PreserveLCSSA = true). This causes an assert in simplifyLoop when (PreserveLCSSA = true), as LoopPeel doesn't preserve LCSSA, and thus crashes at the following assert. assert(L->isRecursivelyLCSSAForm(*DT, *LI) && This patch fixes llvm#77118, it tries to form LCSSA at the end of the optimization if (PreserveLCSSA = true) and the loop is not in LCSSA form. It also adds a test case of LoopUnrolling which crashed earlier. Full diff: https://github.com/llvm/llvm-project/pull/78696.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index f76fa3bb6c61144..3e0545656d493c6 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -1054,6 +1054,8 @@ bool llvm::peelLoop(Loop *L, unsigned PeelCount, LoopInfo *LI,
assert(DT.verify(DominatorTree::VerificationLevel::Fast));
#endif
+ if (PreserveLCSSA && !L->isRecursivelyLCSSAForm(DT, *LI))
+ formLCSSARecursively(*L, DT, LI, SE);
// FIXME: Incrementally update loop-simplify
simplifyLoop(L, &DT, LI, SE, AC, nullptr, PreserveLCSSA);
diff --git a/llvm/test/Transforms/LoopUnroll/gh-issue77118-broken-lcssa-form.ll b/llvm/test/Transforms/LoopUnroll/gh-issue77118-broken-lcssa-form.ll
new file mode 100644
index 000000000000000..fe0607e59421c0e
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/gh-issue77118-broken-lcssa-form.ll
@@ -0,0 +1,117 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes=loop-unroll < %s | FileCheck %s
+source_filename = "./reduced.ll"
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2"
+target triple = "x86_64-unknown-linux-gnu"
+
+define void @wombat() gc "statepoint-example" !prof !0 {
+; CHECK-LABEL: define void @wombat(
+; CHECK-SAME: ) gc "statepoint-example" !prof [[PROF0:![0-9]+]] {
+; CHECK-NEXT: bb:
+; CHECK-NEXT: br label [[BB1:%.*]]
+; CHECK: bb1.loopexit.loopexit:
+; CHECK-NEXT: br label [[BB1_LOOPEXIT:%.*]]
+; CHECK: bb1.loopexit:
+; CHECK-NEXT: br label [[BB1]]
+; CHECK: bb1:
+; CHECK-NEXT: [[PHI:%.*]] = phi i32 [ 1, [[BB:%.*]] ], [ 0, [[BB1_LOOPEXIT]] ]
+; CHECK-NEXT: br label [[BB2_PEEL_BEGIN:%.*]]
+; CHECK: bb2.peel.begin:
+; CHECK-NEXT: br label [[BB2_PEEL:%.*]]
+; CHECK: bb2.peel:
+; CHECK-NEXT: br label [[BB4_PEEL:%.*]]
+; CHECK: bb4.peel:
+; CHECK-NEXT: [[TRUNC_PEEL:%.*]] = trunc i64 0 to i32
+; CHECK-NEXT: br i1 true, label [[BB9_PEEL:%.*]], label [[BB7_LOOPEXIT2:%.*]]
+; CHECK: bb9.peel:
+; CHECK-NEXT: [[ADD_PEEL:%.*]] = add i32 1, [[PHI]]
+; CHECK-NEXT: br i1 true, label [[BB9_1_PEEL:%.*]], label [[BB7_LOOPEXIT2]]
+; CHECK: bb9.1.peel:
+; CHECK-NEXT: br i1 false, label [[BB12_PREHEADER:%.*]], label [[BB2_PEEL_NEXT:%.*]]
+; CHECK: bb2.peel.next:
+; CHECK-NEXT: br label [[BB2_PEEL_NEXT1:%.*]]
+; CHECK: bb2.peel.next1:
+; CHECK-NEXT: br label [[BB1_PEEL_NEWPH:%.*]]
+; CHECK: bb1.peel.newph:
+; CHECK-NEXT: [[TMP0:%.*]] = add nuw nsw i32 [[PHI]], 1
+; CHECK-NEXT: br label [[BB2:%.*]]
+; CHECK: bb2:
+; CHECK-NEXT: br label [[BB4:%.*]]
+; CHECK: bb4:
+; CHECK-NEXT: br i1 true, label [[BB9:%.*]], label [[BB7_LOOPEXIT:%.*]]
+; CHECK: bb7.loopexit:
+; CHECK-NEXT: [[DOTLCSSA:%.*]] = phi i32 [ [[TMP0]], [[BB4]] ], [ [[TMP0]], [[BB9]] ]
+; CHECK-NEXT: br label [[BB7:%.*]]
+; CHECK: bb7.loopexit2:
+; CHECK-NEXT: [[PHI8_PH3:%.*]] = phi i32 [ [[ADD_PEEL]], [[BB9_PEEL]] ], [ [[TRUNC_PEEL]], [[BB4_PEEL]] ]
+; CHECK-NEXT: br label [[BB7]]
+; CHECK: bb7:
+; CHECK-NEXT: [[PHI8:%.*]] = phi i32 [ [[DOTLCSSA]], [[BB7_LOOPEXIT]] ], [ [[PHI8_PH3]], [[BB7_LOOPEXIT2]] ]
+; CHECK-NEXT: ret void
+; CHECK: bb9:
+; CHECK-NEXT: br i1 true, label [[BB9_1:%.*]], label [[BB7_LOOPEXIT]]
+; CHECK: bb9.1:
+; CHECK-NEXT: br i1 false, label [[BB12_PREHEADER_LOOPEXIT:%.*]], label [[BB2]], !llvm.loop [[LOOP1:![0-9]+]]
+; CHECK: bb12.preheader.loopexit:
+; CHECK-NEXT: br label [[BB12_PREHEADER]]
+; CHECK: bb12.preheader:
+; CHECK-NEXT: br label [[BB12_PEEL_BEGIN:%.*]]
+; CHECK: bb12.peel.begin:
+; CHECK-NEXT: br label [[BB12_PEEL:%.*]]
+; CHECK: bb12.peel:
+; CHECK-NEXT: br i1 false, label [[BB1_LOOPEXIT]], label [[BB12_PEEL_NEXT:%.*]], !prof [[PROF3:![0-9]+]]
+; CHECK: bb12.peel.next:
+; CHECK-NEXT: br label [[BB12_PEEL_NEXT4:%.*]]
+; CHECK: bb12.peel.next4:
+; CHECK-NEXT: br label [[BB12_PREHEADER_PEEL_NEWPH:%.*]]
+; CHECK: bb12.preheader.peel.newph:
+; CHECK-NEXT: br label [[BB12:%.*]]
+; CHECK: bb12:
+; CHECK-NEXT: br i1 false, label [[BB1_LOOPEXIT_LOOPEXIT:%.*]], label [[BB12]], !prof [[PROF4:![0-9]+]], !llvm.loop [[LOOP5:![0-9]+]]
+;
+bb:
+ br label %bb1
+
+bb1: ; preds = %bb12, %bb
+ %phi = phi i32 [ 1, %bb ], [ 0, %bb12 ]
+ br label %bb2
+
+bb2: ; preds = %bb11, %bb1
+ %phi3 = phi i64 [ 0, %bb1 ], [ %sext, %bb11 ]
+ br label %bb4
+
+bb4: ; preds = %bb9, %bb2
+ %phi5 = phi i64 [ %phi3, %bb2 ], [ %sext, %bb9 ]
+ %phi6 = phi i32 [ 1, %bb2 ], [ %add10, %bb9 ]
+ %trunc = trunc i64 %phi5 to i32
+ br i1 true, label %bb9, label %bb7
+
+bb7: ; preds = %bb4
+ %phi8 = phi i32 [ %trunc, %bb4 ]
+ ret void
+
+bb9: ; preds = %bb4
+ %add = add i32 1, %phi
+ %sext = sext i32 %add to i64
+ %add10 = add i32 %phi6, 1
+ %icmp = icmp ugt i32 %add10, 2
+ br i1 %icmp, label %bb11, label %bb4
+
+bb11: ; preds = %bb9
+ br i1 false, label %bb12, label %bb2
+
+bb12: ; preds = %bb12, %bb11
+ br i1 false, label %bb1, label %bb12, !prof !1
+}
+
+!0 = !{!"function_entry_count", i64 32768}
+!1 = !{!"branch_weights", i32 11, i32 1}
+;.
+; CHECK: [[PROF0]] = !{!"function_entry_count", i64 32768}
+; CHECK: [[LOOP1]] = distinct !{[[LOOP1]], [[META2:![0-9]+]]}
+; CHECK: [[META2]] = !{!"llvm.loop.peeled.count", i32 1}
+; CHECK: [[PROF3]] = !{!"branch_weights", i32 11, i32 1}
+; CHECK: [[PROF4]] = !{!"branch_weights", i32 11, i32 11}
+; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META2]], [[META6:![0-9]+]]}
+; CHECK: [[META6]] = !{!"llvm.loop.unroll.disable"}
+;.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please explain in what way LCSSA form gets broken? How does the IR look like after peeling but before simplification?
Just calling formLCSSARecursively() is of course correct, but it's better to understand why LCSSA gets broken and whether we can fix it incrementally instead.
; RUN: opt -S -passes=loop-unroll < %s | FileCheck %s | ||
source_filename = "./reduced.ll" | ||
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2" | ||
target triple = "x86_64-unknown-linux-gnu" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove source_filename, datalayout and triple.
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128-ni:1-p2:32:8:8:32-ni:2" | ||
target triple = "x86_64-unknown-linux-gnu" | ||
|
||
define void @wombat() gc "statepoint-example" !prof !0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove gc. Also can you replace the profile metadata with an -unroll-peel-count
option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With -unroll-peel-count=2
, running llvm-reduce produces this smaller test case:
define void @wombat() {
bb:
br label %bb1
bb1: ; preds = %bb11, %bb
%phi = phi i32 [ 1, %bb ], [ 0, %bb11 ]
br label %bb2
bb2: ; preds = %bb11, %bb1
%phi3 = phi i64 [ 0, %bb1 ], [ %sext, %bb11 ]
br label %bb4
bb4: ; preds = %bb9, %bb2
%phi5 = phi i64 [ %phi3, %bb2 ], [ %sext, %bb9 ]
%trunc = trunc i64 %phi5 to i32
br i1 true, label %bb9, label %bb7
bb7: ; preds = %bb4
%phi8 = phi i32 [ %trunc, %bb4 ]
ret void
bb9: ; preds = %bb4
%sext = sext i32 %phi to i64
br i1 false, label %bb11, label %bb4
bb11: ; preds = %bb9
br i1 false, label %bb1, label %bb2
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though you'll probably want to run this test case through loop-simplify,lcssa to produce the output that will actually be used after loop canonicalizations. Please also give basic blocks more meaningful names, like bb -> entry, bb1 -> loop1, bb2 -> loop2, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reduced testcase, doesn't crash the code. Since it somehow removes the bb12 loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a crash with the reduced test case: https://llvm.godbolt.org/z/84Wb9cTnf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a crash with the reduced test case: https://llvm.godbolt.org/z/84Wb9cTnf
my bad, I didn't pass -unroll-peel-count=2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic that testcase seems to reproduce some different error, when run through loop-simplify,lcssa and then when it is run through LCSSA the crash is something different. When I dump the IR from loop-unroll just before the crash and try to run it through LCSSA it crashes as follows: https://godbolt.org/z/o5WM4hsjo
This is the CFG and below is the diff at the end of LoopPeel before and after LCSSA form is regenerated. LCSSA verifier fails as bb7 uses %0 which is defined in bb1.peel.newph, and bb1.peel.newph doesn't dominate bb7 |
@vedantparanjape-amd , going by the definition of LCSSA, I see that %0 is not used anywhere outside of the loop headed by bb1 except by the phi node in bb7, which I believe is permitted(going by this example in llvm-docs Link). Therefore in my opinion the code at the end of LoopPeel is already in LCSSA. If that is the case, I believe that this test case hit some corner condition which was not considered by @nikic @vedantparanjape-amd Please correct me if I am wrong, I am trying to refine my understanding of LCSSA |
Yeah, this seems correct. The assumptions with which it has been implemented seem to be wrong in this case. Quoting the code in isBlockInLCSSAForm and LLVM documents.
So, the function starts with BB = bb1.peel.newph, UserBB = bb7 and UI = %phi8, now it checks if UI is a PhiNode, if it is, it assigns UserBB = IncomingBlock(UI) = bb7.loopexit, which is still fine. In the next if condition that returns false, it checks if BB != UserBB, which fails due to the wrong assumption that Phi occurs in the respective predecessor block only. Actual call chain is I think we need to relax the criteria of use being in the predecessor block, it is not necessary to be a immediate predecessor. It's strange that before doing Loop Peeling this issue is not triggered. This case is something like this CC @baziotis and @igor-laevsky (you seem to be author of the changes there) |
Actually the issue seems something else imo, the Peeled Loop seems bogus, it just adds blocks with unconditional branch instructions, the loop it seems to evaluate is
|
I think this is just because the loop bb12 contains only a branch, however the issue still persists if the loop was a meaning full one as in this example: https://godbolt.org/z/a6cYb4Ta6 |
If you add one more block to the loop, it doesn't crash: https://godbolt.org/z/h3hM7MMnq. It seems to be a issue with LoopUnrolling. Ignoring my comment above about isBlockInLCSSAForm, I think there's nothing wrong with the function because bb7.loopexit and bb7 are not part of any loops, hence adding the the lcssa Phi seems correct to me. |
Applying this patch, shows that simplifyLoopAfterUnroll destroys the LCSSA form of the outermost loop. I have attached the results of this patch below
@nikic the issue is with simplifyLoopIVs function call, it seems to move the incoming values to Phi node from successor to predecessor. Is this expected behaviour ? If yes, I think we can simply add something to reform LCSSA after this call. Here's the diff of before and after: https://godbolt.org/z/9jo115Tas (look at bb1.peel.newph and bb7) |
@nikic a gentle ping for your inputs! |
ping ! |
Quick question - I had seen a bug previously (and had a fix in phabricator, which I cannot access now). The issue is that |
So, the issue is in SimplifyAfterUnroll call, it breaks the LCSSA form of the parent loop. The order of call is as follows: LoopPeel -> LCCSA preserved To answer your question, during the 2nd call to LoopPeel, it is not in LCSSA form. |
@nikic ping! |
@vedantparanjape-amd If the LCSSA of the parent loop is already destroyed by the time we enter LoopPeel, then I don't think trying to recover it inside LoopPeel is the right thing to do. We should fix this at the point where LCSSA is broken. Do I understand correctly that the only relation LoopPeel has to anything here is that it contains this LCSSA assertion? It looks like adding
to the end of unrolling triggers the assertion even without peeling. For the record, here is the test case with cleaned up block naming: target triple = "x86_64-unknown-linux-gnu"
define void @test() !prof !0 {
entry:
br label %loop1
loop1:
%phi = phi i32 [ 1, %entry ], [ 0, %loop1.latch ]
br label %loop2
loop2:
%phi3 = phi i64 [ 0, %loop1 ], [ %sext, %loop2.latch ]
br label %loop3
loop3:
%phi5 = phi i64 [ %phi3, %loop2 ], [ %sext, %loop3.latch ]
%phi6 = phi i32 [ 1, %loop2 ], [ %add10, %loop3.latch ]
%trunc = trunc i64 %phi5 to i32
br i1 true, label %loop3.latch, label %exit
loop3.latch:
%add = add i32 1, %phi
%sext = sext i32 %add to i64
%add10 = add i32 %phi6, 1
%icmp = icmp ugt i32 %add10, 2
br i1 %icmp, label %loop2.latch, label %loop3
loop2.latch:
br i1 false, label %loop4.preheader, label %loop2
loop4.preheader:
br label %loop4
loop4:
br i1 false, label %loop1.latch, label %loop4
loop1.latch:
br label %loop1
exit:
%phi8 = phi i32 [ %trunc, %loop3 ]
ret void
}
!0 = !{!"function_entry_count", i64 32768}
!1 = !{!"branch_weights", i32 1, i32 1} |
Hey @nikic! @vedantparanjape-amd has identified the point where LCSSA is broken in this comment here: #78696 (comment) TLDR; LCSSA is broken in the function
which gets called after the first peel. In particular the LCSSA is broken at this point:
which does the transformation as shown in the last diagram of this comment: Now we are unaware if such a transformation is expected of |
simplifyLoopIV is supposed to maintain LCSSA form. |
Yeah. that's correct ! It has no relation as such with LoopPeel.
Thanks for the cleaned up testcase. |
So, should we update the simplifyLoopIV or just call formLCSSAform after simplifyLoopIV ? |
We should fix simplifyLoopIV. Note that it already contains various replacementPreservesLCSSAForm checks already. |
I think you mean to say simplifyLoopAfterUnroll has various replacementPreservesLCSSAForm checks, not the SimplifyLoopIV. |
9b1b448
to
31d9871
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
I pushed a fix, it adds a replacementPreservesLCSSAForm check to simplifyLoopIV. It passes the testcase as well. @nikic can you please check once, thanks ! |
31d9871
to
a15eebf
Compare
<< " with loop invariant: " << *S | ||
<< " as it breaks LCSSA form " << '\n'); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SCEVExpander is also supposed to preserve LCSSA form (if PreserveLCSSA is enabled, which it is by default), so I think the actual fix needs to be in SCEVExpander. It's not possible to abort a transform after using SCEVExpander.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's call to fixupLCSSAFormFor
in SCEVExpander::expand, which in turn calls formLCSSAForInstructions. But it never reaches till that point, it simply exits before that point here as UseLoop == DefLoop and DefLoop->contains(UseLoop) is true.
if (!DefLoop || UseLoop == DefLoop || DefLoop->contains(UseLoop)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nikic I studied the code, I don't think SCEVExpander is the right place. As we don't pass the Instruction to be replaced by the expanded value to the SCEVExpander at any point. I think it the duty of the function that calls the SCEVExpander to maintain LCSSA form, because SCEVExpander doesn't break LCSSA as such, but when we replace its use by the expanded value, LCSSA form is broken !!
Either we need a expanded value which doesn't make changes to the IR so that we can safely check for the LCSSA, or we need to somehow pass Instruction to be replaced to SCEVExpander and then do the check inside ? (I don't think we should do this).
What do you think, would be the ideal way to make this change ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I get the problem now. SCEVExpander insertion point is the loop-invariant location, but the actual use will be inside an exit block.
I think the options to fix this would be:
- Make a check that LCSSA is not broken before calling Rewriter, under the assumption that the instruction will be inserted at the insertion point. (This may have some false positives if instruction reuse occurs, but I think it's fine.)
- Fix up LCSSA form in this special case after the transform. So if
!replacementPreservesLCSSAForm
insert a phi node.
a15eebf
to
88686be
Compare
@nikic I implemented a fix based on our discussion, I implemented the 2nd one, because it makes sure optimization doesn't bail out. Can you review it once, thanks ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new approach looks reasonable to me.
llvm/test/Transforms/LoopUnroll/gh-issue77118-broken-lcssa-form.ll
Outdated
Show resolved
Hide resolved
In LoopUnroll, peelLoop is called on the loop. After the loop is peeled it calls simplifyLoopAfterUnroll on the loop. This call to simplifyLoopAfterUnroll doesn't preserve the LCSSA form of the parent loop and thus during the next call to peelLoop the LCSSA form is already broken. LoopPeel util takes in the PreserveLCSSA argument and it passes on the same argument to simplifyLoop which checks if the loop is in a valid LCSSA form, when (PreserveLCSSA = true). This causes an assert in simplifyLoop when (PreserveLCSSA = true), as during the last call LCSSA for the loop wasn't preserved, and thus crashes at the following assert. assert(L->isRecursivelyLCSSAForm(*DT, *LI) && "Requested to preserve LCSSA, but it's already broken."); Upon debugging, it is evident that simplifyLoopIVs call inside simplifyLoopAfterUnroll breaks the LCSSA form. This patch fixes llvm#77118, it checks if the replacement of IV Users with Loop Invariant preserves the LCSSA form. If it does not, it emits the required LCSSA Phi instructions.
472e16d
to
2eb4999
Compare
@nikic did the requested changes, is it okay to merge now ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for tracking this down all the way!
LoopPeel util takes in the PreserveLCSSA argument, but the transformation doesn't seem to preserve the LCSSA form. But, it passes on the same argument to simplifyLoop which checks if the loop is in a valid LCSSA form, when (PreserveLCSSA = true).
This causes an assert in simplifyLoop when (PreserveLCSSA = true), as LoopPeel doesn't preserve LCSSA, and thus crashes at the following assert.
assert(L->isRecursivelyLCSSAForm(*DT, *LI) &&
"Requested to preserve LCSSA, but it's already broken.");
This patch fixes #77118, it tries to form LCSSA at the end of the optimization if (PreserveLCSSA = true) and the loop is not in LCSSA form. It also adds a test case of LoopUnrolling which crashed earlier.