-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[VPlan] Fix LastActiveLane assertion on scalar VF #167897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VPlan] Fix LastActiveLane assertion on scalar VF #167897
Conversation
For a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask:
<x1> vector loop: {
vector.body:
EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
EMIT branch-on-count vp<%index.next>, vp<%2>
No successors
}
Successor(s): middle.block
middle.block:
EMIT vp<%8> = last-active-lane vp<%6>
EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
Successor(s): ir-bb<exit>
The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe.
Fixes llvm#167813
|
@llvm/pr-subscribers-vectorizers @llvm/pr-subscribers-llvm-transforms Author: Luke Lau (lukel97) ChangesFor a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask: The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe. Fixes #167813 Full diff: https://github.com/llvm/llvm-project/pull/167897.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
index e22c5dfdb9f38..c9de9b82bca7c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
@@ -66,6 +66,13 @@ bool vputils::isHeaderMask(const VPValue *V, const VPlan &Plan) {
m_One(), m_Specific(&Plan.getVF()))) ||
IsWideCanonicalIV(A));
+ if (match(V,
+ m_ICmp(m_ScalarIVSteps(
+ m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
+ m_One(), m_Specific(&Plan.getVF())),
+ m_Specific(Plan.getBackedgeTakenCount()))))
+ return true;
+
return match(V, m_ICmp(m_VPValue(A), m_VPValue(B))) && IsWideCanonicalIV(A) &&
B == Plan.getBackedgeTakenCount();
}
diff --git a/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll b/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll
new file mode 100644
index 0000000000000..5964cf45fb6be
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll
@@ -0,0 +1,60 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 6
+; RUN: opt -p loop-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s
+
+define i64 @live_out_scalar_vf(i64 %n) {
+; CHECK-LABEL: define i64 @live_out_scalar_vf(
+; CHECK-SAME: i64 [[N:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[N]], 1
+; CHECK-NEXT: br label %[[VECTOR_PH:.*]]
+; CHECK: [[VECTOR_PH]]:
+; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 [[TMP0]], 1
+; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 2
+; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; CHECK-NEXT: [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[TMP0]], 1
+; CHECK-NEXT: br label %[[VECTOR_BODY:.*]]
+; CHECK: [[VECTOR_BODY]]:
+; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 1
+; CHECK-NEXT: [[TMP3:%.*]] = icmp ugt i64 [[TMP1]], [[TRIP_COUNT_MINUS_1]]
+; CHECK-NEXT: [[TMP4:%.*]] = icmp ugt i64 [[TMP2]], [[TRIP_COUNT_MINUS_1]]
+; CHECK-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 2
+; CHECK-NEXT: [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK: [[MIDDLE_BLOCK]]:
+; CHECK-NEXT: [[TMP6:%.*]] = icmp eq i1 [[TMP4]], false
+; CHECK-NEXT: [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+; CHECK-NEXT: [[TMP8:%.*]] = add i64 1, [[TMP7]]
+; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i1 [[TMP3]], false
+; CHECK-NEXT: [[TMP10:%.*]] = zext i1 [[TMP9]] to i64
+; CHECK-NEXT: [[TMP11:%.*]] = add i64 0, [[TMP10]]
+; CHECK-NEXT: [[TMP12:%.*]] = icmp ne i64 [[TMP10]], 1
+; CHECK-NEXT: [[TMP13:%.*]] = select i1 [[TMP12]], i64 [[TMP11]], i64 [[TMP8]]
+; CHECK-NEXT: [[LAST_ACTIVE_LANE:%.*]] = sub i64 [[TMP13]], 1
+; CHECK-NEXT: [[TMP14:%.*]] = sub i64 [[LAST_ACTIVE_LANE]], 1
+; CHECK-NEXT: [[TMP15:%.*]] = icmp uge i64 [[LAST_ACTIVE_LANE]], 1
+; CHECK-NEXT: [[TMP16:%.*]] = select i1 [[TMP15]], i64 [[TMP2]], i64 [[TMP1]]
+; CHECK-NEXT: br label %[[EXIT:.*]]
+; CHECK: [[EXIT]]:
+; CHECK-NEXT: ret i64 [[TMP16]]
+;
+entry:
+ br label %loop
+
+loop:
+ %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
+ br label %latch
+
+latch:
+ ; Need to use a phi otherwise the header mask will use a
+ ; VPWidenCanonicalIVRecipe instead of a VPScalarIVStepsRecipe.
+ %exitval = phi i64 [ %iv, %loop ]
+ %iv.next = add i64 %iv, 1
+ %ec = icmp eq i64 %iv, %n
+ br i1 %ec, label %exit, label %loop
+
+exit:
+ ret i64 %exitval
+}
+
|
|
|
||
| loop: | ||
| %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ] | ||
| br label %latch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we turn this into a non-trivial branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to remove the branch entirely in 3863112, and the crash is still reproducable
| m_One(), m_Specific(&Plan.getVF()))) || | ||
| IsWideCanonicalIV(A)); | ||
|
|
||
| if (match(V, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (match(V, | |
| // For scalar plans, the header mask uses the scalar steps. | |
| if (match(V, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 23bce03
…ze/fix-tail-folded-live-out-scalar-vf
|
I see that #149042 was reverted but I've updated this on main if you'd like to land this separately anyway. |
|
|
||
| // For scalar plans, the header mask uses the scalar steps. | ||
| if (match(V, | ||
| m_ICmp(m_ScalarIVSteps( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this pattern is now used twice in this function, i.e.
m_ScalarIVSteps(m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
m_One(), m_Specific(&Plan.getVF()))
Is it worth creating a specific pattern match for this? Something like
m_CanonicalScalarIVSteps(Plan)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I've added a helper in efca5e5
…ze/fix-tail-folded-live-out-scalar-vf
arcbbb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks for the fix!
| return Expanded; | ||
| } | ||
|
|
||
| static inline auto m_CanonicalScalarIVSteps(const VPlan &Plan) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it’s only used in a single function, you can re-use by assigning the pattern to a variable I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it stays a static function here, you can drop the inline I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, turned it into a variable in c0a7067
fhahn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
| ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 6 | ||
| ; RUN: opt -p loop-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s | ||
|
|
||
| define i64 @live_out_scalar_vf(i64 %n) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if possible, might be good to add this to llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1.ll, which already contains a set of test cases for interleaving only with tail-folding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in c29428f
Needed to add -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue flag to run lines. Looks like the existing tests weren't necessarily tail folded?
For a scalar only VPlan with tail folding, if it has a phi live out then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask:
The verifier complains about this but this should still generate the correct last active lane, so this fixes the assert by handling this case in isHeaderMask. There is a similar pattern already there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe.
Fixes #167813