
Conversation

lukel97 (Contributor) commented Nov 13, 2025

For a scalar-only VPlan with tail folding, if the loop has a live-out phi then legalizeAndOptimizeInductions will scalarize the widened canonical IV feeding into the header mask:

<x1> vector loop: {
  vector.body:
    EMIT vp<%4> = CANONICAL-INDUCTION ir<0>, vp<%index.next>
    vp<%5> = SCALAR-STEPS vp<%4>, ir<1>, vp<%0>
    EMIT vp<%6> = icmp ule vp<%5>, vp<%3>
    EMIT vp<%index.next> = add nuw vp<%4>, vp<%1>
    EMIT branch-on-count vp<%index.next>, vp<%2>
  No successors
}
Successor(s): middle.block

middle.block:
  EMIT vp<%8> = last-active-lane vp<%6>
  EMIT vp<%9> = extract-lane vp<%8>, vp<%5>
Successor(s): ir-bb<exit>

The verifier complains about this, but it should still generate the correct last active lane, so this fixes the assert by handling the case in isHeaderMask. A similar pattern already exists there for ActiveLaneMask, which also expects a VPScalarIVSteps recipe.

Fixes #167813

llvmbot (Member) commented Nov 13, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes



Full diff: https://github.com/llvm/llvm-project/pull/167897.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanUtils.cpp (+7)
  • (added) llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll (+60)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
index e22c5dfdb9f38..c9de9b82bca7c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUtils.cpp
@@ -66,6 +66,13 @@ bool vputils::isHeaderMask(const VPValue *V, const VPlan &Plan) {
                       m_One(), m_Specific(&Plan.getVF()))) ||
             IsWideCanonicalIV(A));
 
+  if (match(V,
+            m_ICmp(m_ScalarIVSteps(
+                       m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
+                       m_One(), m_Specific(&Plan.getVF())),
+                   m_Specific(Plan.getBackedgeTakenCount()))))
+    return true;
+
   return match(V, m_ICmp(m_VPValue(A), m_VPValue(B))) && IsWideCanonicalIV(A) &&
          B == Plan.getBackedgeTakenCount();
 }
diff --git a/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll b/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll
new file mode 100644
index 0000000000000..5964cf45fb6be
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/tail-folding-live-out-scalar-vf.ll
@@ -0,0 +1,60 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 6
+; RUN: opt -p loop-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s
+
+define i64 @live_out_scalar_vf(i64 %n) {
+; CHECK-LABEL: define i64 @live_out_scalar_vf(
+; CHECK-SAME: i64 [[N:%.*]]) {
+; CHECK-NEXT:  [[ENTRY:.*:]]
+; CHECK-NEXT:    [[TMP0:%.*]] = add i64 [[N]], 1
+; CHECK-NEXT:    br label %[[VECTOR_PH:.*]]
+; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    [[N_RND_UP:%.*]] = add i64 [[TMP0]], 1
+; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], 2
+; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; CHECK-NEXT:    [[TRIP_COUNT_MINUS_1:%.*]] = sub i64 [[TMP0]], 1
+; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
+; CHECK:       [[VECTOR_BODY]]:
+; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
+; CHECK-NEXT:    [[TMP1:%.*]] = add i64 [[INDEX]], 0
+; CHECK-NEXT:    [[TMP2:%.*]] = add i64 [[INDEX]], 1
+; CHECK-NEXT:    [[TMP3:%.*]] = icmp ugt i64 [[TMP1]], [[TRIP_COUNT_MINUS_1]]
+; CHECK-NEXT:    [[TMP4:%.*]] = icmp ugt i64 [[TMP2]], [[TRIP_COUNT_MINUS_1]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], 2
+; CHECK-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = icmp eq i1 [[TMP4]], false
+; CHECK-NEXT:    [[TMP7:%.*]] = zext i1 [[TMP6]] to i64
+; CHECK-NEXT:    [[TMP8:%.*]] = add i64 1, [[TMP7]]
+; CHECK-NEXT:    [[TMP9:%.*]] = icmp eq i1 [[TMP3]], false
+; CHECK-NEXT:    [[TMP10:%.*]] = zext i1 [[TMP9]] to i64
+; CHECK-NEXT:    [[TMP11:%.*]] = add i64 0, [[TMP10]]
+; CHECK-NEXT:    [[TMP12:%.*]] = icmp ne i64 [[TMP10]], 1
+; CHECK-NEXT:    [[TMP13:%.*]] = select i1 [[TMP12]], i64 [[TMP11]], i64 [[TMP8]]
+; CHECK-NEXT:    [[LAST_ACTIVE_LANE:%.*]] = sub i64 [[TMP13]], 1
+; CHECK-NEXT:    [[TMP14:%.*]] = sub i64 [[LAST_ACTIVE_LANE]], 1
+; CHECK-NEXT:    [[TMP15:%.*]] = icmp uge i64 [[LAST_ACTIVE_LANE]], 1
+; CHECK-NEXT:    [[TMP16:%.*]] = select i1 [[TMP15]], i64 [[TMP2]], i64 [[TMP1]]
+; CHECK-NEXT:    br label %[[EXIT:.*]]
+; CHECK:       [[EXIT]]:
+; CHECK-NEXT:    ret i64 [[TMP16]]
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
+  br label %latch
+
+latch:
+  ; Need to use a phi otherwise the header mask will use a
+  ; VPWidenCanonicalIVRecipe instead of a VPScalarIVStepsRecipe.
+  %exitval = phi i64 [ %iv, %loop ]
+  %iv.next = add i64 %iv, 1
+  %ec = icmp eq i64 %iv, %n
+  br i1 %ec, label %exit, label %loop
+
+exit:
+  ret i64 %exitval
+}
+


loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %latch ]
br label %latch
Contributor

Can we turn this into a non-trivial branch?

Contributor Author

I was able to remove the branch entirely in 3863112, and the crash is still reproducible.

m_One(), m_Specific(&Plan.getVF()))) ||
IsWideCanonicalIV(A));

if (match(V,
Contributor

Suggested change:

-  if (match(V,
+  // For scalar plans, the header mask uses the scalar steps.
+  if (match(V,

Contributor Author

Done in 23bce03

lukel97 (Contributor Author) commented Nov 14, 2025

I see that #149042 was reverted, but I've updated this on main in case you'd like to land it separately anyway.


// For scalar plans, the header mask uses the scalar steps.
if (match(V,
m_ICmp(m_ScalarIVSteps(
Contributor

Looks like this pattern is now used twice in this function, i.e.

m_ScalarIVSteps(m_Specific(Plan.getVectorLoopRegion()->getCanonicalIV()),
                m_One(), m_Specific(&Plan.getVF()))

Is it worth creating a specific pattern match for this? Something like

  m_CanonicalScalarIVSteps(Plan)

?

Contributor Author

Good point, I've added a helper in efca5e5

@arcbbb (Contributor) left a comment

Looks good to me. Thanks for the fix!

return Expanded;
}

static inline auto m_CanonicalScalarIVSteps(const VPlan &Plan) {
Contributor

If it's only used in a single function, you can re-use it by assigning the pattern to a variable, I think?

Contributor

If it stays a static function here, you can drop the inline, I think.

Contributor Author

Yup, turned it into a variable in c0a7067

@fhahn (Contributor) left a comment

LGTM, thanks

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals none --version 6
; RUN: opt -p loop-vectorize -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue -force-vector-width=1 -force-vector-interleave=2 -S %s | FileCheck %s

define i64 @live_out_scalar_vf(i64 %n) {
Contributor

If possible, it might be good to add this to llvm/test/Transforms/LoopVectorize/tail-folding-vectorization-factor-1.ll, which already contains a set of test cases for interleaving-only with tail folding.

Contributor Author

Done in c29428f

Needed to add the -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue flag to the RUN lines. Looks like the existing tests weren't necessarily tail folded?
@lukel97 lukel97 enabled auto-merge (squash) November 17, 2025 10:34
@lukel97 lukel97 merged commit 4d4a60c into llvm:main Nov 17, 2025
9 of 10 checks passed
Development

Successfully merging this pull request may close these issues.

[LoopVectorize] Assertion `verifyVPlanIsValid(*Plan) && "VPlan is invalid"' failed.
