14 changes: 10 additions & 4 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8232,15 +8232,18 @@ void LoopVectorizationPlanner::buildVPlansWithVPRecipes(ElementCount MinVF,
VPlanTransforms::runPass(VPlanTransforms::truncateToMinimalBitwidths,
*Plan, CM.getMinimalBitwidths());
VPlanTransforms::runPass(VPlanTransforms::optimize, *Plan);
// TODO: try to put it close to addActiveLaneMask().
if (CM.foldTailWithEVL())
VPlanTransforms::runPass(VPlanTransforms::addExplicitVectorLength,
*Plan, CM.getMaxSafeElements());
assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
VPlans.push_back(std::move(Plan));
}
VF = SubRange.End;
}

if (CM.foldTailWithEVL()) {
for (auto &Plan : VPlans) {
VPlanTransforms::runPass(VPlanTransforms::optimizeMasksToEVL, *Plan);
assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
}
}
}

VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
@@ -8499,6 +8502,9 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
}
VPlanTransforms::optimizeInductionExitUsers(*Plan, IVEndValues, *PSE.getSE());

if (CM.foldTailWithEVL())
VPlanTransforms::addExplicitVectorLength(*Plan, CM.getMaxSafeElements());

assert(verifyVPlanIsValid(*Plan) && "VPlan is invalid");
return Plan;
}
91 changes: 43 additions & 48 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -605,9 +605,12 @@ createScalarIVSteps(VPlan &Plan, InductionDescriptor::InductionKind Kind,
VPBuilder &Builder) {
VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
VPBasicBlock *HeaderVPBB = LoopRegion->getEntryBasicBlock();
VPCanonicalIVPHIRecipe *CanonicalIV = LoopRegion->getCanonicalIV();
VPSingleDefRecipe *BaseIV = Builder.createDerivedIV(
Kind, FPBinOp, StartV, CanonicalIV, Step, "offset.idx");
VPHeaderPHIRecipe *IV = LoopRegion->getCanonicalIV();
if (auto *EVLIV =
dyn_cast<VPEVLBasedIVPHIRecipe>(std::next(IV->getIterator())))
Comment on lines +608 to +610
Contributor:
Do you think we can avoid this by deferring canonical IV replacement until canonicalizeEVLLoops?

Author (lukel97):
I don't think that's a good idea because it means we will have an incorrect VPlan throughout the optimisation pipeline.

Part of the motivation for this PR is to have everything use the EVL based IV as soon as it's added so we don't accidentally have recipes using the canonical IV and producing incorrect results on the penultimate iteration.

We could probably add a method to VPRegionBlock that abstracts over the EVL or canonical IV, like getEffectiveIV, but that probably requires more discussion so I'd like to leave that to another PR if possible.
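
For illustration, a minimal sketch of what that hypothetical getEffectiveIV accessor could look like. It reuses the lookup pattern from the createScalarIVSteps hunk above and leans on the invariant the new VPlanVerifier check enforces (an EVL-based IV, if present, sits immediately after the canonical IV); the name and placement are only the suggestion floated here, not part of this PR:

  // Hypothetical accessor, not in this PR: return the IV that recipes
  // should derive values from, preferring the EVL-based IV when the loop
  // is EVL tail folded.
  VPHeaderPHIRecipe *VPRegionBlock::getEffectiveIV() {
    VPHeaderPHIRecipe *IV = getCanonicalIV();
    // The verifier guarantees that a VPEVLBasedIVPHIRecipe, if it exists,
    // is the recipe immediately after the canonical IV.
    if (auto *EVLIV =
            dyn_cast<VPEVLBasedIVPHIRecipe>(std::next(IV->getIterator())))
      return EVLIV;
    return IV;
  }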

Contributor:
Why would it be incorrect?

If the movement means every optimization must be careful whether to use the canonical IV or the EVL-based IV, and adding new users to the canonical IV could cause incorrect transformations, then I am not sure that is the best direction forward.

Author (lukel97), Nov 11, 2025:

> Why would it be incorrect?

Once we change the header mask to an EVL-based mask (a.k.a. variable stepping), if a widened recipe still uses the canonical IV, it will operate on the wrong lanes in the penultimate iteration.

So the conversion of the header mask to the EVL-based mask needs to be done in tandem with replacing all uses of the canonical IV with the EVL-based IV.

> If the movement means every optimization must be careful whether to use the canonical IV or the EVL-based IV, and adding new users to the canonical IV could cause incorrect transformations

This is already something we need to be careful about today, even without this patch. E.g. narrowInterleaveGroups uses the canonical IV and runs after addExplicitVectorLength. It just so happens to bail when it sees any non-canonical IV phis at the moment, but in the future we will presumably need to handle EVL-based IVs and the like.

It crossed my mind that maybe we should just call addExplicitVectorLength as late as possible, but I can see two potential issues:

  • If we move it past the point where we compute the cost, the cost would be inaccurate, because we would no longer see that the header mask is optimised away and that we use VP intrinsics
  • We miss out on any simplifications that are exposed by the EVL transform

I think it's simplest if we have the EVL-based loop early on, instead of having a mix of some transforms being EVL-aware and some unaware.

We should probably also audit users of getCanonicalIV and make sure they're using some API that returns either the canonical IV or the EVL-based IV.

Hope that explanation makes sense; open to other thoughts and suggestions.
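
To make the wrong-lanes hazard concrete, here is a hedged sketch with example numbers; the EVL split shown is one a RISC-V vsetvli is permitted to produce, and the replaceAllUsesWith call is a simplification of what addExplicitVectorLength does further down in this diff (the real code exempts the canonical IV increment):

  // Suppose VF = 8 and 10 elements remain. The target may legally return
  // EVL = 5 for the next-to-last iteration, so after that iteration:
  //   EVL-based IV:  0 -> 5   (steps by the returned EVL)
  //   canonical IV:  0 -> 8   (steps by VF)
  // A widened recipe still deriving its lanes or addresses from the
  // canonical IV would start at element 8 instead of element 5. Hence the
  // header-mask rewrite must be paired with redirecting the canonical IV's
  // users to the EVL-based IV:
  CanonicalIV->replaceAllUsesWith(EVLIV);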

IV = EVLIV;
VPSingleDefRecipe *BaseIV =
Builder.createDerivedIV(Kind, FPBinOp, StartV, IV, Step, "offset.idx");

// Truncate base induction if needed.
VPTypeAnalysis TypeInfo(Plan);
@@ -2621,8 +2624,42 @@ static VPRecipeBase *optimizeMaskToEVL(VPValue *HeaderMask,
return nullptr;
}

/// Replace recipes with their EVL variants.
static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
void VPlanTransforms::optimizeMasksToEVL(VPlan &Plan) {
// Find the EVL-based header mask if it exists: icmp ult step-vector, EVL
VPInstruction *HeaderMask = nullptr;
for (VPRecipeBase &R : *Plan.getVectorLoopRegion()->getEntryBasicBlock()) {
if (match(&R, m_ICmp(m_VPInstruction<VPInstruction::StepVector>(),
m_EVL(m_VPValue())))) {
HeaderMask = cast<VPInstruction>(&R);
break;
}
}
if (!HeaderMask)
return;

VPValue *EVL = HeaderMask->getOperand(1);
Comment on lines +2628 to +2640
Contributor:
Do you have a plan to create a helper function like static VPValue *findEVLMask(VPlan &Plan)? If not, could we do the following?

Suggested change (bind EVL inside the matcher, which also drops the later VPValue *EVL = HeaderMask->getOperand(1); line):

  // Find the EVL-based header mask if it exists: icmp ult step-vector, EVL
  VPInstruction *HeaderMask = nullptr;
  VPValue *EVL;
  for (VPRecipeBase &R : *Plan.getVectorLoopRegion()->getEntryBasicBlock()) {
    if (match(&R, m_ICmp(m_VPInstruction<VPInstruction::StepVector>(),
                         m_EVL(m_VPValue(EVL))))) {
      HeaderMask = cast<VPInstruction>(&R);
      break;
    }
  }
  if (!HeaderMask)
    return;

Author (lukel97):
I wasn't planning on creating a helper function. I think the change you suggested matches the AVL though, not the EVL. I think we need something like bind_and_match_ty from PatternMatch.h where we can both match and capture a value
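
For reference, a rough sketch of what such a combined match-and-capture helper could look like for VPlan patterns. This is purely illustrative: the names mirror the PatternMatch.h idiom and nothing like this exists in VPlanPatternMatch today.

  // Illustrative only: succeeds iff SubPattern matches, and on success
  // also binds the matched VPValue to the captured pointer.
  template <typename SubPattern_t> struct BindAndMatch_match {
    SubPattern_t SubPattern;
    VPValue *&V;

    BindAndMatch_match(const SubPattern_t &SP, VPValue *&V)
        : SubPattern(SP), V(V) {}

    bool match(VPValue *Val) const {
      if (!SubPattern.match(Val))
        return false;
      V = Val; // Capture the value the sub-pattern matched.
      return true;
    }
  };

  template <typename SubPattern_t>
  inline BindAndMatch_match<SubPattern_t>
  m_BindAndMatch(const SubPattern_t &SP, VPValue *&V) {
    return BindAndMatch_match<SubPattern_t>(SP, V);
  }

With something like that, the header-mask loop could write m_BindAndMatch(m_EVL(m_VPValue()), EVL) to capture the EVL itself rather than its AVL operand.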

Contributor:
BTW, do we need to update vputils::isHeaderMask?

Author (lukel97):
I don't think so; isHeaderMask is only used by findHeaderMask, which in turn is only used by transformRecipestoEVLRecipes and addActiveLaneMask.


VPTypeAnalysis TypeInfo(Plan);

for (VPUser *U : collectUsersRecursively(HeaderMask)) {
VPRecipeBase *R = cast<VPRecipeBase>(U);
if (auto *NewR = optimizeMaskToEVL(HeaderMask, *R, TypeInfo, *EVL)) {
NewR->insertBefore(R);
for (auto [Old, New] :
zip_equal(R->definedValues(), NewR->definedValues()))
Old->replaceAllUsesWith(New);
// Erase dead stores; the rest will be removed by removeDeadRecipes.
if (R->getNumDefinedValues() == 0)
R->eraseFromParent();
}
}

removeDeadRecipes(Plan);
}

/// After replacing the IV with an EVL-based IV, fix up recipes that use VF to
/// use the EVL instead to avoid incorrect updates on the penultimate iteration.
static void fixupVFUsersForEVL(VPlan &Plan, VPValue &EVL) {
VPTypeAnalysis TypeInfo(Plan);
VPRegionBlock *LoopRegion = Plan.getVectorLoopRegion();
VPBasicBlock *Header = LoopRegion->getEntryBasicBlock();
@@ -2650,10 +2687,6 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
return isa<VPWidenPointerInductionRecipe>(U);
});

// Defer erasing recipes till the end so that we don't invalidate the
// VPTypeAnalysis cache.
SmallVector<VPRecipeBase *> ToErase;

// Create a scalar phi to track the previous EVL if fixed-order recurrence is
// contained.
bool ContainsFORs =
Expand Down Expand Up @@ -2687,7 +2720,6 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
TypeInfo.inferScalarType(R.getVPSingleValue()), R.getDebugLoc());
VPSplice->insertBefore(&R);
R.getVPSingleValue()->replaceAllUsesWith(VPSplice);
ToErase.push_back(&R);
}
}
}
@@ -2708,43 +2740,6 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
CmpInst::ICMP_ULT,
Builder.createNaryOp(VPInstruction::StepVector, {}, EVLType), &EVL);
HeaderMask->replaceAllUsesWith(EVLMask);
ToErase.push_back(HeaderMask->getDefiningRecipe());

// Try to optimize header mask recipes away to their EVL variants.
// TODO: Split optimizeMaskToEVL out and move into
// VPlanTransforms::optimize. transformRecipestoEVLRecipes should be run in
// tryToBuildVPlanWithVPRecipes beforehand.
for (VPUser *U : collectUsersRecursively(EVLMask)) {
auto *CurRecipe = cast<VPRecipeBase>(U);
VPRecipeBase *EVLRecipe =
optimizeMaskToEVL(EVLMask, *CurRecipe, TypeInfo, EVL);
if (!EVLRecipe)
continue;

unsigned NumDefVal = EVLRecipe->getNumDefinedValues();
assert(NumDefVal == CurRecipe->getNumDefinedValues() &&
"New recipe must define the same number of values as the "
"original.");
EVLRecipe->insertBefore(CurRecipe);
if (isa<VPSingleDefRecipe, VPWidenLoadEVLRecipe, VPInterleaveEVLRecipe>(
EVLRecipe)) {
for (unsigned I = 0; I < NumDefVal; ++I) {
VPValue *CurVPV = CurRecipe->getVPValue(I);
CurVPV->replaceAllUsesWith(EVLRecipe->getVPValue(I));
}
}
ToErase.push_back(CurRecipe);
}
// Remove dead EVL mask.
if (EVLMask->getNumUsers() == 0)
ToErase.push_back(EVLMask->getDefiningRecipe());

for (VPRecipeBase *R : reverse(ToErase)) {
SmallVector<VPValue *> PossiblyDead(R->operands());
R->eraseFromParent();
for (VPValue *Op : PossiblyDead)
recursivelyDeleteDeadRecipes(Op);
}
}

/// Add a VPEVLBasedIVPHIRecipe and related recipes to \p Plan and
@@ -2842,7 +2837,7 @@ void VPlanTransforms::addExplicitVectorLength(
DebugLoc::getCompilerGenerated(), "avl.next");
AVLPhi->addOperand(NextAVL);

transformRecipestoEVLRecipes(Plan, *VPEVL);
fixupVFUsersForEVL(Plan, *VPEVL);

// Replace all uses of VPCanonicalIVPHIRecipe by
// VPEVLBasedIVPHIRecipe except for the canonical IV increment.
11 changes: 11 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -377,6 +377,17 @@ struct VPlanTransforms {
/// users in the original exit block using the VPIRInstruction wrapping to the
/// LCSSA phi.
static void addExitUsersForFirstOrderRecurrences(VPlan &Plan, VFRange &Range);

/// If the loop is EVL tail folded, try to optimize any recipes that use an
/// EVL-based header mask to a VP intrinsic, e.g.:
///
/// %mask = icmp step-vector, EVL
/// %load = load %ptr, %mask
///
/// ->
///
/// %load = vp.load %ptr, EVL
static void optimizeMasksToEVL(VPlan &Plan);
};

} // namespace llvm
6 changes: 6 additions & 0 deletions llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp
@@ -317,6 +317,12 @@ bool VPlanVerifier::verifyVPBasicBlock(const VPBasicBlock *VPBB) {
break;
}
}
if (const auto *EVLPhi = dyn_cast<VPEVLBasedIVPHIRecipe>(&R)) {
if (!isa<VPCanonicalIVPHIRecipe>(std::prev(EVLPhi->getIterator()))) {
errs() << "EVL-based IV is not immediately after canonical IV\n";
return false;
}
}
}

auto *IRBB = dyn_cast<VPIRBasicBlock>(VPBB);
2 changes: 1 addition & 1 deletion llvm/test/Transforms/LoopVectorize/RISCV/dead-ops-cost.ll
@@ -361,12 +361,12 @@ define void @gather_interleave_group_with_dead_insert_pos(i64 %N, ptr noalias %s
; CHECK-NEXT: [[EVL_BASED_IV:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_IND:%.*]] = phi <vscale x 4 x i64> [ [[INDUCTION]], %[[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[AVL:%.*]] = phi i64 [ [[TMP2]], %[[VECTOR_PH]] ], [ [[AVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[EVL_BASED_IV]], 2
; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 4, i1 true)
; CHECK-NEXT: [[TMP16:%.*]] = zext i32 [[TMP10]] to i64
; CHECK-NEXT: [[TMP12:%.*]] = mul i64 2, [[TMP16]]
; CHECK-NEXT: [[DOTSPLATINSERT:%.*]] = insertelement <vscale x 4 x i64> poison, i64 [[TMP12]], i64 0
; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <vscale x 4 x i64> [[DOTSPLATINSERT]], <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: [[OFFSET_IDX:%.*]] = mul i64 [[EVL_BASED_IV]], 2
; CHECK-NEXT: [[TMP22:%.*]] = getelementptr i8, ptr [[SRC]], i64 [[OFFSET_IDX]]
; CHECK-NEXT: [[INTERLEAVE_EVL:%.*]] = mul nuw nsw i32 [[TMP10]], 2
; CHECK-NEXT: [[WIDE_MASKED_VEC:%.*]] = call <vscale x 8 x i8> @llvm.vp.load.nxv8i8.p0(ptr align 1 [[TMP22]], <vscale x 8 x i1> splat (i1 true), i32 [[INTERLEAVE_EVL]])
6 changes: 3 additions & 3 deletions llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll
@@ -270,6 +270,7 @@ define void @predicated_udiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
; CHECK: vector.ph:
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[V:%.*]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 2 x i32> @llvm.stepvector.nxv2i32()
; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <vscale x 2 x i64> [[BROADCAST_SPLAT]], zeroinitializer
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
@@ -278,7 +279,6 @@ define void @predicated_udiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
; CHECK-NEXT: [[TMP12:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 2, i1 true)
; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 2 x i32> poison, i32 [[TMP12]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 2 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 2 x i32> @llvm.stepvector.nxv2i32()
; CHECK-NEXT: [[TMP15:%.*]] = icmp ult <vscale x 2 x i32> [[TMP7]], [[BROADCAST_SPLAT2]]
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
@@ -354,6 +354,7 @@ define void @predicated_sdiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
; CHECK: vector.ph:
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 2 x i64> poison, i64 [[V:%.*]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 2 x i64> [[BROADCAST_SPLATINSERT]], <vscale x 2 x i64> poison, <vscale x 2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 2 x i32> @llvm.stepvector.nxv2i32()
; CHECK-NEXT: [[TMP6:%.*]] = icmp ne <vscale x 2 x i64> [[BROADCAST_SPLAT]], zeroinitializer
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
@@ -362,7 +363,6 @@ define void @predicated_sdiv(ptr noalias nocapture %a, i64 %v, i64 %n) {
; CHECK-NEXT: [[TMP12:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 2, i1 true)
; CHECK-NEXT: [[BROADCAST_SPLATINSERT1:%.*]] = insertelement <vscale x 2 x i32> poison, i32 [[TMP12]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT2:%.*]] = shufflevector <vscale x 2 x i32> [[BROADCAST_SPLATINSERT1]], <vscale x 2 x i32> poison, <vscale x 2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP7:%.*]] = call <vscale x 2 x i32> @llvm.stepvector.nxv2i32()
; CHECK-NEXT: [[TMP15:%.*]] = icmp ult <vscale x 2 x i32> [[TMP7]], [[BROADCAST_SPLAT2]]
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = call <vscale x 2 x i64> @llvm.vp.load.nxv2i64.p0(ptr align 8 [[TMP8]], <vscale x 2 x i1> splat (i1 true), i32 [[TMP12]])
@@ -576,14 +576,14 @@ define void @predicated_sdiv_by_minus_one(ptr noalias nocapture %a, i64 %n) {
; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:
; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 16 x i32> @llvm.stepvector.nxv16i32()
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[AVL:%.*]] = phi i64 [ 1024, [[VECTOR_PH]] ], [ [[AVL_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP12:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 16, i1 true)
; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 16 x i32> poison, i32 [[TMP12]], i64 0
; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 16 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 16 x i32> poison, <vscale x 16 x i32> zeroinitializer
; CHECK-NEXT: [[TMP6:%.*]] = call <vscale x 16 x i32> @llvm.stepvector.nxv16i32()
; CHECK-NEXT: [[TMP15:%.*]] = icmp ult <vscale x 16 x i32> [[TMP6]], [[BROADCAST_SPLAT]]
; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i8, ptr [[A:%.*]], i64 [[INDEX]]
; CHECK-NEXT: [[WIDE_LOAD:%.*]] = call <vscale x 16 x i8> @llvm.vp.load.nxv16i8.p0(ptr align 1 [[TMP7]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP12]])
2 changes: 1 addition & 1 deletion llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-bf16.ll
@@ -5,7 +5,7 @@ define void @add(ptr noalias nocapture readonly %src1, ptr noalias nocapture rea
; CHECK-LABEL: add
; CHECK: LV(REG): VF = vscale x 4
; CHECK-NEXT: LV(REG): Found max usage: 2 item
; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; CHECK-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 4 registers
; CHECK-NEXT: LV(REG): Found invariant usage: 1 item
; CHECK-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
4 changes: 2 additions & 2 deletions llvm/test/Transforms/LoopVectorize/RISCV/reg-usage-f16.ll
@@ -6,14 +6,14 @@ define void @add(ptr noalias nocapture readonly %src1, ptr noalias nocapture rea
; ZVFH-LABEL: add
; ZVFH: LV(REG): VF = vscale x 4
; ZVFH-NEXT: LV(REG): Found max usage: 2 item
; ZVFH-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; ZVFH-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; ZVFH-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 2 registers
; ZVFH-NEXT: LV(REG): Found invariant usage: 1 item
; ZVFH-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
; ZVFHMIN-LABEL: add
; ZVFHMIN: LV(REG): VF = vscale x 4
; ZVFHMIN-NEXT: LV(REG): Found max usage: 2 item
; ZVFHMIN-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; ZVFHMIN-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; ZVFHMIN-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 4 registers
; ZVFHMIN-NEXT: LV(REG): Found invariant usage: 1 item
; ZVFHMIN-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
@@ -4,7 +4,7 @@
define i32 @dotp(ptr %a, ptr %b) {
; CHECK-REGS-VP: LV(REG): VF = vscale x 16
; CHECK-REGS-VP-NEXT: LV(REG): Found max usage: 2 item
; CHECK-REGS-VP-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; CHECK-REGS-VP-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; CHECK-REGS-VP-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 24 registers
; CHECK-REGS-VP-NEXT: LV(REG): Found invariant usage: 1 item
; CHECK-REGS-VP-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
16 changes: 8 additions & 8 deletions llvm/test/Transforms/LoopVectorize/RISCV/reg-usage.ll
@@ -31,28 +31,28 @@ define void @add(ptr noalias nocapture readonly %src1, ptr noalias nocapture rea
; CHECK-LMUL1-LABEL: add
; CHECK-LMUL1: LV(REG): VF = vscale x 2
; CHECK-LMUL1-NEXT: LV(REG): Found max usage: 2 item
; CHECK-LMUL1-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; CHECK-LMUL1-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; CHECK-LMUL1-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 2 registers
; CHECK-LMUL1-NEXT: LV(REG): Found invariant usage: 1 item
; CHECK-LMUL1-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
; CHECK-LMUL2-LABEL: add
; CHECK-LMUL2: LV(REG): VF = vscale x 4
; CHECK-LMUL2-NEXT: LV(REG): Found max usage: 2 item
; CHECK-LMUL2-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; CHECK-LMUL2-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; CHECK-LMUL2-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 4 registers
; CHECK-LMUL2-NEXT: LV(REG): Found invariant usage: 1 item
; CHECK-LMUL2-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
; CHECK-LMUL4-LABEL: add
; CHECK-LMUL4: LV(REG): VF = vscale x 8
; CHECK-LMUL4-NEXT: LV(REG): Found max usage: 2 item
; CHECK-LMUL4-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; CHECK-LMUL4-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; CHECK-LMUL4-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 8 registers
; CHECK-LMUL4-NEXT: LV(REG): Found invariant usage: 1 item
; CHECK-LMUL4-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
; CHECK-LMUL8-LABEL: add
; CHECK-LMUL8: LV(REG): VF = vscale x 16
; CHECK-LMUL8-NEXT: LV(REG): Found max usage: 2 item
; CHECK-LMUL8-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; CHECK-LMUL8-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; CHECK-LMUL8-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 16 registers
; CHECK-LMUL8-NEXT: LV(REG): Found invariant usage: 1 item
; CHECK-LMUL8-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 1 registers
Expand Down Expand Up @@ -86,19 +86,19 @@ define void @goo(ptr nocapture noundef %a, i32 noundef signext %n) {
; CHECK-SCALAR-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 3 registers
; CHECK-LMUL1: LV(REG): VF = vscale x 2
; CHECK-LMUL1-NEXT: LV(REG): Found max usage: 2 item
; CHECK-LMUL1-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; CHECK-LMUL1-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; CHECK-LMUL1-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 2 registers
; CHECK-LMUL2: LV(REG): VF = vscale x 4
; CHECK-LMUL2-NEXT: LV(REG): Found max usage: 2 item
; CHECK-LMUL2-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; CHECK-LMUL2-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; CHECK-LMUL2-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 4 registers
; CHECK-LMUL4: LV(REG): VF = vscale x 8
; CHECK-LMUL4-NEXT: LV(REG): Found max usage: 2 item
; CHECK-LMUL4-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; CHECK-LMUL4-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; CHECK-LMUL4-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 8 registers
; CHECK-LMUL8: LV(REG): VF = vscale x 16
; CHECK-LMUL8-NEXT: LV(REG): Found max usage: 2 item
; CHECK-LMUL8-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 6 registers
; CHECK-LMUL8-NEXT: LV(REG): RegisterClass: RISCV::GPRRC, 5 registers
; CHECK-LMUL8-NEXT: LV(REG): RegisterClass: RISCV::VRRC, 16 registers
entry:
%cmp3 = icmp sgt i32 %n, 0