[VPlan] Create epilogue minimum iteration check in VPlan. #157545
Move creation of the minimum iteration check for the epilogue vector loop to VPlan. This is a first step towards breaking up skeleton creation for epilogue vectorization and moving it to VPlan. It moves most logic out of EpilogueVectorizerEpilogueLoop: the minimum iteration check is created directly in VPlan, while connecting the check blocks from the main vector loop is done as post-processing. Next steps are to move connecting and updating the branches from the check blocks to VPlan, as well as updating the incoming values for phis. Test changes are improvements due to folding of live-ins.
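For readers following along, the check described above can be modeled outside LLVM as a small predicate. This is a minimal sketch and not the VPlan code: `skipVectorEpilogue` and its parameters are hypothetical names, `EpilogueStep` stands in for `EpilogueVF * EpilogueUF` under the assumption of a fixed-width VF, and the ULE/ULT choice mirrors the `requiresScalarEpilogue` selection in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone model of the epilogue minimum iteration check.
// TripCount / VectorTripCount play the role of EPI.TripCount and
// EPI.VectorTripCount; EpilogueStep models EpilogueVF * EpilogueUF.
// Returns true when the vector epilogue should be bypassed.
bool skipVectorEpilogue(uint64_t TripCount, uint64_t VectorTripCount,
                        uint64_t EpilogueStep, bool RequiresScalarEpilogue) {
  assert(VectorTripCount <= TripCount && "main loop cannot overshoot");
  // Corresponds to the value named "n.vec.remaining" in the patch.
  uint64_t Remaining = TripCount - VectorTripCount;
  // ULE if a scalar epilogue is required, ULT otherwise.
  return RequiresScalarEpilogue ? Remaining <= EpilogueStep
                                : Remaining < EpilogueStep;
}
```

With 3 iterations left and an epilogue step of 4 the predicate is true and the epilogue vector loop is skipped; with exactly 4 left it is skipped only when a scalar epilogue is required.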
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-backend-powerpc
Author: Florian Hahn (fhahn)
Patch is 98.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/157545.diff
41 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index d78e190e8bf7b..79613a78f12f8 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -688,11 +688,6 @@ class EpilogueVectorizerMainLoop : public InnerLoopAndEpilogueVectorizer {
// vectorization of *epilogue* loops in the process of vectorizing loops and
// their epilogues.
class EpilogueVectorizerEpilogueLoop : public InnerLoopAndEpilogueVectorizer {
- /// The additional bypass block which conditionally skips over the epilogue
- /// loop after executing the main loop. Needed to resume inductions and
- /// reductions during epilogue vectorization.
- BasicBlock *AdditionalBypassBlock = nullptr;
-
public:
EpilogueVectorizerEpilogueLoop(
Loop *OrigLoop, PredicatedScalarEvolution &PSE, LoopInfo *LI,
@@ -703,20 +698,12 @@ class EpilogueVectorizerEpilogueLoop : public InnerLoopAndEpilogueVectorizer {
: InnerLoopAndEpilogueVectorizer(OrigLoop, PSE, LI, DT, TTI, AC, EPI, CM,
BFI, PSI, Checks, Plan, EPI.EpilogueVF,
EPI.EpilogueVF, EPI.EpilogueUF) {
- TripCount = EPI.TripCount;
+ TripCount = nullptr;
}
/// Implements the interface for creating a vectorized skeleton using the
/// *epilogue loop* strategy (i.e., the second pass of VPlan execution).
BasicBlock *createVectorizedLoopSkeleton() final;
- /// Return the additional bypass block which targets the scalar loop by
- /// skipping the epilogue loop after completing the main loop.
- BasicBlock *getAdditionalBypassBlock() const {
- assert(AdditionalBypassBlock &&
- "Trying to access AdditionalBypassBlock but it has not been set");
- return AdditionalBypassBlock;
- }
-
protected:
/// Emits an iteration count bypass check after the main vector loop has
/// finished to see if there are any iterations left to execute by either
@@ -7361,121 +7348,22 @@ BasicBlock *EpilogueVectorizerMainLoop::emitIterationCountCheck(
/// This function is partially responsible for generating the control flow
/// depicted in https://llvm.org/docs/Vectorizers.html#epilogue-vectorization.
BasicBlock *EpilogueVectorizerEpilogueLoop::createVectorizedLoopSkeleton() {
+ BasicBlock *Entry = OrigLoop->getLoopPreheader();
BasicBlock *ScalarPH = createScalarPreheader("vec.epilog.");
- BasicBlock *VectorPH = ScalarPH->getSinglePredecessor();
- // Now, compare the remaining count and if there aren't enough iterations to
- // execute the vectorized epilogue skip to the scalar part.
- VectorPH->setName("vec.epilog.ph");
- BasicBlock *VecEpilogueIterationCountCheck =
- SplitBlock(VectorPH, VectorPH->begin(), DT, LI, nullptr,
- "vec.epilog.iter.check", true);
- VectorPHVPBB = replaceVPBBWithIRVPBB(VectorPHVPBB, VectorPH);
-
- emitMinimumVectorEpilogueIterCountCheck(VectorPH, ScalarPH,
- VecEpilogueIterationCountCheck);
- AdditionalBypassBlock = VecEpilogueIterationCountCheck;
-
- // Adjust the control flow taking the state info from the main loop
- // vectorization into account.
- assert(EPI.MainLoopIterationCountCheck && EPI.EpilogueIterationCountCheck &&
- "expected this to be saved from the previous pass.");
- EPI.MainLoopIterationCountCheck->getTerminator()->replaceUsesOfWith(
- VecEpilogueIterationCountCheck, VectorPH);
-
- EPI.EpilogueIterationCountCheck->getTerminator()->replaceUsesOfWith(
- VecEpilogueIterationCountCheck, ScalarPH);
-
- // Adjust the terminators of runtime check blocks and phis using them.
- BasicBlock *SCEVCheckBlock = RTChecks.getSCEVChecks().second;
- BasicBlock *MemCheckBlock = RTChecks.getMemRuntimeChecks().second;
- if (SCEVCheckBlock)
- SCEVCheckBlock->getTerminator()->replaceUsesOfWith(
- VecEpilogueIterationCountCheck, ScalarPH);
- if (MemCheckBlock)
- MemCheckBlock->getTerminator()->replaceUsesOfWith(
- VecEpilogueIterationCountCheck, ScalarPH);
-
- DT->changeImmediateDominator(ScalarPH, EPI.EpilogueIterationCountCheck);
-
- // The vec.epilog.iter.check block may contain Phi nodes from inductions or
- // reductions which merge control-flow from the latch block and the middle
- // block. Update the incoming values here and move the Phi into the preheader.
- SmallVector<PHINode *, 4> PhisInBlock(
- llvm::make_pointer_range(VecEpilogueIterationCountCheck->phis()));
-
- for (PHINode *Phi : PhisInBlock) {
- Phi->moveBefore(VectorPH->getFirstNonPHIIt());
- Phi->replaceIncomingBlockWith(
- VecEpilogueIterationCountCheck->getSinglePredecessor(),
- VecEpilogueIterationCountCheck);
-
- // If the phi doesn't have an incoming value from the
- // EpilogueIterationCountCheck, we are done. Otherwise remove the incoming
- // value and also those from other check blocks. This is needed for
- // reduction phis only.
- if (none_of(Phi->blocks(), [&](BasicBlock *IncB) {
- return EPI.EpilogueIterationCountCheck == IncB;
- }))
+ ScalarPH->getSinglePredecessor()->setName("vec.epilog.iter.check");
+ VPIRBasicBlock *NewEntry = Plan.createVPIRBasicBlock(Entry);
+ VPBasicBlock *OldEntry = Plan.getEntry();
+ for (auto &R : make_early_inc_range(
+ make_range(OldEntry->getFirstNonPhi(), OldEntry->end()))) {
+ if (isa<VPIRInstruction>(&R))
continue;
- Phi->removeIncomingValue(EPI.EpilogueIterationCountCheck);
- if (SCEVCheckBlock)
- Phi->removeIncomingValue(SCEVCheckBlock);
- if (MemCheckBlock)
- Phi->removeIncomingValue(MemCheckBlock);
+ R.moveBefore(*NewEntry, NewEntry->end());
}
- return VectorPH;
-}
-
-BasicBlock *
-EpilogueVectorizerEpilogueLoop::emitMinimumVectorEpilogueIterCountCheck(
- BasicBlock *VectorPH, BasicBlock *Bypass, BasicBlock *Insert) {
-
- assert(EPI.TripCount &&
- "Expected trip count to have been saved in the first pass.");
- Value *TC = EPI.TripCount;
- IRBuilder<> Builder(Insert->getTerminator());
- Value *Count = Builder.CreateSub(TC, EPI.VectorTripCount, "n.vec.remaining");
-
- // Generate code to check if the loop's trip count is less than VF * UF of the
- // vector epilogue loop.
- auto P = Cost->requiresScalarEpilogue(EPI.EpilogueVF.isVector())
- ? ICmpInst::ICMP_ULE
- : ICmpInst::ICMP_ULT;
-
- Value *CheckMinIters =
- Builder.CreateICmp(P, Count,
- createStepForVF(Builder, Count->getType(),
- EPI.EpilogueVF, EPI.EpilogueUF),
- "min.epilog.iters.check");
-
- BranchInst &BI = *BranchInst::Create(Bypass, VectorPH, CheckMinIters);
- auto VScale = Cost->getVScaleForTuning();
- unsigned MainLoopStep =
- estimateElementCount(EPI.MainLoopVF * EPI.MainLoopUF, VScale);
- unsigned EpilogueLoopStep =
- estimateElementCount(EPI.EpilogueVF * EPI.EpilogueUF, VScale);
- // We assume the remaining `Count` is equally distributed in
- // [0, MainLoopStep)
- // So the probability for `Count < EpilogueLoopStep` should be
- // min(MainLoopStep, EpilogueLoopStep) / MainLoopStep
- // TODO: Improve the estimate by taking the estimated trip count into
- // consideration.
- unsigned EstimatedSkipCount = std::min(MainLoopStep, EpilogueLoopStep);
- const uint32_t Weights[] = {EstimatedSkipCount,
- MainLoopStep - EstimatedSkipCount};
- setBranchWeights(BI, Weights, /*IsExpected=*/false);
- ReplaceInstWithInst(Insert->getTerminator(), &BI);
-
- // A new entry block has been created for the epilogue VPlan. Hook it in, as
- // otherwise we would try to modify the entry to the main vector loop.
- VPIRBasicBlock *NewEntry = Plan.createVPIRBasicBlock(Insert);
- VPBasicBlock *OldEntry = Plan.getEntry();
VPBlockUtils::reassociateBlocks(OldEntry, NewEntry);
Plan.setEntry(NewEntry);
- // OldEntry is now dead and will be cleaned up when the plan gets destroyed.
- return Insert;
+ return ScalarPH->getSinglePredecessor();
}
void EpilogueVectorizerEpilogueLoop::printDebugTracesAtStart() {
@@ -9555,10 +9443,11 @@ static void preparePlanForMainVectorLoop(VPlan &MainPlan, VPlan &EpiPlan) {
/// Prepare \p Plan for vectorizing the epilogue loop. That is, re-use expanded
/// SCEVs from \p ExpandedSCEVs and set resume values for header recipes.
-static void
-preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
- const SCEV2ValueTy &ExpandedSCEVs,
- EpilogueLoopVectorizationInfo &EPI) {
+static void preparePlanForEpilogueVectorLoop(
+ VPlan &Plan, Loop *L, const SCEV2ValueTy &ExpandedSCEVs,
+ EpilogueLoopVectorizationInfo &EPI,
+ SmallVectorImpl<Instruction *> &InstsToMove, LoopVectorizationCostModel &CM,
+ ScalarEvolution &SE) {
VPRegionBlock *VectorLoop = Plan.getVectorLoopRegion();
VPBasicBlock *Header = VectorLoop->getEntryBasicBlock();
Header->setName("vec.epilog.vector.body");
@@ -9633,6 +9522,8 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
BasicBlock *PBB = cast<Instruction>(ResumeV)->getParent();
IRBuilder<> Builder(PBB, PBB->getFirstNonPHIIt());
ResumeV = Builder.CreateICmpNE(ResumeV, StartV);
+ if (auto *I = dyn_cast<Instruction>(ResumeV))
+ InstsToMove.push_back(I);
} else if (RecurrenceDescriptor::isFindIVRecurrenceKind(RK)) {
Value *StartV = getStartValueFromReductionResult(RdxResult);
ToFrozen[StartV] = cast<PHINode>(ResumeV)->getIncomingValueForBlock(
@@ -9647,8 +9538,12 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
BasicBlock *ResumeBB = cast<Instruction>(ResumeV)->getParent();
IRBuilder<> Builder(ResumeBB, ResumeBB->getFirstNonPHIIt());
Value *Cmp = Builder.CreateICmpEQ(ResumeV, ToFrozen[StartV]);
+ if (auto *I = dyn_cast<Instruction>(Cmp))
+ InstsToMove.push_back(I);
Value *Sentinel = RdxResult->getOperand(2)->getLiveInIRValue();
ResumeV = Builder.CreateSelect(Cmp, Sentinel, ResumeV);
+ if (auto *I = dyn_cast<Instruction>(ResumeV))
+ InstsToMove.push_back(I);
} else {
VPValue *StartVal = Plan.getOrAddLiveIn(ResumeV);
auto *PhiR = dyn_cast<VPReductionPHIRecipe>(&R);
@@ -9700,6 +9595,46 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
Plan.resetTripCount(ExpandedVal);
ExpandR->eraseFromParent();
}
+
+ // Add the minimum iteration check for the epilogue vector loop.
+ VPValue *TC = Plan.getOrAddLiveIn(EPI.TripCount);
+ VPBuilder Builder(cast<VPBasicBlock>(Plan.getEntry()));
+ VPValue *Count = Builder.createNaryOp(
+ Instruction::Sub, {TC, Plan.getOrAddLiveIn(EPI.VectorTripCount)},
+ DebugLoc::getUnknown(), "n.vec.remaining");
+
+ // Generate code to check if the loop's trip count is less than VF * UF of
+ // the vector epilogue loop.
+ auto P = CM.requiresScalarEpilogue(EPI.EpilogueVF.isVector())
+ ? ICmpInst::ICMP_ULE
+ : ICmpInst::ICMP_ULT;
+ VPValue *VFxUF = Builder.createExpandSCEV(
+ SE.getElementCount(EPI.TripCount->getType(),
+ (EPI.EpilogueVF * EPI.EpilogueUF), SCEV::FlagNUW));
+
+ auto *CheckMinIters = Builder.createICmp(
+ P, Count, VFxUF, DebugLoc::getUnknown(), "min.epilog.iters.check");
+ VPInstruction *Branch =
+ Builder.createNaryOp(VPInstruction::BranchOnCond, CheckMinIters);
+
+ auto VScale = CM.getVScaleForTuning();
+ unsigned MainLoopStep =
+ estimateElementCount(EPI.MainLoopVF * EPI.MainLoopUF, VScale);
+ unsigned EpilogueLoopStep =
+ estimateElementCount(EPI.EpilogueVF * EPI.EpilogueUF, VScale);
+ // We assume the remaining `Count` is equally distributed in
+ // [0, MainLoopStep)
+ // So the probability for `Count < EpilogueLoopStep` should be
+ // min(MainLoopStep, EpilogueLoopStep) / MainLoopStep
+ // TODO: Improve the estimate by taking the estimated trip count into
+ // consideration.
+ unsigned EstimatedSkipCount = std::min(MainLoopStep, EpilogueLoopStep);
+ const uint32_t Weights[] = {EstimatedSkipCount,
+ MainLoopStep - EstimatedSkipCount};
+ MDBuilder MDB(Plan.getContext());
+ MDNode *BranchWeights =
+ MDB.createBranchWeights(Weights, /*IsExpected=*/false);
+ Branch->addMetadata(LLVMContext::MD_prof, BranchWeights);
}
// Generate bypass values from the additional bypass block. Note that when the
@@ -9766,6 +9701,98 @@ static void fixScalarResumeValuesFromBypass(BasicBlock *BypassBlock, Loop *L,
}
}
+/// Connect the epilogue vector loop generated for \p Plan to the main vector
+/// loop, updating branches from the iteration and runtime checks, as well as
+/// updating various phis.
+static void connectEpilogueVectorLoop(
+ VPlan &Plan, Loop *L, EpilogueLoopVectorizationInfo &EPI, DominatorTree *DT,
+ LoopVectorizationLegality &LVL,
+ DenseMap<const SCEV *, Value *> &ExpandedSCEVs, GeneratedRTChecks &Checks,
+ ArrayRef<Instruction *> InstsToMove) {
+ BasicBlock *AdditionalBypassBlock =
+ cast<VPIRBasicBlock>(Plan.getEntry())->getIRBasicBlock();
+ BasicBlock *VecEpilogueIterationCountCheck =
+ cast<VPIRBasicBlock>(Plan.getEntry())->getIRBasicBlock();
+
+ BasicBlock *LoopVectorPreHeader =
+ cast<BranchInst>(VecEpilogueIterationCountCheck->getTerminator())
+ ->getSuccessor(1);
+ // Adjust the control flow taking the state info from the main loop
+ // vectorization into account.
+ assert(EPI.MainLoopIterationCountCheck && EPI.EpilogueIterationCountCheck &&
+ "expected this to be saved from the previous pass.");
+ DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+ EPI.MainLoopIterationCountCheck->getTerminator()->replaceUsesOfWith(
+ VecEpilogueIterationCountCheck, LoopVectorPreHeader);
+
+ DTU.applyUpdates({{DominatorTree::Delete, EPI.MainLoopIterationCountCheck,
+ VecEpilogueIterationCountCheck},
+ {DominatorTree::Insert, EPI.MainLoopIterationCountCheck,
+ LoopVectorPreHeader}});
+
+ BasicBlock *ScalarPH =
+ cast<VPIRBasicBlock>(Plan.getScalarPreheader())->getIRBasicBlock();
+ EPI.EpilogueIterationCountCheck->getTerminator()->replaceUsesOfWith(
+ VecEpilogueIterationCountCheck, ScalarPH);
+ DTU.applyUpdates(
+ {{DominatorTree::Delete, EPI.EpilogueIterationCountCheck,
+ VecEpilogueIterationCountCheck},
+ {DominatorTree::Insert, EPI.EpilogueIterationCountCheck, ScalarPH}});
+
+ // Adjust the terminators of runtime check blocks and phis using them.
+ BasicBlock *SCEVCheckBlock = Checks.getSCEVChecks().second;
+ BasicBlock *MemCheckBlock = Checks.getMemRuntimeChecks().second;
+ if (SCEVCheckBlock) {
+ SCEVCheckBlock->getTerminator()->replaceUsesOfWith(
+ VecEpilogueIterationCountCheck, ScalarPH);
+ DTU.applyUpdates({{DominatorTree::Delete, SCEVCheckBlock,
+ VecEpilogueIterationCountCheck},
+ {DominatorTree::Insert, SCEVCheckBlock, ScalarPH}});
+ }
+ if (MemCheckBlock) {
+ MemCheckBlock->getTerminator()->replaceUsesOfWith(
+ VecEpilogueIterationCountCheck, ScalarPH);
+ DTU.applyUpdates(
+ {{DominatorTree::Delete, MemCheckBlock, VecEpilogueIterationCountCheck},
+ {DominatorTree::Insert, MemCheckBlock, ScalarPH}});
+ }
+
+ // The vec.epilog.iter.check block may contain Phi nodes from inductions
+ // or reductions which merge control-flow from the latch block and the
+ // middle block. Update the incoming values here and move the Phi into the
+ // preheader.
+ SmallVector<PHINode *, 4> PhisInBlock(
+ llvm::make_pointer_range(VecEpilogueIterationCountCheck->phis()));
+
+ for (PHINode *Phi : PhisInBlock) {
+ Phi->moveBefore(LoopVectorPreHeader->getFirstNonPHIIt());
+ Phi->replaceIncomingBlockWith(
+ VecEpilogueIterationCountCheck->getSinglePredecessor(),
+ VecEpilogueIterationCountCheck);
+
+ // If the phi doesn't have an incoming value from the
+ // EpilogueIterationCountCheck, we are done. Otherwise remove the
+ // incoming value and also those from other check blocks. This is needed
+ // for reduction phis only.
+ if (none_of(Phi->blocks(), [&](BasicBlock *IncB) {
+ return EPI.EpilogueIterationCountCheck == IncB;
+ }))
+ continue;
+ Phi->removeIncomingValue(EPI.EpilogueIterationCountCheck);
+ if (SCEVCheckBlock)
+ Phi->removeIncomingValue(SCEVCheckBlock);
+ if (MemCheckBlock)
+ Phi->removeIncomingValue(MemCheckBlock);
+ }
+
+ auto IP = LoopVectorPreHeader->getFirstNonPHIIt();
+ for (auto *I : InstsToMove)
+ I->moveBefore(IP);
+
+ fixScalarResumeValuesFromBypass(AdditionalBypassBlock, L, Plan, LVL,
+ ExpandedSCEVs, EPI.VectorTripCount);
+}
+
bool LoopVectorizePass::processLoop(Loop *L) {
assert((EnableVPlanNativePath || L->isInnermost()) &&
"VPlan-native path is not enabled. Only process inner loops.");
@@ -10127,6 +10154,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
// factor) again shortly afterwards.
VPlan &BestEpiPlan = LVP.getPlanFor(EpilogueVF.Width);
BestEpiPlan.getMiddleBlock()->setName("vec.epilog.middle.block");
+ BestEpiPlan.getVectorPreheader()->setName("vec.epilog.ph");
preparePlanForMainVectorLoop(*BestMainPlan, BestEpiPlan);
EpilogueLoopVectorizationInfo EPI(VF.Width, IC, EpilogueVF.Width, 1,
BestEpiPlan);
@@ -10140,15 +10168,13 @@ bool LoopVectorizePass::processLoop(Loop *L) {
// edges from the first pass.
EpilogueVectorizerEpilogueLoop EpilogILV(L, PSE, LI, DT, TTI, AC, EPI, &CM,
BFI, PSI, Checks, BestEpiPlan);
- EpilogILV.setTripCount(MainILV.getTripCount());
- preparePlanForEpilogueVectorLoop(BestEpiPlan, L, ExpandedSCEVs, EPI);
-
+ SmallVector<Instruction *> InstsToMove;
+ preparePlanForEpilogueVectorLoop(BestEpiPlan, L, ExpandedSCEVs, EPI,
+ InstsToMove, CM, *PSE.getSE());
LVP.executePlan(EPI.EpilogueVF, EPI.EpilogueUF, BestEpiPlan, EpilogILV, DT,
true);
-
- fixScalarResumeValuesFromBypass(EpilogILV.getAdditionalBypassBlock(), L,
- BestEpiPlan, LVL, ExpandedSCEVs,
- EPI.VectorTripCount);
+ connectEpilogueVectorLoop(BestEpiPlan, L, EPI, DT, LVL, ExpandedSCEVs,
+ Checks, InstsToMove);
++LoopsEpilogueVectorized;
} else {
InnerLoopVectorizer LB(L, PSE, LI, DT, TTI, AC, VF.Width, IC, &CM, BFI, PSI,
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
index 3b435f320b0c2..cf1ee26d754df 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
@@ -50,8 +50,7 @@ define void @test_pr25490(i32 %n, ptr noalias nocapture %a, ptr noalias nocaptur
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
; CHECK-NEXT: br i1 [[CMP_N]], [[EXIT:label %.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]]
; CHECK: [[VEC_EPILOG_ITER_CHECK]]:
-; CHECK-NEXT: [[N_VEC_REMAINING:%.*]] = sub i64 [[TMP0]], [[N_VEC]]
-; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_VEC_REMAINING]], 4
+; CHECK-NEXT: [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_MOD_VF]], 4
; CHECK-NEXT: br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF3:![0-9]+]]
; CHECK: [[VEC_EPILOG_PH]]:
; CHECK-NEXT: [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/epilog-iv-select-cmp.ll b/llvm/test/Transforms/LoopVectorize/AArch64/epilog-iv-select-cmp.ll
index 3a46944712567..dc52e644742e2 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/epilog-iv-select-cmp.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/epilog-iv-select-cmp.ll
@@ -47,8 +47,7 @@ define i8 @select_icmp_var_start(ptr %a, i8 %n, i8 %start) {
; CHECK-NEXT: br i1 [[CMP_N]...
[truncated]
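The test update above replaces `sub i64 [[TMP0]], [[N_VEC]]` with a direct use of `[[N_MOD_VF]]`. A minimal sketch of why that fold is sound, assuming (as is usual for the vectorizer, but not shown in this excerpt) that `N_VEC` was computed as the trip count minus `N_MOD_VF`. All function and struct names here are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the main loop's counts. The assumption is that
// NVec = TC - NModVF, where NModVF = TC % MainStep.
struct Counts {
  uint64_t NModVF;
  uint64_t NVec;
};

Counts mainLoopCounts(uint64_t TC, uint64_t MainStep) {
  assert(MainStep != 0 && "step must be non-zero");
  uint64_t NModVF = TC % MainStep;
  return {NModVF, TC - NModVF};
}

// Before the change: recompute the remainder as TC - NVec.
uint64_t remainingUnfolded(uint64_t TC, uint64_t MainStep) {
  return TC - mainLoopCounts(TC, MainStep).NVec;
}

// After the change: TC - (TC - NModVF) folds to NModVF.
uint64_t remainingFolded(uint64_t TC, uint64_t MainStep) {
  return mainLoopCounts(TC, MainStep).NModVF;
}
```

Both functions return the same value for every input, so the subtraction is redundant once the operands are visible as live-ins.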
BFI, PSI, Checks, Plan, EPI.EpilogueVF,
EPI.EpilogueVF, EPI.EpilogueUF) {
- TripCount = EPI.TripCount;
+ TripCount = nullptr;
Suggested change:
- TripCount = nullptr;
ILV::TripCount set to null by default?
Yep, updated, thanks
VPBasicBlock *OldEntry = Plan.getEntry();
VPBlockUtils::reassociateBlocks(OldEntry, NewEntry);
Plan.setEntry(NewEntry);
- // OldEntry is now dead and will be cleaned up when the plan gets destroyed.
Does this comment still hold?
Yep, restored, thanks
/// This function is partially responsible for generating the control flow
/// depicted in https://llvm.org/docs/Vectorizers.html#epilogue-vectorization.
Better update documentation to specify what (little) this method actually does? And what it returns.
Added, thanks
/// This function is partially responsible for generating the control flow
/// depicted in https://llvm.org/docs/Vectorizers.html#epilogue-vectorization.
BasicBlock *EpilogueVectorizerEpilogueLoop::createVectorizedLoopSkeleton() {
BasicBlock *Entry = OrigLoop->getLoopPreheader();
Better inline Entry, or rename, to avoid confusion with other Entry's.
Updated to OldScalarPH, thanks
}

/// Prepare \p Plan for vectorizing the epilogue loop. That is, re-use expanded
/// SCEVs from \p ExpandedSCEVs and set resume values for header recipes.
Explain \p InstsToMove?
Done, thanks
SmallVector<Instruction *> InstsToMove;
preparePlanForEpilogueVectorLoop(BestEpiPlan, L, ExpandedSCEVs, EPI,
InstsToMove, CM, *PSE.getSE());
Suggested change:
- SmallVector<Instruction *> InstsToMove;
- preparePlanForEpilogueVectorLoop(BestEpiPlan, L, ExpandedSCEVs, EPI,
- InstsToMove, CM, *PSE.getSE());
+ SmallVector<Instruction *> InstsToMove =
+ preparePlanForEpilogueVectorLoop(BestEpiPlan, L, ExpandedSCEVs, EPI,
+ CM, *PSE.getSE());
?
Done, thanks
ExpandR->eraseFromParent();
}

// Add the minimum iteration check for the epilogue vector loop.
Worth outlining? As in EpilogueVectorizerEpilogueLoop::emitMinimumVectorEpilogueIterCountCheck().
Moved to addMinimumVectorEpilogueIterationCheck in VPlanConstruction.cpp, similar to addMinimumIterationCheck
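The branch weights attached to the outlined check follow the estimate spelled out in the patch comments: the remaining count is assumed to be uniformly distributed in [0, MainLoopStep). A minimal sketch of just that arithmetic, with a hypothetical helper name and plain integers in place of the estimated element counts:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the weight computation in the patch.
// Element 0 weights the "skip the vector epilogue" edge, element 1 the
// "enter the vector epilogue" edge.
std::array<uint32_t, 2> epilogueSkipWeights(uint32_t MainLoopStep,
                                            uint32_t EpilogueLoopStep) {
  assert(MainLoopStep > 0 && "main loop step must be positive");
  // P(Count < EpilogueLoopStep) = min(MainLoopStep, EpilogueLoopStep)
  //                               / MainLoopStep
  uint32_t EstimatedSkipCount = std::min(MainLoopStep, EpilogueLoopStep);
  return {EstimatedSkipCount, MainLoopStep - EstimatedSkipCount};
}
```

For a main loop step of 16 and an epilogue step of 4, this yields weights 4 and 12, i.e. the epilogue is estimated to be skipped a quarter of the time.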
BasicBlock *VecEpilogueIterationCountCheck =
cast<VPIRBasicBlock>(Plan.getEntry())->getIRBasicBlock();
Suggested change:
- BasicBlock *VecEpilogueIterationCountCheck =
- cast<VPIRBasicBlock>(Plan.getEntry())->getIRBasicBlock();
+ BasicBlock *VecEpilogueIterationCountCheck = AdditionalBypassBlock;
?
Done, thanks
✅ With the latest revision this PR passed the C/C++ code formatter.
This LGTM, thanks, adding some minor comments.
GeneratedRTChecks &Checks, VPlan &Plan)
: InnerLoopAndEpilogueVectorizer(OrigLoop, PSE, LI, DT, TTI, AC, EPI, CM,
BFI, PSI, Checks, Plan, EPI.EpilogueVF,
EPI.EpilogueVF, EPI.EpilogueUF) {
Suggested change:
- EPI.EpilogueVF, EPI.EpilogueUF) {
+ EPI.EpilogueVF, EPI.EpilogueUF) {}
clang-format, as in InnerLoopVectorizer's constructor?
Ah yes, updated, thanks!
/// entry block to the epilogue VPlan. The minimum iteration check is already
/// created in VPlan.
Suggested change:
- /// entry block to the epilogue VPlan. The minimum iteration check is already
- /// created in VPlan.
+ /// entry block to the epilogue VPlan. The minimum iteration check is being
+ /// represented in VPlan.
updated, thanks!
BasicBlock *OriginalScalarPH = OrigLoop->getLoopPreheader();
BasicBlock *ScalarPH = createScalarPreheader("vec.epilog.");
Suggested change:
- BasicBlock *OriginalScalarPH = OrigLoop->getLoopPreheader();
- BasicBlock *ScalarPH = createScalarPreheader("vec.epilog.");
+ BasicBlock *NewScalarPH = createScalarPreheader("vec.epilog.");
+ BasicBlock *OriginalScalarPH = NewScalarPH->getSinglePredecessor();
?
Done, thanks
// OldEntry is now dead and will be cleaned up when the plan gets destroyed.

- return Insert;
+ return ScalarPH->getSinglePredecessor();
Suggested change:
- return ScalarPH->getSinglePredecessor();
+ return OriginalScalarPH;
?
updated, thanks
make_range(OldEntry->getFirstNonPhi(), OldEntry->end()))) {
if (isa<VPIRInstruction>(&R))
Worth noting why phis and VPIRInstructions are excluded when moving recipes from OldEntry to NewEntry? VPIRInstructions are unmovable by definition and phis are expected to be VPIRPhis (i.e., also VPIRInstructions)?
Added a comment and also updated to iterate over all recipes.
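The loop under discussion moves recipes into a different block while iterating over the block they come from, which is why the patch wraps the range in make_early_inc_range. A minimal standalone sketch of the same pattern, using std::list in place of VPlan's recipe lists. `Recipe`, `IsIRWrapper`, and `moveMovableRecipes` are hypothetical stand-ins, not VPlan API:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for a recipe. IsIRWrapper models recipes that wrap
// existing IR instructions and therefore must stay in their block.
struct Recipe {
  std::string Name;
  bool IsIRWrapper;
};

// Move every movable recipe from OldEntry to the end of NewEntry. The next
// iterator is saved before the current element is spliced away, which is the
// effect make_early_inc_range provides in the patch.
void moveMovableRecipes(std::list<Recipe> &OldEntry,
                        std::list<Recipe> &NewEntry) {
  assert(&OldEntry != &NewEntry && "source and destination must differ");
  for (auto It = OldEntry.begin(); It != OldEntry.end();) {
    auto Next = std::next(It);
    if (!It->IsIRWrapper)
      NewEntry.splice(NewEntry.end(), OldEntry, It);
    It = Next;
  }
}
```

The relative order of the moved recipes is preserved, and the wrappers are left behind in the old entry block.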
/// Connect the epilogue vector loop generated for \p Plan to the main vector
/// loop, updating branches from the iteration and runtime checks, as well as
/// updating various phis. \p InstsToMove contains instructions that need to be
/// moved to the vector preheader.
Suggested change:
- /// moved to the vector preheader.
+ /// moved to the preheader of the epilogue vector loop.
Updated, thanks
/// updating various phis. \p InstsToMove contains instructions that need to be
/// moved to the vector preheader.
static void connectEpilogueVectorLoop(
VPlan &Plan, Loop *L, EpilogueLoopVectorizationInfo &EPI, DominatorTree *DT,
Should Plan be renamed EpiPlan and/or commented as such?
Yep, updated, thanks!
cast<VPIRBasicBlock>(Plan.getEntry())->getIRBasicBlock();
BasicBlock *VecEpilogueIterationCountCheck = AdditionalBypassBlock;

BasicBlock *LoopVectorPreHeader =
Suggested change:
- BasicBlock *LoopVectorPreHeader =
+ BasicBlock *VecEpiloguePreHeader =
?
updated, thanks!
}
}

void VPlanTransforms::addMinimumVectorEpilogueIterationCheck(
TODO: refactor this with the above addMinimumIterationCheck().
Added, thanks
/// SCEVs from \p ExpandedSCEVs and set resume values for header recipes. Some
/// reductions require creating new instructions to compute the resume values.
/// They are collected in a vector and returned. They must be moved to the
/// vector preheader.
Suggested change:
- /// vector preheader.
+ /// preheader of the vector epilogue loop, after created by the execution of Plan.
?
Updated, thanks!