Skip to content

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented Sep 8, 2025

Move creation of the minimum iteration check for the epilogue vector loop to VPlan. This is a first step towards breaking up and moving skeleton creation for epilogue vectorization to VPlan.

It moves most logic out of EpilogueVectorizerEpilogueLoop: the minimum iteration check is created directly in VPlan, connecting the check blocks from the main vector loop is done as post-processing. Next steps are to move connecting and updating the branches from the check blocks to VPlan, as well as updating the incoming values for phis.

Test changes are improvements due to folding of live-ins.

Move creation of the minimum iteration check for the epilogue vector
loop to VPlan. This is a first step towards breaking up and moving skeleton
creation for epilogue vectorization to VPlan.

It moves most logic out of EpilogueVectorizerEpilogueLoop: the minimum
iteration check is created directly in VPlan, connecting the check
blocks from the main vector loop is done as post-processing. Next steps
are to move connecting and updating the branches from the check blocks
to VPlan, as well as updating the incoming values for phis.

Test changes are improvements due to folding of live-ins.
@llvmbot
Copy link
Member

llvmbot commented Sep 8, 2025

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-powerpc

Author: Florian Hahn (fhahn)

Changes

Move creation of the minimum iteration check for the epilogue vector loop to VPlan. This is a first step towards breaking up and moving skeleton creation for epilogue vectorization to VPlan.

It moves most logic out of EpilogueVectorizerEpilogueLoop: the minimum iteration check is created directly in VPlan, connecting the check blocks from the main vector loop is done as post-processing. Next steps are to move connecting and updating the branches from the check blocks to VPlan, as well as updating the incoming values for phis.

Test changes are improvements due to folding of live-ins.


Patch is 98.41 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/157545.diff

41 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+159-133)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-iv-select-cmp.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-factors.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/f128-fmuladd-reduction.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleave-with-gaps.ll (+2-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaving-load-store.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/intrinsiccost.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll (+5-10)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-epilogue.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/store-costs-sve.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-strict-reductions.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll (+9-18)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vscale-fixed.ll (+2-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve2-histcnt-epilogue.ll (+1-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/vector-loop-backedge-elimination-epilogue.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/exit-branch-cost.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/optimal-epilog-vectorization.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/small-loop-rdx.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/conversion-cost.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-conditional-branches.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/cost-model.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll (+3-5)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/float-induction-x86.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/intrinsiccost.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/invariant-load-gather.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/invariant-store-vectorization.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+6-12)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr23997.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr54634.ll (+4-5)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/scatter_crash.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-force-tail-with-evl.ll (+1-2)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index d78e190e8bf7b..79613a78f12f8 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -688,11 +688,6 @@ class EpilogueVectorizerMainLoop : public InnerLoopAndEpilogueVectorizer {
 // vectorization of *epilogue* loops in the process of vectorizing loops and
 // their epilogues.
 class EpilogueVectorizerEpilogueLoop : public InnerLoopAndEpilogueVectorizer {
-  /// The additional bypass block which conditionally skips over the epilogue
-  /// loop after executing the main loop. Needed to resume inductions and
-  /// reductions during epilogue vectorization.
-  BasicBlock *AdditionalBypassBlock = nullptr;
-
 public:
   EpilogueVectorizerEpilogueLoop(
       Loop *OrigLoop, PredicatedScalarEvolution &PSE, LoopInfo *LI,
@@ -703,20 +698,12 @@ class EpilogueVectorizerEpilogueLoop : public InnerLoopAndEpilogueVectorizer {
       : InnerLoopAndEpilogueVectorizer(OrigLoop, PSE, LI, DT, TTI, AC, EPI, CM,
                                        BFI, PSI, Checks, Plan, EPI.EpilogueVF,
                                        EPI.EpilogueVF, EPI.EpilogueUF) {
-    TripCount = EPI.TripCount;
+    TripCount = nullptr;
   }
   /// Implements the interface for creating a vectorized skeleton using the
   /// *epilogue loop* strategy (i.e., the second pass of VPlan execution).
   BasicBlock *createVectorizedLoopSkeleton() final;
 
-  /// Return the additional bypass block which targets the scalar loop by
-  /// skipping the epilogue loop after completing the main loop.
-  BasicBlock *getAdditionalBypassBlock() const {
-    assert(AdditionalBypassBlock &&
-           "Trying to access AdditionalBypassBlock but it has not been set");
-    return AdditionalBypassBlock;
-  }
-
 protected:
   /// Emits an iteration count bypass check after the main vector loop has
   /// finished to see if there are any iterations left to execute by either
@@ -7361,121 +7348,22 @@ BasicBlock *EpilogueVectorizerMainLoop::emitIterationCountCheck(
 /// This function is partially responsible for generating the control flow
 /// depicted in https://llvm.org/docs/Vectorizers.html#epilogue-vectorization.
 BasicBlock *EpilogueVectorizerEpilogueLoop::createVectorizedLoopSkeleton() {
+  BasicBlock *Entry = OrigLoop->getLoopPreheader();
   BasicBlock *ScalarPH = createScalarPreheader("vec.epilog.");
-  BasicBlock *VectorPH = ScalarPH->getSinglePredecessor();
-  // Now, compare the remaining count and if there aren't enough iterations to
-  // execute the vectorized epilogue skip to the scalar part.
-  VectorPH->setName("vec.epilog.ph");
-  BasicBlock *VecEpilogueIterationCountCheck =
-      SplitBlock(VectorPH, VectorPH->begin(), DT, LI, nullptr,
-                 "vec.epilog.iter.check", true);
-  VectorPHVPBB = replaceVPBBWithIRVPBB(VectorPHVPBB, VectorPH);
-
-  emitMinimumVectorEpilogueIterCountCheck(VectorPH, ScalarPH,
-                                          VecEpilogueIterationCountCheck);
-  AdditionalBypassBlock = VecEpilogueIterationCountCheck;
-
-  // Adjust the control flow taking the state info from the main loop
-  // vectorization into account.
-  assert(EPI.MainLoopIterationCountCheck && EPI.EpilogueIterationCountCheck &&
-         "expected this to be saved from the previous pass.");
-  EPI.MainLoopIterationCountCheck->getTerminator()->replaceUsesOfWith(
-      VecEpilogueIterationCountCheck, VectorPH);
-
-  EPI.EpilogueIterationCountCheck->getTerminator()->replaceUsesOfWith(
-      VecEpilogueIterationCountCheck, ScalarPH);
-
-  // Adjust the terminators of runtime check blocks and phis using them.
-  BasicBlock *SCEVCheckBlock = RTChecks.getSCEVChecks().second;
-  BasicBlock *MemCheckBlock = RTChecks.getMemRuntimeChecks().second;
-  if (SCEVCheckBlock)
-    SCEVCheckBlock->getTerminator()->replaceUsesOfWith(
-        VecEpilogueIterationCountCheck, ScalarPH);
-  if (MemCheckBlock)
-    MemCheckBlock->getTerminator()->replaceUsesOfWith(
-        VecEpilogueIterationCountCheck, ScalarPH);
-
-  DT->changeImmediateDominator(ScalarPH, EPI.EpilogueIterationCountCheck);
-
-  // The vec.epilog.iter.check block may contain Phi nodes from inductions or
-  // reductions which merge control-flow from the latch block and the middle
-  // block. Update the incoming values here and move the Phi into the preheader.
-  SmallVector<PHINode *, 4> PhisInBlock(
-      llvm::make_pointer_range(VecEpilogueIterationCountCheck->phis()));
-
-  for (PHINode *Phi : PhisInBlock) {
-    Phi->moveBefore(VectorPH->getFirstNonPHIIt());
-    Phi->replaceIncomingBlockWith(
-        VecEpilogueIterationCountCheck->getSinglePredecessor(),
-        VecEpilogueIterationCountCheck);
-
-    // If the phi doesn't have an incoming value from the
-    // EpilogueIterationCountCheck, we are done. Otherwise remove the incoming
-    // value and also those from other check blocks. This is needed for
-    // reduction phis only.
-    if (none_of(Phi->blocks(), [&](BasicBlock *IncB) {
-          return EPI.EpilogueIterationCountCheck == IncB;
-        }))
+  ScalarPH->getSinglePredecessor()->setName("vec.epilog.iter.check");
+  VPIRBasicBlock *NewEntry = Plan.createVPIRBasicBlock(Entry);
+  VPBasicBlock *OldEntry = Plan.getEntry();
+  for (auto &R : make_early_inc_range(
+           make_range(OldEntry->getFirstNonPhi(), OldEntry->end()))) {
+    if (isa<VPIRInstruction>(&R))
       continue;
-    Phi->removeIncomingValue(EPI.EpilogueIterationCountCheck);
-    if (SCEVCheckBlock)
-      Phi->removeIncomingValue(SCEVCheckBlock);
-    if (MemCheckBlock)
-      Phi->removeIncomingValue(MemCheckBlock);
+    R.moveBefore(*NewEntry, NewEntry->end());
   }
 
-  return VectorPH;
-}
-
-BasicBlock *
-EpilogueVectorizerEpilogueLoop::emitMinimumVectorEpilogueIterCountCheck(
-    BasicBlock *VectorPH, BasicBlock *Bypass, BasicBlock *Insert) {
-
-  assert(EPI.TripCount &&
-         "Expected trip count to have been saved in the first pass.");
-  Value *TC = EPI.TripCount;
-  IRBuilder<> Builder(Insert->getTerminator());
-  Value *Count = Builder.CreateSub(TC, EPI.VectorTripCount, "n.vec.remaining");
-
-  // Generate code to check if the loop's trip count is less than VF * UF of the
-  // vector epilogue loop.
-  auto P = Cost->requiresScalarEpilogue(EPI.EpilogueVF.isVector())
-               ? ICmpInst::ICMP_ULE
-               : ICmpInst::ICMP_ULT;
-
-  Value *CheckMinIters =
-      Builder.CreateICmp(P, Count,
-                         createStepForVF(Builder, Count->getType(),
-                                         EPI.EpilogueVF, EPI.EpilogueUF),
-                         "min.epilog.iters.check");
-
-  BranchInst &BI = *BranchInst::Create(Bypass, VectorPH, CheckMinIters);
-  auto VScale = Cost->getVScaleForTuning();
-  unsigned MainLoopStep =
-      estimateElementCount(EPI.MainLoopVF * EPI.MainLoopUF, VScale);
-  unsigned EpilogueLoopStep =
-      estimateElementCount(EPI.EpilogueVF * EPI.EpilogueUF, VScale);
-  // We assume the remaining `Count` is equally distributed in
-  // [0, MainLoopStep)
-  // So the probability for `Count < EpilogueLoopStep` should be
-  // min(MainLoopStep, EpilogueLoopStep) / MainLoopStep
-  // TODO: Improve the estimate by taking the estimated trip count into
-  // consideration.
-  unsigned EstimatedSkipCount = std::min(MainLoopStep, EpilogueLoopStep);
-  const uint32_t Weights[] = {EstimatedSkipCount,
-                              MainLoopStep - EstimatedSkipCount};
-  setBranchWeights(BI, Weights, /*IsExpected=*/false);
-  ReplaceInstWithInst(Insert->getTerminator(), &BI);
-
-  // A new entry block has been created for the epilogue VPlan. Hook it in, as
-  // otherwise we would try to modify the entry to the main vector loop.
-  VPIRBasicBlock *NewEntry = Plan.createVPIRBasicBlock(Insert);
-  VPBasicBlock *OldEntry = Plan.getEntry();
   VPBlockUtils::reassociateBlocks(OldEntry, NewEntry);
   Plan.setEntry(NewEntry);
-  // OldEntry is now dead and will be cleaned up when the plan gets destroyed.
 
-  return Insert;
+  return ScalarPH->getSinglePredecessor();
 }
 
 void EpilogueVectorizerEpilogueLoop::printDebugTracesAtStart() {
@@ -9555,10 +9443,11 @@ static void preparePlanForMainVectorLoop(VPlan &MainPlan, VPlan &EpiPlan) {
 
 /// Prepare \p Plan for vectorizing the epilogue loop. That is, re-use expanded
 /// SCEVs from \p ExpandedSCEVs and set resume values for header recipes.
-static void
-preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
-                                 const SCEV2ValueTy &ExpandedSCEVs,
-                                 EpilogueLoopVectorizationInfo &EPI) {
+static void preparePlanForEpilogueVectorLoop(
+    VPlan &Plan, Loop *L, const SCEV2ValueTy &ExpandedSCEVs,
+    EpilogueLoopVectorizationInfo &EPI,
+    SmallVectorImpl<Instruction *> &InstsToMove, LoopVectorizationCostModel &CM,
+    ScalarEvolution &SE) {
   VPRegionBlock *VectorLoop = Plan.getVectorLoopRegion();
   VPBasicBlock *Header = VectorLoop->getEntryBasicBlock();
   Header->setName("vec.epilog.vector.body");
@@ -9633,6 +9522,8 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
         BasicBlock *PBB = cast<Instruction>(ResumeV)->getParent();
         IRBuilder<> Builder(PBB, PBB->getFirstNonPHIIt());
         ResumeV = Builder.CreateICmpNE(ResumeV, StartV);
+        if (auto *I = dyn_cast<Instruction>(ResumeV))
+          InstsToMove.push_back(I);
       } else if (RecurrenceDescriptor::isFindIVRecurrenceKind(RK)) {
         Value *StartV = getStartValueFromReductionResult(RdxResult);
         ToFrozen[StartV] = cast<PHINode>(ResumeV)->getIncomingValueForBlock(
@@ -9647,8 +9538,12 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
         BasicBlock *ResumeBB = cast<Instruction>(ResumeV)->getParent();
         IRBuilder<> Builder(ResumeBB, ResumeBB->getFirstNonPHIIt());
         Value *Cmp = Builder.CreateICmpEQ(ResumeV, ToFrozen[StartV]);
+        if (auto *I = dyn_cast<Instruction>(Cmp))
+          InstsToMove.push_back(I);
         Value *Sentinel = RdxResult->getOperand(2)->getLiveInIRValue();
         ResumeV = Builder.CreateSelect(Cmp, Sentinel, ResumeV);
+        if (auto *I = dyn_cast<Instruction>(ResumeV))
+          InstsToMove.push_back(I);
       } else {
         VPValue *StartVal = Plan.getOrAddLiveIn(ResumeV);
         auto *PhiR = dyn_cast<VPReductionPHIRecipe>(&R);
@@ -9700,6 +9595,46 @@ preparePlanForEpilogueVectorLoop(VPlan &Plan, Loop *L,
       Plan.resetTripCount(ExpandedVal);
     ExpandR->eraseFromParent();
   }
+
+  // Add the minimum iteration check for the epilogue vector loop.
+  VPValue *TC = Plan.getOrAddLiveIn(EPI.TripCount);
+  VPBuilder Builder(cast<VPBasicBlock>(Plan.getEntry()));
+  VPValue *Count = Builder.createNaryOp(
+      Instruction::Sub, {TC, Plan.getOrAddLiveIn(EPI.VectorTripCount)},
+      DebugLoc::getUnknown(), "n.vec.remaining");
+
+  // Generate code to check if the loop's trip count is less than VF * UF of
+  // the vector epilogue loop.
+  auto P = CM.requiresScalarEpilogue(EPI.EpilogueVF.isVector())
+               ? ICmpInst::ICMP_ULE
+               : ICmpInst::ICMP_ULT;
+  VPValue *VFxUF = Builder.createExpandSCEV(
+      SE.getElementCount(EPI.TripCount->getType(),
+                         (EPI.EpilogueVF * EPI.EpilogueUF), SCEV::FlagNUW));
+
+  auto *CheckMinIters = Builder.createICmp(
+      P, Count, VFxUF, DebugLoc::getUnknown(), "min.epilog.iters.check");
+  VPInstruction *Branch =
+      Builder.createNaryOp(VPInstruction::BranchOnCond, CheckMinIters);
+
+  auto VScale = CM.getVScaleForTuning();
+  unsigned MainLoopStep =
+      estimateElementCount(EPI.MainLoopVF * EPI.MainLoopUF, VScale);
+  unsigned EpilogueLoopStep =
+      estimateElementCount(EPI.EpilogueVF * EPI.EpilogueUF, VScale);
+  // We assume the remaining `Count` is equally distributed in
+  // [0, MainLoopStep)
+  // So the probability for `Count < EpilogueLoopStep` should be
+  // min(MainLoopStep, EpilogueLoopStep) / MainLoopStep
+  // TODO: Improve the estimate by taking the estimated trip count into
+  // consideration.
+  unsigned EstimatedSkipCount = std::min(MainLoopStep, EpilogueLoopStep);
+  const uint32_t Weights[] = {EstimatedSkipCount,
+                              MainLoopStep - EstimatedSkipCount};
+  MDBuilder MDB(Plan.getContext());
+  MDNode *BranchWeights =
+      MDB.createBranchWeights(Weights, /*IsExpected=*/false);
+  Branch->addMetadata(LLVMContext::MD_prof, BranchWeights);
 }
 
 // Generate bypass values from the additional bypass block. Note that when the
@@ -9766,6 +9701,98 @@ static void fixScalarResumeValuesFromBypass(BasicBlock *BypassBlock, Loop *L,
   }
 }
 
+/// Connect the epilogue vector loop generated for \p Plan to the main vector
+/// loop, updating branches from the iteration and runtime checks, as well as
+/// updating various phis.
+static void connectEpilogueVectorLoop(
+    VPlan &Plan, Loop *L, EpilogueLoopVectorizationInfo &EPI, DominatorTree *DT,
+    LoopVectorizationLegality &LVL,
+    DenseMap<const SCEV *, Value *> &ExpandedSCEVs, GeneratedRTChecks &Checks,
+    ArrayRef<Instruction *> InstsToMove) {
+  BasicBlock *AdditionalBypassBlock =
+      cast<VPIRBasicBlock>(Plan.getEntry())->getIRBasicBlock();
+  BasicBlock *VecEpilogueIterationCountCheck =
+      cast<VPIRBasicBlock>(Plan.getEntry())->getIRBasicBlock();
+
+  BasicBlock *LoopVectorPreHeader =
+      cast<BranchInst>(VecEpilogueIterationCountCheck->getTerminator())
+          ->getSuccessor(1);
+  // Adjust the control flow taking the state info from the main loop
+  // vectorization into account.
+  assert(EPI.MainLoopIterationCountCheck && EPI.EpilogueIterationCountCheck &&
+         "expected this to be saved from the previous pass.");
+  DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Eager);
+  EPI.MainLoopIterationCountCheck->getTerminator()->replaceUsesOfWith(
+      VecEpilogueIterationCountCheck, LoopVectorPreHeader);
+
+  DTU.applyUpdates({{DominatorTree::Delete, EPI.MainLoopIterationCountCheck,
+                     VecEpilogueIterationCountCheck},
+                    {DominatorTree::Insert, EPI.MainLoopIterationCountCheck,
+                     LoopVectorPreHeader}});
+
+  BasicBlock *ScalarPH =
+      cast<VPIRBasicBlock>(Plan.getScalarPreheader())->getIRBasicBlock();
+  EPI.EpilogueIterationCountCheck->getTerminator()->replaceUsesOfWith(
+      VecEpilogueIterationCountCheck, ScalarPH);
+  DTU.applyUpdates(
+      {{DominatorTree::Delete, EPI.EpilogueIterationCountCheck,
+        VecEpilogueIterationCountCheck},
+       {DominatorTree::Insert, EPI.EpilogueIterationCountCheck, ScalarPH}});
+
+  // Adjust the terminators of runtime check blocks and phis using them.
+  BasicBlock *SCEVCheckBlock = Checks.getSCEVChecks().second;
+  BasicBlock *MemCheckBlock = Checks.getMemRuntimeChecks().second;
+  if (SCEVCheckBlock) {
+    SCEVCheckBlock->getTerminator()->replaceUsesOfWith(
+        VecEpilogueIterationCountCheck, ScalarPH);
+    DTU.applyUpdates({{DominatorTree::Delete, SCEVCheckBlock,
+                       VecEpilogueIterationCountCheck},
+                      {DominatorTree::Insert, SCEVCheckBlock, ScalarPH}});
+  }
+  if (MemCheckBlock) {
+    MemCheckBlock->getTerminator()->replaceUsesOfWith(
+        VecEpilogueIterationCountCheck, ScalarPH);
+    DTU.applyUpdates(
+        {{DominatorTree::Delete, MemCheckBlock, VecEpilogueIterationCountCheck},
+         {DominatorTree::Insert, MemCheckBlock, ScalarPH}});
+  }
+
+  // The vec.epilog.iter.check block may contain Phi nodes from inductions
+  // or reductions which merge control-flow from the latch block and the
+  // middle block. Update the incoming values here and move the Phi into the
+  // preheader.
+  SmallVector<PHINode *, 4> PhisInBlock(
+      llvm::make_pointer_range(VecEpilogueIterationCountCheck->phis()));
+
+  for (PHINode *Phi : PhisInBlock) {
+    Phi->moveBefore(LoopVectorPreHeader->getFirstNonPHIIt());
+    Phi->replaceIncomingBlockWith(
+        VecEpilogueIterationCountCheck->getSinglePredecessor(),
+        VecEpilogueIterationCountCheck);
+
+    // If the phi doesn't have an incoming value from the
+    // EpilogueIterationCountCheck, we are done. Otherwise remove the
+    // incoming value and also those from other check blocks. This is needed
+    // for reduction phis only.
+    if (none_of(Phi->blocks(), [&](BasicBlock *IncB) {
+          return EPI.EpilogueIterationCountCheck == IncB;
+        }))
+      continue;
+    Phi->removeIncomingValue(EPI.EpilogueIterationCountCheck);
+    if (SCEVCheckBlock)
+      Phi->removeIncomingValue(SCEVCheckBlock);
+    if (MemCheckBlock)
+      Phi->removeIncomingValue(MemCheckBlock);
+  }
+
+  auto IP = LoopVectorPreHeader->getFirstNonPHIIt();
+  for (auto *I : InstsToMove)
+    I->moveBefore(IP);
+
+  fixScalarResumeValuesFromBypass(AdditionalBypassBlock, L, Plan, LVL,
+                                  ExpandedSCEVs, EPI.VectorTripCount);
+}
+
 bool LoopVectorizePass::processLoop(Loop *L) {
   assert((EnableVPlanNativePath || L->isInnermost()) &&
          "VPlan-native path is not enabled. Only process inner loops.");
@@ -10127,6 +10154,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     // factor) again shortly afterwards.
     VPlan &BestEpiPlan = LVP.getPlanFor(EpilogueVF.Width);
     BestEpiPlan.getMiddleBlock()->setName("vec.epilog.middle.block");
+    BestEpiPlan.getVectorPreheader()->setName("vec.epilog.ph");
     preparePlanForMainVectorLoop(*BestMainPlan, BestEpiPlan);
     EpilogueLoopVectorizationInfo EPI(VF.Width, IC, EpilogueVF.Width, 1,
                                       BestEpiPlan);
@@ -10140,15 +10168,13 @@ bool LoopVectorizePass::processLoop(Loop *L) {
     // edges from the first pass.
     EpilogueVectorizerEpilogueLoop EpilogILV(L, PSE, LI, DT, TTI, AC, EPI, &CM,
                                              BFI, PSI, Checks, BestEpiPlan);
-    EpilogILV.setTripCount(MainILV.getTripCount());
-    preparePlanForEpilogueVectorLoop(BestEpiPlan, L, ExpandedSCEVs, EPI);
-
+    SmallVector<Instruction *> InstsToMove;
+    preparePlanForEpilogueVectorLoop(BestEpiPlan, L, ExpandedSCEVs, EPI,
+                                     InstsToMove, CM, *PSE.getSE());
     LVP.executePlan(EPI.EpilogueVF, EPI.EpilogueUF, BestEpiPlan, EpilogILV, DT,
                     true);
-
-    fixScalarResumeValuesFromBypass(EpilogILV.getAdditionalBypassBlock(), L,
-                                    BestEpiPlan, LVL, ExpandedSCEVs,
-                                    EPI.VectorTripCount);
+    connectEpilogueVectorLoop(BestEpiPlan, L, EPI, DT, LVL, ExpandedSCEVs,
+                              Checks, InstsToMove);
     ++LoopsEpilogueVectorized;
   } else {
     InnerLoopVectorizer LB(L, PSE, LI, DT, TTI, AC, VF.Width, IC, &CM, BFI, PSI,
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
index 3b435f320b0c2..cf1ee26d754df 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/deterministic-type-shrinkage.ll
@@ -50,8 +50,7 @@ define void @test_pr25490(i32 %n, ptr noalias nocapture %a, ptr noalias nocaptur
 ; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
 ; CHECK-NEXT:    br i1 [[CMP_N]], [[EXIT:label %.*]], label %[[VEC_EPILOG_ITER_CHECK:.*]]
 ; CHECK:       [[VEC_EPILOG_ITER_CHECK]]:
-; CHECK-NEXT:    [[N_VEC_REMAINING:%.*]] = sub i64 [[TMP0]], [[N_VEC]]
-; CHECK-NEXT:    [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_VEC_REMAINING]], 4
+; CHECK-NEXT:    [[MIN_EPILOG_ITERS_CHECK:%.*]] = icmp ult i64 [[N_MOD_VF]], 4
 ; CHECK-NEXT:    br i1 [[MIN_EPILOG_ITERS_CHECK]], label %[[VEC_EPILOG_SCALAR_PH]], label %[[VEC_EPILOG_PH]], !prof [[PROF3:![0-9]+]]
 ; CHECK:       [[VEC_EPILOG_PH]]:
 ; CHECK-NEXT:    [[VEC_EPILOG_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[VEC_EPILOG_ITER_CHECK]] ], [ 0, %[[VECTOR_MAIN_LOOP_ITER_CHECK]] ]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/epilog-iv-select-cmp.ll b/llvm/test/Transforms/LoopVectorize/AArch64/epilog-iv-select-cmp.ll
index 3a46944712567..dc52e644742e2 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/epilog-iv-select-cmp.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/epilog-iv-select-cmp.ll
@@ -47,8 +47,7 @@ define i8 @select_icmp_var_start(ptr %a, i8 %n, i8 %start) {
 ; CHECK-NEXT:    br i1 [[CMP_N]...
[truncated]

@artagnon artagnon changed the title [VPlan] Create epilogue miniimum iteration check in VPlan. [VPlan] Create epilogue minimum iteration check in VPlan. Sep 9, 2025
BFI, PSI, Checks, Plan, EPI.EpilogueVF,
EPI.EpilogueVF, EPI.EpilogueUF) {
TripCount = EPI.TripCount;
TripCount = nullptr;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TripCount = nullptr;

ILV::TripCount set to null by default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep updated, thanks

VPBasicBlock *OldEntry = Plan.getEntry();
VPBlockUtils::reassociateBlocks(OldEntry, NewEntry);
Plan.setEntry(NewEntry);
// OldEntry is now dead and will be cleaned up when the plan gets destroyed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this comment still hold?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, restored thanks

Comment on lines 7348 to 7349
/// This function is partially responsible for generating the control flow
/// depicted in https://llvm.org/docs/Vectorizers.html#epilogue-vectorization.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better update documentation to specify what (little) this method actually does? And what it returns.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added thanks

/// This function is partially responsible for generating the control flow
/// depicted in https://llvm.org/docs/Vectorizers.html#epilogue-vectorization.
BasicBlock *EpilogueVectorizerEpilogueLoop::createVectorizedLoopSkeleton() {
BasicBlock *Entry = OrigLoop->getLoopPreheader();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better inline Entry, or rename, to avoid confusion with other Entry's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to OldScalarPH, thanks

}

/// Prepare \p Plan for vectorizing the epilogue loop. That is, re-use expanded
/// SCEVs from \p ExpandedSCEVs and set resume values for header recipes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain \p InstsToMove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

Comment on lines 10171 to 10173
SmallVector<Instruction *> InstsToMove;
preparePlanForEpilogueVectorLoop(BestEpiPlan, L, ExpandedSCEVs, EPI,
InstsToMove, CM, *PSE.getSE());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SmallVector<Instruction *> InstsToMove;
preparePlanForEpilogueVectorLoop(BestEpiPlan, L, ExpandedSCEVs, EPI,
InstsToMove, CM, *PSE.getSE());
SmallVector<Instruction *> InstsToMove =
preparePlanForEpilogueVectorLoop(BestEpiPlan, L, ExpandedSCEVs, EPI,
CM, *PSE.getSE());

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

ExpandR->eraseFromParent();
}

// Add the minimum iteration check for the epilogue vector loop.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth outlining?
As in EpilogueVectorizerEpilogueLoop::emitMinimumVectorEpilogueIterCountCheck().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to addMinimumVectorEpilogueIterationCheck in VPlanConstruction.cpp, similar to addMinimumIterationCheck

Comment on lines 9714 to 9715
BasicBlock *VecEpilogueIterationCountCheck =
cast<VPIRBasicBlock>(Plan.getEntry())->getIRBasicBlock();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BasicBlock *VecEpilogueIterationCountCheck =
cast<VPIRBasicBlock>(Plan.getEntry())->getIRBasicBlock();
BasicBlock *VecEpilogueIterationCountCheck = AdditionalBypassBlock;

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ,thanks

Copy link

github-actions bot commented Sep 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, thanks, adding some minor comments.

GeneratedRTChecks &Checks, VPlan &Plan)
: InnerLoopAndEpilogueVectorizer(OrigLoop, PSE, LI, DT, TTI, AC, EPI, CM,
BFI, PSI, Checks, Plan, EPI.EpilogueVF,
EPI.EpilogueVF, EPI.EpilogueUF) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EPI.EpilogueVF, EPI.EpilogueUF) {
EPI.EpilogueVF, EPI.EpilogueUF) {}

clang-format, as in InnerLoopVectorizer's constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, updated thanks!

Comment on lines 7395 to 7396
/// entry block to the epilogue VPlan. The minimum iteration check is already
/// created in VPlan.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// entry block to the epilogue VPlan. The minimum iteration check is already
/// created in VPlan.
/// entry block to the epilogue VPlan. The minimum iteration check is being
/// represented in VPlan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, thanks!

Comment on lines 7398 to 7399
BasicBlock *OriginalScalarPH = OrigLoop->getLoopPreheader();
BasicBlock *ScalarPH = createScalarPreheader("vec.epilog.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BasicBlock *OriginalScalarPH = OrigLoop->getLoopPreheader();
BasicBlock *ScalarPH = createScalarPreheader("vec.epilog.");
BasicBlock *NewScalarPH = createScalarPreheader("vec.epilog.");
BasicBlock *OriginalScalarPH = NewScalarPH->getSinglePredecessor();

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done thanks

// OldEntry is now dead and will be cleaned up when the plan gets destroyed.

return Insert;
return ScalarPH->getSinglePredecessor();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return ScalarPH->getSinglePredecessor();
return OriginalScalarPH;

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, thanks

Comment on lines 7404 to 7405
make_range(OldEntry->getFirstNonPhi(), OldEntry->end()))) {
if (isa<VPIRInstruction>(&R))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth noting why phi's and VPIRInstructions are excluded when moving recipes from OldEntry to NewEntry? VPIRInstructions are unmovable by definition and phi's are expected to be VPIRPhi's (i.e., also VPIRInstructions)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment and also updated to iterate over all recipes.

/// Connect the epilogue vector loop generated for \p Plan to the main vector
/// loop, updating branches from the iteration and runtime checks, as well as
/// updating various phis. \p InstsToMove contains instructions that need to be
/// moved to the vector preheader.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// moved to the vector preheader.
/// moved to the preheader of the epilogue vector loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks

/// updating various phis. \p InstsToMove contains instructions that need to be
/// moved to the vector preheader.
static void connectEpilogueVectorLoop(
VPlan &Plan, Loop *L, EpilogueLoopVectorizationInfo &EPI, DominatorTree *DT,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should Plan be renamed EpiPlan and/or commented as such?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, updated, thanks!

cast<VPIRBasicBlock>(Plan.getEntry())->getIRBasicBlock();
BasicBlock *VecEpilogueIterationCountCheck = AdditionalBypassBlock;

BasicBlock *LoopVectorPreHeader =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
BasicBlock *LoopVectorPreHeader =
BasicBlock *VecEpiloguePreHeader =

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated thanks!

}
}

void VPlanTransforms::addMinimumVectorEpilogueIterationCheck(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: refactor this with the above addMinimumIterationCheck().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added thanks

/// SCEVs from \p ExpandedSCEVs and set resume values for header recipes. Some
/// reductions require creating new instructions to compute the resume values.
/// They are collected in a vector and returned. They must be moved to the
/// vector preheader.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// vector preheader.
/// preheader of the vector epilogue loop, after created by the execution of Plan.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated thanks!

@fhahn fhahn enabled auto-merge (squash) September 25, 2025 06:43
@fhahn fhahn merged commit 2016af5 into llvm:main Sep 25, 2025
9 checks passed
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 25, 2025
…(#157545)

Move creation of the minimum iteration check for the epilogue vector
loop to VPlan. This is a first step towards breaking up and moving
skeleton creation for epilogue vectorization to VPlan.

It moves most logic out of EpilogueVectorizerEpilogueLoop: the minimum
iteration check is created directly in VPlan, connecting the check
blocks from the main vector loop is done as post-processing. Next steps
are to move connecting and updating the branches from the check blocks
to VPlan, as well as updating the incoming values for phis.

Test changes are improvements due to folding of live-ins.

PR: llvm/llvm-project#157545
@fhahn fhahn deleted the vplan-epi-min-iter-check branch September 25, 2025 08:52
YixingZhang007 pushed a commit to YixingZhang007/llvm-project that referenced this pull request Sep 27, 2025
Move creation of the minimum iteration check for the epilogue vector
loop to VPlan. This is a first step towards breaking up and moving
skeleton creation for epilogue vectorization to VPlan.

It moves most logic out of EpilogueVectorizerEpilogueLoop: the minimum
iteration check is created directly in VPlan, connecting the check
blocks from the main vector loop is done as post-processing. Next steps
are to move connecting and updating the branches from the check blocks
to VPlan, as well as updating the incoming values for phis.

Test changes are improvements due to folding of live-ins.

PR: llvm#157545
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
Move creation of the minimum iteration check for the epilogue vector
loop to VPlan. This is a first step towards breaking up and moving
skeleton creation for epilogue vectorization to VPlan.

It moves most logic out of EpilogueVectorizerEpilogueLoop: the minimum
iteration check is created directly in VPlan, connecting the check
blocks from the main vector loop is done as post-processing. Next steps
are to move connecting and updating the branches from the check blocks
to VPlan, as well as updating the incoming values for phis.

Test changes are improvements due to folding of live-ins.

PR: llvm#157545
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants