[VPlan] Model branch cond to enter scalar epilogue in VPlan. #92651

fhahn · 2024-05-18T14:38:40Z

This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. This is done using the recently introduced
VPIRBasicBloc.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the use of
VPIRBasicBlock.

Depends on #92525

Use DTU to queue DominatorTree updates directly when connecting basic blocks during VPlan execution. This simplifies DT updates and also automatically allows updating the DT for the VPlan-native path as additional benefit.

This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. To do so, this patch introduces a new VPBlockBase sub-type that simply wraps an existing IR block: VPIRWrapperBlock (name subject to better suggestions). This allows allows connecting blocks naturally to leave nodes. It can also be used to transition modeling of more bits of the skeleton to VPlan gradually. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. At the moment, VPIRWrapperBlocks only wrap an original IR block and don't allow any additions. We could allow IR wrapper blocks to also contain recipes that get added at the beginning of the block. Then the middle.block could also be wrapped. Something like that will also be needed to place the induction/reduction resume phi nodes in the scalar preheader. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the introduction and use of VPIRWrapperBlock. Note that this patch adds and uses a new helper reorderIncomingBlocks to preserve the original order of incoming blocks in created phi nodes; as this patch changes the order branches are created, the order of the predecessors changes, which would cause different order of incoming blocks in some phis we create. The helper ensures the created IR doesn't change (modulo some minor difference in name numbering). After this change landed, the tests should be updated and the helper removed separately. Depends on llvm#92525

llvmbot · 2024-05-18T14:39:10Z

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Florian Hahn (fhahn)

Changes

This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. To do so, this patch introduces a new
VPBlockBase sub-type that simply wraps an existing IR block:
VPIRWrapperBlock (name subject to better suggestions).

This allows allows connecting blocks naturally to leave nodes. It can
also be used to transition modeling of more bits of the skeleton to
VPlan gradually.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

At the moment, VPIRWrapperBlocks only wrap an original IR block and
don't allow any additions. We could allow IR wrapper blocks to also
contain recipes that get added at the beginning of the block. Then the
middle.block could also be wrapped. Something like that will also be
needed to place the induction/reduction resume phi nodes in the scalar
preheader.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the introduction
and use of VPIRWrapperBlock.

Note that this patch adds and uses a new helper reorderIncomingBlocks to
preserve the original order of incoming blocks in created phi nodes; as
this patch changes the order branches are created, the order of the
predecessors changes, which would cause different order of incoming
blocks in some phis we create. The helper ensures the created IR doesn't
change (modulo some minor difference in name numbering). After this
change landed, the tests should be updated and the helper removed
separately.

Depends on #92525

Patch is 47.84 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/92651.diff

13 Files Affected:

(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+68-78)
(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+132-57)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+60-13)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+15-11)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+4-1)
(modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+18)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains-vplan.ll (+17-1)
(modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-sink-replicate-region.ll (+42)
(modified) llvm/test/Transforms/LoopVectorize/outer_loop_test1.ll (+1-1)
(modified) llvm/test/Transforms/LoopVectorize/pr59319-loop-access-info-invalidation.ll (+2-2)
(modified) llvm/test/Transforms/LoopVectorize/vplan-iv-transforms.ll (+9-1)
(modified) llvm/test/Transforms/LoopVectorize/vplan-sink-scalars-and-merge.ll (+15)
(modified) llvm/unittests/Transforms/Vectorize/VPlanTestBase.h (+4-2)

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index fd652cb789549..0f9d68753a3d5 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -59,6 +59,7 @@
 #include "VPlan.h"
 #include "VPlanAnalysis.h"
 #include "VPlanHCFGBuilder.h"
+#include "VPlanPatternMatch.h"
 #include "VPlanTransforms.h"
 #include "VPlanVerifier.h"
 #include "llvm/ADT/APInt.h"
@@ -2972,22 +2973,7 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
       SplitBlock(LoopMiddleBlock, LoopMiddleBlock->getTerminator(), DT, LI,
                  nullptr, Twine(Prefix) + "scalar.ph");
 
-  auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();
-
-  // Set up the middle block terminator.  Two cases:
-  // 1) If we know that we must execute the scalar epilogue, emit an
-  //    unconditional branch.
-  // 2) Otherwise, we must have a single unique exit block (due to how we
-  //    implement the multiple exit case).  In this case, set up a conditional
-  //    branch from the middle block to the loop scalar preheader, and the
-  //    exit block.  completeLoopSkeleton will update the condition to use an
-  //    iteration check, if required to decide whether to execute the remainder.
-  BranchInst *BrInst =
-      Cost->requiresScalarEpilogue(VF.isVector())
-          ? BranchInst::Create(LoopScalarPreHeader)
-          : BranchInst::Create(LoopExitBlock, LoopScalarPreHeader,
-                               Builder.getTrue());
-  BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());
+  auto *BrInst = new UnreachableInst(LoopMiddleBlock->getContext());
   ReplaceInstWithInst(LoopMiddleBlock->getTerminator(), BrInst);
 
   // Update dominator for loop exit. During skeleton creation, only the vector
@@ -3094,50 +3080,7 @@ void InnerLoopVectorizer::createInductionResumeValues(
   }
 }
 
-BasicBlock *InnerLoopVectorizer::completeLoopSkeleton() {
-  // The trip counts should be cached by now.
-  Value *Count = getTripCount();
-  Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);
-
-  auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();
-
-  // Add a check in the middle block to see if we have completed
-  // all of the iterations in the first vector loop.  Three cases:
-  // 1) If we require a scalar epilogue, there is no conditional branch as
-  //    we unconditionally branch to the scalar preheader.  Do nothing.
-  // 2) If (N - N%VF) == N, then we *don't* need to run the remainder.
-  //    Thus if tail is to be folded, we know we don't need to run the
-  //    remainder and we can use the previous value for the condition (true).
-  // 3) Otherwise, construct a runtime check.
-  if (!Cost->requiresScalarEpilogue(VF.isVector()) &&
-      !Cost->foldTailByMasking()) {
-    // Here we use the same DebugLoc as the scalar loop latch terminator instead
-    // of the corresponding compare because they may have ended up with
-    // different line numbers and we want to avoid awkward line stepping while
-    // debugging. Eg. if the compare has got a line number inside the loop.
-    // TODO: At the moment, CreateICmpEQ will simplify conditions with constant
-    // operands. Perform simplification directly on VPlan once the branch is
-    // modeled there.
-    IRBuilder<> B(LoopMiddleBlock->getTerminator());
-    B.SetCurrentDebugLocation(ScalarLatchTerm->getDebugLoc());
-    Value *CmpN = B.CreateICmpEQ(Count, VectorTripCount, "cmp.n");
-    BranchInst &BI = *cast<BranchInst>(LoopMiddleBlock->getTerminator());
-    BI.setCondition(CmpN);
-    if (hasBranchWeightMD(*ScalarLatchTerm)) {
-      // Assume that `Count % VectorTripCount` is equally distributed.
-      unsigned TripCount = UF * VF.getKnownMinValue();
-      assert(TripCount > 0 && "trip count should not be zero");
-      const uint32_t Weights[] = {1, TripCount - 1};
-      setBranchWeights(BI, Weights);
-    }
-  }
-
-#ifdef EXPENSIVE_CHECKS
-  assert(DT->verify(DominatorTree::VerificationLevel::Fast));
-#endif
 
-  return LoopVectorPreHeader;
-}
 
 std::pair<BasicBlock *, Value *>
 InnerLoopVectorizer::createVectorizedLoopSkeleton(
@@ -3198,7 +3141,7 @@ InnerLoopVectorizer::createVectorizedLoopSkeleton(
   // Emit phis for the new starting index of the scalar loop.
   createInductionResumeValues(ExpandedSCEVs);
 
-  return {completeLoopSkeleton(), nullptr};
+  return {LoopVectorPreHeader, nullptr};
 }
 
 // Fix up external users of the induction variable. At this point, we are
@@ -3481,6 +3424,18 @@ void InnerLoopVectorizer::fixVectorizedLoop(VPTransformState &State,
                                VF.getKnownMinValue() * UF);
 }
 
+// Helper to reorder blocks so they match the original order even after the
+// order of the predecessors changes. This is only used to avoid a number of
+// test changes due to reordering of incoming blocks in phi nodes and should be
+// removed soon, with the tests being updated.
+static void reorderIncomingBlocks(SmallVectorImpl<BasicBlock *> &Blocks,
+                                  BasicBlock *LoopMiddleBlock) {
+  if (Blocks.front() == LoopMiddleBlock)
+    std::swap(Blocks.front(), Blocks.back());
+  if (Blocks.size() == 3)
+    std::swap(Blocks[0], Blocks[1]);
+}
+
 void InnerLoopVectorizer::fixFixedOrderRecurrence(
     VPFirstOrderRecurrencePHIRecipe *PhiR, VPTransformState &State) {
   // This is the second phase of vectorizing first-order recurrences. An
@@ -3591,7 +3546,9 @@ void InnerLoopVectorizer::fixFixedOrderRecurrence(
   PHINode *Phi = cast<PHINode>(PhiR->getUnderlyingValue());
   auto *Start = Builder.CreatePHI(Phi->getType(), 2, "scalar.recur.init");
   auto *ScalarInit = PhiR->getStartValue()->getLiveInIRValue();
-  for (auto *BB : predecessors(LoopScalarPreHeader)) {
+  SmallVector<BasicBlock *> Blocks(predecessors(LoopScalarPreHeader));
+  reorderIncomingBlocks(Blocks, LoopMiddleBlock);
+  for (auto *BB : Blocks) {
     auto *Incoming = BB == LoopMiddleBlock ? ExtractForScalar : ScalarInit;
     Start->addIncoming(Incoming, BB);
   }
@@ -7480,7 +7437,9 @@ static void createAndCollectMergePhiForReduction(
   // If we are fixing reductions in the epilogue loop then we should already
   // have created a bc.merge.rdx Phi after the main vector body. Ensure that
   // we carry over the incoming values correctly.
-  for (auto *Incoming : predecessors(LoopScalarPreHeader)) {
+  SmallVector<BasicBlock *> Blocks(predecessors(LoopScalarPreHeader));
+  reorderIncomingBlocks(Blocks, LoopMiddleBlock);
+  for (auto *Incoming : Blocks) {
     if (Incoming == LoopMiddleBlock)
       BCBlockPhi->addIncoming(FinalValue, Incoming);
     else if (ResumePhi && is_contained(ResumePhi->blocks(), Incoming))
@@ -7551,6 +7510,21 @@ LoopVectorizationPlanner::executePlan(
   std::tie(State.CFG.PrevBB, CanonicalIVStartValue) =
       ILV.createVectorizedLoopSkeleton(ExpandedSCEVs ? *ExpandedSCEVs
                                                      : State.ExpandedSCEVs);
+#ifdef EXPENSIVE_CHECKS
+  assert(DT->verify(DominatorTree::VerificationLevel::Fast));
+#endif
+
+  VPBasicBlock *MiddleVPBB =
+      cast<VPBasicBlock>(BestVPlan.getVectorLoopRegion()->getSingleSuccessor());
+
+  using namespace llvm::VPlanPatternMatch;
+  if (MiddleVPBB->begin() != MiddleVPBB->end() &&
+      match(&MiddleVPBB->back(), m_BranchOnCond(m_VPValue()))) {
+    cast<VPIRWrapperBlock>(MiddleVPBB->getSuccessors()[1])
+        ->resetBlock(OrigLoop->getLoopPreheader());
+  } else
+    cast<VPIRWrapperBlock>(MiddleVPBB->getSuccessors()[0])
+        ->resetBlock(OrigLoop->getLoopPreheader());
 
   // Only use noalias metadata when using memory checks guaranteeing no overlap
   // across all iterations.
@@ -7687,7 +7661,7 @@ EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton(
   // inductions in the epilogue loop are created before executing the plan for
   // the epilogue loop.
 
-  return {completeLoopSkeleton(), nullptr};
+  return {LoopVectorPreHeader, nullptr};
 }
 
 void EpilogueVectorizerMainLoop::printDebugTracesAtStart() {
@@ -7811,8 +7785,11 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
       VecEpilogueIterationCountCheck,
       VecEpilogueIterationCountCheck->getSinglePredecessor());
 
-  DT->changeImmediateDominator(LoopScalarPreHeader,
-                               EPI.EpilogueIterationCountCheck);
+  if (auto *N = DT->getNode(LoopScalarPreHeader))
+    DT->changeImmediateDominator(LoopScalarPreHeader,
+                                 EPI.EpilogueIterationCountCheck);
+  else
+    DT->addNewBlock(LoopScalarPreHeader, EPI.EpilogueIterationCountCheck);
   if (!Cost->requiresScalarEpilogue(EPI.EpilogueVF.isVector()))
     // If there is an epilogue which must run, there's no edge from the
     // middle block to exit blocks  and thus no need to update the immediate
@@ -7876,7 +7853,7 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
                               {VecEpilogueIterationCountCheck,
                                EPI.VectorTripCount} /* AdditionalBypass */);
 
-  return {completeLoopSkeleton(), EPResumeVal};
+  return {LoopVectorPreHeader, EPResumeVal};
 }
 
 BasicBlock *
@@ -8625,9 +8602,25 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
   // modified; a basic block for the vector pre-header, followed by a region for
   // the vector loop, followed by the middle basic block. The skeleton vector
   // loop region contains a header and latch basic blocks.
+
+  // Add a check in the middle block to see if we have completed
+  // all of the iterations in the first vector loop.  Three cases:
+  // 1) If we require a scalar epilogue, there is no conditional branch as
+  //    we unconditionally branch to the scalar preheader.  Do nothing.
+  // 2) If (N - N%VF) == N, then we *don't* need to run the remainder.
+  //    Thus if tail is to be folded, we know we don't need to run the
+  //    remainder and we can use the previous value for the condition (true).
+  // 3) Otherwise, construct a runtime check.
+  bool RequiresScalarEpilogueCheck =
+      LoopVectorizationPlanner::getDecisionAndClampRange(
+          [this](ElementCount VF) {
+            return !CM.requiresScalarEpilogue(VF.isVector());
+          },
+          Range);
   VPlanPtr Plan = VPlan::createInitialVPlan(
       createTripCountSCEV(Legal->getWidestInductionType(), PSE, OrigLoop),
-      *PSE.getSE());
+      *PSE.getSE(), RequiresScalarEpilogueCheck, CM.foldTailByMasking(),
+      OrigLoop);
   VPBasicBlock *HeaderVPBB = new VPBasicBlock("vector.body");
   VPBasicBlock *LatchVPBB = new VPBasicBlock("vector.latch");
   VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
@@ -8875,7 +8868,7 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
   // Create new empty VPlan
   auto Plan = VPlan::createInitialVPlan(
       createTripCountSCEV(Legal->getWidestInductionType(), PSE, OrigLoop),
-      *PSE.getSE());
+      *PSE.getSE(), true, false, OrigLoop);
 
   // Build hierarchical CFG
   VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
@@ -9084,6 +9077,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
     }
   }
   Builder.setInsertPoint(&*LatchVPBB->begin());
+  VPBasicBlock *MiddleVPBB =
+      cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor());
+  VPBasicBlock::iterator IP = MiddleVPBB->begin();
   for (VPRecipeBase &R :
        Plan->getVectorLoopRegion()->getEntryBasicBlock()->phis()) {
     VPReductionPHIRecipe *PhiR = dyn_cast<VPReductionPHIRecipe>(&R);
@@ -9192,8 +9188,8 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
     // also modeled in VPlan.
     auto *FinalReductionResult = new VPInstruction(
         VPInstruction::ComputeReductionResult, {PhiR, NewExitingVPV}, ExitDL);
-    cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor())
-        ->appendRecipe(FinalReductionResult);
+    FinalReductionResult->insertBefore(*MiddleVPBB, IP);
+    IP = std::next(FinalReductionResult->getIterator());
     OrigExitingVPV->replaceUsesWithIf(
         FinalReductionResult,
         [](VPUser &User, unsigned) { return isa<VPLiveOut>(&User); });
@@ -10390,15 +10386,9 @@ PreservedAnalyses LoopVectorizePass::run(Function &F,
         RemoveRedundantDbgInstrs(&BB);
     }
 
-    // We currently do not preserve dominator analyses with outer loop
-    // vectorization. Until this is addressed, mark these analyses as preserved
-    // only for non-VPlan-native path.
-    // TODO: Preserve Dominator analysis for VPlan-native path.
-    if (!EnableVPlanNativePath) {
-      PA.preserve<DominatorTreeAnalysis>();
-      PA.preserve<ScalarEvolutionAnalysis>();
-    }
     PA.preserve<LoopAnalysis>();
+    PA.preserve<DominatorTreeAnalysis>();
+    PA.preserve<ScalarEvolutionAnalysis>();
 
     if (Result.MadeCFGChange) {
       // Making CFG changes likely means a loop got vectorized. Indicate that
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 27f8e239b1c09..19cff481c4fbb 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -25,6 +25,7 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/Twine.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/CFG.h"
@@ -218,9 +219,9 @@ VPTransformState::VPTransformState(ElementCount VF, unsigned UF, LoopInfo *LI,
                                    DominatorTree *DT, IRBuilderBase &Builder,
                                    InnerLoopVectorizer *ILV, VPlan *Plan,
                                    LLVMContext &Ctx)
-    : VF(VF), UF(UF), LI(LI), DT(DT), Builder(Builder), ILV(ILV), Plan(Plan),
-      LVer(nullptr),
-      TypeAnalysis(Plan->getCanonicalIV()->getScalarType(), Ctx) {}
+    : VF(VF), UF(UF), LI(LI), Builder(Builder), ILV(ILV), Plan(Plan),
+      LVer(nullptr), TypeAnalysis(Plan->getCanonicalIV()->getScalarType(), Ctx),
+      DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy) {}
 
 Value *VPTransformState::get(VPValue *Def, const VPIteration &Instance) {
   if (Def->isLiveIn())
@@ -399,8 +400,8 @@ void VPTransformState::packScalarIntoVectorValue(VPValue *Def,
   set(Def, VectorValue, Instance.Part);
 }
 
-BasicBlock *
-VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
+BasicBlock *VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG,
+                                                DomTreeUpdater &DTU) {
   // BB stands for IR BasicBlocks. VPBB stands for VPlan VPBasicBlocks.
   // Pred stands for Predessor. Prev stands for Previous - last visited/created.
   BasicBlock *PrevBB = CFG.PrevBB;
@@ -436,10 +437,33 @@ VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
              "Trying to reset an existing successor block.");
       TermBr->setSuccessor(idx, NewBB);
     }
+    DTU.applyUpdates({{DominatorTree::Insert, PredBB, NewBB}});
   }
   return NewBB;
 }
 
+void VPIRWrapperBlock::execute(VPTransformState *State) {
+  for (VPBlockBase *PredVPBlock : getHierarchicalPredecessors()) {
+    VPBasicBlock *PredVPBB = PredVPBlock->getExitingBasicBlock();
+    auto &PredVPSuccessors = PredVPBB->getHierarchicalSuccessors();
+    BasicBlock *PredBB = State->CFG.VPBB2IRBB[PredVPBB];
+
+    assert(PredBB && "Predecessor basic-block not found building successor.");
+    auto *PredBBTerminator = PredBB->getTerminator();
+    LLVM_DEBUG(dbgs() << "LV: draw edge from" << PredBB->getName() << '\n');
+
+    auto *TermBr = dyn_cast<BranchInst>(PredBBTerminator);
+    if (TermBr) {
+      // Set each forward successor here when it is created, excluding
+      // backedges. A backward successor is set when the branch is created.
+      unsigned idx = PredVPSuccessors.front() == this ? 0 : 1;
+      assert(!TermBr->getSuccessor(idx) &&
+             "Trying to reset an existing successor block.");
+      TermBr->setSuccessor(idx, WrappedBlock);
+    }
+  }
+}
+
 void VPBasicBlock::execute(VPTransformState *State) {
   bool Replica = State->Instance && !State->Instance->isFirstIteration();
   VPBasicBlock *PrevVPBB = State->CFG.PrevVPBB;
@@ -467,6 +491,15 @@ void VPBasicBlock::execute(VPTransformState *State) {
     // The Exit block of a loop is always set to be successor 0 of the Exiting
     // block.
     cast<BranchInst>(ExitingBB->getTerminator())->setSuccessor(0, NewBB);
+    // Set the insert point for recipe execution in the block.
+    State->Builder.SetInsertPoint(NewBB->getTerminator());
+    if (getSuccessors().size() == 1) {
+      BranchInst *Br = State->Builder.CreateBr(NewBB);
+      Br->setSuccessor(0, nullptr);
+      NewBB->getTerminator()->eraseFromParent();
+      State->Builder.SetInsertPoint(NewBB->getTerminator());
+    }
+    State->DTU.applyUpdates({{DominatorTree::Insert, ExitingBB, NewBB}});
   } else if (PrevVPBB && /* A */
              !((SingleHPred = getSingleHierarchicalPredecessor()) &&
                SingleHPred->getExitingBasicBlock() == PrevVPBB &&
@@ -483,7 +516,7 @@ void VPBasicBlock::execute(VPTransformState *State) {
     //    is the exiting VPBB of this region from a previous instance, or the
     //    predecessor of this region.
 
-    NewBB = createEmptyBasicBlock(State->CFG);
+    NewBB = createEmptyBasicBlock(State->CFG, State->DTU);
     State->Builder.SetInsertPoint(NewBB);
     // Temporarily terminate with unreachable until CFG is rewired.
     UnreachableInst *Terminator = State->Builder.CreateUnreachable();
@@ -622,6 +655,13 @@ void VPBasicBlock::print(raw_ostream &O, const Twine &Indent,
 
   printSuccessors(O, Indent);
 }
+
+void VPIRWrapperBlock::print(raw_ostream &O, const Twine &Indent,
+                             VPSlotTracker &SlotTracker) const {
+  O << Indent << "ir-bb<" << getName() << ">\n";
+  printSuccessors(O, Indent);
+}
+
 #endif
 
 static std::pair<VPBlockBase *, VPBlockBase *> cloneSESE(VPBlockBase *Entry);
@@ -637,12 +677,23 @@ static std::pair<VPBlockBase *, VPBlockBase *> cloneSESE(VPBlockBase *Entry) {
       Entry);
   for (VPBlockBase *BB : RPOT) {
     VPBlockBase *NewBB = BB->clone();
-    for (VPBlockBase *Pred : BB->getPredecessors())
-      VPBlockUtils::connectBlocks(Old2NewVPBlocks[Pred], NewBB);
-
     Old2NewVPBlocks[BB] = NewBB;
   }
 
+  for (VPBlockBase *BB : RPOT) {
+    VPBlockBase *NewBB = Old2NewVPBlocks[BB];
+    SmallVector<VPBlockBase *> NewPreds;
+    for (VPBlockBase *Pred : BB->getPredecessors()) {
+      NewPreds.push_back(Old2NewVPBlocks[Pred]);
+    }
+    NewBB->setPredecessors(NewPreds);
+    SmallVector<VPBlockBase *> NewSuccs;
+    for (VPBlockBase *Succ : BB->successors()) {
+      NewSuccs.push_back(Old2NewVPBlocks[Succ]);
+    }
+    NewBB->setSuccessors(NewSuccs);
+  }
+
 #if !defined(NDEBUG)
   // Verify that the order of predecessors and successors matches in the cloned
   // version.
@@ -766,7 +817,9 @@ VPlan::~VPlan() {
     delete BackedgeTakenCount;
 }
 
-VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE) {
+VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE,
+                                   bool RequiresScalarEpilogueCheck,
+                                   bool TailFolded, Loop *TheLoop) {
   VPBasicBlock *Preheader = new VPBasicBlock("ph");
   VPBasicBlock *VecPreheader = new VPBasicBlock("vector.ph");
   auto Plan = std::make_unique<VPlan>(Preheader, VecPreheader);
@@ -777,6 +830,35 @@ VPlanPtr VPlan::createInitialVPlan(const SCEV *TripCount, ScalarEvolution &SE) {
   VPBlockUtils::insertBlockAfter(TopRegion, VecPreheader);
   VPBasicBlock *MiddleVPBB = new VPBasicBlock("middle.block");
   VPBlockUtils::insertBlockAfter(MiddleVPBB, TopRegion);
+
+  BasicBlock *EB = TheLoop->getUniqueExitBlock();
+  if (RequiresScalarEpilogueCheck) {
+    VPIRWrapperBlock *EBWrapper = new VPIRWrapperBlock(EB);
+    VPBlockUtils::insertBlockAfter(EBWrapper, MiddleVPBB);
+
+    auto *ScalarLatchTerm = TheLoop->getLoopLatch()->getTerminator();
+    // Here we use the same DebugLoc as the scalar loop latch terminator instead
+    // of the corresponding compare because they may have ended up with
+    // different line numbers and we want to avoid awkward line stepping while
+    // debugging. Eg. if the compare has got a line number inside the loop.
+    VPValue *Cmp =
+   ...
[truncated]

github-actions · 2024-05-18T14:41:24Z

✅ With the latest revision this PR passed the C/C++ code formatter.

…anch Conflicts: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/lib/Transforms/Vectorize/VPlan.cpp

fhahn · 2024-05-26T04:50:47Z

Split off the VPIRWrapperBlock changes to #93398 to use it for the pre-preheader, which required a bit of generalization.

This patch adds a new special type of VPBasicBlock that wraps an existing IR basic block. Recipes of the block get added before the terminator of the wrapped IR basic block. Making it a subclass of VPBasicBlock avoids duplicating various APIs to manage recipes in a block, as well as makes sure the traversals filtering VPBasicBlocks automatically apply as well. Initially VPIRBasicBlock are only used for the pre-preheader (wrapping the original preheader of the scalar loop). As follow-up, this will be used to move more parts of the skeleton inside VPlan, starting with the branch and condition in the middle block. Separated out of #92651 PR: #93398

…anch

fhahn · 2024-05-31T03:45:52Z

Updated after landing VPIRBasicBlock separately in #93398

…anch

llvm#92651

…anch

fhahn

Updated after landing split off changes.

fhahn · 2024-06-23T19:27:35Z

llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp

    BranchInst *CondBr =
        Builder.CreateCondBr(Cond, Builder.GetInsertBlock(), nullptr);
-
-    if (getParent()->isExiting())
+    if (getParent()->isExiting()) {


Done in ab9c2b1

fhahn · 2024-06-23T19:28:44Z

llvm/lib/Transforms/Vectorize/VPlan.cpp

-  VPIRBasicBlock *Preheader = new VPIRBasicBlock(PH);
+                                   bool RequiresScalarEpilogueCheck,
+                                   bool TailFolded, Loop *TheLoop) {
+  VPIRBasicBlock *Preheader = new VPIRBasicBlock(TheLoop->getLoopPreheader());


Done in 31a94bd

fhahn · 2024-06-23T19:44:24Z

llvm/lib/Transforms/Vectorize/VPlan.cpp

@@ -654,12 +680,23 @@ static std::pair<VPBlockBase *, VPBlockBase *> cloneSESE(VPBlockBase *Entry) {
      Entry);
  for (VPBlockBase *BB : RPOT) {
    VPBlockBase *NewBB = BB->clone();
-    for (VPBlockBase *Pred : BB->getPredecessors())


Split off in ef1773a

…anch

fhahn · 2024-07-01T13:52:16Z

ping :)

ayalz

This looks good to me, thanks.

Would be good to confirm the changes to two tests in llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll are as desired.

Many test changes are due to sinking middle-block's compare down to its branch, occasionally beyond the coverage of CHECKs.

Leaving various minor comments.

ayalz · 2024-07-03T22:10:33Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  auto *MiddleTerm =
+      cast<BranchInst>(State.CFG.VPBB2IRBB[ExitVPBB]->getTerminator());
+  if (MiddleTerm->isConditional() &&
+      hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator())) {


Assigning weights deserves further discussion; the indicator could be recorded in VPlan itself - whether weights should be assigned to all new branches (possibly adjusting weights of existing branches).

ayalz · 2024-07-04T07:20:31Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

+  //    Thus if tail is to be folded, we know we don't need to run the
+  //    remainder and we can use the previous value for the condition (true).
+  // 3) Otherwise, construct a runtime check.
+  bool RequiresScalarEpilogueCheck =


Agree to have a follow up patch deal with regressions, but does retaining the original condition alone leads to any? I.e., RequiresScalarEpilogue = ... return CM.requiresScalarEpilogue(VF.isVector()) ...? The range should be clamped the same.

ayalz · 2024-07-04T07:42:19Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

-    cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor())
-        ->appendRecipe(FinalReductionResult);
+    FinalReductionResult->insertBefore(*MiddleVPBB, IP);
+    IP = std::next(FinalReductionResult->getIterator());


Is this std::next needed - having inserted FinalReductionResult before IP implies that the latter already follows the former?

Not needed in the current version, it is sufficient to use the original IP

ayalz · 2024-07-04T07:45:23Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ -9006,6 +8956,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
    }
  }
  Builder.setInsertPoint(&*LatchVPBB->begin());
+  VPBasicBlock *MiddleVPBB =
+      cast<VPBasicBlock>(VectorLoopRegion->getSingleSuccessor());
+  VPBasicBlock::iterator IP = MiddleVPBB->getFirstNonPhi();


Could (some of) this be pushed independently, earlier, to clarify change in insertion points, if any?

Done in 8299bfa, thanks

ayalz · 2024-07-04T07:49:13Z

llvm/lib/Transforms/Vectorize/VPlan.cpp

@@ -448,11 +448,13 @@ VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
 }

 void VPIRBasicBlock::execute(VPTransformState *State) {
-  assert(getHierarchicalSuccessors().empty() &&


Update to assert empty-or-single?

Updated, thanks!

ayalz · 2024-07-04T12:13:36Z

llvm/test/Transforms/LoopVectorize/vplan-printing-outer-loop.ll

+; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
+; CHECK-NEXT: Live-in ir<8> = original trip-count


For outer-loop vectorization.

ayalz · 2024-07-04T12:15:32Z

llvm/test/Transforms/PhaseOrdering/AArch64/hoist-runtime-checks.ll

+; CHECK:       for.body.preheader13:
+; CHECK-NEXT:    [[INDVARS_IV_PH:%.*]] = phi i64 [ 0, [[FOR_BODY_PREHEADER10]] ], [ [[N_VEC:%.*]], [[MIDDLE_BLOCK:%.*]] ]
+; CHECK-NEXT:    [[SUM_07_PH:%.*]] = phi i32 [ 0, [[FOR_BODY_PREHEADER10]] ], [ [[TMP7:%.*]], [[MIDDLE_BLOCK]] ]
+; CHECK-NEXT:    br label [[FOR_BODY:%.*]]


Multiple changes to this test combined with the following sum_prefix_with_sum() test?

changes to the input was left over from debugging, restored, thanks!

ayalz · 2024-07-04T12:19:26Z

llvm/test/Transforms/PhaseOrdering/X86/hoist-load-of-baseptr.ll

+; O2:       for.body4.preheader9:
+; O2-NEXT:    [[J_05_PH:%.*]] = phi i64 [ 0, [[FOR_BODY4_PREHEADER]] ], [ [[N_VEC]], [[MIDDLE_BLOCK:%.*]] ]
+; O2-NEXT:    br label [[FOR_BODY4:%.*]]


Just noting: basic-block hoisted from below.

ayalz · 2024-07-04T12:20:29Z

llvm/unittests/Transforms/Vectorize/VPlanTestBase.h

-    auto Plan = VPlan::createInitialVPlan(SE->getBackedgeTakenCount(L), *SE,
-                                          L->getLoopPreheader());
-    VPlanHCFGBuilder HCFGBuilder(L, LI.get(), *Plan);
+    auto Plan = VPlan::createInitialVPlan(


Better not?

ayalz · 2024-07-04T12:20:43Z

llvm/unittests/Transforms/Vectorize/VPlanTestBase.h

-    auto Plan = VPlan::createInitialVPlan(SE->getBackedgeTakenCount(L), *SE,
-                                          L->getLoopPreheader());
-    VPlanHCFGBuilder HCFGBuilder(L, LI.get(), *Plan);
+    auto Plan = VPlan::createInitialVPlan(


…anch

Split off from #92651 as suggested.

…anch

ayalz · 2024-07-04T20:05:23Z

nit: the last paragraph of the summary is obsolete: "Note that this patch adds and uses a new helper reorderIncomingBlocks ..."

ayalz · 2024-07-04T20:25:03Z

llvm/lib/Transforms/Vectorize/VPlanVerifier.cpp

+  VPBlockBase *MiddleBB =
+      IRBB->getPlan()->getVectorLoopRegion()->getSingleSuccessor();
+  if (IRBB != IRBB->getPlan()->getPreheader() &&
+      IRBB->getSinglePredecessor() != MiddleBB) {
    errs() << "VPIRBasicBlock can only be used as pre-header at the moment!\n";


Suggested change

errs() << "VPIRBasicBlock can only be used as pre-header at the moment!\n";

errs() << "VPIRBasicBlock can only be used as pre-header or a successor of middle-block at the moment!\n";

Adjusted thanks!

ayalz · 2024-07-04T20:28:53Z

llvm/lib/Transforms/Vectorize/VPlan.cpp

-  assert(getHierarchicalSuccessors().empty() &&
-         "VPIRBasicBlock cannot have successors at the moment");
-
+  assert(getHierarchicalSuccessors().size() <= 2);


Suggested change

assert(getHierarchicalSuccessors().size() <= 2);

assert(getHierarchicalSuccessors().size() <= 2 &&

"VPIRBasicBlock can have at most two successors at the moment");

can any block have more than 2? ...

I think at the moment 2 is the maximum for any block, would probably be good to also check in the verifier.

…anch

llvm-ci · 2024-07-05T09:22:17Z

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/1270

Here is the relevant piece of the build log for the reference:

Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: Analysis/CostModel/X86/gather-i16-with-i8-index.ll' FAILED ********************
Exit Code: 134

Command Output (stderr):
--
RUN: at line 2: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -passes=loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+sse2 --debug-only=loop-vectorize < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll 2>&1 | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll --check-prefixes=SSE
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll --check-prefixes=SSE
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -passes=loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+sse2 --debug-only=loop-vectorize
RUN: at line 3: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -passes=loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+sse4.2 --debug-only=loop-vectorize < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll 2>&1 | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll --check-prefixes=SSE
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll --check-prefixes=SSE
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -passes=loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+sse4.2 --debug-only=loop-vectorize
RUN: at line 4: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -passes=loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+avx  --debug-only=loop-vectorize < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll 2>&1 | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll --check-prefixes=AVX1
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -passes=loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+avx --debug-only=loop-vectorize
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll --check-prefixes=AVX1
RUN: at line 5: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -passes=loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+avx2,-fast-gather --debug-only=loop-vectorize < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll 2>&1 | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll --check-prefixes=AVX2-SLOWGATHER
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -passes=loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+avx2,-fast-gather --debug-only=loop-vectorize
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll --check-prefixes=AVX2-SLOWGATHER
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/test/Analysis/CostModel/X86/Output/gather-i16-with-i8-index.ll.script: line 6: 618816 Aborted                 /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/opt -passes=loop-vectorize -vectorizer-maximize-bandwidth -S -mattr=+avx2,-fast-gather --debug-only=loop-vectorize < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll 2>&1
     618818 Done                    | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll --check-prefixes=AVX2-SLOWGATHER

--

********************

fhahn · 2024-07-05T10:21:56Z

Expensive checks failure were caused by now obsolete DT updates during legacy skeleton creation, which have been removed in eedc2c8

Split off from llvm#92651 as suggested.

) This patch moves branch condition creation to enter the scalar epilogue loop to VPlan. Modeling the branch in the middle block also requires modeling the successor blocks. This is done using the recently introduced VPIRBasicBlock. Note that the middle.block is still created as part of the skeleton and then patched in during VPlan execution. Unfortunately the skeleton needs to create the middle.block early on, as it is also used for induction resume value creation and is also needed to properly update the dominator tree during skeleton creation. After this patch lands, I plan to move induction resume value and phi node creation in the scalar preheader to VPlan. Once that is done, we should be able to create the middle.block in VPlan directly. This is a re-worked version based on the earlier https://reviews.llvm.org/D150398 and the main change is the use of VPIRBasicBlock. Depends on llvm#92525 PR: llvm#92651

Use VPIRBasicBlock to wrap the middle block and implement patching up branches in predecessors in VPIRBasicBlock::execute. The IR middle block is only created after skeleton creation. Initially a regular VPBasicBlock is created, which will later be replaced by a VPIRBasicBlock once the middle IR basic block has been created. Note that this slightly changes the order of instructions created in the middle block; code generated by recipe execution in the middle block will now be inserted before the terminator (and in between the compare to used by the terminator). The original order will be restored in llvm#92651. PR: llvm#95816

Reoder code to exit early if the BranchOnCond isn't in an exiting block. This delays retrieving the parent region, which may not be present. Split off from llvm#92651.

Split off from llvm#92651 as suggested.

Rewrite cloneSESE to perform 2 depth-first passes with the first one cloning blocks and the second one updating the predecessors and successors. This is needed to preserve the correct predecessor/successor ordering with llvm#92651 and has been split off as suggested.

…#94760) This patch introduces a new ResumePhi VPInstruction which creates a phi in a leaf block of a VPlan. The first use is to create the phi node for fixed-order recurrence resume values in the scalar preheader. The VPInstruction takes 2 operands: 1) the incoming value from the middle-block and a default value to be used for all other incoming blocks. In follow-up changes, it will also be used to create phis for reduction and induction resume values. Depends on #92651 PR: #94760

…llvm#94760) This patch introduces a new ResumePhi VPInstruction which creates a phi in a leaf block of a VPlan. The first use is to create the phi node for fixed-order recurrence resume values in the scalar preheader. The VPInstruction takes 2 operands: 1) the incoming value from the middle-block and a default value to be used for all other incoming blocks. In follow-up changes, it will also be used to create phis for reduction and induction resume values. Depends on llvm#92651 PR: llvm#94760

fhahn added 2 commits May 18, 2024 15:04

fhahn requested review from ayalz and aniragil May 18, 2024 14:38

llvmbot added backend:RISC-V vectorizers llvm:transforms labels May 18, 2024

fhahn added 2 commits May 24, 2024 11:36

Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…

10e747a

…anch Conflicts: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/lib/Transforms/Vectorize/VPlan.cpp

!fixup update

4dfba5d

fhahn mentioned this pull request May 26, 2024

[VPlan] Add VPIRBasicBlock, use to model pre-preheader. #93398

Merged

fhahn added 2 commits May 30, 2024 17:49

Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…

fe03d51

…anch

!fixup fix after merge.

2372421

fhahn force-pushed the vplan-middle-block-branch branch from 1773da7 to 2372421 Compare May 31, 2024 03:45

!fixup fix formatting.

ab78e2a

fhahn added 2 commits June 5, 2024 11:22

Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…

446b313

…anch

!fixup clean up after merge, fix branch weights as post-processing

5c6e07e

Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…

0a34761

…anch

fhahn added 2 commits June 7, 2024 15:07

Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…

a5c1fd6

…anch

!fixup fix formatting

045f051

fhahn added a commit to fhahn/llvm-project that referenced this pull request Jun 7, 2024

[VPlan] Model branch cond to enter scalar epilogue in VPlan.

e1c1bd1

llvm#92651

fhahn mentioned this pull request Jun 7, 2024

[VPlan] Introduce ResumePhi VPInstruction, use to create phi for FOR. #94760

Merged

Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…

bd99d7b

…anch

fhahn commented Jun 23, 2024

View reviewed changes

Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…

887a56d

…anch

ayalz approved these changes Jul 4, 2024

View reviewed changes

Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…

8cab6d2

…anch

fhahn added a commit that referenced this pull request Jul 4, 2024

[VPlan] Extract reduction result insertion point to variable (NFCI).

8299bfa

Split off from #92651 as suggested.

fhahn added 2 commits July 4, 2024 18:46

Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…

2ba349c

…anch

!fixup address latest comments, thanks!

fd9975c

fhahn added 2 commits July 4, 2024 21:06

!fixup update tests

dee166d

!fixup update new tests.

d5cd98d

ayalz reviewed Jul 4, 2024

View reviewed changes

fhahn added 2 commits July 5, 2024 08:53

!fixup add assert messages, update remaining unit tests

ab6c49a

Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…

7867bfa

…anch

fhahn merged commit 99d6c6d into llvm:main Jul 5, 2024
7 checks passed

fhahn deleted the vplan-middle-block-branch branch July 5, 2024 09:08

kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024

[VPlan] Extract reduction result insertion point to variable (NFCI).

c8e587c

Split off from llvm#92651 as suggested.

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024

[VPlan] Rename Preheader -> Entry in createInitialVPlan (NFCI).

88cfc15

Split off from llvm#92651 as suggested.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[VPlan] Model branch cond to enter scalar epilogue in VPlan. #92651

[VPlan] Model branch cond to enter scalar epilogue in VPlan. #92651

fhahn commented May 18, 2024 •

edited

Loading

llvmbot commented May 18, 2024 •

edited

Loading

github-actions bot commented May 18, 2024 •

edited

Loading

fhahn commented May 26, 2024

fhahn commented May 31, 2024

fhahn left a comment

fhahn Jun 23, 2024

fhahn Jun 23, 2024

fhahn Jun 23, 2024

fhahn commented Jul 1, 2024

ayalz left a comment

ayalz Jul 3, 2024

ayalz Jul 4, 2024

ayalz Jul 4, 2024

fhahn Jul 4, 2024

ayalz Jul 4, 2024

fhahn Jul 4, 2024

ayalz Jul 4, 2024

fhahn Jul 4, 2024

ayalz Jul 4, 2024

ayalz Jul 4, 2024

fhahn Jul 4, 2024

ayalz Jul 4, 2024

ayalz Jul 4, 2024

ayalz Jul 4, 2024

ayalz commented Jul 4, 2024

ayalz Jul 4, 2024

fhahn Jul 5, 2024

ayalz Jul 4, 2024

fhahn Jul 5, 2024

llvm-ci commented Jul 5, 2024

fhahn commented Jul 5, 2024

		; CHECK-NEXT: Live-in vp<[[VTC:%.+]]> = vector-trip-count
		; CHECK-NEXT: Live-in ir<8> = original trip-count

	errs() << "VPIRBasicBlock can only be used as pre-header at the moment!\n";
	errs() << "VPIRBasicBlock can only be used as pre-header or a successor of middle-block at the moment!\n";

	assert(getHierarchicalSuccessors().size() <= 2);
	assert(getHierarchicalSuccessors().size() <= 2 &&
	"VPIRBasicBlock can have at most two successors at the moment");

[VPlan] Model branch cond to enter scalar epilogue in VPlan. #92651

[VPlan] Model branch cond to enter scalar epilogue in VPlan. #92651

Conversation

fhahn commented May 18, 2024 • edited Loading

llvmbot commented May 18, 2024 • edited Loading

github-actions bot commented May 18, 2024 • edited Loading

fhahn commented May 26, 2024

fhahn commented May 31, 2024

fhahn left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fhahn commented Jul 1, 2024

ayalz left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ayalz commented Jul 4, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

llvm-ci commented Jul 5, 2024

fhahn commented Jul 5, 2024

fhahn commented May 18, 2024 •

edited

Loading

llvmbot commented May 18, 2024 •

edited

Loading

github-actions bot commented May 18, 2024 •

edited

Loading