Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VPlan] Use DomTreeUpdater to automatically update DT for vector loop. #92525

Merged
merged 4 commits into from
May 24, 2024

Conversation

fhahn
Copy link
Contributor

@fhahn fhahn commented May 17, 2024

Use DTU to queue DominatorTree updates directly when connecting basic blocks during VPlan execution.

This simplifies DT updates and will also automatically allow updating the DT for the VPlan-native path as additional benefit in a follow-up

@llvmbot
Copy link
Collaborator

llvmbot commented May 17, 2024

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Use DTU to queue DominatorTree updates directly when connecting basic blocks during VPlan execution.

This simplifies DT updates and also automatically allows updating the DT for the VPlan-native path as additional benefit.


Full diff: https://github.com/llvm/llvm-project/pull/92525.diff

3 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+3-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+15-51)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+6-11)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f2a541f5167a0..bd04489f177b6 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -10390,15 +10390,15 @@ PreservedAnalyses LoopVectorizePass::run(Function &F,
         RemoveRedundantDbgInstrs(&BB);
     }
 
-    // We currently do not preserve loopinfo/dominator analyses with outer loop
+    // We currently do not preserve loopinfo analysis with outer loop
     // vectorization. Until this is addressed, mark these analyses as preserved
     // only for non-VPlan-native path.
-    // TODO: Preserve Loop and Dominator analyses for VPlan-native path.
+    // TODO: Preserve Loop analysis for VPlan-native path.
     if (!EnableVPlanNativePath) {
       PA.preserve<LoopAnalysis>();
-      PA.preserve<DominatorTreeAnalysis>();
       PA.preserve<ScalarEvolutionAnalysis>();
     }
+    PA.preserve<DominatorTreeAnalysis>();
 
     if (Result.MadeCFGChange) {
       // Making CFG changes likely means a loop got vectorized. Indicate that
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 27f8e239b1c09..1964c3d12b362 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -25,6 +25,7 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/Twine.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/CFG.h"
@@ -218,9 +219,9 @@ VPTransformState::VPTransformState(ElementCount VF, unsigned UF, LoopInfo *LI,
                                    DominatorTree *DT, IRBuilderBase &Builder,
                                    InnerLoopVectorizer *ILV, VPlan *Plan,
                                    LLVMContext &Ctx)
-    : VF(VF), UF(UF), LI(LI), DT(DT), Builder(Builder), ILV(ILV), Plan(Plan),
-      LVer(nullptr),
-      TypeAnalysis(Plan->getCanonicalIV()->getScalarType(), Ctx) {}
+    : VF(VF), UF(UF), LI(LI), Builder(Builder), ILV(ILV), Plan(Plan),
+      LVer(nullptr), TypeAnalysis(Plan->getCanonicalIV()->getScalarType(), Ctx),
+      DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy) {}
 
 Value *VPTransformState::get(VPValue *Def, const VPIteration &Instance) {
   if (Def->isLiveIn())
@@ -399,8 +400,8 @@ void VPTransformState::packScalarIntoVectorValue(VPValue *Def,
   set(Def, VectorValue, Instance.Part);
 }
 
-BasicBlock *
-VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
+BasicBlock *VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG,
+                                                DomTreeUpdater &DTU) {
   // BB stands for IR BasicBlocks. VPBB stands for VPlan VPBasicBlocks.
   // Pred stands for Predessor. Prev stands for Previous - last visited/created.
   BasicBlock *PrevBB = CFG.PrevBB;
@@ -436,6 +437,7 @@ VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
              "Trying to reset an existing successor block.");
       TermBr->setSuccessor(idx, NewBB);
     }
+    DTU.applyUpdates({{DominatorTree::Insert, PredBB, NewBB}});
   }
   return NewBB;
 }
@@ -467,6 +469,7 @@ void VPBasicBlock::execute(VPTransformState *State) {
     // The Exit block of a loop is always set to be successor 0 of the Exiting
     // block.
     cast<BranchInst>(ExitingBB->getTerminator())->setSuccessor(0, NewBB);
+    State->DTU.applyUpdates({{DominatorTree::Insert, ExitingBB, NewBB}});
   } else if (PrevVPBB && /* A */
              !((SingleHPred = getSingleHierarchicalPredecessor()) &&
                SingleHPred->getExitingBasicBlock() == PrevVPBB &&
@@ -483,7 +486,7 @@ void VPBasicBlock::execute(VPTransformState *State) {
     //    is the exiting VPBB of this region from a previous instance, or the
     //    predecessor of this region.
 
-    NewBB = createEmptyBasicBlock(State->CFG);
+    NewBB = createEmptyBasicBlock(State->CFG, State->DTU);
     State->Builder.SetInsertPoint(NewBB);
     // Temporarily terminate with unreachable until CFG is rewired.
     UnreachableInst *Terminator = State->Builder.CreateUnreachable();
@@ -829,6 +832,9 @@ void VPlan::execute(VPTransformState *State) {
   BasicBlock *VectorPreHeader = State->CFG.PrevBB;
   State->Builder.SetInsertPoint(VectorPreHeader->getTerminator());
 
+  State->DTU.applyUpdates(
+      {{DominatorTree::Delete, VectorPreHeader, State->CFG.ExitBB}});
+
   // Generate code in the loop pre-header and body.
   for (VPBlockBase *Block : vp_depth_first_shallow(Entry))
     Block->execute(State);
@@ -891,13 +897,9 @@ void VPlan::execute(VPTransformState *State) {
     }
   }
 
-  // We do not attempt to preserve DT for outer loop vectorization currently.
-  if (!EnableVPlanNativePath) {
-    BasicBlock *VectorHeaderBB = State->CFG.VPBB2IRBB[Header];
-    State->DT->addNewBlock(VectorHeaderBB, VectorPreHeader);
-    updateDominatorTree(State->DT, VectorHeaderBB, VectorLatchBB,
-                        State->CFG.ExitBB);
-  }
+  State->DTU.flush();
+  assert(
+      State->DTU.getDomTree().verify(DominatorTree::VerificationLevel::Fast));
 }
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
@@ -995,44 +997,6 @@ void VPlan::addLiveOut(PHINode *PN, VPValue *V) {
   LiveOuts.insert({PN, new VPLiveOut(PN, V)});
 }
 
-void VPlan::updateDominatorTree(DominatorTree *DT, BasicBlock *LoopHeaderBB,
-                                BasicBlock *LoopLatchBB,
-                                BasicBlock *LoopExitBB) {
-  // The vector body may be more than a single basic-block by this point.
-  // Update the dominator tree information inside the vector body by propagating
-  // it from header to latch, expecting only triangular control-flow, if any.
-  BasicBlock *PostDomSucc = nullptr;
-  for (auto *BB = LoopHeaderBB; BB != LoopLatchBB; BB = PostDomSucc) {
-    // Get the list of successors of this block.
-    std::vector<BasicBlock *> Succs(succ_begin(BB), succ_end(BB));
-    assert(Succs.size() <= 2 &&
-           "Basic block in vector loop has more than 2 successors.");
-    PostDomSucc = Succs[0];
-    if (Succs.size() == 1) {
-      assert(PostDomSucc->getSinglePredecessor() &&
-             "PostDom successor has more than one predecessor.");
-      DT->addNewBlock(PostDomSucc, BB);
-      continue;
-    }
-    BasicBlock *InterimSucc = Succs[1];
-    if (PostDomSucc->getSingleSuccessor() == InterimSucc) {
-      PostDomSucc = Succs[1];
-      InterimSucc = Succs[0];
-    }
-    assert(InterimSucc->getSingleSuccessor() == PostDomSucc &&
-           "One successor of a basic block does not lead to the other.");
-    assert(InterimSucc->getSinglePredecessor() &&
-           "Interim successor has more than one predecessor.");
-    assert(PostDomSucc->hasNPredecessors(2) &&
-           "PostDom successor has more than two predecessors.");
-    DT->addNewBlock(InterimSucc, BB);
-    DT->addNewBlock(PostDomSucc, BB);
-  }
-  // Latch block is a new dominator for the loop exit.
-  DT->changeImmediateDominator(LoopExitBB, LoopLatchBB);
-  assert(DT->verify(DominatorTree::VerificationLevel::Fast));
-}
-
 static void remapOperands(VPBlockBase *Entry, VPBlockBase *NewEntry,
                           DenseMap<VPValue *, VPValue *> &Old2NewVPValues) {
   // Update the operands of all cloned recipes starting at NewEntry. This
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 4b3cb15b5e1e6..478cafb5a62e6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -35,6 +35,7 @@
 #include "llvm/ADT/Twine.h"
 #include "llvm/ADT/ilist.h"
 #include "llvm/ADT/ilist_node.h"
+#include "llvm/Analysis/DomTreeUpdater.h"
 #include "llvm/Analysis/IVDescriptors.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/VectorUtils.h"
@@ -382,9 +383,6 @@ struct VPTransformState {
   /// Hold a pointer to LoopInfo to register new basic blocks in the loop.
   LoopInfo *LI;
 
-  /// Hold a pointer to Dominator Tree to register new basic blocks in the loop.
-  DominatorTree *DT;
-
   /// Hold a reference to the IRBuilder used to generate output IR code.
   IRBuilderBase &Builder;
 
@@ -410,6 +408,9 @@ struct VPTransformState {
 
   /// VPlan-based type analysis.
   VPTypeAnalysis TypeAnalysis;
+
+  /// Updater for the DominatorTree.
+  DomTreeUpdater DTU;
 };
 
 /// VPBlockBase is the building block of the Hierarchical Control-Flow Graph.
@@ -2946,7 +2947,8 @@ class VPBasicBlock : public VPBlockBase {
 private:
   /// Create an IR BasicBlock to hold the output instructions generated by this
   /// VPBasicBlock, and return it. Update the CFGState accordingly.
-  BasicBlock *createEmptyBasicBlock(VPTransformState::CFGState &CFG);
+  BasicBlock *createEmptyBasicBlock(VPTransformState::CFGState &CFG,
+                                    DomTreeUpdater &DTU);
 };
 
 /// VPRegionBlock represents a collection of VPBasicBlocks and VPRegionBlocks
@@ -3289,13 +3291,6 @@ class VPlan {
   /// Clone the current VPlan, update all VPValues of the new VPlan and cloned
   /// recipes to refer to the clones, and return it.
   VPlan *duplicate();
-
-private:
-  /// Add to the given dominator tree the header block and every new basic block
-  /// that was created between it and the latch block, inclusive.
-  static void updateDominatorTree(DominatorTree *DT, BasicBlock *LoopHeaderBB,
-                                  BasicBlock *LoopLatchBB,
-                                  BasicBlock *LoopExitBB);
 };
 
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)

Use DTU to queue DominatorTree updates directly when connecting basic
blocks during VPlan execution.

This simplifies DT updates and also automatically allows updating the DT
for the VPlan-native path as additional benefit.
Copy link
Member

@rengolin rengolin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised by the simplicity of this patch. Not that it's wrong, just that it sounds simpler than the previous code that was meant to avoid complexity. :)

IIUC, instead of doing the full analysis at the end (in updateDomintorTree) you do it at the BB creation time (BB::execute) and wrapping time (VP::execute). That avoids the need for a whole update in the end, which is great.

In that sense, this looks really good to me, but I'm not an expert in DT updater, so I'll let others more comfortable in the area to approve.

fhahn added a commit to fhahn/llvm-project that referenced this pull request May 18, 2024
This patch moves branch condition creation to enter the scalar epilogue
loop to VPlan. Modeling the branch in the middle block also requires
modeling the successor blocks. To do so, this patch introduces a new
VPBlockBase sub-type that simply wraps an existing IR block:
VPIRWrapperBlock (name subject to better suggestions).

This allows allows connecting blocks naturally to leave nodes. It can
also be used to transition modeling of more bits of the skeleton to
VPlan gradually.

Note that the middle.block is still created as part of the skeleton and
then patched in during VPlan execution. Unfortunately the skeleton needs
to create the middle.block early on, as it is also used for induction
resume value creation and is also needed to properly update the
dominator tree during skeleton creation.

After this patch lands, I plan to move induction resume value and phi
node creation in the scalar preheader to VPlan. Once that is done, we
should be able to create the middle.block in VPlan directly.

At the moment, VPIRWrapperBlocks only wrap an original IR block and
don't allow any additions. We could allow IR wrapper blocks to also
contain recipes that get added at the beginning of the block. Then the
middle.block could also be wrapped. Something like that will also be
needed to place the induction/reduction resume phi nodes in the scalar
preheader.

This is a re-worked version based on the earlier
https://reviews.llvm.org/D150398 and the main change is the introduction
and use of VPIRWrapperBlock.

Note that this patch adds and uses a new helper reorderIncomingBlocks to
preserve the original order of incoming blocks in created phi nodes; as
this patch changes the order branches are created, the order of the
predecessors changes, which would cause different order of incoming
blocks in some phis we create. The helper ensures the created IR doesn't
change (modulo some minor difference in name numbering). After this
change landed, the tests should be updated and the helper removed
separately.

Depends on llvm#92525
@fhahn
Copy link
Contributor Author

fhahn commented May 22, 2024

I'm surprised by the simplicity of this patch. Not that it's wrong, just that it sounds simpler than the previous code that was meant to avoid complexity. :)

IIUC, instead of doing the full analysis at the end (in updateDomintorTree) you do it at the BB creation time (BB::execute) and wrapping time (VP::execute). That avoids the need for a whole update in the end, which is great.

Yes, that's the nice thing about the updater, all you need is to tell it which changes to the CFG has been made, it will efficiently apply those updates to the dominator tree. It's heavily used across various parts of LLVM, including common helpers like SplitBlockImpl (https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp#L1056) and many others in that file and elsewhere

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanup!

Worth separating into two NFC's: one simplifying how legacy updates its DT, and another which teaches native to start update its DT?

Worth noting that parts of DT external to VPlan's scope are still updated directly (i.e., skeletal parts).

BasicBlock *
VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
BasicBlock *VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG,
DomTreeUpdater &DTU) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should DTU reside inside CFGState, now that updating DT is integrated with modifying the CFG?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thanks!

@@ -410,6 +408,9 @@ struct VPTransformState {

/// VPlan-based type analysis.
VPTypeAnalysis TypeAnalysis;

/// Updater for the DominatorTree.
DomTreeUpdater DTU;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: place DTU in DT's position (after LI), or inside CFG?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now inside CFG

@@ -2946,7 +2947,8 @@ class VPBasicBlock : public VPBlockBase {
private:
/// Create an IR BasicBlock to hold the output instructions generated by this
/// VPBasicBlock, and return it. Update the CFGState accordingly.
BasicBlock *createEmptyBasicBlock(VPTransformState::CFGState &CFG);
BasicBlock *createEmptyBasicBlock(VPTransformState::CFGState &CFG,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if DTU is part of CFGState, then "Update the CFGState accordingly" would also imply updating DT, which this method now does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

@@ -829,6 +832,9 @@ void VPlan::execute(VPTransformState *State) {
BasicBlock *VectorPreHeader = State->CFG.PrevBB;
State->Builder.SetInsertPoint(VectorPreHeader->getTerminator());

State->DTU.applyUpdates(
{{DominatorTree::Delete, VectorPreHeader, State->CFG.ExitBB}});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExitBB is (still, the single) successor of PrevBB=VectorPreHeader, should they first be disconnected before updating DTU with their edge's deletion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, updated to remove the edge before

Copy link
Contributor Author

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth separating into two NFC's: one simplifying how legacy updates its DT, and another which teaches native to start update its DT?

Updated to only pass DT to DTU in non-native path, which makes DTU updates a no-op in that case

BasicBlock *
VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG) {
BasicBlock *VPBasicBlock::createEmptyBasicBlock(VPTransformState::CFGState &CFG,
DomTreeUpdater &DTU) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved, thanks!

@@ -829,6 +832,9 @@ void VPlan::execute(VPTransformState *State) {
BasicBlock *VectorPreHeader = State->CFG.PrevBB;
State->Builder.SetInsertPoint(VectorPreHeader->getTerminator());

State->DTU.applyUpdates(
{{DominatorTree::Delete, VectorPreHeader, State->CFG.ExitBB}});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, updated to remove the edge before

@@ -410,6 +408,9 @@ struct VPTransformState {

/// VPlan-based type analysis.
VPTypeAnalysis TypeAnalysis;

/// Updater for the DominatorTree.
DomTreeUpdater DTU;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now inside CFG

@@ -2946,7 +2947,8 @@ class VPBasicBlock : public VPBlockBase {
private:
/// Create an IR BasicBlock to hold the output instructions generated by this
/// VPBasicBlock, and return it. Update the CFGState accordingly.
BasicBlock *createEmptyBasicBlock(VPTransformState::CFGState &CFG);
BasicBlock *createEmptyBasicBlock(VPTransformState::CFGState &CFG,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

Copy link
Collaborator

@ayalz ayalz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
Couple of last final nits, plus worth updating the summary that native is left for follow-up.

@@ -13,7 +13,7 @@
; }
; }
;
; RUN: opt -S -passes=loop-vectorize -enable-vplan-native-path -verify-loop-info < %s | FileCheck %s
; RUN: opt -S -passes=loop-vectorize -enable-vplan-native-path -verify-loop-info -verify-dom-info < %s | FileCheck %s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be added once DT is updated for native?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, thanks!

Comment on lines +903 to +904
assert(EnableVPlanNativePath || State->CFG.DTU.getDomTree().verify(
DominatorTree::VerificationLevel::Fast));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(EnableVPlanNativePath || State->CFG.DTU.getDomTree().verify(
DominatorTree::VerificationLevel::Fast));
// DT is currently updated for non-native path only.
assert(EnableVPlanNativePath || State->CFG.DTU.getDomTree().verify(
DominatorTree::VerificationLevel::Fast));

(retaining the comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added back, thanks!

@fhahn fhahn merged commit b2f65e8 into llvm:main May 24, 2024
5 of 7 checks passed
@fhahn fhahn deleted the vplan-dtu branch May 24, 2024 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants