
[LoopVectorize] Add support for vectorisation of more early exit loops #88385

Open: wants to merge 8 commits into main

Conversation

david-arm (Contributor) commented Apr 11, 2024

This patch adds support for vectorisation of a simple class of loops that typically involve searching for something, e.g.

  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return n;

or

  for (int i = 0; i < n; i++) {
    if (p1[i] != p2[i])
      return i;
  }
  return n;

In this initial commit we only vectorise loops that meet the following criteria:

  1. There are no stores in the loop.
  2. The loop must have only one early exit, like those shown in the examples above. I have referred to such exits as speculative early exits, to distinguish them from the existing support for early exits where the exit-not-taken count is known exactly at compile time.
  3. The early exit block dominates the latch block.
  4. There are no loads after the early exit block.
  5. The loop must not contain reductions or recurrences. I don't see anything fundamental blocking vectorisation of such loops, but I just haven't done the work to support them yet.
  6. We must be able to prove at compile-time that loops will not contain faulting loads.
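To make the shape of the transformation concrete, here is a hedged sketch of what the vectorised search amounts to, written as a Python simulation rather than compiler output (VF, find_index and the tail-loop split are assumptions of the sketch, not names from the patch):

```python
# Illustrative Python simulation of vectorising the first search loop above.
# VF and all names here are inventions of this sketch, not names from the patch.
VF = 4  # vectorisation factor (any power of 2)

def find_index(p, val, n):
    """Same semantics as the scalar loop: first i with p[i] == val, else n."""
    i = 0
    # Vector body: compare VF elements per iteration; the early exit is
    # taken as soon as any lane matches.
    while i + VF <= n:
        mask = [p[i + lane] == val for lane in range(VF)]
        if any(mask):
            # Recover the first lane that triggered the exit (the job done
            # by the cttz intrinsic in the real vectorised exit block).
            return i + mask.index(True)
        i += VF
    # Scalar tail for the remaining n % VF elements.
    while i < n:
        if p[i] == val:
            return i
        i += 1
    return n
```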

For point 6, once this patch lands I intend to follow up by supporting some limited cases of faulting loops, where we can version the loop based on pointer alignment. For example, it turns out that in the SPEC2017 benchmark (xalancbmk) there is a std::find loop that we can vectorise provided we add SCEV checks that the initial pointer is aligned to a multiple of the VF. In practice, the pointer is regularly aligned to at least 32/64 bytes, and since the VF is a power of 2, any vector load <= 32/64 bytes in size will, if it faults at all, fault on the first lane, following the same behaviour as the scalar loop. Given we already do such speculative versioning for loops with unknown strides, alignment-based versioning doesn't seem any worse, at least for loops with only one load.
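The alignment argument can be checked with a little arithmetic. This is only a sketch: the 4096-byte page size is an assumption for illustration, and the property that actually matters is that page boundaries are multiples of the load alignment.

```python
# Sketch of the alignment argument behind the proposed loop versioning.
# The 4096-byte page size is an assumed illustration.
PAGE = 4096

def crosses_page(addr, nbytes):
    """True if a load of nbytes at addr spans a page boundary."""
    return addr // PAGE != (addr + nbytes - 1) // PAGE

# A 64-byte-aligned load of at most 64 bytes always stays within one page,
# so it can only fault if its first byte faults, exactly like the scalar
# load of lane 0.
for base in range(0, 4 * PAGE, 64):
    for nbytes in (16, 32, 64):
        assert not crosses_page(base, nbytes)
```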

This patch makes use of the existing experimental_cttz_elts intrinsic, which is required in the vectorised early exit block to determine the first lane that triggered the exit. This intrinsic has generic lowering support, so it is guaranteed to work for all targets.
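As a rough model of the intrinsic's behaviour (a Python sketch, not the LLVM implementation):

```python
# Python model of the intrinsic's semantics: count the zero elements before
# the first set element of the mask. With ZeroIsPoison=true (as used by this
# patch), an all-false mask is never passed in, because the vectorised exit
# block only runs when at least one lane took the exit.
def cttz_elts(mask, zero_is_poison=True):
    for lane, bit in enumerate(mask):
        if bit:
            return lane
    assert not zero_is_poison, "all-zero mask with ZeroIsPoison is poison"
    return len(mask)
```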

Tests have been added here:

Transforms/LoopVectorize/AArch64/simple_early_exit.ll

llvmbot (Collaborator) commented Apr 11, 2024

@llvm/pr-subscribers-llvm-analysis
@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-llvm-ir

Author: David Sherwood (david-arm)



Patch is 226.95 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/88385.diff

18 Files Affected:

  • (modified) llvm/include/llvm/Analysis/LoopAccessAnalysis.h (+36)
  • (modified) llvm/include/llvm/Analysis/ScalarEvolution.h (+33-3)
  • (modified) llvm/include/llvm/IR/IRBuilder.h (+7)
  • (modified) llvm/include/llvm/Support/GenericLoopInfo.h (+4)
  • (modified) llvm/include/llvm/Support/GenericLoopInfoImpl.h (+10)
  • (modified) llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h (+8-1)
  • (modified) llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h (+18)
  • (modified) llvm/lib/Analysis/LoopAccessAnalysis.cpp (+180-9)
  • (modified) llvm/lib/Analysis/ScalarEvolution.cpp (+88-6)
  • (modified) llvm/lib/Transforms/Utils/ScalarEvolutionExpander.cpp (+2-2)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (+10)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+348-42)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+63-5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+71-7)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+38-11)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+3-1)
  • (added) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+2544)
  • (modified) llvm/test/Transforms/LoopVectorize/control-flow.ll (+1-1)
diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index e39c371b41ec5c..d79c53f490c927 100644
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -587,6 +587,9 @@ class LoopAccessInfo {
   /// not legal to insert them.
   bool hasConvergentOp() const { return HasConvergentOp; }
 
+  /// Return true if the loop may fault due to memory accesses.
+  bool mayFault() const { return LoopMayFault; }
+
   const RuntimePointerChecking *getRuntimePointerChecking() const {
     return PtrRtChecking.get();
   }
@@ -608,6 +611,24 @@ class LoopAccessInfo {
   unsigned getNumStores() const { return NumStores; }
   unsigned getNumLoads() const { return NumLoads;}
 
+  /// Returns the block that exits early from the loop, if there is one.
+  /// Otherwise returns nullptr.
+  BasicBlock *getSpeculativeEarlyExitingBlock() const {
+    return SpeculativeEarlyExitingBB;
+  }
+
+  /// Returns the successor of the block that exits early from the loop, if
+  /// there is one. Otherwise returns nullptr.
+  BasicBlock *getSpeculativeEarlyExitBlock() const {
+    return SpeculativeEarlyExitBB;
+  }
+
+  /// Returns all blocks with a countable exit, i.e. the exit-not-taken count
+  /// is known exactly at compile time.
+  const SmallVector<BasicBlock *, 4> &getCountableEarlyExitingBlocks() const {
+    return CountableEarlyExitBlocks;
+  }
+
   /// The diagnostics report generated for the analysis.  E.g. why we
   /// couldn't analyze the loop.
   const OptimizationRemarkAnalysis *getReport() const { return Report.get(); }
@@ -659,6 +680,10 @@ class LoopAccessInfo {
   /// pass.
   bool canAnalyzeLoop();
 
+  /// Returns true if this is a supported early exit loop that we can analyze
+  /// in this pass.
+  bool isAnalyzableEarlyExitLoop();
+
   /// Save the analysis remark.
   ///
   /// LAA does not directly emits the remarks.  Instead it stores it which the
@@ -696,6 +721,17 @@ class LoopAccessInfo {
   /// Cache the result of analyzeLoop.
   bool CanVecMem = false;
   bool HasConvergentOp = false;
+  bool LoopMayFault = false;
+
+  /// Keeps track of the early-exiting block, if present.
+  BasicBlock *SpeculativeEarlyExitingBB = nullptr;
+
+  /// Keeps track of the successor of the early-exiting block, if present.
+  BasicBlock *SpeculativeEarlyExitBB = nullptr;
+
+  /// Keeps track of all the early exits with known or countable exit-not-taken
+  /// counts.
+  SmallVector<BasicBlock *, 4> CountableEarlyExitBlocks;
 
   /// Indicator that there are non vectorizable stores to a uniform address.
   bool HasDependenceInvolvingLoopInvariantAddress = false;
diff --git a/llvm/include/llvm/Analysis/ScalarEvolution.h b/llvm/include/llvm/Analysis/ScalarEvolution.h
index 5828cc156cc785..562deab8b4159e 100644
--- a/llvm/include/llvm/Analysis/ScalarEvolution.h
+++ b/llvm/include/llvm/Analysis/ScalarEvolution.h
@@ -892,9 +892,13 @@ class ScalarEvolution {
   /// Similar to getBackedgeTakenCount, except it will add a set of
   /// SCEV predicates to Predicates that are required to be true in order for
   /// the answer to be correct. Predicates can be checked with run-time
-  /// checks and can be used to perform loop versioning.
-  const SCEV *getPredicatedBackedgeTakenCount(const Loop *L,
-                                              SmallVector<const SCEVPredicate *, 4> &Predicates);
+  /// checks and can be used to perform loop versioning. If \p Speculative is
+  /// true, this will attempt to return the speculative backedge count for loops
+  /// with early exits. However, this is only possible if we can formulate an
+  /// exact expression for the backedge count from the latch block.
+  const SCEV *getPredicatedBackedgeTakenCount(
+      const Loop *L, SmallVector<const SCEVPredicate *, 4> &Predicates,
+      bool Speculative = false);
 
   /// When successful, this returns a SCEVConstant that is greater than or equal
   /// to (i.e. a "conservative over-approximation") of the value returend by
@@ -912,6 +916,12 @@ class ScalarEvolution {
     return getBackedgeTakenCount(L, SymbolicMaximum);
   }
 
+  /// Return all the exiting blocks in with exact exit counts.
+  void getExactExitingBlocks(const Loop *L,
+                             SmallVector<BasicBlock *, 4> *Blocks) {
+    getBackedgeTakenInfo(L).getExactExitingBlocks(L, this, Blocks);
+  }
+
   /// Return true if the backedge taken count is either the value returned by
   /// getConstantMaxBackedgeTakenCount or zero.
   bool isBackedgeTakenCountMaxOrZero(const Loop *L);
@@ -1534,6 +1544,16 @@ class ScalarEvolution {
     const SCEV *getExact(const Loop *L, ScalarEvolution *SE,
                          SmallVector<const SCEVPredicate *, 4> *Predicates = nullptr) const;
 
+    /// Similar to the above, except we permit unknown exit counts from
+    /// non-latch exit blocks. Any such early exit blocks must dominate the
+    /// latch and so the returned expression represents the speculative, or
+    /// maximum possible, *backedge-taken* count of the loop. If there is no
+    /// exact exit count for the latch this function returns
+    /// SCEVCouldNotCompute.
+    const SCEV *getSpeculative(
+        const Loop *L, ScalarEvolution *SE,
+        SmallVector<const SCEVPredicate *, 4> *Predicates = nullptr) const;
+
     /// Return the number of times this loop exit may fall through to the back
     /// edge, or SCEVCouldNotCompute. The loop is guaranteed not to exit via
     /// this block before this number of iterations, but may exit via another
@@ -1541,6 +1561,10 @@ class ScalarEvolution {
     const SCEV *getExact(const BasicBlock *ExitingBlock,
                          ScalarEvolution *SE) const;
 
+    /// Return all the exiting blocks in with exact exit counts.
+    void getExactExitingBlocks(const Loop *L, ScalarEvolution *SE,
+                               SmallVector<BasicBlock *, 4> *Blocks) const;
+
     /// Get the constant max backedge taken count for the loop.
     const SCEV *getConstantMax(ScalarEvolution *SE) const;
 
@@ -2316,6 +2340,9 @@ class PredicatedScalarEvolution {
   /// Get the (predicated) backedge count for the analyzed loop.
   const SCEV *getBackedgeTakenCount();
 
+  /// Get the (predicated) speculative backedge count for the analyzed loop.
+  const SCEV *getSpeculativeBackedgeTakenCount();
+
   /// Adds a new predicate.
   void addPredicate(const SCEVPredicate &Pred);
 
@@ -2384,6 +2411,9 @@ class PredicatedScalarEvolution {
 
   /// The backedge taken count.
   const SCEV *BackedgeCount = nullptr;
+
+  /// The speculative backedge taken count.
+  const SCEV *SpeculativeBackedgeCount = nullptr;
 };
 
 template <> struct DenseMapInfo<ScalarEvolution::FoldID> {
diff --git a/llvm/include/llvm/IR/IRBuilder.h b/llvm/include/llvm/IR/IRBuilder.h
index f381273c46cfb8..81cf8a6f5d4793 100644
--- a/llvm/include/llvm/IR/IRBuilder.h
+++ b/llvm/include/llvm/IR/IRBuilder.h
@@ -2503,6 +2503,13 @@ class IRBuilderBase {
     return CreateShuffleVector(V, PoisonValue::get(V->getType()), Mask, Name);
   }
 
+  Value *CreateCountTrailingZeroElems(Type *ResTy, Value *Mask,
+                                      const Twine &Name = "") {
+    return CreateIntrinsic(
+        Intrinsic::experimental_cttz_elts, {ResTy, Mask->getType()},
+        {Mask, getInt1(/*ZeroIsPoison=*/true)}, nullptr, Name);
+  }
+
   Value *CreateExtractValue(Value *Agg, ArrayRef<unsigned> Idxs,
                             const Twine &Name = "") {
     if (auto *V = Folder.FoldExtractValue(Agg, Idxs))
diff --git a/llvm/include/llvm/Support/GenericLoopInfo.h b/llvm/include/llvm/Support/GenericLoopInfo.h
index d560ca648132c9..83cacf864089cc 100644
--- a/llvm/include/llvm/Support/GenericLoopInfo.h
+++ b/llvm/include/llvm/Support/GenericLoopInfo.h
@@ -294,6 +294,10 @@ template <class BlockT, class LoopT> class LoopBase {
   /// Otherwise return null.
   BlockT *getUniqueExitBlock() const;
 
+  /// Return the exit block for the latch if one exists. This function assumes
+  /// the loop has a latch.
+  BlockT *getLatchExitBlock() const;
+
   /// Return true if this loop does not have any exit blocks.
   bool hasNoExitBlocks() const;
 
diff --git a/llvm/include/llvm/Support/GenericLoopInfoImpl.h b/llvm/include/llvm/Support/GenericLoopInfoImpl.h
index 1e0d0ee446fc41..3beb3e538398ef 100644
--- a/llvm/include/llvm/Support/GenericLoopInfoImpl.h
+++ b/llvm/include/llvm/Support/GenericLoopInfoImpl.h
@@ -159,6 +159,16 @@ BlockT *LoopBase<BlockT, LoopT>::getUniqueExitBlock() const {
   return getExitBlockHelper(this, true).first;
 }
 
+template <class BlockT, class LoopT>
+BlockT *LoopBase<BlockT, LoopT>::getLatchExitBlock() const {
+  BlockT *Latch = getLoopLatch();
+  assert(Latch && "Latch block must exists");
+  for (BlockT *Successor : children<BlockT *>(Latch))
+    if (!contains(Successor))
+      return Successor;
+  return nullptr;
+}
+
 /// getExitEdges - Return all pairs of (_inside_block_,_outside_block_).
 template <class BlockT, class LoopT>
 void LoopBase<BlockT, LoopT>::getExitEdges(
diff --git a/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h b/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h
index 62c1e15a9a60e1..05850f864d042a 100644
--- a/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h
+++ b/llvm/include/llvm/Transforms/Utils/ScalarEvolutionExpander.h
@@ -124,6 +124,11 @@ class SCEVExpander : public SCEVVisitor<SCEVExpander, Value *> {
   /// "expanded" form.
   bool LSRMode;
 
+  /// If the loop has an early exit we may have to use the speculative backedge
+  /// count, since the normal backedge count function is unable to compute a
+  /// SCEV expression.
+  bool UseSpeculativeBackedgeCount;
+
   typedef IRBuilder<InstSimplifyFolder, IRBuilderCallbackInserter> BuilderType;
   BuilderType Builder;
 
@@ -176,10 +181,12 @@ class SCEVExpander : public SCEVVisitor<SCEVExpander, Value *> {
 public:
   /// Construct a SCEVExpander in "canonical" mode.
   explicit SCEVExpander(ScalarEvolution &se, const DataLayout &DL,
-                        const char *name, bool PreserveLCSSA = true)
+                        const char *name, bool PreserveLCSSA = true,
+                        bool UseSpeculativeBackedgeCount = false)
       : SE(se), DL(DL), IVName(name), PreserveLCSSA(PreserveLCSSA),
         IVIncInsertLoop(nullptr), IVIncInsertPos(nullptr), CanonicalMode(true),
         LSRMode(false),
+        UseSpeculativeBackedgeCount(UseSpeculativeBackedgeCount),
         Builder(se.getContext(), InstSimplifyFolder(DL),
                 IRBuilderCallbackInserter(
                     [this](Instruction *I) { rememberInstruction(I); })) {
diff --git a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
index a509ebf6a7e1b3..20a53abeb2e5cc 100644
--- a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+++ b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
@@ -374,6 +374,24 @@ class LoopVectorizationLegality {
     return LAI->getDepChecker().getMaxSafeVectorWidthInBits();
   }
 
+  /// Returns true if the loop has a early exit with a exact backedge
+  /// count that is speculative.
+  bool hasSpeculativeEarlyExit() const {
+    return LAI && LAI->getSpeculativeEarlyExitingBlock();
+  }
+
+  /// Returns the early exiting block in a loop with a speculative backedge
+  /// count.
+  BasicBlock *getSpeculativeEarlyExitingBlock() const {
+    return LAI->getSpeculativeEarlyExitingBlock();
+  }
+
+  /// Returns the destination of an early exiting block in a loop with a
+  /// speculative backedge count.
+  BasicBlock *getSpeculativeEarlyExitBlock() const {
+    return LAI->getSpeculativeEarlyExitBlock();
+  }
+
   /// Returns true if vector representation of the instruction \p I
   /// requires mask.
   bool isMaskRequired(const Instruction *I) const {
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index 3bfc9700a14559..32e5816644310a 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -730,6 +730,9 @@ class AccessAnalysis {
     return UnderlyingObjects;
   }
 
+  /// Returns true if we cannot prove the loop will not fault.
+  bool mayFault();
+
 private:
   typedef MapVector<MemAccessInfo, SmallSetVector<Type *, 1>> PtrAccessMap;
 
@@ -1281,6 +1284,63 @@ bool AccessAnalysis::canCheckPtrAtRT(RuntimePointerChecking &RtCheck,
   return CanDoRTIfNeeded;
 }
 
+bool AccessAnalysis::mayFault() {
+  auto &DL = TheLoop->getHeader()->getModule()->getDataLayout();
+  for (auto &UO : UnderlyingObjects) {
+    // TODO: For now if we encounter more than one underlying object we just
+    // assume it could fault. However, with more analysis it's possible to look
+    // at all of them and calculate a common range of permitted GEP indices.
+    if (UO.second.size() != 1)
+      return true;
+
+    // For now only the simplest cases are permitted, but this could be
+    // extended further.
+    auto *GEP = dyn_cast<GetElementPtrInst>(UO.first);
+    if (!GEP || GEP->getPointerOperand() != UO.second[0] ||
+        GEP->getNumIndices() != 1)
+      return true;
+
+    // Verify pointer accessed within the loop always falls within the bounds
+    // of the underlying object, but first it's necessary to determine the
+    // object size.
+
+    auto GetKnownObjSize = [&](const Value *Obj) -> uint64_t {
+      // TODO: We should also be able to support global variables too.
+      if (auto *AllocaObj = dyn_cast<AllocaInst>(Obj)) {
+        if (TheLoop->isLoopInvariant(AllocaObj))
+          if (std::optional<TypeSize> AllocaSize =
+                  AllocaObj->getAllocationSize(DL))
+            return !AllocaSize->isScalable() ? AllocaSize->getFixedValue() : 0;
+      } else if (auto *ArgObj = dyn_cast<Argument>(Obj))
+        return ArgObj->getDereferenceableBytes();
+      return 0;
+    };
+
+    uint64_t ObjSize = GetKnownObjSize(UO.second[0]);
+    if (!ObjSize)
+      return true;
+
+    Value *GEPInd = GEP->getOperand(1);
+    const SCEV *IndScev = PSE.getSCEV(GEPInd);
+    if (!isa<SCEVAddRecExpr>(IndScev))
+      return true;
+
+    // Calculate the maximum number of addressable elements in the object.
+    uint64_t ElemSize = GEP->getSourceElementType()->getScalarSizeInBits() / 8;
+    uint64_t MaxNumElems = ObjSize / ElemSize;
+
+    const SCEV *MinScev = PSE.getSE()->getConstant(GEPInd->getType(), 0);
+    const SCEV *MaxScev =
+        PSE.getSE()->getConstant(GEPInd->getType(), MaxNumElems);
+    if (!PSE.getSE()->isKnownOnEveryIteration(
+            ICmpInst::ICMP_SGE, cast<SCEVAddRecExpr>(IndScev), MinScev) ||
+        !PSE.getSE()->isKnownOnEveryIteration(
+            ICmpInst::ICMP_SLT, cast<SCEVAddRecExpr>(IndScev), MaxScev))
+      return true;
+  }
+  return false;
+}
+
 void AccessAnalysis::processMemAccesses() {
   // We process the set twice: first we process read-write pointers, last we
   // process read-only pointers. This allows us to skip dependence tests for
@@ -2292,6 +2352,73 @@ void MemoryDepChecker::Dependence::print(
   OS.indent(Depth + 2) << *Instrs[Destination] << "\n";
 }
 
+bool LoopAccessInfo::isAnalyzableEarlyExitLoop() {
+  // At least one of the exiting blocks must be the latch.
+  BasicBlock *LatchBB = TheLoop->getLoopLatch();
+  if (!LatchBB)
+    return false;
+
+  SmallVector<BasicBlock *, 8> ExitingBlocks;
+  TheLoop->getExitingBlocks(ExitingBlocks);
+
+  // This is definitely not an early exit loop.
+  if (ExitingBlocks.size() < 2)
+    return false;
+
+  SmallVector<BasicBlock *, 4> ExactExitingBlocks;
+  PSE->getSE()->getExactExitingBlocks(TheLoop, &ExactExitingBlocks);
+
+  // We only support one speculative early exit.
+  if ((ExitingBlocks.size() - ExactExitingBlocks.size()) > 1)
+    return false;
+
+  // There could be multiple exiting blocks with an exact exit-not-taken
+  // count. Find the speculative early exit block, i.e. the one with an
+  // unknown count.
+  BasicBlock *TmpBB = nullptr;
+  for (BasicBlock *BB1 : ExitingBlocks) {
+    bool Found = false;
+    for (BasicBlock *BB2 : ExactExitingBlocks)
+      if (BB1 == BB2) {
+        Found = true;
+        break;
+      }
+    if (!Found) {
+      TmpBB = BB1;
+      break;
+    }
+  }
+  assert(TmpBB && "Expected to find speculative early exiting block");
+
+  // For now, let's keep things simple by ensuring the latch block only has
+  // the exiting block as a predecessor.
+  BasicBlock *LatchPredBB = LatchBB->getUniquePredecessor();
+  if (!LatchPredBB || LatchPredBB != TmpBB)
+    return false;
+
+  LLVM_DEBUG(
+      dbgs()
+      << "LAA: Found an early exit. Retrying with speculative exit count.\n");
+  const SCEV *SpecExitCount = PSE->getSpeculativeBackedgeTakenCount();
+  if (isa<SCEVCouldNotCompute>(SpecExitCount))
+    return false;
+
+  LLVM_DEBUG(dbgs() << "LAA: Found speculative backedge taken count: "
+                    << *SpecExitCount << '\n');
+  SpeculativeEarlyExitingBB = TmpBB;
+
+  for (BasicBlock *BB : successors(SpeculativeEarlyExitingBB))
+    if (BB != LatchBB) {
+      SpeculativeEarlyExitBB = BB;
+      break;
+    }
+  assert(SpeculativeEarlyExitBB &&
+         "Expected to find speculative early exit block");
+  CountableEarlyExitBlocks = std::move(ExactExitingBlocks);
+
+  return true;
+}
+
 bool LoopAccessInfo::canAnalyzeLoop() {
   // We need to have a loop header.
   LLVM_DEBUG(dbgs() << "LAA: Found a loop in "
@@ -2317,10 +2444,12 @@ bool LoopAccessInfo::canAnalyzeLoop() {
   // ScalarEvolution needs to be able to find the exit count.
   const SCEV *ExitCount = PSE->getBackedgeTakenCount();
   if (isa<SCEVCouldNotCompute>(ExitCount)) {
-    recordAnalysis("CantComputeNumberOfIterations")
-        << "could not determine number of loop iterations";
     LLVM_DEBUG(dbgs() << "LAA: SCEV could not compute the loop exit count.\n");
-    return false;
+    if (!isAnalyzableEarlyExitLoop()) {
+      recordAnalysis("CantComputeNumberOfIterations")
+          << "could not determine number of loop iterations";
+      return false;
+    }
   }
 
   return true;
@@ -2352,6 +2481,9 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
       EnableMemAccessVersioning &&
       !TheLoop->getHeader()->getParent()->hasOptSize();
 
+  BasicBlock *LatchBB = TheLoop->getLoopLatch();
+  bool HasComplexWorkInEarlyExitLoop = false;
+
   // Traverse blocks in fixed RPOT order, regardless of their storage in the
   // loop info, as it may be arbitrary.
   LoopBlocksRPO RPOT(TheLoop);
@@ -2367,7 +2499,8 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
 
       // With both a non-vectorizable memory instruction and a convergent
       // operation, found in this loop, no reason to continue the search.
-      if (HasComplexMemInst && HasConvergentOp) {
+      if ((HasComplexMemInst && HasConvergentOp) ||
+          HasComplexWorkInEarlyExitLoop) {
         CanVecMem = false;
         return;
       }
@@ -2385,6 +2518,14 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
       // vectorize a loop if it contains known function calls that don't set
       // the flag. Therefore, it is safe to ignore this read from memory.
       auto *Call = dyn_cast<CallInst>(&I);
+      if (Call && SpeculativeEarlyExitingBB) {
+        recordAnalysis("CantVectorizeInstruction", Call)
+            << "cannot vectorize calls in early exit loop";
+        LLVM_DEBUG(dbgs() << "LAA: Found a call in early exit loop.\n");
+        HasComplexWorkInEarlyExitLoop = true;
+        continue;
+      }
+
       if (Call && getVectorIntrinsicIDForCall(Call, TLI))
         continue;
 
@@ -2412,6 +2553,13 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
           HasComplexMemInst = true;
           continue;
         }
+        if (SpeculativeEarlyExitingBB && BB == LatchBB) {
+          recordAnalysis("CantVectorizeInstruction", Call)
+              << "cannot vectorize loads after early exit block";
+          LLVM_DEBUG(dbgs() << "LAA: Found a load after early exit.\n");
+          HasComplexWorkInEarlyExitLoop = true;
+          continue;
+        }
         NumLoads++;
         Loads.push_back(Ld);
         DepChecker->addAccess(Ld);
@@ -2423,6 +2571,13 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
       // Save 'store' instructions. Abort if othe...
[truncated]

@@ -10260,7 +10562,11 @@ bool LoopVectorizePass::processLoop(Loop *L) {
Hints.setAlreadyVectorized();
}

assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));
// assert(!verifyFunction(*L->getHeader()->getParent(), &dbgs()));
Contributor Author (david-arm):

My apologies - I just realised this debug code is still present. I'll fix asap!

(Resolved review threads: llvm/lib/Analysis/LoopAccessAnalysis.cpp, llvm/include/llvm/Support/GenericLoopInfo.h)
@@ -2423,6 +2571,13 @@ void LoopAccessInfo::analyzeLoop(AAResults *AA, LoopInfo *LI,
// Save 'store' instructions. Abort if other instructions write to memory.
if (I.mayWriteToMemory()) {
auto *St = dyn_cast<StoreInst>(&I);
if (SpeculativeEarlyExitingBB) {
recordAnalysis("CantVectorizeInstruction", St)
Collaborator:

What happens if St is null here?

Contributor Author (david-arm):

There is a similar problem in the code below too - recordAnalysis simply uses the debug information for the loop instead, but it won't crash. However, I think it makes sense to record the instruction using I instead and I'll update the message to show that it might not be a store.

(Resolved review threads: llvm/lib/Analysis/LoopAccessAnalysis.cpp, llvm/lib/Analysis/ScalarEvolution.cpp, llvm/include/llvm/Analysis/LoopAccessAnalysis.h, llvm/lib/Transforms/Vectorize/LoopVectorize.cpp, llvm/lib/Transforms/Vectorize/VPlan.h)
// a later poison exit count should not propagate into the result. This are
// exactly the semantics provided by umin_seq.
return SE->getUMinFromMismatchedTypes(Ops, /* Sequential */ true);
}
Contributor:

How does this "speculative" BECount differ from the SymbolicMax BECount?

Contributor Author (david-arm):

Hmm, it does seem like it does the same thing, except there is no predicated version that accepts a vector of SCEVPredicate pointers, which is required for getPredicatedBackedgeTakenCount. I can try adding a predicated version of getSymbolicMax to see if that works.

Contributor Author (david-arm):

The other major difference between getSpeculative and getSymbolicMax is the former requires the exact-not-taken count for the latch to be known, whereas the latter doesn't care. So I think in order to use something close to the existing getSymbolicMax interface I'll need to do two things:

  1. Rewrite getSymbolicMax (or add an overloaded interface) so that it's a const interface (allowing it to be called from getPredicatedBackedgeTakenCount). Also, add a SmallVector<const SCEVPredicate *, 4> *Predicates argument.
  2. Add code to getPredicatedBackedgeTakenCount to explicitly check that we have an exact-not-taken count for the latch.

I'm happy to do this of course - just pointing out that getSymbolicMax isn't a drop-in replacement, that's all. I'll try it out and see if I get the same behaviour as before.

Contributor Author (david-arm):

I've tried posting a new commit that teaches getPredicatedBackedgeTakenCount to use a version of getSymbolicMax that accepts predicates, provided we have an exact count for the latch. Hopefully this makes better reuse of the code.

bool LoopMayFault = false;

/// Keeps track of the early-exiting block, if present.
BasicBlock *SpeculativeEarlyExitingBB = nullptr;
Contributor:

Maybe a better name is just EarlyExitingBB? At least to me, 'Speculative' implies that there is speculation on memory, i.e. it refers to blocks with may-fault accesses.

Contributor Author (david-arm):

In a sense though that is not far off the truth, because when vectorising the loop we are by definition reading ahead in memory which could potentially cause a fault where the scalar loop would not. However, the main reason I added the word 'Speculative' was to distinguish between early exits with exact exit-not-taken counts (which the vectoriser does support) and early exits that cannot be counted.

I'd prefer not to call it EarlyExitingBB to avoid any possible confusion, but I'm happy to take suggestions for alternative names that are better. Perhaps UncountableEarlyExitingBB?

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
auto *UI = cast<Instruction>(U);
if (!L->contains(UI)) {
  PHINode *PHI = dyn_cast<PHINode>(UI);
  assert(PHI && "Expected LCSSA form");
Collaborator

Nit: checking LCSSA form could be hoisted and checked earlier, just once?

Contributor Author

I can't hoist this assert out, since it's based upon the User U which varies in each loop iteration.

david-arm added a commit to david-arm/llvm-project that referenced this pull request May 1, 2024
In PR llvm#88385 I've added support for auto-vectorisation of some
early exit loops, which requires using the experimental.cttz.elts
to calculate final indices in the early exit block. We need a
more accurate cost model for this intrinsic to better reflect the
cost of work required in the early exit block. I've tried to
accurately represent the expansion code for the intrinsic when
the target does not have efficient lowering for it. It's quite
tricky to model because you need to first figure out what types
will actually be used in the expansion. The type used can have
a significant effect on the cost if you end up using illegal
vector types.

Tests added here:

  Analysis/CostModel/AArch64/cttz_elts.ll
  Analysis/CostModel/RISCV/cttz_elts.ll
@sjoerdmeijer
Collaborator

I was wondering if it would be good to add some AArch64 codegen tests too so that we can look at some codegen?

@david-arm
Contributor Author

david-arm commented May 1, 2024

> I was wondering if it would be good to add some AArch64 codegen tests too so that we can look at some codegen?

If you're referring to the codegen coming out of clang after vectorising the loop, I don't think we typically have tests like that in test/Transform/LoopVectorize. They are normally IR/opt based tests. Are you referring specifically to the codegen from the cttz.elts intrinsic? If so, we already have tests for them - see CodeGen/AArch64/intrinsic-cttz-elts-sve.ll, for example.

@sjoerdmeijer
Collaborator

> I was wondering if it would be good to add some AArch64 codegen tests too so that we can look at some codegen?
>
> If you're referring to the codegen coming out of clang after vectorising the loop, I don't think we typically have tests like that in test/Transform/LoopVectorize. They are normally IR/opt based tests. Are you referring specifically to the codegen from the cttz.elts intrinsic? If so, we already have tests for them - see CodeGen/AArch64/intrinsic-cttz-elts-sve.ll, for example.

Yes, I appreciate we test all things individually, but I was just thinking that it is a bit of a shame we can't look at some codegen for a loop for all of this work. For example, take the resulting IR of some of the tests in test/Transform/LoopVectorize/AArch64, and create llc tests. Not sure if there's precedent for that, I guess not.

@fhahn
Contributor

fhahn commented May 7, 2024

> Yes, I appreciate we test all things individually, but I was just thinking that it is a bit of shame we can't look at some codegen for a loop for all of this work. For example, take the resulting IR of some of the tests in test/Transform/LoopVectorize/AArch64, and create llc tests. Not sure if there's precedent for that, I guess not.

It would probably make sense to have some micro-benchmarks for some loops with varying trip counts (both statically known and unknown) to cover the end-to-end flow and allow for easy evaluation. Sharing the generated assembly end-to-end for some of those might help, as @sjoerdmeijer suggested?

(I don't think we should add end-to-end tests to llvm-project/llvm/tests/ directly that run the vectorizer (and possibly other passes) all the way down to assembly)

Collaborator
@sjoerdmeijer sjoerdmeijer left a comment

This patch basically contains two parts: the LAA/SCEV and the vectorisation part.

I have only looked at the vectorisation part and that looks good to me:

  • thanks for taking the cost-model remarks into account, the added logic seems like a good first step,
  • the option to vectorise loops with early breaks is off by default. This allows us to experiment more with this, possibly refine the cost-model, without creating a lot of turbulence.
  • It's a shame we can't look at final codegen for these sort of patches, but that is not a problem of this patch. I like the idea of some microbenchmarks for this, but given that this is off by default I don't think that this needs to hold up this patch.

So, LGTM, but I haven't looked at the LAA part, perhaps @nikic or @nikolaypanchenko can sign off on that part.

Contributor
@fhahn fhahn left a comment

LV already supports vectorizing early-exit loops for arguably even simpler cases, i.e. where the exit counts for all exits can be computed (in that case, the scalar epilogue is run at least once to fix up) https://llvm.godbolt.org/z/35v3nzY7K. So it would probably be worth clarifying that in the commit title.

I added some initial comments/questions, but as this patch touches a lot of different things I wasn't yet able to complete a full look through. Hope to make it through all of it by early/mid next week.

    cl::desc("Enable vectorization of early exit loops."));

static cl::opt<bool> AssumeNoMemFault(
    "vectorizer-no-mem-fault", cl::init(false), cl::Hidden,
Contributor

Is the intention here to use this for writing unit tests or something else? For unit tests, there are multiple ways to tell LLVM that memory is dereferenceable for a range (with a constant number of elements), which should cover most cases when writing tests.

Contributor Author

The intention really was for general testing, i.e. running the LLVM test suite, SPEC2017, etc. and making sure that tests/benchmarks are behaving correctly. With the flag enabled, over 2000 loops in the LLVM test suite suddenly start vectorising. Well-behaved programs shouldn't fault if they've specified a trip count, but obviously it's completely unsafe to enable this in general. It was for convenience really - I could remove the flag, but it just means that every time we want to test auto-vectorisation with wider coverage we have to hack the vectoriser manually.

;
entry:
  %p1 = alloca [1024 x i8]
  %p2 = alloca [1024 x i8]
Contributor

At the moment, the tests read from uninitialized memory, which means automatic verification won't work. It would probably also make them more robust to avoid reading from uninitialized memory (e.g. by passing the pointers to a call first)

Contributor Author

Yeah sure I'll try to come up with a different title. I am aware the vectoriser already supports some simpler early exit cases, which I've made an effort to address in the patch in terms of functionality and terminology. In fact, this patch permits multiple early exits in a loop, provided only one of them has an unknown exit-not-taken count. In such cases a scalar epilogue is still required. Essentially we have to distinguish between loops with countable early exits and non-countable ones.
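The countable/uncountable distinction can be illustrated with two small C loops (the function names and trip counts below are illustrative, not from the patch):

```c
#include <assert.h>

/* Countable early exit: the exit is taken on a fixed iteration
 * (i == 100), so the exit-not-taken count is computable at
 * compile time and the existing LV support can handle it. */
static int countable(const int *p, int n) {
  for (int i = 0; i < n; i++) {
    if (i == 100)
      return p[i];
  }
  return -1;
}

/* Uncountable early exit: whether the exit is taken depends on the
 * loaded data, so no exact exit-not-taken count exists. This is what
 * the patch calls a "speculative" early exit. */
static int uncountable(const int *p, int n, int val) {
  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return -1;
}
```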

Contributor Author

Done!

david-arm added a commit that referenced this pull request May 9, 2024
In PR #88385 I've added support for auto-vectorisation of some early
exit loops, which requires using the experimental.cttz.elts to calculate
final indices in the early exit block. We need a more accurate cost
model for this intrinsic to better reflect the cost of work required in
the early exit block. I've tried to accurately represent the expansion
code for the intrinsic when the target does not have efficient lowering
for it. It's quite tricky to model because you need to first figure out
what types will actually be used in the expansion. The type used can
have a significant effect on the cost if you end up using illegal vector
types.

Tests added here:

  Analysis/CostModel/AArch64/cttz_elts.ll
  Analysis/CostModel/RISCV/cttz_elts.ll
This patch adds support for vectorisation of a simple class of loops
that typically involves searching for something, i.e.

  for (int i = 0; i < n; i++) {
    if (p[i] == val)
      return i;
  }
  return n;

or

  for (int i = 0; i < n; i++) {
    if (p1[i] != p2[i])
      return i;
  }
  return n;

In this initial commit we only vectorise loops with the following
criteria:

1. There are no stores in the loop.
2. The loop must have only one early exit like those shown in the
above example. I have referred to such exits as speculative early
exits, to distinguish from existing support for early exits where
the exit-not-taken count is known exactly at compile time.
3. The early exit block dominates the latch block.
4. There are no loads after the early exit block.
5. The loop must not contain reductions or recurrences. I don't
see anything fundamental blocking vectorisation of such loops, but
I just haven't done the work to support them yet.
6. We must be able to prove at compile-time that loops will not
contain faulting loads.

For point 6 once this patch lands I intend to follow up by supporting
some limited cases of faulting loops where we can version the loop
based on pointer alignment. For example, it turns out in the
SPEC2017 benchmark there is a std::find loop that we can vectorise
provided we add SCEV checks for the initial pointer being aligned
to a multiple of the VF. In practice, the pointer is regularly
aligned to at least 32/64 bytes and since the VF is a power of 2, any
vector loads <= 32/64 bytes in size will always fault on the first
lane, following the same behaviour as the scalar loop. Given we
already do such speculative versioning for loops with unknown strides,
alignment-based versioning doesn't seem to be any worse.
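The alignment reasoning can be sketched in plain C (a sketch under the assumptions stated above; `vector_loads_cannot_fault` and the `vf` parameter are illustrative names, not the patch's API): if the base pointer is aligned to VF bytes, each VF-byte vector load faults, if at all, on its first byte, which is the first byte the scalar loop would also touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative runtime check: with p aligned to vf bytes (vf a power
 * of 2), every aligned vf-byte vector load lies inside a single
 * vf-aligned block and cannot cross into a page the scalar loop would
 * never have accessed, so any fault happens on the first lane, just
 * as in the scalar loop. */
static bool vector_loads_cannot_fault(const void *p, uintptr_t vf) {
  assert((vf & (vf - 1)) == 0 && "VF must be a power of 2");
  return ((uintptr_t)p & (vf - 1)) == 0;
}
```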

This patch makes use of the existing experimental.cttz.elts intrinsic
that's required in the vectorised early exit block to determine the
first lane that triggered the exit. This intrinsic has generic
lowering support so it's guaranteed to work for all targets.
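What the intrinsic computes in the early exit block can be modelled in scalar C (a hypothetical model, not the real lowering; targets such as SVE lower it with a brkb/incp pair, as the assembly later in the thread shows):

```c
#include <assert.h>
#include <stdbool.h>

/* Scalar model of llvm.experimental.cttz.elts on an i1 vector: count
 * the false lanes before the first true lane, i.e. the index of the
 * first lane that triggered the early exit. With the zero-is-poison
 * flag clear, an all-false mask yields vf. */
static int cttz_elts(const bool *mask, int vf) {
  for (int i = 0; i < vf; i++)
    if (mask[i])
      return i;
  return vf;
}
```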

Tests have been added here:

  Transforms/LoopVectorize/AArch64/simple_early_exit.ll
* Renamed CountableEarlyExitBlocks -> CountableEarlyExitingBlocks
* Renamed getExactExitingBlocks -> getCountableExitingBlocks
* Updated comments in code.
* Improved analysis/debug message.
* Simplified code in isAnalyzableEarlyExitLoop and
BackedgeTakenInfo::getSpeculative.
* I've rewritten the loop variant of BackedgeTakenInfo::getSymbolicMax
to be more consistent with BackedgeTakenInfo::getExact so that it
now also accepts predicates.
* I've changed getPredicatedBackedgeTakenCount to use getSymbolicMax,
although we still require the latch block to have an exact
exit-not-taken count.
* Rename CountableEarlyExitingBlocks -> CountableExitingBlocks.
* Add another simple case to mayFault along with a supporting
test.
* Renamed areRuntimeChecksProfitable -> isOutsideLoopWorkProfitable
and added the cost of work in the early exit block.
* Added new flag to control early-exit vectorisation
(EnableEarlyExitVectorization), which is off by default until
we have a more accurate cost model for the cttz.elts intrinsic.
* Fix build warning in LoopAccessInfo::analyzeLoop.
* Reuse more code in fixupEarlyExitIVUsers and calculateEarlyExitCost
by using lambda functions.
* Updated code in collectLoopUniforms to take into account a
compare instruction used to take an early exit is not considered
uniform.
@david-arm
Contributor Author

I just did a simple rebase and fixed some failing tests by teaching collectLoopUniforms about early exits.

@david-arm
Contributor Author

@sjoerdmeijer @fhahn Here is an example of what happens with this patch when I build the following C code with -O3 -target aarch64-linux -mcpu=neoverse-v1 -S -mllvm -enable-early-exit-vectorization:

int foo(unsigned char *vec, unsigned char val) {
  unsigned char local_vec[128];

  for (int i = 0; i < 128; i++) {
    local_vec[i] = vec[i] + i;
  }

  // ACTUAL EARLY EXIT LOOP!
  unsigned char *p = &local_vec[0];
  for (int i = 0; i < 128; i++) {
    if (p[i] == val)
      return i;
  }
  return -1;
}

The code looks a bit contrived because this patch will only vectorise if it can prove the loads in the early-exit loop will not fault. Here is the assembly for the vector portion of the loop:

.LBB0_16:
        rdvl    x10, #15
        mov     x9, xzr    
        and     x0, x10, #0x80
        mov     z0.b, w1
        ptrue   p0.b
        mov     x10, sp
        .p2align        5, , 16
.LBB0_17:
        ld1b    { z1.b }, p0/z, [x10, x9]
        cmpeq   p1.b, p0/z, z1.b, z0.b
        b.ne    .LBB0_21
        add     x9, x9, x8
        cmp     x0, x9
        b.ne    .LBB0_17
        cbz     x0, .LBB0_12
        mov     w0, #-1
        add     sp, sp, #128
        ret
.LBB0_21:
        brkb    p0.b, p0/z, p1.b
        incp    x9, p0.b
        mov     x0, x9
.LBB0_22:
        add     sp, sp, #128
        ret

Thanks!

@david-arm
Contributor Author

I just realised this patch is missing a test for a reverse loop, which also works perfectly fine and generates the correct code. I'll put up a new patch today.

* Initialise all memory from allocas.
* Add test for a reverse loop.
@david-arm david-arm changed the title [LoopVectorize] Add support for vectorisation of simple early exit loops [LoopVectorize] Add support for vectorisation of more early exit loops May 10, 2024
@fhahn
Contributor

fhahn commented May 28, 2024

Starting to think about the code for the dependence/memory analysis, I was wondering if this could be generalized a bit more separately. AFAICT the dependence analysis and runtime check generation part of LAA does not really care if there are early exits, as long as we can compute the symbolic max, which should be safe to use instead of the exact BTC: #93499

With that, it seems like isAnalyzableEarlyExitLoop and mayFault would be better placed outside LAA?

@david-arm
Contributor Author

> Starting to think about the code for the dependence/memory analysis, I was wondering if this could be generalized a bit more separately. AFAICT the dependence analysis and runtime check generation part of LAA does not really care if there are early exits, as long as we can compute the symbolic max, which should be safe to use instead of the exact BTC: #93499
>
> With that, it seems like isAnalyzableEarlyExitLoop and mayFault would be better placed outside LAA?

I'm on holiday at the moment I'm afraid, but I'm happy to review your comments and look at any patches you have when I get back on Wednesday next week.

@david-arm
Contributor Author

> Starting to think about the code for the dependence/memory analysis, I was wondering if this could be generalized a bit more separately. AFAICT the dependence analysis and runtime check generation part of LAA does not really care if there are early exits, as long as we can compute the symbolic max, which should be safe to use instead of the exact BTC: #93499
>
> With that, it seems like isAnalyzableEarlyExitLoop and mayFault would be better placed outside LAA?

I agree with trying to break this patch down and making LoopAccessAnalysis as generic and untied to the vectoriser as possible, but I will definitely need to do work in LoopAccessAnalysis to provide the right information for users of it. For example, LoopAccessAnalysis does all the analysis of loads and stores in the loop, not LoopVectorizationLegality, where we now test for uncomputable backedge counts. We still need to determine whether or not the loop has an uncountable exit and, if so, whether there are any memory accesses after the exit, which is something more suited to LoopAccessAnalysis.

If we move all the analysis into LoopVectorizationLegality then we basically have to duplicate the same work of walking through all the instructions in all the blocks that already happens in canAnalyzeLoop. In this new world I will still need to calculate the earliest uncountable exiting block in LoopAccessAnalysis and track post-exit memory accesses so that LoopVectorizationLegality can query the data to make an informed choice. I'll look into doing that, but the main point is that we can't devolve all the work into LoopVectorizationLegality.

I may be able to move mayFault, but it does depend upon structures built up by LoopAccessAnalysis and I personally thought that this function might be useful outside of the vectoriser so it wasn't obvious where to move it to.
