[LV] Vectorization of compress idiom #83467

nikolaypanchenko · 2024-02-29T19:37:36Z

Monotonic value m is a scalar value that has special loop carried dependency, which can be described as

  m += step
  ... m ...

where

m is a scalar variable, *step is a loop-invariant variable,
the update is done under some non-uniform condition,
use(s) is(are) done under the same or nested condition(s)

Whether m is used in rhs or lhs defines which special vector code needs to be generated on a use-side.
If m is used in lhs, the pattern is known as compress as stored data needs to be compressed according to the mask before the store
If m is used in rhs, the pattern is known as expand/decompress as use data needs to be expanded according to the mask

The changeset adds new descriptor for monotonic values as define above and adds initial support to vectorize unit-strided compress store.
The changeset bails out in case if monotonic value requires expand or expandload or step of it is not 1. Future work is planed to support these scenarios.

llvmbot · 2024-02-29T19:38:07Z

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-llvm-transforms

Author: Kolya Panchenko (nikolaypanchenko)

Changes

Monotonic value m is a scalar value that has special loop carried dependency, which can be described as

  m += step
  ... m ...

where

m is a scalar variable, *step is a loop-invariant variable,
the update is done under some non-uniform condition,
use(s) is(are) done under the same or nested condition(s)

Whether m is used in rhs or lhs defines which special vector code needs to be generated on a use-side.
If m is used in lhs, the pattern is known as compress as stored data needs to be compressed according to the mask before the store
If m is used in rhs, the pattern is known as expand/decompress as use data needs to be expanded according to the mask

The changeset adds new descriptor for monotonic values as define above and adds initial support to vectorize unit-strided compress store.
The changeset bails out in case if monotonic value requires expand or expandload or step of it is not 1. Future work is planed to support these scenarios.

Patch is 84.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83467.diff

18 Files Affected:

(modified) llvm/include/llvm/Analysis/IVDescriptors.h (+49)
(modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+8)
(modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+2)
(modified) llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h (+58)
(modified) llvm/lib/Analysis/IVDescriptors.cpp (+124)
(modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+4)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+4)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+4)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (+88)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+93-6)
(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+1)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+108-2)
(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+8-8)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+68)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+62)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+4)
(modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+2)
(added) llvm/test/Transforms/LoopVectorize/RISCV/compress_expand.ll (+702)

diff --git a/llvm/include/llvm/Analysis/IVDescriptors.h b/llvm/include/llvm/Analysis/IVDescriptors.h
index 5c7b613ac48c40..877204a8b2d864 100644
--- a/llvm/include/llvm/Analysis/IVDescriptors.h
+++ b/llvm/include/llvm/Analysis/IVDescriptors.h
@@ -17,6 +17,7 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/ValueHandle.h"
+#include "llvm/ADT/SetVector.h"
 
 namespace llvm {
 
@@ -395,6 +396,54 @@ class InductionDescriptor {
   SmallVector<Instruction *, 2> RedundantCasts;
 };
 
+class MonotonicDescriptor {
+public:
+  /// This enum represents the kinds of monotonic that we support.
+  enum MonotonicKind {
+    MK_None,  ///< Not a monotonic variable.
+    MK_Integer, /// < Integer monotonic variable. Step = C
+    MK_Pointer, /// < Pointer monotonic variable. Step = C
+  };
+
+public:
+  MonotonicDescriptor() = default;
+
+  Value *getStartValue() const { return StartValue; }
+  MonotonicKind getKind() const { return MK; }
+  const SCEV *getStep() const { return Step; }
+  const Instruction *getUpdateOp() const { return UpdateOp; }
+  const SetVector<PHINode *> &getPhis() const { return Phis; }
+  bool isHeaderPhi(const PHINode *Phi) const {
+    return !Phis.empty() && Phis[0] == Phi;
+  }
+
+  /// Returns true if \p Phi forms monotonic pattern within a loop \p L.
+  static MonotonicDescriptor isMonotonicPHI(PHINode *Phi, const Loop *L,
+                                            PredicatedScalarEvolution &PSE);
+
+  operator bool() const { return MK != MK_None; }
+
+private:
+  /// Private constructor - used by \c isMonotonicPHI
+  MonotonicDescriptor(Value *Start, MonotonicKind K, const SCEV *Step,
+                      const Instruction *UpdateOp, SetVector<PHINode *> &Phis)
+      : StartValue(Start), MK(K), Step(Step), UpdateOp(UpdateOp),
+        Phis(Phis.begin(), Phis.end()) {}
+
+  /// Start value.
+  TrackingVH<Value> StartValue = nullptr;
+  /// Induction kind.
+  MonotonicKind MK = MK_None;
+  /// Step value.
+  const SCEV *Step = nullptr;
+  // Instruction that advances induction variable.
+  const Instruction *UpdateOp = nullptr;
+
+  /// All phis that are used to update the monotonic variable. It's expected
+  /// that the first PHINode is in the header BB
+  SetVector<PHINode *> Phis;
+};
+
 } // end namespace llvm
 
 #endif // LLVM_ANALYSIS_IVDESCRIPTORS_H
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 58577a6b6eb5c0..a7bdefe0d95708 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1701,6 +1701,9 @@ class TargetTransformInfo {
   bool hasActiveVectorLength(unsigned Opcode, Type *DataType,
                              Align Alignment) const;
 
+  /// \returns true if vectorization of monotonics is supported by the target.
+  bool enableMonotonicVectorization() const;
+
   struct VPLegalization {
     enum VPTransform {
       // keep the predicating parameter
@@ -2131,6 +2134,7 @@ class TargetTransformInfo::Concept {
   virtual bool supportsScalableVectors() const = 0;
   virtual bool hasActiveVectorLength(unsigned Opcode, Type *DataType,
                                      Align Alignment) const = 0;
+  virtual bool enableMonotonicVectorization() const = 0;
   virtual VPLegalization
   getVPLegalizationStrategy(const VPIntrinsic &PI) const = 0;
   virtual bool hasArmWideBranch(bool Thumb) const = 0;
@@ -2874,6 +2878,10 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
     return Impl.hasActiveVectorLength(Opcode, DataType, Alignment);
   }
 
+  bool enableMonotonicVectorization() const override {
+    return Impl.enableMonotonicVectorization();
+  }
+
   VPLegalization
   getVPLegalizationStrategy(const VPIntrinsic &PI) const override {
     return Impl.getVPLegalizationStrategy(PI);
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 13379cc126a40c..e77838882ee725 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -923,6 +923,8 @@ class TargetTransformInfoImplBase {
     return false;
   }
 
+  bool enableMonotonicVectorization() const { return false; }
+
   TargetTransformInfo::VPLegalization
   getVPLegalizationStrategy(const VPIntrinsic &PI) const {
     return TargetTransformInfo::VPLegalization(
diff --git a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
index a509ebf6a7e1b3..9896211ca11d83 100644
--- a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+++ b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
@@ -257,6 +257,10 @@ class LoopVectorizationLegality {
   /// induction descriptor.
   using InductionList = MapVector<PHINode *, InductionDescriptor>;
 
+  /// MonotonicPhiList contains phi nodes that represent monotonic idiom
+  using MonotonicPhiList =
+      MapVector<const PHINode *, MonotonicDescriptor>;
+
   /// RecurrenceSet contains the phi nodes that are recurrences other than
   /// inductions and reductions.
   using RecurrenceSet = SmallPtrSet<const PHINode *, 8>;
@@ -306,6 +310,42 @@ class LoopVectorizationLegality {
   /// Returns True if V is a Phi node of an induction variable in this loop.
   bool isInductionPhi(const Value *V) const;
 
+  /// Returns the Monotonics found in the loop
+  const MonotonicPhiList &getMonotonics() const { return MonotonicPhis; }
+
+  /// Returns the MonotonicDescriptor associated with an \p I instruction
+  /// Returns emtpy descriptor if \p I instruction is non-monotonic.
+  const MonotonicDescriptor *getMonotonicDescriptor(const Instruction *I) const {
+    for (const auto &PMD : getMonotonics()) {
+      if (const auto *Phi = dyn_cast<const PHINode>(I))
+        if (PMD.second.getPhis().contains(const_cast<PHINode *>(Phi)))
+          return &PMD.second;
+      if (PMD.second.getUpdateOp() == I)
+        return &PMD.second;
+    }
+    return nullptr;
+  }
+
+  /// Returns true if \p I instruction is a header phi of the monotonic.
+  bool isMonotonicPhi(const Instruction *I) const {
+    const auto *Phi = dyn_cast<PHINode>(I);
+    return Phi && MonotonicPhis.contains(Phi);
+  }
+
+  /// Returns true if \p V value is a header phi of the monotonic.
+  bool isMonotonicPhi(const Value *V) const {
+    const auto *I = dyn_cast<Instruction>(V);
+    return I && isMonotonicPhi(I);
+  }
+
+  /// Returns true of \p I instruction is an update instruction of the
+  /// monotonic.
+  bool isMonotonicUpdate(const Instruction *I) const {
+    return any_of(getMonotonics(), [I](const auto &PMD) {
+      return PMD.second.getUpdateOp() == I;
+    });
+  }
+
   /// Returns a pointer to the induction descriptor, if \p Phi is an integer or
   /// floating point induction.
   const InductionDescriptor *getIntOrFpInductionDescriptor(PHINode *Phi) const;
@@ -346,6 +386,13 @@ class LoopVectorizationLegality {
   /// loop. Do not use after invoking 'createVectorizedLoopSkeleton' (PR34965).
   int isConsecutivePtr(Type *AccessTy, Value *Ptr) const;
 
+  /// Returns true if \p Ptr is depends on a monotonic value and ptr diff
+  /// between two iterations is one if monotonic value is updated
+  bool isConsecutiveMonotonicPtr(Value *Ptr) const;
+
+  /// Return true if \p Ptr computation depends on monotonic value.
+  bool ptrHasMonotonicOperand(Value *Ptr) const;
+
   /// Returns true if value V is uniform across \p VF lanes, when \p VF is
   /// provided, and otherwise if \p V is invariant across all loop iterations.
   bool isInvariant(Value *V) const;
@@ -443,6 +490,11 @@ class LoopVectorizationLegality {
   /// specific checks for outer loop vectorization.
   bool canVectorizeOuterLoop();
 
+  /// Return true if loop vectorizer can generate correct code for that
+  /// monotonic. The method is needed to gradually enable vectorization of
+  /// monotonics.
+  bool canVectorizeMonotonic(const MonotonicDescriptor &MD);
+
   /// Return true if all of the instructions in the block can be speculatively
   /// executed, and record the loads/stores that require masking.
   /// \p SafePtrs is a list of addresses that are known to be legal and we know
@@ -460,6 +512,9 @@ class LoopVectorizationLegality {
   void addInductionPhi(PHINode *Phi, const InductionDescriptor &ID,
                        SmallPtrSetImpl<Value *> &AllowedExit);
 
+  /// Add MonotonicDescriptor
+  void addMonotonic(const MonotonicDescriptor &MD);
+
   /// The loop that we evaluate.
   Loop *TheLoop;
 
@@ -510,6 +565,9 @@ class LoopVectorizationLegality {
   /// loop body.
   SmallPtrSet<Instruction *, 4> InductionCastsToIgnore;
 
+  /// Holds the phis of the monotonics
+  MonotonicPhiList MonotonicPhis;
+
   /// Holds the phi nodes that are fixed-order recurrences.
   RecurrenceSet FixedOrderRecurrences;
 
diff --git a/llvm/lib/Analysis/IVDescriptors.cpp b/llvm/lib/Analysis/IVDescriptors.cpp
index 055f121e743411..9194a5622b7dc2 100644
--- a/llvm/lib/Analysis/IVDescriptors.cpp
+++ b/llvm/lib/Analysis/IVDescriptors.cpp
@@ -1475,6 +1475,130 @@ bool InductionDescriptor::isInductionPHI(PHINode *Phi, const Loop *TheLoop,
   return isInductionPHI(Phi, TheLoop, PSE.getSE(), D, AR);
 }
 
+MonotonicDescriptor
+MonotonicDescriptor::isMonotonicPHI(PHINode *Phi, const Loop *L,
+                                    PredicatedScalarEvolution &PSE) {
+  // Monotonic is a special loop carried dependency which is
+  // incremented by a invariant value under some condition and used under the
+  // same or nested condition. That's different to conditional reduction, which
+  // does not allow uses at all.
+
+  // Don't allow multiple updates of the value
+  if (Phi->getNumIncomingValues() != 2)
+    return MonotonicDescriptor();
+
+  Type *Ty = Phi->getType();
+  if (!Ty->isIntegerTy() && !Ty->isPointerTy())
+    return MonotonicDescriptor();
+
+  SetVector<PHINode *> Visited;
+  Visited.insert(Phi);
+  SmallVector<PHINode *> Worklist;
+
+  for (unsigned I = 0, E = Phi->getNumIncomingValues(); I != E; ++I) {
+    if (Phi->getIncomingBlock(I) == L->getLoopPreheader())
+      continue;
+    auto *P = dyn_cast<PHINode>(Phi->getIncomingValue(I));
+    if (!P)
+      return MonotonicDescriptor();
+    Worklist.push_back(P);
+  }
+
+  auto FindSelfUpdate = [&]() -> Instruction * {
+    Instruction *SelfUpdate = nullptr;
+    // Visit use-def chain of the Phi expecting all incoming values as phis
+    // which are used just once, i.e. within that chain.
+    while (!Worklist.empty()) {
+      PHINode *P = Worklist.pop_back_val();
+      if (Visited.contains(P))
+        continue;
+
+      Visited.insert(P);
+      // Expect all phi to be a part of the loop
+      if (!L->contains(P))
+        return nullptr;
+
+      for (unsigned I = 0, E = P->getNumIncomingValues(); I != E; ++I) {
+        Value *V = P->getIncomingValue(I);
+        if (auto *PN = dyn_cast<PHINode>(V)) {
+          Worklist.push_back(PN);
+          continue;
+        }
+        if (SelfUpdate != nullptr)
+          return nullptr;
+
+        if ((Ty->isIntegerTy() && !isa<BinaryOperator>(V)) ||
+            (Ty->isPointerTy() && !isa<GetElementPtrInst>(V)))
+          return nullptr;
+
+        SelfUpdate = cast<Instruction>(V);
+      }
+    }
+    return SelfUpdate;
+  };
+  Instruction *SelfUpdate = FindSelfUpdate();
+
+  // Expect `SelfUpdate` to bey used only once
+  // TODO: Support monotonic with a pre-increment
+  if (!SelfUpdate || SelfUpdate->getNumUses() != 1)
+    return MonotonicDescriptor();
+
+  Value *Step = nullptr;
+  if (auto *GEPUpdate = dyn_cast<GetElementPtrInst>(SelfUpdate)) {
+    if (GEPUpdate->getNumOperands() != 2)
+      return MonotonicDescriptor();
+
+    Step = GEPUpdate->getOperand(1);
+    // TODO: Re-enable update via GEP. This will require changes in VPlan to
+    // correctly print and generate updates
+    return MonotonicDescriptor();
+  }
+  auto *BO = cast<BinaryOperator>(SelfUpdate);
+  // TODO: support other than Add instruction to update monotonic variable
+  if (BO->getOpcode() != Instruction::Add)
+    return MonotonicDescriptor();
+
+  // Either `nsw` or `nuw` should be set, otherwise it's not safe to assume
+  // monotonic won't wrap.
+  if (!BO->hasNoSignedWrap() && !BO->hasNoUnsignedWrap())
+    return MonotonicDescriptor();
+  Step = BO->getOperand(0) == Phi ? BO->getOperand(1) : BO->getOperand(0);
+
+  if (!L->isLoopInvariant(Step))
+    return MonotonicDescriptor();
+
+  auto *StepSCEV = PSE.getSCEV(Step);
+  if (auto *C = dyn_cast<SCEVConstant>(StepSCEV))
+    // TODO: handle step != 1
+    if (!C->isOne())
+      return MonotonicDescriptor();
+
+  // It's important to check all uses of the Phi and make sure they are either
+  // outside of the loop.
+  // TODO: Support uses under nested predicate, which can be supported by vectorizer
+  for (User *U : Phi->users()) {
+    auto *UI = cast<Instruction>(U);
+    if (!L->contains(UI))
+      continue;
+
+    // Ignore phis that are necessary to represent self-update
+    if (auto *P = dyn_cast<PHINode>(UI))
+      if (Visited.contains(P))
+        continue;
+
+    BasicBlock *UIParent = UI->getParent();
+    if (UIParent != SelfUpdate->getParent())
+      return MonotonicDescriptor();
+  }
+
+  Value *StartValue = Phi->getIncomingValueForBlock(L->getLoopPreheader());
+  // Record all visited Phis in a vector and place Phi at the biginning to
+  // simplify future analysis
+  return MonotonicDescriptor(StartValue,
+                             Ty->isPointerTy() ? MK_Pointer : MK_Integer,
+                             StepSCEV, SelfUpdate, Visited);
+}
+
 bool InductionDescriptor::isInductionPHI(
     PHINode *Phi, const Loop *TheLoop, ScalarEvolution *SE,
     InductionDescriptor &D, const SCEV *Expr,
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 1f11f0d7dd620e..becc8e821fd346 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -1302,6 +1302,10 @@ bool TargetTransformInfo::hasActiveVectorLength(unsigned Opcode, Type *DataType,
   return TTIImpl->hasActiveVectorLength(Opcode, DataType, Alignment);
 }
 
+bool TargetTransformInfo::enableMonotonicVectorization() const {
+  return TTIImpl->enableMonotonicVectorization();
+}
+
 TargetTransformInfo::Concept::~Concept() = default;
 
 TargetIRAnalysis::TargetIRAnalysis() : TTICallback(&getDefaultTTI) {}
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 2e4e69fb4f920f..da01d2f986b4b4 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1609,3 +1609,7 @@ bool RISCVTTIImpl::isLSRCostLess(const TargetTransformInfo::LSRCost &C1,
                   C2.NumIVMuls, C2.NumBaseAdds,
                   C2.ScaleCost, C2.ImmCost, C2.SetupCost);
 }
+
+bool RISCVTTIImpl::enableMonotonicVectorization() const {
+  return ST->hasVInstructions();
+}
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index af36e9d5d5e886..c5e6fc26605b28 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -372,6 +372,10 @@ class RISCVTTIImpl : public BasicTTIImplBase<RISCVTTIImpl> {
   bool isLSRCostLess(const TargetTransformInfo::LSRCost &C1,
                      const TargetTransformInfo::LSRCost &C2);
 
+  /// \returns true if ISA supports all needed instructions to vectorize
+  /// monotonics
+  bool enableMonotonicVectorization() const;
+
   bool shouldFoldTerminatingConditionAfterLSR() const {
     return true;
   }
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 37a356c43e29a4..77348826e067cf 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -78,6 +78,11 @@ static cl::opt<LoopVectorizeHints::ScalableForceKind>
                 "Scalable vectorization is available and favored when the "
                 "cost is inconclusive.")));
 
+static cl::opt<bool>
+    EnableMonotonics("enable-monotonics", cl::init(true), cl::Hidden,
+                     cl::desc("Control whether vectorization of loops with "
+                              "monotonic variables is enabled"));
+
 /// Maximum vectorization interleave count.
 static const unsigned MaxInterleaveFactor = 16;
 
@@ -471,6 +476,36 @@ int LoopVectorizationLegality::isConsecutivePtr(Type *AccessTy,
   return 0;
 }
 
+bool LoopVectorizationLegality::isConsecutiveMonotonicPtr(Value *Ptr) const {
+  assert(ptrHasMonotonicOperand(Ptr) &&
+         "Pointer's computation does not use monotonic values.");
+
+  auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
+  assert(GEP->getNumOperands() == 2 &&
+         "GetElementPtr with more than 1 indexes is not currently supported "
+         "and should be filtered out before.");
+  Value *Monotonic = GEP->getOperand(1);
+  if (auto *Cast = dyn_cast<CastInst>(Monotonic))
+    Monotonic = Cast->getOperand(0);
+  const MonotonicDescriptor *MD =
+      getMonotonicDescriptor(cast<Instruction>(Monotonic));
+  assert(MD && "The index has no MonotonicDescriptor associated with it.");
+  const SCEVConstant *Step = dyn_cast<SCEVConstant>(MD->getStep());
+  return Step && Step->getAPInt().getZExtValue() == 1;
+}
+
+bool LoopVectorizationLegality::ptrHasMonotonicOperand(
+    Value *Ptr) const {
+  auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
+  if (!GEP)
+    return false;
+  return any_of(GEP->operands(), [&](Value *V) {
+    if (auto *Cast = dyn_cast<CastInst>(V))
+      return isMonotonicPhi(Cast->getOperand(0));
+    return isMonotonicPhi(V);
+  });
+}
+
 bool LoopVectorizationLegality::isInvariant(Value *V) const {
   return LAI->isInvariant(V);
 }
@@ -678,6 +713,47 @@ bool LoopVectorizationLegality::canVectorizeOuterLoop() {
   return Result;
 }
 
+bool LoopVectorizationLegality::canVectorizeMonotonic(const MonotonicDescriptor &MD) {
+  Value *Monotonic = MD.getPhis().front();
+  auto IsUserInLoop = [&](User *U) -> bool {
+    auto *I = dyn_cast<Instruction>(U);
+    return I && TheLoop->contains(I);
+  };
+  auto CanIgnoreUser = [&](User *U) -> bool {
+    if (auto *PN = dyn_cast<PHINode>(U))
+      if (MD.getPhis().contains(PN))
+        return true;
+    return U == MD.getUpdateOp();
+  };
+
+  for (User *U : Monotonic->users()) {
+    if (!IsUserInLoop(U) || CanIgnoreUser(U))
+      continue;
+
+    // For now expect monotonic value to be used by by zext with a single user
+    // or GEP
+    if (U->hasOneUser() && isa<ZExtInst, SExtInst>(U))
+      U = *cast<Instruction>(U)->users().begin();
+
+    if (!isa<GetElementPtrInst>(U))
+      return false;
+
+    // All GEPs should be used as a pointer operand of a store which represents
+    // compressstore.
+    if (any_of(U->users(), [&](User *UI) {
+          if (!IsUserInLoop(UI) || CanIgnoreUser(UI))
+            return false;
+          return UI != MD.getUpdateOp() &&
+                 (!isa<StoreInst>(UI) || getLoadStorePointerOperand(UI) != U);
+        })) {
+      LLVM_DEBUG(
+          dbgs() << "LV: Expand of a monotonic value is not yet supported.\n");
+      return false;
+    }
+  }
+  return true;
+}
+
 void LoopVectorizationLegality::addInductionPhi(
     PHINode *Phi, const InductionDescriptor &ID,
     SmallPtrSetImpl<Value *> &AllowedExit) {
@@ -730,6 +806,11 @@ void LoopVectorizationLegality::addInductionPhi(
   LLVM_DEBUG(dbgs() << "LV: Found an induction variable.\n");
 }
 
+void LoopVectorizationLegality::addMonotonic(const MonotonicDescriptor &MD) {
+  for (PHINode *P : MD.getPhis())
+    MonotonicPhis[P] = MD;
+}
+
 bool LoopVectorizationLegality::setupOuterLoopInductions() {
   BasicBlock *Header = TheLoop->getHeader();
 
@@ -880,6 +961,13 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
           addInductionPhi(Phi, ID, AllowedExit);
           continue;
         }
+        if (EnableMonotonics && TTI->enableMonotonicVectorization())
+          if (auto MD =
+                  MonotonicDescriptor::isMonotonicPHI(Phi, TheLoop, PSE))
+            if (canVectorizeMonotonic(MD)) {
+              addMon...
[truncated]

llvmbot · 2024-02-29T19:38:08Z

@llvm/pr-subscribers-backend-risc-v

Author: Kolya Panchenko (nikolaypanchenko)

Changes

Monotonic value m is a scalar value that has special loop carried dependency, which can be described as

  m += step
  ... m ...

where

m is a scalar variable, *step is a loop-invariant variable,
the update is done under some non-uniform condition,
use(s) is(are) done under the same or nested condition(s)

Whether m is used in rhs or lhs defines which special vector code needs to be generated on a use-side.
If m is used in lhs, the pattern is known as compress as stored data needs to be compressed according to the mask before the store
If m is used in rhs, the pattern is known as expand/decompress as use data needs to be expanded according to the mask

The changeset adds new descriptor for monotonic values as define above and adds initial support to vectorize unit-strided compress store.
The changeset bails out in case if monotonic value requires expand or expandload or step of it is not 1. Future work is planed to support these scenarios.

Patch is 84.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/83467.diff

18 Files Affected:

(modified) llvm/include/llvm/Analysis/IVDescriptors.h (+49)
(modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+8)
(modified) llvm/include/llvm/Analysis/TargetTransformInfoImpl.h (+2)
(modified) llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h (+58)
(modified) llvm/lib/Analysis/IVDescriptors.cpp (+124)
(modified) llvm/lib/Analysis/TargetTransformInfo.cpp (+4)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+4)
(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h (+4)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (+88)
(modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+93-6)
(modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+1)
(modified) llvm/lib/Transforms/Vectorize/VPlan.h (+108-2)
(modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+8-8)
(modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+68)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+62)
(modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+4)
(modified) llvm/lib/Transforms/Vectorize/VPlanValue.h (+2)
(added) llvm/test/Transforms/LoopVectorize/RISCV/compress_expand.ll (+702)

diff --git a/llvm/include/llvm/Analysis/IVDescriptors.h b/llvm/include/llvm/Analysis/IVDescriptors.h
index 5c7b613ac48c40..877204a8b2d864 100644
--- a/llvm/include/llvm/Analysis/IVDescriptors.h
+++ b/llvm/include/llvm/Analysis/IVDescriptors.h
@@ -17,6 +17,7 @@
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/ValueHandle.h"
+#include "llvm/ADT/SetVector.h"
 
 namespace llvm {
 
@@ -395,6 +396,54 @@ class InductionDescriptor {
   SmallVector<Instruction *, 2> RedundantCasts;
 };
 
+class MonotonicDescriptor {
+public:
+  /// This enum represents the kinds of monotonic that we support.
+  enum MonotonicKind {
+    MK_None,  ///< Not a monotonic variable.
+    MK_Integer, /// < Integer monotonic variable. Step = C
+    MK_Pointer, /// < Pointer monotonic variable. Step = C
+  };
+
+public:
+  MonotonicDescriptor() = default;
+
+  Value *getStartValue() const { return StartValue; }
+  MonotonicKind getKind() const { return MK; }
+  const SCEV *getStep() const { return Step; }
+  const Instruction *getUpdateOp() const { return UpdateOp; }
+  const SetVector<PHINode *> &getPhis() const { return Phis; }
+  bool isHeaderPhi(const PHINode *Phi) const {
+    return !Phis.empty() && Phis[0] == Phi;
+  }
+
+  /// Returns true if \p Phi forms monotonic pattern within a loop \p L.
+  static MonotonicDescriptor isMonotonicPHI(PHINode *Phi, const Loop *L,
+                                            PredicatedScalarEvolution &PSE);
+
+  operator bool() const { return MK != MK_None; }
+
+private:
+  /// Private constructor - used by \c isMonotonicPHI
+  MonotonicDescriptor(Value *Start, MonotonicKind K, const SCEV *Step,
+                      const Instruction *UpdateOp, SetVector<PHINode *> &Phis)
+      : StartValue(Start), MK(K), Step(Step), UpdateOp(UpdateOp),
+        Phis(Phis.begin(), Phis.end()) {}
+
+  /// Start value.
+  TrackingVH<Value> StartValue = nullptr;
+  /// Induction kind.
+  MonotonicKind MK = MK_None;
+  /// Step value.
+  const SCEV *Step = nullptr;
+  // Instruction that advances induction variable.
+  const Instruction *UpdateOp = nullptr;
+
+  /// All phis that are used to update the monotonic variable. It's expected
+  /// that the first PHINode is in the header BB
+  SetVector<PHINode *> Phis;
+};
+
 } // end namespace llvm
 
 #endif // LLVM_ANALYSIS_IVDESCRIPTORS_H
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 58577a6b6eb5c0..a7bdefe0d95708 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -1701,6 +1701,9 @@ class TargetTransformInfo {
   bool hasActiveVectorLength(unsigned Opcode, Type *DataType,
                              Align Alignment) const;
 
+  /// \returns true if vectorization of monotonics is supported by the target.
+  bool enableMonotonicVectorization() const;
+
   struct VPLegalization {
     enum VPTransform {
       // keep the predicating parameter
@@ -2131,6 +2134,7 @@ class TargetTransformInfo::Concept {
   virtual bool supportsScalableVectors() const = 0;
   virtual bool hasActiveVectorLength(unsigned Opcode, Type *DataType,
                                      Align Alignment) const = 0;
+  virtual bool enableMonotonicVectorization() const = 0;
   virtual VPLegalization
   getVPLegalizationStrategy(const VPIntrinsic &PI) const = 0;
   virtual bool hasArmWideBranch(bool Thumb) const = 0;
@@ -2874,6 +2878,10 @@ class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
     return Impl.hasActiveVectorLength(Opcode, DataType, Alignment);
   }
 
+  bool enableMonotonicVectorization() const override {
+    return Impl.enableMonotonicVectorization();
+  }
+
   VPLegalization
   getVPLegalizationStrategy(const VPIntrinsic &PI) const override {
     return Impl.getVPLegalizationStrategy(PI);
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
index 13379cc126a40c..e77838882ee725 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
@@ -923,6 +923,8 @@ class TargetTransformInfoImplBase {
     return false;
   }
 
+  bool enableMonotonicVectorization() const { return false; }
+
   TargetTransformInfo::VPLegalization
   getVPLegalizationStrategy(const VPIntrinsic &PI) const {
     return TargetTransformInfo::VPLegalization(
diff --git a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
index a509ebf6a7e1b3..9896211ca11d83 100644
--- a/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
+++ b/llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
@@ -257,6 +257,10 @@ class LoopVectorizationLegality {
   /// induction descriptor.
   using InductionList = MapVector<PHINode *, InductionDescriptor>;
 
+  /// MonotonicPhiList contains phi nodes that represent monotonic idiom
+  using MonotonicPhiList =
+      MapVector<const PHINode *, MonotonicDescriptor>;
+
   /// RecurrenceSet contains the phi nodes that are recurrences other than
   /// inductions and reductions.
   using RecurrenceSet = SmallPtrSet<const PHINode *, 8>;
@@ -306,6 +310,42 @@ class LoopVectorizationLegality {
   /// Returns True if V is a Phi node of an induction variable in this loop.
   bool isInductionPhi(const Value *V) const;
 
+  /// Returns the Monotonics found in the loop
+  const MonotonicPhiList &getMonotonics() const { return MonotonicPhis; }
+
+  /// Returns the MonotonicDescriptor associated with an \p I instruction
+  /// Returns emtpy descriptor if \p I instruction is non-monotonic.
+  const MonotonicDescriptor *getMonotonicDescriptor(const Instruction *I) const {
+    for (const auto &PMD : getMonotonics()) {
+      if (const auto *Phi = dyn_cast<const PHINode>(I))
+        if (PMD.second.getPhis().contains(const_cast<PHINode *>(Phi)))
+          return &PMD.second;
+      if (PMD.second.getUpdateOp() == I)
+        return &PMD.second;
+    }
+    return nullptr;
+  }
+
+  /// Returns true if \p I instruction is a header phi of the monotonic.
+  bool isMonotonicPhi(const Instruction *I) const {
+    const auto *Phi = dyn_cast<PHINode>(I);
+    return Phi && MonotonicPhis.contains(Phi);
+  }
+
+  /// Returns true if \p V value is a header phi of the monotonic.
+  bool isMonotonicPhi(const Value *V) const {
+    const auto *I = dyn_cast<Instruction>(V);
+    return I && isMonotonicPhi(I);
+  }
+
+  /// Returns true of \p I instruction is an update instruction of the
+  /// monotonic.
+  bool isMonotonicUpdate(const Instruction *I) const {
+    return any_of(getMonotonics(), [I](const auto &PMD) {
+      return PMD.second.getUpdateOp() == I;
+    });
+  }
+
   /// Returns a pointer to the induction descriptor, if \p Phi is an integer or
   /// floating point induction.
   const InductionDescriptor *getIntOrFpInductionDescriptor(PHINode *Phi) const;
@@ -346,6 +386,13 @@ class LoopVectorizationLegality {
   /// loop. Do not use after invoking 'createVectorizedLoopSkeleton' (PR34965).
   int isConsecutivePtr(Type *AccessTy, Value *Ptr) const;
 
+  /// Returns true if \p Ptr is depends on a monotonic value and ptr diff
+  /// between two iterations is one if monotonic value is updated
+  bool isConsecutiveMonotonicPtr(Value *Ptr) const;
+
+  /// Return true if \p Ptr computation depends on monotonic value.
+  bool ptrHasMonotonicOperand(Value *Ptr) const;
+
   /// Returns true if value V is uniform across \p VF lanes, when \p VF is
   /// provided, and otherwise if \p V is invariant across all loop iterations.
   bool isInvariant(Value *V) const;
@@ -443,6 +490,11 @@ class LoopVectorizationLegality {
   /// specific checks for outer loop vectorization.
   bool canVectorizeOuterLoop();
 
+  /// Return true if loop vectorizer can generate correct code for that
+  /// monotonic. The method is needed to gradually enable vectorization of
+  /// monotonics.
+  bool canVectorizeMonotonic(const MonotonicDescriptor &MD);
+
   /// Return true if all of the instructions in the block can be speculatively
   /// executed, and record the loads/stores that require masking.
   /// \p SafePtrs is a list of addresses that are known to be legal and we know
@@ -460,6 +512,9 @@ class LoopVectorizationLegality {
   void addInductionPhi(PHINode *Phi, const InductionDescriptor &ID,
                        SmallPtrSetImpl<Value *> &AllowedExit);
 
+  /// Add MonotonicDescriptor
+  void addMonotonic(const MonotonicDescriptor &MD);
+
   /// The loop that we evaluate.
   Loop *TheLoop;
 
@@ -510,6 +565,9 @@ class LoopVectorizationLegality {
   /// loop body.
   SmallPtrSet<Instruction *, 4> InductionCastsToIgnore;
 
+  /// Holds the phis of the monotonics
+  MonotonicPhiList MonotonicPhis;
+
   /// Holds the phi nodes that are fixed-order recurrences.
   RecurrenceSet FixedOrderRecurrences;
 
diff --git a/llvm/lib/Analysis/IVDescriptors.cpp b/llvm/lib/Analysis/IVDescriptors.cpp
index 055f121e743411..9194a5622b7dc2 100644
--- a/llvm/lib/Analysis/IVDescriptors.cpp
+++ b/llvm/lib/Analysis/IVDescriptors.cpp
@@ -1475,6 +1475,130 @@ bool InductionDescriptor::isInductionPHI(PHINode *Phi, const Loop *TheLoop,
   return isInductionPHI(Phi, TheLoop, PSE.getSE(), D, AR);
 }
 
+MonotonicDescriptor
+MonotonicDescriptor::isMonotonicPHI(PHINode *Phi, const Loop *L,
+                                    PredicatedScalarEvolution &PSE) {
+  // Monotonic is a special loop carried dependency which is
+  // incremented by a invariant value under some condition and used under the
+  // same or nested condition. That's different to conditional reduction, which
+  // does not allow uses at all.
+
+  // Don't allow multiple updates of the value
+  if (Phi->getNumIncomingValues() != 2)
+    return MonotonicDescriptor();
+
+  Type *Ty = Phi->getType();
+  if (!Ty->isIntegerTy() && !Ty->isPointerTy())
+    return MonotonicDescriptor();
+
+  SetVector<PHINode *> Visited;
+  Visited.insert(Phi);
+  SmallVector<PHINode *> Worklist;
+
+  for (unsigned I = 0, E = Phi->getNumIncomingValues(); I != E; ++I) {
+    if (Phi->getIncomingBlock(I) == L->getLoopPreheader())
+      continue;
+    auto *P = dyn_cast<PHINode>(Phi->getIncomingValue(I));
+    if (!P)
+      return MonotonicDescriptor();
+    Worklist.push_back(P);
+  }
+
+  auto FindSelfUpdate = [&]() -> Instruction * {
+    Instruction *SelfUpdate = nullptr;
+    // Visit use-def chain of the Phi expecting all incoming values as phis
+    // which are used just once, i.e. within that chain.
+    while (!Worklist.empty()) {
+      PHINode *P = Worklist.pop_back_val();
+      if (Visited.contains(P))
+        continue;
+
+      Visited.insert(P);
+      // Expect all phi to be a part of the loop
+      if (!L->contains(P))
+        return nullptr;
+
+      for (unsigned I = 0, E = P->getNumIncomingValues(); I != E; ++I) {
+        Value *V = P->getIncomingValue(I);
+        if (auto *PN = dyn_cast<PHINode>(V)) {
+          Worklist.push_back(PN);
+          continue;
+        }
+        if (SelfUpdate != nullptr)
+          return nullptr;
+
+        if ((Ty->isIntegerTy() && !isa<BinaryOperator>(V)) ||
+            (Ty->isPointerTy() && !isa<GetElementPtrInst>(V)))
+          return nullptr;
+
+        SelfUpdate = cast<Instruction>(V);
+      }
+    }
+    return SelfUpdate;
+  };
+  Instruction *SelfUpdate = FindSelfUpdate();
+
+  // Expect `SelfUpdate` to bey used only once
+  // TODO: Support monotonic with a pre-increment
+  if (!SelfUpdate || SelfUpdate->getNumUses() != 1)
+    return MonotonicDescriptor();
+
+  Value *Step = nullptr;
+  if (auto *GEPUpdate = dyn_cast<GetElementPtrInst>(SelfUpdate)) {
+    if (GEPUpdate->getNumOperands() != 2)
+      return MonotonicDescriptor();
+
+    Step = GEPUpdate->getOperand(1);
+    // TODO: Re-enable update via GEP. This will require changes in VPlan to
+    // correctly print and generate updates
+    return MonotonicDescriptor();
+  }
+  auto *BO = cast<BinaryOperator>(SelfUpdate);
+  // TODO: support other than Add instruction to update monotonic variable
+  if (BO->getOpcode() != Instruction::Add)
+    return MonotonicDescriptor();
+
+  // Either `nsw` or `nuw` should be set, otherwise it's not safe to assume
+  // monotonic won't wrap.
+  if (!BO->hasNoSignedWrap() && !BO->hasNoUnsignedWrap())
+    return MonotonicDescriptor();
+  Step = BO->getOperand(0) == Phi ? BO->getOperand(1) : BO->getOperand(0);
+
+  if (!L->isLoopInvariant(Step))
+    return MonotonicDescriptor();
+
+  auto *StepSCEV = PSE.getSCEV(Step);
+  if (auto *C = dyn_cast<SCEVConstant>(StepSCEV))
+    // TODO: handle step != 1
+    if (!C->isOne())
+      return MonotonicDescriptor();
+
+  // It's important to check all uses of the Phi and make sure they are either
+  // outside of the loop.
+  // TODO: Support uses under nested predicate, which can be supported by vectorizer
+  for (User *U : Phi->users()) {
+    auto *UI = cast<Instruction>(U);
+    if (!L->contains(UI))
+      continue;
+
+    // Ignore phis that are necessary to represent self-update
+    if (auto *P = dyn_cast<PHINode>(UI))
+      if (Visited.contains(P))
+        continue;
+
+    BasicBlock *UIParent = UI->getParent();
+    if (UIParent != SelfUpdate->getParent())
+      return MonotonicDescriptor();
+  }
+
+  Value *StartValue = Phi->getIncomingValueForBlock(L->getLoopPreheader());
+  // Record all visited Phis in a vector and place Phi at the biginning to
+  // simplify future analysis
+  return MonotonicDescriptor(StartValue,
+                             Ty->isPointerTy() ? MK_Pointer : MK_Integer,
+                             StepSCEV, SelfUpdate, Visited);
+}
+
 bool InductionDescriptor::isInductionPHI(
     PHINode *Phi, const Loop *TheLoop, ScalarEvolution *SE,
     InductionDescriptor &D, const SCEV *Expr,
diff --git a/llvm/lib/Analysis/TargetTransformInfo.cpp b/llvm/lib/Analysis/TargetTransformInfo.cpp
index 1f11f0d7dd620e..becc8e821fd346 100644
--- a/llvm/lib/Analysis/TargetTransformInfo.cpp
+++ b/llvm/lib/Analysis/TargetTransformInfo.cpp
@@ -1302,6 +1302,10 @@ bool TargetTransformInfo::hasActiveVectorLength(unsigned Opcode, Type *DataType,
   return TTIImpl->hasActiveVectorLength(Opcode, DataType, Alignment);
 }
 
+bool TargetTransformInfo::enableMonotonicVectorization() const {
+  return TTIImpl->enableMonotonicVectorization();
+}
+
 TargetTransformInfo::Concept::~Concept() = default;
 
 TargetIRAnalysis::TargetIRAnalysis() : TTICallback(&getDefaultTTI) {}
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 2e4e69fb4f920f..da01d2f986b4b4 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -1609,3 +1609,7 @@ bool RISCVTTIImpl::isLSRCostLess(const TargetTransformInfo::LSRCost &C1,
                   C2.NumIVMuls, C2.NumBaseAdds,
                   C2.ScaleCost, C2.ImmCost, C2.SetupCost);
 }
+
+bool RISCVTTIImpl::enableMonotonicVectorization() const {
+  return ST->hasVInstructions();
+}
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
index af36e9d5d5e886..c5e6fc26605b28 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
@@ -372,6 +372,10 @@ class RISCVTTIImpl : public BasicTTIImplBase<RISCVTTIImpl> {
   bool isLSRCostLess(const TargetTransformInfo::LSRCost &C1,
                      const TargetTransformInfo::LSRCost &C2);
 
+  /// \returns true if ISA supports all needed instructions to vectorize
+  /// monotonics
+  bool enableMonotonicVectorization() const;
+
   bool shouldFoldTerminatingConditionAfterLSR() const {
     return true;
   }
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 37a356c43e29a4..77348826e067cf 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -78,6 +78,11 @@ static cl::opt<LoopVectorizeHints::ScalableForceKind>
                 "Scalable vectorization is available and favored when the "
                 "cost is inconclusive.")));
 
+static cl::opt<bool>
+    EnableMonotonics("enable-monotonics", cl::init(true), cl::Hidden,
+                     cl::desc("Control whether vectorization of loops with "
+                              "monotonic variables is enabled"));
+
 /// Maximum vectorization interleave count.
 static const unsigned MaxInterleaveFactor = 16;
 
@@ -471,6 +476,36 @@ int LoopVectorizationLegality::isConsecutivePtr(Type *AccessTy,
   return 0;
 }
 
+bool LoopVectorizationLegality::isConsecutiveMonotonicPtr(Value *Ptr) const {
+  assert(ptrHasMonotonicOperand(Ptr) &&
+         "Pointer's computation does not use monotonic values.");
+
+  auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
+  assert(GEP->getNumOperands() == 2 &&
+         "GetElementPtr with more than 1 indexes is not currently supported "
+         "and should be filtered out before.");
+  Value *Monotonic = GEP->getOperand(1);
+  if (auto *Cast = dyn_cast<CastInst>(Monotonic))
+    Monotonic = Cast->getOperand(0);
+  const MonotonicDescriptor *MD =
+      getMonotonicDescriptor(cast<Instruction>(Monotonic));
+  assert(MD && "The index has no MonotonicDescriptor associated with it.");
+  const SCEVConstant *Step = dyn_cast<SCEVConstant>(MD->getStep());
+  return Step && Step->getAPInt().getZExtValue() == 1;
+}
+
+bool LoopVectorizationLegality::ptrHasMonotonicOperand(
+    Value *Ptr) const {
+  auto *GEP = dyn_cast<GetElementPtrInst>(Ptr);
+  if (!GEP)
+    return false;
+  return any_of(GEP->operands(), [&](Value *V) {
+    if (auto *Cast = dyn_cast<CastInst>(V))
+      return isMonotonicPhi(Cast->getOperand(0));
+    return isMonotonicPhi(V);
+  });
+}
+
 bool LoopVectorizationLegality::isInvariant(Value *V) const {
   return LAI->isInvariant(V);
 }
@@ -678,6 +713,47 @@ bool LoopVectorizationLegality::canVectorizeOuterLoop() {
   return Result;
 }
 
+bool LoopVectorizationLegality::canVectorizeMonotonic(const MonotonicDescriptor &MD) {
+  Value *Monotonic = MD.getPhis().front();
+  auto IsUserInLoop = [&](User *U) -> bool {
+    auto *I = dyn_cast<Instruction>(U);
+    return I && TheLoop->contains(I);
+  };
+  auto CanIgnoreUser = [&](User *U) -> bool {
+    if (auto *PN = dyn_cast<PHINode>(U))
+      if (MD.getPhis().contains(PN))
+        return true;
+    return U == MD.getUpdateOp();
+  };
+
+  for (User *U : Monotonic->users()) {
+    if (!IsUserInLoop(U) || CanIgnoreUser(U))
+      continue;
+
+    // For now expect monotonic value to be used by by zext with a single user
+    // or GEP
+    if (U->hasOneUser() && isa<ZExtInst, SExtInst>(U))
+      U = *cast<Instruction>(U)->users().begin();
+
+    if (!isa<GetElementPtrInst>(U))
+      return false;
+
+    // All GEPs should be used as a pointer operand of a store which represents
+    // compressstore.
+    if (any_of(U->users(), [&](User *UI) {
+          if (!IsUserInLoop(UI) || CanIgnoreUser(UI))
+            return false;
+          return UI != MD.getUpdateOp() &&
+                 (!isa<StoreInst>(UI) || getLoadStorePointerOperand(UI) != U);
+        })) {
+      LLVM_DEBUG(
+          dbgs() << "LV: Expand of a monotonic value is not yet supported.\n");
+      return false;
+    }
+  }
+  return true;
+}
+
 void LoopVectorizationLegality::addInductionPhi(
     PHINode *Phi, const InductionDescriptor &ID,
     SmallPtrSetImpl<Value *> &AllowedExit) {
@@ -730,6 +806,11 @@ void LoopVectorizationLegality::addInductionPhi(
   LLVM_DEBUG(dbgs() << "LV: Found an induction variable.\n");
 }
 
+void LoopVectorizationLegality::addMonotonic(const MonotonicDescriptor &MD) {
+  for (PHINode *P : MD.getPhis())
+    MonotonicPhis[P] = MD;
+}
+
 bool LoopVectorizationLegality::setupOuterLoopInductions() {
   BasicBlock *Header = TheLoop->getHeader();
 
@@ -880,6 +961,13 @@ bool LoopVectorizationLegality::canVectorizeInstrs() {
           addInductionPhi(Phi, ID, AllowedExit);
           continue;
         }
+        if (EnableMonotonics && TTI->enableMonotonicVectorization())
+          if (auto MD =
+                  MonotonicDescriptor::isMonotonicPHI(Phi, TheLoop, PSE))
+            if (canVectorizeMonotonic(MD)) {
+              addMon...
[truncated]

github-actions · 2024-02-29T19:39:58Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff 920094ea3b84a9f99f3e4d014c3eac062a617efe d3256232d31f361099ff917eef19c26443a3a09e -- llvm/include/llvm/Analysis/IVDescriptors.h llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/include/llvm/Analysis/TargetTransformInfoImpl.h llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h llvm/lib/Analysis/IVDescriptors.cpp llvm/lib/Analysis/TargetTransformInfo.cpp llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/lib/Transforms/Vectorize/VPlan.cpp llvm/lib/Transforms/Vectorize/VPlan.h llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp llvm/lib/Transforms/Vectorize/VPlanTransforms.h llvm/lib/Transforms/Vectorize/VPlanValue.h

View the diff from clang-format here.

diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index c41af389eb..5956c39736 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -712,7 +712,8 @@ bool LoopVectorizationLegality::canVectorizeOuterLoop() {
   return Result;
 }
 
-bool LoopVectorizationLegality::canVectorizeMonotonic(const MonotonicDescriptor &MD) {
+bool LoopVectorizationLegality::canVectorizeMonotonic(
+    const MonotonicDescriptor &MD) {
   Value *Monotonic = MD.getPhis().front();
   auto IsUserInLoop = [&](User *U) -> bool {
     auto *I = dyn_cast<Instruction>(U);

llvm/test/Transforms/LoopVectorize/RISCV/compress_expand.ll

Monotonic Idiom is a special form of a loop carried dependency, which can be described as ``` m += step ... m ... ``` where * `m` is a scalar variable, *`step` is a loop-invariant variable, * the update is done under some non-uniform condition, * use(s) is(are) done under the same or nested condition(s) Whether `m` is used in rhs or lhs defines which special vector code needs to be generated on a use-side. If `m` is used in lhs, the pattern is known as compress as stored data needs to be compressed before the store If `m` is used in rhs, the pattern is known as expand/decompress as use data needs to be expanded according to the mask The changeset adds new descriptor for monotonic values as define above and adds initial support for unit-strided compress store.

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

alexey-bataev · 2024-02-29T20:03:57Z

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


+bool LoopVectorizationCostModel::memoryInstructionUsesMonotonic(
+    Instruction *I, ElementCount VF) {
+  assert((isa<LoadInst, StoreInst>(I)) && "Invalid memory instruction");


Drop extra parens

looks like I cannot:

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4117:24: error: too many arguments provided to function-like macro invocation assert(isa<LoadInst, StoreInst>(I) && "Invalid memory instruction"); ^ /usr/include/assert.h:89:11: note: macro 'assert' defined here # define assert(expr)

nikolaypanchenko · 2024-03-21T15:38:42Z

@fhahn ping

nikolaypanchenko · 2024-04-11T21:57:10Z

@fhahn ping

llvmbot added backend:RISC-V vectorizers llvm:analysis Includes value tracking, cost tables and constant folding llvm:transforms labels Feb 29, 2024

npanchen requested review from fhahn, arcbbb, alexey-bataev and Mel-Chen February 29, 2024 19:38

topperc reviewed Feb 29, 2024

View reviewed changes

llvm/test/Transforms/LoopVectorize/RISCV/compress_expand.ll Outdated Show resolved Hide resolved

alexey-bataev reviewed Feb 29, 2024

View reviewed changes

format + addressed comments

d325623

nikolaypanchenko force-pushed the pr/npanchen/compress_vectorization branch from 4f6d03e to d325623 Compare February 29, 2024 20:42

michaelmaitland mentioned this pull request Aug 30, 2024

[LV][VPlan] Add initial support for CSA vectorization #106560

Closed

skachkov-sc mentioned this pull request Jan 14, 2025

[TTI][CostModel] Add cost modeling for expandload and compressstore intrinsics #122882

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[LV] Vectorization of compress idiom #83467

[LV] Vectorization of compress idiom #83467

Uh oh!

nikolaypanchenko commented Feb 29, 2024

Uh oh!

llvmbot commented Feb 29, 2024 •

edited

Loading

Uh oh!

llvmbot commented Feb 29, 2024

Uh oh!

github-actions bot commented Feb 29, 2024 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexey-bataev Feb 29, 2024

Uh oh!

nikolaypanchenko Feb 29, 2024

Uh oh!

nikolaypanchenko commented Mar 21, 2024

Uh oh!

nikolaypanchenko commented Apr 11, 2024

Uh oh!

Uh oh!

[LV] Vectorization of compress idiom #83467

Are you sure you want to change the base?

[LV] Vectorization of compress idiom #83467

Uh oh!

Conversation

nikolaypanchenko commented Feb 29, 2024

Uh oh!

llvmbot commented Feb 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Feb 29, 2024

Uh oh!

github-actions bot commented Feb 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

alexey-bataev Feb 29, 2024

Choose a reason for hiding this comment

Uh oh!

nikolaypanchenko Feb 29, 2024

Choose a reason for hiding this comment

Uh oh!

nikolaypanchenko commented Mar 21, 2024

Uh oh!

nikolaypanchenko commented Apr 11, 2024

Uh oh!

Uh oh!

llvmbot commented Feb 29, 2024 •

edited

Loading

github-actions bot commented Feb 29, 2024 •

edited

Loading