
[LoopVectorize] Vectorize select-cmp reduction pattern for increasing integer induction variable #67812

Open: wants to merge 14 commits into main

Mel-Chen
Contributor

Consider the following loop:

  int rdx = init;
  for (int i = 0; i < n; ++i)
    rdx = (a[i] > b[i]) ? i : rdx;

We can vectorize this loop if i is an increasing induction variable.
The final reduced value is the maximum value of i for which
a[i] > b[i] holds, or the start value init if the condition never holds.

This patch adds two new RecurKind enums: IFindLastIV and FFindLastIV.
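As a rough illustration outside of LLVM, the C++ sketch below compares the scalar loop with a model of its vectorized form. It assumes a vector factor of 4 and masks the final partial chunk instead of running a scalar epilogue; the function names are hypothetical. Each lane keeps its own running select starting from the sentinel INT_MIN (the smax identity that the patch returns from getRecurrenceIdentity), the lanes are combined with a signed max, and a sentinel result is mapped back to init, similar to what createSentinelValueHandling emits.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Scalar reference: the loop from the description.
int findLastScalar(const std::vector<int> &a, const std::vector<int> &b,
                   int init) {
  assert(a.size() == b.size());
  int rdx = init;
  for (int i = 0; i < static_cast<int>(a.size()); ++i)
    rdx = (a[i] > b[i]) ? i : rdx;
  return rdx;
}

// Hypothetical model of the vectorized loop with VF lanes.
int findLastVectorized(const std::vector<int> &a, const std::vector<int> &b,
                       int init, int VF = 4) {
  assert(a.size() == b.size() && VF > 0);
  const int Sentinel = INT_MIN;          // smax identity, never a valid i here
  std::vector<int> lanes(VF, Sentinel);  // vector phi starts as splat(sentinel)
  const int n = static_cast<int>(a.size());
  for (int base = 0; base < n; base += VF)
    for (int l = 0; l < VF && base + l < n; ++l) { // masked tail
      int i = base + l;
      lanes[l] = (a[i] > b[i]) ? i : lanes[l];     // per-lane select
    }
  // Final reduction: signed max over all lanes.
  int rdx = *std::max_element(lanes.begin(), lanes.end());
  // Sentinel handling: no lane ever matched, so fall back to the start value.
  return rdx != Sentinel ? rdx : init;
}
```

Because i only grows, the largest index held by any lane is also the last one selected in program order, which is why a single smax reduction suffices.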

@llvmbot
Collaborator

llvmbot commented Sep 29, 2023

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-vectorizers

Changes

Patch is 230.36 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67812.diff

11 Files Affected:

  • (modified) llvm/include/llvm/Analysis/IVDescriptors.h (+26-3)
  • (modified) llvm/include/llvm/Transforms/Utils/LoopUtils.h (+21)
  • (modified) llvm/lib/Analysis/IVDescriptors.cpp (+108-5)
  • (modified) llvm/lib/Transforms/Utils/LoopUtils.cpp (+25)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+13-4)
  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+4)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+16)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-no-wrap.ll (+103-4)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-trunc.ll (+577-9)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp.ll (+1580-19)
  • (modified) llvm/test/Transforms/LoopVectorize/select-min-index.ll (+181-5)
diff --git a/llvm/include/llvm/Analysis/IVDescriptors.h b/llvm/include/llvm/Analysis/IVDescriptors.h
index 0ee3f4fed8c976d..622060df0d8cd6c 100644
--- a/llvm/include/llvm/Analysis/IVDescriptors.h
+++ b/llvm/include/llvm/Analysis/IVDescriptors.h
@@ -52,9 +52,16 @@ enum class RecurKind {
   FMulAdd,  ///< Sum of float products with llvm.fmuladd(a * b + sum).
   IAnyOf,   ///< Any_of reduction with select(icmp(),x,y) where one of (x,y) is
             ///< loop invariant, and both x and y are integer type.
-  FAnyOf    ///< Any_of reduction with select(fcmp(),x,y) where one of (x,y) is
+  FAnyOf,   ///< Any_of reduction with select(fcmp(),x,y) where one of (x,y) is
             ///< loop invariant, and both x and y are integer type.
-  // TODO: Any_of reduction need not be restricted to integer type only.
+  IFindLastIV, ///< FindLast reduction with select(icmp(),x,y) where one of
+               ///< (x,y) is increasing loop induction PHI, and both x and y are
+               ///< integer type.
+  FFindLastIV ///< FindLast reduction with select(fcmp(),x,y) where one of (x,y)
+              ///< is increasing loop induction PHI, and both x and y are
+              ///< integer type.
+  // TODO: Any_of and FindLast reduction need not be restricted to integer type
+  // only.
 };
 
 /// The RecurrenceDescriptor is used to identify recurrences variables in a
@@ -126,7 +133,7 @@ class RecurrenceDescriptor {
   /// the returned struct.
   static InstDesc isRecurrenceInstr(Loop *L, PHINode *Phi, Instruction *I,
                                     RecurKind Kind, InstDesc &Prev,
-                                    FastMathFlags FuncFMF);
+                                    FastMathFlags FuncFMF, ScalarEvolution *SE);
 
   /// Returns true if instruction I has multiple uses in Insts
   static bool hasMultipleUsesOf(Instruction *I,
@@ -153,6 +160,16 @@ class RecurrenceDescriptor {
   static InstDesc isAnyOfPattern(Loop *Loop, PHINode *OrigPhi, Instruction *I,
                                  InstDesc &Prev);
 
+  /// Returns a struct describing whether the instruction is either a
+  ///   Select(ICmp(A, B), X, Y), or
+  ///   Select(FCmp(A, B), X, Y)
+  /// where one of (X, Y) is an increasing loop induction variable, and the
+  /// other is a PHI value.
+  // TODO: FindLast does not need to be restricted to increasing loop induction
+  // variables.
+  static InstDesc isFindLastIVPattern(Loop *Loop, PHINode *OrigPhi,
+                                      Instruction *I, ScalarEvolution *SE);
+
   /// Returns a struct describing if the instruction is a
   /// Select(FCmp(X, Y), (Z = X op PHINode), PHINode) instruction pattern.
   static InstDesc isConditionalRdxPattern(RecurKind Kind, Instruction *I);
@@ -241,6 +258,12 @@ class RecurrenceDescriptor {
     return Kind == RecurKind::IAnyOf || Kind == RecurKind::FAnyOf;
   }
 
+  /// Returns true if the recurrence kind is of the form
+  ///   select(cmp(),x,y) where one of (x,y) is increasing loop induction.
+  static bool isFindLastIVRecurrenceKind(RecurKind Kind) {
+    return Kind == RecurKind::IFindLastIV || Kind == RecurKind::FFindLastIV;
+  }
+
   /// Returns the type of the recurrence. This type can be narrower than the
   /// actual type of the Phi if the recurrence has been type-promoted.
   Type *getRecurrenceType() const { return RecurrenceType; }
diff --git a/llvm/include/llvm/Transforms/Utils/LoopUtils.h b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
index 0d99249be413762..b83cfbe07ec59d8 100644
--- a/llvm/include/llvm/Transforms/Utils/LoopUtils.h
+++ b/llvm/include/llvm/Transforms/Utils/LoopUtils.h
@@ -372,6 +372,12 @@ CmpInst::Predicate getMinMaxReductionPredicate(RecurKind RK);
 Value *createAnyOfOp(IRBuilderBase &Builder, Value *StartVal, RecurKind RK,
                      Value *Left, Value *Right);
 
+/// See RecurrenceDescriptor::isFindLastIVPattern for a description of the
+/// pattern we are trying to match. In this pattern, since the selected set of
+/// values forms an increasing sequence, we are selecting the maximum value from
+/// \p Left and \p Right.
+Value *createFindLastIVOp(IRBuilderBase &Builder, Value *Left, Value *Right);
+
 /// Returns a Min/Max operation corresponding to MinMaxRecurrenceKind.
 /// The Builder's fast-math-flags must be set to propagate the expected values.
 Value *createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
@@ -402,6 +408,12 @@ Value *createAnyOfTargetReduction(IRBuilderBase &B, Value *Src,
                                   const RecurrenceDescriptor &Desc,
                                   PHINode *OrigPhi);
 
+/// Create a target reduction of the given vector \p Src for a reduction of the
+/// kind RecurKind::IFindLastIV or RecurKind::FFindLastIV. The reduction
+/// operation is described by \p Desc.
+Value *createFindLastIVTargetReduction(IRBuilderBase &B, Value *Src,
+                                       const RecurrenceDescriptor &Desc);
+
 /// Create a generic target reduction using a recurrence descriptor \p Desc
 /// The target is queried to determine if intrinsics or shuffle sequences are
 /// required to implement the reduction.
@@ -415,6 +427,15 @@ Value *createOrderedReduction(IRBuilderBase &B,
                               const RecurrenceDescriptor &Desc, Value *Src,
                               Value *Start);
 
+/// Returns a set of cmp and select instructions as shown below:
+///   Select(Cmp(NE, Rdx, Iden), Rdx, InitVal)
+/// where \p Rdx is a scalar value generated by target reduction, Iden is the
+/// sentinel value of the recurrence descriptor \p Desc, and InitVal is the
+/// start value of the recurrence descriptor \p Desc.
+Value *createSentinelValueHandling(IRBuilderBase &Builder,
+                                   const RecurrenceDescriptor &Desc,
+                                   Value *Rdx);
+
 /// Get the intersection (logical and) of all of the potential IR flags
 /// of each scalar operation (VL) that will be converted into a vector (I).
 /// If OpValue is non-null, we only consider operations similar to OpValue
diff --git a/llvm/lib/Analysis/IVDescriptors.cpp b/llvm/lib/Analysis/IVDescriptors.cpp
index 46629e381bc3665..19f594d0453c574 100644
--- a/llvm/lib/Analysis/IVDescriptors.cpp
+++ b/llvm/lib/Analysis/IVDescriptors.cpp
@@ -54,6 +54,8 @@ bool RecurrenceDescriptor::isIntegerRecurrenceKind(RecurKind Kind) {
   case RecurKind::UMin:
   case RecurKind::IAnyOf:
   case RecurKind::FAnyOf:
+  case RecurKind::IFindLastIV:
+  case RecurKind::FFindLastIV:
     return true;
   }
   return false;
@@ -375,7 +377,7 @@ bool RecurrenceDescriptor::AddReductionVar(
     // type-promoted).
     if (Cur != Start) {
       ReduxDesc =
-          isRecurrenceInstr(TheLoop, Phi, Cur, Kind, ReduxDesc, FuncFMF);
+          isRecurrenceInstr(TheLoop, Phi, Cur, Kind, ReduxDesc, FuncFMF, SE);
       ExactFPMathInst = ExactFPMathInst == nullptr
                             ? ReduxDesc.getExactFPMathInst()
                             : ExactFPMathInst;
@@ -662,6 +664,96 @@ RecurrenceDescriptor::isAnyOfPattern(Loop *Loop, PHINode *OrigPhi,
                                                      : RecurKind::FAnyOf);
 }
 
+// We are looking for loops that do something like this:
+//   int r = 0;
+//   for (int i = 0; i < n; i++) {
+//     if (src[i] > 3)
+//       r = i;
+//   }
+// The reduction value (r) is derived from either the values of an increasing
+// induction variable (i) sequence, or from the start value (0).
+// The LLVM IR generated for such loops would be as follows:
+//   for.body:
+//     %r = phi i32 [ %spec.select, %for.body ], [ 0, %entry ]
+//     %i = phi i32 [ %inc, %for.body ], [ 0, %entry ]
+//     ...
+//     %cmp = icmp sgt i32 %5, 3
+//     %spec.select = select i1 %cmp, i32 %i, i32 %r
+//     %inc = add nsw i32 %i, 1
+//     ...
+// Since 'i' is an increasing induction variable, the reduction value after the
+// loop will be the maximum value of 'i' that the condition (src[i] > 3) is
+// satisfied, or the start value (0 in the example above). When the start value
+// of the increasing induction variable 'i' is greater than the minimum value of
+// the data type, we can use the minimum value of the data type as a sentinel
+// value to replace the start value. This allows us to perform a single
+// reduction max operation to obtain the final reduction result.
+// TODO: It is possible to solve the case where the start value is the minimum
+// value of the data type or a non-constant value by using mask and multiple
+// reduction operations.
+RecurrenceDescriptor::InstDesc
+RecurrenceDescriptor::isFindLastIVPattern(Loop *Loop, PHINode *OrigPhi,
+                                          Instruction *I, ScalarEvolution *SE) {
+  // Only match select with single use cmp condition.
+  // TODO: Only handle single use for now.
+  CmpInst::Predicate Pred;
+  if (!match(I, m_Select(m_OneUse(m_Cmp(Pred, m_Value(), m_Value())), m_Value(),
+                         m_Value())))
+    return InstDesc(false, I);
+
+  SelectInst *SI = cast<SelectInst>(I);
+  Value *NonRdxPhi = nullptr;
+
+  if (OrigPhi == dyn_cast<PHINode>(SI->getTrueValue()))
+    NonRdxPhi = SI->getFalseValue();
+  else if (OrigPhi == dyn_cast<PHINode>(SI->getFalseValue()))
+    NonRdxPhi = SI->getTrueValue();
+  else
+    return InstDesc(false, I);
+
+  auto IsIncreasingLoopInduction = [&SE, &Loop](Value *V) {
+    if (!SE)
+      return false;
+
+    Type *Ty = V->getType();
+    if (!SE->isSCEVable(Ty))
+      return false;
+
+    auto *AR = dyn_cast<SCEVAddRecExpr>(SE->getSCEV(V));
+    if (!AR)
+      return false;
+
+    const SCEV *Step = AR->getStepRecurrence(*SE);
+    if (!SE->isKnownPositive(Step))
+      return false;
+
+    const ConstantRange IVRange = SE->getSignedRange(AR);
+    unsigned NumBits = Ty->getIntegerBitWidth();
+    // Keep the minimum value of the recurrence type as the sentinel value.
+    // The maximum acceptable range for the increasing induction variable,
+    // called the valid range, will be defined as
+    //   [<sentinel value> + 1, SignedMin(<recurrence type>))
+    // TODO: This range restriction can be lifted by adding an additional
+    // virtual OR reduction.
+    const APInt Sentinel = APInt::getSignedMinValue(NumBits);
+    const ConstantRange ValidRange = ConstantRange::getNonEmpty(
+        Sentinel + 1, APInt::getSignedMinValue(NumBits));
+    LLVM_DEBUG(dbgs() << "LV: FindLastIV valid range is " << ValidRange
+                      << ", and the signed range of " << *AR << " is "
+                      << IVRange << "\n");
+    return ValidRange.contains(IVRange);
+  };
+
+  // We are looking for selects of the form:
+  //   select(cmp(), phi, loop_induction) or
+  //   select(cmp(), loop_induction, phi)
+  if (!IsIncreasingLoopInduction(NonRdxPhi))
+    return InstDesc(false, I);
+
+  return InstDesc(I, isa<ICmpInst>(I->getOperand(0)) ? RecurKind::IFindLastIV
+                                                     : RecurKind::FFindLastIV);
+}
+
 RecurrenceDescriptor::InstDesc
 RecurrenceDescriptor::isMinMaxPattern(Instruction *I, RecurKind Kind,
                                       const InstDesc &Prev) {
@@ -765,10 +857,9 @@ RecurrenceDescriptor::isConditionalRdxPattern(RecurKind Kind, Instruction *I) {
   return InstDesc(true, SI);
 }
 
-RecurrenceDescriptor::InstDesc
-RecurrenceDescriptor::isRecurrenceInstr(Loop *L, PHINode *OrigPhi,
-                                        Instruction *I, RecurKind Kind,
-                                        InstDesc &Prev, FastMathFlags FuncFMF) {
+RecurrenceDescriptor::InstDesc RecurrenceDescriptor::isRecurrenceInstr(
+    Loop *L, PHINode *OrigPhi, Instruction *I, RecurKind Kind, InstDesc &Prev,
+    FastMathFlags FuncFMF, ScalarEvolution *SE) {
   assert(Prev.getRecKind() == RecurKind::None || Prev.getRecKind() == Kind);
   switch (I->getOpcode()) {
   default:
@@ -798,6 +889,8 @@ RecurrenceDescriptor::isRecurrenceInstr(Loop *L, PHINode *OrigPhi,
     if (Kind == RecurKind::FAdd || Kind == RecurKind::FMul ||
         Kind == RecurKind::Add || Kind == RecurKind::Mul)
       return isConditionalRdxPattern(Kind, I);
+    if (isFindLastIVRecurrenceKind(Kind))
+      return isFindLastIVPattern(L, OrigPhi, I, SE);
     [[fallthrough]];
   case Instruction::FCmp:
   case Instruction::ICmp:
@@ -902,6 +995,11 @@ bool RecurrenceDescriptor::isReductionPHI(PHINode *Phi, Loop *TheLoop,
                       << *Phi << "\n");
     return true;
   }
+  if (AddReductionVar(Phi, RecurKind::IFindLastIV, TheLoop, FMF, RedDes, DB, AC,
+                      DT, SE)) {
+    LLVM_DEBUG(dbgs() << "Found a FindLastIV reduction PHI." << *Phi << "\n");
+    return true;
+  }
   if (AddReductionVar(Phi, RecurKind::FMul, TheLoop, FMF, RedDes, DB, AC, DT,
                       SE)) {
     LLVM_DEBUG(dbgs() << "Found an FMult reduction PHI." << *Phi << "\n");
@@ -1091,6 +1189,9 @@ Value *RecurrenceDescriptor::getRecurrenceIdentity(RecurKind K, Type *Tp,
   case RecurKind::FAnyOf:
     return getRecurrenceStartValue();
     break;
+  case RecurKind::IFindLastIV:
+  case RecurKind::FFindLastIV:
+    return getRecurrenceIdentity(RecurKind::SMax, Tp, FMF);
   default:
     llvm_unreachable("Unknown recurrence kind");
   }
@@ -1118,12 +1219,14 @@ unsigned RecurrenceDescriptor::getOpcode(RecurKind Kind) {
   case RecurKind::UMax:
   case RecurKind::UMin:
   case RecurKind::IAnyOf:
+  case RecurKind::IFindLastIV:
     return Instruction::ICmp;
   case RecurKind::FMax:
   case RecurKind::FMin:
   case RecurKind::FMaximum:
   case RecurKind::FMinimum:
   case RecurKind::FAnyOf:
+  case RecurKind::FFindLastIV:
     return Instruction::FCmp;
   default:
     llvm_unreachable("Unknown recurrence operation");
diff --git a/llvm/lib/Transforms/Utils/LoopUtils.cpp b/llvm/lib/Transforms/Utils/LoopUtils.cpp
index 21affe7bdce406e..192e93a2f9b455d 100644
--- a/llvm/lib/Transforms/Utils/LoopUtils.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUtils.cpp
@@ -942,6 +942,11 @@ Value *llvm::createAnyOfOp(IRBuilderBase &Builder, Value *StartVal,
   return Builder.CreateSelect(Cmp, Left, Right, "rdx.select");
 }
 
+Value *llvm::createFindLastIVOp(IRBuilderBase &Builder, Value *Left,
+                                Value *Right) {
+  return createMinMaxOp(Builder, RecurKind::SMax, Left, Right);
+}
+
 Value *llvm::createMinMaxOp(IRBuilderBase &Builder, RecurKind RK, Value *Left,
                             Value *Right) {
   Type *Ty = Left->getType();
@@ -1062,6 +1067,14 @@ Value *llvm::createAnyOfTargetReduction(IRBuilderBase &Builder, Value *Src,
   return Builder.CreateSelect(Cmp, NewVal, InitVal, "rdx.select");
 }
 
+Value *llvm::createFindLastIVTargetReduction(IRBuilderBase &Builder, Value *Src,
+                                             const RecurrenceDescriptor &Desc) {
+  assert(RecurrenceDescriptor::isFindLastIVRecurrenceKind(
+             Desc.getRecurrenceKind()) &&
+         "Unexpected reduction kind");
+  return Builder.CreateIntMaxReduce(Src, true);
+}
+
 Value *llvm::createSimpleTargetReduction(IRBuilderBase &Builder, Value *Src,
                                          RecurKind RdxKind) {
   auto *SrcVecEltTy = cast<VectorType>(Src->getType())->getElementType();
@@ -1115,6 +1128,8 @@ Value *llvm::createTargetReduction(IRBuilderBase &B,
   RecurKind RK = Desc.getRecurrenceKind();
   if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK))
     return createAnyOfTargetReduction(B, Src, Desc, OrigPhi);
+  if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK))
+    return createFindLastIVTargetReduction(B, Src, Desc);
 
   return createSimpleTargetReduction(B, Src, RK);
 }
@@ -1131,6 +1146,16 @@ Value *llvm::createOrderedReduction(IRBuilderBase &B,
   return B.CreateFAddReduce(Start, Src);
 }
 
+Value *llvm::createSentinelValueHandling(IRBuilderBase &Builder,
+                                         const RecurrenceDescriptor &Desc,
+                                         Value *Rdx) {
+  Value *InitVal = Desc.getRecurrenceStartValue();
+  Value *Iden = Desc.getRecurrenceIdentity(
+      Desc.getRecurrenceKind(), Rdx->getType(), Desc.getFastMathFlags());
+  Value *Cmp = Builder.CreateCmp(CmpInst::ICMP_NE, Rdx, Iden, "rdx.select.cmp");
+  return Builder.CreateSelect(Cmp, Rdx, InitVal, "rdx.select");
+}
+
 void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue,
                             bool IncludeWrapFlags) {
   auto *VecOp = dyn_cast<Instruction>(I);
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index cc17d91d4f43727..35be8deeb542c56 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -3901,6 +3901,8 @@ void InnerLoopVectorizer::fixReduction(VPReductionPHIRecipe *PhiR,
       else if (RecurrenceDescriptor::isAnyOfRecurrenceKind(RK))
         ReducedPartRdx = createAnyOfOp(Builder, ReductionStartValue, RK,
                                        ReducedPartRdx, RdxPart);
+      else if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK))
+        ReducedPartRdx = createFindLastIVOp(Builder, ReducedPartRdx, RdxPart);
       else
         ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);
     }
@@ -3919,6 +3921,10 @@ void InnerLoopVectorizer::fixReduction(VPReductionPHIRecipe *PhiR,
                            : Builder.CreateZExt(ReducedPartRdx, PhiTy);
   }
 
+  if (RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK))
+    ReducedPartRdx =
+        createSentinelValueHandling(Builder, RdxDesc, ReducedPartRdx);
+
   PHINode *ResumePhi =
       dyn_cast<PHINode>(PhiR->getStartValue()->getUnderlyingValue());
 
@@ -5822,8 +5828,9 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
         HasReductions &&
         any_of(Legal->getReductionVars(), [&](auto &Reduction) -> bool {
           const RecurrenceDescriptor &RdxDesc = Reduction.second;
-          return RecurrenceDescriptor::isAnyOfRecurrenceKind(
-              RdxDesc.getRecurrenceKind());
+          RecurKind RK = RdxDesc.getRecurrenceKind();
+          return RecurrenceDescriptor::isAnyOfRecurrenceKind(RK) ||
+                 RecurrenceDescriptor::isFindLastIVRecurrenceKind(RK);
         });
     if (HasSelectCmpReductions) {
       LLVM_DEBUG(dbgs() << "LV: Not interleaving select-cmp reductions.\n");
@@ -8973,8 +8980,10 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
   for (VPReductionPHIRecipe *PhiR : InLoopReductionPhis) {
     const RecurrenceDescriptor &RdxDesc = PhiR->getRecurrenceDescriptor();
     RecurKind Kind = RdxDesc.getRecurrenceKind();
-    assert(!RecurrenceDescriptor::isAnyOfRecurrenceKind(Kind) &&
-           "AnyOf reductions are not allowed for in-loop reductions");
+    assert(
+        (!RecurrenceDescriptor::isAnyOfRecurrenceKind(Kind) &&
+         !RecurrenceDescriptor::isFindLastIVRecurrenceKind(Kind)) &&
+        "AnyOf and FindLast reductions are not allowed for in-loop reductions");
 
     // Collect the chain of "link" recipes for the reduction starting at PhiR.
     SetVector<VPRecipeBase *> Worklist;
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 572c4399b8b55cd..3ffba0113cbca03 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -14352,6 +14352,8 @@ class HorizontalReduction {
     case RecurKind::FMulAdd:
     case RecurKind::IAnyOf:
     case RecurKind::FAnyOf:
+    case RecurKind::IFindLastIV:
+    case RecurKind::FFindLastIV:
     case RecurKind::None:
       llvm_unreachable("Unexpected reduction kind for repeated scalar.");
     }
@@ -14441,6 +14443,8 @@ class HorizontalReduction {
     case RecurKind::FMulAdd:
     case RecurKind::IAnyOf:
     case RecurKind::FAnyOf:
+    case RecurKind::IFindLastIV:
+    case RecurKind::FFindLastIV:
     case RecurKind::None:
       llvm_unreachable("Unexpected reduction kind for reused scalars.");
     }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 2a1213a98095907..86e93f909bf747c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -1589,6 +1589,22 @@ void VPReductionPHIRecipe::execute(VPTransformState &State) {
       StartV = Iden =
           Builder.CreateVectorSplat(State.VF, StartV, "minmax.ident");
     }
+  } ...
[truncated]

llvm/lib/Analysis/IVDescriptors.cpp (outdated, resolved)
llvm/lib/Analysis/IVDescriptors.cpp (outdated, resolved)
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (outdated, resolved)
Contributor

@artagnon artagnon left a comment

Looks good, apart from minor nits.

; RUN: opt -passes=loop-vectorize -force-vector-interleave=4 -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK
; RUN: opt -passes=loop-vectorize -force-vector-interleave=4 -force-vector-width=1 -S < %s | FileCheck %s --check-prefix=CHECK
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK-VF4IC1 --check-prefix=CHECK
Contributor

FileCheck %s --check-prefixes=CHECK,CHECK-VF4IC1.

// virtual OR reduction.
const APInt Sentinel = APInt::getSignedMinValue(NumBits);
const ConstantRange ValidRange = ConstantRange::getNonEmpty(
Sentinel + 1, APInt::getSignedMinValue(NumBits));
Contributor

ConstantRange::getNonEmpty(Sentinel + 1, Sentinel).
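To make the wrap-around in the suggested range concrete, here is a small hypothetical 8-bit model of a half-open range whose upper bound may lie below its lower bound. It is not llvm::ConstantRange and does not use its API; it only illustrates the idea that [Sentinel + 1, Sentinel) with Sentinel = -128 wraps past 127 and therefore covers every i8 value except the sentinel.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit model of a possibly wrapped, half-open range
// [Lower, Upper), in the spirit of llvm::ConstantRange (not its real API).
struct WrappedRange8 {
  int8_t Lower;
  int8_t Upper;

  // Element count; Lower == Upper is treated as the full set, which is what
  // ConstantRange::getNonEmpty returns for equal bounds.
  unsigned size() const {
    unsigned S = static_cast<uint8_t>(Upper - Lower);
    return S == 0 ? 256u : S;
  }

  // Distance of X from Lower, counted upwards modulo 2^8.
  unsigned offset(int8_t X) const { return static_cast<uint8_t>(X - Lower); }

  bool contains(int8_t X) const { return offset(X) < size(); }

  // Does the range contain every value of the non-wrapping signed interval
  // [Lo, Hi] (both ends inclusive)?
  bool containsInterval(int8_t Lo, int8_t Hi) const {
    assert(Lo <= Hi && "expected a non-wrapping interval");
    if (size() == 256u)
      return true;
    return offset(Lo) + static_cast<unsigned>(Hi - Lo) < size();
  }
};

// [Sentinel + 1, Sentinel) with Sentinel == SignedMin(i8) == -128.
WrappedRange8 findLastIVValidRangeI8() {
  const int8_t Sentinel = INT8_MIN;
  return {static_cast<int8_t>(Sentinel + 1), Sentinel};
}
```

An induction variable whose signed range is, say, [-127, 127] passes this check, while any range that reaches -128 is rejected.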

artagnon added a commit to artagnon/llvm-project that referenced this pull request Oct 3, 2023
Extend the idea in llvm#67812 to support vectorization of decreasing IVs in
select-cmp patterns. llvm#67812 enabled vectorization of the following
example:

  long src[20000] = {4, 5, 2};
  long r = 331;
  for (long i = 0; i < 20000; i++) {
    if (src[i] > 3)
      r = i;
  }
  return r;

This patch extends the above idea to also vectorize:

  long src[20000] = {4, 5, 2};
  long r = 331;
  for (long i = 20000 - 1; i >= 0; i--) {
    if (src[i] > 3)
      r = i;
  }
  return r;
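A minimal C++ sketch of how the decreasing case could be modeled, following the direction raised in review on this PR (a signed-max sentinel and an smin reduction in place of smax). This is an assumption about such an extension, not the code from the referenced commit; VF = 4 and the function names are hypothetical, and the final partial chunk is masked rather than handled by a scalar epilogue.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Scalar reference: the decreasing loop from the commit message.
long findLastDecreasingScalar(const std::vector<long> &src, long init) {
  long r = init;
  for (long i = static_cast<long>(src.size()) - 1; i >= 0; i--)
    if (src[i] > 3)
      r = i;
  return r;
}

// Hypothetical vectorized model: sentinel = signed max, reduce with smin.
long findLastDecreasingVectorized(const std::vector<long> &src, long init,
                                  int VF = 4) {
  assert(VF > 0);
  const long Sentinel = LONG_MAX;          // never a valid index here
  std::vector<long> lanes(VF, Sentinel);
  const long n = static_cast<long>(src.size());
  for (long base = n - 1; base >= 0; base -= VF)
    for (long l = 0; l < VF && base - l >= 0; ++l) { // masked tail
      long i = base - l;
      if (src[i] > 3)
        lanes[l] = i;                      // per-lane select
    }
  // The IV only decreases, so the last selected index is the smallest one.
  long r = *std::min_element(lanes.begin(), lanes.end());
  return r != Sentinel ? r : init;
}
```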
@github-actions

github-actions bot commented Oct 4, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@ShivaChen
Collaborator

Although IFindLastIV can be changed to FFindLastIV while matching the instructions, could we explicitly reject IFindLastIV when the condition is an FCmpInst, and call AddReductionVar separately for FFindLastIV?

@Mel-Chen
Contributor Author

This is possible, but it would mean running the pattern identification process one more time.

@Mel-Chen
Contributor Author

@ayalz @fhahn @david-arm Ping.

Contributor

@david-arm david-arm left a comment

Thanks for this patch - it looks like a useful addition. I'm still looking through all the tests, but I thought I'd leave some comments I have so far!

@@ -900,6 +993,11 @@ bool RecurrenceDescriptor::isReductionPHI(PHINode *Phi, Loop *TheLoop,
<< *Phi << "\n");
return true;
}
if (AddReductionVar(Phi, RecurKind::IFindLastIV, TheLoop, FMF, RedDes, DB, AC,
Contributor

Do we not also need one for FFindLastIV?

Contributor Author

Good question. We don't need to call AddReductionVar again for FFindLastIV.
This is because, at the end of isFindLastIVPattern, IFindLastIV is turned into FFindLastIV if the select's condition is an fcmp instruction.

return InstDesc(false, I);

auto IsIncreasingLoopInduction = [&](Value *V) {
if (!SE)
Contributor

Maybe it's better to bail out at the start of isFindLastIVPattern if SE is null?

Contributor Author

952b5df
Nice suggestion. I now pass SE by reference, so it can no longer be null. Thanks.

return false;

const SCEV *Step = AR->getStepRecurrence(*SE);
if (!SE->isKnownPositive(Step))
Contributor

I assume there isn't a fundamental reason why we can't support known negative steps too? Is it worth adding a TODO for future work? It sounds like we'd just need a sentinel that was the max positive value and use smin instead of smax for the final reduction?

Contributor Author

ec9217d
Added the TODO for decreasing induction variables.

// Keep the minimum value of the recurrence type as the sentinel value.
// The maximum acceptable range for the increasing induction variable,
// called the valid range, will be defined as
// [<sentinel value> + 1, SignedMin(<recurrence type>))
Contributor

I find this comment a little confusing as it suggests <sentinel value> and SignedMin(<recurrence type>) are different, yet the code below suggests they are the same, i.e.

Sentinel = APInt::getSignedMinValue(NumBits);

Isn't this just

// [<sentinel value> + 1, <sentinel value>)

Contributor Author

<sentinel value> and SignedMin(<recurrence type>) are the same.
The valid range is the set of values the increasing induction variable is allowed to take.
Since the induction variable must never equal <sentinel value>, its acceptable range is limited to [<sentinel value> + 1, <sentinel value>).
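Why the sentinel itself has to be excluded can be shown with a small hypothetical 8-bit model (the names are illustrative, not from the patch). If the induction variable can take the value SignedMin, a match at that value is indistinguishable from "no match" after the smax reduction, so the sentinel-based form returns the start value instead of the selected index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference: an i8 IV starting at Start with step 1;
// Cond[K] says whether the select picks the IV in iteration K.
int8_t findLastScalarI8(const std::vector<bool> &Cond, int8_t Start,
                        int8_t Init) {
  assert(Start + static_cast<int>(Cond.size()) - 1 <= INT8_MAX &&
         "IV would wrap");
  int8_t R = Init;
  for (std::size_t K = 0; K < Cond.size(); ++K)
    if (Cond[K])
      R = static_cast<int8_t>(Start + static_cast<int>(K));
  return R;
}

// Sentinel-based form: start from SignedMin(i8), keep the signed max, and
// map a sentinel result back to Init.
int8_t findLastSentinelI8(const std::vector<bool> &Cond, int8_t Start,
                          int8_t Init) {
  const int8_t Sentinel = INT8_MIN;
  int8_t Acc = Sentinel;
  for (std::size_t K = 0; K < Cond.size(); ++K)
    if (Cond[K])
      Acc = std::max(Acc, static_cast<int8_t>(Start + static_cast<int>(K)));
  return Acc != Sentinel ? Acc : Init;
}
```

With Start = -127 the two forms agree; with Start = -128 and a match only in the first iteration, the scalar loop yields -128 while the sentinel form yields Init.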

Contributor

OK thanks for explaining. Can you update the comment to show that please? i.e.

    // called the valid range, will be defined as
    //   [<sentinel value> + 1, <sentinel value>)
    // where <sentinel value> == SignedMin(<recurrence type>)

// TODO: This range restriction can be lifted by adding an additional
// virtual OR reduction.
const APInt Sentinel = APInt::getSignedMinValue(NumBits);
const ConstantRange ValidRange =
Contributor

Can you just avoid calling difference and specify the range explicitly? I'll be honest, I struggle to understand the ConstantRange stuff, so please forgive my lack of knowledge! I was thinking it might be something like:

ConstantRange::getNonEmpty(Sentinel + 1, Sentinel);

Contributor Author

Your answer is correct. :)
Actually, that was my initial version; the current version was changed in response to earlier review comments. 2466df5

Mel-Chen and others added 13 commits April 24, 2024 01:31
This patch applies range analysis. It excludes cases where the
range of the induction variable cannot be fully contained within

  [<sentinel value> + 1, <minimum value of recurrence type>)

This approach also handles truncated induction variable cases well.
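For the truncated case, a deliberately conservative, hypothetical model may help: a 64-bit IV {Start, +, Step} runs for TripCount iterations, and only its i32 truncation feeds the select. The patch itself obtains the IV's signed range from ScalarEvolution (the getSignedRange call in the diff); this sketch does not reproduce that analysis and the function name is illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, conservative check: if every 64-bit IV value lies in
// [INT32_MIN + 1, INT32_MAX], the i32 truncation equals the original value,
// so the truncated sequence is still increasing and never hits the i32
// sentinel INT32_MIN. (This is sufficient, not necessary.)
bool truncatedIVSafeForFindLast(int64_t Start, int64_t Step,
                                int64_t TripCount) {
  assert(Step > 0 && TripCount > 0);
  // Assumes Start + Step * (TripCount - 1) does not overflow i64.
  const int64_t Last = Start + Step * (TripCount - 1);
  // Step > 0, so all values lie in [Start, Last].
  return Start > INT32_MIN && Last <= INT32_MAX;
}
```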
Comment from Alexey

Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
comment from Alexey

Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
Co-authored-by: Alexey Bataev <5361294+alexey-bataev@users.noreply.github.com>
Contributor

@david-arm david-arm left a comment

I've almost finished reviewing! I still have some tests to work through, but I thought I'd leave the comments I have so far. It looks like you've done a thorough job of testing all the edge cases, though.

RecurrenceDescriptor::isFindLastIVPattern(PHINode *OrigPhi, Instruction *I,
ScalarEvolution &SE) {
// Only match select with single use cmp condition.
// TODO: Only handle single use for now.
Contributor

This TODO line says the same thing as above. Maybe just have a single TODO line, i.e. something like:

  // TODO: Match selects with multi-use cmp conditions.

@@ -900,6 +991,11 @@ bool RecurrenceDescriptor::isReductionPHI(PHINode *Phi, Loop *TheLoop,
<< *Phi << "\n");
return true;
}
if (AddReductionVar(Phi, RecurKind::IFindLastIV, TheLoop, FMF, RedDes, DB, AC,
DT, SE)) {
LLVM_DEBUG(dbgs() << "Found a FindLastIV reduction PHI." << *Phi << "\n");
Contributor

It might be a good idea to indicate whether we've matched IFindLastIV or FFindLastIV because FindLastIV doesn't tell you which one.

Contributor

Maybe just delete this file, since it looks practically the same as llvm/test/Transforms/LoopVectorize/iv-select-cmp-no-wrap.ll and doesn't seem to offer any extra value?

Contributor

@david-arm david-arm left a comment

Hi, just one last comment. I've finished reviewing now!

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK-VF4IC1 --check-prefix=CHECK
; RUN: opt -passes=loop-vectorize -force-vector-interleave=4 -force-vector-width=4 -S < %s | FileCheck %s --check-prefix=CHECK-VF4IC4 --check-prefix=CHECK
; RUN: opt -passes=loop-vectorize -force-vector-interleave=4 -force-vector-width=1 -S < %s | FileCheck %s --check-prefix=CHECK-VF1IC4 --check-prefix=CHECK

define i64 @select_icmp_const_1(ptr %a, i64 %n) {
Contributor

Both @select_icmp_const_1 and @select_icmp_const_2 look similar to test select_icmp_nuw_nsw in Transforms/LoopVectorize/iv-select-cmp-no-wrap.ll.

Also, I see the only difference between @select_icmp_const_1 and @select_icmp_const_2 is that the operands to the select are swapped. I'm not sure having both versions really adds much value. Perhaps you can remove both of them and keep the one in iv-select-cmp-no-wrap.ll.

7 participants