
@preames (Collaborator) commented Mar 15, 2024

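This is just an API cleanup at the moment. The newly added routines, getBuildVectorCost and getExplodeVectorCost, just proxy to the existing getScalarizationOverhead. I think the diff speaks for itself in terms of code clarity: call sites no longer pass the /*Insert*/ and /*Extract*/ boolean flags.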

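To make the proxying concrete, here is a minimal C++ sketch of the two wrappers over a toy cost model. It is not LLVM code: `ToyTTI` and `allOnes` are hypothetical names, lanes are tracked in a `uint64_t` mask instead of an `llvm::APInt`, costs are plain `unsigned` instead of `InstructionCost`, and the per-lane loop only approximates the "series of insertelements/extractelements" that the new doc comments describe for the default implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a mask with the low NumElts bits set, playing the role
// of APInt::getAllOnes(NumElts) for vectors of at most 64 lanes.
inline uint64_t allOnes(unsigned NumElts) {
  assert(NumElts <= 64 && "toy mask only supports up to 64 lanes");
  return NumElts == 64 ? ~uint64_t(0) : (uint64_t(1) << NumElts) - 1;
}

// Toy cost model (not LLVM's TargetTransformInfo). Per-lane costs are
// assumptions chosen for illustration.
struct ToyTTI {
  unsigned InsertCost = 1;  // assumed cost of one insertelement
  unsigned ExtractCost = 1; // assumed cost of one extractelement

  // Same shape as getScalarizationOverhead(Ty, DemandedElts, Insert, Extract,
  // CostKind): charge a per-lane cost for every demanded lane.
  unsigned getScalarizationOverhead(unsigned NumElts, uint64_t DemandedElts,
                                    bool Insert, bool Extract) const {
    assert(NumElts <= 64 && "toy mask only supports up to 64 lanes");
    unsigned Cost = 0;
    for (unsigned I = 0; I < NumElts; ++I) {
      if (!((DemandedElts >> I) & 1))
        continue; // lane not demanded: no cost
      if (Insert)
        Cost += InsertCost;
      if (Extract)
        Cost += ExtractCost;
    }
    return Cost;
  }

  // Build vector: the insert-only scalarization overhead.
  unsigned getBuildVectorCost(unsigned NumElts, uint64_t DemandedElts) const {
    return getScalarizationOverhead(NumElts, DemandedElts, /*Insert=*/true,
                                    /*Extract=*/false);
  }

  // Explode vector: the extract-only scalarization overhead.
  unsigned getExplodeVectorCost(unsigned NumElts, uint64_t DemandedElts) const {
    return getScalarizationOverhead(NumElts, DemandedElts, /*Insert=*/false,
                                    /*Extract=*/true);
  }
};
```

With unit lane costs, a 4-lane build vector over all lanes costs 4, while a demand mask that excludes two of the four lanes (the role `~ShuffledElements` plays in `getGatherCost`) costs 2. Bits above `NumElts` are ignored because the loop stops at the vector width.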
@llvmbot added the vectorizers, llvm:analysis, and llvm:transforms labels Mar 15, 2024
@llvmbot (Member) commented Mar 15, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-llvm-analysis

Author: Philip Reames (preames)

Changes

This is just an API cleanup at the moment. The newly added routines just proxy to the existing getScalarizationOverhead. I think the diff speaks for itself in terms of code clarity.


Full diff: https://github.com/llvm/llvm-project/pull/85421.diff

3 Files Affected:

  • (modified) llvm/include/llvm/Analysis/TargetTransformInfo.h (+20)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+10-15)
  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+4-8)
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index c43a1b5c1b2aa0..46ea102d0084b5 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -871,6 +871,26 @@ class TargetTransformInfo {
                                            bool Insert, bool Extract,
                                            TTI::TargetCostKind CostKind) const;
 
+  /// Estimate the cost of a build_vector of unknown elements at the indices
+  /// implied by the active lanes in DemandedElts.  The default implementation
+  /// will simply cost a series of insertelements, but some targets can do
+  /// significantly better.
+  InstructionCost getBuildVectorCost(VectorType *Ty,
+                                     const APInt &DemandedElts,
+                                     TTI::TargetCostKind CostKind) const {
+    return getScalarizationOverhead(Ty, DemandedElts, true, false, CostKind);
+  }
+
+  /// Estimate the cost of exploding a vector of unknown elements at the
+  /// indices implied by the active lanes in DemandedElts into individual
+  /// scalar registers.  The default implementation will simply cost a
+  /// series of extractelements, but some targets can do significantly better.
+  InstructionCost getExplodeVectorCost(VectorType *Ty,
+                                       const APInt &DemandedElts,
+                                       TTI::TargetCostKind CostKind) const {
+    return getScalarizationOverhead(Ty, DemandedElts, false, true, CostKind);
+  }
+
   /// Estimate the overhead of scalarizing an instructions unique
   /// non-constant operands. The (potentially vector) types to use for each of
   /// argument are passes via Tys.
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 52b992b19e4b04..d999606836630c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5846,10 +5846,9 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
     // and phi nodes.
     TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
     if (isScalarWithPredication(I, VF) && !I->getType()->isVoidTy()) {
-      ScalarCost += TTI.getScalarizationOverhead(
+      ScalarCost += TTI.getBuildVectorCost(
           cast<VectorType>(ToVectorTy(I->getType(), VF)),
-          APInt::getAllOnes(VF.getFixedValue()), /*Insert*/ true,
-          /*Extract*/ false, CostKind);
+          APInt::getAllOnes(VF.getFixedValue()), CostKind);
       ScalarCost +=
           VF.getFixedValue() * TTI.getCFInstrCost(Instruction::PHI, CostKind);
     }
@@ -5865,10 +5864,9 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
         if (canBeScalarized(J))
           Worklist.push_back(J);
         else if (needsExtract(J, VF)) {
-          ScalarCost += TTI.getScalarizationOverhead(
+          ScalarCost += TTI.getExplodeVectorCost(
               cast<VectorType>(ToVectorTy(J->getType(), VF)),
-              APInt::getAllOnes(VF.getFixedValue()), /*Insert*/ false,
-              /*Extract*/ true, CostKind);
+              APInt::getAllOnes(VF.getFixedValue()), CostKind);
         }
       }
 
@@ -6011,9 +6009,8 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
     // Add the cost of an i1 extract and a branch
     auto *Vec_i1Ty =
         VectorType::get(IntegerType::getInt1Ty(ValTy->getContext()), VF);
-    Cost += TTI.getScalarizationOverhead(
-        Vec_i1Ty, APInt::getAllOnes(VF.getKnownMinValue()),
-        /*Insert=*/false, /*Extract=*/true, CostKind);
+    Cost += TTI.getExplodeVectorCost(
+        Vec_i1Ty, APInt::getAllOnes(VF.getKnownMinValue()), CostKind);
     Cost += TTI.getCFInstrCost(Instruction::Br, CostKind);
 
     if (useEmulatedMaskMemRefHack(I, VF))
@@ -6386,10 +6383,9 @@ InstructionCost LoopVectorizationCostModel::getScalarizationOverhead(
   Type *RetTy = ToVectorTy(I->getType(), VF);
   if (!RetTy->isVoidTy() &&
       (!isa<LoadInst>(I) || !TTI.supportsEfficientVectorElementLoadStore()))
-    Cost += TTI.getScalarizationOverhead(
+    Cost += TTI.getBuildVectorCost(
         cast<VectorType>(RetTy), APInt::getAllOnes(VF.getKnownMinValue()),
-        /*Insert*/ true,
-        /*Extract*/ false, CostKind);
+        CostKind);
 
   // Some targets keep addresses scalar.
   if (isa<LoadInst>(I) && !TTI.prefersVectorizedAddressing())
@@ -6827,9 +6823,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
       auto *Vec_i1Ty =
           VectorType::get(IntegerType::getInt1Ty(RetTy->getContext()), VF);
       return (
-          TTI.getScalarizationOverhead(
-              Vec_i1Ty, APInt::getAllOnes(VF.getFixedValue()),
-              /*Insert*/ false, /*Extract*/ true, CostKind) +
+          TTI.getExplodeVectorCost(
+              Vec_i1Ty, APInt::getAllOnes(VF.getFixedValue()), CostKind) +
           (TTI.getCFInstrCost(Instruction::Br, CostKind) * VF.getFixedValue()));
     } else if (I->getParent() == TheLoop->getLoopLatch() || VF.isScalar())
       // The back-edge branch will remain, as will all scalar branches.
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index b4cce680e2876f..61013c2017f47a 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -8715,9 +8715,7 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
     assert(Offset < NumElts && "Failed to find vector index offset");
 
     InstructionCost Cost = 0;
-    Cost -= TTI->getScalarizationOverhead(SrcVecTy, DemandedElts,
-                                          /*Insert*/ true, /*Extract*/ false,
-                                          CostKind);
+    Cost -= TTI->getBuildVectorCost(SrcVecTy, DemandedElts, CostKind);
 
     // First cost - resize to actual vector size if not identity shuffle or
     // need to shift the vector.
@@ -9816,9 +9814,9 @@ InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
         MutableArrayRef(Vector.data(), Vector.size()), Base,
         [](const TreeEntry *E) { return E->getVectorFactor(); }, ResizeToVF,
         EstimateShufflesCost);
-    InstructionCost InsertCost = TTI->getScalarizationOverhead(
+    InstructionCost InsertCost = TTI->getBuildVectorCost(
         cast<FixedVectorType>(FirstUsers[I].first->getType()), DemandedElts[I],
-        /*Insert*/ true, /*Extract*/ false, TTI::TCK_RecipThroughput);
+        TTI::TCK_RecipThroughput);
     Cost -= InsertCost;
   }
 
@@ -10531,9 +10529,7 @@ InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL,
     EstimateInsertCost(I, V);
   }
   if (ForPoisonSrc)
-    Cost =
-        TTI->getScalarizationOverhead(VecTy, ~ShuffledElements, /*Insert*/ true,
-                                      /*Extract*/ false, CostKind);
+    Cost = TTI->getBuildVectorCost(VecTy, ~ShuffledElements, CostKind);
   if (DuplicateNonConst)
     Cost +=
         TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, VecTy);


⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 33960c90258ed78b9b877b1a43e219d1cbc2efce 90ef5a77188af9b7d2ff922066a2868b78bfd937 -- llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
The diff from clang-format:
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 46ea102d00..6e5fa5c996 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -875,8 +875,7 @@ public:
   /// implied by the active lanes in DemandedElts.  The default implementation
   /// will simply cost a series of insertelements, but some targets can do
   /// significantly better.
-  InstructionCost getBuildVectorCost(VectorType *Ty,
-                                     const APInt &DemandedElts,
+  InstructionCost getBuildVectorCost(VectorType *Ty, const APInt &DemandedElts,
                                      TTI::TargetCostKind CostKind) const {
     return getScalarizationOverhead(Ty, DemandedElts, true, false, CostKind);
   }
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index d999606836..0d36690ab8 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -6383,9 +6383,9 @@ InstructionCost LoopVectorizationCostModel::getScalarizationOverhead(
   Type *RetTy = ToVectorTy(I->getType(), VF);
   if (!RetTy->isVoidTy() &&
       (!isa<LoadInst>(I) || !TTI.supportsEfficientVectorElementLoadStore()))
-    Cost += TTI.getBuildVectorCost(
-        cast<VectorType>(RetTy), APInt::getAllOnes(VF.getKnownMinValue()),
-        CostKind);
+    Cost += TTI.getBuildVectorCost(cast<VectorType>(RetTy),
+                                   APInt::getAllOnes(VF.getKnownMinValue()),
+                                   CostKind);
 
   // Some targets keep addresses scalar.
   if (isa<LoadInst>(I) && !TTI.prefersVectorizedAddressing())

preames added a commit to preames/llvm-project that referenced this pull request Mar 15, 2024
Introduce utilities for costing build vector and explode vector
operations inside the TTI target implementation logic.  As can be seen,
these are by far the most common operations actually performed.

In case the goal isn't clear here, I plan to eliminate
getScalarizationOverhead from the TTI interface layer.  All of our
targets cost a combined insert and extract as equivalent to an
explode vector followed by a build vector, so the combined interface
can be killed off.
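The claim that every target costs a combined insert and extract as an explode followed by a build can be read as an identity: combined = build (if Insert) + explode (if Extract). The hypothetical sketch below (not LLVM code; `ToyTargetCosts` and its per-lane callbacks are illustrative assumptions) defines the combined query purely in terms of the two narrower ones, which is the property that would make the combined entry point redundant.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical per-lane cost callback (lane index -> cost); not an LLVM API.
using LaneCostFn = std::function<unsigned(unsigned)>;

struct ToyTargetCosts {
  LaneCostFn InsertLane;  // assumed cost of inserting a scalar into lane I
  LaneCostFn ExtractLane; // assumed cost of extracting lane I into a scalar
  unsigned NumElts;       // vector width in lanes (at most 64 here)

  // Sum a per-lane cost over the lanes set in Demanded.
  unsigned sumDemanded(const LaneCostFn &LaneCost, uint64_t Demanded) const {
    assert(NumElts <= 64 && "toy mask only supports up to 64 lanes");
    unsigned Cost = 0;
    for (unsigned I = 0; I < NumElts; ++I)
      if ((Demanded >> I) & 1)
        Cost += LaneCost(I);
    return Cost;
  }

  unsigned getBuildVectorCost(uint64_t Demanded) const {
    return sumDemanded(InsertLane, Demanded);
  }

  unsigned getExplodeVectorCost(uint64_t Demanded) const {
    return sumDemanded(ExtractLane, Demanded);
  }

  // The combined query, expressed only via the two narrower ones. If every
  // target's answer has this form, callers can use the narrower queries
  // directly and the combined interface adds nothing.
  unsigned getScalarizationOverhead(uint64_t Demanded, bool Insert,
                                    bool Extract) const {
    return (Insert ? getBuildVectorCost(Demanded) : 0) +
           (Extract ? getExplodeVectorCost(Demanded) : 0);
  }
};
```

With a made-up target where lane 0 is free to access, a full 4-lane mask gives a build cost of 6, an explode cost of 3, and a combined cost of 9, exactly their sum.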

This is the inverse of llvm#85421. Once both patches land, only the actual meat of the change remains.

One subtlety here: we have to be very careful to make sure we're
calling the directly analogous cover function.  We've got a base
class and subclass involved here, and it sometimes matters whether
we call a method on the subclass or on the base class.  This is
harder to follow since we have multiple getScalarizationOverhead
variants with different signatures, most of which only exist on the
base class, but some (not all) of which proxy back to the subclass.
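The base-class/subclass hazard described above is typical of CRTP-style dispatch. Assuming such a layout (all names below are hypothetical, and this is a simplified illustration rather than LLVM's actual class hierarchy), the sketch shows how routing a call through the subclass versus calling the base version directly can yield different costs for the same query.

```cpp
#include <cassert>

// Simplified CRTP base: T is the concrete target that may override hooks.
template <typename T> class CostModelBase {
  // Downcast to the concrete target so overrides are picked up statically.
  T *thisT() { return static_cast<T *>(this); }

public:
  // Base default: every lane costs 1.
  unsigned getLaneCost(unsigned /*Lane*/) { return 1; }

  // Routes each lane query through the subclass: sees target overrides.
  unsigned buildVectorViaSubclass(unsigned NumElts) {
    unsigned Cost = 0;
    for (unsigned I = 0; I < NumElts; ++I)
      Cost += thisT()->getLaneCost(I);
    return Cost;
  }

  // Calls the base version directly: silently ignores target overrides.
  unsigned buildVectorViaBase(unsigned NumElts) {
    unsigned Cost = 0;
    for (unsigned I = 0; I < NumElts; ++I)
      Cost += getLaneCost(I);
    return Cost;
  }
};

// Hypothetical target whose lane 0 is free (made-up cost).
class ToyTarget : public CostModelBase<ToyTarget> {
public:
  unsigned getLaneCost(unsigned Lane) { return Lane == 0 ? 0 : 1; }
};
```

Here `buildVectorViaSubclass(4)` returns 3 because it sees the target's free lane 0, while `buildVectorViaBase(4)` returns 4 because it never consults the override. A cover function that takes the wrong path changes the result without any compile error, which is why the analogous function has to be chosen carefully.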
@preames closed this Jul 24, 2024
@preames deleted the pr-tti-api-for-build-and-explode-vector branch July 24, 2024 20:15