[TTI] Add cover functions for costing build and explode vectors [nfc] #85421
This is just an API cleanup at the moment. The newly added routines just proxy to the existing getScalarizationOverhead. I think the diff speaks for itself in terms of code clarity.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-analysis

Author: Philip Reames (preames)

Full diff: https://github.com/llvm/llvm-project/pull/85421.diff

3 Files Affected:
diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index c43a1b5c1b2aa0..46ea102d0084b5 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -871,6 +871,26 @@ class TargetTransformInfo {
bool Insert, bool Extract,
TTI::TargetCostKind CostKind) const;
+ /// Estimate the cost of a build_vector of unknown elements at the indices
+ /// implied by the active lanes in DemandedElts. The default implementation
+ /// will simply cost a series of insertelements, but some targets can do
+ /// significantly better.
+ InstructionCost getBuildVectorCost(VectorType *Ty,
+ const APInt &DemandedElts,
+ TTI::TargetCostKind CostKind) const {
+ return getScalarizationOverhead(Ty, DemandedElts, true, false, CostKind);
+ }
+
+ /// Estimate the cost of exploding a vector of unknown elements at the
+ /// indices implied by the active lanes in DemandedElts into individual
+ /// scalar registers. The default implementation will simply cost a
+ /// series of extractelements, but some targets can do significantly better.
+ InstructionCost getExplodeVectorCost(VectorType *Ty,
+ const APInt &DemandedElts,
+ TTI::TargetCostKind CostKind) const {
+ return getScalarizationOverhead(Ty, DemandedElts, false, true, CostKind);
+ }
+
/// Estimate the overhead of scalarizing an instructions unique
/// non-constant operands. The (potentially vector) types to use for each of
/// argument are passes via Tys.
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index 52b992b19e4b04..d999606836630c 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5846,10 +5846,9 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
// and phi nodes.
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
if (isScalarWithPredication(I, VF) && !I->getType()->isVoidTy()) {
- ScalarCost += TTI.getScalarizationOverhead(
+ ScalarCost += TTI.getBuildVectorCost(
cast<VectorType>(ToVectorTy(I->getType(), VF)),
- APInt::getAllOnes(VF.getFixedValue()), /*Insert*/ true,
- /*Extract*/ false, CostKind);
+ APInt::getAllOnes(VF.getFixedValue()), CostKind);
ScalarCost +=
VF.getFixedValue() * TTI.getCFInstrCost(Instruction::PHI, CostKind);
}
@@ -5865,10 +5864,9 @@ InstructionCost LoopVectorizationCostModel::computePredInstDiscount(
if (canBeScalarized(J))
Worklist.push_back(J);
else if (needsExtract(J, VF)) {
- ScalarCost += TTI.getScalarizationOverhead(
+ ScalarCost += TTI.getExplodeVectorCost(
cast<VectorType>(ToVectorTy(J->getType(), VF)),
- APInt::getAllOnes(VF.getFixedValue()), /*Insert*/ false,
- /*Extract*/ true, CostKind);
+ APInt::getAllOnes(VF.getFixedValue()), CostKind);
}
}
@@ -6011,9 +6009,8 @@ LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
// Add the cost of an i1 extract and a branch
auto *Vec_i1Ty =
VectorType::get(IntegerType::getInt1Ty(ValTy->getContext()), VF);
- Cost += TTI.getScalarizationOverhead(
- Vec_i1Ty, APInt::getAllOnes(VF.getKnownMinValue()),
- /*Insert=*/false, /*Extract=*/true, CostKind);
+ Cost += TTI.getExplodeVectorCost(
+ Vec_i1Ty, APInt::getAllOnes(VF.getKnownMinValue()), CostKind);
Cost += TTI.getCFInstrCost(Instruction::Br, CostKind);
if (useEmulatedMaskMemRefHack(I, VF))
@@ -6386,10 +6383,9 @@ InstructionCost LoopVectorizationCostModel::getScalarizationOverhead(
Type *RetTy = ToVectorTy(I->getType(), VF);
if (!RetTy->isVoidTy() &&
(!isa<LoadInst>(I) || !TTI.supportsEfficientVectorElementLoadStore()))
- Cost += TTI.getScalarizationOverhead(
+ Cost += TTI.getBuildVectorCost(
cast<VectorType>(RetTy), APInt::getAllOnes(VF.getKnownMinValue()),
- /*Insert*/ true,
- /*Extract*/ false, CostKind);
+ CostKind);
// Some targets keep addresses scalar.
if (isa<LoadInst>(I) && !TTI.prefersVectorizedAddressing())
@@ -6827,9 +6823,8 @@ LoopVectorizationCostModel::getInstructionCost(Instruction *I, ElementCount VF,
auto *Vec_i1Ty =
VectorType::get(IntegerType::getInt1Ty(RetTy->getContext()), VF);
return (
- TTI.getScalarizationOverhead(
- Vec_i1Ty, APInt::getAllOnes(VF.getFixedValue()),
- /*Insert*/ false, /*Extract*/ true, CostKind) +
+ TTI.getExplodeVectorCost(
+ Vec_i1Ty, APInt::getAllOnes(VF.getFixedValue()), CostKind) +
(TTI.getCFInstrCost(Instruction::Br, CostKind) * VF.getFixedValue()));
} else if (I->getParent() == TheLoop->getLoopLatch() || VF.isScalar())
// The back-edge branch will remain, as will all scalar branches.
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index b4cce680e2876f..61013c2017f47a 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -8715,9 +8715,7 @@ BoUpSLP::getEntryCost(const TreeEntry *E, ArrayRef<Value *> VectorizedVals,
assert(Offset < NumElts && "Failed to find vector index offset");
InstructionCost Cost = 0;
- Cost -= TTI->getScalarizationOverhead(SrcVecTy, DemandedElts,
- /*Insert*/ true, /*Extract*/ false,
- CostKind);
+ Cost -= TTI->getBuildVectorCost(SrcVecTy, DemandedElts, CostKind);
// First cost - resize to actual vector size if not identity shuffle or
// need to shift the vector.
@@ -9816,9 +9814,9 @@ InstructionCost BoUpSLP::getTreeCost(ArrayRef<Value *> VectorizedVals) {
MutableArrayRef(Vector.data(), Vector.size()), Base,
[](const TreeEntry *E) { return E->getVectorFactor(); }, ResizeToVF,
EstimateShufflesCost);
- InstructionCost InsertCost = TTI->getScalarizationOverhead(
+ InstructionCost InsertCost = TTI->getBuildVectorCost(
cast<FixedVectorType>(FirstUsers[I].first->getType()), DemandedElts[I],
- /*Insert*/ true, /*Extract*/ false, TTI::TCK_RecipThroughput);
+ TTI::TCK_RecipThroughput);
Cost -= InsertCost;
}
@@ -10531,9 +10529,7 @@ InstructionCost BoUpSLP::getGatherCost(ArrayRef<Value *> VL,
EstimateInsertCost(I, V);
}
if (ForPoisonSrc)
- Cost =
- TTI->getScalarizationOverhead(VecTy, ~ShuffledElements, /*Insert*/ true,
- /*Extract*/ false, CostKind);
+ Cost = TTI->getBuildVectorCost(VecTy, ~ShuffledElements, CostKind);
if (DuplicateNonConst)
Cost +=
TTI->getShuffleCost(TargetTransformInfo::SK_PermuteSingleSrc, VecTy);
You can test this locally with the following command:

git-clang-format --diff 33960c90258ed78b9b877b1a43e219d1cbc2efce 90ef5a77188af9b7d2ff922066a2868b78bfd937 -- llvm/include/llvm/Analysis/TargetTransformInfo.h llvm/lib/Transforms/Vectorize/LoopVectorize.cpp llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

View the diff from clang-format here:

diff --git a/llvm/include/llvm/Analysis/TargetTransformInfo.h b/llvm/include/llvm/Analysis/TargetTransformInfo.h
index 46ea102d00..6e5fa5c996 100644
--- a/llvm/include/llvm/Analysis/TargetTransformInfo.h
+++ b/llvm/include/llvm/Analysis/TargetTransformInfo.h
@@ -875,8 +875,7 @@ public:
/// implied by the active lanes in DemandedElts. The default implementation
/// will simply cost a series of insertelements, but some targets can do
/// significantly better.
- InstructionCost getBuildVectorCost(VectorType *Ty,
- const APInt &DemandedElts,
+ InstructionCost getBuildVectorCost(VectorType *Ty, const APInt &DemandedElts,
TTI::TargetCostKind CostKind) const {
return getScalarizationOverhead(Ty, DemandedElts, true, false, CostKind);
}
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index d999606836..0d36690ab8 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -6383,9 +6383,9 @@ InstructionCost LoopVectorizationCostModel::getScalarizationOverhead(
Type *RetTy = ToVectorTy(I->getType(), VF);
if (!RetTy->isVoidTy() &&
(!isa<LoadInst>(I) || !TTI.supportsEfficientVectorElementLoadStore()))
- Cost += TTI.getBuildVectorCost(
- cast<VectorType>(RetTy), APInt::getAllOnes(VF.getKnownMinValue()),
- CostKind);
+ Cost += TTI.getBuildVectorCost(cast<VectorType>(RetTy),
+ APInt::getAllOnes(VF.getKnownMinValue()),
+ CostKind);
// Some targets keep addresses scalar.
if (isa<LoadInst>(I) && !TTI.prefersVectorizedAddressing())
Introduce utilities for costing build vector and explode vector operations inside the TTI target implementation logic. As can be seen, these are by far the most common operations actually performed. In case the goal isn't clear here: I plan to eliminate getScalarizationOverhead from the TTI interface layer. All of our targets cost a combined insert and extract as equivalent to an explode vector followed by a build vector, so the combined interface can be killed off. This is the inverse of llvm#85421. Once both patches land, only the actual meat of the change remains. One subtlety here: we have to be very careful to call the directly analogous cover function. We have a base class and a subclass involved, and it matters at times whether we call a method on the subclass or the base class. This is harder to follow because we have multiple getScalarizationOverhead variants with different signatures - most of which exist only on the base class, but some (not all) of which proxy back to the subclass.