[CostModel][X86] Don't count 2 shuffles on the last level of a pairwise arithmetic or min/max reduction

This is split from D55452 with the correct patch this time.

Pairwise reductions require two shuffles on every level but the last. On the last level the two shuffles are <1, u, u, u...> and <0, u, u, u...>, but <0, u, u, u...> will be dropped by InstCombine/DAGCombine as being an identity shuffle.
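To make the identity-shuffle argument concrete, the following sketch builds the two masks used at each level of a pairwise reduction. It is an illustration only: `pairwiseMask`, `isIdentityMask`, and the use of -1 for an undefined ("u") lane are assumptions made for this example and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// -1 stands in for an undefined ("u") mask element in this sketch.
constexpr int Undef = -1;

// Hypothetical helper, not LLVM code: builds the shuffle mask for one level
// of a pairwise reduction over NumElts lanes (NumElts a power of two).
// Level 0 is the first level. IsLeft selects the even-lane mask
// (<0, 2, ...>); otherwise the odd-lane mask (<1, 3, ...>) is built.
std::vector<int> pairwiseMask(unsigned NumElts, unsigned Level, bool IsLeft) {
  std::vector<int> Mask(NumElts, Undef);
  // Number of lanes that still carry data after this level.
  unsigned Live = NumElts >> (Level + 1);
  for (unsigned I = 0; I < Live; ++I)
    Mask[I] = static_cast<int>(2 * I + (IsLeft ? 0 : 1));
  return Mask;
}

// A mask is an identity shuffle if every defined element selects its own lane.
bool isIdentityMask(const std::vector<int> &Mask) {
  for (std::size_t I = 0; I < Mask.size(); ++I)
    if (Mask[I] != Undef && Mask[I] != static_cast<int>(I))
      return false;
  return true;
}
```

For a 4-lane vector, level 0 yields <0, 2, u, u> and <1, 3, u, u>, and level 1 yields <0, u, u, u> and <1, u, u, u>. Only the <0, u, u, u> mask is an identity, which is the shuffle the cost model no longer counts.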

Differential Revision: https://reviews.llvm.org/D55615

llvm-svn: 349072
topperc committed Dec 13, 2018
1 parent 5f1706f commit c6bfb05
Showing 2 changed files with 55 additions and 55 deletions.
28 changes: 24 additions & 4 deletions llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1435,14 +1435,24 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
       Ty = SubTy;
       ++LongVectorCount;
     }
+
+    NumReduxLevels -= LongVectorCount;
+
     // The minimal length of the vector is limited by the real length of vector
     // operations performed on the current platform. That's why several final
     // reduction operations are performed on the vectors with the same
     // architecture-dependent length.
-    ShuffleCost += (NumReduxLevels - LongVectorCount) * (IsPairwise + 1) *
+
+    // Non pairwise reductions need one shuffle per reduction level. Pairwise
+    // reductions need two shuffles on every level, but the last one. On that
+    // level one of the shuffles is <0, u, u, ...> which is identity.
+    unsigned NumShuffles = NumReduxLevels;
+    if (IsPairwise && NumReduxLevels >= 1)
+      NumShuffles += NumReduxLevels - 1;
+    ShuffleCost += NumShuffles *
                    ConcreteTTI->getShuffleCost(TTI::SK_PermuteSingleSrc, Ty,
                                                0, Ty);
-    ArithCost += (NumReduxLevels - LongVectorCount) *
+    ArithCost += NumReduxLevels *
                  ConcreteTTI->getArithmeticInstrCost(Opcode, Ty);
     return ShuffleCost + ArithCost +
            ConcreteTTI->getVectorInstrCost(Instruction::ExtractElement, Ty, 0);
@@ -1489,15 +1499,25 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
       Ty = SubTy;
       ++LongVectorCount;
     }
+
+    NumReduxLevels -= LongVectorCount;
+
     // The minimal length of the vector is limited by the real length of vector
     // operations performed on the current platform. That's why several final
     // reduction operations are performed on the vectors with the same
     // architecture-dependent length.
-    ShuffleCost += (NumReduxLevels - LongVectorCount) * (IsPairwise + 1) *
+
+    // Non pairwise reductions need one shuffle per reduction level. Pairwise
+    // reductions need two shuffles on every level, but the last one. On that
+    // level one of the shuffles is <0, u, u, ...> which is identity.
+    unsigned NumShuffles = NumReduxLevels;
+    if (IsPairwise && NumReduxLevels >= 1)
+      NumShuffles += NumReduxLevels - 1;
+    ShuffleCost += NumShuffles *
                    ConcreteTTI->getShuffleCost(TTI::SK_PermuteSingleSrc, Ty,
                                                0, Ty);
     MinMaxCost +=
-        (NumReduxLevels - LongVectorCount) *
+        NumReduxLevels *
         (ConcreteTTI->getCmpSelInstrCost(CmpOpcode, Ty, CondTy, nullptr) +
          ConcreteTTI->getCmpSelInstrCost(Instruction::Select, Ty, CondTy,
                                          nullptr));
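The shuffle count that both hunks compute can be isolated and compared with the previous formula. The sketch below is a standalone illustration, not code from the patch: `shufflesBefore`, `shufflesAfter`, and `checkShuffleCounts` are hypothetical names, and `NumReduxLevels` is assumed to already have `LongVectorCount` subtracted, as it does after the change.

```cpp
#include <cassert>

// Hypothetical helper mirroring the old formula:
// NumReduxLevels * (IsPairwise + 1) shuffles.
constexpr unsigned shufflesBefore(unsigned NumReduxLevels, bool IsPairwise) {
  return NumReduxLevels * (IsPairwise ? 2u : 1u);
}

// Hypothetical helper mirroring the new logic: one shuffle per level, plus a
// second shuffle on every level except the last when the reduction is
// pairwise.
constexpr unsigned shufflesAfter(unsigned NumReduxLevels, bool IsPairwise) {
  unsigned NumShuffles = NumReduxLevels;
  if (IsPairwise && NumReduxLevels >= 1)
    NumShuffles += NumReduxLevels - 1;
  return NumShuffles;
}

// Small self-check of the two formulas on a two-level reduction.
void checkShuffleCounts() {
  assert(shufflesBefore(2, /*IsPairwise=*/true) == 4);
  assert(shufflesAfter(2, /*IsPairwise=*/true) == 3);
  // Non-pairwise reductions are unaffected by the change.
  assert(shufflesBefore(2, /*IsPairwise=*/false) ==
         shufflesAfter(2, /*IsPairwise=*/false));
}
```

The `NumReduxLevels >= 1` guard matters because the count is unsigned: without it, a pairwise reduction with zero remaining levels would wrap around when subtracting 1 instead of contributing no shuffles.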