-
Notifications
You must be signed in to change notification settings - Fork 15.5k
[AArch64] use isTRNMask to calculate shuffle costs
#171524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
7f26478 to
17d3d39
Compare
|
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms Author: Philip Ginsbach-Chen (ginsbach) ChangesThis builds on #169858 to fix the divergence in codegen between two very similar functions initially observed in #137447 (represented in the diff by test cases int8x16_t f(int8_t x)
{
return (int8x16_t) { x, 0, x, 1, x, 2, x, 3,
x, 4, x, 5, x, 6, x, 7 };
}
int8x16_t g(int8_t x)
{
return (int8x16_t) { 0, x, 1, x, 2, x, 3, x,
4, x, 5, x, 6, x, 7, x };
}The PR consists of two commits:
I can think of several other ways to achieve the same result:
Option 3 seems less clean, whereas options 1&2 are not contained nicely inside AArch64. If we do want 1 or 2, I think it would still be better to leave that to a follow-up. Full diff: https://github.com/llvm/llvm-project/pull/171524.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
index 043be554f8441..0e667e6ecea86 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
@@ -5918,6 +5918,19 @@ InstructionCost AArch64TTIImpl::getPartialReductionCost(
return Cost + 2;
}
+TTI::ShuffleKind AArch64TTIImpl::improveShuffleKindFromMask(
+ TTI::ShuffleKind Kind, ArrayRef<int> Mask, VectorType *SrcTy, int &Index,
+ VectorType *&SubTy) const {
+ TTI::ShuffleKind SuperResult = BasicTTIImplBase::improveShuffleKindFromMask(
+ Kind, Mask, SrcTy, Index, SubTy);
+ if (SuperResult == TTI::ShuffleKind::SK_PermuteTwoSrc && !Mask.empty()) {
+ unsigned DummyUnsigned;
+ if (isTRNMask(Mask, Mask.size(), DummyUnsigned, DummyUnsigned))
+ return TTI::ShuffleKind::SK_Transpose;
+ }
+ return SuperResult;
+}
+
InstructionCost
AArch64TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, VectorType *DstTy,
VectorType *SrcTy, ArrayRef<int> Mask,
diff --git a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
index ecefe2a7f1380..084c1af4298a1 100644
--- a/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
+++ b/llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
@@ -494,6 +494,11 @@ class AArch64TTIImpl final : public BasicTTIImplBase<AArch64TTIImpl> {
bool IsUnsigned, unsigned RedOpcode, Type *ResTy, VectorType *Ty,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput) const override;
+ TTI::ShuffleKind improveShuffleKindFromMask(TTI::ShuffleKind Kind,
+ ArrayRef<int> Mask,
+ VectorType *SrcTy, int &Index,
+ VectorType *&SubTy) const;
+
InstructionCost
getShuffleCost(TTI::ShuffleKind Kind, VectorType *DstTy, VectorType *SrcTy,
ArrayRef<int> Mask, TTI::TargetCostKind CostKind, int Index,
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/transpose-with-constants.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/transpose-with-constants.ll
new file mode 100644
index 0000000000000..66461c9d91064
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/transpose-with-constants.ll
@@ -0,0 +1,47 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt < %s -passes=slp-vectorizer -S | FileCheck %s
+
+target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
+target triple = "aarch64--linux-gnu"
+
+define dso_local <16 x i8> @transpose_splat_constants(i8 noundef %x) {
+; CHECK-LABEL: @transpose_splat_constants(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i8> poison, i8 [[X:%.*]], i32 0
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i8> [[TMP0]], <8 x i8> poison, <8 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> poison, <16 x i32> <i32 0, i32 poison, i32 1, i32 poison, i32 2, i32 poison, i32 3, i32 poison, i32 4, i32 poison, i32 5, i32 poison, i32 6, i32 poison, i32 7, i32 poison>
+; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i8> <i8 poison, i8 0, i8 poison, i8 1, i8 poison, i8 2, i8 poison, i8 3, i8 poison, i8 4, i8 poison, i8 5, i8 poison, i8 6, i8 poison, i8 7>, <16 x i8> [[TMP2]], <16 x i32> <i32 16, i32 1, i32 18, i32 3, i32 20, i32 5, i32 22, i32 7, i32 24, i32 9, i32 26, i32 11, i32 28, i32 13, i32 30, i32 15>
+; CHECK-NEXT: ret <16 x i8> [[TMP3]]
+;
+entry:
+ %0 = insertelement <16 x i8> <i8 poison, i8 0, i8 poison, i8 1, i8 poison, i8 2, i8 poison, i8 3, i8 poison, i8 4, i8 poison, i8 5, i8 poison, i8 6, i8 poison, i8 7>, i8 %x, i64 0
+ %1 = insertelement <16 x i8> %0, i8 %x, i64 2
+ %2 = insertelement <16 x i8> %1, i8 %x, i64 4
+ %3 = insertelement <16 x i8> %2, i8 %x, i64 6
+ %4 = insertelement <16 x i8> %3, i8 %x, i64 8
+ %5 = insertelement <16 x i8> %4, i8 %x, i64 10
+ %6 = insertelement <16 x i8> %5, i8 %x, i64 12
+ %7 = insertelement <16 x i8> %6, i8 %x, i64 14
+ ret <16 x i8> %7
+}
+
+define dso_local <16 x i8> @transpose_constants_splat(i8 noundef %x) {
+; CHECK-LABEL: @transpose_constants_splat(
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i8> poison, i8 [[X:%.*]], i32 0
+; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i8> [[TMP0]], <8 x i8> poison, <8 x i32> zeroinitializer
+; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i8> [[TMP1]], <8 x i8> poison, <16 x i32> <i32 0, i32 poison, i32 1, i32 poison, i32 2, i32 poison, i32 3, i32 poison, i32 4, i32 poison, i32 5, i32 poison, i32 6, i32 poison, i32 7, i32 poison>
+; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <16 x i8> <i8 0, i8 poison, i8 1, i8 poison, i8 2, i8 poison, i8 3, i8 poison, i8 4, i8 poison, i8 5, i8 poison, i8 6, i8 poison, i8 7, i8 poison>, <16 x i8> [[TMP2]], <16 x i32> <i32 0, i32 16, i32 2, i32 18, i32 4, i32 20, i32 6, i32 22, i32 8, i32 24, i32 10, i32 26, i32 12, i32 28, i32 14, i32 30>
+; CHECK-NEXT: ret <16 x i8> [[TMP3]]
+;
+entry:
+ %0 = insertelement <16 x i8> <i8 0, i8 poison, i8 1, i8 poison, i8 2, i8 poison, i8 3, i8 poison, i8 4, i8 poison, i8 5, i8 poison, i8 6, i8 poison, i8 7, i8 poison>, i8 %x, i64 1
+ %1 = insertelement <16 x i8> %0, i8 %x, i64 3
+ %2 = insertelement <16 x i8> %1, i8 %x, i64 5
+ %3 = insertelement <16 x i8> %2, i8 %x, i64 7
+ %4 = insertelement <16 x i8> %3, i8 %x, i64 9
+ %5 = insertelement <16 x i8> %4, i8 %x, i64 11
+ %6 = insertelement <16 x i8> %5, i8 %x, i64 13
+ %7 = insertelement <16 x i8> %6, i8 %x, i64 15
+ ret <16 x i8> %7
+}
|
davemgreen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that either the generic ShuffleVectorInst::isTransposeMask should match transposes like this, or the isTRNMask can be reused in the same places we currently check isZipMask/isUZPMask/etc.
Can you add direct cost-modelling tests for shuffles of these kinds?
4b835da to
dbe240a
Compare
|
@davemgreen I have added three more commits that should address your suggestions.
With commit 3, I have picked the latter approach and used
Added in commit 4. Two of the test cases added in commit 4 showed that things can go wrong with two-element shuffles |
🪟 Windows x64 Test Results
✅ The build succeeded and all tests passed. |
🐧 Linux x64 Test Results
✅ The build succeeded and all tests passed. |
|
I think the test failure is unrelated and seems to have been introduced by #167798. |
|
Thanks, sounds great. LGTM. |
| LT.second.getVectorNumElements() == Mask.size() && | ||
| (Kind == TTI::SK_PermuteTwoSrc || Kind == TTI::SK_PermuteSingleSrc) && | ||
| (Kind == TTI::SK_PermuteTwoSrc || Kind == TTI::SK_PermuteSingleSrc || | ||
| Kind == TTI::SK_InsertSubvector) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe leave a comment about why SK_InsertSubvector is included here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 7 adds such a comment.
MacDue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Otherwise, LGTM)
Thank you. Please, would you merge this for me? |
|
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/66/builds/23669 Here is the relevant piece of the build log for the reference |
This builds on #169858 to fix the divergence in codegen between two very similar functions initially observed in #137447 (represented in the diff by test cases
@transpose_splat_constantsand@transpose_constants_splat:The PR uses an additional
isTRNMaskcall inAArch64TTIImpl::getShuffleCostto ensure that we treat shuffle masks as transpose masks even ifisTransposeMaskfails to recognise them (meaning thatKind == TTI::SK_Transposecannot be relied upon).Follow-up work could consider modifying
isTransposeMask, but that would also impact other backends than AArch64.