-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PowerPC] adjust cost for vector insert/extract with non const index #79092
Conversation
@llvm/pr-subscribers-llvm-analysis Author: Chen Zheng (chenzheng1030) ChangesP9 has vxform Fixes #50249 Full diff: https://github.com/llvm/llvm-project/pull/79092.diff 2 Files Affected:
diff --git a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
index 062b53e24a0d79..4768d0a1ba693e 100644
--- a/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
+++ b/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
@@ -697,34 +697,44 @@ InstructionCost PPCTTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val,
return Cost;
- } else if (Val->getScalarType()->isIntegerTy() && Index != -1U) {
+ } else if (Val->getScalarType()->isIntegerTy()) {
unsigned EltSize = Val->getScalarSizeInBits();
// Computing on 1 bit values requires extra mask or compare operations.
unsigned MaskCost = VecMaskCost && EltSize == 1 ? 1 : 0;
if (ST->hasP9Altivec()) {
- if (ISD == ISD::INSERT_VECTOR_ELT)
- // A move-to VSR and a permute/insert. Assume vector operation cost
- // for both (cost will be 2x on P9).
- return 2 * CostFactor;
-
- // It's an extract. Maybe we can do a cheap move-from VSR.
- unsigned EltSize = Val->getScalarSizeInBits();
- if (EltSize == 64) {
- unsigned MfvsrdIndex = ST->isLittleEndian() ? 1 : 0;
- if (Index == MfvsrdIndex)
- return 1;
- } else if (EltSize == 32) {
- unsigned MfvsrwzIndex = ST->isLittleEndian() ? 2 : 1;
- if (Index == MfvsrwzIndex)
- return 1;
- }
-
- // We need a vector extract (or mfvsrld). Assume vector operation cost.
- // The cost of the load constant for a vector extract is disregarded
- // (invariant, easily schedulable).
- return CostFactor + MaskCost;
+ // P10 has vxform insert which can handle non const index. The MaskCost is
+ // for masking the index.
+ // P9 has insert for const index. A move-to VSR and a permute/insert.
+ // Assume vector operation cost for both (cost will be 2x on P9).
+ if (ISD == ISD::INSERT_VECTOR_ELT) {
+ if (ST->isISA3_1())
+ return CostFactor + MaskCost;
+ else if (Index != -1U)
+ return 2 * CostFactor;
+ } else if (ISD == ISD::EXTRACT_VECTOR_ELT) {
+ // It's an extract. Maybe we can do a cheap move-from VSR.
+ unsigned EltSize = Val->getScalarSizeInBits();
+ if (EltSize == 64) {
+ // FIXME: no need to worry about endian, P9 has both mfvsrd/mfvsrld.
+ unsigned MfvsrdIndex = ST->isLittleEndian() ? 1 : 0;
+ if (Index == MfvsrdIndex)
+ return 1;
+ } else if (EltSize == 32) {
+ unsigned MfvsrwzIndex = ST->isLittleEndian() ? 2 : 1;
+ if (Index == MfvsrwzIndex)
+ return 1;
+
+ // For other indexs like non const, P9 has vxform extract. The
+ // MaskCost is for masking the index.
+ return CostFactor + MaskCost;
+ }
- } else if (ST->hasDirectMove()) {
+ // We need a vector extract (or mfvsrld). Assume vector operation cost.
+ // The cost of the load constant for a vector extract is disregarded
+ // (invariant, easily schedulable).
+ return CostFactor + MaskCost;
+ }
+ } else if (ST->hasDirectMove() && Index != -1U) {
// Assume permute has standard cost.
// Assume move-to/move-from VSR have 2x standard cost.
if (ISD == ISD::INSERT_VECTOR_ELT)
diff --git a/llvm/test/Analysis/CostModel/PowerPC/insert_extract.ll b/llvm/test/Analysis/CostModel/PowerPC/insert_extract.ll
index 607d15790b5f31..512c83f0182377 100644
--- a/llvm/test/Analysis/CostModel/PowerPC/insert_extract.ll
+++ b/llvm/test/Analysis/CostModel/PowerPC/insert_extract.ll
@@ -27,7 +27,7 @@ define i32 @insert(i32 %arg) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
; CHECK-P10-LABEL: 'insert'
-; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = insertelement <4 x i32> undef, i32 %arg, i32 0
+; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %x = insertelement <4 x i32> undef, i32 %arg, i32 0
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
;
%x = insertelement <4 x i32> undef, i32 %arg, i32 0
@@ -109,7 +109,7 @@ define void @test4xi32(<4 x i32> %v1, i32 %x1) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; CHECK-P10-LABEL: 'test4xi32'
-; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v2 = insertelement <4 x i32> %v1, i32 %x1, i32 2
+; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v2 = insertelement <4 x i32> %v1, i32 %x1, i32 2
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
%v2 = insertelement <4 x i32> %v1, i32 %x1, i32 2
@@ -239,7 +239,7 @@ define <2 x i64> @insert_i64_x(<2 x i64> %dest, i64 %arg, i32 %idx) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %x
;
; CHECK-P10-LABEL: 'insert_i64_x'
-; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %x = insertelement <2 x i64> %dest, i64 %arg, i32 %idx
+; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %x = insertelement <2 x i64> %dest, i64 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i64> %x
;
%x = insertelement <2 x i64> %dest, i64 %arg, i32 %idx
@@ -264,7 +264,7 @@ define <4 x i32> @insert_i32_x(<4 x i32> %dest, i32 %arg, i32 %idx) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %x
;
; CHECK-P10-LABEL: 'insert_i32_x'
-; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %x = insertelement <4 x i32> %dest, i32 %arg, i32 %idx
+; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %x = insertelement <4 x i32> %dest, i32 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <4 x i32> %x
;
%x = insertelement <4 x i32> %dest, i32 %arg, i32 %idx
@@ -289,7 +289,7 @@ define <8 x i16> @insert_i16_x(<8 x i16> %dest, i16 %arg, i32 %idx) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %x
;
; CHECK-P10-LABEL: 'insert_i16_x'
-; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %x = insertelement <8 x i16> %dest, i16 %arg, i32 %idx
+; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %x = insertelement <8 x i16> %dest, i16 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <8 x i16> %x
;
%x = insertelement <8 x i16> %dest, i16 %arg, i32 %idx
@@ -314,7 +314,7 @@ define <16 x i8> @insert_i8_x(<16 x i8> %dest, i8 %arg, i32 %idx) {
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %x
;
; CHECK-P10-LABEL: 'insert_i8_x'
-; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %x = insertelement <16 x i8> %dest, i8 %arg, i32 %idx
+; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %x = insertelement <16 x i8> %dest, i8 %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret <16 x i8> %x
;
%x = insertelement <16 x i8> %dest, i8 %arg, i32 %idx
@@ -331,15 +331,15 @@ define i64 @extract_i64_x(<2 x i64> %arg, i32 %idx) {
; CHECK-P8LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %x
;
; CHECK-P9BE-LABEL: 'extract_i64_x'
-; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
+; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %x
;
; CHECK-P9LE-LABEL: 'extract_i64_x'
-; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
+; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %x
;
; CHECK-P10-LABEL: 'extract_i64_x'
-; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
+; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %x = extractelement <2 x i64> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i64 %x
;
%x = extractelement <2 x i64> %arg, i32 %idx
@@ -356,15 +356,15 @@ define i32 @extract_i32_x(<4 x i32> %arg, i32 %idx) {
; CHECK-P8LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %x
;
; CHECK-P9BE-LABEL: 'extract_i32_x'
-; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
+; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %x
;
; CHECK-P9LE-LABEL: 'extract_i32_x'
-; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
+; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %x
;
; CHECK-P10-LABEL: 'extract_i32_x'
-; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
+; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %x = extractelement <4 x i32> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i32 %x
;
%x = extractelement <4 x i32> %arg, i32 %idx
@@ -381,15 +381,15 @@ define i16 @extract_i16_x(<8 x i16> %arg, i32 %idx) {
; CHECK-P8LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %x
;
; CHECK-P9BE-LABEL: 'extract_i16_x'
-; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
+; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %x
;
; CHECK-P9LE-LABEL: 'extract_i16_x'
-; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
+; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %x
;
; CHECK-P10-LABEL: 'extract_i16_x'
-; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
+; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %x = extractelement <8 x i16> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i16 %x
;
%x = extractelement <8 x i16> %arg, i32 %idx
@@ -406,15 +406,15 @@ define i8 @extract_i8_x(<16 x i8> %arg, i32 %idx) {
; CHECK-P8LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %x
;
; CHECK-P9BE-LABEL: 'extract_i8_x'
-; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
+; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
; CHECK-P9BE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %x
;
; CHECK-P9LE-LABEL: 'extract_i8_x'
-; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
+; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
; CHECK-P9LE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %x
;
; CHECK-P10-LABEL: 'extract_i8_x'
-; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 3 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
+; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %x = extractelement <16 x i8> %arg, i32 %idx
; CHECK-P10-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret i8 %x
;
%x = extractelement <16 x i8> %arg, i32 %idx
|
// Assume vector operation cost for both (cost will be 2x on P9). | ||
if (ISD == ISD::INSERT_VECTOR_ELT) { | ||
if (ST->isISA3_1()) | ||
return CostFactor + MaskCost; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MaskCost is dependent on element type - if the issue is the index it will be applied inconsistently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for review @RolandF77 .
For the "legal" types for insertelment, like i8/i16/i32/i64, on P10, see https://godbolt.org/z/jcor7KEWr , all of them can be represented with a vector instruction + some arithmetic instructions.
Are there other types I should test?
The issue #50249 is not for P10, it is for P9 instead.
Before the change, on P9, extractelement with non-const index will have the cost calculated at line 750. That cost is based on loading the required element from the memory where the bigger data is stored. This would inaccurately increase the cost for such extractelement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The basic idea of the change makes sense. The issue is specifically the use of MaskCost to account for indexing overhead. MaskCost is 0 unless the element type to the vector is i1, which does not look expected here. It was added to deal with the overhead of sub-byte values extracted from i1 vectors. I think you should probably use a different computation which is 0 unless the index is -1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Sorry, I misunderstood your previous comment. Patch updated.
1b57f34
to
735ccb4
Compare
// The cost of the load constant for a vector extract is disregarded | ||
// (invariant, easily schedulable). | ||
return CostFactor + MaskCost; | ||
// P10 has vxform insert which can handle non const index. The MaskCost is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: MaskCost => MaskCostForIdx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
P9 has vxform
Vector Extract Element Instructions
likevextuwrx
and P10 has vxformVector Insert Element instructions
likevinsd
. Update the instruction cost reflecting these instructions.Fixes #50249