[RISCV][TTI] Cost a subvector insert at a register boundary with exact vlen #85240
base: main
If we have exact vlen knowledge, we can figure out which indices correspond to register boundaries. Our lowering will use this knowledge to replace the vslideup.vi with a sub-register insert when the subvec passthru is undef. One case where the subvec passthru is known to be undef is when the subvec completely fills the subregister, and that's the easiest case to recognize during costing.

Note: This cost-models a lowering that hasn't landed yet (see llvm#84107). This change will not land until after that one does.

This is another piece split off llvm#80164.
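To make the boundary arithmetic concrete, here is a minimal standalone sketch of the condition the patch checks. It assumes the subvector type is already legal (the patch first legalizes it through getTypeLegalizationCost, which this sketch skips), and the helper name isFreeM1SubvectorInsert and its parameters are hypothetical, not LLVM API.

```cpp
#include <cassert>

// Hypothetical helper (not LLVM API) mirroring the shape of the patch's check
// for a fixed-length subvector insert.
//   EltBits          - scalar size of the subvector element type, in bits
//   NumElts          - number of elements in the subvector
//   Index            - insert position, in elements, within the destination
//   MinVLen, MaxVLen - the subtarget's VLEN bounds, in bits
bool isFreeM1SubvectorInsert(unsigned EltBits, unsigned NumElts,
                             unsigned Index, unsigned MinVLen,
                             unsigned MaxVLen) {
  assert(EltBits > 0 && MinVLen > 0 && "sizes must be non-zero");
  // Without an exact VLEN we cannot tell where register boundaries fall.
  if (MinVLen != MaxVLen)
    return false;
  const unsigned VLen = MinVLen;
  // The insert must start exactly at a vector register boundary...
  const bool AtBoundary = (EltBits * Index) % VLen == 0;
  // ...and the subvector must fill the whole m1 register, so there are no
  // tail elements in the destination register that need preserving.
  const bool FillsRegister = EltBits * NumElts == VLen;
  return AtBoundary && FillsRegister;
}
```

For example, with VLEN = 128, inserting a <4 x i32> at element index 4 starts at bit 128, which is a register boundary, while index 2 starts at bit 64, in the middle of a register.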
@llvm/pr-subscribers-backend-risc-v @llvm/pr-subscribers-llvm-analysis

Author: Philip Reames (preames)

Full diff: https://github.com/llvm/llvm-project/pull/85240.diff

2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 8f46fdc2f7ca93..34ffac8ab1650a 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -469,6 +469,22 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
return LT.first *
getRISCVInstructionCost(RISCV::VSLIDEDOWN_VI, LT.second, CostKind);
case TTI::SK_InsertSubvector:
+ // If we're inserting a subvector of *exactly* m1 size at a sub-register
+ // boundary this is a subregister insert at worst and won't require the
+ // slideup. We require the subvec to be exactly VLEN as otherwise
+ // we'd have to account for tail elements in the m1 container if any.
+ // TODO: Extend for aligned m2, m4 inserts
+ // TODO: Extend for scalable subvector types
+ if (std::pair<InstructionCost, MVT> SubLT = getTypeLegalizationCost(SubTp);
+ SubLT.second.isValid() && SubLT.second.isFixedLengthVector()) {
+ const unsigned MinVLen = ST->getRealMinVLen();
+ const unsigned MaxVLen = ST->getRealMaxVLen();
+ if (MinVLen == MaxVLen &&
+ SubLT.second.getScalarSizeInBits() * Index % MinVLen == 0 &&
+ SubLT.second.getSizeInBits() == MinVLen)
+ return TTI::TCC_Free;
+ }
+
// Example sequence:
// vsetivli zero, 4, e8, mf2, tu, ma (ignored)
// vslideup.vi v8, v9, 2
diff --git a/llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll b/llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll
index a91d562b3f6f11..9b07e57752eec0 100644
--- a/llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll
@@ -527,7 +527,7 @@ define void @fixed_m1_in_m2_notail(<8 x i32> %src, <8 x i32> %passthru) vscale_r
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 11, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; SIZE-LABEL: 'fixed_m1_in_m2_notail'
@@ -535,7 +535,7 @@ define void @fixed_m1_in_m2_notail(<8 x i32> %src, <8 x i32> %passthru) vscale_r
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %2 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 5, i32 6, i32 7>
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %3 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 6, i32 7>
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %4 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 11, i32 7>
-; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; SIZE-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7>
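Only %5 changes cost in the test diff. The sketch below shows why, assuming VLEN = 128. The vscale_range attribute is truncated above, so that value is an assumption, chosen because it makes <4 x i32> exactly one m1 register, as the test name fixed_m1_in_m2_notail suggests. It recovers the insert index from each mask and then checks whether that index lands on a register boundary. matchInsertIndex and isRegisterAligned are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical helper: if Mask (over two N-element operands) copies the first
// SubElts lanes of the second operand into the first operand starting at some
// index, and leaves every other lane in place, return that index.
template <std::size_t N>
std::optional<unsigned> matchInsertIndex(const std::array<int, N> &Mask,
                                         unsigned SubElts) {
  for (unsigned Idx = 0; Idx + SubElts <= N; ++Idx) {
    bool Matches = true;
    for (unsigned I = 0; I < N && Matches; ++I) {
      const bool InSub = I >= Idx && I < Idx + SubElts;
      // Lanes of the second operand are numbered N..2N-1 in the mask.
      const int Expected = InSub ? static_cast<int>(N + (I - Idx))
                                 : static_cast<int>(I);
      Matches = Mask[I] == Expected;
    }
    if (Matches)
      return Idx;
  }
  return std::nullopt;
}

// True when an insert at element Index starts exactly on a register boundary.
bool isRegisterAligned(unsigned EltBits, unsigned Index, unsigned VLen) {
  assert(VLen > 0 && "VLEN must be non-zero");
  return (EltBits * Index) % VLen == 0;
}
```

Under that assumption, %2, %3 and %4 insert at indices 1, 2 and 3 (bit offsets 32, 64 and 96), none of which is a multiple of 128, so they keep the vslideup cost. %5 inserts at index 4 (bit offset 128) and becomes free.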
@@ -527,15 +527,15 @@ define void @fixed_m1_in_m2_notail(<8 x i32> %src, <8 x i32> %passthru) vscale_r
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %2 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 5, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %3 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 6, i32 7>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %4 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 11, i32 7>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
Just to call this out because I found it mildly surprising: we only see changes for inserts at a non-zero index here, because an insert_subvector at index zero is a select instead. (That is, the pattern matching recognizes it as a select before costing is invoked.) We may want to revisit that separately.
+1
This is #85302
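The observation in this thread, that an index-zero insert never reaches the SK_InsertSubvector path because its mask is already a select, can be illustrated with a standalone sketch of the select property. isSelectLikeMask is a hypothetical name. LLVM has ShuffleVectorInst::isSelectMask with the same intent, but the code below is not its implementation (for instance, it does not handle undef (-1) lanes).

```cpp
#include <cassert>
#include <vector>

// A mask is "select-like" when every lane I either keeps lane I of the first
// operand (mask value I) or takes lane I of the second operand (mask value
// I + NumElts), and both operands are actually used.
bool isSelectLikeMask(const std::vector<int> &Mask) {
  assert(!Mask.empty() && "mask must have at least one lane");
  const int NumElts = static_cast<int>(Mask.size());
  bool UsesFirst = false;
  bool UsesSecond = false;
  for (int I = 0; I < NumElts; ++I) {
    if (Mask[I] == I)
      UsesFirst = true;
    else if (Mask[I] == I + NumElts)
      UsesSecond = true;
    else
      return false; // Some lane crosses position, so this is not a select.
  }
  return UsesFirst && UsesSecond;
}
```

The index-zero insert <8, 9, 10, 11, 4, 5, 6, 7> satisfies this property, since lanes 0 to 3 come from the same positions of %passthru. The index-four insert <0, 1, 2, 3, 8, 9, 10, 11> does not, because lane 4 reads lane 0 of %passthru.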
if (std::pair<InstructionCost, MVT> SubLT = getTypeLegalizationCost(SubTp);
SubLT.second.isValid() && SubLT.second.isFixedLengthVector()) {
const unsigned MinVLen = ST->getRealMinVLen();
const unsigned MaxVLen = ST->getRealMaxVLen();
We could also use ST->getRealVLen()
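The suggestion can be sketched as follows. This assumes ST->getRealVLen() returns a std::optional<unsigned> that is engaged only when the minimum and maximum VLEN agree. SubtargetVLenInfo and isRegisterBoundary are hypothetical stand-ins, not the real RISCVSubtarget class.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the subtarget; field names are illustrative.
struct SubtargetVLenInfo {
  unsigned RealMinVLen;
  unsigned RealMaxVLen;

  // An exact VLEN exists only when the two bounds agree.
  std::optional<unsigned> getRealVLen() const {
    if (RealMinVLen != RealMaxVLen)
      return std::nullopt;
    return RealMinVLen;
  }
};

// The boundary check rewritten against the optional instead of two bounds.
bool isRegisterBoundary(const SubtargetVLenInfo &ST, unsigned EltBits,
                        unsigned Index) {
  const std::optional<unsigned> VLen = ST.getRealVLen();
  if (!VLen)
    return false; // VLEN is only bounded, so boundaries are unknown.
  assert(*VLen > 0 && "VLEN must be non-zero");
  return (EltBits * Index) % *VLen == 0;
}
```

Folding the MinVLen == MaxVLen comparison into the optional means a caller cannot accidentally treat one VLEN bound as if it were exact.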
// slideup. We require the subvec to be exactly VLEN as otherwise
// we'd have to account for tail elements in the m1 container if any.
// TODO: Extend for aligned m2, m4 inserts
// TODO: Extend for scalable subvector types
Another case we handle as a subregister insert in #84107 is an mf{2,4,8} subvector insert where the bottom element being inserted is aligned to a vector register and the vector being inserted into is undef.
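A rough sketch of the fractional-LMUL case described in this comment follows. It assumes VLEN is known exactly and models "the vector being inserted into is undef" as a boolean flag. isFractionalRegisterInsert is hypothetical and is not the code in #84107.

```cpp
#include <cassert>

// Hypothetical predicate for the reviewer's described case.
//   SubBits     - total size of the subvector, in bits
//   EltBits     - scalar element size, in bits
//   Index       - insert position, in elements
//   VLen        - the exact VLEN, in bits
//   DestIsUndef - whether the vector being inserted into is undef
bool isFractionalRegisterInsert(unsigned SubBits, unsigned EltBits,
                                unsigned Index, unsigned VLen,
                                bool DestIsUndef) {
  assert(VLen > 0 && "VLEN must be known exactly");
  // mf2, mf4 and mf8 occupy a half, a quarter or an eighth of a register.
  const bool IsFractional =
      SubBits * 2 == VLen || SubBits * 4 == VLen || SubBits * 8 == VLen;
  // The bottom inserted element must sit at the start of a vector register.
  const bool Aligned = (EltBits * Index) % VLen == 0;
  // With an undef destination, the rest of that register need not be kept.
  return IsFractional && Aligned && DestIsUndef;
}
```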