[RISCV][TTI] Cost a subvector insert at a register boundary with exact vlen #85240

preames · 2024-03-14T15:14:16Z

If we have exact vlen knowledge, we can figure out which indices correspond to register boundaries. Our lowering will use this knowledge to replace the vslideup.vi with a sub-register insert when the subvec passthru is undef. One case where the subvec passthru is known undef is when the subvec completely fills the subregister, and that's the easiest case to recognize during costing.

Note: This is cost modeling a lowering which hasn't landed yet, see #84107. This change will not land until after that one does.

This is another piece split off from #80164.
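The condition described above can be sketched as a standalone predicate. This is a minimal model, not the code in the patch: `SubvecInsert` and `isFreeM1Insert` are hypothetical names, and the inputs stand in for what the type legalization cost and the subtarget would report.

```cpp
#include <cassert>

// Hypothetical standalone model of the costing check; these names are
// illustrative and not part of LLVM's API.
struct SubvecInsert {
  unsigned EltBits;    // element size in bits of the legalized subvector type
  unsigned NumSubElts; // number of elements in the legalized subvector type
  unsigned Index;      // element index at which the subvector is inserted
};

// Returns true when the insert can be treated as a free sub-register insert:
// VLEN is known exactly, the insert offset falls on a register boundary, and
// the subvector fills exactly one vector register (m1).
bool isFreeM1Insert(const SubvecInsert &I, unsigned MinVLen, unsigned MaxVLen) {
  if (MinVLen != MaxVLen)
    return false; // VLEN is not known exactly
  const unsigned VLen = MinVLen;
  const bool AtRegBoundary = (I.EltBits * I.Index) % VLen == 0;
  const bool FillsOneReg = I.EltBits * I.NumSubElts == VLen;
  return AtRegBoundary && FillsOneReg;
}
```

Assuming VLEN is exactly 128, a `<4 x i32>` subvector inserted at index 4 passes both tests (the offset is 128 bits and the subvector is 128 bits wide), while the same subvector at index 1, 2, or 3 fails the boundary test and would still be costed as a slideup.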


llvmbot · 2024-03-14T15:14:51Z

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-analysis

Author: Philip Reames (preames)

Full diff: https://github.com/llvm/llvm-project/pull/85240.diff

2 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+16)
(modified) llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll (+2-2)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index 8f46fdc2f7ca93..34ffac8ab1650a 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -469,6 +469,22 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
     return LT.first *
            getRISCVInstructionCost(RISCV::VSLIDEDOWN_VI, LT.second, CostKind);
   case TTI::SK_InsertSubvector:
+    // If we're inserting a subvector of *exactly* m1 size at a sub-register
+    // boundary this is a subregister insert at worst and won't require the
+    // slideup.  We require the subvec to be exactly VLEN as otherwise
+    // we'd have to account for tail elements in the m1 container if any.
+    // TODO: Extend for aligned m2, m4 inserts
+    // TODO: Extend for scalable subvector types
+    if (std::pair<InstructionCost, MVT> SubLT = getTypeLegalizationCost(SubTp);
+        SubLT.second.isValid() && SubLT.second.isFixedLengthVector()) {
+      const unsigned MinVLen = ST->getRealMinVLen();
+      const unsigned MaxVLen = ST->getRealMaxVLen();
+      if (MinVLen == MaxVLen &&
+          SubLT.second.getScalarSizeInBits() * Index % MinVLen == 0 &&
+          SubLT.second.getSizeInBits() == MinVLen)
+        return TTI::TCC_Free;
+    }
+
     // Example sequence:
     // vsetivli     zero, 4, e8, mf2, tu, ma (ignored)
     // vslideup.vi  v8, v9, 2
diff --git a/llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll b/llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll
index a91d562b3f6f11..9b07e57752eec0 100644
--- a/llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll
@@ -527,7 +527,7 @@ define void @fixed_m1_in_m2_notail(<8 x i32> %src, <8 x i32> %passthru) vscale_r
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %2 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 5, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %3 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %4 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 11, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; SIZE-LABEL: 'fixed_m1_in_m2_notail'
@@ -535,7 +535,7 @@ define void @fixed_m1_in_m2_notail(<8 x i32> %src, <8 x i32> %passthru) vscale_r
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %2 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 5, i32 6, i32 7>
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %3 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 6, i32 7>
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %4 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 11, i32 7>
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
   shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 4, i32 5, i32 6, i32 7>

preames · 2024-03-14T15:18:01Z

llvm/test/Analysis/CostModel/RISCV/shuffle-insert_subvector.ll

@@ -527,15 +527,15 @@ define void @fixed_m1_in_m2_notail(<8 x i32> %src, <8 x i32> %passthru) vscale_r
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %2 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 5, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %3 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 8, i32 9, i32 10, i32 11, i32 6, i32 7>
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %4 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 8, i32 9, i32 10, i32 11, i32 7>
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %5 = shufflevector <8 x i32> %src, <8 x i32> %passthru, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>


Just to call this out because I found it mildly surprising: we only see changes for non-zero insert indices here because an insert_subvector with index zero is costed as a select instead. (That is, it's recognized as a select by the pattern matching before costing is invoked.) We may want to revisit that separately.

+1

This is #85302
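The index-zero observation can be illustrated with a simplified select-mask test. This is a sketch under the assumption that a select mask is one where every lane keeps its own position and only chooses which source to read from; `isSelectMaskSketch` is a hypothetical helper, not LLVM's implementation, and it does not model undef lanes.

```cpp
#include <vector>

// Simplified model of a two-source select mask: lane I must take lane I from
// either the first source (mask value I) or the second source (mask value
// I + NumElts). Hypothetical helper for illustration only.
bool isSelectMaskSketch(const std::vector<int> &Mask) {
  const int NumElts = static_cast<int>(Mask.size());
  for (int I = 0; I < NumElts; ++I)
    if (Mask[I] != I && Mask[I] != I + NumElts)
      return false;
  return true;
}
```

Under this model the index-zero mask from the test, `<8, 9, 10, 11, 4, 5, 6, 7>`, is a select because every lane stays in place. The index-four mask `<0, 1, 2, 3, 8, 9, 10, 11>` is not, since lane 4 reads element 0 of the second source, so it reaches the insert-subvector costing path.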

lukel97 · 2024-03-18T08:35:26Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+    if (std::pair<InstructionCost, MVT> SubLT = getTypeLegalizationCost(SubTp);
+        SubLT.second.isValid() && SubLT.second.isFixedLengthVector()) {
+      const unsigned MinVLen = ST->getRealMinVLen();
+      const unsigned MaxVLen = ST->getRealMaxVLen();


We could also use ST->getRealVLen()

lukel97 · 2024-03-18T08:38:28Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+    // slideup.  We require the subvec to be exactly VLEN as otherwise
+    // we'd have to account for tail elements in the m1 container if any.
+    // TODO: Extend for aligned m2, m4 inserts
+    // TODO: Extend for scalable subvector types


Another case we handle as a subregister insert in #84107 is mf{2,4,8} subvector inserts, where the bottom element being inserted is aligned to a vector register and the vector being inserted into is undef.

preames requested review from lukel97, alexey-bataev and topperc March 14, 2024 15:14

llvmbot added backend:RISC-V llvm:analysis labels Mar 14, 2024

preames mentioned this pull request Mar 14, 2024

[TTI][RISCV]Improve costs for whole vector reg extract/insert. #80164

Closed
