[RISCV] Cost @llvm.vector.{extract,insert} as free at index 0 #81818

lukel97 · 2024-02-15T04:30:46Z

After #81751, extracts of scalable subvectors from scalable vectors at index 0 were still not costed as free.
It turns out that if the subvector to extract is scalable, getIntrinsicInstrCost is used instead of getShuffleCost. This patch handles the index 0 case for the vector insert and extract intrinsics inside that hook.
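
To make the index 0 rules concrete, here is a minimal, LLVM-free sketch of the conditions checked by the two new `case`s in the diff below. Everything in it (`Outcome`, `costVectorExtract`, `costVectorInsert`, and the boolean inputs standing in for the `getTypeLegalizationCost`/`decodeVLMUL` query and the `isa<UndefValue>` check) is hypothetical; it mirrors the logic, not the real TTI API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model of the new vector_extract/vector_insert cases in
// RISCVTTIImpl::getIntrinsicInstrCost. An empty Index means the index
// operand was not a ConstantInt, so no shortcut applies.
enum class Outcome { Free, FallThrough };

// @llvm.vector.extract: index 0 is treated as a free subregister extract.
Outcome costVectorExtract(std::optional<uint64_t> Index) {
  if (Index && *Index == 0)
    return Outcome::Free;
  return Outcome::FallThrough; // defer to the remaining cost logic
}

// @llvm.vector.insert: index 0 is free only if the subvector fills a whole
// register group (scalable with LMUL >= 1) or the destination is undef
// (in LLVM, isa<UndefValue> also matches poison).
Outcome costVectorInsert(std::optional<uint64_t> Index, bool SubvecScalable,
                         bool SubvecFractionalLMUL, bool DestIsUndef) {
  bool FitsSubreg = SubvecScalable && !SubvecFractionalLMUL;
  if (Index && *Index == 0 && (FitsSubreg || DestIsUndef))
    return Outcome::Free;
  return Outcome::FallThrough;
}

void demoIndexZeroRules() {
  assert(costVectorExtract(0) == Outcome::Free);
  assert(costVectorExtract(4) == Outcome::FallThrough);
  assert(costVectorExtract(std::nullopt) == Outcome::FallThrough);
  // <vscale x 4 x i32> (LMUL 2) into <vscale x 16 x i32>: free.
  assert(costVectorInsert(0, true, false, false) == Outcome::Free);
  // <16 x i32> (fixed) into a non-undef scalable vector: not free.
  assert(costVectorInsert(0, false, false, false) == Outcome::FallThrough);
  // A fractional-LMUL subvector is only free when the destination is undef.
  assert(costVectorInsert(0, true, true, false) == Outcome::FallThrough);
  assert(costVectorInsert(0, true, true, true) == Outcome::Free);
}
```

The two insert cases in `demoIndexZeroRules` that use fixed and scalable subvectors line up with the updated `insert_fixed_into_scalable` (still 2) and `insert_scalable_into_scalable` (now 0) CHECK lines in rvv-shuffle.ll.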

Note that we'll still need to keep the existing logic from #81751 inside getShuffleCost, since anything that's not:

- a scalable extract of a scalable vector, or
- a scalable insert into a scalable vector

will still go down that path, as will existing fixed-length shufflevectors.
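
A minimal sketch of that routing as described here, for the case where the target hook does not return a cost itself. `Hook`, `Op` and `routeCost` are hypothetical names; the real decision lives inside BasicTTIImplBase::getIntrinsicInstrCost and is not reproduced here. It relies on the LangRef rule that scalable vectors can only be extracted from, or inserted into, other scalable vectors.

```cpp
#include <cassert>

// Hypothetical model of which cost hook handles a vector insert/extract
// on the generic fallback path, following the PR description.
enum class Hook { ShuffleCost, IntrinsicCost };
enum class Op { ShuffleVector, VectorExtract, VectorInsert };

// ScalableSubvec: the extracted result type, or the inserted subvector type,
// is scalable. That implies the containing vector is scalable too.
Hook routeCost(Op Kind, bool ScalableSubvec) {
  switch (Kind) {
  case Op::ShuffleVector:
    return Hook::ShuffleCost; // shufflevector instructions use the shuffle hook
  case Op::VectorExtract:
  case Op::VectorInsert:
    return ScalableSubvec ? Hook::IntrinsicCost : Hook::ShuffleCost;
  }
  return Hook::IntrinsicCost; // unreachable for valid Op values
}

void demoRouting() {
  // Scalable from scalable: intrinsic cost hook (the case this PR handles).
  assert(routeCost(Op::VectorExtract, true) == Hook::IntrinsicCost);
  // Fixed subvector from a fixed or scalable vector: shuffle cost hook.
  assert(routeCost(Op::VectorExtract, false) == Hook::ShuffleCost);
  assert(routeCost(Op::VectorInsert, true) == Hook::IntrinsicCost);
  assert(routeCost(Op::ShuffleVector, false) == Hook::ShuffleCost);
}
```

Because both branches stay reachable, the index 0 shortcut has to exist in both hooks.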

Also note that BasicTTIImplBase::getIntrinsicInstrCost contains some shortcut logic: if the target's getIntrinsicInstrCost returns free, it won't bother calling into getShuffleCost:

llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h

Lines 1534 to 1538 in fc0b67e

    InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
                                          TTI::TargetCostKind CostKind) {
      // Check for generically free intrinsics.
      if (BaseT::getIntrinsicInstrCost(ICA, CostKind) == 0)
        return 0;
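
The effect of that early return can be illustrated with a simplified model in which whichever layer answers first with a cost of 0 prevents getShuffleCost from ever running. LLVM actually layers these hooks with CRTP (`BasicTTIImplBase<T>`); this sketch flattens them into plain member functions, and `CostModelSketch`, its `GenericallyFree` flag and the placeholder shuffle cost of 2 are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of the hook layering: the RISC-V hook runs
// first and falls back to a BasicTTIImplBase-like hook, which returns early
// for "generically free" intrinsics before calling getShuffleCost.
struct CostModelSketch {
  bool GenericallyFree = false;  // what the generic free check would report
  unsigned ShuffleCostCalls = 0; // how many times the shuffle hook ran

  int getShuffleCost() {
    ++ShuffleCostCalls;
    return 2; // placeholder cost
  }

  // Stand-in for BasicTTIImplBase::getIntrinsicInstrCost.
  int basicIntrinsicCost() {
    if (GenericallyFree)
      return 0; // early exit: getShuffleCost is never reached
    return getShuffleCost();
  }

  // Stand-in for RISCVTTIImpl::getIntrinsicInstrCost on a vector extract.
  int riscvExtractCost(uint64_t Index) {
    if (Index == 0)
      return 0; // TTI::TCC_Free: the base hook is not consulted
    return basicIntrinsicCost();
  }
};

void demoShortCircuit() {
  CostModelSketch M;
  int AtZero = M.riscvExtractCost(0);
  assert(AtZero == 0 && M.ShuffleCostCalls == 0);
  int AtEight = M.riscvExtractCost(8);
  assert(AtEight == 2 && M.ShuffleCostCalls == 1);

  CostModelSketch Free;
  Free.GenericallyFree = true;
  int Generic = Free.riscvExtractCost(8);
  assert(Generic == 0 && Free.ShuffleCostCalls == 0);
}
```

Each cost is stored in a local before the `assert` so that the calls with side effects still run when assertions are compiled out.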

This builds upon llvm#81751 by handling the @llvm.vector.{extract,insert} intrinsics. I believe we'll need the logic in both places, since extracts/inserts of fixed vectors from/into fixed vectors will use the shuffle cost hook, whereas anything with a scalable return type will use the intrinsic cost hook.

llvmbot · 2024-02-15T04:31:17Z

@llvm/pr-subscribers-llvm-analysis

Author: Luke Lau (lukel97)

Changes

Patch is 43.08 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/81818.diff

4 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+28)
(modified) llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll (+6-6)
(modified) llvm/test/Analysis/CostModel/RISCV/rvv-vectorextract.ll (+42-42)
(modified) llvm/test/Analysis/CostModel/RISCV/rvv-vectorinsert.ll (+50-50)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index d1db47a6061e4e..81d2b7cc1353af 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -809,6 +809,34 @@ RISCVTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
     }
     break;
   }
+  case Intrinsic::vector_extract: {
+    // A vector extract at index 0 is a (free) subregister extract.
+    if (auto *CIdx = dyn_cast<ConstantInt>(ICA.getArgs()[1]);
+        CIdx && CIdx->isZero())
+      return TTI::TCC_Free;
+    break;
+  }
+  case Intrinsic::vector_insert: {
+    auto FitsSubreg = [this](Type *Ty) {
+      if (!isa<ScalableVectorType>(Ty))
+        return false;
+      // Any scalable vector LMUL >= 1 will fit exactly into a register group.
+      auto [_Cost, LT] = getTypeLegalizationCost(Ty);
+      auto [_Coeff, Fractional] =
+          RISCVVType::decodeVLMUL(RISCVTargetLowering::getLMUL(LT));
+      return !Fractional;
+    };
+
+    // A vector insert at index 0 is a (free) subregister insert if:
+    //
+    // - The subvec fits exactly into a register group or
+    // - The vector is undef
+    if (auto *CIdx = dyn_cast<ConstantInt>(ICA.getArgs()[2]);
+        CIdx && CIdx->isZero() &&
+        (FitsSubreg(ICA.getArgTypes()[1]) || isa<UndefValue>(ICA.getArgs()[0])))
+      return TTI::TCC_Free;
+    break;
+  }
   // TODO: add more intrinsic
   case Intrinsic::experimental_stepvector: {
     unsigned Cost = 1; // vid
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll
index 4f3c7e2f90c655..348a6cf380e97d 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll
@@ -52,17 +52,17 @@ define void  @vector_broadcast() {
 
 define void @vector_insert_extract(<vscale x 4 x i32> %v0, <vscale x 16 x i32> %v1, <16 x i32> %v2) {
 ; CHECK-LABEL: 'vector_insert_extract'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %insert_fixed_into_scalable = call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> %v0, <16 x i32> %v2, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %insert_scalable_into_scalable = call <vscale x 16 x i32> @llvm.vector.insert.nxv16i32.nxv4i32(<vscale x 16 x i32> %v1, <vscale x 4 x i32> %v0, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %insert_scalable_into_scalable = call <vscale x 16 x i32> @llvm.vector.insert.nxv16i32.nxv4i32(<vscale x 16 x i32> %v1, <vscale x 4 x i32> %v0, i64 0)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; SIZE-LABEL: 'vector_insert_extract'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %extract_fixed_from_scalable = call <16 x i32> @llvm.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %insert_fixed_into_scalable = call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v16i32(<vscale x 4 x i32> %v0, <16 x i32> %v2, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %insert_scalable_into_scalable = call <vscale x 16 x i32> @llvm.vector.insert.nxv16i32.nxv4i32(<vscale x 16 x i32> %v1, <vscale x 4 x i32> %v0, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %extract_scalable_from_scalable = call <vscale x 4 x i32> @llvm.vector.extract.nxv4i32.nxv16i32(<vscale x 16 x i32> %v1, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %insert_scalable_into_scalable = call <vscale x 16 x i32> @llvm.vector.insert.nxv16i32.nxv4i32(<vscale x 16 x i32> %v1, <vscale x 4 x i32> %v0, i64 0)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
   %extract_fixed_from_scalable = call <16 x i32> @llvm.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %v0, i64 0)
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-vectorextract.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-vectorextract.ll
index 1e2d1f4d94954e..c4653ace9bac09 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-vectorextract.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-vectorextract.ll
@@ -4,37 +4,37 @@
 
 define void @vector_extract_nxv128i8_0(<vscale x 128 x i8> %v) {
 ; CHECK-LABEL: 'vector_extract_nxv128i8_0'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_mf8 = call <vscale x 1 x i8> @llvm.vector.extract.nxv1i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_mf4 = call <vscale x 2 x i8> @llvm.vector.extract.nxv2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_mf2 = call <vscale x 4 x i8> @llvm.vector.extract.nxv4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m1 = call <vscale x 8 x i8> @llvm.vector.extract.nxv8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m2 = call <vscale x 16 x i8> @llvm.vector.extract.nxv16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m4 = call <vscale x 32 x i8> @llvm.vector.extract.nxv32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m8 = call <vscale x 64 x i8> @llvm.vector.extract.nxv64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_mf8 = call <vscale x 1 x i8> @llvm.vector.extract.nxv1i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_mf4 = call <vscale x 2 x i8> @llvm.vector.extract.nxv2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_mf2 = call <vscale x 4 x i8> @llvm.vector.extract.nxv4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m1 = call <vscale x 8 x i8> @llvm.vector.extract.nxv8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m2 = call <vscale x 16 x i8> @llvm.vector.extract.nxv16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m4 = call <vscale x 32 x i8> @llvm.vector.extract.nxv32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m8 = call <vscale x 64 x i8> @llvm.vector.extract.nxv64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; SIZE-LABEL: 'vector_extract_nxv128i8_0'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_mf8 = call <vscale x 1 x i8> @llvm.vector.extract.nxv1i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_mf4 = call <vscale x 2 x i8> @llvm.vector.extract.nxv2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_mf2 = call <vscale x 4 x i8> @llvm.vector.extract.nxv4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m1 = call <vscale x 8 x i8> @llvm.vector.extract.nxv8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m2 = call <vscale x 16 x i8> @llvm.vector.extract.nxv16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m4 = call <vscale x 32 x i8> @llvm.vector.extract.nxv32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m8 = call <vscale x 64 x i8> @llvm.vector.extract.nxv64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_mf8 = call <vscale x 1 x i8> @llvm.vector.extract.nxv1i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_mf4 = call <vscale x 2 x i8> @llvm.vector.extract.nxv2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_mf2 = call <vscale x 4 x i8> @llvm.vector.extract.nxv4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m1 = call <vscale x 8 x i8> @llvm.vector.extract.nxv8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m2 = call <vscale x 16 x i8> @llvm.vector.extract.nxv16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m4 = call <vscale x 32 x i8> @llvm.vector.extract.nxv32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m8 = call <vscale x 64 x i8> @llvm.vector.extract.nxv64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
   %scalable_mf8 = call <vscale x 1 x i8> @llvm.vector.extract.nxv1i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
@@ -110,23 +110,23 @@ define void @vector_extract_nxv128i8_1(<vscale x 128 x i8> %v) {
 
 define void @vector_extract_v128i8_0(<128 x i8> %v) {
 ; CHECK-LABEL: 'vector_extract_v128i8_0'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.v128i8(<128 x i8> %v, i64 0)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; SIZE-LABEL: 'vector_extract_v128i8_0'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.v128i8(<128 x i8> %v, i64 0)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of ...
[truncated]
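
The test functions name each scalable subvector by the LMUL it occupies (`scalable_mf8` is `<vscale x 1 x i8>`, `scalable_m1` is `<vscale x 8 x i8>`, and so on), and the `FitsSubreg` lambda in the diff accepts only a non-fractional LMUL. A minimal sketch of that mapping, assuming RVVBitsPerBlock is 64 and the type is already legal (power-of-two size, LMUL at most 8, no splitting or widening); `lmulName` and `fitsRegisterGroup` are hypothetical helpers, not LLVM API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: LMUL name for a legal scalable type
// <vscale x MinElts x iEltBits>, assuming one vector register holds
// vscale x 64 bits. Assumes MinElts * EltBits is a nonzero power of two.
std::string lmulName(unsigned MinElts, unsigned EltBits) {
  unsigned KnownMinBits = MinElts * EltBits;
  if (KnownMinBits < 64)
    return "mf" + std::to_string(64 / KnownMinBits); // fractional LMUL
  return "m" + std::to_string(KnownMinBits / 64);
}

// Mirrors FitsSubreg for an already-legal scalable type: LMUL >= 1.
bool fitsRegisterGroup(unsigned MinElts, unsigned EltBits) {
  return MinElts * EltBits >= 64;
}

void demoLMULNames() {
  assert(lmulName(1, 8) == "mf8");  // scalable_mf8
  assert(lmulName(8, 8) == "m1");   // scalable_m1
  assert(lmulName(64, 8) == "m8");  // scalable_m8
  assert(lmulName(4, 32) == "m2");  // <vscale x 4 x i32>
  assert(!fitsRegisterGroup(4, 8)); // mf2 is fractional
  assert(fitsRegisterGroup(4, 32)); // m2 fills a register group
}
```

Under these assumptions, inserting `<vscale x 4 x i32>` at index 0 is free even into a non-undef vector, while inserting an `mf2` type is free only into an undef vector.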

llvmbot · 2024-02-15T04:31:17Z

@llvm/pr-subscribers-backend-risc-v

+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m1 = call <vscale x 8 x i8> @llvm.vector.extract.nxv8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m2 = call <vscale x 16 x i8> @llvm.vector.extract.nxv16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m4 = call <vscale x 32 x i8> @llvm.vector.extract.nxv32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m8 = call <vscale x 64 x i8> @llvm.vector.extract.nxv64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; SIZE-LABEL: 'vector_extract_nxv128i8_0'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_mf8 = call <vscale x 1 x i8> @llvm.vector.extract.nxv1i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_mf4 = call <vscale x 2 x i8> @llvm.vector.extract.nxv2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_mf2 = call <vscale x 4 x i8> @llvm.vector.extract.nxv4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m1 = call <vscale x 8 x i8> @llvm.vector.extract.nxv8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m2 = call <vscale x 16 x i8> @llvm.vector.extract.nxv16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m4 = call <vscale x 32 x i8> @llvm.vector.extract.nxv32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %scalable_m8 = call <vscale x 64 x i8> @llvm.vector.extract.nxv64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_mf8 = call <vscale x 1 x i8> @llvm.vector.extract.nxv1i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_mf4 = call <vscale x 2 x i8> @llvm.vector.extract.nxv2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_mf2 = call <vscale x 4 x i8> @llvm.vector.extract.nxv4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m1 = call <vscale x 8 x i8> @llvm.vector.extract.nxv8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m2 = call <vscale x 16 x i8> @llvm.vector.extract.nxv16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m4 = call <vscale x 32 x i8> @llvm.vector.extract.nxv32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %scalable_m8 = call <vscale x 64 x i8> @llvm.vector.extract.nxv64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
 ;
   %scalable_mf8 = call <vscale x 1 x i8> @llvm.vector.extract.nxv1i8.nxv128i8(<vscale x 128 x i8> %v, i64 0)
@@ -110,23 +110,23 @@ define void @vector_extract_nxv128i8_1(<vscale x 128 x i8> %v) {
 
 define void @vector_extract_v128i8_0(<128 x i8> %v) {
 ; CHECK-LABEL: 'vector_extract_v128i8_0'
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.v128i8(<128 x i8> %v, i64 0)
-; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.v128i8(<128 x i8> %v, i64 0)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.v128i8(<128 x i8> %v, i64 0)
 ; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
 ;
 ; SIZE-LABEL: 'vector_extract_v128i8_0'
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.v128i8(<128 x i8> %v, i64 0)
-; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf8 = call <2 x i8> @llvm.vector.extract.v2i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf4 = call <4 x i8> @llvm.vector.extract.v4i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_mf2 = call <8 x i8> @llvm.vector.extract.v8i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m1 = call <16 x i8> @llvm.vector.extract.v16i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m2 = call <32 x i8> @llvm.vector.extract.v32i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m4 = call <64 x i8> @llvm.vector.extract.v64i8.v128i8(<128 x i8> %v, i64 0)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %fixed_m8 = call <128 x i8> @llvm.vector.extract.v128i8.v128i8(<128 x i8> %v, i64 0)
 ; SIZE-NEXT:  Cost Model: Found an estimated cost of ...
[truncated]
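The subvector names in these tests (mf8 through m8) encode the LMUL of the extracted type. The following standalone C++ sketch reproduces that mapping as a reading aid. It assumes one RVV register block is 64 bits (mirroring RISCV::RVVBitsPerBlock). For the fixed_* names it assumes a VLEN of 128. lmulName, scalableLMULName and fixedLMULName are hypothetical helpers, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumption: one RVV register block is 64 bits (mirrors RISCV::RVVBitsPerBlock),
// so a scalable type's LMUL is its known-minimum size divided by 64.
constexpr uint64_t kRVVBitsPerBlock = 64;

// Hypothetical helper: LMUL name for a type of Bits bits when one register
// holds RegBits bits. Assumes power-of-two sizes; only m1..m8 and mf2..mf8
// are real LMULs (larger types are split during type legalization).
std::string lmulName(uint64_t Bits, uint64_t RegBits) {
  assert(Bits > 0 && RegBits > 0 && "empty type or register");
  if (Bits >= RegBits)
    return "m" + std::to_string(Bits / RegBits);   // whole register group
  return "mf" + std::to_string(RegBits / Bits);    // fractional LMUL
}

// <vscale x MinElts x iEltBits>
std::string scalableLMULName(uint64_t MinElts, uint64_t EltBits) {
  return lmulName(MinElts * EltBits, kRVVBitsPerBlock);
}

// <Elts x iEltBits>, assuming every vector register holds exactly VLen bits.
std::string fixedLMULName(uint64_t Elts, uint64_t EltBits, uint64_t VLen) {
  return lmulName(Elts * EltBits, VLen);
}
```

Under those assumptions, nxv1i8 maps to mf8 and <16 x i8> maps to m1, which matches the %scalable_mf8 and %fixed_m1 names in the test.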

alexey-bataev · 2024-02-16T12:58:50Z

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

+    // - The vector is undef
+    if (auto *CIdx = dyn_cast<ConstantInt>(ICA.getArgs()[2]);
+        CIdx && CIdx->isZero() &&
+        (FitsSubreg(ICA.getArgTypes()[1]) || isa<UndefValue>(ICA.getArgs()[0])))


isa<UndefValue>(ICA.getArgs()[0])? Should it be isa<UndefValue>(ICA.getArgs()[1])?

I believe we check if it's the vector that's undef, not the subvector, because if the vector is undef and the index is 0 we can just "replace" the entire vector with the subvector:

llvm-project/llvm/lib/Target/RISCV/RISCVISelLowering.cpp

Lines 9655 to 9668 in f01ed3b

  // 1. If the Idx has been completely eliminated and this subvector's size is
  // a vector register or a multiple thereof, or the surrounding elements are
  // undef, then this is a subvector insert which naturally aligns to a vector
  // register. These can easily be handled using subregister manipulation.
  // 2. If the subvector is smaller than a vector register, then the insertion
  // must preserve the undisturbed elements of the register. We do this by
  // lowering to an EXTRACT_SUBVECTOR grabbing the nearest LMUL=1 vector type
  // (which resolves to a subregister copy), performing a VSLIDEUP to place the
  // subvector within the vector register, and an INSERT_SUBVECTOR of that
  // LMUL=1 type back into the larger vector (resolving to another subregister
  // operation). See below for how our VSLIDEUP works. We go via a LMUL=1 type
  // to avoid allocating a large register group to hold our subvector.
  if (RemIdx == 0 && (!IsSubVecPartReg || Vec.isUndef()))
    return Op;
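The condition discussed here can be modeled outside LLVM. The following standalone C++ sketch is a simplification. It assumes a 64-bit register block (RISCV::RVVBitsPerBlock) and treats any subvector smaller than one block as a partial register. VectorInsertCall, isSubVecPartReg and isFreeInsertAtIndexZero are hypothetical names. Unlike the lowering's RemIdx, the sketch only recognizes a literal index of 0, in line with the CIdx->isZero() check in the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumption: one RVV register block is 64 bits (mirrors RISCV::RVVBitsPerBlock).
constexpr uint64_t kRVVBitsPerBlock = 64;

// Hypothetical, simplified description of an @llvm.vector.insert call.
struct VectorInsertCall {
  uint64_t SubVecMinBits;         // known-minimum subvector size in bits
  std::optional<uint64_t> Index;  // empty if the index is not a constant
  bool VecIsUndef;                // is the destination vector (arg 0) undef?
};

// A subvector smaller than one register (fractional LMUL) only partially
// fills that register, so inserting it must preserve the other elements.
bool isSubVecPartReg(uint64_t SubVecMinBits) {
  return SubVecMinBits < kRVVBitsPerBlock;
}

// Free when the index is constant 0 and either the subvector fills whole
// registers or the surrounding elements (the destination vector) are undef.
bool isFreeInsertAtIndexZero(const VectorInsertCall &Call) {
  assert(Call.SubVecMinBits > 0 && "empty subvector");
  if (!Call.Index || *Call.Index != 0)
    return false;
  return !isSubVecPartReg(Call.SubVecMinBits) || Call.VecIsUndef;
}
```

For example, the nxv4i32 subvector in %insert_scalable_into_scalable has 128 known-minimum bits and therefore fills whole registers. The model therefore agrees with that insert's new cost of 0.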

Ah, yes. I was thinking of it in shufflevector form, where the second operand is undef in canonical form.
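The two views can be compared through their shufflevector masks. In the following standalone C++ sketch, an extract at index 0 is a single-source shuffle: its undef operand is the second one, and its mask is an identity prefix. For an insert, the lanes outside the subvector come from operand 0, the destination vector. This is why the vector.insert check looks at its first argument. The insert encoding shown is one common form, which places the subvector, widened to the full width, in operand 1. extractSubvectorMask, insertSubvectorMask and isIdentityPrefix are hypothetical helpers.

```cpp
#include <cassert>
#include <vector>

// Extracting SubElts elements at Idx from an NumElts-wide vector, written as
// shufflevector %vec, poison, <Idx, Idx+1, ..., Idx+SubElts-1>.
std::vector<int> extractSubvectorMask(int NumElts, int SubElts, int Idx) {
  assert(SubElts > 0 && Idx >= 0 && Idx + SubElts <= NumElts);
  std::vector<int> Mask;
  for (int I = 0; I < SubElts; ++I)
    Mask.push_back(Idx + I);
  return Mask;
}

// Inserting a subvector (widened to NumElts lanes, operand 1) into %vec
// (operand 0): lanes [Idx, Idx+SubElts) read operand 1 (mask values >=
// NumElts); every other lane keeps the corresponding lane of operand 0.
std::vector<int> insertSubvectorMask(int NumElts, int SubElts, int Idx) {
  assert(SubElts > 0 && Idx >= 0 && Idx + SubElts <= NumElts);
  std::vector<int> Mask;
  for (int I = 0; I < NumElts; ++I)
    Mask.push_back(I >= Idx && I < Idx + SubElts ? NumElts + (I - Idx) : I);
  return Mask;
}

// True for <0, 1, ..., N-1>: the result is just the low lanes of operand 0.
bool isIdentityPrefix(const std::vector<int> &Mask) {
  for (int I = 0; I < static_cast<int>(Mask.size()); ++I)
    if (Mask[I] != I)
      return false;
  return true;
}
```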

alexey-bataev

LG

lukel97 requested review from preames and alexey-bataev February 15, 2024 04:30

llvmbot added backend:RISC-V llvm:analysis labels Feb 15, 2024

alexey-bataev reviewed Feb 16, 2024


alexey-bataev approved these changes Feb 16, 2024


lukel97 mentioned this pull request Feb 16, 2024

[TTI][RISCV]Improve costs for whole vector reg extract/insert. #80164 (Closed)
