[RISCV][TTI] Improve SiFive7 reduction cost #90951

arcbbb · 2024-05-03T09:20:53Z

The cost calculation for intrinsic llvm.vector.reduce.* factors in the costs of instructions VRED, VFRED, VMV, and VFMV.

This patch implements a target-specific cost function. If an opcode is supported in the target-specific cost function, it takes precedence over the default cost.

Co-authored-by: Elvis Wang elvis.wang@sifive.com

llvmbot · 2024-05-03T09:21:28Z

@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-analysis

Author: Shih-Po Hung (arcbbb)

Changes

The cost calculation for intrinsic llvm.vector.reduce.* factors in the costs of instructions VRED, VFRED, VMV, and VFMV.

This patch implements a target-specific cost function. If an opcode is supported in the target-specific cost function, it takes precedence over the default cost.

Co-authored-by: Elvis Wang <elvis.wang@sifive.com>

Patch is 77.52 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/90951.diff

2 Files Affected:

(modified) llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp (+60)
(added) llvm/test/Analysis/CostModel/RISCV/sifive-x280-reduce.ll (+675)

diff --git a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
index ce26e61880fd05..81fc3e81fd1d0f 100644
--- a/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
+++ b/llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
@@ -34,6 +34,54 @@ static cl::opt<unsigned> SLPMaxVF(
         "exclusively by SLP vectorizer."),
     cl::Hidden);
 
+static std::optional<InstructionCost>
+getSiFiveX280RVVCost(unsigned Op, MVT VT, TTI::TargetCostKind CostKind) {
+  std::optional<InstructionCost> Cost;
+  unsigned VScale = 8;
+  switch (Op) {
+  default:
+    Cost = std::nullopt;
+    break;
+  case RISCV::VREDMAX_VS:
+  case RISCV::VREDMIN_VS:
+  case RISCV::VREDMAXU_VS:
+  case RISCV::VREDMINU_VS:
+  case RISCV::VREDSUM_VS:
+  case RISCV::VREDAND_VS:
+  case RISCV::VREDOR_VS:
+  case RISCV::VREDXOR_VS:
+  case RISCV::VFREDMAX_VS:
+  case RISCV::VFREDMIN_VS:
+  case RISCV::VFREDUSUM_VS: {
+    unsigned VL = VT.getVectorMinNumElements();
+    if (!VT.isFixedLengthVector())
+      VL *= VScale;
+    // For the cases with small VL, we use a lookup table for accurate
+    // cost estimation.
+    unsigned LookUpSiFive7ReduceLatency[] = {0, 20, 27, 32, 34, 38, 40, 41, 42};
+    if (VL <= 32) {
+      Cost = LookUpSiFive7ReduceLatency[(VL + 3) >> 2];
+      break;
+    }
+    Cost = 6 + 7 * Log2_32_Ceil(VL);
+    break;
+  }
+  case RISCV::VFREDOSUM_VS: {
+    unsigned VL = VT.getVectorMinNumElements();
+    if (!VT.isFixedLengthVector())
+      VL *= VScale;
+    Cost = VL * 6;
+    break;
+  }
+  case RISCV::VMV_X_S:
+  case RISCV::VFMV_F_S:
+  case RISCV::VCPOP_M:
+    /* Vector-to-scalar communication */
+    Cost = 8;
+  }
+  return Cost;
+}
+
 InstructionCost
 RISCVTTIImpl::getRISCVInstructionCost(ArrayRef<unsigned> OpCodes, MVT VT,
                                       TTI::TargetCostKind CostKind) {
@@ -43,11 +91,23 @@ RISCVTTIImpl::getRISCVInstructionCost(ArrayRef<unsigned> OpCodes, MVT VT,
   size_t NumInstr = OpCodes.size();
   if (CostKind == TTI::TCK_CodeSize)
     return NumInstr;
+
+  std::optional<InstructionCost> (*GetTargetCost)(
+      unsigned, MVT, TTI::TargetCostKind) = nullptr;
+  if (ST->getProcFamily() == RISCVSubtarget::SiFive7)
+    GetTargetCost = getSiFiveX280RVVCost;
   InstructionCost LMULCost = TLI->getLMULCost(VT);
   if ((CostKind != TTI::TCK_RecipThroughput) && (CostKind != TTI::TCK_Latency))
     return LMULCost * NumInstr;
   InstructionCost Cost = 0;
   for (auto Op : OpCodes) {
+    std::optional<InstructionCost> OverrideCost =
+        GetTargetCost ? GetTargetCost(Op, VT, CostKind) : std::nullopt;
+    if (OverrideCost) {
+      Cost += *OverrideCost;
+      continue;
+    }
+
     switch (Op) {
     case RISCV::VRGATHER_VI:
       Cost += TLI->getVRGatherVICost(VT);
diff --git a/llvm/test/Analysis/CostModel/RISCV/sifive-x280-reduce.ll b/llvm/test/Analysis/CostModel/RISCV/sifive-x280-reduce.ll
new file mode 100644
index 00000000000000..01632f9ab80643
--- /dev/null
+++ b/llvm/test/Analysis/CostModel/RISCV/sifive-x280-reduce.ll
@@ -0,0 +1,675 @@
+; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
+; RUN: opt < %s -mtriple=riscv64 -mcpu=sifive-x280 -passes="print<cost-model>" -cost-kind=throughput 2>&1 -disable-output | FileCheck %s --check-prefix=THROUGHPUT
+; RUN: opt < %s -mtriple=riscv64 -mcpu=sifive-x280 -passes="print<cost-model>" -cost-kind=code-size 2>&1 -disable-output | FileCheck %s --check-prefix=SIZE
+
+declare i1 @llvm.vector.reduce.add.v16i1(<16 x i1>)
+declare i1 @llvm.vector.reduce.add.v32i1(<32 x i1>)
+declare i8 @llvm.vector.reduce.add.v16i8(<16 x i8>)
+declare i8 @llvm.vector.reduce.add.v32i8(<32 x i8>)
+declare i8 @llvm.vector.reduce.add.v64i8(<64 x i8>)
+declare i8 @llvm.vector.reduce.add.v128i8(<128 x i8>)
+declare i8 @llvm.vector.reduce.add.v256i8(<256 x i8>)
+declare i8 @llvm.vector.reduce.add.v512i8(<512 x i8>)
+declare i16 @llvm.vector.reduce.add.v8i16(<8 x i16>)
+declare i16 @llvm.vector.reduce.add.v16i16(<16 x i16>)
+declare i16 @llvm.vector.reduce.add.v32i16(<32 x i16>)
+declare i16 @llvm.vector.reduce.add.v64i16(<64 x i16>)
+declare i16 @llvm.vector.reduce.add.v128i16(<128 x i16>)
+declare i16 @llvm.vector.reduce.add.v256i16(<256 x i16>)
+declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)
+declare i32 @llvm.vector.reduce.add.v8i32(<8 x i32>)
+declare i32 @llvm.vector.reduce.add.v16i32(<16 x i32>)
+declare i32 @llvm.vector.reduce.add.v32i32(<32 x i32>)
+declare i32 @llvm.vector.reduce.add.v64i32(<64 x i32>)
+declare i32 @llvm.vector.reduce.add.v128i32(<128 x i32>)
+declare i64 @llvm.vector.reduce.add.v2i64(<2 x i64>)
+declare i64 @llvm.vector.reduce.add.v4i64(<4 x i64>)
+declare i64 @llvm.vector.reduce.add.v8i64(<8 x i64>)
+declare i64 @llvm.vector.reduce.add.v16i64(<16 x i64>)
+declare i64 @llvm.vector.reduce.add.v32i64(<32 x i64>)
+declare i64 @llvm.vector.reduce.add.v64i64(<64 x i64>)
+
+define void @reduce_add() {
+; THROUGHPUT-LABEL: 'reduce_add'
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %1 = call i1 @llvm.vector.reduce.add.v16i1(<16 x i1> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %2 = call i1 @llvm.vector.reduce.add.v32i1(<32 x i1> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %3 = call i8 @llvm.vector.reduce.add.v16i8(<16 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 51 for instruction: %4 = call i8 @llvm.vector.reduce.add.v32i8(<32 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 57 for instruction: %5 = call i8 @llvm.vector.reduce.add.v64i8(<64 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %6 = call i8 @llvm.vector.reduce.add.v128i8(<128 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 71 for instruction: %7 = call i8 @llvm.vector.reduce.add.v256i8(<256 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 78 for instruction: %8 = call i8 @llvm.vector.reduce.add.v512i8(<512 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %9 = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %10 = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 51 for instruction: %11 = call i16 @llvm.vector.reduce.add.v32i16(<32 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 57 for instruction: %12 = call i16 @llvm.vector.reduce.add.v64i16(<64 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %13 = call i16 @llvm.vector.reduce.add.v128i16(<128 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 71 for instruction: %14 = call i16 @llvm.vector.reduce.add.v256i16(<256 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 29 for instruction: %15 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %16 = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %17 = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 51 for instruction: %18 = call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 57 for instruction: %19 = call i32 @llvm.vector.reduce.add.v64i32(<64 x i32> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %20 = call i32 @llvm.vector.reduce.add.v128i32(<128 x i32> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 29 for instruction: %21 = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 29 for instruction: %22 = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %23 = call i64 @llvm.vector.reduce.add.v8i64(<8 x i64> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %24 = call i64 @llvm.vector.reduce.add.v16i64(<16 x i64> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 51 for instruction: %25 = call i64 @llvm.vector.reduce.add.v32i64(<32 x i64> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 57 for instruction: %26 = call i64 @llvm.vector.reduce.add.v64i64(<64 x i64> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %27 = call i1 @llvm.vector.reduce.add.nxv1i1(<vscale x 1 x i1> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %28 = call i1 @llvm.vector.reduce.add.nxv2i1(<vscale x 2 x i1> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %29 = call i1 @llvm.vector.reduce.add.nxv4i1(<vscale x 4 x i1> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %30 = call i1 @llvm.vector.reduce.add.nxv8i1(<vscale x 8 x i1> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %31 = call i1 @llvm.vector.reduce.add.nxv16i1(<vscale x 16 x i1> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %32 = call i8 @llvm.vector.reduce.add.nxv1i8(<vscale x 1 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %33 = call i8 @llvm.vector.reduce.add.nxv2i8(<vscale x 2 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 51 for instruction: %34 = call i8 @llvm.vector.reduce.add.nxv4i8(<vscale x 4 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 57 for instruction: %35 = call i8 @llvm.vector.reduce.add.nxv8i8(<vscale x 8 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %36 = call i8 @llvm.vector.reduce.add.nxv16i8(<vscale x 16 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 71 for instruction: %37 = call i8 @llvm.vector.reduce.add.nxv32i8(<vscale x 32 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 78 for instruction: %38 = call i8 @llvm.vector.reduce.add.nxv64i8(<vscale x 64 x i8> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %39 = call i16 @llvm.vector.reduce.add.nxv1i16(<vscale x 1 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %40 = call i16 @llvm.vector.reduce.add.nxv2i16(<vscale x 2 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 51 for instruction: %41 = call i16 @llvm.vector.reduce.add.nxv4i16(<vscale x 4 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 57 for instruction: %42 = call i16 @llvm.vector.reduce.add.nxv8i16(<vscale x 8 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %43 = call i16 @llvm.vector.reduce.add.nxv16i16(<vscale x 16 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 71 for instruction: %44 = call i16 @llvm.vector.reduce.add.nxv32i16(<vscale x 32 x i16> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %45 = call i32 @llvm.vector.reduce.add.nxv1i32(<vscale x 1 x i32> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %46 = call i32 @llvm.vector.reduce.add.nxv2i32(<vscale x 2 x i32> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 51 for instruction: %47 = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 57 for instruction: %48 = call i32 @llvm.vector.reduce.add.nxv8i32(<vscale x 8 x i32> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %49 = call i32 @llvm.vector.reduce.add.nxv16i32(<vscale x 16 x i32> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 36 for instruction: %50 = call i64 @llvm.vector.reduce.add.nxv1i64(<vscale x 1 x i64> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 43 for instruction: %51 = call i64 @llvm.vector.reduce.add.nxv2i64(<vscale x 2 x i64> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 51 for instruction: %52 = call i64 @llvm.vector.reduce.add.nxv4i64(<vscale x 4 x i64> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 57 for instruction: %53 = call i64 @llvm.vector.reduce.add.nxv8i64(<vscale x 8 x i64> undef)
+; THROUGHPUT-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
+;
+; SIZE-LABEL: 'reduce_add'
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %1 = call i1 @llvm.vector.reduce.add.v16i1(<16 x i1> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %2 = call i1 @llvm.vector.reduce.add.v32i1(<32 x i1> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %3 = call i8 @llvm.vector.reduce.add.v16i8(<16 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %4 = call i8 @llvm.vector.reduce.add.v32i8(<32 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %5 = call i8 @llvm.vector.reduce.add.v64i8(<64 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %6 = call i8 @llvm.vector.reduce.add.v128i8(<128 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %7 = call i8 @llvm.vector.reduce.add.v256i8(<256 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %8 = call i8 @llvm.vector.reduce.add.v512i8(<512 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %9 = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %10 = call i16 @llvm.vector.reduce.add.v16i16(<16 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %11 = call i16 @llvm.vector.reduce.add.v32i16(<32 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %12 = call i16 @llvm.vector.reduce.add.v64i16(<64 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %13 = call i16 @llvm.vector.reduce.add.v128i16(<128 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %14 = call i16 @llvm.vector.reduce.add.v256i16(<256 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %15 = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %16 = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %17 = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %18 = call i32 @llvm.vector.reduce.add.v32i32(<32 x i32> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %19 = call i32 @llvm.vector.reduce.add.v64i32(<64 x i32> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %20 = call i32 @llvm.vector.reduce.add.v128i32(<128 x i32> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %21 = call i64 @llvm.vector.reduce.add.v2i64(<2 x i64> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %22 = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %23 = call i64 @llvm.vector.reduce.add.v8i64(<8 x i64> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %24 = call i64 @llvm.vector.reduce.add.v16i64(<16 x i64> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %25 = call i64 @llvm.vector.reduce.add.v32i64(<32 x i64> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %26 = call i64 @llvm.vector.reduce.add.v64i64(<64 x i64> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %27 = call i1 @llvm.vector.reduce.add.nxv1i1(<vscale x 1 x i1> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %28 = call i1 @llvm.vector.reduce.add.nxv2i1(<vscale x 2 x i1> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %29 = call i1 @llvm.vector.reduce.add.nxv4i1(<vscale x 4 x i1> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %30 = call i1 @llvm.vector.reduce.add.nxv8i1(<vscale x 8 x i1> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %31 = call i1 @llvm.vector.reduce.add.nxv16i1(<vscale x 16 x i1> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %32 = call i8 @llvm.vector.reduce.add.nxv1i8(<vscale x 1 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %33 = call i8 @llvm.vector.reduce.add.nxv2i8(<vscale x 2 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %34 = call i8 @llvm.vector.reduce.add.nxv4i8(<vscale x 4 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %35 = call i8 @llvm.vector.reduce.add.nxv8i8(<vscale x 8 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %36 = call i8 @llvm.vector.reduce.add.nxv16i8(<vscale x 16 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %37 = call i8 @llvm.vector.reduce.add.nxv32i8(<vscale x 32 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %38 = call i8 @llvm.vector.reduce.add.nxv64i8(<vscale x 64 x i8> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %39 = call i16 @llvm.vector.reduce.add.nxv1i16(<vscale x 1 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %40 = call i16 @llvm.vector.reduce.add.nxv2i16(<vscale x 2 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %41 = call i16 @llvm.vector.reduce.add.nxv4i16(<vscale x 4 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %42 = call i16 @llvm.vector.reduce.add.nxv8i16(<vscale x 8 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %43 = call i16 @llvm.vector.reduce.add.nxv16i16(<vscale x 16 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %44 = call i16 @llvm.vector.reduce.add.nxv32i16(<vscale x 32 x i16> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %45 = call i32 @llvm.vector.reduce.add.nxv1i32(<vscale x 1 x i32> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %46 = call i32 @llvm.vector.reduce.add.nxv2i32(<vscale x 2 x i32> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %47 = call i32 @llvm.vector.reduce.add.nxv4i32(<vscale x 4 x i32> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %48 = call i32 @llvm.vector.reduce.add.nxv8i32(<vscale x 8 x i32> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %49 = call i32 @llvm.vector.reduce.add.nxv16i32(<vscale x 16 x i32> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %50 = call i64 @llvm.vector.reduce.add.nxv1i64(<vscale x 1 x i64> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %51 = call i64 @llvm.vector.reduce.add.nxv2i64(<vscale x 2 x i64> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %52 = call i64 @llvm.vector.reduce.add.nxv4i64(<vscale x 4 x i64> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %53 = call i64 @llvm.vector.reduce.add.nxv8i64(<vscale x 8 x i64> undef)
+; SIZE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+  call i1 @llvm.vector.reduce.add.v1...
[truncated]

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

The cost calculation for intrinsic llvm.vector.reduce.* factors in the costs of instructions VRED, VFRED, VMV, and VFMV. This patch implements a target-specific cost function. If an opcode is supported in the target-specific cost function, it takes precedence over the default cost. Co-authored-by: Elvis Wang <elvis.wang@sifive.com>

arcbbb · 2024-05-20T08:14:40Z

ping

wangpc-pp

I just wonder if we can make cost model bind to scheduling model.

arcbbb · 2024-05-21T08:02:21Z

I just wonder if we can make cost model bind to scheduling model.

Considering

The interpretation of Latency and ReleaseAtCycles in the scheduling model may not align directly with TCK_Latency and TCK_RecipThroughput in the TTI cost model.
In the scheduling model, the vector length is assumed to be VLMAX, whereas in the TTI cost model, we can derive the element count from fixed-length vector types.

It might be better to keep them separate for now.

wangpc-pp · 2024-05-21T09:15:27Z

I just wonder if we can make cost model bind to scheduling model.

Considering

1. The interpretation of Latency and ReleaseAtCycles in the scheduling model may not align directly with TCK_Latency and TCK_RecipThroughput in the TTI cost model.

2. In the scheduling model, the vector length is assumed to be VLMAX, whereas in the TTI cost model, we can derive the element count from fixed-length vector types.

It might be better to keep them separate for now.

I'm not opposed to add processor-specific cost models, but I think we should well organize the code or it may be hard to mantain as time goes on.

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp

nikolaypanchenko · 2024-06-05T16:07:38Z

I'm not opposed to add processor-specific cost models, but I think we should well organize the code or it may be hard to mantain as time goes on.

Do you have any suggestion? The only other option I can think about is each vendor will provide its own TTI, say class VendorRISCVTTIImpl : public RISCVTTImpl

wangpc-pp · 2024-06-06T03:00:33Z

Do you have any suggestion? The only other option I can think about is each vendor will provide its own TTI, say class VendorRISCVTTIImpl : public RISCVTTImpl

Currently, I have no idea. I was thinking about adapting the scheduling model.

arcbbb requested review from preames and topperc May 3, 2024 09:20

llvmbot added backend:RISC-V llvm:analysis labels May 3, 2024

nikolaypanchenko reviewed May 3, 2024

View reviewed changes

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp Outdated Show resolved Hide resolved

sunshaoce reviewed May 7, 2024

View reviewed changes

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp Show resolved Hide resolved

arcbbb force-pushed the tti-x280-reduc branch from 82cbdf7 to ab42a9f Compare May 17, 2024 01:57

wangpc-pp reviewed May 20, 2024

View reviewed changes

wangpc-pp reviewed May 21, 2024

View reviewed changes

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp Show resolved Hide resolved

topperc reviewed May 22, 2024

View reviewed changes

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp Outdated Show resolved Hide resolved

Address comments

387e482

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[RISCV][TTI] Improve SiFive7 reduction cost #90951

[RISCV][TTI] Improve SiFive7 reduction cost #90951

arcbbb commented May 3, 2024

llvmbot commented May 3, 2024 •

edited

Loading

arcbbb commented May 20, 2024

wangpc-pp left a comment

arcbbb commented May 21, 2024

wangpc-pp commented May 21, 2024

nikolaypanchenko commented Jun 5, 2024

wangpc-pp commented Jun 6, 2024

[RISCV][TTI] Improve SiFive7 reduction cost #90951

Are you sure you want to change the base?

[RISCV][TTI] Improve SiFive7 reduction cost #90951

Conversation

arcbbb commented May 3, 2024

llvmbot commented May 3, 2024 • edited Loading

arcbbb commented May 20, 2024

wangpc-pp left a comment

Choose a reason for hiding this comment

arcbbb commented May 21, 2024

wangpc-pp commented May 21, 2024

nikolaypanchenko commented Jun 5, 2024

wangpc-pp commented Jun 6, 2024

llvmbot commented May 3, 2024 •

edited

Loading