Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV][CostModel] VPIntrinsics have same cost as their non-vp counterparts #67178

Merged
merged 7 commits into from
Oct 5, 2023

Conversation

michaelmaitland
Copy link
Contributor

On RISCV, only a few VPIntrinsics have their cost modeled by the
VectorIntrinsicCostTable. Even so, none of those entries consider LMUL.
All other VPIntrinsics do not have meaningful modeling.

This patch models the cost of a VPIntrinsic as the cost of its non-VP
counterpart. It is possible that the VP Intrinsic is cheaper than the non-VP
version depending on VL. On RISCV, this may be due two reasons (if the
instruction is part of a loop):

  1. A smaller VL can be used on the last iteration of the loop.
  2. The VP instruction may avoid a scalar remainder loop.

I have left this as a TODO since I think this change puts us on the right path
of modeling the cost of a VPInstruction, and it isn't entierly clear to me how
much of a discount we should give to a known VL<VLMAX or what to do when VL
is unknown at compile time.

@llvmbot
Copy link
Collaborator

llvmbot commented Sep 22, 2023

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-risc-v

Changes

On RISCV, only a few VPIntrinsics have their cost modeled by the
VectorIntrinsicCostTable. Even so, none of those entries consider LMUL.
All other VPIntrinsics do not have meaningful modeling.

This patch models the cost of a VPIntrinsic as the cost of its non-VP
counterpart. It is possible that the VP Intrinsic is cheaper than the non-VP
version depending on VL. On RISCV, this may be due two reasons (if the
instruction is part of a loop):

  1. A smaller VL can be used on the last iteration of the loop.
  2. The VP instruction may avoid a scalar remainder loop.

I have left this as a TODO since I think this change puts us on the right path
of modeling the cost of a VPInstruction, and it isn't entierly clear to me how
much of a discount we should give to a known VL<VLMAX or what to do when VL
is unknown at compile time.


Patch is 26.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67178.diff

3 Files Affected:

  • (modified) llvm/include/llvm/CodeGen/BasicTTIImpl.h (+26)
  • (modified) llvm/test/Analysis/CostModel/RISCV/gep.ll (+4-4)
  • (modified) llvm/test/Analysis/CostModel/RISCV/rvv-intrinsics.ll (+193)
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index c11d558a73e9d09..7901622f049e5c1 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1687,6 +1687,32 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
     }
     }
 
+    // VP Intrinsics should have the same cost as their non-vp counterpart.
+    // TODO: Adjust the cost to make the vp intrinsic cheaper than its non-vp
+    // counterpart when the vector length argument is smaller than the maximum
+    // vector length.
+    if (VPIntrinsic::isVPIntrinsic(ICA.getID())) {
+      std::optional<Intrinsic::ID> FOp =
+          VPIntrinsic::getFunctionalOpcodeForVP(ICA.getID());
+      if (FOp)
+        return thisT()->getArithmeticInstrCost(*FOp, ICA.getReturnType(), CostKind);
+
+      std::optional<Intrinsic::ID> FID =
+          VPIntrinsic::getFunctionalIntrinsicIDForVP(ICA.getID());
+      if (FID) {
+        // Non-vp version will have same Args/Tys except mask and vector length.
+        ArrayRef<const Value *> NewArgs(ICA.getArgs().begin(),
+                                        ICA.getArgs().end() - 2);
+        ArrayRef<Type *> NewTys(ICA.getArgTypes().begin(),
+                                ICA.getArgTypes().end() - 2);
+
+        IntrinsicCostAttributes NewICA(*FID, ICA.getReturnType(), NewArgs,
+                                       NewTys, ICA.getFlags(), ICA.getInst(),
+                                       ICA.getScalarizationCost());
+        return thisT()->getIntrinsicInstrCost(NewICA, CostKind);
+      }
+    }
+
     // Assume that we need to scalarize this intrinsic.
     // Compute the scalarization overhead based on Args for a vector
     // intrinsic.
diff --git a/llvm/test/Analysis/CostModel/RISCV/gep.ll b/llvm/test/Analysis/CostModel/RISCV/gep.ll
index c7a3e5d30aba7f4..2c309412b9b2e35 100644
--- a/llvm/test/Analysis/CostModel/RISCV/gep.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/gep.ll
@@ -270,7 +270,7 @@ define void @non_foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %4 = getelementptr i8, ptr %base, i32 42
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %x4 = call <2 x i8> @llvm.masked.expandload.v2i8(ptr %4, <2 x i1> undef, <2 x i8> undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %5 = getelementptr i8, ptr %base, i32 42
-; RVI-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
+; RVI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %6 = getelementptr i8, ptr %base, i32 42
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %x6 = call <2 x i8> @llvm.experimental.vp.strided.load.v2i8.p0.i64(ptr %6, i64 undef, <2 x i1> undef, i32 undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %7 = getelementptr i8, ptr %base, i32 42
@@ -282,7 +282,7 @@ define void @non_foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %10 = getelementptr i8, ptr %base, i32 42
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.masked.compressstore.v2i8(<2 x i8> undef, ptr %10, <2 x i1> undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %11 = getelementptr i8, ptr %base, i32 42
-; RVI-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
+; RVI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %12 = getelementptr i8, ptr %base, i32 42
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.experimental.vp.strided.store.v2i8.p0.i64(<2 x i8> undef, ptr %12, i64 undef, <2 x i1> undef, i32 undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
@@ -340,7 +340,7 @@ define void @foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %4 = getelementptr i8, ptr %base, i32 0
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %x4 = call <2 x i8> @llvm.masked.expandload.v2i8(ptr %4, <2 x i1> undef, <2 x i8> undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %5 = getelementptr i8, ptr %base, i32 0
-; RVI-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
+; RVI-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %x5 = call <2 x i8> @llvm.vp.load.v2i8.p0(ptr %5, <2 x i1> undef, i32 undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %6 = getelementptr i8, ptr %base, i32 0
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %x6 = call <2 x i8> @llvm.experimental.vp.strided.load.v2i8.p0.i64(ptr %6, i64 undef, <2 x i1> undef, i32 undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %7 = getelementptr i8, ptr %base, i32 0
@@ -352,7 +352,7 @@ define void @foldable_vector_uses(ptr %base, <2 x ptr> %base.vec) {
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %10 = getelementptr i8, ptr %base, i32 0
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.masked.compressstore.v2i8(<2 x i8> undef, ptr %10, <2 x i1> undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %11 = getelementptr i8, ptr %base, i32 0
-; RVI-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
+; RVI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.vp.store.v2i8.p0(<2 x i8> undef, ptr %11, <2 x i1> undef, i32 undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %12 = getelementptr i8, ptr %base, i32 0
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.experimental.vp.strided.store.v2i8.p0.i64(<2 x i8> undef, ptr %12, i64 undef, <2 x i1> undef, i32 undef)
 ; RVI-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
diff --git a/llvm/test/Analysis/CostModel/RISCV/rvv-intrinsics.ll b/llvm/test/Analysis/CostModel/RISCV/rvv-intrinsics.ll
index 6dcc218981f7a7a..6ad1e31ff61f63a 100644
--- a/llvm/test/Analysis/CostModel/RISCV/rvv-intrinsics.ll
+++ b/llvm/test/Analysis/CostModel/RISCV/rvv-intrinsics.ll
@@ -206,6 +206,199 @@ define void @vp_fshl() {
   ret void
 }
 
+define void @add() {
+; CHECK-LABEL: 'add'
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t0 = call <2 x i8> @llvm.vp.add.v2i8(<2 x i8> undef, <2 x i8> undef, <2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t1 = add <2 x i8> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t2 = call <4 x i8> @llvm.vp.add.v4i8(<4 x i8> undef, <4 x i8> undef, <4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t3 = add <4 x i8> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t4 = call <8 x i8> @llvm.vp.add.v8i8(<8 x i8> undef, <8 x i8> undef, <8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t5 = add <8 x i8> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t6 = call <16 x i8> @llvm.vp.add.v16i8(<16 x i8> undef, <16 x i8> undef, <16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t7 = add <16 x i8> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t8 = call <2 x i64> @llvm.vp.add.v2i64(<2 x i64> undef, <2 x i64> undef, <2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t9 = add <2 x i64> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %t10 = call <4 x i64> @llvm.vp.add.v4i64(<4 x i64> undef, <4 x i64> undef, <4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %t12 = add <4 x i64> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %t13 = call <8 x i64> @llvm.vp.add.v8i64(<8 x i64> undef, <8 x i64> undef, <8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %t14 = add <8 x i64> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %t15 = call <16 x i64> @llvm.vp.add.v16i64(<16 x i64> undef, <16 x i64> undef, <16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %t16 = add <16 x i64> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t17 = call <vscale x 2 x i8> @llvm.vp.add.nxv2i8(<vscale x 2 x i8> undef, <vscale x 2 x i8> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t18 = add <vscale x 2 x i8> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t19 = call <vscale x 4 x i8> @llvm.vp.add.nxv4i8(<vscale x 4 x i8> undef, <vscale x 4 x i8> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t20 = add <vscale x 4 x i8> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t21 = call <vscale x 8 x i8> @llvm.vp.add.nxv8i8(<vscale x 8 x i8> undef, <vscale x 8 x i8> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %t22 = add <vscale x 8 x i8> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %t23 = call <vscale x 16 x i8> @llvm.vp.add.nxv16i8(<vscale x 16 x i8> undef, <vscale x 16 x i8> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %t24 = add <vscale x 16 x i8> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %t25 = call <vscale x 2 x i64> @llvm.vp.add.nxv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i64> undef, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %t26 = add <vscale x 2 x i64> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %t27 = call <vscale x 4 x i64> @llvm.vp.add.nxv4i64(<vscale x 4 x i64> undef, <vscale x 4 x i64> undef, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %t28 = add <vscale x 4 x i64> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %t29 = call <vscale x 8 x i64> @llvm.vp.add.nxv8i64(<vscale x 8 x i64> undef, <vscale x 8 x i64> undef, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %t30 = add <vscale x 8 x i64> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %t31 = call <vscale x 16 x i64> @llvm.vp.add.nxv16i64(<vscale x 16 x i64> undef, <vscale x 16 x i64> undef, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %t32 = add <vscale x 16 x i64> undef, undef
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+  %t0 = call <2 x i8> @llvm.vp.add.v2i8(<2 x i8> undef, <2 x i8> undef, <2 x i1> undef, i32 undef)
+  %t1 = add <2 x i8> undef, undef
+  %t2 = call <4 x i8> @llvm.vp.add.v4i8(<4 x i8> undef, <4 x i8> undef, <4 x i1> undef, i32 undef)
+  %t3 = add <4 x i8> undef, undef
+  %t4 = call <8 x i8> @llvm.vp.add.v8i8(<8 x i8> undef, <8 x i8> undef, <8 x i1> undef, i32 undef)
+  %t5 = add <8 x i8> undef, undef
+  %t6 = call <16 x i8> @llvm.vp.add.v16i8(<16 x i8> undef, <16 x i8> undef, <16 x i1> undef, i32 undef)
+  %t7 = add <16 x i8> undef, undef
+  %t8 = call <2 x i64> @llvm.vp.add.v2i64(<2 x i64> undef, <2 x i64> undef, <2 x i1> undef, i32 undef)
+  %t9 = add <2 x i64> undef, undef
+  %t10 = call <4 x i64> @llvm.vp.add.v4i64(<4 x i64> undef, <4 x i64> undef, <4 x i1> undef, i32 undef)
+  %t12 = add <4 x i64> undef, undef
+  %t13 = call <8 x i64> @llvm.vp.add.v8i64(<8 x i64> undef, <8 x i64> undef, <8 x i1> undef, i32 undef)
+  %t14 = add <8 x i64> undef, undef
+  %t15 = call <16 x i64> @llvm.vp.add.v16i64(<16 x i64> undef, <16 x i64> undef, <16 x i1> undef, i32 undef)
+  %t16 = add <16 x i64> undef, undef
+  %t17 = call <vscale x 2 x i8> @llvm.vp.add.nv2i8(<vscale x 2 x i8> undef, <vscale x 2 x i8> undef, <vscale x 2 x i1> undef, i32 undef)
+  %t18 = add <vscale x 2 x i8> undef, undef
+  %t19 = call <vscale x 4 x i8> @llvm.vp.add.nv4i8(<vscale x 4 x i8> undef, <vscale x 4 x i8> undef, <vscale x 4 x i1> undef, i32 undef)
+  %t20 = add <vscale x 4 x i8> undef, undef
+  %t21 = call <vscale x 8 x i8> @llvm.vp.add.nv8i8(<vscale x 8 x i8> undef, <vscale x 8 x i8> undef, <vscale x 8 x i1> undef, i32 undef)
+  %t22 = add <vscale x 8 x i8> undef, undef
+  %t23 = call <vscale x 16 x i8> @llvm.vp.add.nv16i8(<vscale x 16 x i8> undef, <vscale x 16 x i8> undef, <vscale x 16 x i1> undef, i32 undef)
+  %t24 = add <vscale x 16 x i8> undef, undef
+  %t25 = call <vscale x 2 x i64> @llvm.vp.add.nv2i64(<vscale x 2 x i64> undef, <vscale x 2 x i64> undef, <vscale x 2 x i1> undef, i32 undef)
+  %t26 = add <vscale x 2 x i64> undef, undef
+  %t27 = call <vscale x 4 x i64> @llvm.vp.add.nv4i64(<vscale x 4 x i64> undef, <vscale x 4 x i64> undef, <vscale x 4 x i1> undef, i32 undef)
+  %t28 = add <vscale x 4 x i64> undef, undef
+  %t29 = call <vscale x 8 x i64> @llvm.vp.add.nv8i64(<vscale x 8 x i64> undef, <vscale x 8 x i64> undef, <vscale x 8 x i1> undef, i32 undef)
+  %t30 = add <vscale x 8 x i64> undef, undef
+  %t31 = call <vscale x 16 x i64> @llvm.vp.add.nv16i64(<vscale x 16 x i64> undef, <vscale x 16 x i64> undef, <vscale x 16 x i1> undef, i32 undef)
+  %t32 = add <vscale x 16 x i64> undef, undef
+  ret void
+}
+
+define void @abs() {
+; CHECK-LABEL: 'abs'
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %1 = call <2 x i8> @llvm.vp.abs.v2i8(<2 x i8> undef, i1 false, <2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %2 = call <2 x i8> @llvm.abs.v2i8(<2 x i8> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %3 = call <4 x i8> @llvm.vp.abs.v4i8(<4 x i8> undef, i1 false, <4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %4 = call <4 x i8> @llvm.abs.v4i8(<4 x i8> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %5 = call <8 x i8> @llvm.vp.abs.v8i8(<8 x i8> undef, i1 false, <8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %6 = call <8 x i8> @llvm.abs.v8i8(<8 x i8> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %7 = call <16 x i8> @llvm.vp.abs.v16i8(<16 x i8> undef, i1 false, <16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %8 = call <16 x i8> @llvm.abs.v16i8(<16 x i8> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %9 = call <2 x i64> @llvm.vp.abs.v2i64(<2 x i64> undef, i1 false, <2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %10 = call <2 x i64> @llvm.abs.v2i64(<2 x i64> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %11 = call <4 x i64> @llvm.vp.abs.v4i64(<4 x i64> undef, i1 false, <4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %12 = call <4 x i64> @llvm.abs.v4i64(<4 x i64> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %13 = call <8 x i64> @llvm.vp.abs.v8i64(<8 x i64> undef, i1 false, <8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %14 = call <8 x i64> @llvm.abs.v8i64(<8 x i64> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %15 = call <16 x i64> @llvm.vp.abs.v16i64(<16 x i64> undef, i1 false, <16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %16 = call <16 x i64> @llvm.abs.v16i64(<16 x i64> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %17 = call <vscale x 2 x i8> @llvm.vp.abs.nxv2i8(<vscale x 2 x i8> undef, i1 false, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %18 = call <vscale x 2 x i8> @llvm.abs.nxv2i8(<vscale x 2 x i8> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %19 = call <vscale x 4 x i8> @llvm.vp.abs.nxv4i8(<vscale x 4 x i8> undef, i1 false, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %20 = call <vscale x 4 x i8> @llvm.abs.nxv4i8(<vscale x 4 x i8> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %21 = call <vscale x 8 x i8> @llvm.vp.abs.nxv8i8(<vscale x 8 x i8> undef, i1 false, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %22 = call <vscale x 8 x i8> @llvm.abs.nxv8i8(<vscale x 8 x i8> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %23 = call <vscale x 16 x i8> @llvm.vp.abs.nxv16i8(<vscale x 16 x i8> undef, i1 false, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %24 = call <vscale x 16 x i8> @llvm.abs.nxv16i8(<vscale x 16 x i8> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %25 = call <vscale x 2 x i64> @llvm.vp.abs.nxv2i64(<vscale x 2 x i64> undef, i1 false, <vscale x 2 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %26 = call <vscale x 2 x i64> @llvm.abs.nxv2i64(<vscale x 2 x i64> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %27 = call <vscale x 4 x i64> @llvm.vp.abs.nxv4i64(<vscale x 4 x i64> undef, i1 false, <vscale x 4 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %28 = call <vscale x 4 x i64> @llvm.abs.nxv4i64(<vscale x 4 x i64> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %29 = call <vscale x 8 x i64> @llvm.vp.abs.nxv8i64(<vscale x 8 x i64> undef, i1 false, <vscale x 8 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %30 = call <vscale x 8 x i64> @llvm.abs.nxv8i64(<vscale x 8 x i64> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %31 = call <vscale x 16 x i64> @llvm.vp.abs.nxv16i64(<vscale x 16 x i64> undef, i1 false, <vscale x 16 x i1> undef, i32 undef)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %32 = call <vscale x 16 x i64> @llvm.abs.nxv16i64(<vscale x 16 x i64> undef, i1 false)
+; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: ret void
+;
+  call <2 x i8> @llvm.vp.abs.v2i8(<2 x i8> undef, i1 0, <2 x i1> undef, i32 undef)
+  call <2 x i8> @llvm.abs.v2i8(<2 x i8> undef, i1 0)
+  call <4 x i8> @llvm.vp.abs.v4i8(<4 x i8> undef, i1 0, <4 x i1> undef, i32 undef)
+  call <4 x i8> @llvm.abs.v4i8(<4 x i8> undef, i1 0)
+  call <8 x i8> @llvm.vp.abs.v8i8(<8 x i8> undef, i1 0, <8 x ...
[truncated]

@github-actions
Copy link

github-actions bot commented Sep 22, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@michaelmaitland michaelmaitland force-pushed the tti-cost-vpintrin branch 4 times, most recently from bc35af3 to 4b8583b Compare September 22, 2023 22:05
@michaelmaitland michaelmaitland requested a review from a team September 26, 2023 15:31
On RISCV, only a few VPIntrinsics have their cost modeled by the
VectorIntrinsicCostTable. Even so, none of those entries consider LMUL.
All other VPIntrinsics do not have meaningful modeling.
…rparts

On RISCV, only a few VPIntrinsics have their cost modeled by the
VectorIntrinsicCostTable. Even so, none of those entries consider LMUL.
All other VPIntrinsics do not have meaningful modeling.

This patch models the cost of a VPIntrinsic as the cost of its non-VP
counterpart. It is possible that the VP Intrinsic is cheaper than the non-VP
version depending on VL. On RISCV, this may be due two reasons (if the
instruction is part of a loop):
  1. A smaller VL can be used on the last iteration of the loop.
  2. The VP instruction may avoid a scalar remainder loop.

I have left this as a TODO since I think this change puts us on the right path
of modeling the cost of a VPInstruction, and it isn't entierly clear to me how
much of a discount we should give to a known VL<VLMAX or what to do when VL
is unknown at compile time.
Copy link
Contributor

@arcbbb arcbbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'll await approval from others.
After this, it seems reasonable to pass VL to TTI cost function.

@michaelmaitland
Copy link
Contributor Author

pinging for additional approval.

llvm/include/llvm/CodeGen/BasicTTIImpl.h Outdated Show resolved Hide resolved
llvm/include/llvm/CodeGen/BasicTTIImpl.h Outdated Show resolved Hide resolved
llvm/include/llvm/CodeGen/BasicTTIImpl.h Outdated Show resolved Hide resolved
llvm/include/llvm/CodeGen/BasicTTIImpl.h Outdated Show resolved Hide resolved
llvm/include/llvm/CodeGen/BasicTTIImpl.h Outdated Show resolved Hide resolved
Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@michaelmaitland michaelmaitland merged commit fc865c2 into llvm:main Oct 5, 2023
3 checks passed
@stellaraccident
Copy link
Contributor

Hi there - we bisected to this change in a downstream compiler crash (on x86, I think due to the common codegen change). Here is the stack trace:

FAILED: tests/e2e/stablehlo_ops/check_llvm-cpu_local-task_reduce.mlir_module.vmfb /work/full-build-dir/tests/e2e/stablehlo_ops/check_llvm-cpu_local-task_reduce.mlir_module.vmfb 
cd /work/full-build-dir/tests/e2e/stablehlo_ops && /work/full-build-dir/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu --iree-input-type=stablehlo /work/tests/e2e/stablehlo_ops/reduce.mlir -o check_llvm-cpu_local-task_reduce.mlir_module.vmfb --iree-hal-executable-object-search-path=\"/work/full-build-dir\" --iree-llvmcpu-embedded-linker-path=\"/work/full-build-dir/llvm-project/bin/lld\" --iree-llvmcpu-wasm-linker-path=\"/work/full-build-dir/llvm-project/bin/lld\"
iree-compile: /work/third_party/llvm-project/llvm/include/llvm/Support/Casting.h:662: decltype(auto) llvm::dyn_cast(From *) [To = llvm::PointerType, From = llvm::Type]: Assertion `detail::isPresent(Val) && "dyn_cast on a non-existent value"' failed.
Please report issues to https://github.com/openxla/iree/issues and include the crash backtrace.
Stack dump:
0.	Program arguments: /work/full-build-dir/tools/iree-compile --output-format=vm-bytecode --mlir-print-op-on-diagnostic=false --iree-hal-target-backends=llvm-cpu --iree-input-type=stablehlo /work/tests/e2e/stablehlo_ops/reduce.mlir -o check_llvm-cpu_local-task_reduce.mlir_module.vmfb --iree-hal-executable-object-search-path=\"/work/full-build-dir\" --iree-llvmcpu-embedded-linker-path=\"/work/full-build-dir/llvm-project/bin/lld\" --iree-llvmcpu-wasm-linker-path=\"/work/full-build-dir/llvm-project/bin/lld\"
 #0 0x00007fb3fd68375b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /work/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0x00007fb3fd6817b0 llvm::sys::RunSignalHandlers() /work/third_party/llvm-project/llvm/lib/Support/Signals.cpp:106:18
 #2 0x00007fb3fd683e3f SignalHandler(int) /work/third_party/llvm-project/llvm/lib/Support/Unix/Signals.inc:413:1
 #3 0x00007fb404b23420 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14420)
 #4 0x00007fb3f6c6f00b raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300b)
 #5 0x00007fb3f6c4e859 abort (/lib/x86_64-linux-gnu/libc.so.6+0x22859)
 #6 0x00007fb3f6c4e729 (/lib/x86_64-linux-gnu/libc.so.6+0x22729)
 #7 0x00007fb3f6c5ffd6 (/lib/x86_64-linux-gnu/libc.so.6+0x33fd6)
 #8 0x00007fb40161f5dd (/work/full-build-dir/lib/libIREECompiler.so+0xa61f5dd)
 #9 0x00007fb40198d6f6 llvm::EVT::isSimple() const /work/third_party/llvm-project/llvm/include/llvm/CodeGen/ValueTypes.h:130:25
#10 0x00007fb40198d6f6 llvm::X86TTIImpl::getArithmeticReductionCost(unsigned int, llvm::VectorType*, std::optional<llvm::FastMathFlags>, llvm::TargetTransformInfo::TargetCostKind) /work/third_party/llvm-project/llvm/lib/Target/X86/X86TargetTransformInfo.cpp:5078:10
#11 0x00007fb401995cee llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getTypeBasedIntrinsicInstrCost(llvm::IntrinsicCostAttributes const&, llvm::TargetTransformInfo::TargetCostKind) /work/third_party/llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h:0:0
#12 0x00007fb4019888a0 llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getIntrinsicInstrCost(llvm::IntrinsicCostAttributes const&, llvm::TargetTransformInfo::TargetCostKind) /work/third_party/llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h:0:0
#13 0x00007fb401988620 llvm::X86TTIImpl::getIntrinsicInstrCost(llvm::IntrinsicCostAttributes const&, llvm::TargetTransformInfo::TargetCostKind) /work/third_party/llvm-project/llvm/lib/Target/X86/X86TargetTransformInfo.cpp:4334:17
#14 0x00007fb40198980c llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getIntrinsicInstrCost(llvm::IntrinsicCostAttributes const&, llvm::TargetTransformInfo::TargetCostKind) /work/third_party/llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h:1741:25
#15 0x00007fb401988620 llvm::X86TTIImpl::getIntrinsicInstrCost(llvm::IntrinsicCostAttributes const&, llvm::TargetTransformInfo::TargetCostKind) /work/third_party/llvm-project/llvm/lib/Target/X86/X86TargetTransformInfo.cpp:4334:17
#16 0x00007fb4018b0852 llvm::TargetTransformInfoImplCRTPBase<llvm::X86TTIImpl>::getInstructionCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) /work/third_party/llvm-project/llvm/include/llvm/Analysis/TargetTransformInfoImpl.h:1138:25
#17 0x00007fb403d5f3ed llvm::TargetTransformInfo::getInstructionCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) const /work/third_party/llvm-project/llvm/lib/Analysis/TargetTransformInfo.cpp:266:3
#18 0x00007fb402bd7349 llvm::TargetTransformInfo::getInstructionCost(llvm::User const*, llvm::TargetTransformInfo::TargetCostKind) const /work/third_party/llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:411:12
#19 0x00007fb403b18388 llvm::InstructionCost::propagateState(llvm::InstructionCost const&) /work/third_party/llvm-project/llvm/include/llvm/Support/InstructionCost.h:57:19
#20 0x00007fb403b18388 llvm::InstructionCost::operator+=(llvm::InstructionCost const&) /work/third_party/llvm-project/llvm/include/llvm/Support/InstructionCost.h:100:5
#21 0x00007fb403b18388 llvm::CodeMetrics::analyzeBasicBlock(llvm::BasicBlock const*, llvm::TargetTransformInfo const&, llvm::SmallPtrSetImpl<llvm::Value const*> const&, bool) /work/third_party/llvm-project/llvm/lib/Analysis/CodeMetrics.cpp:180:14
#22 0x00007fb403265ebb llvm::FunctionSpecializer::run() /work/third_party/llvm-project/llvm/lib/Transforms/IPO/FunctionSpecialization.cpp:0:0
#23 0x00007fb40325fbc3 runIPSCCP(llvm::Module&, llvm::DataLayout const&, llvm::AnalysisManager<llvm::Function>*, std::function<llvm::TargetLibraryInfo const& (llvm::Function&)>, std::function<llvm::TargetTransformInfo& (llvm::Function&)>, std::function<llvm::AssumptionCache& (llvm::Function&)>, std::function<llvm::DominatorTree& (llvm::Function&)>, std::function<llvm::BlockFrequencyInfo& (llvm::Function&)>, bool) /work/third_party/llvm-project/llvm/lib/Transforms/IPO/SCCP.cpp:165:5
#24 0x00007fb40325fbc3 llvm::IPSCCPPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /work/third_party/llvm-project/llvm/lib/Transforms/IPO/SCCP.cpp:402:8
#25 0x00007fb402ac373d llvm::detail::PassModel<llvm::Module, llvm::IPSCCPPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /work/third_party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:89:5
#26 0x00007fb4041f9330 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /work/third_party/llvm-project/llvm/include/llvm/IR/PassManager.h:521:10
#27 0x00007fb3fe7ca5e0 llvm::SmallPtrSetImplBase::isSmall() const /work/third_party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:195:33

We're going to revert locally. Especially given that this is impacting a target outside of RISCV, may I push a revert? We can try to generate a better repro asynchronously.

@michaelmaitland
Copy link
Contributor Author

michaelmaitland commented Oct 6, 2023

may I push a revert? We can try to generate a better repro asynchronously.

Sure thing. Can you please show me how I can reproduce this upstream?

stellaraccident added a commit that referenced this pull request Oct 6, 2023
…p counterparts (#67178)"

This reverts commit fc865c2.

Triggering assert on X86:

```
iree-compile: /work/third_party/llvm-project/llvm/include/llvm/Support/Casting.h:662: decltype(auto) llvm::dyn_cast(From *) [To = llvm::PointerType, From = llvm::Type]: Assertion `detail::isPresent(Val) && "dyn_cast on a non-existent value"' failed.
```

See PR for comments and full stack trace.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 6, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 6, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 6, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 6, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 6, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
@Groverkss
Copy link
Member

may I push a revert? We can try to generate a better repro asynchronously.

Sure thing. Can you please show me how I can reproduce this upstream?

bug.ll : https://gist.github.com/Groverkss/4893d082222ea96dd6ac238cf8697cbd
opt -O3 bug.ll

This reproduces the bug for me

michaelmaitland added a commit to michaelmaitland/llvm-project that referenced this pull request Oct 10, 2023
…vp counterparts

This was reverted in commit 0abaf3c (llvm#67178).
This version of the patch includes a fix which was caused by
vp-reductions having an extra start value argument which the non-vp
counterparts did not have.
@michaelmaitland
Copy link
Contributor Author

@stellaraccident @Groverkss thank you for pointing out the bug and providing me a way to reproduce upstream. I have posted #68752 which recommits this here with the bug fix included.

stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
stellaraccident added a commit to iree-org/llvm-project that referenced this pull request Oct 12, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
antiagainst pushed a commit to iree-org/llvm-project that referenced this pull request Oct 15, 2023
…p counterparts (llvm#67178)"

This reverts commit fc865c2.

Breaks x86 test.
michaelmaitland added a commit that referenced this pull request Oct 20, 2023
…vp counterparts (#68752)

This was reverted in commit 0abaf3c
(#67178).

This version of the patch includes a fix which was caused by
vp-reductions having an extra start value argument which the non-vp
counterparts did not have.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants