@RKSimon RKSimon commented Sep 8, 2025

We need to account for AVX1 targets splitting 256-bit PMULUDQ-style patterns: we included an AVX2 cost but missed AVX1, even though v4i64 is a legal type there.

Noticed while working on #156568
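
For readers unfamiliar with the term, a "PMULUDQ style pattern" is a 64-bit multiply whose operands are both zero-extended from 32 bits or fewer, so only a 32x32->64 unsigned multiply per lane is needed. The sketch below shows that shape in scalar C++. The function name `mulZext32` is hypothetical and is not part of this PR; whether a compiler actually vectorizes such a loop to pmuludq depends on the target and optimization settings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the PR): multiply two arrays of 32-bit
// values, widening each product to 64 bits. Each iteration has the shape
// "zext i32 -> i64, then mul i64" that the mul_zext_vXi32 tests cost.
std::vector<std::uint64_t> mulZext32(const std::vector<std::uint32_t> &a,
                                     const std::vector<std::uint32_t> &b) {
  assert(a.size() == b.size() && "inputs must have the same length");
  std::vector<std::uint64_t> out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    // The upper 32 bits of both widened operands are zero, so a
    // 32x32->64 unsigned multiply per 64-bit lane gives the full result.
    out[i] = static_cast<std::uint64_t>(a[i]) *
             static_cast<std::uint64_t>(b[i]);
  }
  return out;
}
```

Calling `mulZext32({2u, 7u}, {3u, 6u})` yields the 64-bit products of each pair without any overflow, since the widening happens before the multiply.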

@RKSimon RKSimon enabled auto-merge (squash) September 8, 2025 14:31
@llvmbot llvmbot added backend:X86 llvm:analysis Includes value tracking, cost tables and constant folding labels Sep 8, 2025
llvmbot commented Sep 8, 2025

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-backend-x86

Author: Simon Pilgrim (RKSimon)

Changes

We need to account for AVX1 targets splitting 256-bit PMULUDQ-style patterns: we included an AVX2 cost but missed AVX1, even though v4i64 is a legal type there.

Noticed while working on #156568


Full diff: https://github.com/llvm/llvm-project/pull/157475.diff

2 Files Affected:

  • (modified) llvm/lib/Target/X86/X86TargetTransformInfo.cpp (+2)
  • (modified) llvm/test/Analysis/CostModel/X86/mul64.ll (+15-15)
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 910c724a4e3aa..3d8d0a236a3c1 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -1206,6 +1206,8 @@ InstructionCost X86TTIImpl::getArithmeticInstrCost(
     { ISD::MUL,     MVT::v4i32,   {  2,  5,  1,  3 } }, // pmulld
     { ISD::MUL,     MVT::v4i64,   { 12, 15, 19, 20 } },
 
+    { X86ISD::PMULUDQ, MVT::v4i64, { 3,  5, 5, 6 } }, // pmuludq + split
+
     { ISD::AND,     MVT::v32i8,   {  1,  1, 1, 2 } }, // vandps
     { ISD::AND,     MVT::v16i16,  {  1,  1, 1, 2 } }, // vandps
     { ISD::AND,     MVT::v8i32,   {  1,  1, 1, 2 } }, // vandps
diff --git a/llvm/test/Analysis/CostModel/X86/mul64.ll b/llvm/test/Analysis/CostModel/X86/mul64.ll
index 9e4794760404f..9af20febeaffa 100644
--- a/llvm/test/Analysis/CostModel/X86/mul64.ll
+++ b/llvm/test/Analysis/CostModel/X86/mul64.ll
@@ -327,11 +327,11 @@ define void @mul_zext_vXi8(<2 x i8> %a2, <2 x i8> %b2, <4 x i8> %a4, <4 x i8> %b
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:50 CodeSize:1 Lat:1 SizeLat:1 for: %xa64 = zext <64 x i8> %a64 to <64 x i64>
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:50 CodeSize:1 Lat:1 SizeLat:1 for: %xb64 = zext <64 x i8> %b64 to <64 x i64>
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:5 SizeLat:1 for: %res2 = mul <2 x i64> %xa2, %xb2
-; AVX1-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %res4 = mul <4 x i64> %xa4, %xb4
-; AVX1-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:1 Lat:1 SizeLat:1 for: %res8 = mul <8 x i64> %xa8, %xb8
-; AVX1-NEXT:  Cost Model: Found costs of RThru:8 CodeSize:1 Lat:1 SizeLat:1 for: %res16 = mul <16 x i64> %xa16, %xb16
-; AVX1-NEXT:  Cost Model: Found costs of RThru:16 CodeSize:1 Lat:1 SizeLat:1 for: %res32 = mul <32 x i64> %xa32, %xb32
-; AVX1-NEXT:  Cost Model: Found costs of RThru:32 CodeSize:1 Lat:1 SizeLat:1 for: %res64 = mul <64 x i64> %xa64, %xb64
+; AVX1-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:5 Lat:5 SizeLat:6 for: %res4 = mul <4 x i64> %xa4, %xb4
+; AVX1-NEXT:  Cost Model: Found costs of RThru:6 CodeSize:10 Lat:10 SizeLat:12 for: %res8 = mul <8 x i64> %xa8, %xb8
+; AVX1-NEXT:  Cost Model: Found costs of RThru:12 CodeSize:20 Lat:20 SizeLat:24 for: %res16 = mul <16 x i64> %xa16, %xb16
+; AVX1-NEXT:  Cost Model: Found costs of RThru:24 CodeSize:40 Lat:40 SizeLat:48 for: %res32 = mul <32 x i64> %xa32, %xb32
+; AVX1-NEXT:  Cost Model: Found costs of RThru:48 CodeSize:80 Lat:80 SizeLat:96 for: %res64 = mul <64 x i64> %xa64, %xb64
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
 ; AVX2-LABEL: 'mul_zext_vXi8'
@@ -985,11 +985,11 @@ define void @mul_zext_vXi16(<2 x i16> %a2, <2 x i16> %b2, <4 x i16> %a4, <4 x i1
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:52 CodeSize:1 Lat:1 SizeLat:1 for: %xa64 = zext <64 x i16> %a64 to <64 x i64>
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:52 CodeSize:1 Lat:1 SizeLat:1 for: %xb64 = zext <64 x i16> %b64 to <64 x i64>
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:5 SizeLat:1 for: %res2 = mul <2 x i64> %xa2, %xb2
-; AVX1-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %res4 = mul <4 x i64> %xa4, %xb4
-; AVX1-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:1 Lat:1 SizeLat:1 for: %res8 = mul <8 x i64> %xa8, %xb8
-; AVX1-NEXT:  Cost Model: Found costs of RThru:8 CodeSize:1 Lat:1 SizeLat:1 for: %res16 = mul <16 x i64> %xa16, %xb16
-; AVX1-NEXT:  Cost Model: Found costs of RThru:16 CodeSize:1 Lat:1 SizeLat:1 for: %res32 = mul <32 x i64> %xa32, %xb32
-; AVX1-NEXT:  Cost Model: Found costs of RThru:32 CodeSize:1 Lat:1 SizeLat:1 for: %res64 = mul <64 x i64> %xa64, %xb64
+; AVX1-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:5 Lat:5 SizeLat:6 for: %res4 = mul <4 x i64> %xa4, %xb4
+; AVX1-NEXT:  Cost Model: Found costs of RThru:6 CodeSize:10 Lat:10 SizeLat:12 for: %res8 = mul <8 x i64> %xa8, %xb8
+; AVX1-NEXT:  Cost Model: Found costs of RThru:12 CodeSize:20 Lat:20 SizeLat:24 for: %res16 = mul <16 x i64> %xa16, %xb16
+; AVX1-NEXT:  Cost Model: Found costs of RThru:24 CodeSize:40 Lat:40 SizeLat:48 for: %res32 = mul <32 x i64> %xa32, %xb32
+; AVX1-NEXT:  Cost Model: Found costs of RThru:48 CodeSize:80 Lat:80 SizeLat:96 for: %res64 = mul <64 x i64> %xa64, %xb64
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
 ; AVX2-LABEL: 'mul_zext_vXi16'
@@ -1643,11 +1643,11 @@ define void @mul_zext_vXi32(<2 x i32> %a2, <2 x i32> %b2, <4 x i32> %a4, <4 x i3
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:56 CodeSize:1 Lat:1 SizeLat:1 for: %xa64 = zext <64 x i32> %a64 to <64 x i64>
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:56 CodeSize:1 Lat:1 SizeLat:1 for: %xb64 = zext <64 x i32> %b64 to <64 x i64>
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:1 CodeSize:1 Lat:5 SizeLat:1 for: %res2 = mul <2 x i64> %xa2, %xb2
-; AVX1-NEXT:  Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %res4 = mul <4 x i64> %xa4, %xb4
-; AVX1-NEXT:  Cost Model: Found costs of RThru:4 CodeSize:1 Lat:1 SizeLat:1 for: %res8 = mul <8 x i64> %xa8, %xb8
-; AVX1-NEXT:  Cost Model: Found costs of RThru:8 CodeSize:1 Lat:1 SizeLat:1 for: %res16 = mul <16 x i64> %xa16, %xb16
-; AVX1-NEXT:  Cost Model: Found costs of RThru:16 CodeSize:1 Lat:1 SizeLat:1 for: %res32 = mul <32 x i64> %xa32, %xb32
-; AVX1-NEXT:  Cost Model: Found costs of RThru:32 CodeSize:1 Lat:1 SizeLat:1 for: %res64 = mul <64 x i64> %xa64, %xb64
+; AVX1-NEXT:  Cost Model: Found costs of RThru:3 CodeSize:5 Lat:5 SizeLat:6 for: %res4 = mul <4 x i64> %xa4, %xb4
+; AVX1-NEXT:  Cost Model: Found costs of RThru:6 CodeSize:10 Lat:10 SizeLat:12 for: %res8 = mul <8 x i64> %xa8, %xb8
+; AVX1-NEXT:  Cost Model: Found costs of RThru:12 CodeSize:20 Lat:20 SizeLat:24 for: %res16 = mul <16 x i64> %xa16, %xb16
+; AVX1-NEXT:  Cost Model: Found costs of RThru:24 CodeSize:40 Lat:40 SizeLat:48 for: %res32 = mul <32 x i64> %xa32, %xb32
+; AVX1-NEXT:  Cost Model: Found costs of RThru:48 CodeSize:80 Lat:80 SizeLat:96 for: %res64 = mul <64 x i64> %xa64, %xb64
 ; AVX1-NEXT:  Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
 ;
 ; AVX2-LABEL: 'mul_zext_vXi32'
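
The updated AVX1 expectations above are all multiples of the new v4i64 entry {3, 5, 5, 6}. A minimal sketch of that arithmetic follows, assuming the wider vectors are legalized into v4i64 pieces and the per-piece cost is scaled by the piece count, as the `LT.first * *KindCost` return in `getArithmeticInstrCost` suggests. The `Costs` struct and the helper names are illustrative only and are not LLVM's API; the field order follows the test output (the two middle values are both 5 for this entry, so their order does not affect the result).

```cpp
#include <cassert>

// Illustrative only: a simplified cost record, not LLVM's CostKindTblEntry.
struct Costs {
  unsigned RThru;
  unsigned CodeSize;
  unsigned Lat;
  unsigned SizeLat;
};

// Values from the new AVX1 table entry for a v4i64 PMULUDQ (pmuludq + split).
constexpr Costs kAVX1PmuludqV4I64 = {3, 5, 5, 6};

// Number of v4i64 pieces needed to cover NumElts i64 elements.
// Assumes NumElts is a multiple of 4, as in the vXi64 test cases.
constexpr unsigned numV4I64Parts(unsigned NumElts) { return NumElts / 4; }

// Scale the per-v4i64 cost by the number of pieces.
constexpr Costs scaledCost(unsigned NumElts) {
  return {numV4I64Parts(NumElts) * kAVX1PmuludqV4I64.RThru,
          numV4I64Parts(NumElts) * kAVX1PmuludqV4I64.CodeSize,
          numV4I64Parts(NumElts) * kAVX1PmuludqV4I64.Lat,
          numV4I64Parts(NumElts) * kAVX1PmuludqV4I64.SizeLat};
}
```

Under these assumptions, `scaledCost(8)` gives {6, 10, 10, 12} and `scaledCost(64)` gives {48, 80, 80, 96}, matching the `%res8` and `%res64` lines in the updated checks.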

github-actions bot commented Sep 8, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Target/X86/X86TargetTransformInfo.cpp

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 3d8d0a236..9d5fc327a 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -1196,101 +1196,141 @@ InstructionCost X86TTIImpl::getArithmeticInstrCost(
         return LT.first * *KindCost;
 
   static const CostKindTblEntry AVX1CostTable[] = {
-    // We don't have to scalarize unsupported ops. We can issue two half-sized
-    // operations and we only need to extract the upper YMM half.
-    // Two ops + 1 extract + 1 insert = 4.
-    { ISD::MUL,     MVT::v32i8,   { 10, 11, 18, 19 } }, // pmaddubsw + split
-    { ISD::MUL,     MVT::v16i8,   {  5,  6,  8, 12 } }, // 2*pmaddubsw/3*and/psllw/or
-    { ISD::MUL,     MVT::v16i16,  {  4,  8,  5,  6 } }, // pmullw + split
-    { ISD::MUL,     MVT::v8i32,   {  5,  8,  5, 10 } }, // pmulld + split
-    { ISD::MUL,     MVT::v4i32,   {  2,  5,  1,  3 } }, // pmulld
-    { ISD::MUL,     MVT::v4i64,   { 12, 15, 19, 20 } },
-
-    { X86ISD::PMULUDQ, MVT::v4i64, { 3,  5, 5, 6 } }, // pmuludq + split
-
-    { ISD::AND,     MVT::v32i8,   {  1,  1, 1, 2 } }, // vandps
-    { ISD::AND,     MVT::v16i16,  {  1,  1, 1, 2 } }, // vandps
-    { ISD::AND,     MVT::v8i32,   {  1,  1, 1, 2 } }, // vandps
-    { ISD::AND,     MVT::v4i64,   {  1,  1, 1, 2 } }, // vandps
-
-    { ISD::OR,      MVT::v32i8,   {  1,  1, 1, 2 } }, // vorps
-    { ISD::OR,      MVT::v16i16,  {  1,  1, 1, 2 } }, // vorps
-    { ISD::OR,      MVT::v8i32,   {  1,  1, 1, 2 } }, // vorps
-    { ISD::OR,      MVT::v4i64,   {  1,  1, 1, 2 } }, // vorps
-
-    { ISD::XOR,     MVT::v32i8,   {  1,  1, 1, 2 } }, // vxorps
-    { ISD::XOR,     MVT::v16i16,  {  1,  1, 1, 2 } }, // vxorps
-    { ISD::XOR,     MVT::v8i32,   {  1,  1, 1, 2 } }, // vxorps
-    { ISD::XOR,     MVT::v4i64,   {  1,  1, 1, 2 } }, // vxorps
-
-    { ISD::SUB,     MVT::v32i8,   {  4,  2, 5, 6 } }, // psubb + split
-    { ISD::ADD,     MVT::v32i8,   {  4,  2, 5, 6 } }, // paddb + split
-    { ISD::SUB,     MVT::v16i16,  {  4,  2, 5, 6 } }, // psubw + split
-    { ISD::ADD,     MVT::v16i16,  {  4,  2, 5, 6 } }, // paddw + split
-    { ISD::SUB,     MVT::v8i32,   {  4,  2, 5, 6 } }, // psubd + split
-    { ISD::ADD,     MVT::v8i32,   {  4,  2, 5, 6 } }, // paddd + split
-    { ISD::SUB,     MVT::v4i64,   {  4,  2, 5, 6 } }, // psubq + split
-    { ISD::ADD,     MVT::v4i64,   {  4,  2, 5, 6 } }, // paddq + split
-    { ISD::SUB,     MVT::v2i64,   {  1,  1, 1, 1 } }, // psubq
-    { ISD::ADD,     MVT::v2i64,   {  1,  1, 1, 1 } }, // paddq
-
-    { ISD::SHL,     MVT::v16i8,   { 10, 21,11,17 } }, // pblendvb sequence.
-    { ISD::SHL,     MVT::v32i8,   { 22, 22,27,40 } }, // pblendvb sequence + split.
-    { ISD::SHL,     MVT::v8i16,   {  6,  9,11,11 } }, // pblendvb sequence.
-    { ISD::SHL,     MVT::v16i16,  { 13, 16,24,25 } }, // pblendvb sequence + split.
-    { ISD::SHL,     MVT::v4i32,   {  3, 11, 4, 6 } }, // pslld/paddd/cvttps2dq/pmulld
-    { ISD::SHL,     MVT::v8i32,   {  9, 11,12,17 } }, // pslld/paddd/cvttps2dq/pmulld + split
-    { ISD::SHL,     MVT::v2i64,   {  2,  4, 4, 6 } }, // Shift each lane + blend.
-    { ISD::SHL,     MVT::v4i64,   {  6,  7,11,15 } }, // Shift each lane + blend + split.
-
-    { ISD::SRL,     MVT::v16i8,   { 11, 27,12,18 } }, // pblendvb sequence.
-    { ISD::SRL,     MVT::v32i8,   { 23, 23,30,43 } }, // pblendvb sequence + split.
-    { ISD::SRL,     MVT::v8i16,   { 13, 16,14,22 } }, // pblendvb sequence.
-    { ISD::SRL,     MVT::v16i16,  { 28, 30,31,48 } }, // pblendvb sequence + split.
-    { ISD::SRL,     MVT::v4i32,   {  6,  7,12,16 } }, // Shift each lane + blend.
-    { ISD::SRL,     MVT::v8i32,   { 14, 14,26,34 } }, // Shift each lane + blend + split.
-    { ISD::SRL,     MVT::v2i64,   {  2,  4, 4, 6 } }, // Shift each lane + blend.
-    { ISD::SRL,     MVT::v4i64,   {  6,  7,11,15 } }, // Shift each lane + blend + split.
-
-    { ISD::SRA,     MVT::v16i8,   { 21, 22,24,36 } }, // pblendvb sequence.
-    { ISD::SRA,     MVT::v32i8,   { 44, 45,51,76 } }, // pblendvb sequence + split.
-    { ISD::SRA,     MVT::v8i16,   { 13, 16,14,22 } }, // pblendvb sequence.
-    { ISD::SRA,     MVT::v16i16,  { 28, 30,31,48 } }, // pblendvb sequence + split.
-    { ISD::SRA,     MVT::v4i32,   {  6,  7,12,16 } }, // Shift each lane + blend.
-    { ISD::SRA,     MVT::v8i32,   { 14, 14,26,34 } }, // Shift each lane + blend + split.
-    { ISD::SRA,     MVT::v2i64,   {  5,  6,10,14 } }, // Shift each lane + blend.
-    { ISD::SRA,     MVT::v4i64,   { 12, 12,22,30 } }, // Shift each lane + blend + split.
-
-    { ISD::FNEG,    MVT::v4f64,   {  2,  2, 1, 2 } }, // BTVER2 from http://www.agner.org/
-    { ISD::FNEG,    MVT::v8f32,   {  2,  2, 1, 2 } }, // BTVER2 from http://www.agner.org/
-
-    { ISD::FADD,    MVT::f64,     {  1,  5, 1, 1 } }, // BDVER2 from http://www.agner.org/
-    { ISD::FADD,    MVT::f32,     {  1,  5, 1, 1 } }, // BDVER2 from http://www.agner.org/
-    { ISD::FADD,    MVT::v2f64,   {  1,  5, 1, 1 } }, // BDVER2 from http://www.agner.org/
-    { ISD::FADD,    MVT::v4f32,   {  1,  5, 1, 1 } }, // BDVER2 from http://www.agner.org/
-    { ISD::FADD,    MVT::v4f64,   {  2,  5, 1, 2 } }, // BDVER2 from http://www.agner.org/
-    { ISD::FADD,    MVT::v8f32,   {  2,  5, 1, 2 } }, // BDVER2 from http://www.agner.org/
-
-    { ISD::FSUB,    MVT::f64,     {  1,  5, 1, 1 } }, // BDVER2 from http://www.agner.org/
-    { ISD::FSUB,    MVT::f32,     {  1,  5, 1, 1 } }, // BDVER2 from http://www.agner.org/
-    { ISD::FSUB,    MVT::v2f64,   {  1,  5, 1, 1 } }, // BDVER2 from http://www.agner.org/
-    { ISD::FSUB,    MVT::v4f32,   {  1,  5, 1, 1 } }, // BDVER2 from http://www.agner.org/
-    { ISD::FSUB,    MVT::v4f64,   {  2,  5, 1, 2 } }, // BDVER2 from http://www.agner.org/
-    { ISD::FSUB,    MVT::v8f32,   {  2,  5, 1, 2 } }, // BDVER2 from http://www.agner.org/
-
-    { ISD::FMUL,    MVT::f64,     {  2,  5, 1, 1 } }, // BTVER2 from http://www.agner.org/
-    { ISD::FMUL,    MVT::f32,     {  1,  5, 1, 1 } }, // BTVER2 from http://www.agner.org/
-    { ISD::FMUL,    MVT::v2f64,   {  2,  5, 1, 1 } }, // BTVER2 from http://www.agner.org/
-    { ISD::FMUL,    MVT::v4f32,   {  1,  5, 1, 1 } }, // BTVER2 from http://www.agner.org/
-    { ISD::FMUL,    MVT::v4f64,   {  4,  5, 1, 2 } }, // BTVER2 from http://www.agner.org/
-    { ISD::FMUL,    MVT::v8f32,   {  2,  5, 1, 2 } }, // BTVER2 from http://www.agner.org/
-
-    { ISD::FDIV,    MVT::f32,     { 14, 14, 1, 1 } }, // SNB from http://www.agner.org/
-    { ISD::FDIV,    MVT::v4f32,   { 14, 14, 1, 1 } }, // SNB from http://www.agner.org/
-    { ISD::FDIV,    MVT::v8f32,   { 28, 29, 1, 3 } }, // SNB from http://www.agner.org/
-    { ISD::FDIV,    MVT::f64,     { 22, 22, 1, 1 } }, // SNB from http://www.agner.org/
-    { ISD::FDIV,    MVT::v2f64,   { 22, 22, 1, 1 } }, // SNB from http://www.agner.org/
-    { ISD::FDIV,    MVT::v4f64,   { 44, 45, 1, 3 } }, // SNB from http://www.agner.org/
+      // We don't have to scalarize unsupported ops. We can issue two half-sized
+      // operations and we only need to extract the upper YMM half.
+      // Two ops + 1 extract + 1 insert = 4.
+      {ISD::MUL, MVT::v32i8, {10, 11, 18, 19}}, // pmaddubsw + split
+      {ISD::MUL, MVT::v16i8, {5, 6, 8, 12}},    // 2*pmaddubsw/3*and/psllw/or
+      {ISD::MUL, MVT::v16i16, {4, 8, 5, 6}},    // pmullw + split
+      {ISD::MUL, MVT::v8i32, {5, 8, 5, 10}},    // pmulld + split
+      {ISD::MUL, MVT::v4i32, {2, 5, 1, 3}},     // pmulld
+      {ISD::MUL, MVT::v4i64, {12, 15, 19, 20}},
+
+      {X86ISD::PMULUDQ, MVT::v4i64, {3, 5, 5, 6}}, // pmuludq + split
+
+      {ISD::AND, MVT::v32i8, {1, 1, 1, 2}},  // vandps
+      {ISD::AND, MVT::v16i16, {1, 1, 1, 2}}, // vandps
+      {ISD::AND, MVT::v8i32, {1, 1, 1, 2}},  // vandps
+      {ISD::AND, MVT::v4i64, {1, 1, 1, 2}},  // vandps
+
+      {ISD::OR, MVT::v32i8, {1, 1, 1, 2}},  // vorps
+      {ISD::OR, MVT::v16i16, {1, 1, 1, 2}}, // vorps
+      {ISD::OR, MVT::v8i32, {1, 1, 1, 2}},  // vorps
+      {ISD::OR, MVT::v4i64, {1, 1, 1, 2}},  // vorps
+
+      {ISD::XOR, MVT::v32i8, {1, 1, 1, 2}},  // vxorps
+      {ISD::XOR, MVT::v16i16, {1, 1, 1, 2}}, // vxorps
+      {ISD::XOR, MVT::v8i32, {1, 1, 1, 2}},  // vxorps
+      {ISD::XOR, MVT::v4i64, {1, 1, 1, 2}},  // vxorps
+
+      {ISD::SUB, MVT::v32i8, {4, 2, 5, 6}},  // psubb + split
+      {ISD::ADD, MVT::v32i8, {4, 2, 5, 6}},  // paddb + split
+      {ISD::SUB, MVT::v16i16, {4, 2, 5, 6}}, // psubw + split
+      {ISD::ADD, MVT::v16i16, {4, 2, 5, 6}}, // paddw + split
+      {ISD::SUB, MVT::v8i32, {4, 2, 5, 6}},  // psubd + split
+      {ISD::ADD, MVT::v8i32, {4, 2, 5, 6}},  // paddd + split
+      {ISD::SUB, MVT::v4i64, {4, 2, 5, 6}},  // psubq + split
+      {ISD::ADD, MVT::v4i64, {4, 2, 5, 6}},  // paddq + split
+      {ISD::SUB, MVT::v2i64, {1, 1, 1, 1}},  // psubq
+      {ISD::ADD, MVT::v2i64, {1, 1, 1, 1}},  // paddq
+
+      {ISD::SHL, MVT::v16i8, {10, 21, 11, 17}},  // pblendvb sequence.
+      {ISD::SHL, MVT::v32i8, {22, 22, 27, 40}},  // pblendvb sequence + split.
+      {ISD::SHL, MVT::v8i16, {6, 9, 11, 11}},    // pblendvb sequence.
+      {ISD::SHL, MVT::v16i16, {13, 16, 24, 25}}, // pblendvb sequence + split.
+      {ISD::SHL, MVT::v4i32, {3, 11, 4, 6}},     // pslld/paddd/cvttps2dq/pmulld
+      {ISD::SHL,
+       MVT::v8i32,
+       {9, 11, 12, 17}}, // pslld/paddd/cvttps2dq/pmulld + split
+      {ISD::SHL, MVT::v2i64, {2, 4, 4, 6}}, // Shift each lane + blend.
+      {ISD::SHL,
+       MVT::v4i64,
+       {6, 7, 11, 15}}, // Shift each lane + blend + split.
+
+      {ISD::SRL, MVT::v16i8, {11, 27, 12, 18}},  // pblendvb sequence.
+      {ISD::SRL, MVT::v32i8, {23, 23, 30, 43}},  // pblendvb sequence + split.
+      {ISD::SRL, MVT::v8i16, {13, 16, 14, 22}},  // pblendvb sequence.
+      {ISD::SRL, MVT::v16i16, {28, 30, 31, 48}}, // pblendvb sequence + split.
+      {ISD::SRL, MVT::v4i32, {6, 7, 12, 16}},    // Shift each lane + blend.
+      {ISD::SRL,
+       MVT::v8i32,
+       {14, 14, 26, 34}},                   // Shift each lane + blend + split.
+      {ISD::SRL, MVT::v2i64, {2, 4, 4, 6}}, // Shift each lane + blend.
+      {ISD::SRL,
+       MVT::v4i64,
+       {6, 7, 11, 15}}, // Shift each lane + blend + split.
+
+      {ISD::SRA, MVT::v16i8, {21, 22, 24, 36}},  // pblendvb sequence.
+      {ISD::SRA, MVT::v32i8, {44, 45, 51, 76}},  // pblendvb sequence + split.
+      {ISD::SRA, MVT::v8i16, {13, 16, 14, 22}},  // pblendvb sequence.
+      {ISD::SRA, MVT::v16i16, {28, 30, 31, 48}}, // pblendvb sequence + split.
+      {ISD::SRA, MVT::v4i32, {6, 7, 12, 16}},    // Shift each lane + blend.
+      {ISD::SRA,
+       MVT::v8i32,
+       {14, 14, 26, 34}}, // Shift each lane + blend + split.
+      {ISD::SRA, MVT::v2i64, {5, 6, 10, 14}}, // Shift each lane + blend.
+      {ISD::SRA,
+       MVT::v4i64,
+       {12, 12, 22, 30}}, // Shift each lane + blend + split.
+
+      {ISD::FNEG,
+       MVT::v4f64,
+       {2, 2, 1, 2}}, // BTVER2 from http://www.agner.org/
+      {ISD::FNEG,
+       MVT::v8f32,
+       {2, 2, 1, 2}}, // BTVER2 from http://www.agner.org/
+
+      {ISD::FADD, MVT::f64, {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+      {ISD::FADD, MVT::f32, {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+      {ISD::FADD,
+       MVT::v2f64,
+       {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+      {ISD::FADD,
+       MVT::v4f32,
+       {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+      {ISD::FADD,
+       MVT::v4f64,
+       {2, 5, 1, 2}}, // BDVER2 from http://www.agner.org/
+      {ISD::FADD,
+       MVT::v8f32,
+       {2, 5, 1, 2}}, // BDVER2 from http://www.agner.org/
+
+      {ISD::FSUB, MVT::f64, {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+      {ISD::FSUB, MVT::f32, {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+      {ISD::FSUB,
+       MVT::v2f64,
+       {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+      {ISD::FSUB,
+       MVT::v4f32,
+       {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+      {ISD::FSUB,
+       MVT::v4f64,
+       {2, 5, 1, 2}}, // BDVER2 from http://www.agner.org/
+      {ISD::FSUB,
+       MVT::v8f32,
+       {2, 5, 1, 2}}, // BDVER2 from http://www.agner.org/
+
+      {ISD::FMUL, MVT::f64, {2, 5, 1, 1}}, // BTVER2 from http://www.agner.org/
+      {ISD::FMUL, MVT::f32, {1, 5, 1, 1}}, // BTVER2 from http://www.agner.org/
+      {ISD::FMUL,
+       MVT::v2f64,
+       {2, 5, 1, 1}}, // BTVER2 from http://www.agner.org/
+      {ISD::FMUL,
+       MVT::v4f32,
+       {1, 5, 1, 1}}, // BTVER2 from http://www.agner.org/
+      {ISD::FMUL,
+       MVT::v4f64,
+       {4, 5, 1, 2}}, // BTVER2 from http://www.agner.org/
+      {ISD::FMUL,
+       MVT::v8f32,
+       {2, 5, 1, 2}}, // BTVER2 from http://www.agner.org/
+
+      {ISD::FDIV, MVT::f32, {14, 14, 1, 1}},   // SNB from http://www.agner.org/
+      {ISD::FDIV, MVT::v4f32, {14, 14, 1, 1}}, // SNB from http://www.agner.org/
+      {ISD::FDIV, MVT::v8f32, {28, 29, 1, 3}}, // SNB from http://www.agner.org/
+      {ISD::FDIV, MVT::f64, {22, 22, 1, 1}},   // SNB from http://www.agner.org/
+      {ISD::FDIV, MVT::v2f64, {22, 22, 1, 1}}, // SNB from http://www.agner.org/
+      {ISD::FDIV, MVT::v4f64, {44, 45, 1, 3}}, // SNB from http://www.agner.org/
   };
 
   if (ST->hasAVX())

@RKSimon RKSimon merged commit dfbefe4 into llvm:main Sep 8, 2025
11 of 12 checks passed
llvm-ci commented Sep 8, 2025

LLVM Buildbot has detected a new failure on builder cross-project-tests-sie-ubuntu running on doug-worker-1a while building llvm at step 6 "test-build-unified-tree-check-cross-project".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/181/builds/27438

Here is the relevant piece of the build log for reference:
Step 6 (test-build-unified-tree-check-cross-project) failure: test (failure)
******************** TEST 'cross-project-tests :: debuginfo-tests/dexter/feature_tests/commands/penalty/unreachable_line_range.cpp' FAILED ********************
Exit Code: 1

Command Output (stderr):
--
clang++ -O0 -glldb -std=gnu++11 /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/unreachable_line_range.cpp -o /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/build/projects/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/Output/unreachable_line_range.cpp.tmp # RUN: at line 8
+ clang++ -O0 -glldb -std=gnu++11 /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/unreachable_line_range.cpp -o /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/build/projects/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/Output/unreachable_line_range.cpp.tmp
not "/usr/bin/python3.8" "/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/cross-project-tests/debuginfo-tests/dexter/dexter.py" test --fail-lt 1.0 -w --debugger lldb-dap --lldb-executable "/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/build/bin/lldb-dap"  --dap-message-log=/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/build/projects/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/Output/unreachable_line_range.cpp.tmp.dap.log --binary /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/build/projects/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/Output/unreachable_line_range.cpp.tmp -- /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/unreachable_line_range.cpp | /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/build/bin/FileCheck --dump-input-context=999999999 /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/unreachable_line_range.cpp # RUN: at line 9
+ /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/build/bin/FileCheck --dump-input-context=999999999 /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/unreachable_line_range.cpp
+ not /usr/bin/python3.8 /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/cross-project-tests/debuginfo-tests/dexter/dexter.py test --fail-lt 1.0 -w --debugger lldb-dap --lldb-executable /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/build/bin/lldb-dap --dap-message-log=/home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/build/projects/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/Output/unreachable_line_range.cpp.tmp.dap.log --binary /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/build/projects/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/Output/unreachable_line_range.cpp.tmp -- /home/buildbot/buildbot-root/cross-project-tests-sie-ubuntu/llvm-project/cross-project-tests/debuginfo-tests/dexter/feature_tests/commands/penalty/unreachable_line_range.cpp

--

********************


@RKSimon RKSimon deleted the x86-costs-pmuludq branch September 8, 2025 15:13