[CostModel][X86] Add missing AVX1 costs for PMULUDQ v4i64 pattern #157475
Conversation
We need to account for AVX1 targets splitting 256-bit PMULUDQ-style patterns - we included an AVX2 cost, but missed AVX1 despite v4i64 being a legal type. Noticed while working on llvm#156568.
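For context, the "PMULUDQ-style pattern" here is a 64-bit vector multiply whose operands are both zero-extended from 32 bits, which the X86 backend selects to PMULUDQ. A minimal sketch of the v4i64 case follows; the RUN line is an assumption modeled on the conventions of the existing llvm/test/Analysis/CostModel/X86/mul64.ll test, and the function name is illustrative:

; RUN: opt < %s -passes="print<cost-model>" -cost-kind=all -disable-output \
; RUN:     -mtriple=x86_64-- -mattr=+avx 2>&1

define <4 x i64> @mul_zext_v4i32(<4 x i32> %a, <4 x i32> %b) {
  ; Both operands have their upper 32 bits known zero, so the i64 multiply
  ; selects to PMULUDQ. On AVX1 the 256-bit op is split into two 128-bit
  ; vpmuludq plus an extract/insert of the upper half, which is what the
  ; new { 3, 5, 5, 6 } v4i64 table entry models.
  %xa = zext <4 x i32> %a to <4 x i64>
  %xb = zext <4 x i32> %b to <4 x i64>
  %res = mul <4 x i64> %xa, %xb
  ret <4 x i64> %res
}

With this patch the multiply above reports RThru:3 CodeSize:5 Lat:5 SizeLat:6 on AVX1 instead of falling through to the generic v4i64 MUL entry.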
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-backend-x86
Author: Simon Pilgrim (RKSimon)
Changes: We need to account for AVX1 targets splitting 256-bit PMULUDQ-style patterns - we included an AVX2 cost, but missed AVX1 despite v4i64 being a legal type. Noticed while working on #156568.
Full diff: https://github.com/llvm/llvm-project/pull/157475.diff
2 Files Affected:
diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 910c724a4e3aa..3d8d0a236a3c1 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -1206,6 +1206,8 @@ InstructionCost X86TTIImpl::getArithmeticInstrCost(
{ ISD::MUL, MVT::v4i32, { 2, 5, 1, 3 } }, // pmulld
{ ISD::MUL, MVT::v4i64, { 12, 15, 19, 20 } },
+ { X86ISD::PMULUDQ, MVT::v4i64, { 3, 5, 5, 6 } }, // pmuludq + split
+
{ ISD::AND, MVT::v32i8, { 1, 1, 1, 2 } }, // vandps
{ ISD::AND, MVT::v16i16, { 1, 1, 1, 2 } }, // vandps
{ ISD::AND, MVT::v8i32, { 1, 1, 1, 2 } }, // vandps
diff --git a/llvm/test/Analysis/CostModel/X86/mul64.ll b/llvm/test/Analysis/CostModel/X86/mul64.ll
index 9e4794760404f..9af20febeaffa 100644
--- a/llvm/test/Analysis/CostModel/X86/mul64.ll
+++ b/llvm/test/Analysis/CostModel/X86/mul64.ll
@@ -327,11 +327,11 @@ define void @mul_zext_vXi8(<2 x i8> %a2, <2 x i8> %b2, <4 x i8> %a4, <4 x i8> %b
; AVX1-NEXT: Cost Model: Found costs of RThru:50 CodeSize:1 Lat:1 SizeLat:1 for: %xa64 = zext <64 x i8> %a64 to <64 x i64>
; AVX1-NEXT: Cost Model: Found costs of RThru:50 CodeSize:1 Lat:1 SizeLat:1 for: %xb64 = zext <64 x i8> %b64 to <64 x i64>
; AVX1-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:5 SizeLat:1 for: %res2 = mul <2 x i64> %xa2, %xb2
-; AVX1-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %res4 = mul <4 x i64> %xa4, %xb4
-; AVX1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:1 Lat:1 SizeLat:1 for: %res8 = mul <8 x i64> %xa8, %xb8
-; AVX1-NEXT: Cost Model: Found costs of RThru:8 CodeSize:1 Lat:1 SizeLat:1 for: %res16 = mul <16 x i64> %xa16, %xb16
-; AVX1-NEXT: Cost Model: Found costs of RThru:16 CodeSize:1 Lat:1 SizeLat:1 for: %res32 = mul <32 x i64> %xa32, %xb32
-; AVX1-NEXT: Cost Model: Found costs of RThru:32 CodeSize:1 Lat:1 SizeLat:1 for: %res64 = mul <64 x i64> %xa64, %xb64
+; AVX1-NEXT: Cost Model: Found costs of RThru:3 CodeSize:5 Lat:5 SizeLat:6 for: %res4 = mul <4 x i64> %xa4, %xb4
+; AVX1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:10 Lat:10 SizeLat:12 for: %res8 = mul <8 x i64> %xa8, %xb8
+; AVX1-NEXT: Cost Model: Found costs of RThru:12 CodeSize:20 Lat:20 SizeLat:24 for: %res16 = mul <16 x i64> %xa16, %xb16
+; AVX1-NEXT: Cost Model: Found costs of RThru:24 CodeSize:40 Lat:40 SizeLat:48 for: %res32 = mul <32 x i64> %xa32, %xb32
+; AVX1-NEXT: Cost Model: Found costs of RThru:48 CodeSize:80 Lat:80 SizeLat:96 for: %res64 = mul <64 x i64> %xa64, %xb64
; AVX1-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
; AVX2-LABEL: 'mul_zext_vXi8'
@@ -985,11 +985,11 @@ define void @mul_zext_vXi16(<2 x i16> %a2, <2 x i16> %b2, <4 x i16> %a4, <4 x i1
; AVX1-NEXT: Cost Model: Found costs of RThru:52 CodeSize:1 Lat:1 SizeLat:1 for: %xa64 = zext <64 x i16> %a64 to <64 x i64>
; AVX1-NEXT: Cost Model: Found costs of RThru:52 CodeSize:1 Lat:1 SizeLat:1 for: %xb64 = zext <64 x i16> %b64 to <64 x i64>
; AVX1-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:5 SizeLat:1 for: %res2 = mul <2 x i64> %xa2, %xb2
-; AVX1-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %res4 = mul <4 x i64> %xa4, %xb4
-; AVX1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:1 Lat:1 SizeLat:1 for: %res8 = mul <8 x i64> %xa8, %xb8
-; AVX1-NEXT: Cost Model: Found costs of RThru:8 CodeSize:1 Lat:1 SizeLat:1 for: %res16 = mul <16 x i64> %xa16, %xb16
-; AVX1-NEXT: Cost Model: Found costs of RThru:16 CodeSize:1 Lat:1 SizeLat:1 for: %res32 = mul <32 x i64> %xa32, %xb32
-; AVX1-NEXT: Cost Model: Found costs of RThru:32 CodeSize:1 Lat:1 SizeLat:1 for: %res64 = mul <64 x i64> %xa64, %xb64
+; AVX1-NEXT: Cost Model: Found costs of RThru:3 CodeSize:5 Lat:5 SizeLat:6 for: %res4 = mul <4 x i64> %xa4, %xb4
+; AVX1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:10 Lat:10 SizeLat:12 for: %res8 = mul <8 x i64> %xa8, %xb8
+; AVX1-NEXT: Cost Model: Found costs of RThru:12 CodeSize:20 Lat:20 SizeLat:24 for: %res16 = mul <16 x i64> %xa16, %xb16
+; AVX1-NEXT: Cost Model: Found costs of RThru:24 CodeSize:40 Lat:40 SizeLat:48 for: %res32 = mul <32 x i64> %xa32, %xb32
+; AVX1-NEXT: Cost Model: Found costs of RThru:48 CodeSize:80 Lat:80 SizeLat:96 for: %res64 = mul <64 x i64> %xa64, %xb64
; AVX1-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
; AVX2-LABEL: 'mul_zext_vXi16'
@@ -1643,11 +1643,11 @@ define void @mul_zext_vXi32(<2 x i32> %a2, <2 x i32> %b2, <4 x i32> %a4, <4 x i3
; AVX1-NEXT: Cost Model: Found costs of RThru:56 CodeSize:1 Lat:1 SizeLat:1 for: %xa64 = zext <64 x i32> %a64 to <64 x i64>
; AVX1-NEXT: Cost Model: Found costs of RThru:56 CodeSize:1 Lat:1 SizeLat:1 for: %xb64 = zext <64 x i32> %b64 to <64 x i64>
; AVX1-NEXT: Cost Model: Found costs of RThru:1 CodeSize:1 Lat:5 SizeLat:1 for: %res2 = mul <2 x i64> %xa2, %xb2
-; AVX1-NEXT: Cost Model: Found costs of RThru:2 CodeSize:1 Lat:1 SizeLat:1 for: %res4 = mul <4 x i64> %xa4, %xb4
-; AVX1-NEXT: Cost Model: Found costs of RThru:4 CodeSize:1 Lat:1 SizeLat:1 for: %res8 = mul <8 x i64> %xa8, %xb8
-; AVX1-NEXT: Cost Model: Found costs of RThru:8 CodeSize:1 Lat:1 SizeLat:1 for: %res16 = mul <16 x i64> %xa16, %xb16
-; AVX1-NEXT: Cost Model: Found costs of RThru:16 CodeSize:1 Lat:1 SizeLat:1 for: %res32 = mul <32 x i64> %xa32, %xb32
-; AVX1-NEXT: Cost Model: Found costs of RThru:32 CodeSize:1 Lat:1 SizeLat:1 for: %res64 = mul <64 x i64> %xa64, %xb64
+; AVX1-NEXT: Cost Model: Found costs of RThru:3 CodeSize:5 Lat:5 SizeLat:6 for: %res4 = mul <4 x i64> %xa4, %xb4
+; AVX1-NEXT: Cost Model: Found costs of RThru:6 CodeSize:10 Lat:10 SizeLat:12 for: %res8 = mul <8 x i64> %xa8, %xb8
+; AVX1-NEXT: Cost Model: Found costs of RThru:12 CodeSize:20 Lat:20 SizeLat:24 for: %res16 = mul <16 x i64> %xa16, %xb16
+; AVX1-NEXT: Cost Model: Found costs of RThru:24 CodeSize:40 Lat:40 SizeLat:48 for: %res32 = mul <32 x i64> %xa32, %xb32
+; AVX1-NEXT: Cost Model: Found costs of RThru:48 CodeSize:80 Lat:80 SizeLat:96 for: %res64 = mul <64 x i64> %xa64, %xb64
; AVX1-NEXT: Cost Model: Found costs of RThru:0 CodeSize:1 Lat:1 SizeLat:1 for: ret void
;
; AVX2-LABEL: 'mul_zext_vXi32'
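The scaling in the updated check lines follows directly from the new table entry: the cost model multiplies the per-legal-type cost by the number of v4i64 pieces a wider vector splits into (the LT.first * *KindCost return visible in the clang-format diff below). For example, <16 x i64> legalizes to four v4i64 halves, so each component of { 3, 5, 5, 6 } is scaled by 4, giving exactly the RThru:12 CodeSize:20 Lat:20 SizeLat:24 checked above.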
You can test this locally with the following command:

git-clang-format --diff origin/main HEAD --extensions cpp -- llvm/lib/Target/X86/X86TargetTransformInfo.cpp
View the diff from clang-format here.

diff --git a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
index 3d8d0a236..9d5fc327a 100644
--- a/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
+++ b/llvm/lib/Target/X86/X86TargetTransformInfo.cpp
@@ -1196,101 +1196,141 @@ InstructionCost X86TTIImpl::getArithmeticInstrCost(
return LT.first * *KindCost;
static const CostKindTblEntry AVX1CostTable[] = {
- // We don't have to scalarize unsupported ops. We can issue two half-sized
- // operations and we only need to extract the upper YMM half.
- // Two ops + 1 extract + 1 insert = 4.
- { ISD::MUL, MVT::v32i8, { 10, 11, 18, 19 } }, // pmaddubsw + split
- { ISD::MUL, MVT::v16i8, { 5, 6, 8, 12 } }, // 2*pmaddubsw/3*and/psllw/or
- { ISD::MUL, MVT::v16i16, { 4, 8, 5, 6 } }, // pmullw + split
- { ISD::MUL, MVT::v8i32, { 5, 8, 5, 10 } }, // pmulld + split
- { ISD::MUL, MVT::v4i32, { 2, 5, 1, 3 } }, // pmulld
- { ISD::MUL, MVT::v4i64, { 12, 15, 19, 20 } },
-
- { X86ISD::PMULUDQ, MVT::v4i64, { 3, 5, 5, 6 } }, // pmuludq + split
-
- { ISD::AND, MVT::v32i8, { 1, 1, 1, 2 } }, // vandps
- { ISD::AND, MVT::v16i16, { 1, 1, 1, 2 } }, // vandps
- { ISD::AND, MVT::v8i32, { 1, 1, 1, 2 } }, // vandps
- { ISD::AND, MVT::v4i64, { 1, 1, 1, 2 } }, // vandps
-
- { ISD::OR, MVT::v32i8, { 1, 1, 1, 2 } }, // vorps
- { ISD::OR, MVT::v16i16, { 1, 1, 1, 2 } }, // vorps
- { ISD::OR, MVT::v8i32, { 1, 1, 1, 2 } }, // vorps
- { ISD::OR, MVT::v4i64, { 1, 1, 1, 2 } }, // vorps
-
- { ISD::XOR, MVT::v32i8, { 1, 1, 1, 2 } }, // vxorps
- { ISD::XOR, MVT::v16i16, { 1, 1, 1, 2 } }, // vxorps
- { ISD::XOR, MVT::v8i32, { 1, 1, 1, 2 } }, // vxorps
- { ISD::XOR, MVT::v4i64, { 1, 1, 1, 2 } }, // vxorps
-
- { ISD::SUB, MVT::v32i8, { 4, 2, 5, 6 } }, // psubb + split
- { ISD::ADD, MVT::v32i8, { 4, 2, 5, 6 } }, // paddb + split
- { ISD::SUB, MVT::v16i16, { 4, 2, 5, 6 } }, // psubw + split
- { ISD::ADD, MVT::v16i16, { 4, 2, 5, 6 } }, // paddw + split
- { ISD::SUB, MVT::v8i32, { 4, 2, 5, 6 } }, // psubd + split
- { ISD::ADD, MVT::v8i32, { 4, 2, 5, 6 } }, // paddd + split
- { ISD::SUB, MVT::v4i64, { 4, 2, 5, 6 } }, // psubq + split
- { ISD::ADD, MVT::v4i64, { 4, 2, 5, 6 } }, // paddq + split
- { ISD::SUB, MVT::v2i64, { 1, 1, 1, 1 } }, // psubq
- { ISD::ADD, MVT::v2i64, { 1, 1, 1, 1 } }, // paddq
-
- { ISD::SHL, MVT::v16i8, { 10, 21,11,17 } }, // pblendvb sequence.
- { ISD::SHL, MVT::v32i8, { 22, 22,27,40 } }, // pblendvb sequence + split.
- { ISD::SHL, MVT::v8i16, { 6, 9,11,11 } }, // pblendvb sequence.
- { ISD::SHL, MVT::v16i16, { 13, 16,24,25 } }, // pblendvb sequence + split.
- { ISD::SHL, MVT::v4i32, { 3, 11, 4, 6 } }, // pslld/paddd/cvttps2dq/pmulld
- { ISD::SHL, MVT::v8i32, { 9, 11,12,17 } }, // pslld/paddd/cvttps2dq/pmulld + split
- { ISD::SHL, MVT::v2i64, { 2, 4, 4, 6 } }, // Shift each lane + blend.
- { ISD::SHL, MVT::v4i64, { 6, 7,11,15 } }, // Shift each lane + blend + split.
-
- { ISD::SRL, MVT::v16i8, { 11, 27,12,18 } }, // pblendvb sequence.
- { ISD::SRL, MVT::v32i8, { 23, 23,30,43 } }, // pblendvb sequence + split.
- { ISD::SRL, MVT::v8i16, { 13, 16,14,22 } }, // pblendvb sequence.
- { ISD::SRL, MVT::v16i16, { 28, 30,31,48 } }, // pblendvb sequence + split.
- { ISD::SRL, MVT::v4i32, { 6, 7,12,16 } }, // Shift each lane + blend.
- { ISD::SRL, MVT::v8i32, { 14, 14,26,34 } }, // Shift each lane + blend + split.
- { ISD::SRL, MVT::v2i64, { 2, 4, 4, 6 } }, // Shift each lane + blend.
- { ISD::SRL, MVT::v4i64, { 6, 7,11,15 } }, // Shift each lane + blend + split.
-
- { ISD::SRA, MVT::v16i8, { 21, 22,24,36 } }, // pblendvb sequence.
- { ISD::SRA, MVT::v32i8, { 44, 45,51,76 } }, // pblendvb sequence + split.
- { ISD::SRA, MVT::v8i16, { 13, 16,14,22 } }, // pblendvb sequence.
- { ISD::SRA, MVT::v16i16, { 28, 30,31,48 } }, // pblendvb sequence + split.
- { ISD::SRA, MVT::v4i32, { 6, 7,12,16 } }, // Shift each lane + blend.
- { ISD::SRA, MVT::v8i32, { 14, 14,26,34 } }, // Shift each lane + blend + split.
- { ISD::SRA, MVT::v2i64, { 5, 6,10,14 } }, // Shift each lane + blend.
- { ISD::SRA, MVT::v4i64, { 12, 12,22,30 } }, // Shift each lane + blend + split.
-
- { ISD::FNEG, MVT::v4f64, { 2, 2, 1, 2 } }, // BTVER2 from http://www.agner.org/
- { ISD::FNEG, MVT::v8f32, { 2, 2, 1, 2 } }, // BTVER2 from http://www.agner.org/
-
- { ISD::FADD, MVT::f64, { 1, 5, 1, 1 } }, // BDVER2 from http://www.agner.org/
- { ISD::FADD, MVT::f32, { 1, 5, 1, 1 } }, // BDVER2 from http://www.agner.org/
- { ISD::FADD, MVT::v2f64, { 1, 5, 1, 1 } }, // BDVER2 from http://www.agner.org/
- { ISD::FADD, MVT::v4f32, { 1, 5, 1, 1 } }, // BDVER2 from http://www.agner.org/
- { ISD::FADD, MVT::v4f64, { 2, 5, 1, 2 } }, // BDVER2 from http://www.agner.org/
- { ISD::FADD, MVT::v8f32, { 2, 5, 1, 2 } }, // BDVER2 from http://www.agner.org/
-
- { ISD::FSUB, MVT::f64, { 1, 5, 1, 1 } }, // BDVER2 from http://www.agner.org/
- { ISD::FSUB, MVT::f32, { 1, 5, 1, 1 } }, // BDVER2 from http://www.agner.org/
- { ISD::FSUB, MVT::v2f64, { 1, 5, 1, 1 } }, // BDVER2 from http://www.agner.org/
- { ISD::FSUB, MVT::v4f32, { 1, 5, 1, 1 } }, // BDVER2 from http://www.agner.org/
- { ISD::FSUB, MVT::v4f64, { 2, 5, 1, 2 } }, // BDVER2 from http://www.agner.org/
- { ISD::FSUB, MVT::v8f32, { 2, 5, 1, 2 } }, // BDVER2 from http://www.agner.org/
-
- { ISD::FMUL, MVT::f64, { 2, 5, 1, 1 } }, // BTVER2 from http://www.agner.org/
- { ISD::FMUL, MVT::f32, { 1, 5, 1, 1 } }, // BTVER2 from http://www.agner.org/
- { ISD::FMUL, MVT::v2f64, { 2, 5, 1, 1 } }, // BTVER2 from http://www.agner.org/
- { ISD::FMUL, MVT::v4f32, { 1, 5, 1, 1 } }, // BTVER2 from http://www.agner.org/
- { ISD::FMUL, MVT::v4f64, { 4, 5, 1, 2 } }, // BTVER2 from http://www.agner.org/
- { ISD::FMUL, MVT::v8f32, { 2, 5, 1, 2 } }, // BTVER2 from http://www.agner.org/
-
- { ISD::FDIV, MVT::f32, { 14, 14, 1, 1 } }, // SNB from http://www.agner.org/
- { ISD::FDIV, MVT::v4f32, { 14, 14, 1, 1 } }, // SNB from http://www.agner.org/
- { ISD::FDIV, MVT::v8f32, { 28, 29, 1, 3 } }, // SNB from http://www.agner.org/
- { ISD::FDIV, MVT::f64, { 22, 22, 1, 1 } }, // SNB from http://www.agner.org/
- { ISD::FDIV, MVT::v2f64, { 22, 22, 1, 1 } }, // SNB from http://www.agner.org/
- { ISD::FDIV, MVT::v4f64, { 44, 45, 1, 3 } }, // SNB from http://www.agner.org/
+ // We don't have to scalarize unsupported ops. We can issue two half-sized
+ // operations and we only need to extract the upper YMM half.
+ // Two ops + 1 extract + 1 insert = 4.
+ {ISD::MUL, MVT::v32i8, {10, 11, 18, 19}}, // pmaddubsw + split
+ {ISD::MUL, MVT::v16i8, {5, 6, 8, 12}}, // 2*pmaddubsw/3*and/psllw/or
+ {ISD::MUL, MVT::v16i16, {4, 8, 5, 6}}, // pmullw + split
+ {ISD::MUL, MVT::v8i32, {5, 8, 5, 10}}, // pmulld + split
+ {ISD::MUL, MVT::v4i32, {2, 5, 1, 3}}, // pmulld
+ {ISD::MUL, MVT::v4i64, {12, 15, 19, 20}},
+
+ {X86ISD::PMULUDQ, MVT::v4i64, {3, 5, 5, 6}}, // pmuludq + split
+
+ {ISD::AND, MVT::v32i8, {1, 1, 1, 2}}, // vandps
+ {ISD::AND, MVT::v16i16, {1, 1, 1, 2}}, // vandps
+ {ISD::AND, MVT::v8i32, {1, 1, 1, 2}}, // vandps
+ {ISD::AND, MVT::v4i64, {1, 1, 1, 2}}, // vandps
+
+ {ISD::OR, MVT::v32i8, {1, 1, 1, 2}}, // vorps
+ {ISD::OR, MVT::v16i16, {1, 1, 1, 2}}, // vorps
+ {ISD::OR, MVT::v8i32, {1, 1, 1, 2}}, // vorps
+ {ISD::OR, MVT::v4i64, {1, 1, 1, 2}}, // vorps
+
+ {ISD::XOR, MVT::v32i8, {1, 1, 1, 2}}, // vxorps
+ {ISD::XOR, MVT::v16i16, {1, 1, 1, 2}}, // vxorps
+ {ISD::XOR, MVT::v8i32, {1, 1, 1, 2}}, // vxorps
+ {ISD::XOR, MVT::v4i64, {1, 1, 1, 2}}, // vxorps
+
+ {ISD::SUB, MVT::v32i8, {4, 2, 5, 6}}, // psubb + split
+ {ISD::ADD, MVT::v32i8, {4, 2, 5, 6}}, // paddb + split
+ {ISD::SUB, MVT::v16i16, {4, 2, 5, 6}}, // psubw + split
+ {ISD::ADD, MVT::v16i16, {4, 2, 5, 6}}, // paddw + split
+ {ISD::SUB, MVT::v8i32, {4, 2, 5, 6}}, // psubd + split
+ {ISD::ADD, MVT::v8i32, {4, 2, 5, 6}}, // paddd + split
+ {ISD::SUB, MVT::v4i64, {4, 2, 5, 6}}, // psubq + split
+ {ISD::ADD, MVT::v4i64, {4, 2, 5, 6}}, // paddq + split
+ {ISD::SUB, MVT::v2i64, {1, 1, 1, 1}}, // psubq
+ {ISD::ADD, MVT::v2i64, {1, 1, 1, 1}}, // paddq
+
+ {ISD::SHL, MVT::v16i8, {10, 21, 11, 17}}, // pblendvb sequence.
+ {ISD::SHL, MVT::v32i8, {22, 22, 27, 40}}, // pblendvb sequence + split.
+ {ISD::SHL, MVT::v8i16, {6, 9, 11, 11}}, // pblendvb sequence.
+ {ISD::SHL, MVT::v16i16, {13, 16, 24, 25}}, // pblendvb sequence + split.
+ {ISD::SHL, MVT::v4i32, {3, 11, 4, 6}}, // pslld/paddd/cvttps2dq/pmulld
+ {ISD::SHL,
+ MVT::v8i32,
+ {9, 11, 12, 17}}, // pslld/paddd/cvttps2dq/pmulld + split
+ {ISD::SHL, MVT::v2i64, {2, 4, 4, 6}}, // Shift each lane + blend.
+ {ISD::SHL,
+ MVT::v4i64,
+ {6, 7, 11, 15}}, // Shift each lane + blend + split.
+
+ {ISD::SRL, MVT::v16i8, {11, 27, 12, 18}}, // pblendvb sequence.
+ {ISD::SRL, MVT::v32i8, {23, 23, 30, 43}}, // pblendvb sequence + split.
+ {ISD::SRL, MVT::v8i16, {13, 16, 14, 22}}, // pblendvb sequence.
+ {ISD::SRL, MVT::v16i16, {28, 30, 31, 48}}, // pblendvb sequence + split.
+ {ISD::SRL, MVT::v4i32, {6, 7, 12, 16}}, // Shift each lane + blend.
+ {ISD::SRL,
+ MVT::v8i32,
+ {14, 14, 26, 34}}, // Shift each lane + blend + split.
+ {ISD::SRL, MVT::v2i64, {2, 4, 4, 6}}, // Shift each lane + blend.
+ {ISD::SRL,
+ MVT::v4i64,
+ {6, 7, 11, 15}}, // Shift each lane + blend + split.
+
+ {ISD::SRA, MVT::v16i8, {21, 22, 24, 36}}, // pblendvb sequence.
+ {ISD::SRA, MVT::v32i8, {44, 45, 51, 76}}, // pblendvb sequence + split.
+ {ISD::SRA, MVT::v8i16, {13, 16, 14, 22}}, // pblendvb sequence.
+ {ISD::SRA, MVT::v16i16, {28, 30, 31, 48}}, // pblendvb sequence + split.
+ {ISD::SRA, MVT::v4i32, {6, 7, 12, 16}}, // Shift each lane + blend.
+ {ISD::SRA,
+ MVT::v8i32,
+ {14, 14, 26, 34}}, // Shift each lane + blend + split.
+ {ISD::SRA, MVT::v2i64, {5, 6, 10, 14}}, // Shift each lane + blend.
+ {ISD::SRA,
+ MVT::v4i64,
+ {12, 12, 22, 30}}, // Shift each lane + blend + split.
+
+ {ISD::FNEG,
+ MVT::v4f64,
+ {2, 2, 1, 2}}, // BTVER2 from http://www.agner.org/
+ {ISD::FNEG,
+ MVT::v8f32,
+ {2, 2, 1, 2}}, // BTVER2 from http://www.agner.org/
+
+ {ISD::FADD, MVT::f64, {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+ {ISD::FADD, MVT::f32, {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+ {ISD::FADD,
+ MVT::v2f64,
+ {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+ {ISD::FADD,
+ MVT::v4f32,
+ {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+ {ISD::FADD,
+ MVT::v4f64,
+ {2, 5, 1, 2}}, // BDVER2 from http://www.agner.org/
+ {ISD::FADD,
+ MVT::v8f32,
+ {2, 5, 1, 2}}, // BDVER2 from http://www.agner.org/
+
+ {ISD::FSUB, MVT::f64, {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+ {ISD::FSUB, MVT::f32, {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+ {ISD::FSUB,
+ MVT::v2f64,
+ {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+ {ISD::FSUB,
+ MVT::v4f32,
+ {1, 5, 1, 1}}, // BDVER2 from http://www.agner.org/
+ {ISD::FSUB,
+ MVT::v4f64,
+ {2, 5, 1, 2}}, // BDVER2 from http://www.agner.org/
+ {ISD::FSUB,
+ MVT::v8f32,
+ {2, 5, 1, 2}}, // BDVER2 from http://www.agner.org/
+
+ {ISD::FMUL, MVT::f64, {2, 5, 1, 1}}, // BTVER2 from http://www.agner.org/
+ {ISD::FMUL, MVT::f32, {1, 5, 1, 1}}, // BTVER2 from http://www.agner.org/
+ {ISD::FMUL,
+ MVT::v2f64,
+ {2, 5, 1, 1}}, // BTVER2 from http://www.agner.org/
+ {ISD::FMUL,
+ MVT::v4f32,
+ {1, 5, 1, 1}}, // BTVER2 from http://www.agner.org/
+ {ISD::FMUL,
+ MVT::v4f64,
+ {4, 5, 1, 2}}, // BTVER2 from http://www.agner.org/
+ {ISD::FMUL,
+ MVT::v8f32,
+ {2, 5, 1, 2}}, // BTVER2 from http://www.agner.org/
+
+ {ISD::FDIV, MVT::f32, {14, 14, 1, 1}}, // SNB from http://www.agner.org/
+ {ISD::FDIV, MVT::v4f32, {14, 14, 1, 1}}, // SNB from http://www.agner.org/
+ {ISD::FDIV, MVT::v8f32, {28, 29, 1, 3}}, // SNB from http://www.agner.org/
+ {ISD::FDIV, MVT::f64, {22, 22, 1, 1}}, // SNB from http://www.agner.org/
+ {ISD::FDIV, MVT::v2f64, {22, 22, 1, 1}}, // SNB from http://www.agner.org/
+ {ISD::FDIV, MVT::v4f64, {44, 45, 1, 3}}, // SNB from http://www.agner.org/
};
if (ST->hasAVX())
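As a general usage hint (not part of the bot output): git-clang-format can also apply the suggested changes in place when run without --diff, typically against a single base commit, e.g. git-clang-format origin/main --extensions cpp -- llvm/lib/Target/X86/X86TargetTransformInfo.cpp; check git-clang-format --help for the exact flags needed when formatting across a two-commit range.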
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/181/builds/27438. Here is the relevant piece of the build log for reference: