[BasicTTI] When costing a scalarized cast, use distinct src and dest types #109325
When we fall back on scalarizing a vector cast, BasicTTI was assuming that the type being extracted was the same as the type being inserted. This is not true when scalarizing, e.g., a truncate, zext, or sext.

I have made no attempt to confirm that the test diffs are profitable in terms of, e.g., vectorization results for the various targets impacted.
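For readers who want to see the effect of the formula change in isolation, here is a minimal standalone sketch. It is not LLVM code: `ToyVecTy`, `toyPerElementCost`, `toyScalarizationOverhead`, `toyCastCostBefore`, and `toyCastCostAfter` are hypothetical names, and the per-element costs are made up purely for illustration (no real target is modeled). It only mirrors the shape of the old and new return expressions in `BasicTTIImpl.h`.

```cpp
#include <cassert>

// Toy stand-in for a fixed vector type: element count and element width.
struct ToyVecTy {
  unsigned NumElts;
  unsigned EltBits;
};

// Hypothetical per-element insert/extract cost (an assumption, not a real
// target's numbers): elements wider than 32 bits cost 2, all others cost 1.
inline unsigned toyPerElementCost(const ToyVecTy &Ty) {
  return Ty.EltBits > 32 ? 2 : 1;
}

// Toy analogue of getScalarizationOverhead(Ty, Insert, Extract, CostKind).
inline unsigned toyScalarizationOverhead(const ToyVecTy &Ty, bool Insert,
                                         bool Extract) {
  assert(Ty.NumElts > 0 && "expected a non-empty vector type");
  unsigned Cost = 0;
  if (Insert)
    Cost += Ty.NumElts * toyPerElementCost(Ty);
  if (Extract)
    Cost += Ty.NumElts * toyPerElementCost(Ty);
  return Cost;
}

// Old shape: both the extracts and the inserts are costed on the destination
// type, so the source type is ignored.
inline unsigned toyCastCostBefore(const ToyVecTy & /*Src*/, const ToyVecTy &Dst,
                                  unsigned ScalarCost) {
  return toyScalarizationOverhead(Dst, /*Insert=*/true, /*Extract=*/true) +
         Dst.NumElts * ScalarCost;
}

// New shape: extracts are costed on the source type, inserts on the
// destination type.
inline unsigned toyCastCostAfter(const ToyVecTy &Src, const ToyVecTy &Dst,
                                 unsigned ScalarCost) {
  return toyScalarizationOverhead(Src, /*Insert=*/false, /*Extract=*/true) +
         toyScalarizationOverhead(Dst, /*Insert=*/true, /*Extract=*/false) +
         Dst.NumElts * ScalarCost;
}
```

Under these made-up costs and a scalar cast cost of 1, a `<4 x i64>` to `<4 x i8>` truncate goes from 12 to 16, a `<4 x i8>` to `<4 x i64>` zext goes from 20 to 16, and a cast between equal-width element types is unchanged. The direction of the change therefore depends on which side of the cast is more expensive to extract from or insert into.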
@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-llvm-analysis

Author: Philip Reames (preames)

Changes

When we fall back on scalarizing a vector cast, BasicTTI was assuming that the type being extracted was the same as the type being inserted. This is not true when scalarizing, e.g., a truncate, zext, or sext.

I have made no attempt to confirm that the test diffs are profitable in terms of, e.g., vectorization results for the various targets impacted.

Patch is 975.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109325.diff

19 Files Affected:
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 7198e134a2d262..4cc4f6704c155d 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1186,7 +1186,9 @@ class BasicTTIImplBase : public TargetTransformInfoImplCRTPBase<T> {
// Return the cost of multiple scalar invocation plus the cost of
// inserting and extracting the values.
- return getScalarizationOverhead(DstVTy, /*Insert*/ true, /*Extract*/ true,
+ return getScalarizationOverhead(SrcVTy, /*Insert*/ false, /*Extract*/ true,
+ CostKind) +
+ getScalarizationOverhead(DstVTy, /*Insert*/ true, /*Extract*/ false,
CostKind) +
Num * Cost;
}
diff --git a/llvm/test/Analysis/CostModel/AArch64/cast.ll b/llvm/test/Analysis/CostModel/AArch64/cast.ll
index fa778864ae978f..01e9d3483fc167 100644
--- a/llvm/test/Analysis/CostModel/AArch64/cast.ll
+++ b/llvm/test/Analysis/CostModel/AArch64/cast.ll
@@ -937,8 +937,8 @@ define i32 @casts_no_users() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x float> undef to <2 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r98 = fptoui <2 x float> undef to <2 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r99 = fptosi <2 x float> undef to <2 x i64>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r100 = fptoui <2 x double> undef to <2 x i1>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r101 = fptosi <2 x double> undef to <2 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r100 = fptoui <2 x double> undef to <2 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r101 = fptosi <2 x double> undef to <2 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r102 = fptoui <2 x double> undef to <2 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r103 = fptosi <2 x double> undef to <2 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r104 = fptoui <2 x double> undef to <2 x i16>
@@ -947,8 +947,8 @@ define i32 @casts_no_users() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r107 = fptosi <2 x double> undef to <2 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r108 = fptoui <2 x double> undef to <2 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r109 = fptosi <2 x double> undef to <2 x i64>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r112 = fptoui <4 x float> undef to <4 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r113 = fptosi <4 x float> undef to <4 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r114 = fptoui <4 x float> undef to <4 x i16>
@@ -957,8 +957,8 @@ define i32 @casts_no_users() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r117 = fptosi <4 x float> undef to <4 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r118 = fptoui <4 x float> undef to <4 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r119 = fptosi <4 x float> undef to <4 x i64>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r120 = fptoui <4 x double> undef to <4 x i1>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r121 = fptosi <4 x double> undef to <4 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %r120 = fptoui <4 x double> undef to <4 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %r121 = fptosi <4 x double> undef to <4 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r122 = fptoui <4 x double> undef to <4 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r123 = fptosi <4 x double> undef to <4 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r124 = fptoui <4 x double> undef to <4 x i16>
@@ -967,8 +967,8 @@ define i32 @casts_no_users() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r127 = fptosi <4 x double> undef to <4 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r128 = fptoui <4 x double> undef to <4 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r129 = fptosi <4 x double> undef to <4 x i64>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 41 for instruction: %r130 = fptoui <8 x float> undef to <8 x i1>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 41 for instruction: %r131 = fptosi <8 x float> undef to <8 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %r130 = fptoui <8 x float> undef to <8 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %r131 = fptosi <8 x float> undef to <8 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r132 = fptoui <8 x float> undef to <8 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r133 = fptosi <8 x float> undef to <8 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 5 for instruction: %r134 = fptoui <8 x float> undef to <8 x i16>
@@ -977,8 +977,8 @@ define i32 @casts_no_users() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r137 = fptosi <8 x float> undef to <8 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r138 = fptoui <8 x float> undef to <8 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r139 = fptosi <8 x float> undef to <8 x i64>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r140 = fptoui <8 x double> undef to <8 x i1>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r141 = fptosi <8 x double> undef to <8 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 35 for instruction: %r140 = fptoui <8 x double> undef to <8 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 35 for instruction: %r141 = fptosi <8 x double> undef to <8 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r142 = fptoui <8 x double> undef to <8 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r143 = fptosi <8 x double> undef to <8 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r144 = fptoui <8 x double> undef to <8 x i16>
@@ -987,8 +987,8 @@ define i32 @casts_no_users() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r147 = fptosi <8 x double> undef to <8 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r148 = fptoui <8 x double> undef to <8 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r149 = fptosi <8 x double> undef to <8 x i64>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 83 for instruction: %r150 = fptoui <16 x float> undef to <16 x i1>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 83 for instruction: %r151 = fptosi <16 x float> undef to <16 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 75 for instruction: %r150 = fptoui <16 x float> undef to <16 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 75 for instruction: %r151 = fptosi <16 x float> undef to <16 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r152 = fptoui <16 x float> undef to <16 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %r153 = fptosi <16 x float> undef to <16 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r154 = fptoui <16 x float> undef to <16 x i16>
@@ -997,8 +997,8 @@ define i32 @casts_no_users() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 4 for instruction: %r157 = fptosi <16 x float> undef to <16 x i32>
; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r158 = fptoui <16 x float> undef to <16 x i64>
; CHECK-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r159 = fptosi <16 x float> undef to <16 x i64>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 87 for instruction: %r160 = fptoui <16 x double> undef to <16 x i1>
-; CHECK-NEXT: Cost Model: Found an estimated cost of 87 for instruction: %r161 = fptosi <16 x double> undef to <16 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 71 for instruction: %r160 = fptoui <16 x double> undef to <16 x i1>
+; CHECK-NEXT: Cost Model: Found an estimated cost of 71 for instruction: %r161 = fptosi <16 x double> undef to <16 x i1>
; CHECK-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %r162 = fptoui <16 x double> undef to <16 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 23 for instruction: %r163 = fptosi <16 x double> undef to <16 x i8>
; CHECK-NEXT: Cost Model: Found an estimated cost of 22 for instruction: %r164 = fptoui <16 x double> undef to <16 x i16>
@@ -1363,8 +1363,8 @@ define i32 @casts_no_users() {
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x float> undef to <2 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r98 = fptoui <2 x float> undef to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r99 = fptosi <2 x float> undef to <2 x i64>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r100 = fptoui <2 x double> undef to <2 x i1>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r101 = fptosi <2 x double> undef to <2 x i1>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r100 = fptoui <2 x double> undef to <2 x i1>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r101 = fptosi <2 x double> undef to <2 x i1>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r102 = fptoui <2 x double> undef to <2 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r103 = fptosi <2 x double> undef to <2 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r104 = fptoui <2 x double> undef to <2 x i16>
@@ -1373,8 +1373,8 @@ define i32 @casts_no_users() {
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r107 = fptosi <2 x double> undef to <2 x i32>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r108 = fptoui <2 x double> undef to <2 x i64>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r109 = fptosi <2 x double> undef to <2 x i64>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>
-; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>
+; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r112 = fptoui <4 x float> undef to <4 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r113 = fptosi <4 x float> undef to <4 x i8>
; FIXED-MIN-256-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r114 = fptoui <4 x float> undef to <4 x i16>
@@ -1576,8 +1576,8 @@ define i32 @casts_no_users() {
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x float> undef to <2 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r98 = fptoui <2 x float> undef to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r99 = fptosi <2 x float> undef to <2 x i64>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r100 = fptoui <2 x double> undef to <2 x i1>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r101 = fptosi <2 x double> undef to <2 x i1>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r100 = fptoui <2 x double> undef to <2 x i1>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r101 = fptosi <2 x double> undef to <2 x i1>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r102 = fptoui <2 x double> undef to <2 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r103 = fptosi <2 x double> undef to <2 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r104 = fptoui <2 x double> undef to <2 x i16>
@@ -1586,8 +1586,8 @@ define i32 @casts_no_users() {
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r107 = fptosi <2 x double> undef to <2 x i32>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r108 = fptoui <2 x double> undef to <2 x i64>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r109 = fptosi <2 x double> undef to <2 x i64>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>
-; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %r110 = fptoui <4 x float> undef to <4 x i1>
+; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %r111 = fptosi <4 x float> undef to <4 x i1>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r112 = fptoui <4 x float> undef to <4 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r113 = fptosi <4 x float> undef to <4 x i8>
; FIXED-MIN-2048-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %r114 = fptoui <4 x float> undef to <4 x i16>
@@ -3292,38 +3292,38 @@ define void @fp16cast() {
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r95 = fptosi <2 x half> undef to <2 x i16>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r96 = fptoui <2 x half> undef to <2 x i32>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r97 = fptosi <2 x half> undef to <2 x i32>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r98 = fptoui <2 x half> undef to <2 x i64>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 10 for instruction: %r99 = fptosi <2 x half> undef to <2 x i64>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r98 = fptoui <2 x half> undef to <2 x i64>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %r99 = fptosi <2 x half> undef to <2 x i64>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r110 = fptoui <4 x half> undef to <4 x i1>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r111 = fptosi <4 x half> undef to <4 x i1>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r112 = fptoui <4 x half> undef to <4 x i8>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r113 = fptosi <4 x half> undef to <4 x i8>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r114 = fptoui <4 x half> undef to <4 x i16>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r115 = fptosi <4 x half> undef to <4 x i16>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r116 = fptoui <4 x half> undef to <4 x i32>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %r117 = fptosi <4 x half> undef to <4 x i32>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r118 = fptoui <4 x half> undef to <4 x i64>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %r119 = fptosi <4 x half> undef to <4 x i64>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r130 = fptoui <8 x half> undef to <8 x i1>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r131 = fptosi <8 x half> undef to <8 x i1>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r132 = fptoui <8 x half> undef to <8 x i8>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 40 for instruction: %r133 = fptosi <8 x half> undef to <8 x i8>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %r116 = fptoui <4 x half> undef to <4 x i32>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 18 for instruction: %r117 = fptosi <4 x half> undef to <4 x i32>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %r118 = fptoui <4 x half> undef to <4 x i64>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 17 for instruction: %r119 = fptosi <4 x half> undef to <4 x i64>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 38 for instruction: %r130 = fptoui <8 x half> undef to <8 x i1>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 38 for instruction: %r131 = fptosi <8 x half> undef to <8 x i1>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 38 for instruction: %r132 = fptoui <8 x half> undef to <8 x i8>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 38 for instruction: %r133 = fptosi <8 x half> undef to <8 x i8>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r134 = fptoui <8 x half> undef to <8 x i16>
; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %r135 = fptosi <8 x half> undef to <8 x i16>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 41 for instruction: %r136 = fptoui <8 x half> undef to <8 x i32>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 41 for instruction: %r137 = fptosi <8 x half> undef to <8 x i32>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r138 = fptoui <8 x half> undef to <8 x i64>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 43 for instruction: %r139 = fptosi <8 x half> undef to <8 x i64>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %r150 = fptoui <16 x half> undef to <16 x i1>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %r151 = fptosi <16 x half> undef to <16 x i1>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %r152 = fptoui <16 x half> undef to <16 x i8>
-; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 81 for instruction: %r153 = fptosi <16 x half> undef to <16 x i8>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %r136 = fptoui <8 x half> undef to <8 x i32>
+; CHECK-NOFP16-NEXT: Cost Model: Found an estimated cost of 37 for instruction: %r137 = fptosi <8 x half> undef to <8...
[truncated]
You can test this locally with the following command:
git-clang-format --diff f3f3883f4b9d15770a5ce49956ed4425c71ad69f fc36a5fd34fa0caa6dba3ff89e9a8e599acf6aa0 --extensions h -- llvm/include/llvm/CodeGen/BasicTTIImpl.h
View the diff from clang-format here.
diff --git a/llvm/include/llvm/CodeGen/BasicTTIImpl.h b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
index 26f4a6dfaa..6bfea6b523 100644
--- a/llvm/include/llvm/CodeGen/BasicTTIImpl.h
+++ b/llvm/include/llvm/CodeGen/BasicTTIImpl.h
@@ -1186,10 +1186,10 @@ public:
// Return the cost of multiple scalar invocation plus the cost of
// inserting and extracting the values.
- return getScalarizationOverhead(SrcVTy, /*Insert=*/ false,
- /*Extract=*/ true, CostKind) +
- getScalarizationOverhead(DstVTy, /*Insert=*/ true,
- /*Extract=*/ false, CostKind) +
+ return getScalarizationOverhead(SrcVTy, /*Insert=*/false,
+ /*Extract=*/true, CostKind) +
+ getScalarizationOverhead(DstVTy, /*Insert=*/true,
+ /*Extract=*/false, CostKind) +
Num * Cost;
}
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
No longer actively working on this.