[SROA] Use shufflevector instead of select for vector blend#175756
A patch from May 2013, commit 1e21191, changed SROA to use a select instruction to perform vector blends. The idea was that select was the canonical form, and that we optimized select better than shufflevector.
This patch changes SROA back to using shufflevector instead of select when doing the blend (inserting a smaller vector into a larger vector).
Motivation:
Nowadays InstCombine canonicalizes this kind of vector blend (a vector select with a constant condition) into a shufflevector instruction, so shufflevector is assumed to be the canonical form now. It is also assumed that we are better at optimizing shufflevector today than we were in 2013.
Commit f26710d includes links to a discussion from 2016 (https://discourse.llvm.org/t/ir-canonicalization-vector-select-or-shufflevector/42257/6) about picking shufflevector as the canonical form.
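To make the select/shufflevector equivalence concrete, here is a minimal C++ sketch that uses plain std::vector values as stand-ins for IR vectors. The helpers selectCondToShuffleMask, evalSelect and evalShuffle are hypothetical illustrations, not LLVM or InstCombine APIs. The sketch only shows that a select with a constant condition takes lane Idx either from the true operand or from the false operand, and that this choice can be written as a two-operand shufflevector mask: Idx for the first operand, Idx + N for the second.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not an LLVM API: turn the constant i1 condition of
//   select <N x i1> Cond, <N x T> TrueV, <N x T> FalseV
// into the mask of the equivalent
//   shufflevector <N x T> TrueV, <N x T> FalseV, Mask
// where indices 0..N-1 read TrueV and indices N..2N-1 read FalseV.
std::vector<int> selectCondToShuffleMask(const std::vector<bool> &Cond) {
  const int N = static_cast<int>(Cond.size());
  std::vector<int> Mask;
  Mask.reserve(Cond.size());
  for (int Idx = 0; Idx != N; ++Idx)
    Mask.push_back(Cond[Idx] ? Idx : Idx + N);
  return Mask;
}

// Lane-by-lane model of a vector select with a constant condition.
template <typename T>
std::vector<T> evalSelect(const std::vector<bool> &Cond,
                          const std::vector<T> &TrueV,
                          const std::vector<T> &FalseV) {
  std::vector<T> Out;
  Out.reserve(Cond.size());
  for (std::size_t Idx = 0; Idx != Cond.size(); ++Idx)
    Out.push_back(Cond[Idx] ? TrueV[Idx] : FalseV[Idx]);
  return Out;
}

// Lane-by-lane model of a two-operand shufflevector with equal-length inputs.
// Poison (-1) mask elements are not modeled by this sketch.
template <typename T>
std::vector<T> evalShuffle(const std::vector<T> &A, const std::vector<T> &B,
                           const std::vector<int> &Mask) {
  const int N = static_cast<int>(A.size());
  std::vector<T> Out;
  Out.reserve(Mask.size());
  for (int M : Mask) {
    assert(M >= 0 && M < 2 * N && "mask index out of range");
    Out.push_back(M < N ? A[M] : B[M - N]);
  }
  return Out;
}
```

For example, the condition <true, true, false, false> maps to the mask <0, 1, 6, 7>, and <false, false, true, true> maps to <4, 5, 2, 3>; these are the select/shufflevector pairs that appear in the updated SROA tests.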
I stumbled upon this when looking at some downstream test diffs after #175199. With the vector select in the input (when InstCombine is not run after SROA), we now get a freeze node in the SelectionDAG when simplifying the select, and we end up with worse code than when using shufflevector. This patch fixes the problem at the root by not emitting the vector select in the first place.
- Mask.reserve(cast<FixedVectorType>(VecTy)->getNumElements());
- for (unsigned i = 0; i != cast<FixedVectorType>(VecTy)->getNumElements(); ++i)
+ Mask.reserve(NumElements);
+ for (unsigned i = 0; i != NumElements; ++i)
nit: might as well fix the casing of i while you are at it; it can be Idx if I is taken.
@llvm/pr-subscribers-llvm-transforms
Author: Björn Pettersson (bjope)
Changes
A patch from May 2013, commit 1e21191, changed SROA to use a select instruction to perform vector blends. The idea was that select was the canonical form, and that we optimized select better than shufflevector. This patch changes SROA back to using shufflevector instead of select when doing the blend (inserting a smaller vector into a larger vector). Motivation: Commit f26710d includes links to a discussion from 2016 (https://discourse.llvm.org/t/ir-canonicalization-vector-select-or-shufflevector/42257/6) about picking shufflevector as the canonical form.
Patch is 24.78 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/175756.diff
5 Files Affected:
diff --git a/llvm/lib/Transforms/Scalar/SROA.cpp b/llvm/lib/Transforms/Scalar/SROA.cpp
index 43a6269c357dd..9e1f695d4fd97 100644
--- a/llvm/lib/Transforms/Scalar/SROA.cpp
+++ b/llvm/lib/Transforms/Scalar/SROA.cpp
@@ -2684,39 +2684,37 @@ static Value *insertVector(IRBuilderTy &IRB, Value *Old, Value *V,
return V;
}
- assert(cast<FixedVectorType>(Ty)->getNumElements() <=
- cast<FixedVectorType>(VecTy)->getNumElements() &&
- "Too many elements!");
- if (cast<FixedVectorType>(Ty)->getNumElements() ==
- cast<FixedVectorType>(VecTy)->getNumElements()) {
+ unsigned NumSubElements = cast<FixedVectorType>(Ty)->getNumElements();
+ unsigned NumElements = cast<FixedVectorType>(VecTy)->getNumElements();
+
+ assert(NumSubElements <= NumElements && "Too many elements!");
+ if (NumSubElements == NumElements) {
assert(V->getType() == VecTy && "Vector type mismatch");
return V;
}
- unsigned EndIndex = BeginIndex + cast<FixedVectorType>(Ty)->getNumElements();
+ unsigned EndIndex = BeginIndex + NumSubElements;
// When inserting a smaller vector into the larger to store, we first
// use a shuffle vector to widen it with undef elements, and then
// a second shuffle vector to select between the loaded vector and the
// incoming vector.
SmallVector<int, 8> Mask;
- Mask.reserve(cast<FixedVectorType>(VecTy)->getNumElements());
- for (unsigned i = 0; i != cast<FixedVectorType>(VecTy)->getNumElements(); ++i)
- if (i >= BeginIndex && i < EndIndex)
- Mask.push_back(i - BeginIndex);
+ Mask.reserve(NumElements);
+ for (unsigned Idx = 0; Idx != NumElements; ++Idx)
+ if (Idx >= BeginIndex && Idx < EndIndex)
+ Mask.push_back(Idx - BeginIndex);
else
Mask.push_back(-1);
V = IRB.CreateShuffleVector(V, Mask, Name + ".expand");
LLVM_DEBUG(dbgs() << " shuffle: " << *V << "\n");
- SmallVector<Constant *, 8> Mask2;
- Mask2.reserve(cast<FixedVectorType>(VecTy)->getNumElements());
- for (unsigned i = 0; i != cast<FixedVectorType>(VecTy)->getNumElements(); ++i)
- Mask2.push_back(IRB.getInt1(i >= BeginIndex && i < EndIndex));
-
- // No profiling support for vector selects.
- V = IRB.CreateSelectWithUnknownProfile(ConstantVector::get(Mask2), V, Old,
- DEBUG_TYPE, Name + "blend");
-
+ Mask.clear();
+ for (unsigned Idx = 0; Idx != NumElements; ++Idx)
+ if (Idx >= BeginIndex && Idx < EndIndex)
+ Mask.push_back(Idx);
+ else
+ Mask.push_back(Idx + NumElements);
+ V = IRB.CreateShuffleVector(V, Old, Mask, Name + "blend");
LLVM_DEBUG(dbgs() << " blend: " << *V << "\n");
return V;
}
diff --git a/llvm/test/Transforms/SROA/scalable-vectors-with-known-vscale.ll b/llvm/test/Transforms/SROA/scalable-vectors-with-known-vscale.ll
index 85715e406e065..f0ecd07d037e0 100644
--- a/llvm/test/Transforms/SROA/scalable-vectors-with-known-vscale.ll
+++ b/llvm/test/Transforms/SROA/scalable-vectors-with-known-vscale.ll
@@ -47,7 +47,7 @@ define <vscale x 4 x i32> @cast_alloca_to_svint32_t(<vscale x 4 x i32> %type.coe
; CHECK-LABEL: @cast_alloca_to_svint32_t(
; CHECK-NEXT: [[TMP1:%.*]] = call <4 x i32> @llvm.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32> [[TYPE_COERCE:%.*]], i64 0)
; CHECK-NEXT: [[TYPE_0_VEC_EXPAND:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
-; CHECK-NEXT: [[TYPE_0_VECBLEND:%.*]] = select <16 x i1> <i1 true, i1 true, i1 true, i1 true, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false, i1 false>, <16 x i32> [[TYPE_0_VEC_EXPAND]], <16 x i32> undef
+; CHECK-NEXT: [[TYPE_0_VECBLEND:%.*]] = shufflevector <16 x i32> [[TYPE_0_VEC_EXPAND]], <16 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
; CHECK-NEXT: [[TYPE_ADDR_0_VEC_EXTRACT:%.*]] = shufflevector <16 x i32> [[TYPE_0_VECBLEND]], <16 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32> poison, <4 x i32> [[TYPE_ADDR_0_VEC_EXTRACT]], i64 0)
; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
diff --git a/llvm/test/Transforms/SROA/slice-width.ll b/llvm/test/Transforms/SROA/slice-width.ll
index 3b77e49e78358..0a01451952ab2 100644
--- a/llvm/test/Transforms/SROA/slice-width.ll
+++ b/llvm/test/Transforms/SROA/slice-width.ll
@@ -101,7 +101,7 @@ define i32 @memcpy_vec3float_widening(ptr %x) !prof !0 {
; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP1_SROA_0_0_COPYLOAD:%.*]] = load <3 x float>, ptr [[X:%.*]], align 4
; CHECK-NEXT: [[TMP1_SROA_0_0_VEC_EXPAND:%.*]] = shufflevector <3 x float> [[TMP1_SROA_0_0_COPYLOAD]], <3 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 poison>
-; CHECK-NEXT: [[TMP1_SROA_0_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> [[TMP1_SROA_0_0_VEC_EXPAND]], <4 x float> undef, !prof [[PROF1:![0-9]+]]
+; CHECK-NEXT: [[TMP1_SROA_0_0_VECBLEND:%.*]] = shufflevector <4 x float> [[TMP1_SROA_0_0_VEC_EXPAND]], <4 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 7>
; CHECK-NEXT: [[TMP2:%.*]] = alloca [[S_VEC3FLOAT:%.*]], align 4
; CHECK-NEXT: [[TMP1_SROA_0_0_VEC_EXTRACT:%.*]] = shufflevector <4 x float> [[TMP1_SROA_0_0_VECBLEND]], <4 x float> poison, <3 x i32> <i32 0, i32 1, i32 2>
; CHECK-NEXT: store <3 x float> [[TMP1_SROA_0_0_VEC_EXTRACT]], ptr [[TMP2]], align 4
@@ -169,7 +169,6 @@ define i1 @presplit_overlarge_load() {
; CHECK: attributes #[[ATTR1:[0-9]+]] = { nocallback nofree nounwind willreturn memory(argmem: write) }
;.
; CHECK: [[META0:![0-9]+]] = !{!"function_entry_count", i32 10}
-; CHECK: [[PROF1]] = !{!"unknown", !"sroa"}
;.
;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
; CHECK-MODIFY-CFG: {{.*}}
diff --git a/llvm/test/Transforms/SROA/vector-conversion.ll b/llvm/test/Transforms/SROA/vector-conversion.ll
index 12df63db96a08..4b6abde543e4e 100644
--- a/llvm/test/Transforms/SROA/vector-conversion.ll
+++ b/llvm/test/Transforms/SROA/vector-conversion.ll
@@ -8,11 +8,11 @@ define <4 x i64> @vector_ptrtoint({<2 x ptr>, <2 x ptr>} %x) {
; CHECK-NEXT: [[X_FCA_0_EXTRACT:%.*]] = extractvalue { <2 x ptr>, <2 x ptr> } [[X:%.*]], 0
; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint <2 x ptr> [[X_FCA_0_EXTRACT]] to <2 x i64>
; CHECK-NEXT: [[A_SROA_0_0_VEC_EXPAND:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
-; CHECK-NEXT: [[A_SROA_0_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x i64> [[A_SROA_0_0_VEC_EXPAND]], <4 x i64> undef
+; CHECK-NEXT: [[A_SROA_0_0_VECBLEND:%.*]] = shufflevector <4 x i64> [[A_SROA_0_0_VEC_EXPAND]], <4 x i64> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: [[X_FCA_1_EXTRACT:%.*]] = extractvalue { <2 x ptr>, <2 x ptr> } [[X]], 1
; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint <2 x ptr> [[X_FCA_1_EXTRACT]] to <2 x i64>
; CHECK-NEXT: [[A_SROA_0_16_VEC_EXPAND:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 poison, i32 poison, i32 0, i32 1>
-; CHECK-NEXT: [[A_SROA_0_16_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x i64> [[A_SROA_0_16_VEC_EXPAND]], <4 x i64> [[A_SROA_0_0_VECBLEND]]
+; CHECK-NEXT: [[A_SROA_0_16_VECBLEND:%.*]] = shufflevector <4 x i64> [[A_SROA_0_16_VEC_EXPAND]], <4 x i64> [[A_SROA_0_0_VECBLEND]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
; CHECK-NEXT: ret <4 x i64> [[A_SROA_0_16_VECBLEND]]
;
%a = alloca {<2 x ptr>, <2 x ptr>}, align 32
@@ -29,11 +29,11 @@ define <4 x ptr> @vector_inttoptr({<2 x i64>, <2 x i64>} %x) {
; CHECK-NEXT: [[X_FCA_0_EXTRACT:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[X:%.*]], 0
; CHECK-NEXT: [[TMP1:%.*]] = inttoptr <2 x i64> [[X_FCA_0_EXTRACT]] to <2 x ptr>
; CHECK-NEXT: [[A_SROA_0_0_VEC_EXPAND:%.*]] = shufflevector <2 x ptr> [[TMP1]], <2 x ptr> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
-; CHECK-NEXT: [[A_SROA_0_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x ptr> [[A_SROA_0_0_VEC_EXPAND]], <4 x ptr> undef
+; CHECK-NEXT: [[A_SROA_0_0_VECBLEND:%.*]] = shufflevector <4 x ptr> [[A_SROA_0_0_VEC_EXPAND]], <4 x ptr> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: [[X_FCA_1_EXTRACT:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[X]], 1
; CHECK-NEXT: [[TMP2:%.*]] = inttoptr <2 x i64> [[X_FCA_1_EXTRACT]] to <2 x ptr>
; CHECK-NEXT: [[A_SROA_0_16_VEC_EXPAND:%.*]] = shufflevector <2 x ptr> [[TMP2]], <2 x ptr> poison, <4 x i32> <i32 poison, i32 poison, i32 0, i32 1>
-; CHECK-NEXT: [[A_SROA_0_16_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x ptr> [[A_SROA_0_16_VEC_EXPAND]], <4 x ptr> [[A_SROA_0_0_VECBLEND]]
+; CHECK-NEXT: [[A_SROA_0_16_VECBLEND:%.*]] = shufflevector <4 x ptr> [[A_SROA_0_16_VEC_EXPAND]], <4 x ptr> [[A_SROA_0_0_VECBLEND]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
; CHECK-NEXT: ret <4 x ptr> [[A_SROA_0_16_VECBLEND]]
;
%a = alloca {<2 x i64>, <2 x i64>}, align 32
diff --git a/llvm/test/Transforms/SROA/vector-promotion.ll b/llvm/test/Transforms/SROA/vector-promotion.ll
index 682e8e3c433b5..0a5e741d10bec 100644
--- a/llvm/test/Transforms/SROA/vector-promotion.ll
+++ b/llvm/test/Transforms/SROA/vector-promotion.ll
@@ -395,20 +395,20 @@ define i64 @test6(<4 x i64> %x, <4 x i64> %y, i64 %n) {
define <4 x i32> @test_subvec_store() {
; CHECK-LABEL: @test_subvec_store(
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>, <4 x i32> undef
-; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x i32> <i32 undef, i32 1, i32 1, i32 undef>, <4 x i32> [[A_0_VECBLEND]]
-; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x i32> <i32 undef, i32 undef, i32 2, i32 2>, <4 x i32> [[A_4_VECBLEND]]
+; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = shufflevector <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>, <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
+; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = shufflevector <4 x i32> <i32 undef, i32 1, i32 1, i32 undef>, <4 x i32> [[A_0_VECBLEND]], <4 x i32> <i32 4, i32 1, i32 2, i32 7>
+; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = shufflevector <4 x i32> <i32 undef, i32 undef, i32 2, i32 2>, <4 x i32> [[A_4_VECBLEND]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
; CHECK-NEXT: [[A_12_VEC_INSERT:%.*]] = insertelement <4 x i32> [[A_8_VECBLEND]], i32 3, i32 3
; CHECK-NEXT: ret <4 x i32> [[A_12_VEC_INSERT]]
;
; DEBUG-LABEL: @test_subvec_store(
; DEBUG-NEXT: entry:
; DEBUG-NEXT: #dbg_value(ptr poison, [[META178:![0-9]+]], !DIExpression(), [[META184:![0-9]+]])
-; DEBUG-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>, <4 x i32> undef, !dbg [[DBG185:![0-9]+]]
+; DEBUG-NEXT: [[A_0_VECBLEND:%.*]] = shufflevector <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>, <4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>, !dbg [[DBG185:![0-9]+]]
; DEBUG-NEXT: #dbg_value(ptr undef, [[META179:![0-9]+]], !DIExpression(), [[META186:![0-9]+]])
-; DEBUG-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x i32> <i32 undef, i32 1, i32 1, i32 undef>, <4 x i32> [[A_0_VECBLEND]], !dbg [[DBG187:![0-9]+]]
+; DEBUG-NEXT: [[A_4_VECBLEND:%.*]] = shufflevector <4 x i32> <i32 undef, i32 1, i32 1, i32 undef>, <4 x i32> [[A_0_VECBLEND]], <4 x i32> <i32 4, i32 1, i32 2, i32 7>, !dbg [[DBG187:![0-9]+]]
; DEBUG-NEXT: #dbg_value(ptr undef, [[META180:![0-9]+]], !DIExpression(), [[META188:![0-9]+]])
-; DEBUG-NEXT: [[A_8_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x i32> <i32 undef, i32 undef, i32 2, i32 2>, <4 x i32> [[A_4_VECBLEND]], !dbg [[DBG189:![0-9]+]]
+; DEBUG-NEXT: [[A_8_VECBLEND:%.*]] = shufflevector <4 x i32> <i32 undef, i32 undef, i32 2, i32 2>, <4 x i32> [[A_4_VECBLEND]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>, !dbg [[DBG189:![0-9]+]]
; DEBUG-NEXT: #dbg_value(ptr undef, [[META181:![0-9]+]], !DIExpression(), [[META190:![0-9]+]])
; DEBUG-NEXT: [[A_12_VEC_INSERT:%.*]] = insertelement <4 x i32> [[A_8_VECBLEND]], i32 3, i32 3, !dbg [[DBG191:![0-9]+]]
; DEBUG-NEXT: #dbg_value(<4 x i32> [[A_12_VEC_INSERT]], [[META182:![0-9]+]], !DIExpression(), [[META192:![0-9]+]])
@@ -482,20 +482,20 @@ entry:
define <4 x float> @test_subvec_memset() {
; CHECK-LABEL: @test_subvec_memset(
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x float> <float 0.000000e+00, float 0.000000e+00, float undef, float undef>, <4 x float> undef
-; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x float> <float undef, float 0x3820202020000000, float 0x3820202020000000, float undef>, <4 x float> [[A_0_VECBLEND]]
-; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x float> <float undef, float undef, float 0x3860606060000000, float 0x3860606060000000>, <4 x float> [[A_4_VECBLEND]]
+; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = shufflevector <4 x float> <float 0.000000e+00, float 0.000000e+00, float undef, float undef>, <4 x float> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
+; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = shufflevector <4 x float> <float undef, float 0x3820202020000000, float 0x3820202020000000, float undef>, <4 x float> [[A_0_VECBLEND]], <4 x i32> <i32 4, i32 1, i32 2, i32 7>
+; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = shufflevector <4 x float> <float undef, float undef, float 0x3860606060000000, float 0x3860606060000000>, <4 x float> [[A_4_VECBLEND]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
; CHECK-NEXT: [[A_12_VEC_INSERT:%.*]] = insertelement <4 x float> [[A_8_VECBLEND]], float 0x38E0E0E0E0000000, i32 3
; CHECK-NEXT: ret <4 x float> [[A_12_VEC_INSERT]]
;
; DEBUG-LABEL: @test_subvec_memset(
; DEBUG-NEXT: entry:
; DEBUG-NEXT: #dbg_value(ptr poison, [[META215:![0-9]+]], !DIExpression(), [[META220:![0-9]+]])
-; DEBUG-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x float> <float 0.000000e+00, float 0.000000e+00, float undef, float undef>, <4 x float> undef, !dbg [[DBG221:![0-9]+]]
+; DEBUG-NEXT: [[A_0_VECBLEND:%.*]] = shufflevector <4 x float> <float 0.000000e+00, float 0.000000e+00, float undef, float undef>, <4 x float> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>, !dbg [[DBG221:![0-9]+]]
; DEBUG-NEXT: #dbg_value(ptr undef, [[META216:![0-9]+]], !DIExpression(), [[META222:![0-9]+]])
-; DEBUG-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x float> <float undef, float 0x3820202020000000, float 0x3820202020000000, float undef>, <4 x float> [[A_0_VECBLEND]], !dbg [[DBG223:![0-9]+]]
+; DEBUG-NEXT: [[A_4_VECBLEND:%.*]] = shufflevector <4 x float> <float undef, float 0x3820202020000000, float 0x3820202020000000, float undef>, <4 x float> [[A_0_VECBLEND]], <4 x i32> <i32 4, i32 1, i32 2, i32 7>, !dbg [[DBG223:![0-9]+]]
; DEBUG-NEXT: #dbg_value(ptr undef, [[META217:![0-9]+]], !DIExpression(), [[META224:![0-9]+]])
-; DEBUG-NEXT: [[A_8_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x float> <float undef, float undef, float 0x3860606060000000, float 0x3860606060000000>, <4 x float> [[A_4_VECBLEND]], !dbg [[DBG225:![0-9]+]]
+; DEBUG-NEXT: [[A_8_VECBLEND:%.*]] = shufflevector <4 x float> <float undef, float undef, float 0x3860606060000000, float 0x3860606060000000>, <4 x float> [[A_4_VECBLEND]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>, !dbg [[DBG225:![0-9]+]]
; DEBUG-NEXT: #dbg_value(ptr undef, [[META218:![0-9]+]], !DIExpression(), [[META226:![0-9]+]])
; DEBUG-NEXT: [[A_12_VEC_INSERT:%.*]] = insertelement <4 x float> [[A_8_VECBLEND]], float 0x38E0E0E0E0000000, i32 3, !dbg [[DBG227:![0-9]+]]
; DEBUG-NEXT: #dbg_value(<4 x float> [[A_12_VEC_INSERT]], [[META219:![0-9]+]], !DIExpression(), [[META228:![0-9]+]])
@@ -525,13 +525,13 @@ define <4 x float> @test_subvec_memcpy(ptr %x, ptr %y, ptr %z, ptr %f, ptr %out)
; CHECK-NEXT: entry:
; CHECK-NEXT: [[A_0_COPYLOAD:%.*]] = load <2 x float>, ptr [[X:%.*]], align 1
; CHECK-NEXT: [[A_0_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_0_COPYLOAD]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
-; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x float> [[A_0_VEC_EXPAND]], <4 x float> undef
+; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = shufflevector <4 x float> [[A_0_VEC_EXPAND]], <4 x float> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
; CHECK-NEXT: [[A_4_COPYLOAD:%.*]] = load <2 x float>, ptr [[Y:%.*]], align 1
; CHECK-NEXT: [[A_4_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_4_COPYLOAD]], <2 x float> poison, <4 x i32> <i32 poison, i32 0, i32 1, i32 poison>
-; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x float> [[A_4_VEC_EXPAND]], <4 x float> [[A_0_VECBLEND]]
+; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = shufflevector <4 x float> [[A_4_VEC_EXPAND]], <4 x float> [[A_0_VECBLEND]], <4 x i32> <i32 4, i32 1, i32 2, i32 7>
; CHECK-NEXT: [[A_8_COPYLOAD:%.*]] = load <2 x float>, ptr [[Z:%.*]], align 1
; CHECK-NEXT: [[A_8_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_8_COPYLOAD]], <2 x float> poison, <4 x i32> <i32 poison, i32 poison, i32 0, i32 1>
-; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x float> [[A_8_VEC_EXPAND]], <4 x float> [[A_4_VECBLEND]]
+; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = shufflevector <4 x float> [[A_8_VEC_EXPAND]], <4 x float> [[A_4_VECBLEND]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
; CHECK-NEXT: [[A_12_COPYLOAD:%.*]] = load float, ptr [[F:%.*]], align 1
; CHECK-NEXT: [[A_12_VEC_INSERT:%.*]] = insertelement <4 x float> [[A_8_VECBLEND]], float [[A_12_COPYLOAD]], i32 3
; CHECK-NEXT: [[A_8_VEC_EXTRACT:%.*]] = shufflevector <4 x float> [[A_12_VEC_INSERT]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
@@ -543,15 +543,15 @@ define <4 x float> @test_subvec_memcpy(ptr %x, ptr %y, ptr %z, ptr %f, ptr %out)
; DEBUG-NEXT: #dbg_value(ptr poison, [[META232:![0-9]+]], !DIExpression(), [[META237:![0-9]+]])
; DEBUG-NEXT: [[A_0_COPYLOAD:%.*]] = load <2 x float>, ptr [[X:%.*]], align 1, !dbg [[DBG238:![0-9]+]]
; DEBUG-NEXT: [[A_0_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_0_COPYLOAD]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>, !dbg [[DBG238]]
-; DEBUG-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x float> [[A_0_VEC_EXPAND]], <4 x float> undef, !dbg [[DBG238]]
+; DEBUG-NEXT: [[A_0_VECBLEND:%.*]] = shufflevector <4 x float> [[A_0_VEC_EXPAND]], <4 x float> undef, <4 x i32> <i32 0, i32 1, i32 6, i32 7>, !dbg [[DBG238]]
; DEBUG-NEXT: #dbg_value(ptr undef, [[META233:![0-9]+]], !DIExpression(), [[META239:![0-9]+]])
; DEBUG-NEXT: [[A_4_COPYLOAD:%.*]] = load <2 x float>, ptr [[Y:%.*]], align 1, !dbg [[DBG240:![0-9]+]]
; DEBUG-NEXT: [[A_4_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_4_COPYLOAD]], <2 x float> poison, <4 x i32> <i32 poison, i32 0, i32 1, i32 poison>, !dbg [[DBG240]]
-; DEBUG-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x float> [[A_4_VEC_EXPAND]], <4 x float> [[A_0_VECBLEND]], !dbg [[DBG240]]
+; DEBUG-NEXT: [[A_4_VECBLEND:%.*]] = shufflevector <4 x float> [[A_4_VEC_EXPAND]], <4 x float> [[A_0_VECBLEND]], <4 x i32> <i32 4, i32 1, i32 2, i32 7>, !dbg [[DBG240]]
; DEBUG-NEXT: #dbg_value(ptr undef, [[META234:![0-9]+]], !DIExpression(), [[META241:![0-9]+]])
; DEBUG-NEXT: [[A_8_COPYLOAD:%.*]] = load <2 x float>, ptr [[Z:%.*]], align 1, !dbg [[DBG242:![0-9]+]]
; DEBUG-NEXT: [[A_8_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_8_COPYLOAD]], <2...
[truncated]
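The two masks built by the patched insertVector() in SROA.cpp can be reproduced outside LLVM. The following is a minimal, hypothetical C++ model (BlendMasks and computeBlendMasks are illustrative names, not LLVM APIs) that uses plain std::vector<int> masks in place of the IRB.CreateShuffleVector calls, with -1 standing for a poison lane. It computes both masks in a single loop rather than the patch's two loops with Mask.clear(), and its slice-range assert is an addition of this sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the masks used when inserting a NumSubElements-wide
// vector V at BeginIndex into a NumElements-wide vector Old.
struct BlendMasks {
  // Mask for "shufflevector V, poison": widens V, -1 marks a poison lane.
  std::vector<int> Expand;
  // Mask for "shufflevector Expanded, Old": 0..N-1 read the widened V,
  // N..2N-1 read Old.
  std::vector<int> Blend;
};

BlendMasks computeBlendMasks(unsigned BeginIndex, unsigned NumSubElements,
                             unsigned NumElements) {
  // The real code returns V unchanged when the sizes are equal.
  assert(NumSubElements < NumElements && "only the blend path is modeled");
  // Added by this sketch: the inserted slice must fit inside Old.
  assert(BeginIndex + NumSubElements <= NumElements && "slice out of range");
  unsigned EndIndex = BeginIndex + NumSubElements;

  BlendMasks M;
  M.Expand.reserve(NumElements);
  M.Blend.reserve(NumElements);
  for (unsigned Idx = 0; Idx != NumElements; ++Idx) {
    bool InSlice = Idx >= BeginIndex && Idx < EndIndex;
    // Lanes inside the slice come from V; all other lanes are poison.
    M.Expand.push_back(InSlice ? static_cast<int>(Idx - BeginIndex) : -1);
    // Lanes inside the slice come from the widened V; others keep Old.
    M.Blend.push_back(InSlice ? static_cast<int>(Idx)
                              : static_cast<int>(Idx + NumElements));
  }
  return M;
}
```

For BeginIndex = 2, NumSubElements = 2 and NumElements = 4 it yields the expand mask <-1, -1, 0, 1> (that is, <poison, poison, 0, 1>) and the blend mask <4, 5, 2, 3>, which match the A_SROA_0_16 lines in vector-conversion.ll.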