[X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types
Summary:
We don't seem to have great rules on what a valid VBROADCAST node looks like, and as a consequence we end up with a lot of patterns trying to catch everything: patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs.

As you can see from the tests improved here, we are currently missing patterns for 128-bit loads that are extended to 256 bits before the vbroadcast.

I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that, this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512 bits. In the future I would like to add scalar_to_vector around all the scalar operations, and maybe we should consider adding a combined VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable.
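
In DAG terms, that first step looks like this (a condensed sketch of the lowerVectorShuffleAsBroadcast hunk in the diff below; V is the broadcast source, SrcVT its type, and Opcode/BroadcastVT the already-chosen broadcast opcode and result type):

  // Only let VBROADCAST see scalar or 128-bit vector inputs: if the source
  // is wider, take just its low 128 bits first.
  if (SrcVT.getSizeInBits() > 128)
    V = extract128BitVector(V, 0, DAG, DL);
  return DAG.getBitcast(VT, DAG.getNode(Opcode, DL, BroadcastVT, V));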

This requires an additional change in target shuffle combining to look for the EXTRACT_SUBVECTOR and look through it to find the original operand. I'm sure this change isn't perfect, but it was enough to fix a few test failures this patch would otherwise have caused.
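
Condensed from the getTargetShuffleMask hunk below, the decode side now peeks through such an extract (N is the VBROADCAST node, VT the width being decoded, and Ops the operand list the decoder fills in):

  SDValue N0 = N->getOperand(0);
  // If we're broadcasting lane 0 of an EXTRACT_SUBVECTOR whose source is
  // already VT-sized, record the pre-extracted value as the shuffle operand.
  if (N0.getOpcode() == ISD::EXTRACT_SUBVECTOR &&
      N0.getOperand(0).getValueType() == VT &&
      N0.getConstantOperandVal(1) == 0)
    Ops.push_back(N0.getOperand(0));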

Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases where we don't remove a useless insert into element 1 before broadcasting element 0.
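
A possible follow-up combine for that case, purely a hypothetical sketch and not part of this patch, would skip inserts into lanes the broadcast never reads:

  // Hypothetical, for illustration only: VBROADCAST reads just element 0 of
  // its source, so an insert into any other (constant) lane is dead and the
  // broadcast can use the insert's input directly.
  if (V.getOpcode() == ISD::INSERT_VECTOR_ELT)
    if (auto *Idx = dyn_cast<ConstantSDNode>(V.getOperand(2)))
      if (Idx->getZExtValue() != 0)
        V = V.getOperand(0);  // drop the insert into a lane we ignore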

Reviewers: delena, RKSimon, zvi

Reviewed By: zvi

Subscribers: igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D28747

llvm-svn: 295155
topperc committed Feb 15, 2017
1 parent ec5df5f commit fbc7805
Showing 4 changed files with 36 additions and 40 deletions.
llvm/lib/Target/X86/X86ISelLowering.cpp: 18 additions & 2 deletions
@@ -5394,8 +5394,18 @@ static bool getTargetShuffleMask(SDNode *N, MVT VT, bool AllowSentinelZero,
     IsUnary = true;
     break;
   case X86ISD::VBROADCAST: {
-    // We only decode broadcasts of same-sized vectors at the moment.
-    if (N->getOperand(0).getValueType() == VT) {
+    SDValue N0 = N->getOperand(0);
+    // See if we're broadcasting from index 0 of an EXTRACT_SUBVECTOR. If so,
+    // add the pre-extracted value to the Ops vector.
+    if (N0.getOpcode() == ISD::EXTRACT_SUBVECTOR &&
+        N0.getOperand(0).getValueType() == VT &&
+        N0.getConstantOperandVal(1) == 0)
+      Ops.push_back(N0.getOperand(0));
+
+    // We only decode broadcasts of same-sized vectors, unless the broadcast
+    // came from an extract from the original width. If we found one, we
+    // pushed it to the Ops vector above.
+    if (N0.getValueType() == VT || !Ops.empty()) {
       DecodeVectorBroadcast(VT, Mask);
       IsUnary = true;
       break;
@@ -9729,6 +9739,12 @@ static SDValue lowerVectorShuffleAsBroadcast(const SDLoc &DL, MVT VT,
     BroadcastVT = MVT::getVectorVT(MVT::f64, NumBroadcastElts);
   }
 
+  // We only support broadcasting from 128-bit vectors to minimize the
+  // number of patterns we need to deal with in isel. So extract down to
+  // 128-bits.
+  if (SrcVT.getSizeInBits() > 128)
+    V = extract128BitVector(V, 0, DAG, DL);
+
   return DAG.getBitcast(VT, DAG.getNode(Opcode, DL, BroadcastVT, V));
 }

llvm/test/CodeGen/X86/masked_gather_scatter.ll: 4 additions & 8 deletions
@@ -714,8 +714,7 @@ define <16 x float> @test13(float* %base, <16 x i32> %ind) {
 define <16 x float> @test14(float* %base, i32 %ind, <16 x float*> %vec) {
 ; KNL_64-LABEL: test14:
 ; KNL_64: # BB#0:
-; KNL_64-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm1
-; KNL_64-NEXT: vinserti32x4 $0, %xmm1, %zmm0, %zmm0
+; KNL_64-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm0
 ; KNL_64-NEXT: vpbroadcastq %xmm0, %zmm0
 ; KNL_64-NEXT: vmovd %esi, %xmm1
 ; KNL_64-NEXT: vpbroadcastd %xmm1, %ymm1
@@ -731,8 +730,7 @@ define <16 x float> @test14(float* %base, i32 %ind, <16 x float*> %vec) {
 ;
 ; KNL_32-LABEL: test14:
 ; KNL_32: # BB#0:
-; KNL_32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm1
-; KNL_32-NEXT: vinserti32x4 $0, %xmm1, %zmm0, %zmm0
+; KNL_32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
 ; KNL_32-NEXT: vpbroadcastd %xmm0, %zmm0
 ; KNL_32-NEXT: vpslld $2, {{[0-9]+}}(%esp){1to16}, %zmm1
 ; KNL_32-NEXT: vpaddd %zmm1, %zmm0, %zmm1
@@ -742,8 +740,7 @@ define <16 x float> @test14(float* %base, i32 %ind, <16 x float*> %vec) {
 ;
 ; SKX-LABEL: test14:
 ; SKX: # BB#0:
-; SKX-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm1
-; SKX-NEXT: vinserti64x2 $0, %xmm1, %zmm0, %zmm0
+; SKX-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm0
 ; SKX-NEXT: vpbroadcastq %xmm0, %zmm0
 ; SKX-NEXT: vpbroadcastd %esi, %ymm1
 ; SKX-NEXT: vpmovsxdq %ymm1, %zmm1
@@ -758,8 +755,7 @@ define <16 x float> @test14(float* %base, i32 %ind, <16 x float*> %vec) {
 ;
 ; SKX_32-LABEL: test14:
 ; SKX_32: # BB#0:
-; SKX_32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm1
-; SKX_32-NEXT: vinserti32x4 $0, %xmm1, %zmm0, %zmm0
+; SKX_32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
 ; SKX_32-NEXT: vpbroadcastd %xmm0, %zmm0
 ; SKX_32-NEXT: vpslld $2, {{[0-9]+}}(%esp){1to16}, %zmm1
 ; SKX_32-NEXT: vpaddd %zmm1, %zmm0, %zmm1

llvm/test/CodeGen/X86/vector-shuffle-avx512.ll: 0 additions & 2 deletions
@@ -126,7 +126,6 @@ define <8 x i32> @expand3(<4 x i32> %a ) {
 ;
 ; KNL64-LABEL: expand3:
 ; KNL64: # BB#0:
-; KNL64-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<def>
 ; KNL64-NEXT: vpbroadcastq %xmm0, %ymm0
 ; KNL64-NEXT: vpxor %ymm1, %ymm1, %ymm1
 ; KNL64-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3,4,5,6],ymm0[7]
@@ -142,7 +141,6 @@ define <8 x i32> @expand3(<4 x i32> %a ) {
 ;
 ; KNL32-LABEL: expand3:
 ; KNL32: # BB#0:
-; KNL32-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<def>
 ; KNL32-NEXT: vpbroadcastq %xmm0, %ymm0
 ; KNL32-NEXT: vpxor %ymm1, %ymm1, %ymm1
 ; KNL32-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3,4,5,6],ymm0[7]

llvm/test/CodeGen/X86/widened-broadcast.ll: 14 additions & 28 deletions
@@ -51,14 +51,12 @@ define <8 x float> @load_splat_8f32_4f32_01010101(<4 x float>* %ptr) nounwind uw
 ;
 ; AVX2-LABEL: load_splat_8f32_4f32_01010101:
 ; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastsd (%rdi), %ymm0
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: load_splat_8f32_4f32_01010101:
 ; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastsd (%rdi), %ymm0
 ; AVX512-NEXT: retq
 entry:
   %ld = load <4 x float>, <4 x float>* %ptr
@@ -131,14 +129,12 @@ define <8 x i32> @load_splat_8i32_4i32_01010101(<4 x i32>* %ptr) nounwind uwtabl
 ;
 ; AVX2-LABEL: load_splat_8i32_4i32_01010101:
 ; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastsd (%rdi), %ymm0
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: load_splat_8i32_4i32_01010101:
 ; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastsd (%rdi), %ymm0
 ; AVX512-NEXT: retq
 entry:
   %ld = load <4 x i32>, <4 x i32>* %ptr
@@ -242,14 +238,12 @@ define <16 x i16> @load_splat_16i16_8i16_0101010101010101(<8 x i16>* %ptr) nounw
 ;
 ; AVX2-LABEL: load_splat_16i16_8i16_0101010101010101:
 ; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastss %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastss (%rdi), %ymm0
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: load_splat_16i16_8i16_0101010101010101:
 ; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastss %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastss (%rdi), %ymm0
 ; AVX512-NEXT: retq
 entry:
   %ld = load <8 x i16>, <8 x i16>* %ptr
@@ -272,14 +266,12 @@ define <16 x i16> @load_splat_16i16_8i16_0123012301230123(<8 x i16>* %ptr) nounw
 ;
 ; AVX2-LABEL: load_splat_16i16_8i16_0123012301230123:
 ; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastsd (%rdi), %ymm0
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: load_splat_16i16_8i16_0123012301230123:
 ; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastsd (%rdi), %ymm0
 ; AVX512-NEXT: retq
 entry:
   %ld = load <8 x i16>, <8 x i16>* %ptr
@@ -442,14 +434,12 @@ define <32 x i8> @load_splat_32i8_16i8_01010101010101010101010101010101(<16 x i8
 ;
 ; AVX2-LABEL: load_splat_32i8_16i8_01010101010101010101010101010101:
 ; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovdqa (%rdi), %xmm0
-; AVX2-NEXT: vpbroadcastw %xmm0, %ymm0
+; AVX2-NEXT: vpbroadcastw (%rdi), %ymm0
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: load_splat_32i8_16i8_01010101010101010101010101010101:
 ; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovdqa (%rdi), %xmm0
-; AVX512-NEXT: vpbroadcastw %xmm0, %ymm0
+; AVX512-NEXT: vpbroadcastw (%rdi), %ymm0
 ; AVX512-NEXT: retq
 entry:
   %ld = load <16 x i8>, <16 x i8>* %ptr
@@ -472,14 +462,12 @@ define <32 x i8> @load_splat_32i8_16i8_01230123012301230123012301230123(<16 x i8
 ;
 ; AVX2-LABEL: load_splat_32i8_16i8_01230123012301230123012301230123:
 ; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastss %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastss (%rdi), %ymm0
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: load_splat_32i8_16i8_01230123012301230123012301230123:
 ; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastss %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastss (%rdi), %ymm0
 ; AVX512-NEXT: retq
 entry:
   %ld = load <16 x i8>, <16 x i8>* %ptr
@@ -502,14 +490,12 @@ define <32 x i8> @load_splat_32i8_16i8_01234567012345670123456701234567(<16 x i8
 ;
 ; AVX2-LABEL: load_splat_32i8_16i8_01234567012345670123456701234567:
 ; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastsd (%rdi), %ymm0
 ; AVX2-NEXT: retq
 ;
 ; AVX512-LABEL: load_splat_32i8_16i8_01234567012345670123456701234567:
 ; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastsd (%rdi), %ymm0
 ; AVX512-NEXT: retq
 entry:
   %ld = load <16 x i8>, <16 x i8>* %ptr
