[X86] combineX86ShufflesRecursively - treat ISD::TRUNCATE as faux shuffle

getFauxShuffleMask can't handle ISD::TRUNCATE itself, as it can't accept inputs that are larger than its output

Another step towards removing combineX86ShuffleChainWithExtract
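
To make the idea concrete, here is a standalone sketch (mine, not code from the patch) of how a little-endian vector truncate can be described as a shuffle mask over the wider source, mirroring the Scale/NumElts arithmetic in the hunk below:

// Standalone sketch (not from the patch): on a little-endian target, a
// truncate from a vector with Scale-times-wider elements keeps subelement 0
// of each Scale-wide group, so it can be modelled as a sparse shuffle mask
// over the source's narrow subelements.
#include <cstdio>
#include <vector>

constexpr int SM_SentinelUndef = -1; // same sentinel idea as the hunk

std::vector<int> truncateAsShuffleMask(unsigned NumElts, unsigned Scale) {
  // The mask is sized for the source (Scale * NumElts narrow subelements);
  // only the first NumElts lanes are defined, each picking lane I * Scale.
  std::vector<int> Mask(Scale * NumElts, SM_SentinelUndef);
  for (unsigned I = 0; I != NumElts; ++I)
    Mask[I] = I * Scale;
  return Mask;
}

int main() {
  // e.g. v8i32 -> v8i16: Scale = 2, NumElts = 8
  for (int M : truncateAsShuffleMask(8, 2))
    std::printf("%d ", M); // prints 0 2 4 6 8 10 12 14, then eight -1s
  std::printf("\n");
  return 0;
}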
RKSimon committed Feb 11, 2023
1 parent d05f889 commit 23cb32c
Showing 2 changed files with 20 additions and 17 deletions.
10 changes: 10 additions & 0 deletions llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -40287,6 +40287,16 @@ static SDValue combineX86ShufflesRecursively(
    OpMask.assign(NumElts, SM_SentinelUndef);
    std::iota(OpMask.begin(), OpMask.end(), ExtractIdx);
    OpZero = OpUndef = APInt::getNullValue(NumElts);
  } else if (Op.getOpcode() == ISD::TRUNCATE &&
             (RootSizeInBits % Op.getOperand(0).getValueSizeInBits()) == 0) {
    SDValue SrcVec = Op.getOperand(0);
    unsigned Scale = SrcVec.getValueSizeInBits() / VT.getSizeInBits();
    unsigned NumElts = VT.getVectorNumElements();
    OpInputs.assign({SrcVec});
    OpMask.assign(Scale * NumElts, SM_SentinelUndef);
    OpZero = OpUndef = APInt::getNullValue(Scale * NumElts);
    for (unsigned I = 0; I != NumElts; ++I)
      OpMask[I] = I * Scale;
  } else {
    return SDValue();
  }

Inline review comments on this hunk:

d0k (Member), Feb 12, 2023:
Will this get extended to double the size by line 40321 below? Seeing some really bizarre miscompiles with this change.

RKSimon (Author, Collaborator), Feb 12, 2023:
Miscompiles or regressions? It shouldn't use any src wider than the original root, but it might interact poorly with combineX86ShuffleChainWithExtract in some way?

d0k (Member), Feb 12, 2023:
Wrong results involving this:

t233: v8i32,ch = load<(invariant load (s256) from %ir.uglygep97, align 2)> t0, t10, undef:i64
t197: v8i16 = truncate t233

Haven't managed to isolate it, but it only happens with AVX512 enabled.

RKSimon (Author, Collaborator), Feb 12, 2023:
Cheers - I'll revert for now, but a repro would be appreciated.

d0k (Member), Feb 12, 2023:
Thanks. The problematic IR is in https://gist.github.com/d0k/7fedb98632540ff9da680f7bfda8e7bf

If I diff llc -mcpu=skx before and after this change I get this:

-       vpslld  $16, -96(%r11,%rsi,4), %ymm0
-       vpslld  $16, -64(%r11,%rsi,4), %ymm1
-       vpslld  $16, -32(%r11,%rsi,4), %ymm2
-       vpslld  $16, (%r11,%rsi,4), %ymm3
-       vmovdqu %ymm0, -96(%r10,%rsi,4)
-       vmovdqu %ymm1, -64(%r10,%rsi,4)
-       vmovdqu %ymm2, -32(%r10,%rsi,4)
-       vmovdqu %ymm3, (%r10,%rsi,4)
+       vmovdqu -96(%r11,%rsi,4), %ymm1
+       vmovdqu -64(%r11,%rsi,4), %ymm2
+       vmovdqu -32(%r11,%rsi,4), %ymm3
+       vmovdqu (%r11,%rsi,4), %ymm4
+       vpshufb %ymm0, %ymm1, %ymm1
+       vpshufb %ymm0, %ymm2, %ymm2
+       vpshufb %ymm0, %ymm3, %ymm3
+       vpshufb %ymm0, %ymm4, %ymm4
+       vmovdqu %ymm1, -96(%r10,%rsi,4)
+       vmovdqu %ymm2, -64(%r10,%rsi,4)
+       vmovdqu %ymm3, -32(%r10,%rsi,4)
+       vmovdqu %ymm4, (%r10,%rsi,4)

Which seems worse, but I haven't checked if it's incorrect.

RKSimon (Author, Collaborator), Feb 12, 2023:
Thanks - found the problem
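
To put numbers on d0k's report (my arithmetic from the hunk above, not a diagnosis from the thread): for t197: v8i16 = truncate t233 with t233 of type v8i32, SrcVec is 256 bits and VT is 128 bits, so Scale = 256 / 128 = 2 and NumElts = 8. The hunk therefore records a 16-lane mask of 0, 2, 4, ..., 14 followed by eight undefs over the v8i32 source, i.e. a mask that is already source-sized while the op's own type is only half as wide; d0k's question about line 40321 is whether the later root-size mask extension then widens it a second time.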
27 changes: 10 additions & 17 deletions llvm/test/CodeGen/X86/prefer-avx256-mask-shuffle.ll
@@ -131,32 +131,25 @@ define <32 x i1> @shuf32i1_3_6_22_12_3_7_7_0_3_6_1_13_3_21_7_0_3_6_22_12_3_7_7_0
; AVX256VL-LABEL: shuf32i1_3_6_22_12_3_7_7_0_3_6_1_13_3_21_7_0_3_6_22_12_3_7_7_0_3_6_1_13_3_21_7_0:
; AVX256VL: # %bb.0:
; AVX256VL-NEXT: vpxor %xmm1, %xmm1, %xmm1
; AVX256VL-NEXT: vpcmpeqb %ymm1, %ymm0, %ymm0
; AVX256VL-NEXT: vextracti128 $1, %ymm0, %xmm1
; AVX256VL-NEXT: vpmovsxbd %xmm1, %ymm1
; AVX256VL-NEXT: vptestmd %ymm1, %ymm1, %k1
; AVX256VL-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
; AVX256VL-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,2,3]
; AVX256VL-NEXT: vpmovsxbd %xmm1, %ymm1
; AVX256VL-NEXT: vptestmd %ymm1, %ymm1, %k2
; AVX256VL-NEXT: vptestmd %ymm1, %ymm1, %k1
; AVX256VL-NEXT: vpmovsxbd %xmm0, %ymm0
; AVX256VL-NEXT: vptestmd %ymm0, %ymm0, %k3
; AVX256VL-NEXT: vptestmd %ymm0, %ymm0, %k2
; AVX256VL-NEXT: vpcmpeqd %ymm0, %ymm0, %ymm0
; AVX256VL-NEXT: vmovdqa32 %ymm0, %ymm1 {%k3} {z}
; AVX256VL-NEXT: vmovdqa32 %ymm0, %ymm1 {%k2} {z}
; AVX256VL-NEXT: vpmovdw %ymm1, %xmm1
; AVX256VL-NEXT: vmovdqa32 %ymm0, %ymm2 {%k2} {z}
; AVX256VL-NEXT: vmovdqa32 %ymm0, %ymm2 {%k1} {z}
; AVX256VL-NEXT: vpmovdw %ymm2, %xmm2
; AVX256VL-NEXT: vinserti128 $1, %xmm2, %ymm1, %ymm1
; AVX256VL-NEXT: vpermq {{.*#+}} ymm2 = ymm1[2,3,0,1]
; AVX256VL-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0,1],ymm2[2],ymm1[3],ymm2[4,5],ymm1[6],ymm2[7]
; AVX256VL-NEXT: vpshufb {{.*#+}} ymm1 = ymm1[6,7,12,13],zero,zero,ymm1[8,9,6,7,14,15,14,15,0,1,22,23,28,29,18,19,26,27,22,23],zero,zero,ymm1[30,31,16,17]
; AVX256VL-NEXT: vmovdqa32 %ymm0, %ymm2 {%k1} {z}
; AVX256VL-NEXT: vpmovdw %ymm2, %xmm2
; AVX256VL-NEXT: vpermq {{.*#+}} ymm2 = ymm2[1,1,1,1]
; AVX256VL-NEXT: vpternlogq $220, {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm1, %ymm2
; AVX256VL-NEXT: vpmovsxwd %xmm2, %ymm1
; AVX256VL-NEXT: vpslld $31, %ymm1, %ymm1
; AVX256VL-NEXT: vptestmd %ymm1, %ymm1, %k1
; AVX256VL-NEXT: vextracti128 $1, %ymm2, %xmm1
; AVX256VL-NEXT: vpshufb {{.*#+}} ymm1 = ymm1[6,7,12,13,u,u,8,9,6,7,14,15,14,15,0,1,22,23,28,29,18,19,26,27,22,23,u,u,30,31,16,17]
; AVX256VL-NEXT: vpmovsxwd %xmm1, %ymm2
; AVX256VL-NEXT: vpslld $31, %ymm2, %ymm2
; AVX256VL-NEXT: vptestmd %ymm2, %ymm2, %k1
; AVX256VL-NEXT: vextracti128 $1, %ymm1, %xmm1
; AVX256VL-NEXT: vpmovsxwd %xmm1, %ymm1
; AVX256VL-NEXT: vpslld $31, %ymm1, %ymm1
; AVX256VL-NEXT: vptestmd %ymm1, %ymm1, %k0
