Skip to content

Commit

Permalink
[X86][AVX512BF16] Add a few missing insert/extract patterns
Browse files Browse the repository at this point in the history
These are really the same as the f16 (and i16) instructions, but we need
them for any type that can occur.
  • Loading branch information
d0k committed Mar 5, 2024
1 parent 6f11c95 commit 55c466d
Show file tree
Hide file tree
Showing 2 changed files with 33 additions and 0 deletions.
12 changes: 12 additions & 0 deletions llvm/lib/Target/X86/X86InstrAVX512.td
Original file line number Diff line number Diff line change
Expand Up @@ -494,20 +494,26 @@ defm : vinsert_for_size_lowering<"VINSERTI32x4Z256", v16i8x_info, v32i8x_info,
vinsert128_insert, INSERT_get_vinsert128_imm, [HasVLX]>;
defm : vinsert_for_size_lowering<"VINSERTF32x4Z256", v8f16x_info, v16f16x_info,
vinsert128_insert, INSERT_get_vinsert128_imm, [HasVLX]>;
defm : vinsert_for_size_lowering<"VINSERTF32x4Z256", v8bf16x_info, v16bf16x_info,
vinsert128_insert, INSERT_get_vinsert128_imm, [HasVLX]>;
// Codegen pattern with the alternative types insert VEC128 into VEC512
defm : vinsert_for_size_lowering<"VINSERTI32x4Z", v8i16x_info, v32i16_info,
vinsert128_insert, INSERT_get_vinsert128_imm, [HasAVX512]>;
defm : vinsert_for_size_lowering<"VINSERTI32x4Z", v16i8x_info, v64i8_info,
vinsert128_insert, INSERT_get_vinsert128_imm, [HasAVX512]>;
defm : vinsert_for_size_lowering<"VINSERTF32x4Z", v8f16x_info, v32f16_info,
vinsert128_insert, INSERT_get_vinsert128_imm, [HasAVX512]>;
defm : vinsert_for_size_lowering<"VINSERTF32x4Z", v8bf16x_info, v32bf16_info,
vinsert128_insert, INSERT_get_vinsert128_imm, [HasAVX512]>;
// Codegen pattern with the alternative types insert VEC256 into VEC512
defm : vinsert_for_size_lowering<"VINSERTI64x4Z", v16i16x_info, v32i16_info,
vinsert256_insert, INSERT_get_vinsert256_imm, [HasAVX512]>;
defm : vinsert_for_size_lowering<"VINSERTI64x4Z", v32i8x_info, v64i8_info,
vinsert256_insert, INSERT_get_vinsert256_imm, [HasAVX512]>;
defm : vinsert_for_size_lowering<"VINSERTF64x4Z", v16f16x_info, v32f16_info,
vinsert256_insert, INSERT_get_vinsert256_imm, [HasAVX512]>;
defm : vinsert_for_size_lowering<"VINSERTF64x4Z", v16bf16x_info, v32bf16_info,
vinsert256_insert, INSERT_get_vinsert256_imm, [HasAVX512]>;


multiclass vinsert_for_mask_cast<string InstrStr, X86VectorVTInfo From,
Expand Down Expand Up @@ -795,6 +801,8 @@ defm : vextract_for_size_lowering<"VEXTRACTI32x4Z256", v32i8x_info, v16i8x_info,
vextract128_extract, EXTRACT_get_vextract128_imm, [HasVLX]>;
defm : vextract_for_size_lowering<"VEXTRACTF32x4Z256", v16f16x_info, v8f16x_info,
vextract128_extract, EXTRACT_get_vextract128_imm, [HasVLX]>;
defm : vextract_for_size_lowering<"VEXTRACTF32x4Z256", v16bf16x_info, v8bf16x_info,
vextract128_extract, EXTRACT_get_vextract128_imm, [HasVLX]>;

// Codegen pattern with the alternative types extract VEC128 from VEC512
defm : vextract_for_size_lowering<"VEXTRACTI32x4Z", v32i16_info, v8i16x_info,
Expand All @@ -803,13 +811,17 @@ defm : vextract_for_size_lowering<"VEXTRACTI32x4Z", v64i8_info, v16i8x_info,
vextract128_extract, EXTRACT_get_vextract128_imm, [HasAVX512]>;
defm : vextract_for_size_lowering<"VEXTRACTF32x4Z", v32f16_info, v8f16x_info,
vextract128_extract, EXTRACT_get_vextract128_imm, [HasAVX512]>;
defm : vextract_for_size_lowering<"VEXTRACTF32x4Z", v32bf16_info, v8bf16x_info,
vextract128_extract, EXTRACT_get_vextract128_imm, [HasAVX512]>;
// Codegen pattern with the alternative types extract VEC256 from VEC512
defm : vextract_for_size_lowering<"VEXTRACTI64x4Z", v32i16_info, v16i16x_info,
vextract256_extract, EXTRACT_get_vextract256_imm, [HasAVX512]>;
defm : vextract_for_size_lowering<"VEXTRACTI64x4Z", v64i8_info, v32i8x_info,
vextract256_extract, EXTRACT_get_vextract256_imm, [HasAVX512]>;
defm : vextract_for_size_lowering<"VEXTRACTF64x4Z", v32f16_info, v16f16x_info,
vextract256_extract, EXTRACT_get_vextract256_imm, [HasAVX512]>;
defm : vextract_for_size_lowering<"VEXTRACTF64x4Z", v32bf16_info, v16bf16x_info,
vextract256_extract, EXTRACT_get_vextract256_imm, [HasAVX512]>;


// A 128-bit extract from bits [255:128] of a 512-bit vector should use a
Expand Down
21 changes: 21 additions & 0 deletions llvm/test/CodeGen/X86/bfloat.ll
Original file line number Diff line number Diff line change
Expand Up @@ -2805,3 +2805,24 @@ define <16 x bfloat> @concat_zero_v8bf16(<8 x bfloat> %x, <8 x bfloat> %y) {
%a = shufflevector <8 x bfloat> %x, <8 x bfloat> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x bfloat> %a
}

define <16 x bfloat> @concat_dup_v8bf16(<8 x bfloat> %x, <8 x bfloat> %y) {

This comment has been minimized.

Copy link
@phoebewang

phoebewang Mar 6, 2024

Contributor

I think the test case doesn't show the problem, because it only crashes without avx512vl. https://godbolt.org/z/7Gs5qYsKj
And if I read the code correctly, it doesn't fix the issue either because the v8bf16x_info <-> v16bf16x_info patterns require avx512vl too.

This comment has been minimized.

Copy link
@d0k

d0k Mar 6, 2024

Author Member

It's not a great test case, but it used to crash and works now. The crash that motivated this change was crashing on SPR, which has avx512vl. What needs to be done to get it to work without avx512vl?

I wish I had a more principled way of writing tests for this, it's really hard to coerce the IR into turning into exactly what the backend needs to see.

This comment has been minimized.

Copy link
@phoebewang

phoebewang Mar 6, 2024

Contributor

Sorry, I assumed godbolt hasn't included this change, but seems not true. https://godbolt.org/z/v6h9rxhca
So the patch just optimizated the codegen. But now it introduced a new crash :)

This comment has been minimized.

Copy link
@phoebewang

phoebewang Mar 6, 2024

Contributor

I see, guess caused by some change introduced between LLVM17 and LLVM18.

I'm reluctant to introduce extra code to handled case without avx512vl. We don't have a real target that has avx512bf16 but no avx512vl.

So if it is not a new crash introduced by this patch, we may leave it as is until someone think it is a issue.

; X86-LABEL: concat_dup_v8bf16:
; X86: # %bb.0:
; X86-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]
; X86-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; X86-NEXT: retl
;
; SSE2-LABEL: concat_dup_v8bf16:
; SSE2: # %bb.0:
; SSE2-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0,0]
; SSE2-NEXT: retq
;
; AVX-LABEL: concat_dup_v8bf16:
; AVX: # %bb.0:
; AVX-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]
; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX-NEXT: retq
%a = shufflevector <8 x bfloat> %x, <8 x bfloat> %y, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
ret <16 x bfloat> %a
}

0 comments on commit 55c466d

Please sign in to comment.