[X86] Poor AVX512 codegen with constant predicate #164399

@RKSimon

Noticed while reviewing constexpr handling of the predicated arithmetic:

define <16 x i32> @add(<16 x i32> %x, <16 x i32> %y) {
  %add = add <16 x i32> %y, %x
  %res = shufflevector <16 x i32> %add, <16 x i32> zeroinitializer, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31>
  ret <16 x i32> %res
}

add: # @add
  vpaddd %zmm0, %zmm1, %zmm0
  movw $255, %ax
  kmovd %eax, %k1
  vpexpandd %zmm0, %zmm0 {%k1} {z}
  retq

Lots of things going wrong here:

  1. Lowering the shuffle as an expansion instead of a select (which would fold into a predicated instruction)
  2. Use of movw/kmovd instead of kxnorb to rematerialize the 0xFF predicate mask directly
  3. The shuffle zeroes the upper 256 bits of the vector, so this could have just been done as vpaddd %ymm0, %ymm1, %ymm0, which zeroes them implicitly
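
To make the equivalences behind these three points concrete, here is a rough scalar model in Python (not LLVM code). It treats a zmm register as a list of 16 32-bit lanes and checks that the shuffle from the IR, a zero-masked vpexpandd with mask 0xFF, a zero-masked select with the same mask, and a 256-bit add with zeroed upper lanes all produce the same value. Every function name below is hypothetical and only serves the illustration; the instruction semantics are simplified to what this example needs.

```python
# Scalar model of the 16 x i32 operations discussed in this issue.
# All helper names are hypothetical and exist only for illustration.

LANES = 16
MASK32 = 0xFFFFFFFF

# Shuffle mask from the IR: lanes 0..7 of %add, then lanes 8..15 of zeroinitializer.
SHUFFLE_INDICES = list(range(8)) + list(range(24, 32))


def add_v16(x, y):
    """Lane-wise wrapping 32-bit add (models vpaddd)."""
    return [(a + b) & MASK32 for a, b in zip(x, y)]


def shuffle_with_zero(v, indices):
    """Models shufflevector: indices 0..15 pick from v, 16..31 from a zero vector."""
    both = v + [0] * LANES
    return [both[i] for i in indices]


def expand_zero_masked(v, mask):
    """Models vpexpandd {k}{z}: consecutive low elements of v fill the lanes
    whose mask bit is set; all other lanes become zero."""
    out = []
    src = 0
    for lane in range(LANES):
        if (mask >> lane) & 1:
            out.append(v[src])
            src += 1
        else:
            out.append(0)
    return out


def select_zero_masked(v, mask):
    """Models a zero-masked (predicated) result: keep lane i if bit i is set, else zero."""
    return [v[i] if (mask >> i) & 1 else 0 for i in range(LANES)]


def add_ymm_zero_upper(x, y):
    """Models a 256-bit add: only the low 8 lanes are computed, the upper 8 are zero."""
    return add_v16(x[:8], y[:8]) + [0] * 8


def all_forms(x, y):
    """Return the four candidate lowerings of the IR function for inputs x and y."""
    total = add_v16(x, y)
    return (
        shuffle_with_zero(total, SHUFFLE_INDICES),  # the original IR
        expand_zero_masked(total, 0xFF),            # current codegen
        select_zero_masked(total, 0xFF),            # point 1: select form
        add_ymm_zero_upper(x, y),                   # point 3: 256-bit add
    )
```

Because the set bits of 0xFF are contiguous and start at lane 0, the expansion degenerates into a plain select in this model, which is why the predicated form and the 256-bit form are valid replacements for this particular mask.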