[X86] Poor codegen for SSE ((x == 1) || (x == 2)) #66479

@RKSimon

https://godbolt.org/z/KT5fxKjbG

#include <x86intrin.h>
__m128i cmp1OR2_epi32(__m128i x) {
    __m128i is1 = _mm_cmpeq_epi32(x, _mm_set1_epi32(1));
    __m128i is2 = _mm_cmpeq_epi32(x, _mm_set1_epi32(2));
    return _mm_or_si128(is1, is2);
}

I used 1 and 2, but the same thing happens with any two consecutive constants.

Due to an InstCombine fold, instead of just comparing against the two consecutive values and ORing the results, we end up with the following IR and SSE4.1 codegen:

define <2 x i64> @cmp1OR2_epi32(<2 x i64> noundef %x) {
entry:
  %0 = bitcast <2 x i64> %x to <4 x i32>
  %1 = add <4 x i32> %0, <i32 -1, i32 -1, i32 -1, i32 -1>
  %or.i89 = icmp ult <4 x i32> %1, <i32 2, i32 2, i32 2, i32 2>
  %or.i8 = sext <4 x i1> %or.i89 to <4 x i32>
  %or.i = bitcast <4 x i32> %or.i8 to <2 x i64>
  ret <2 x i64> %or.i
}

cmp1OR2_epi32:
  pcmpeqd %xmm1, %xmm1
  paddd %xmm1, %xmm0
  movdqa .LCPI2_0(%rip), %xmm1 # xmm1 = [1,1,1,1]
  pminud %xmm0, %xmm1
  pcmpeqd %xmm1, %xmm0
  retq
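
To make the fold concrete, here is a scalar, per-lane sketch in C of the three forms involved: the original pair of equality tests, the folded `icmp ult (add x, -1), 2`, and the `pminud` + `pcmpeqd` lowering of the unsigned compare. The function names are made up for illustration and each function models a single 32-bit lane; it is not what the compiler emits.

```c
#include <stdint.h>

/* Source form: one lane of (x == 1) | (x == 2), all-ones on match. */
static uint32_t lane_eq_or(uint32_t x) {
    return (x == 1u || x == 2u) ? 0xFFFFFFFFu : 0u;
}

/* Folded form: icmp ult (add x, -1), 2. The subtraction wraps, so
   x == 0 becomes 0xFFFFFFFF and correctly fails the range check. */
static uint32_t lane_add_ult(uint32_t x) {
    uint32_t t = x - 1u;
    return (t < 2u) ? 0xFFFFFFFFu : 0u;
}

/* SSE4.1-style lowering: t <u 2 is the same as t <=u 1, which holds
   exactly when umin(t, 1) == t (pminud followed by pcmpeqd). */
static uint32_t lane_umin_eq(uint32_t x) {
    uint32_t t = x - 1u;
    uint32_t m = (t < 1u) ? t : 1u;
    return (m == t) ? 0xFFFFFFFFu : 0u;
}
```

All three agree for every input, which is why the fold is valid; the complaint is only about the instruction sequence it leads to.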

I could /almost/ accept this if it meant we reduced constant loads, but on pre-SSE4 targets we end up with:

cmp1OR2_epi32:
  pcmpeqd %xmm1, %xmm1
  paddd %xmm0, %xmm1
  pxor .LCPI2_0(%rip), %xmm1
  movdqa .LCPI2_1(%rip), %xmm0 # xmm0 = [2147483650,2147483650,2147483650,2147483650]
  pcmpgtd %xmm1, %xmm0
  retq
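
The pre-SSE4 sequence works because SSE2 has no unsigned 32-bit compare: the unsigned test is turned into a signed one by flipping the sign bit of both sides. A per-lane sketch in C follows, assuming the `pxor` constant `.LCPI2_0` is the sign-bit mask `0x80000000` (its value is not shown in the listing above); the helper names are illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a uint32_t as int32_t without relying on
   implementation-defined conversion. */
static int32_t bits_as_i32(uint32_t u) {
    int32_t s;
    memcpy(&s, &u, sizeof s);
    return s;
}

/* One lane of the pre-SSE4 sequence. */
static uint32_t lane_signflip(uint32_t x) {
    uint32_t t = x + 0xFFFFFFFFu;                   /* paddd with all-ones: x - 1 */
    int32_t biased = bits_as_i32(t ^ 0x80000000u);  /* pxor (assumed sign-bit mask) */
    int32_t limit = bits_as_i32(0x80000002u);       /* 2147483650 == 2 ^ 0x80000000 */
    return (limit > biased) ? 0xFFFFFFFFu : 0u;     /* pcmpgtd: signed limit > biased */
}
```

That is two constant-pool references (the mask and the biased limit), the same number the plain two-compare version would need for its 1 and 2 splats, so the fold saves nothing here.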

Untangling the icmp_ult(add(x,-1),2) (or the general solution) shouldn't be too difficult.
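
As a rough sketch of what the untangling amounts to for this pattern: `icmp_ult(add(x, C), 2)` is true exactly when `x + C` is 0 or 1 modulo 2^32, so the two equality constants can be recovered as `-C` and `1 - C`. The C below only models that arithmetic; the struct and function names are hypothetical and this is not LLVM code.

```c
#include <stdint.h>

/* The two constants x must equal for (x + c) <u 2 to hold. */
struct eq_pair {
    uint32_t lo;
    uint32_t hi;
};

/* Recover the equality constants from the add constant c.
   Unsigned arithmetic wraps modulo 2^32, matching the IR add. */
static struct eq_pair untangle_ult2(uint32_t c) {
    struct eq_pair p;
    p.lo = 0u - c;  /* x + c == 0 */
    p.hi = 1u - c;  /* x + c == 1 */
    return p;
}

/* Rebuilt form of one lane: (x == lo) | (x == hi). */
static uint32_t lane_rebuilt(uint32_t x, uint32_t c) {
    struct eq_pair p = untangle_ult2(c);
    return (x == p.lo || x == p.hi) ? 0xFFFFFFFFu : 0u;
}
```

For the case in this report, `c` is `0xFFFFFFFF` (that is, -1), which gives back the constants 1 and 2.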
