[X86] Poor codegen for SSE ((x == 1) || (x == 2))

https://godbolt.org/z/KT5fxKjbG
```c
#include <x86intrin.h>
__m128i cmp1OR2_epi32(__m128i x) {
    __m128i is1 = _mm_cmpeq_epi32(x, _mm_set1_epi32(1));
    __m128i is2 = _mm_cmpeq_epi32(x, _mm_set1_epi32(2));
    return _mm_or_si128(is1, is2);
}
```
I used 1 and 2, but it occurs with any two consecutive constants.
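Any consecutive pair `(c, c + 1)` triggers this because of the unsigned range-check identity: for 32-bit lanes, `(x == c) || (x == c + 1)` holds exactly when `(uint32_t)(x - c) < 2`. Modular wraparound keeps it valid even for `c = UINT32_MAX`. A minimal scalar sketch of both forms (the function names are illustrative only):

```c
#include <stdbool.h>
#include <stdint.h>

/* Two equality tests, as written in the source. */
static bool in_pair_eq(uint32_t x, uint32_t c) {
    return x == c || x == c + 1u; /* c + 1u wraps modulo 2^32 */
}

/* The single range check the folded IR computes:
   subtract c, then compare unsigned against the range width (2). */
static bool in_pair_range(uint32_t x, uint32_t c) {
    return (uint32_t)(x - c) < 2u;
}
```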

Due to an InstCombine fold, instead of just comparing against the two consecutive values and ORing the results, we end up with:
```ll
define <2 x i64> @cmp1OR2_epi32(<2 x i64> noundef %x) {
entry:
  %0 = bitcast <2 x i64> %x to <4 x i32>
  %1 = add <4 x i32> %0, <i32 -1, i32 -1, i32 -1, i32 -1>
  %or.i89 = icmp ult <4 x i32> %1, <i32 2, i32 2, i32 2, i32 2>
  %or.i8 = sext <4 x i1> %or.i89 to <4 x i32>
  %or.i = bitcast <4 x i32> %or.i8 to <2 x i64>
  ret <2 x i64> %or.i
}
```
```asm
cmp1OR2_epi32:
  pcmpeqd %xmm1, %xmm1
  paddd %xmm1, %xmm0
  movdqa .LCPI2_0(%rip), %xmm1 # xmm1 = [1,1,1,1]
  pminud %xmm0, %xmm1
  pcmpeqd %xmm1, %xmm0
  retq
```
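The SSE4.1 sequence uses `pminud` to get an unsigned compare without a native unsigned compare instruction: `y <u 2` becomes `y <=u 1`, which holds exactly when `min_u(y, 1) == y`. A lane-wise scalar model of the instructions above, one `uint32_t` per lane (the helper name is hypothetical):

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of the SSE4.1 output.
   Returns the lane mask: 0xFFFFFFFF when x is 1 or 2, otherwise 0. */
static uint32_t lane_sse41(uint32_t x) {
    uint32_t all_ones = 0xFFFFFFFFu;    /* pcmpeqd %xmm1, %xmm1 */
    uint32_t y = x + all_ones;          /* paddd: y = x - 1 (mod 2^32) */
    uint32_t one = 1u;                  /* movdqa [1,1,1,1] */
    uint32_t m = y < one ? y : one;     /* pminud: unsigned min(y, 1) */
    return m == y ? 0xFFFFFFFFu : 0u;   /* pcmpeqd: y == min(y, 1) <=> y <=u 1 */
}
```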
I could *almost* accept this if it meant fewer constant loads, but on pre-SSE4.1 targets (no `pminud`) we end up with:
```asm
cmp1OR2_epi32:
  pcmpeqd %xmm1, %xmm1
  paddd %xmm0, %xmm1
  pxor .LCPI2_0(%rip), %xmm1
  movdqa .LCPI2_1(%rip), %xmm0 # xmm0 = [2147483650,2147483650,2147483650,2147483650]
  pcmpgtd %xmm1, %xmm0
  retq
```
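For reference, the pre-SSE4.1 sequence can be written back out with SSE2 intrinsics. SSE2 only has the signed 32-bit `pcmpgtd`, so both operands are XORed with the sign bit (`0x80000000`), which turns the unsigned test `(x - 1) <u 2` into a signed one; `2 ^ 0x80000000` is the `2147483650` constant above. A sketch, assuming an x86-64 target where SSE2 is baseline (the function name is illustrative):

```c
#include <emmintrin.h> /* SSE2 only */
#include <stdint.h>

/* SSE2 rendering of the pre-SSE4.1 output for ((x == 1) || (x == 2)). */
static __m128i cmp1or2_sse2_folded(__m128i x) {
    /* mirrors pcmpeqd + paddd: y = x - 1 per lane */
    __m128i y = _mm_add_epi32(x, _mm_set1_epi32(-1));
    /* mirrors pxor .LCPI2_0: flip the sign bit so a signed compare acts unsigned */
    __m128i ys = _mm_xor_si128(y, _mm_set1_epi32(INT32_MIN));
    /* mirrors pcmpgtd: (2 ^ 0x80000000) > ys, i.e. y <u 2 */
    return _mm_cmpgt_epi32(_mm_set1_epi32(INT32_MIN + 2), ys);
}
```

Note that this form still needs two constant vectors (the sign-bit mask and the biased bound), so it saves nothing over the original pair of compares.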
Untangling the `icmp ult (add x, -1), 2` pattern back into two equality compares (or solving this more generally for any pair of consecutive constants) shouldn't be too difficult.
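As a sketch of the codegen being asked for, here is what the untangled form would compute for a general constant `c`: for `sext(icmp ult (add x, -c), 2)`, emit `pcmpeqd` against `c` and `c + 1` and OR the results, as in the original source. This is a hypothetical illustration written with SSE2 intrinsics, not an existing LLVM transform, and the function name is made up:

```c
#include <emmintrin.h> /* SSE2 only */
#include <stdint.h>

/* Hypothetical lowering target for sext(icmp ult (add x, -c), 2):
   two equality compares and an OR instead of add/xor/pcmpgtd. */
static __m128i range2_as_two_eq(__m128i x, int32_t c) {
    __m128i lo = _mm_set1_epi32(c);
    /* lane-wise c + 1; the vector add wraps, so c == INT32_MAX is fine */
    __m128i hi = _mm_add_epi32(lo, _mm_set1_epi32(1));
    return _mm_or_si128(_mm_cmpeq_epi32(x, lo), _mm_cmpeq_epi32(x, hi));
}
```

With constant `c`, the compiler can fold `hi` into a second constant vector, which matches the two-constant cost of the original code.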

[X86] Poor codegen for SSE ((x == 1) || (x == 2)) #66479

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[X86] Poor codegen for SSE ((x == 1) || (x == 2)) #66479

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions