[Clang 21] Potential bug in lowering of shufflevector #173030

@alepping

Reproducer:
https://godbolt.org/z/YEvhYdPo7

Background:
We use the MLIR IR builder to generate MLIR code from C++ code.
We lower the MLIR code to LLVM IR, JIT compile and execute the resulting code.
The problem does not occur with representative C code, because C adds a sign extension instruction for the 8-bit signed integer, which prevents the optimization that causes the problem.

Test case:
C code representing the test case. It does not trigger the error when compiled with clang (see godbolt), but it helps clarify the setup:

void float_division(int8_t i8, int16_t i16, int32_t i32, int64_t i64, float f32, float* results) {
    results[0] = f32 / i8;
    results[1] = f32 / (i8 + 1);
    results[2] = f32 / i16;
    results[3] = f32 / (i16 + 1);
    results[4] = f32 / i32;
    results[5] = f32 / (i32 + 1);
    results[6] = f32 / i64;
    results[7] = f32 / (i64 + 1);
}

Input:
float_division(1,1,1,1,2.0,results)
Expected result:
{2,1,2,1,2,1,2,1}
Actual result (using the MLIR builder):
{2,1,2,inf,2,1,2,1}
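
For reference, the test case and its expected result can be written as a small self-checking C sketch. The helper `count_mismatches` is a hypothetical name introduced here for illustration; it is not part of the reproducer. Compiled from C, this path is expected to pass, since the issue only shows up in the MLIR-generated IR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same body as the C test case above. */
static void float_division(int8_t i8, int16_t i16, int32_t i32, int64_t i64,
                           float f32, float *results) {
    results[0] = f32 / i8;
    results[1] = f32 / (i8 + 1);
    results[2] = f32 / i16;
    results[3] = f32 / (i16 + 1);
    results[4] = f32 / i32;
    results[5] = f32 / (i32 + 1);
    results[6] = f32 / i64;
    results[7] = f32 / (i64 + 1);
}

/* Hypothetical helper: counts the lanes that differ from the expected
   result {2,1,2,1,2,1,2,1} for the input (1,1,1,1,2.0). */
static int count_mismatches(const float *results) {
    static const float expected[8] = {2, 1, 2, 1, 2, 1, 2, 1};
    int mismatches = 0;
    assert(results != NULL);
    for (size_t i = 0; i < 8; ++i) {
        if (results[i] != expected[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Calling `float_division(1, 1, 1, 1, 2.0f, results)` and then `count_mismatches(results)` should return 0 for a correct build.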

Architecture:
See godbolt.

Problem:
When switching from LLVM 20 to LLVM 21, one of our tests started to fail and return incorrect results.
The godbolt link shows a boiled-down version.
In essence, the generated LLVM IR looks fine, but the generated assembly seems to work with and return poison values.

Details:
We have four signed integers, one 8-bit, one 16-bit, one 32-bit and one 64-bit and a 32-bit floating point value.
We divide the floating point value by each of the four integers and by each of the four integers + 1 (8 divisions overall).
The generated LLVM IR (not using Clang, but using the MLIR builder) essentially looks like this:

define void @faulty(i8 %0, i16 %1, i32 %2, i64 %3, float %4, ptr writeonly captures(none) initializes((0, 32)) %5) local_unnamed_addr #0 {
  %7 = add i8 %0, 1
  %8 = insertelement <2 x i8> poison, i8 %0, i64 0
  %9 = insertelement <2 x i8> %8, i8 %7, i64 1
  %10 = sitofp <2 x i8> %9 to <2 x float>
  %11 = add i16 %1, 1
  %12 = insertelement <2 x i16> poison, i16 %1, i64 0
  %13 = insertelement <2 x i16> %12, i16 %11, i64 1
  %14 = sitofp <2 x i16> %13 to <2 x float>
  %15 = add i32 %2, 1
  %16 = insertelement <2 x i32> poison, i32 %2, i64 0
  %17 = insertelement <2 x i32> %16, i32 %15, i64 1
  %18 = sitofp <2 x i32> %17 to <2 x float>
  %19 = sitofp i64 %3 to float
  %20 = add i64 %3, 1
  %21 = sitofp i64 %20 to float
  %22 = insertelement <8 x float> poison, float %4, i64 0
  %23 = shufflevector <8 x float> %22, <8 x float> poison, <8 x i32> zeroinitializer
  %24 = shufflevector <2 x float> %10, <2 x float> %14, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
  %25 = shufflevector <2 x float> %18, <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
  %26 = shufflevector <8 x float> %24, <8 x float> %25, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
  %27 = insertelement <8 x float> %26, float %19, i64 6
  %28 = insertelement <8 x float> %27, float %21, i64 7
  %29 = fdiv <8 x float> %23, %28
  store <8 x float> %29, ptr %5, align 4
  ret void
}

As far as we can see, the LLVM IR looks fine.
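
To make the intended lane layout explicit, here is a scalar C sketch of what the IR computes, as read from the listing: lanes 0 and 1 of `%28` come from the i8 pair, lanes 2 and 3 from the i16 pair, lanes 4 and 5 from the i32 pair, and lanes 6 and 7 from the two i64 conversions. The function name `ir_model` is an assumption made for this illustration. The `+ 1` is performed in the narrow type, as in the IR, so it wraps instead of promoting to `int` as the C test case does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of @faulty as read from the IR: one divisor per lane of %28.
   Hypothetical helper for illustration only. */
static void ir_model(int8_t a, int16_t b, int32_t c, int64_t d,
                     float f32, float *out) {
    assert(out != NULL);

    /* The IR adds in the narrow integer type, so the +1 wraps around.
       Converting the wrapped value back to a signed type is
       implementation-defined before C23; mainstream compilers wrap. */
    int8_t  a1 = (int8_t)(uint8_t)((uint8_t)a + 1u);
    int16_t b1 = (int16_t)(uint16_t)((uint16_t)b + 1u);
    int32_t c1 = (int32_t)((uint32_t)c + 1u);
    int64_t d1 = (int64_t)((uint64_t)d + 1u);

    /* sitofp per lane, in the order produced by the shufflevector chain. */
    const float divisor[8] = {
        (float)a, (float)a1,   /* lanes 0-1: %10 */
        (float)b, (float)b1,   /* lanes 2-3: %14 */
        (float)c, (float)c1,   /* lanes 4-5: %18 */
        (float)d, (float)d1,   /* lanes 6-7: %19, %21 */
    };

    /* %29 = fdiv of the splatted float by the divisor vector. */
    for (size_t i = 0; i < 8; ++i) {
        out[i] = f32 / divisor[i];
    }
}
```

For the input `(1, 1, 1, 1, 2.0f)` this model yields the expected `{2,1,2,1,2,1,2,1}`.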
However, the generated assembly code:

faulty:
        lea     eax, [rdi + 1]
        vmovd   xmm1, edi
        vpinsrb xmm1, xmm1, eax, 1
        vmovd   xmm2, esi
        inc     esi
        vpinsrw xmm2, xmm2, esi, 1
        vmovd   xmm3, edx
        inc     edx
        vpinsrd xmm3, xmm3, edx, 1
        vcvtsi2ss       xmm4, xmm15, rcx
        vpunpckldq      xmm1, xmm1, xmm2
        vcvtdq2ps       xmm2, xmm3
        inc     rcx
        vcvtsi2ss       xmm3, xmm15, rcx
        vbroadcastss    ymm0, xmm0
        vpmovsxbd       ymm1, xmm1
        vcvtdq2ps       ymm1, ymm1
        vinsertf128     ymm2, ymm1, xmm2, 1
        vextractf128    xmm1, ymm1, 1
        vpunpcklqdq     ymm1, ymm2, ymm1
        vbroadcastss    ymm2, xmm4
        vblendps        ymm1, ymm1, ymm2, 64
        vbroadcastss    ymm2, xmm3
        vblendps        ymm1, ymm1, ymm2, 128
        vdivps  ymm0, ymm0, ymm1
        vmovups ymmword ptr [r8], ymm0
        vzeroupper
        ret

seems to be faulty/problematic.
As far as we can see (observed while debugging) these instructions

vpmovsxbd       ymm1, xmm1
vcvtdq2ps       ymm1, ymm1

can load poison (garbage) values into the upper lanes of the vector register and then convert these values to floats.
The following instructions use these poisoned floats, which makes them part of the result.
The godbolt reproducer demonstrates this behavior, causing inf to become part of the result.
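
The data movement in the listing can also be modelled lane by lane in C. The sketch below is one reading of the assembly, based on the documented semantics of the individual instructions; it has not been checked against hardware, and the name `asm_model` is hypothetical. Its parameters stand for the raw contents of `edi`, `esi`, `edx` and `rcx`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lane-by-lane model of the assembly listing (an interpretation, not a
   verified emulation). Arguments are the raw register contents. */
static void asm_model(uint32_t edi, uint32_t esi, uint32_t edx, uint64_t rcx,
                      float f32, float *out) {
    assert(out != NULL);

    /* vmovd + vpinsrb: dword 0 of xmm1 is edi with byte 1 replaced by the
       low byte of (rdi + 1). Bytes 2 and 3 keep the upper bits of edi. */
    uint8_t bytes[8];
    bytes[0] = (uint8_t)edi;
    bytes[1] = (uint8_t)(edi + 1u);
    bytes[2] = (uint8_t)(edi >> 16);
    bytes[3] = (uint8_t)(edi >> 24);

    /* vmovd + vpinsrw: dword 0 of xmm2 holds {i16, i16 + 1} as two words.
       vpunpckldq then places that dword into bytes 4..7 of xmm1. */
    uint16_t w0 = (uint16_t)esi;
    uint16_t w1 = (uint16_t)(esi + 1u);
    bytes[4] = (uint8_t)(w0 & 0xffu);
    bytes[5] = (uint8_t)(w0 >> 8);
    bytes[6] = (uint8_t)(w1 & 0xffu);
    bytes[7] = (uint8_t)(w1 >> 8);

    /* vpmovsxbd + vcvtdq2ps: every one of the 8 bytes is sign-extended to
       32 bits and converted to float, including the bytes of the i16 pair. */
    float conv[8];
    for (size_t i = 0; i < 8; ++i) {
        conv[i] = (float)(int8_t)bytes[i];
    }

    /* vinsertf128 / vextractf128 / vpunpcklqdq / vblendps: lanes 0-1 come
       from conv[0..1], lanes 2-3 from conv[4..5], lanes 4-5 from the i32
       pair, lanes 6-7 from the two vcvtsi2ss results. */
    const float divisor[8] = {
        conv[0], conv[1], conv[4], conv[5],
        (float)(int32_t)edx, (float)(int32_t)(edx + 1u),
        (float)(int64_t)rcx, (float)(int64_t)(rcx + 1u),
    };

    /* vdivps: splatted float divided by the assembled divisor vector. */
    for (size_t i = 0; i < 8; ++i) {
        out[i] = f32 / divisor[i];
    }
}
```

Under this reading, `asm_model(1, 1, 1, 1, 2.0f, out)` produces `{2,1,2,inf,2,1,2,1}`: lane 3 is divided by the high byte of `i16` (zero) rather than by `i16 + 1`, which is consistent with the `inf` seen in the reproducer. If the model is right, the i16 pair is being sign-extended per byte as if it were i8 data, which may be worth checking when bisecting.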
