Description
Reproducer:
https://godbolt.org/z/YEvhYdPo7
Background:
We use the MLIR IR builder to generate MLIR code from C++ code.
We lower the MLIR code to LLVM IR, JIT compile and execute the resulting code.
The problem does not occur with representative C code, because C adds a sign extension instruction for the 8-bit signed integer, which prevents the optimization that causes the problem.
Test case:
C code representing the test case (it does not trigger the error when compiled with Clang, see the godbolt link, but it helps clarify the setup):
void float_division(int8_t i8, int16_t i16, int32_t i32, int64_t i64, float f32, float* results) {
results[0] = f32 / i8;
results[1] = f32 / (i8 + 1);
results[2] = f32 / i16;
results[3] = f32 / (i16 + 1);
results[4] = f32 / i32;
results[5] = f32 / (i32 + 1);
results[6] = f32 / i64;
results[7] = f32 / (i64 + 1);
}
Input:
float_division(1,1,1,1,2.0,results)
Expected result:
{2,1,2,1,2,1,2,1}
Actual result (using the MLIR builder):
{2,1,2,inf,2,1,2,1}
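For reference, here is a minimal sketch of a harness for the C version of the test case. `float_division` has the same body as the code above. `first_mismatch` is a hypothetical helper introduced only for illustration; it is not part of our test suite.

```c
#include <stdint.h>

/* Same body as the C test case above. In C, each integer operand is promoted
   and converted to float before the division. */
void float_division(int8_t i8, int16_t i16, int32_t i32, int64_t i64,
                    float f32, float *results) {
    results[0] = f32 / i8;
    results[1] = f32 / (i8 + 1);
    results[2] = f32 / i16;
    results[3] = f32 / (i16 + 1);
    results[4] = f32 / i32;
    results[5] = f32 / (i32 + 1);
    results[6] = f32 / i64;
    results[7] = f32 / (i64 + 1);
}

/* Hypothetical helper: returns the index of the first lane that differs from
   the expected value, or -1 if all n lanes match. */
int first_mismatch(const float *actual, const float *expected, int n) {
    for (int i = 0; i < n; ++i) {
        if (!(actual[i] == expected[i])) {
            return i;
        }
    }
    return -1;
}
```

Calling `float_division(1, 1, 1, 1, 2.0f, results)` and comparing against the expected values with `first_mismatch` should report no mismatch for the Clang-compiled C version.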
Architecture:
see godbolt
Problem:
When switching from LLVM 20 to LLVM 21, one of our tests started to fail, returning incorrect results.
The godbolt link shows a boiled-down version.
In essence, the generated LLVM IR looks fine, but the generated assembly seems to work with and return poison values.
Details:
We have four signed integers (8-bit, 16-bit, 32-bit, and 64-bit) and a 32-bit floating-point value.
We divide the floating point value by each of the four integers and by each of the four integers + 1 (8 divisions overall).
The generated LLVM IR (not using Clang, but using the MLIR builder) essentially looks like this:
define void @faulty(i8 %0, i16 %1, i32 %2, i64 %3, float %4, ptr writeonly captures(none) initializes((0, 32)) %5) local_unnamed_addr #0 {
%7 = add i8 %0, 1
%8 = insertelement <2 x i8> poison, i8 %0, i64 0
%9 = insertelement <2 x i8> %8, i8 %7, i64 1
%10 = sitofp <2 x i8> %9 to <2 x float>
%11 = add i16 %1, 1
%12 = insertelement <2 x i16> poison, i16 %1, i64 0
%13 = insertelement <2 x i16> %12, i16 %11, i64 1
%14 = sitofp <2 x i16> %13 to <2 x float>
%15 = add i32 %2, 1
%16 = insertelement <2 x i32> poison, i32 %2, i64 0
%17 = insertelement <2 x i32> %16, i32 %15, i64 1
%18 = sitofp <2 x i32> %17 to <2 x float>
%19 = sitofp i64 %3 to float
%20 = add i64 %3, 1
%21 = sitofp i64 %20 to float
%22 = insertelement <8 x float> poison, float %4, i64 0
%23 = shufflevector <8 x float> %22, <8 x float> poison, <8 x i32> zeroinitializer
%24 = shufflevector <2 x float> %10, <2 x float> %14, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 poison, i32 poison, i32 poison, i32 poison>
%25 = shufflevector <2 x float> %18, <2 x float> poison, <8 x i32> <i32 0, i32 1, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison, i32 poison>
%26 = shufflevector <8 x float> %24, <8 x float> %25, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 poison, i32 poison>
%27 = insertelement <8 x float> %26, float %19, i64 6
%28 = insertelement <8 x float> %27, float %21, i64 7
%29 = fdiv <8 x float> %23, %28
store <8 x float> %29, ptr %5, align 4
ret void
}
As far as we can see, the LLVM IR looks fine.
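To cross-check the IR, its per-lane semantics can be written as scalar C. This is a sketch under our reading of the IR, and `ir_divisors` is a hypothetical name. Each `add` is performed in the narrow type and wraps because there is no `nsw` flag; `sitofp` then converts the signed value. Unlike the C test case, where `i8 + 1` is computed as `int`, the narrow add wraps for `i8 = 127`. The narrowing casts rely on the wrapping conversion that GCC and Clang implement.

```c
#include <stdint.h>

/* Hypothetical scalar model of the eight divisor lanes in the IR above. */
void ir_divisors(int8_t i8, int16_t i16, int32_t i32, int64_t i64,
                 float d[8]) {
    d[0] = (float)i8;
    d[1] = (float)(int8_t)(uint8_t)(i8 + 1);      /* add i8, wraps at 8 bits */
    d[2] = (float)i16;
    d[3] = (float)(int16_t)(uint16_t)(i16 + 1);   /* add i16, wraps at 16 bits */
    d[4] = (float)i32;
    d[5] = (float)(int32_t)((uint32_t)i32 + 1u);  /* add i32 */
    d[6] = (float)i64;
    d[7] = (float)(int64_t)((uint64_t)i64 + 1u);  /* add i64 */
}
```

For the input `(1, 1, 1, 1)`, this model gives the divisors `{1, 2, 1, 2, 1, 2, 1, 2}`, which matches the expected result when `2.0` is divided by each lane.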
However, the generated assembly code:
faulty:
lea eax, [rdi + 1]
vmovd xmm1, edi
vpinsrb xmm1, xmm1, eax, 1
vmovd xmm2, esi
inc esi
vpinsrw xmm2, xmm2, esi, 1
vmovd xmm3, edx
inc edx
vpinsrd xmm3, xmm3, edx, 1
vcvtsi2ss xmm4, xmm15, rcx
vpunpckldq xmm1, xmm1, xmm2
vcvtdq2ps xmm2, xmm3
inc rcx
vcvtsi2ss xmm3, xmm15, rcx
vbroadcastss ymm0, xmm0
vpmovsxbd ymm1, xmm1
vcvtdq2ps ymm1, ymm1
vinsertf128 ymm2, ymm1, xmm2, 1
vextractf128 xmm1, ymm1, 1
vpunpcklqdq ymm1, ymm2, ymm1
vbroadcastss ymm2, xmm4
vblendps ymm1, ymm1, ymm2, 64
vbroadcastss ymm2, xmm3
vblendps ymm1, ymm1, ymm2, 128
vdivps ymm0, ymm0, ymm1
vmovups ymmword ptr [r8], ymm0
vzeroupper
ret
seems to be faulty/problematic.
As far as we can see (observed while debugging) these instructions
vpmovsxbd ymm1, xmm1
vcvtdq2ps ymm1, ymm1
can load poison values (garbage) into the upper lanes of the vector register and then convert these poison values to floats.
The subsequent instructions use these poisoned floats, making them part of the result.
The godbolt reproducer demonstrates this behavior, causing inf to become part of the result.
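One possible reading of the assembly can be sketched as a scalar C model. This is our own interpretation of the listed instructions, not a confirmed diagnosis, and `asm_divisors` is a hypothetical name. The model assumes that `vpunpckldq` places the two 16-bit values directly behind the two 8-bit values, and that `vpmovsxbd` then sign-extends every byte, so the i16 lanes are widened byte by byte instead of word by word.

```c
#include <stdint.h>

/* Hypothetical model of the divisor lanes as the listed assembly appears to
   compute them. Narrowing casts rely on the wrapping conversion that GCC and
   Clang implement. */
void asm_divisors(int8_t i8, int16_t i16, int32_t i32, int64_t i64,
                  float d[8]) {
    uint8_t bytes[8] = {0};
    uint16_t w0 = (uint16_t)i16;
    uint16_t w1 = (uint16_t)(i16 + 1);
    float cvt[8];

    /* vmovd + vpinsrb: bytes 0 and 1 hold i8 and i8 + 1
       (bytes 2 and 3 are not used by any later lane). */
    bytes[0] = (uint8_t)i8;
    bytes[1] = (uint8_t)(i8 + 1);

    /* vmovd + vpinsrw + vpunpckldq: the i16 pair lands in bytes 4..7. */
    bytes[4] = (uint8_t)(w0 & 0xff);
    bytes[5] = (uint8_t)(w0 >> 8);
    bytes[6] = (uint8_t)(w1 & 0xff);
    bytes[7] = (uint8_t)(w1 >> 8);

    /* vpmovsxbd + vcvtdq2ps: sign-extend each BYTE, then convert to float. */
    for (int i = 0; i < 8; ++i) {
        cvt[i] = (float)(int8_t)bytes[i];
    }

    /* vinsertf128 / vextractf128 / vpunpcklqdq: assemble the final lanes. */
    d[0] = cvt[0];
    d[1] = cvt[1];
    d[2] = cvt[4];                                /* low byte of i16 */
    d[3] = cvt[5];                                /* high byte of i16 */
    d[4] = (float)i32;
    d[5] = (float)(int32_t)((uint32_t)i32 + 1u);
    /* vcvtsi2ss + vblendps: the i64 lanes. */
    d[6] = (float)i64;
    d[7] = (float)(int64_t)((uint64_t)i64 + 1u);
}
```

Under these assumptions, the input `(1, 1, 1, 1)` gives a divisor of `0` in lane 3 (the high byte of `i16`), so dividing `2.0` by that lane yields `inf`. If this reading is right, the wrong lanes are deterministic bytes of the i16 operands rather than arbitrary garbage.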