Description
As a result of a micro-optimization, we have the Rust compiler generating several different variations of an assertion that a particular 32-bit floating-point value is zero, in the hope that LLVM can optimize away a multiplication by that variable. Unfortunately, despite trying several different ways of "informing" LLVM that both the sign and the bit pattern of the f32 variable match those of +0.0, it does not seem to be able to perform this optimization.
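Reduced to its essence, the pattern in question looks like the following (a minimal hand-written sketch with an illustrative function name, not one of the attached test cases): if %y is asserted to be exactly +0.0, the multiplication ought to fold away entirely.
declare i1 @llvm.is.fpclass.f32(float, i32)
declare void @llvm.assume(i1)

define float @minimal(float %y) {
start:
  ; fpclass mask 64 tests for positive zero, so %y is asserted to be exactly +0.0
  %cond = tail call i1 @llvm.is.fpclass.f32(float %y, i32 64)
  tail call void @llvm.assume(i1 %cond)
  ; +0.0 * 8.0 == +0.0, so this multiply should fold and the function should return +0.0
  %prod = fmul float %y, 8.000000e+00
  ret float %prod
}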
Test case: +0.0 bitpattern is directly asserted via intrinsic
define noundef float @assert_bitpattern(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
%cond = tail call i1 @llvm.is.fpclass.f32(float %y, i32 64)
tail call void @llvm.assume(i1 %cond)
%_7 = fmul float %x, 9.000000e+00
%_8 = fmul float %y, 8.000000e+00
%_6 = fadd float %_7, %_8
%_0 = fadd float %z, %_6
ret float %_0
}

Result: no different from when @llvm.assume() isn't called; both multiplications are preserved, and the result of the multiplication by zero is still added into the summation:
.LCPI0_0:
.long 0x41100000 # float 9
.LCPI0_1:
.long 0x41000000 # float 8
bitcast: # @bitcast
mulss xmm0, dword ptr [rip + .LCPI0_0]
mulss xmm1, dword ptr [rip + .LCPI0_1]
addss xmm0, xmm1
addss xmm0, xmm2
xorps xmm1, xmm1
addss xmm0, xmm1
ret

Test case: assert the bitpattern via type punning
define noundef float @bitcast(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
%y_as_i32 = bitcast float %y to i32
%cond = icmp eq i32 %y_as_i32, 0
tail call void @llvm.assume(i1 %cond)
%_7 = fmul float %x, 9.000000e+00
%_8 = fmul float %y, 8.000000e+00
%_6 = fadd float %_7, %_8
%_5 = fadd float %z, %_6
%_0 = fadd float %_5, 0.000000e+00
ret float %_0
}

Result: same as when asserting the bitpattern via @llvm.is.fpclass.f32(float %y, i32 64).
Test case: assert only that the magnitude is zero (fcmp oeq float ..., 0.000000e+00), while performing an operation whose result does not change regardless of whether the variable is specifically +0.0 or -0.0 (because +0.0 is added at the end of the f32 summation)
define noundef float @sign_irrelevant(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
%cond = fcmp oeq float %y, 0.000000e+00
tail call void @llvm.assume(i1 %cond)
%_7 = fmul float %x, 9.000000e+00
%_8 = fmul float %y, 8.000000e+00
%_6 = fadd float %_7, %_8
%_5 = fadd float %z, %_6
%_0 = fadd float %_5, 0.000000e+00
ret float %_0
}

Result: no different from when @llvm.assume() isn't called. (Regardless of whether or not the final fadd float %_5, 0.000000e+00 is optimized away, I would expect the fmul float %y, ... to be elided.)
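To spell out why eliding the fmul should be legal here even though fcmp oeq does not pin down the sign of %y, this is the hand-simplified function I would expect (a sketch of the reasoning with an illustrative name, not optimizer output):
define noundef float @sign_irrelevant_by_hand(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
  ; The assume only guarantees %y is +0.0 or -0.0, so %_8 = %y * 8.0 is +/-0.0.
  ; If %_8 is -0.0, `fadd %_7, -0.0` returns %_7 for every input, so dropping it
  ; changes nothing. If %_8 is +0.0, dropping it matters only when %_7 is -0.0
  ; (and the difference survives only when %z is also -0.0), where it can turn a
  ; +0.0 intermediate into -0.0; the trailing `fadd ..., 0.0` maps both zeros to
  ; +0.0, so the returned value is identical either way.
  %_7 = fmul float %x, 9.000000e+00
  %_5 = fadd float %z, %_7
  %_0 = fadd float %_5, 0.000000e+00
  ret float %_0
}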
Test case: assert that the compiler is capable of at least folding away the operation under any circumstance
define noundef float @folded_away(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
%y_ = fadd float 0.000000e+00, 0.000000e+00
%_7 = fmul float %x, 9.000000e+00
%_8 = fmul float %y_, 8.000000e+00
%_6 = fadd float %_7, %_8
%_5 = fadd float %z, %_6
%_0 = fadd float %_5, 0.000000e+00
ret float %_0
}

Result: here we finally observe the compiler optimizing away the multiplication and subsequent addition:
.LCPI3_0:
.long 0x41100000 # float 9
folded_away: # @folded_away
mulss xmm0, dword ptr [rip + .LCPI3_0]
xorps xmm1, xmm1
addss xmm0, xmm1
addss xmm0, xmm2
addss xmm0, xmm1
ret

This is the assembly I would have expected ~all of the test cases above to generate.
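At the IR level, the output I would have hoped for from the first three test cases looks roughly like this (a hand-written sketch under the assumption that %y is treated as +0.0, with an illustrative name; not actual compiler output):
define noundef float @hoped_for(float noundef %x, float noundef %y, float noundef %z) unnamed_addr {
start:
  ; %y is known to be +0.0, so the `fmul float %y, 8.0` term folds to +0.0
  %_7 = fmul float %x, 9.000000e+00
  %_6 = fadd float %_7, 0.000000e+00
  %_5 = fadd float %z, %_6
  %_0 = fadd float %_5, 0.000000e+00
  ret float %_0
}
(The remaining additions of +0.0 are not removable without nsz/fast-math, which is why the xorps / addss-with-zero instructions survive in the expected assembly above.)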
LLVM version: trunk as well as 21.1.0 and earlier versions
Target architecture: x86_64
Command line flags: -O3
Godbolt link: https://llvm.godbolt.org/z/7W95Ehsrd