Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"float_control(precise, on)" no longer has an effect if a function is inlined #82857

Closed
ormris opened this issue Feb 24, 2024 · 2 comments · Fixed by #88589
Closed

"float_control(precise, on)" no longer has an effect if a function is inlined #82857

ormris opened this issue Feb 24, 2024 · 2 comments · Fixed by #88589

Comments

@ormris
Copy link
Collaborator

ormris commented Feb 24, 2024

The float_control(precise, on) no longer has an effect if a function is inlined. In the example below, adding the "noinline" attribute to a function changes the instruction used in the division (vdivss vs vrcpps).

$ cat inline.cpp
#pragma float_control(push)
#pragma float_control(precise, on)
float e(float a, float b)
{
    return a / b;
}
#pragma float_control(pop)

struct c { float a, b, c, d; };

__attribute__((noinline)) c
b(float X, float Y)
{
    return c {
        2.0f * e(0.0f, X) - 1.0f,
        1.0f - 2.0f * e(3.0f, Y),
        2.0f * e(4.0f, X) - 1.0f,
        1.0f - 2.0f * e(5.0f, Y)
    };
}

int main()
{
    c f = b(1.0f, 2.0f);
    return (int) f.a + f.b + f.c + f.d;
}
$ diff -u -d inline.cpp noinline.cpp
--- inline.cpp  2024-02-23 17:26:53.041466900 -0800
+++ noinline.cpp        2024-02-23 17:27:16.320344400 -0800
@@ -1,5 +1,6 @@
 #pragma float_control(push)
 #pragma float_control(precise, on)
+__attribute__((noinline))
 float e(float a, float b)
 {
     return a / b;
$ clang -S -O3 -ffast-math inline.cpp
$ clang -S -O3 -ffast-math noinline.cpp
$ cat inline.s
...
_Z1bff:                                 # @_Z1bff
...
	vmovss	.LCPI1_2(%rip), %xmm3           # xmm3 = [-1.0E+0,0.0E+0,0.0E+0,0.0E+0]
	vdivss	%xmm1, %xmm2, %xmm2
	vrcpps	%xmm0, %xmm1 # <----------
	vaddss	.LCPI1_1(%rip), %xmm2, %xmm2
	vmulps	%xmm4, %xmm1, %xmm5
...
$ cat noinline.s
...
_Z1eff:                                 # @_Z1eff
	.cfi_startproc
# %bb.0:                                # %entry
	vdivss	%xmm1, %xmm0, %xmm0 # <----------
	retq
...
$
@ormris ormris added clang:frontend Language frontend issues, e.g. anything involving "Sema" floating-point Floating-point math labels Feb 24, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Feb 24, 2024

@llvm/issue-subscribers-clang-frontend

Author: Matthew Voss (ormris)

The `float_control(precise, on)` no longer has an effect if a function is inlined. In the example below, adding the "noinline" attribute to a function changes the instruction used in the division (vdivss vs vrcpps).
$ cat inline.cpp
#pragma float_control(push)
#pragma float_control(precise, on)
float e(float a, float b)
{
    return a / b;
}
#pragma float_control(pop)

struct c { float a, b, c, d; };

__attribute__((noinline)) c
b(float X, float Y)
{
    return c {
        2.0f * e(0.0f, X) - 1.0f,
        1.0f - 2.0f * e(3.0f, Y),
        2.0f * e(4.0f, X) - 1.0f,
        1.0f - 2.0f * e(5.0f, Y)
    };
}

int main()
{
    c f = b(1.0f, 2.0f);
    return (int) f.a + f.b + f.c + f.d;
}
$ diff -u -d inline.cpp noinline.cpp
--- inline.cpp  2024-02-23 17:26:53.041466900 -0800
+++ noinline.cpp        2024-02-23 17:27:16.320344400 -0800
@@ -1,5 +1,6 @@
 #pragma float_control(push)
 #pragma float_control(precise, on)
+__attribute__((noinline))
 float e(float a, float b)
 {
     return a / b;
$ clang -S -O3 -ffast-math inline.cpp
$ clang -S -O3 -ffast-math noinline.cpp
$ cat inline.s
...
_Z1bff:                                 # @<!-- -->_Z1bff
...
	vmovss	.LCPI1_2(%rip), %xmm3           # xmm3 = [-1.0E+0,0.0E+0,0.0E+0,0.0E+0]
	vdivss	%xmm1, %xmm2, %xmm2
	vrcpps	%xmm0, %xmm1 # &lt;----------
	vaddss	.LCPI1_1(%rip), %xmm2, %xmm2
	vmulps	%xmm4, %xmm1, %xmm5
...
$ cat noinline.s
...
_Z1eff:                                 # @<!-- -->_Z1eff
	.cfi_startproc
# %bb.0:                                # %entry
	vdivss	%xmm1, %xmm0, %xmm0 # &lt;----------
	retq
...
$

@andykaylor
Copy link
Contributor

andykaylor commented Apr 12, 2024

The inlining is not the real culprit here, of course. That just brings instructions together to expose a problem that's happening with InstCombine not respecting the fast-math flags (or lack thereof) on all instructions it's transforming. You can reproduce the issue with this:

define float @test(float %a) {
entry:
  %div = fdiv float 5.0, %a
  %mul = fmul fast float %div, 2.0 
  ret float %mul
}

https://godbolt.org/z/avxMbefjv

InstCombine is folding (C1 / X) * C --> (C * C1) / X and transferring the fast math flags from the fmul to the resulting fdiv. Unfortunately, it is doing this without checking the fast-math flags on the original fdiv operation. In fact, the only thing it's checking for is reassoc on the fmul. This is happening in InstCombinerImpl::foldFMulReassoc().

My smaller reproducer doesn't change the instruction generated, but that's because we won't do the reciprocal division approximation unless we can vectorize it. The root cause is the same though. The reciprocal approximation exposes a nasty issue here though, because if the input value is an infinity the reciprocal approximation will produce NaN.

https://godbolt.org/z/rexsfvEqd

andykaylor added a commit that referenced this issue Apr 16, 2024
This change updates a few of the transformations in foldFMulReassoc to
respect absent fast-math flags in cases where fmul and fdiv, fadd, or fsub
instructions were being folded but the code was only checking for
fast-math flags on the fmul instruction and was transferring flags to
the folded instruction that were not present on the other original 
instructions.

This fixes #82857
@EugeneZelenko EugeneZelenko added llvm:instcombine llvm:ir and removed clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 16, 2024
qihangkong pushed a commit to rvgpu/llvm that referenced this issue Apr 18, 2024
This change updates a few of the transformations in foldFMulReassoc to
respect absent fast-math flags in a few cases where fmul and fdiv
instructions were being folded but the code was not checking for
fast-math flags on the fdiv instruction and was transferring flags to
the folded instruction that were not present on the original fdiv
instruction.

This fixes llvm/llvm-project#82857
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants