
Change in floating-point behavior in clang 14 #54927

Closed
glandium opened this issue Apr 15, 2022 · 12 comments
Labels
llvm:optimizations question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@glandium
Contributor

I'm not sure this qualifies as a problem in itself, but I'll lay it out here; hopefully someone who knows better can say whether something needs to change.

The following is reduced from fdlibm (https://github.com/freebsd/freebsd/tree/master/lib/msun/src), which Firefox uses for reduced-fingerprint trigonometric functions in JS, along with how the function is called in the context of a failing test.

#include <stdio.h>

static const double
half =  5.00000000000000000000e-01, /* 0x3FE00000, 0x00000000 */
S1  = -1.66666666666666324348e-01, /* 0xBFC55555, 0x55555549 */
S2  =  8.33333333332248946124e-03, /* 0x3F811111, 0x1110F8A6 */
S3  = -1.98412698298579493134e-04, /* 0xBF2A01A0, 0x19C161D5 */
S4  =  2.75573137070700676789e-06, /* 0x3EC71DE3, 0x57B1FE7D */
S5  = -2.50507602534068634195e-08, /* 0xBE5AE5E6, 0x8A2B9CEB */
S6  =  1.58969099521155010221e-10; /* 0x3DE5D93A, 0x5ACFD57C */

double
__kernel_sin(double x, double y, int iy)
{
        double z,r,v,w;

        z       =  x*x;
        w       =  z*z;
        r       =  S2+z*(S3+z*S4) + z*w*(S5+z*S6);
        v       =  z*x;
        if(iy==0) return x+v*(S1+z*r);
        else      return x-((z*(half*y-v*r)-y)-v*S1);
}

int main() {
        printf("%.20f\n", __kernel_sin(0.41892385060478199, 4.0204527408880763E-18, 1));
        return 0;
}

On x86 and x86_64, this prints 0.40677759702517241047.
On aarch64 with clang 13, this prints the same thing.
On aarch64 with clang 14, this prints 0.40677759702517235496.

Interestingly, things change when clang is allowed to optimize the function call, in which case it uses a constant with a value that doesn't match what the actual function call returns. That is, on x86_64, it prints 0.40677759702517235496 at -O3 (which uses a constant), but prints 0.40677759702517241047 when __attribute__((noinline)) is added to the function definition.

The behavior changed with f04e387 (and thus can be changed with -ffp-contract=fast).

@davidbolvansky
Collaborator

@fhahn
Contributor

fhahn commented Apr 15, 2022

As mentioned by @davidbolvansky, this is an expected change in behavior and should be in line with the C/C++ spec.

I am going to close this issue, but please feel free to re-open it if you believe there is a bug or problem in LLVM/Clang.

@fhahn fhahn closed this as completed Apr 15, 2022
@EugeneZelenko EugeneZelenko added the question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! label Apr 15, 2022
@glandium
Contributor Author

Is constant folding not giving the same result a problem or not? Shouldn't one expect precise mode to return the same value as before, which is consistent with other platforms, if source ordering and rounding properties are preserved?

@fhahn
Contributor

fhahn commented Apr 19, 2022

Is constant folding not giving the same result a problem or not? Shouldn't one expect precise mode to return the same value as before, which is consistent with other platforms, if source ordering and rounding properties are preserved?

I cannot comment on the X86 case, but for AArch64 we will be using fmadd, hence the difference and that is somewhat expected.

Constant folding of the fmuladd intrinsic perhaps doesn't perform the intermediate rounding step. Perhaps it could/should?

cc @andykaylor #50688

@andykaylor
Contributor

I see the same behavior as described for AArch64 when compiling for x86 at -O2: https://godbolt.org/z/3K4nGvfqn

Does this value change happen for AArch64 at O0?

The C language standard allows contracting of floating point operations, with the understanding that the different rounding behavior of a contracted expression will yield slightly different numeric results. The problem is that the compiler isn't required to contract the expression, so it might contract it in one place and not in another. I don't think this happens in practice with FP_CONTRACT=on (the default now) because the front end will generate an llvm.fmuladd intrinsic and that shouldn't get mixed with other operations.

One thing here that we probably should consider a bug is that the constant folder is using APFloat::fusedMultiplyAdd() without checking to see if the target architecture supports FMA. The APFloat::fusedMultiplyAdd() will always evaluate the expression with excess precision and perform just a single rounding step on the result. However, the llvm.fmuladd() operation gets lowered to separate multiply and add operations with intermediate rounding when the subtarget doesn't support FMA, and in the case of X86 the default subtarget does not support FMA.

@glandium
Contributor Author

glandium commented Apr 20, 2022

Note: GCC doesn't care about constant folding returning a different value. I don't know whether it's considered a bug there, though.

@glandium
Contributor Author

Also note: it's weird that -ffp-contract=fast does not enable FMA in this code.

@glandium
Contributor Author

There's also something going on with constant folding on x86-64: https://godbolt.org/z/W65MbzexP

@davidbolvansky
Collaborator

Note: GCC doesn't care about constant folding returning a different value. I don't know whether it's considered a bug there, though.

Please report it to the GCC devs. If they say it's not a bug, then LLVM should follow them.

@andykaylor
Contributor

There's also something going on with constant folding on x86-64: https://godbolt.org/z/W65MbzexP

This can happen with FMA because the zero result in the non-FMA computation is an artifact of intermediate rounding. I didn't work through the details in your example, but I see the same thing if I replace the constants with values that make the calculation (1.0 - 10.0*0.1). You look at that and think it should return 0.0, right? And it does without FMA. But 0.1 can't be represented exactly, so 10.0*0.1 doesn't evaluate to exactly 1.0 at theoretically infinite precision, and the FMA result, which rounds only the final value, reflects the error in the representation of 0.1. Technically it's more accurate given the inputs the instruction received. It's just not what you'd expect.

https://godbolt.org/z/YaG4d4MYf

As for the FMA not happening with -ffp-contract=fast, that's an unfortunate byproduct of the way fp-contract=fast vs. fp-contract=on is represented in the IR. With fp-contract=on, we only want to allow FMA if all the terms occur in a single expression (otherwise, we'd be violating language standards) so the front end has to handle it by generating the llvm.fmuladd() intrinsic for the expression. With fp-contract=fast, we allow FMA anywhere the optimizer can bring together a set of operations with the 'contract' fast-math flag set. If the IR isn't constant folded, the backend would generate an FMA instruction, but the constant folder doesn't try to form FMA where the llvm.fmuladd() isn't used. You can see that happening here:

https://godbolt.org/z/TGox1vMTe

This is reasonable, because with fp-contract=fast there might be multiple terms combined depending on the surrounding operations, and so there is no guarantee of consistent numeric results.

@BrushXue

BrushXue commented Aug 5, 2023

For everyone doing scientific computing who finally found this issue: I'm sure you'll want to say the F word to the person who initiated this change.

@BrushXue

BrushXue commented Aug 5, 2023

Generally, FMA is more accurate, unless you hit certain extreme cases. However, in scientific computing we deal with ill-conditioned numbers every day. This can easily lead to catastrophic bugs that are nearly impossible to debug (that's why -O0 exists).
