
Change in floating-point behavior in clang 14 #54927

Closed
glandium opened this issue Apr 15, 2022 · 12 comments
Labels
llvm:optimizations question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

Comments

@glandium
Contributor

I'm not sure this qualifies as a problem in itself, but I'll lay it out here; hopefully someone who knows better can say whether something needs to change.

The following is reduced from fdlibm (https://github.com/freebsd/freebsd/tree/master/lib/msun/src), which Firefox uses for reduced-fingerprint trigonometric functions in JS, along with how the function is called in the context of a failing test.

#include <stdio.h>

static const double
half =  5.00000000000000000000e-01, /* 0x3FE00000, 0x00000000 */
S1  = -1.66666666666666324348e-01, /* 0xBFC55555, 0x55555549 */
S2  =  8.33333333332248946124e-03, /* 0x3F811111, 0x1110F8A6 */
S3  = -1.98412698298579493134e-04, /* 0xBF2A01A0, 0x19C161D5 */
S4  =  2.75573137070700676789e-06, /* 0x3EC71DE3, 0x57B1FE7D */
S5  = -2.50507602534068634195e-08, /* 0xBE5AE5E6, 0x8A2B9CEB */
S6  =  1.58969099521155010221e-10; /* 0x3DE5D93A, 0x5ACFD57C */

double
__kernel_sin(double x, double y, int iy)
{
        double z,r,v,w;

        z       =  x*x;
        w       =  z*z;
        r       =  S2+z*(S3+z*S4) + z*w*(S5+z*S6);
        v       =  z*x;
        if(iy==0) return x+v*(S1+z*r);
        else      return x-((z*(half*y-v*r)-y)-v*S1);
}

int main() {
        printf("%.20f\n", __kernel_sin(0.41892385060478199, 4.0204527408880763E-18, 1));
        return 0;
}

On x86 and x86_64, this prints 0.40677759702517241047.
On aarch64 with clang 13, this prints the same thing.
On aarch64 with clang 14, this prints 0.40677759702517235496.

Interestingly, things change when clang is allowed to optimize the function call, in which case it uses a constant with a value that doesn't match what the actual function call returns. That is, on x86_64, it prints 0.40677759702517235496 at -O3 (which uses a constant), but prints 0.40677759702517241047 when __attribute__((noinline)) is added to the function definition.

The behavior changed with f04e387 (and thus can be changed with -ffp-contract=fast).

@davidbolvansky
Collaborator

@fhahn
Contributor

fhahn commented Apr 15, 2022

As mentioned by @davidbolvansky, this is an expected change in behavior and should be in line with the C/C++ spec.

I am going to close this issue, but please feel free to re-open it if you believe there is a bug or problem in LLVM/Clang.

@fhahn fhahn closed this as completed Apr 15, 2022
@EugeneZelenko EugeneZelenko added the question A question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead! label Apr 15, 2022
@glandium
Contributor Author

Is constant folding not giving the same result a problem or not? Shouldn't one expect precise mode to return the same value as before, which is consistent with other platforms, if source ordering and rounding properties are preserved?

@fhahn
Contributor

fhahn commented Apr 19, 2022

Is constant folding not giving the same result a problem or not? Shouldn't one expect precise mode to return the same value as before, which is consistent with other platforms, if source ordering and rounding properties are preserved?

I cannot comment on the X86 case, but for AArch64 we will be using fmadd, hence the difference and that is somewhat expected.

Constant folding of the fmuladd intrinsic perhaps doesn't perform the intermediate rounding step. Perhaps it could/should?

cc @andykaylor #50688

@andykaylor
Contributor

I see the same behavior as described for AArch64 when compiling for x86 at -O2: https://godbolt.org/z/3K4nGvfqn

Does this value change happen for AArch64 at O0?

The C language standard allows contracting of floating point operations, with the understanding that the different rounding behavior of a contracted expression will yield slightly different numeric results. The problem is that the compiler isn't required to contract the expression, so it might contract it in one place and not in another. I don't think this happens in practice with FP_CONTRACT=on (the default now) because the front end will generate an llvm.fmuladd intrinsic and that shouldn't get mixed with other operations.

One thing here that we probably should consider a bug is that the constant folder is using APFloat::fusedMultiplyAdd() without checking to see if the target architecture supports FMA. The APFloat::fusedMultiplyAdd() will always evaluate the expression with excess precision and perform just a single rounding step on the result. However, the llvm.fmuladd() operation gets lowered to separate multiply and add operations with intermediate rounding when the subtarget doesn't support FMA, and in the case of X86 the default subtarget does not support FMA.

@glandium
Contributor Author

glandium commented Apr 20, 2022

Note: GCC doesn't care about constant folding returning a different value. I don't know whether it's considered a bug there, though.

@glandium
Contributor Author

Also note: it's weird that -ffp-contract=fast does not enable FMA in this code.

@glandium
Contributor Author

There's also something going on with constant folding on x86-64: https://godbolt.org/z/W65MbzexP

@davidbolvansky
Collaborator

Note: GCC doesn't care about constant folding returning a different value. I don't know whether it's considered a bug there, though.

Please report it to the GCC devs. If they say it's not a bug, then LLVM should follow them.

@andykaylor
Contributor

There's also something going on with constant folding on x86-64: https://godbolt.org/z/W65MbzexP

This can happen with FMA because the zero result in the non-FMA computation is an artifact of intermediate rounding. I didn't work through the details in your example, but I see the same thing if I replace the constants with values that make the calculation (1.0 - 10.0*0.1). You look at that and think it should return 0.0, right? And it does without FMA. But 0.1 can't be represented exactly, so 10.0*0.1 doesn't evaluate to exactly 1.0 at theoretically infinite precision, and the FMA result, which rounds only the final value, reflects the error in the representation of 0.1. Technically it's more accurate given the inputs the instruction received. It's just not what you'd expect.

https://godbolt.org/z/YaG4d4MYf

As for the FMA not happening with -ffp-contract=fast, that's an unfortunate byproduct of the way fp-contract=fast vs. fp-contract=on is represented in the IR. With fp-contract=on, we only want to allow FMA if all the terms occur in a single expression (otherwise, we'd be violating language standards) so the front end has to handle it by generating the llvm.fmuladd() intrinsic for the expression. With fp-contract=fast, we allow FMA anywhere the optimizer can bring together a set of operations with the 'contract' fast-math flag set. If the IR isn't constant folded, the backend would generate an FMA instruction, but the constant folder doesn't try to form FMA where the llvm.fmuladd() isn't used. You can see that happening here:

https://godbolt.org/z/TGox1vMTe

This is reasonable, because with fp-contract=fast there might be multiple terms combined depending on the surrounding operations, and so there is no guarantee of consistent numeric results.

@BrushXue

BrushXue commented Aug 5, 2023

For everyone doing scientific computing who finally found this issue: I'm sure you'll want to say the F word to the person who initiated this change.

@BrushXue

BrushXue commented Aug 5, 2023

Generally, FMA is more accurate, unless you hit certain extreme cases. However, in scientific computing we deal with ill-conditioned numbers every day. This can easily lead to catastrophic bugs that are nearly impossible to debug (that's why -O0 exists).
