Change in floating-point behavior in clang 14 #54927
Check the clang release notes: https://releases.llvm.org/14.0.0/tools/clang/docs/ReleaseNotes.html#floating-point-support-in-clang
As mentioned by @davidbolvansky, this is an expected change in behavior and should be in line with the C/C++ spec. I am going to close the issue, but please feel free to re-open it if you believe there is a bug/problem in LLVM/Clang.
Is constant folding not giving the same result a problem or not? Shouldn't one expect precise mode to return the same value as before (consistent with other platforms), if source ordering and rounding properties are preserved?
I cannot comment on the X86 case, but for AArch64 we will be using FMA instructions. Constant folding perhaps doesn't avoid the rounding step.
I see the same behavior as described for AArch64 when compiling for x86 at -O2: https://godbolt.org/z/3K4nGvfqn Does this value change happen for AArch64 at -O0?

The C language standard allows contracting of floating-point operations, with the understanding that the different rounding behavior of a contracted expression will yield slightly different numeric results. The problem is that the compiler isn't required to contract the expression, so it might contract it in one place and not in another. I don't think this happens in practice with FP_CONTRACT=on (the default now), because the front end will generate an llvm.fmuladd intrinsic and that shouldn't get mixed with other operations.

One thing here that we probably should consider a bug is that the constant folder is using APFloat::fusedMultiplyAdd() without checking whether the target architecture supports FMA. APFloat::fusedMultiplyAdd() will always evaluate the expression with excess precision and perform just a single rounding step on the result. However, the llvm.fmuladd() operation gets lowered to separate multiply and add operations with intermediate rounding when the subtarget doesn't support FMA, and in the case of X86 the default subtarget does not support FMA.
Note: GCC doesn't care about constant folding returning a different value. I don't know whether it's considered a bug there, though. |
Also note: it's weird that the FMA doesn't happen with -ffp-contract=fast on x86-64.
There's also something going on with constant folding on x86-64: https://godbolt.org/z/W65MbzexP |
Report it to the GCC devs, please. If they say it's not a bug, then LLVM should follow them.
This can happen with FMA because the zero result in the non-FMA computation is an artifact of intermediate rounding. I didn't work through the details in your example, but I see the same thing if I replace the constants with values that make the calculation (1.0 - 10.0*0.1). You look at that and think it should return 0.0, right? And it does without FMA. But 0.1 can't be represented exactly, so 10.0*0.1 doesn't evaluate to exactly 1.0 at theoretically infinite precision, and the result with FMA (where only the final result is rounded) is an artifact of the error in the representation of 0.1. Technically, it's more accurate given the inputs that the instruction received. It's just not what you'd expect. https://godbolt.org/z/YaG4d4MYf

As for the FMA not happening with -ffp-contract=fast, that's an unfortunate byproduct of the way fp-contract=fast vs. fp-contract=on is represented in the IR. With fp-contract=on, we only want to allow FMA if all the terms occur in a single expression (otherwise, we'd be violating language standards), so the front end has to handle it by generating the llvm.fmuladd() intrinsic for the expression. With fp-contract=fast, we allow FMA anywhere the optimizer can bring together a set of operations with the 'contract' fast-math flag set. If the IR isn't constant folded, the backend would generate an FMA instruction, but the constant folder doesn't try to form FMA where llvm.fmuladd() isn't used. You can see that happening here: https://godbolt.org/z/TGox1vMTe This is reasonable, because with fp-contract=fast there might be multiple terms combined depending on the surrounding operations, and so there is no guarantee of consistent numeric results.
For everyone doing scientific computing who finally found this issue: I'm sure you will want to say the F word to the person who initiated this change.
Generally, FMA is more accurate, unless you hit certain extreme cases. However, in scientific computing we tackle ill-conditioned numbers every day. This can easily lead to catastrophic bugs that you won't be able to debug (that's why
I'm not sure this qualifies as a problem in itself, but I'll lay it down, hopefully someone who knows better will say whether something needs to be changed or not.
The following is reduced from fdlibm (https://github.com/freebsd/freebsd/tree/master/lib/msun/src), as used in Firefox for trigonometric functions with reduced fingerprint in JS, along with how the function is called within the context of a test that fails.
On x86 and x86_64, this prints `0.40677759702517241047`. On aarch64 with clang 13, this prints the same thing. On aarch64 with clang 14, this prints `0.40677759702517235496`.

Interestingly, things change when clang is allowed to optimize the function call, in which case it uses a constant with a value that doesn't match what the actual function call returns. That is, on x86_64, it prints `0.40677759702517235496` at -O3 (which uses a constant), but prints `0.40677759702517241047` when `__attribute__((noinline))` is added to the function definition.

The behavior changed with f04e387 (and thus can be changed with `-ffp-contract=fast`).