Description
A customer of ours reported an inconvenience they hit due to the way we handle NaNs. They reported cases where at low optimization, expressions like:
0.0f * std::numeric_limits<float>::infinity()
produced 0xffc00000 (for succinctness, say -nan), whereas with optimization, the result was 0x7fc00000 (say, +nan). The IEEE standard (IEEE Std 754™-2008) doesn't require any specific NaN representation, so both are legal. And hence by definition, this isn't a bug.
That said, it's a problem/annoyance for them, in that they intentionally initialize some floating-point values to a NaN by multiplying +0 times +inf. At low optimization, this results in 0xffc00000. But with optimization, it produces 0x7fc00000 (because of compile-time folding). They have testing tools that do bit-wise comparisons of results, and they want those bit-wise comparisons to match across optimization levels.
I initially thought that we folded the product of +0 times +inf to +nan because we folded in an "intuitively sensible" way (in that we produced a NaN with the sign bit set according to the XOR of the sign bits of the factors). And I expected that -0 times +inf (or +0 times -inf) would fold to -nan. (And I thought the hardware behaved "strangely", in that it always produced a NaN with the sign-bit set -- I tried a handful of x86_64 targets, and they all produced the same negative NaN when the multiplication was done at run-time, rather than folded at compile-time; although I admit that I cannot find an x86_64 hardware spec that asserts the multiplication will do that.) But on experimenting, I found that when we fold these products at compile-time, we always fold them to +nan, irrespective of the sign bits of the factors.
In short, when optimization is enabled and so the following expressions are folded at compile-time:
0.0f * std::numeric_limits<float>::infinity()
(-0.0f) * std::numeric_limits<float>::infinity()
0.0f * (-std::numeric_limits<float>::infinity())
(-0.0f) * (-std::numeric_limits<float>::infinity())
they all produce 0x7fc00000 (+nan). But if they are computed at run-time, they all produce 0xffc00000 (-nan).
If in this folding we produced a NaN whose sign bit is the XOR of the sign bits of the factors (that "intuitive" way), I was going to suggest that the customer work around the issue by initializing their values with the expression -0 * +inf: whether folded to a constant at compile-time or computed by multiplying the factors at run-time, that expression would produce -nan. But since we always fold to +nan, that idea doesn't work.
Here is a test showing the Clang (trunk) result:
https://godbolt.org/z/Es584o3eK
In that test-case, the compile-time computed results (that is, when compile-time folding is done) are all +nan, and the run-time results are all -nan.
As an experiment, I tried the same test-case with the Microsoft compiler (Version 19.29.30146 for x64), and it produced -nan for all the products (folded at compile-time, or computed at run-time). This is the case with and without optimization (/Od and /O2).
FTR, I also tried GCC, and different versions had different behavior. So there isn't much of a model that I can derive from that.
In summary, we could:
1. Do nothing (leaving this "inconsistent" behavior between values computed at compile-time vs. at run-time, and hence often different behavior at different optimization levels).
2. Change our folding of these sorts of cases to produce a -nan (mimicking the Microsoft behavior, and being consistent across optimization levels, at least for hardware that produces -nan for these products).
3. Change our folding to make the sign-bit of the NaN be the XOR of the sign bits of the two factors (creating the opportunity to write code that folds at compile-time in a way that matches the hardware behavior, and hence is handled consistently across optimization levels; but this doesn't mimic the Microsoft behavior).
What do people think? I lean toward option 2, although I can see an argument for option 3 (especially if other hardware always produces a +nan for cases where the product is computed at run-time). There's also an argument for option 1, given that the IEEE Standard doesn't specify the behavior (and so users cannot safely write code that depends on a particular behavior).