Description
A customer of ours reported an inconvenience they hit due to the way we handle NaNs. They reported cases where at low optimization, expressions like:
0.0f * std::numeric_limits<float>::infinity()
produced 0xffc00000 (for succinctness, say -nan), whereas with optimization, the result was 0x7fc00000 (say, +nan). The IEEE standard (IEEE Std 754™-2008) doesn't require any specific NaN representation, so both are legal. And hence by definition, this isn't a bug.
That said, it's a problem/annoyance for them, in that they intentionally initialize some floating-point values to a NaN by multiplying +0 times +inf. At low optimization, this results in 0xffc00000. But with optimization, it produces 0x7fc00000 (because of compile-time folding). They have testing tools that do bit-wise comparisons of results, and they want those bit-wise comparisons to match across optimization levels.
I initially thought that we folded the product of +0 times +inf to +nan because we folded in an "intuitively sensible" way (in that we produced a NaN with the sign bit set according to the XOR of the sign bits of the factors). And I expected that -0 times +inf (or +0 times -inf) would fold to -nan. (And I thought the hardware behaved "strangely", in that it always produced a NaN with the sign-bit set -- I tried a handful of x86_64 targets, and they all produced the same negative NaN when the multiplication was done at run-time, rather than folded at compile-time; although I admit that I cannot find an x86_64 hardware spec that asserts the multiplication will do that.) But on experimenting, I found that when we fold these products at compile-time, we always fold them to +nan, irrespective of the sign bits of the factors.
In short, when optimization is enabled and so the following expressions are folded at compile-time:
0.0f * std::numeric_limits<float>::infinity()
(-0.0f) * std::numeric_limits<float>::infinity()
0.0f * (-std::numeric_limits<float>::infinity())
(-0.0f) * (-std::numeric_limits<float>::infinity())
they all produce 0x7fc00000 (+nan). But if they are computed at run-time, they all produce 0xffc00000 (-nan).
If in this folding we produced a NaN whose sign bit is the XOR of the sign bits of the factors (that "intuitive" way), I was going to suggest that the customer work around the issue by initializing their values with the expression -0 * +inf: whether folded to a constant at compile-time or computed by multiplying the factors at run-time, that expression would produce -nan. But since we always fold to +nan, that idea doesn't work.
Here is a test showing the Clang (trunk) result:
https://godbolt.org/z/Es584o3eK
In that test-case, the compile-time computed results (that is, when compile-time folding is done) are all +nan, and the run-time results are all -nan.
As an experiment, I tried the same test-case with the Microsoft compiler (Version 19.29.30146 for x64), and it produced -nan for all the products (folded at compile-time, or computed at run-time). This is the case with and without optimization (/Od and /O2).
FTR, I also tried GCC, and different versions had different behavior. So there isn't much of a model that I can derive from that.
In summary, we could:
1. Do nothing (leaving this "inconsistent" behavior between values computed at compile-time vs. at run-time, and hence often different behavior at different optimization levels).
2. Change our folding of these sorts of cases to produce a -nan (mimicking the Microsoft behavior, and being consistent across optimization levels, at least for hardware that produces -nan for these products).
3. Change our folding to make the sign-bit of the NaN be the XOR of the sign bits of the two factors (creating the opportunity to write code that folds at compile-time in a way that matches the hardware behavior, and hence is handled consistently across optimization levels; but this doesn't mimic the Microsoft behavior).
What do people think? I lean toward option 2, although I can see an argument for option 3 (especially if other hardware always produces a +nan for cases where the product is computed at run-time). There's also an argument for option 1, given that the IEEE Standard doesn't specify the behavior (and so users cannot safely write code that depends on a particular behavior).