Instcombine incorrectly transforms store i64 -> store double #44497

Open
nunoplopes opened this issue Mar 9, 2020 · 14 comments
Labels
bugzilla miscompilation

Comments

nunoplopes (Member) commented Mar 9, 2020

Bugzilla Link 45152
Version trunk
OS All
CC @DMG862,@efriedma-quic,@ecnelises,@aqjune,@LebedevRI,@jfbastien,@RKSimon,@nikic,@RalfJung,@programmerjake,@regehr,@rotateright,@yuanfang-chen

Extended Description

The unit test "test/Transforms/InstCombine/bitcast-phi-uselistorder.ll" shows an incorrect transformation from an i64 load+store into a double load+store. These are not equivalent because the CPU can canonicalize NaN values, so the double store can write a different bit pattern than the i64 store.
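To make the hazard concrete, here is a small sketch (my own illustration, not part of the report) that decodes the bit pattern appearing in the counterexample below: 0x7ff0000001000000 is a signaling NaN, and a canonicalizing FP load/store (e.g. x87-style) that merely quiets it would set the top mantissa bit, writing different bytes to memory than the i64 store would.

```python
# Decode the IEEE-754 binary64 fields of the counterexample's bit pattern.
bits = 0x7FF0000001000000

sign = bits >> 63
exponent = (bits >> 52) & 0x7FF       # all ones => Inf or NaN
mantissa = bits & ((1 << 52) - 1)     # nonzero => NaN (not Inf)
quiet_bit = (bits >> 51) & 1          # clear => signaling NaN

assert exponent == 0x7FF and mantissa != 0 and quiet_bit == 0

# A hardware canonicalization that quiets the NaN sets bit 51, so the
# stored bytes no longer match the original i64 value.
quieted = bits | (1 << 51)
print(hex(quieted))  # 0x7ff8000001000000
```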

Alive2's counterexample:
@Q = global 8 bytes, align 8

define double @test(i1 %c, * %p) {
%entry:
br i1 %c, label %if, label %end

%if:
%__constexpr_0 = bitcast * @Q to *
%load = load i64, * %__constexpr_0, align 8
br label %end

%end:
%phi = phi i64 [ 0, %entry ], [ %load, %if ]
store i64 %phi, * %p, align 8
%cast = bitcast i64 %phi to double
ret double %cast
}
=>
@Q = global 8 bytes, align 8

define double @test(i1 %c, * %p) {
%entry:
br i1 %c, label %if, label %end

%if:
%load1 = load double, * @Q, align 8
br label %end

%end:
%0 = phi double [ 0.000000, %entry ], [ %load1, %if ]
%1 = bitcast * %p to *
store double %0, * %1, align 8
ret double %0
}
Transformation doesn't verify!
ERROR: Mismatch in memory

Example:
i1 %c = #x1 (1)
* %p = pointer(non-local, block_id=2, offset=64)

Source:
* %__constexpr_0 = pointer(non-local, block_id=1, offset=0)
i64 %load = #x7ff0000001000000 (9218868437244182528)
i64 %phi = #x7ff0000001000000 (9218868437244182528)
double %cast = NaN

Target:
double %load1 = NaN
double %0 = NaN
* %1 = pointer(non-local, block_id=2, offset=64)

Mismatch in pointer(non-local, block_id=2, offset=64)
Source value: #x7ff0000001000000
Target value: #x7ff0000000020000

LebedevRI (Member) commented Mar 9, 2020

NaN values can be canonicalized by
the CPU so the store double can write a different bit-pattern than store i64.

Citation needed?

efriedma-quic (Collaborator) commented Mar 9, 2020

I think the only target LLVM supports that has loads that canonicalize floats is 32-bit x86 (using non-SSE operations).

How we want to deal with that case is sort of complicated. On the one hand, we don't want to forbid hoisting loads with float type. On the other hand, it's impossible for a function with a double return type to return an SNaN in the standard x86 calling convention, so we'd need some special rule for that.

I'd prefer to say that we have to preserve SNaN patterns across non-arithmetic operations, and make the x86 backend deal with whatever complexity results from that.

nunoplopes (Member, Author) commented Mar 9, 2020

I'm fairly sure I've seen instcombine rewrites that take advantage of NaN bits being don't-care. That's why I implemented these semantics in Alive2.
So if we decide that a specific bit pattern must be preserved across NaNs, we need to work out the details. We would also need to specify the bit pattern produced by each operation that returns NaN. E.g., what's the value of (bitcast-to-i32 (fdiv 0.0 0.0))? It's target-specific at the very least, and in Alive2 it is a non-deterministic value that covers all possible NaN bit patterns.
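The "don't care" view can be illustrated with a quick sketch (an illustration of mine, not from the thread): two different quiet-NaN bit patterns are distinct as integers yet indistinguishable as float values, which is exactly the freedom the Alive2 semantics gives to NaN-producing operations.

```python
import math
import struct

# Two distinct quiet-NaN bit patterns for binary64.
a_bits = 0x7FF8000000000000
b_bits = 0x7FF8000000000001

a = struct.unpack("<d", struct.pack("<Q", a_bits))[0]
b = struct.unpack("<d", struct.pack("<Q", b_bits))[0]

# Both are NaN at the value level...
assert math.isnan(a) and math.isnan(b)
# ...but the underlying bits differ, so *which* NaN an operation
# produces is only observable through a bitcast or a store.
assert a_bits != b_bits
```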

jfbastien (Member) commented Mar 9, 2020

This seems like a bad bug, and I've seen something like it trigger before in GCC (funnily enough, GCC miscompiled clang):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58416

It seems like this transform of i64 -> double shouldn't occur if the target canonicalizes NaNs. I don't think we need a particularly complex solution for this.

efriedma-quic (Collaborator) commented Mar 9, 2020

I guess there are two models you could reason about. The model I was thinking about works something like the following:

  1. Non-arithmetic operations (load/store/select/phi/call without fast-math flags) preserve the exact bit pattern of a value, even if it's a NaN.
  2. Arithmetic operations that have a NaN as input produce an unspecified NaN as output.

I guess there's an alternative model, like you're proposing:

  1. NaN values have an unspecified bit pattern.
  2. Storing/bitcasting a NaN produces a NaN bit pattern, but the chosen NaN pattern is unspecified, and may be different for each operation.

Really, the semantic results of either rule are pretty similar. It's just a question of when the NaN pattern is fixed: when an instruction produces the value, or when a store/bitcast forces it to be fixed.

It's probably worth noting that any rule that means a "load" is allowed to raise an FP exception is going to make strict fp support a lot more complicated.

nunoplopes (Member, Author) commented Mar 10, 2020

I agree with your summary, Eli. I would just add that in semantics 1), if an operation produces a NaN when its operands are not NaN, it produces an unspecified NaN bit pattern.

The main reason I prefer to consider NaN to have an unspecified bit pattern is that it makes it easier to support chips where a load of an integer is not the same as a load of a float. If this is not an issue, then we should document it, change Alive2 to match the semantics, and then remove any optimization that doesn't respect those semantics.

Bottom line: is the optimization present in this bug report important or can it be removed?

efriedma-quic (Collaborator) commented Mar 11, 2020

This particular optimization probably isn't that valuable on its own; I mean, we want to avoid bitcasts where we can, but there are other ways of addressing this particular situation on most targets.

I have three concerns with making NaN values indeterminate in registers:

  1. We have a bunch of optimizations over memory operations that are essentially type-agnostic: they don't care what type is loaded/stored. We'd have to add a bunch of new checks.
  2. Destroying NaN-patterns would be very unfriendly to SIMD intrinsics; for example, if we decided that you can't use _mm_shuffle_ps on integer vectors.
  3. Allowing LLVM IR loads to raise FP exceptions probably isn't compatible with strict fp; we would need strictfp load intrinsics, and I don't think anyone wants to deal with that. If we do in fact forbid loads from raising an FP exception, every target needs to support some way to lower FP loads in a way that doesn't raise an exception. And if we have that support anyway, we can just use it unconditionally. (This should be possible on any target, at some cost to performance.)
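Concern 2 can be sketched as follows (a simulation of mine, not real intrinsics): if a float-typed lane move were allowed to replace NaNs with a default quiet NaN, integer payloads routed through it would be silently destroyed.

```python
import struct

def canonicalizing_move(lane_bits):
    """Simulate a hypothetical float lane move that replaces any NaN
    with the default binary32 quiet NaN (as some hardware
    canonicalization does)."""
    f = struct.unpack("<f", struct.pack("<I", lane_bits))[0]
    if f != f:  # NaN check
        return 0x7FC00000  # default binary32 quiet NaN
    return lane_bits

# An all-ones i32 lane happens to be a NaN when viewed as float32.
lane = 0xFFFFFFFF
moved = canonicalizing_move(lane)

# The integer payload is destroyed -- which is why moving integer
# vectors through float shuffles requires bit-preserving semantics.
assert moved != lane
print(hex(moved))  # 0x7fc00000
```

A non-NaN lane (e.g. the bits of 1.0f, 0x3F800000) passes through unchanged, so only NaN payloads are at risk.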

nunoplopes (Member, Author) commented Mar 12, 2020

Ok, give me some time and I'll compile a list of optimizations that are broken if we assume that specific NaN patterns get propagated.

nunoplopes (Member, Author) commented Mar 30, 2020

Ok, I've implemented the semantics we discussed here in Alive2 for testing. It keeps all values as bit-vectors all the time, except when there's a float operation: the operands are converted from bit-vectors into float type, the operation is performed, and the result is converted back to a bit-vector pattern.
The float->bit-vector conversion produces all bit patterns for NaN.

The result is that there is only 1 regression in the LLVM test suite. On the other hand, only 2 tests get fixed.

The new test failure is this:
define i64 @All11(i64 %in) {
; CHECK-NEXT: ret i64 0
%out = and i64 %in, xor (i64 bitcast (<2 x float> bitcast (i64 -1 to <2 x float>) to i64), i64 -1)
ret i64 %out
}

The problem with this test is that "bitcast (i64 -1 to <2 x float>)" shows up as llvm::ConstantFP, and thus Alive2's interpretation of the IR above is:
define i64 @All11(i64 %in) {
%__constexpr_1 = bitcast <2 x float> { -nan, -nan } to i64
%__constexpr_0 = xor i64 %__constexpr_1, -1
%out = and i64 %in, %__constexpr_0
ret i64 %out
}

So the -1 bit pattern disappears, and I'm not sure there's a way to distinguish a bit pattern that happens to represent a NaN from a float NaN (which we interpret as any bit pattern that represents NaN).
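To spell out why this test hinges on the chosen semantics, here is a sketch of mine: under bit-preserving bitcasts the i64 -> <2 x float> -> i64 roundtrip is the identity on bits, so the xor mask is 0 and the fold to `ret i64 0` is correct; under "NaN bits are don't-care" semantics the roundtrip may return any NaN payload in each lane, and the fold is not justified.

```python
import math
import struct

all_ones = 0xFFFFFFFFFFFFFFFF  # i64 -1 as an unsigned bit pattern

# Viewed as <2 x float>, both lanes of -1 are (negative, quiet) NaNs.
lanes = struct.unpack("<2f", struct.pack("<Q", all_ones))
assert all(math.isnan(lane) for lane in lanes)

# A bit-preserving bitcast only relabels the bits, so the roundtrip
# is the identity and xor(roundtrip(-1), -1) == 0, making the `and`
# fold to 0 for any %in.
roundtrip = all_ones  # bit-preserving bitcast changes nothing
mask = roundtrip ^ all_ones
assert mask == 0
# Under Alive2's "NaN bits are don't-care" semantics, however, each
# lane of the roundtrip can be an arbitrary NaN payload, so the fold
# to `ret i64 0` does not verify -- which is the reported failure.
```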

Anyway, assuming we reach a conclusion on what to do with this test, the question is then about the backends. Are all backends ok with LLVM assuming that reading/writing data through float registers doesn't change the bits?
(This is not true on one of our chips, but I think we can live with that.)

efriedma-quic (Collaborator) commented Mar 30, 2020

Probably under any model where floating-point "operations" are special, bitcast shouldn't count as a floating-point operation; otherwise, we lose the equivalence between bitcast and store+load.

Again, I think the only in-tree target with non-bit-preserving load/store operations is 32-bit x86 (using x87 operations to load float/double values; oddly, long double load/store are bit-preserving).

nunoplopes (Member, Author) commented Apr 9, 2020

There's a related test failure in Transforms/InstCombine/minmax-fold.ll:
define <4 x i32> @bitcasts_fcmp_1(<2 x i64> %a, <2 x i64> %b) {
%t0 = bitcast <2 x i64> %a to <4 x float>
%t1 = bitcast <2 x i64> %b to <4 x float>
%t2 = fcmp olt <4 x float> %t1, %t0
%t3 = bitcast <2 x i64> %a to <4 x i32>
%t4 = bitcast <2 x i64> %b to <4 x i32>
%t5 = select <4 x i1> %t2, <4 x i32> %t3, <4 x i32> %t4
ret <4 x i32> %t5
}
=>
define <4 x i32> @bitcasts_fcmp_1(<2 x i64> %a, <2 x i64> %b) {
%t0 = bitcast <2 x i64> %a to <4 x float>
%t1 = bitcast <2 x i64> %b to <4 x float>
%t2 = fcmp olt <4 x float> %t1, %t0
%1 = select <4 x i1> %t2, <4 x float> %t0, <4 x float> %t1
%t5 = bitcast <4 x float> %1 to <4 x i32>
ret <4 x i32> %t5
}

In the source there's only a bitcast between integer types, while in the target there's an int->float->int bitcast roundtrip. If the value is a NaN, this roundtrip may change the bits (e.g. canonicalize the NaN).

RalfJung commented Jun 16, 2020

In Rust, we are also struggling with what exactly LLVM's NaN semantics are: rust-lang/rust#73328. It would be good to get more precise documentation of those semantics -- though it seems that's still work in progress, if I understand the discussion here correctly?

I'm fairly sure I've seen instcombine rewrites that take advantage of NaN bits being don't care. That's why I implemented this semantics in Alive2.

Which semantics did you implement first, i.e., how do they differ from the adjusted ones you described later? Are they what Eli calls the "alternative model"? So float/double LLVM variables actually have a different range of possible values than a memory location, and conversion happens on load/store/bitcast?

  1. Arithmetic operations that have a NaN as input produce an unspecified NaN as output.

Note that this means that these operations are non-deterministic! So, duplicating them would be an illegal transformation, similar to "freeze". Is that something LLVM is treating properly?

(In the "alternative model", likewise stores/bitcasts would be non-deterministic and must not be duplicated.)

nunoplopes (Member, Author) commented Jun 16, 2020

I see 2 models:

  1. When you move a value into/out of a float register, the CPU may canonicalize the NaN value, so the original bit pattern may not be preserved. This is the semantics implemented ATM in Alive2. When a float is stored to memory or bitcast to int, we allow all don't-care NaN bits to be arbitrary. This still allows loads to be duplicated (the bits are arbitrary, but fixed). The reasoning is that CPUs may canonicalize the NaN bits, but they always do it in the same way.

  2. CPUs are not allowed to change the NaN bits, hence they preserve the bit-pattern of the input. The non-determinism is moved to the float operations: if they produce NaN, they make all their NaN bits non-deterministic.

Each has pros and cons, as usual. The first one is required if people care about such processors. The second one allows some optimizations like the ones shown in this bug report.
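The two models can be caricatured in a few lines (my sketch, not from the thread): in model 1 the canonicalization is a fixed, deterministic function applied at the register/memory boundary, so duplicated stores still agree; in model 2 stores are bit-preserving and the non-determinism lives in the NaN-producing float operations instead.

```python
def store_model1(bits):
    """Model 1: the CPU may canonicalize NaNs on the way to memory,
    but always in the same deterministic way (here: quiet the NaN)."""
    is_nan = ((bits >> 52) & 0x7FF) == 0x7FF and (bits & ((1 << 52) - 1)) != 0
    return bits | (1 << 51) if is_nan else bits

def store_model2(bits):
    """Model 2: stores are bit-preserving; any non-determinism lives
    in the float operations that *produce* NaNs, not here."""
    return bits

snan = 0x7FF0000001000000  # the signaling NaN from the counterexample

# Model 1 changes the bytes, but deterministically, so duplicating
# the store is still safe: both copies write the same pattern.
assert store_model1(snan) == store_model1(snan) == 0x7FF8000001000000
# Model 2 writes the input bits exactly.
assert store_model2(snan) == snan
```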

RalfJung commented Jun 16, 2020

When a float is stored to memory or bit-casted to int, we allow all don't care NaN bits to be arbitrary. This allows loads to be duplicated (bits are arbitrary, but fixed). The reasoning is that CPUs may canonicalize the NaN bits, but they always do it in the same way.

(The store would be the problem with duplication here, not the load.)
Oh I see, so basically there is a fixed global parameter of the semantics which determines the NaN bit pattern? Or is it allowed to depend on some other factors?

The second one seems to disallow, e.g., turning "float f = x+x; if ((int)f == (int)f)" into "if ((int)(x+x) == (int)(x+x))", as that would duplicate the non-determinism. This particular transformation likely makes little sense, but there might be other conditions under which recomputing a result could be beneficial.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021