
Difference between compile-time and runtime float precision on 32-bit x86 without SSE can cause miscompilation leading to segfault #89885

Open
beetrees opened this issue Apr 24, 2024 · 16 comments

@beetrees
Contributor

beetrees commented Apr 24, 2024

The following program (based on the Rust version from here, which is based on @comex's example from a different issue; they gave an explanation of how they found it here)

#include <stdio.h>
#include <stddef.h>

void print_vals(float x, size_t i, int vals_i) {
	printf("x=%f i=%zu vals[i]=%i\n", x, i, vals_i);
}

void out_of_bounds(float x, size_t i) {
	printf("x=%f i=%zu vals[i]=out of bounds\n", x, i);
}

void evil(int vals[300]) {
	float x = 0.0;
	size_t i = 0;

	while (x != 90.0) {
		// At compile time, LLVM will brute-force the loop and discover that it
		// terminates after 90 iterations, with `i` always less than 300. This bounds
		// check therefore gets optimised out.
		if (i < 300) {
			print_vals(x, i, vals[i]);
		} else {
			out_of_bounds(x, i);
		}
		x += 1.0;
		// This addition is too small to have any effect on the value of `x` with
		// regular `float` precision, which is what LLVM uses at compile-time.
		// However, with the extra precision offered by x87 floating point registers,
		// the value of `x` will change slightly, meaning it will never hit exactly
		// 90.0 and therefore the loop will never terminate at runtime.
		x += 2.9802322387695312e-08;
		i += 2;
	}
}

int main() {
	int vals[300];
	for (int i = 0; i < 300; i++) {
		vals[i] = i;
	}
	evil(vals);
}

compiled with clang -O3 --target=i686-unknown-linux-gnu -mno-sse code.c will segfault at runtime. This is due to LLVM evaluating floats at standard float precision at compile time, but outputting machine code that uses x87 extended precision at runtime. Specifically, llvm/lib/Analysis/ScalarEvolution.cpp will brute-force the loop using compile-time semantics, causing the bounds check to be optimised out; however, the extra precision of x87 extended-precision floats means that the loop termination condition is never hit at runtime.
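For intuition, here is a minimal standalone sketch (not part of the original report) of why the increment vanishes at float precision but survives at x87 extended precision; the constant 2.9802322387695312e-08 is exactly 2^-25, which is less than half an ULP of every float in [1, 90]:

#include <stdio.h>

int main(void) {
	// 2^-25 is less than half an ULP of 1.0f, so at standard float
	// precision the addition rounds back to exactly 1.0f.
	float f = 1.0f + 2.9802322387695312e-08f;
	// x87 keeps intermediates in 80-bit registers with a 64-bit
	// significand, where 1.0 + 2^-25 is exactly representable; this
	// long double sum models that extra precision.
	long double ext = 1.0L + 2.9802322387695312e-08L;
	printf("float:    %.17g\n", (double)f);  // prints 1
	printf("extended: %.17Lg\n", ext);       // prints 1.0000000298023224
	return 0;
}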

The LangRef appears to imply that the compile-time semantics are correct, so this is a bug in the x86 backend.

Related to #44218.

@efriedma-quic added the llvm:SCEV (Scalar Evolution) label and removed the new issue label Apr 24, 2024
@efriedma-quic
Collaborator

I suspect this isn't x87-specific; there's probably interactions with fast-math flags, math library calls, maybe some other stuff. The fundamental issue is that if we constant-fold an operation with a non-deterministic result, we need to actually replace the value in the IR, or else we might get a different result at runtime. So ScalarEvolution::computeExitCountExhaustively is unsound; it needs to be reimplemented as a fold in indvars. (Or we could forbid folding non-deterministic instructions in ScalarEvolution::computeExitCountExhaustively, but we lose some optimizations if we do that.)

See also https://reviews.llvm.org/D84792 , which is the same class of issue.

@Muon

Muon commented Apr 25, 2024

The fundamental issue is that if we constant-fold an operation with a non-deterministic result, we need to actually replace the value in the IR, or else we might get a different result at runtime.

Every operation in that testcase is deterministic.

@beetrees
Contributor Author

I think there are possibly two different bugs here:

Floating point accuracy

Which LLVM floating point operations guarantee deterministic results? As far as I'm aware, there is a general presumption that add/sub/mul/div/rem/fma give correctly rounded results, whereas other floating point operations might not. However, I was unable to find this codified anywhere in the LangRef. If this is true, then the LLVM IR optimisation is correct in the original example, and the x86 backend is responsible for the miscompilation.

Incorrect constant folding of non-deterministic operations

The original example can be adapted to use any operation which LLVM will constant-fold while brute-forcing a loop, but which differs between compile-time and runtime. rust-lang/rust#124364 is an example in Rust which uses the differences in the implementation of sin to cause a miscompilation leading to a segfault at runtime, but any non-deterministic const-foldable operation will do.

@Muon

Muon commented Apr 25, 2024

The people writing the optimizations assume correct rounding, so it's the case de facto, regardless of the LangRef's opinion. It's also necessary to actually provide what C and C++ require. I did open a bug about the documentation a while ago: #60942

@nikic
Contributor

nikic commented May 1, 2024

I suspect this isn't x87-specific; there's probably interactions with fast-math flags, math library calls, maybe some other stuff. The fundamental issue is that if we constant-fold an operation with a non-deterministic result, we need to actually replace the value in the IR, or else we might get a different result at runtime. So ScalarEvolution::computeExitCountExhaustively is unsound; it needs to be reimplemented as a fold in indvars. (Or we could forbid folding non-deterministic instructions in ScalarEvolution::computeExitCountExhaustively, but we lose some optimizations if we do that.)

Is your suggestion for the IndVars optimization to replace the exit with a comparison of a canonical IV against the exhaustive exit count there?

@efriedma-quic
Collaborator

Is your suggestion for the IndVars optimization to replace the exit with a comparison of a canonical IV against the exhaustive exit count there?

That's what I was thinking (similar to the existing LFTR), but considering it a bit more, I'm not sure how safe that actually is. If the IV isn't used for anything else, it's fine, but in most interesting cases the floating-point IV will have other uses. Even if we do LFTR, the computed trip count needs to be consistent with the actual floating-point numbers we're using. So we'd actually need to replace the floating-point computation in each iteration with the constant-folded result. That's theoretically possible (just store the values in a constant table), but the profitability becomes a bit questionable.
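To make that concrete, here is a hypothetical sketch (in terms of the C example above, not an actual transformation LLVM performs) of what such a rewrite would look like: the per-iteration FP values go into a constant table, and the exit test becomes a comparison of a canonical integer IV against the exhaustively computed trip count of 90:

#include <stddef.h>

void print_vals(float x, size_t i, int vals_i);

// Hypothetical rewritten form of evil() from the original example. At
// float precision the folded values happen to be 0.0f, 1.0f, ..., 89.0f
// (the tiny increment always rounds away), so the constant table is
// filled by a trivial loop here rather than spelled out.
void evil_rewritten(int vals[300]) {
	float folded_x[90];
	for (int k = 0; k < 90; k++)
		folded_x[k] = (float)k;  // stand-in for the constant-folded values
	// Canonical IV with the exhaustive trip count; the bounds check is
	// gone because i = 2 * iv is at most 178, which is less than 300.
	for (size_t iv = 0; iv < 90; iv++)
		print_vals(folded_x[iv], 2 * iv, vals[2 * iv]);
}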

@nikic
Contributor

nikic commented May 2, 2024

Is your suggestion for the IndVars optimization to replace the exit with a comparison of a canonical IV against the exhaustive exit count there?

That's what I was thinking (similar to the existing LFTR), but considering it a bit more, I'm not sure how safe that actually is. If the IV isn't used for anything else, it's fine, but in most interesting cases the floating-point IV will have other uses. Even if we do LFTR, the computed trip count needs to be consistent with the actual floating-point numbers we're using. So we'd actually need to replace the floating-point computation in each iteration with the constant-folded result. That's theoretically possible (just store the values in a constant table), but the profitability becomes a bit questionable.

Right. I think we need to approach this the other way around and have a flag for ConstantFolding to skip these. At least as far as SCEV is concerned, I don't think we actually care about FP exit conditions there (I think it's more about things like load-from-constant based exit conditions).

A possible way to go about this would be to extend IIQ AllowUndef to AllowNonDeterministic and then also pass that to ConstantFolding as well.

(As already mentioned above, it doesn't fix the specific issue with x87 here, but rather analogous issues with FP libcalls on not-fundamentally-broken targets.)

@RalfJung
Contributor

RalfJung commented May 2, 2024

I suspect this isn't x87-specific; there's probably interactions with fast-math flags, math library calls, maybe some other stuff. The fundamental issue is that if we constant-fold an operation with a non-deterministic result, we need to actually replace the value in the IR, or else we might get a different result at runtime.

Yeah, I agree, determinism is the sticking point here. Note that even basic float operations are not deterministic as per #66579 -- if they produce a NaN, some aspects of that NaN are picked non-deterministically.

Every operation in that testcase is deterministic.

I think that is exactly the question. One could specify float operations as being basically non-deterministic and producing a result with arbitrary precision. In that case the x86 backend would be right and the optimizer would be wrong to assume determinism.

However, that would be a pretty bad spec IMO; the Rust frontend would certainly like to be able to generate code with exact IEEE semantics, and I suspect other frontends have similar requirements. In that case, basic float operations are deterministic unless they produce a NaN, and it is the (non-SSE) x86 backend that is wrong. Transcendental functions, however, are still non-deterministic; that's what happens with the sin issue.
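As for the NaN non-determinism mentioned above, an illustrative sketch (not from the thread): the hardware's choice of NaN bits is observable like this, and LLVM's constant folder is free to pick a different sign/payload for the same operation:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
	// volatile keeps the division out of the constant folder, so this
	// NaN is produced by the target hardware at runtime.
	volatile float zero = 0.0f;
	float nan_rt = zero / zero;
	uint32_t bits;
	memcpy(&bits, &nan_rt, sizeof bits);
	// Per #66579, the sign and payload of this NaN are chosen
	// non-deterministically; a compile-time fold need not match them.
	printf("runtime NaN bits: 0x%08x\n", (unsigned)bits);
	return 0;
}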

@beetrees
Contributor Author

beetrees commented May 2, 2024

Miscompilations can also occur when the bit-pattern of NaNs differs at compile-time and runtime. Here is an example in Rust that uses the differing signs of NaNs between compile-time and runtime to cause a miscompilation.

@RalfJung
Contributor

RalfJung commented May 2, 2024

Nice example! So this does not just affect libcalls, then; it affects all float operations, as they can all produce NaNs (and LLVM's const-fold does not guarantee that the NaN bit pattern matches what the target would produce).

@nikic
Contributor

nikic commented May 3, 2024

Simplified IR test case (https://llvm.godbolt.org/z/KW5s4oTPa):

; RUN: opt -S -passes=indvars < %s
define i64 @test() {
entry:
  br label %loop

loop:
  %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
  %fv = phi double [ 1.000000e+00, %entry ], [ %fv.next, %loop ]
  call void @use(double %fv)
  %fv.next = call double @llvm.sin.f64(double %fv)
  %iv.next = add i64 %iv, 1
  %fcmp = fcmp une double %fv, 0x3FC6BA15EE8460B0
  br i1 %fcmp, label %loop, label %exit

exit:
  ret i64 %iv
}

declare void @use(double %i)
declare double @llvm.sin.f64(double)

The replacement with ret i64 90 shouldn't occur.
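For reference, a C rendering of the same test case (hypothetical; it assumes, as the expected fold to 90 implies, that the hex constant 0x3FC6BA15EE8460B0, written 0x1.6ba15ee8460b0p-3 as a C hexfloat, is the compile-time value of %fv after 90 applications of sin to 1.0):

#include <math.h>
#include <stdint.h>

void use(double fv);

// If the optimizer brute-forces this loop with its compile-time sin, it
// concludes the trip count is 90. The target's libm sin may differ in the
// last bit on some iterate, so folding the return value to 90 without
// also replacing the sin results is unsound.
uint64_t test(void) {
	uint64_t iv = 0;
	double fv = 1.0;
	for (;;) {
		use(fv);
		double fv_next = sin(fv);
		if (fv == 0x1.6ba15ee8460b0p-3)  // bit pattern 0x3FC6BA15EE8460B0
			return iv;
		fv = fv_next;
		iv++;
	}
}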

@nikic self-assigned this May 3, 2024
nikic added a commit to nikic/llvm-project that referenced this issue May 3, 2024
nikic added a commit to nikic/llvm-project that referenced this issue May 16, 2024
nikic added a commit that referenced this issue May 20, 2024
…90942)

When calculating the exit count exhaustively, if any of the involved
operations is non-deterministic, the exit count we compute at
compile-time and the exit count at run-time may differ. Using these
non-deterministic constant folding results is only correct if we
actually replace all uses of the instruction with the value. SCEV (or
its consumers) generally don't do this.

Handle this by adding a new AllowNonDeterministic flag to the constant
folding API, and disabling it in SCEV. If non-deterministic results are
not allowed, do not fold FP lib calls in general, and FP operations
returning NaNs in particular. This could be made more precise (some FP
libcalls like fabs are fully deterministic), but I don't think that precise handling is worthwhile here.

Fixes the interesting part of
#89885.
@nikic
Contributor

nikic commented May 20, 2024

The non-x87 part of this should be fixed now.

@RalfJung
Contributor

RalfJung commented Sep 2, 2024

FWIW a similar issue exists with 32-bit ARM NEON; @beetrees produced an example that I shared here.

Should that be a separate issue, or this issue expanded?

@nikic
Contributor

nikic commented Sep 2, 2024

I think it's best to track that as a separate issue, as the fix will not be in SCEV. The root cause is that vector FP is not IEEE 754 compliant on that target, which is #16648.

I think this could be nominally fixed by making all the neon builtins use "denormal-fp-math"="dynamic", which should also stop them from being constant folded for denormals. Though this wouldn't work without further changes because IIRC we currently don't inline dynamic denormals into non-dynamic. Maybe we could do this by also marking the caller as dynamic. Not familiar with the precise semantics of these attributes.
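To make the denormal discrepancy concrete, an illustrative sketch (not from the thread): executed as a NEON vector op (or with flush-to-zero otherwise enabled), the subtraction below yields zero, while IEEE-754 arithmetic, which is what LLVM's constant folder implements, yields a nonzero subnormal:

#include <stdio.h>

int main(void) {
	// Both inputs are normal floats; the exact difference, 5e-39, is
	// subnormal. volatile blocks compile-time folding, so the result
	// reflects how the target actually executes the subtraction.
	volatile float a = 2.0e-38f;
	volatile float b = 1.5e-38f;
	float diff = a - b;
	printf("%s\n", diff == 0.0f ? "flushed to zero" : "IEEE subnormal");
	return 0;
}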

@RalfJung
Contributor

RalfJung commented Sep 2, 2024

I think this could be nominally fixed by making all the neon builtins use "denormal-fp-math"="dynamic", which should also stop them from being constant folded for denormals.

And what should happen for code that uses float vectors with LLVM operations like fadd? Either fadd needs to be documented to have target-specific behavior even for supposedly portable types like float, or the backend can't just use NEON instructions for this.

@beetrees
Contributor Author

beetrees commented Sep 2, 2024

Note the vector issue also occurs on PowerPC when the vsx target feature is not enabled (Rust issue with example).

arsenm pushed a commit that referenced this issue Oct 11, 2024
… to IEEE-754 (#102140)

Fixes #60942: IEEE semantics
is likely what many frontends want (it definitely is what Rust wants),
and it is what LLVM passes already assume when they use APFloat to
propagate float operations.

This does not reflect what happens on x87, but what happens there is
just plain unsound (#89885,
#44218); there is no coherent
specification that will describe this behavior correctly -- the backend
in combination with standard LLVM passes is just fundamentally buggy in
a hard-to-fix way.

There are also questions around flushing subnormals to zero, but [this
discussion](https://discourse.llvm.org/t/questions-about-llvm-canonicalize/79378)
seems to indicate a general stance of: this is specific non-standard
hardware behavior, and generally needs LLVM to be told that basic float
ops do not return the standard result. Just naively running
LLVM-compiled code on hardware configured to flush subnormals will lead
to #89885-like issues.

AFAIK this is also what Alive2 implements (@nunoplopes please correct me
if I am wrong).
DanielCChen pushed a commit to DanielCChen/llvm-project that referenced this issue Oct 16, 2024
bricknerb pushed a commit to bricknerb/llvm-project that referenced this issue Oct 17, 2024