Tracking issue: 32bit x86 targets without SSE2 have unsound floating point behavior #114479
Comments
Can we just drop support for x86 without SSE2 or fall back to software floating point?
We currently have the following targets in that category:
They are all tier 2. I assume people added them for a reason so I doubt they will be happy about having them dropped. Not sure to what extent floating point support is needed on those targets, but I don't think we have a concept of "target without FP support".
In theory of course we can, not sure if that would be any easier than following the Java approach.
it should be much easier since LLVM already supports that, unlike the Java scheme (afaik)
FWIW f32 on x86-32-noSSE should actually be fine, since double-rounding is okay as long as the precision gap between the two modes is big enough. Only f64 has a problem since it is "too close" to the 80bit-precision of the x87 FPU. On another note, we even have code in the standard library that temporarily alters the x87 FPU control word to ensure exact 64bit precision...
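To illustrate the double-rounding gap (a constructed example, not code from this thread): the x87's 64-bit significand has so much slack over `f32`'s 24 bits that rounding twice always matches rounding once, but for `f64`'s 53 bits the two can disagree. The sketch below shows an input pair where correctly-rounded `f64` addition and x87 extended precision followed by a store would produce different answers.

```rust
fn main() {
    // b = 2^-53 + 2^-75, exactly representable as an f64
    // (its significand spans only 23 bits).
    let b = 2.0_f64.powi(-53) + 2.0_f64.powi(-75);
    // The exact sum 1 + 2^-53 + 2^-75 lies just above the midpoint
    // between 1.0 and 1.0 + 2^-52, so correct IEEE f64 addition
    // rounds *up* to 1 + 2^-52.
    // An x87 at 64-bit precision instead first rounds the sum down to
    // 1 + 2^-53 (the 2^-75 bit is below half an extended ulp), and the
    // subsequent store to f64 then ties-to-even down to exactly 1.0:
    // double rounding gives the wrong answer.
    let sum = 1.0_f64 + b;
    // This holds on any IEEE-conforming target:
    assert_eq!(sum, 1.0 + 2.0_f64.powi(-52));
    println!("correctly rounded: 1 + {:e}", sum - 1.0);
}
```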
I wonder if there's something that could be done about the fact that this also affects tier 1 targets with custom (stable) flags such as |
@RalfJung What about forcing non-SSE2 targets to software floating point? That must be supported anyway because of kernel code. |
Yeah that's an option listed above, since you already proposed it before. I have no idea how feasible it is. People are very concerned about the softfloat support for f16/f128 leading to code bloat and whatnot, so the same concerns would likely also apply here. AFAIK the kernel code just doesn't use floats, I don't think they have softfloats?
Rollup merge of rust-lang#113053 - RalfJung:x86_32-float, r=workingjubilee add notes about non-compliant FP behavior on 32bit x86 targets Based on ton of prior discussion (see all the issues linked from rust-lang/unsafe-code-guidelines#237), the consensus seems to be that these targets are simply cursed and we cannot implement the desired semantics for them. I hope I properly understood what exactly the extent of the curse is here, let's make sure people with more in-depth FP knowledge take a close look! In particular for the tier 3 targets I have no clue which target is affected by which particular variant of the x86_32 FP curse. I assumed that `i686` meant SSE is used so the "floating point return value" is the only problem, while everything lower (`i586`, `i386`) meant x87 is used. I opened rust-lang#114479 to concisely describe and track the issue. Cc `@workingjubilee` `@thomcc` `@chorman0773` `@rust-lang/opsem` Fixes rust-lang#73288 Fixes rust-lang#72327
Just to clarify, this is really only for 32-bit x86 non-SSE targets, and doesn't affect x86-64 non-SSE2 targets like x86-64-unknown-none?
I would guess we'll need to support i686-unknown-none like x86-64-unknown-none for use in operating system kernels like Linux that don't allow vector registers to be used (without extra work) even when the hardware has them.
That appears to be what
x86-64-unknown-none uses softfloats so it should not be affected.
I think they were asking about targets where the hardware doesn't have SSE.
This is not sufficient to ensure full IEEE754 compliance (for example see here).
What's described there sounds different. The proposal you quoted was to switch the x87 precision such that the operation is performed with double precision to begin with, entirely avoiding double rounding. It is possible that the x87 has further issues that make this not work, but the link you posted does not seem to even mention the idea of changing the FPU control register to get a different precision, so as far as I can see it doesn't provide any evidence that setting the x87 to 64bit precision would lead to incorrect results.
From the post (emphasis mine):
Hm, my understanding was that switching the FPU mode to 64bit would solve the problem. But of course it's possible that x87 screws up even when explicitly asked for IEEE 754 64bit arithmetic. 🤷 Requiring the FPU to be in 64bit mode is anyway not a realistic option, I listed it just for completeness' sake.
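For reference, flipping the x87 precision-control field looks roughly like this (an inline-assembly sketch, not the actual standard-library code mentioned earlier; the PC field is bits 8-9 of the control word, where `0b10` selects a 53-bit significand and `0b11` the default 64-bit one):

```rust
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
unsafe fn set_x87_precision(pc: u16) {
    use std::arch::asm;
    let mut cw: u16 = 0;
    // Store the current x87 control word to memory...
    asm!("fnstcw word ptr [{0}]", in(reg) &mut cw as *mut u16);
    // ...replace the precision-control field (bits 8-9)...
    cw = (cw & !(0b11 << 8)) | ((pc & 0b11) << 8);
    // ...and load the modified control word back into the FPU.
    asm!("fldcw word ptr [{0}]", in(reg) &cw as *const u16);
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn main() {
    unsafe {
        set_x87_precision(0b10); // 53-bit "double" precision
    }
    println!("x87 precision control set to 53-bit");
}

#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn main() {}
```

On x86-64 this is harmless to experiment with, since ordinary float code uses SSE and ignores the x87 control word entirely.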
running x87 in 53-bit mode works except that denormal
Does the Java encoding behave correctly for denormals? That it can express too many exponents wouldn't be a problem if the extra precision doesn't lead to different rounding results.
Yes, Java behaves correctly in all cases. It scales the exponent of one of the arguments before multiplication and division to ensure that where a 64-bit op would evaluate to a denormal, the 80-bit op does the same (and then it scales the exponent of the result back to what it should be). This is all described in the PDF linked from OP.
Unfortunately, this doesn't just affect floating point arithmetic, as LLVM will load to and store from the x87 floating point stack even when just moving floats around. Loading and storing a signalling NaN this way quiets it, silently altering its bit pattern, which is enough to break entirely safe code:

```rust
#[derive(Copy, Clone)]
#[repr(u32)]
enum Inner {
    // Same bit pattern as a signalling NaN `f32`.
    A = (u32::MAX << 23) | 1,
    B,
}

#[derive(Copy, Clone)]
enum Data {
    I(Inner),
    F(f32),
}

#[inline(never)]
fn store_data(data: Data, data_out: &mut Data) {
    // Suggest to LLVM that the data payload is a float.
    std::hint::black_box(match data {
        Data::I(_) => 0.0,
        Data::F(x) => x,
    });
    // LLVM will optimise this to a float load and store
    // (with a separate load/store for the discriminant).
    *data_out = match data {
        Data::I(x) => Data::I(x),
        Data::F(x) => Data::F(x),
    };
}

fn main() {
    let mut res = Data::I(Inner::A);
    store_data(Data::I(Inner::A), &mut res);
    if let Data::I(res) = res {
        // LLVM will optimise out the bounds check as the index
        // should always be in range.
        let index = (res as u32 - (u32::MAX << 23)) as usize;
        dbg!([1, 2, 3, 4, 5][index]); // This will segfault.
    } else {
        unreachable!();
    }
}
```
Wow, that's a great example. It even works in entirely safe code. Impressive. We probably have to upgrade this issue to I-unsound then. Is there an upstream LLVM issue for this miscompilation?
That is impressive. It looks closely related to llvm/llvm-project#44218. Perhaps it's yet another example to toss on that pile? I'm honestly starting to think LLVM should just deprecate f32 and f64 support on x87 targets.
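One user-level way to dodge this class of bug while it exists (my sketch, not a mitigation proposed in this thread) is to keep NaN-adjacent bit patterns in integer form and only reinterpret them at the point of use, since integer moves never go through the x87 stack. The caveat: on the affected targets even returning an `f32` by value travels through `st(0)`, so the conversion must happen as late as possible.

```rust
/// Stores float data as raw bits; copies and moves of this type are
/// plain 32-bit integer operations, so a signalling-NaN payload cannot
/// be quieted in transit by an x87 load/store pair.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct RawF32(u32);

impl RawF32 {
    fn from_f32(x: f32) -> Self {
        RawF32(x.to_bits())
    }
    /// Reinterpret as a float. On x86-32 without SSE the returned value
    /// passes through `st(0)`, so only call this where quieting no
    /// longer matters (e.g. immediately before arithmetic).
    fn value(self) -> f32 {
        f32::from_bits(self.0)
    }
}

fn main() {
    // Same signalling-NaN pattern as `Inner::A` in the example above.
    let snan = RawF32((u32::MAX << 23) | 1);
    let copy = snan; // a plain integer copy
    assert_eq!(copy, snan);
    assert!(copy.value().is_nan());
    println!("bits preserved: {:#010x}", copy.0);
    let _ = RawF32::from_f32(1.0);
}
```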
Atomic loads/stores use the integer-to-float/float-to-integer load/store instructions (
The above example will segfault on current stable (1.77.2), but not on current nightly (2024-04-22). With the following modification, which passes `data` by mutable reference instead of by value, the program segfaults on current nightly as well:

```rust
#[derive(Copy, Clone)]
#[repr(u32)]
enum Inner {
    // Same bit pattern as a signalling NaN `f32`.
    A = (u32::MAX << 23) | 1,
    B,
}

#[derive(Copy, Clone)]
enum Data {
    I(Inner),
    F(f32),
}

#[inline(never)]
fn store_data(data: &mut Data, data_out: &mut Data) {
    // Suggest to LLVM that the data payload is a float.
    std::hint::black_box(match *data {
        Data::I(_) => 0.0,
        Data::F(x) => x,
    });
    // LLVM will optimise this to a float load and store
    // (with a separate load/store for the discriminant).
    *data_out = match *data {
        Data::I(x) => Data::I(x),
        Data::F(x) => Data::F(x),
    };
}

fn main() {
    let mut res = Data::I(Inner::A);
    store_data(&mut Data::I(Inner::A), &mut res);
    if let Data::I(res) = res {
        // LLVM will optimise out the bounds check as the index
        // should always be in range.
        let index = (res as u32 - (u32::MAX << 23)) as usize;
        dbg!([1, 2, 3, 4, 5][index]); // This will segfault.
    } else {
        unreachable!();
    }
}
```
It's also possible to cause miscompilations due to the difference between what floats evaluate to at compile time vs. at runtime. The following program, which is a very lightly modified version of @comex's example from a different issue (rust-lang/unsafe-code-guidelines#471 (comment), see that comment for more details on how this works), will segfault when run after compiling with optimisations on i586 targets:

```rust
#[inline(never)]
fn print_vals(x: f32, i: usize, vals_i: u32) {
    println!("x={x} i={i} vals[i]={vals_i}");
}

#[inline(never)]
pub fn evil(vals: &[u32; 300]) {
    // Loop variables:
    let mut x: f32 = 0.0; // increments by 1-and-a-bit every time
    let mut i: usize = 0; // increments by 2 every time
    while x != 90.0 {
        // LLVM will do a brute-force evaluation of this loop for up to 100
        // iterations to try to calculate an iteration count. (See
        // `llvm/lib/Analysis/ScalarEvolution.cpp`.) Under normal floating
        // point semantics, `x` will equal exactly 90.0 after 90 iterations;
        // LLVM discovers this by brute-force evaluation and concludes that
        // the iteration count is always 90.
        // Now, if this loop executes 90 times, then `i` must be in the range
        // `0..180`, so the bounds check in `vals[i]` should always pass, so
        // LLVM eliminates it.
        print_vals(x, i, vals[i]);
        // Update `x`. The exact computation doesn't matter that much; it
        // just needs to:
        // (a) be possible to constant-evaluate by brute force (i.e. by
        //     going through each iteration one at a time);
        // (b) be too complex for IndVarSimplifyPass to simplify *without*
        //     brute force;
        // (c) depend on floating point accuracy.
        // First increment `x`, to make sure it's not just the same value
        // every time (either in LLVM's opinion or in reality):
        x += 1.0;
        // This adds a small float to `x`. This should get rounded to no
        // change as the float being added is too small to make a difference
        // to `f32`'s 23-bit fraction. However, it will make a difference to
        // the value of the `f80` on the x87 floating point stack. This means
        // that `x` will no longer be a whole number and will never hit
        // exactly 90.0.
        x += 1.0_f32 / 2.0_f32.powi(25);
        // Update `i`, the integer we use to index into `vals`. Why increment
        // by 2 instead of 1? Because if we increment by 1, then LLVM notices
        // that `i` happens to be equal to the loop count, and therefore it
        // can replace the loop condition with `while i != 90`. With `i`
        // as-is, LLVM could hypothetically replace the loop condition with
        // `while i != 180`, but it doesn't.
        i += 2;
    }
}

pub fn main() {
    // Make an array on the stack:
    let mut vals: [u32; 300] = [0; 300];
    for i in 0..300 {
        vals[i as usize] = i;
    }
    evil(&vals);
}
```
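For contrast, here is the behaviour LLVM's brute-force evaluation (correctly, per IEEE semantics) predicts for the loop above: the tiny increment is strictly below half an ulp of `x` at every magnitude the loop reaches, so it rounds away entirely, `x` stays an exact integer, and the loop runs exactly 90 times. This small sketch checks that on a conforming target:

```rust
fn main() {
    let mut x: f32 = 0.0;
    let mut iterations = 0;
    while x != 90.0 {
        x += 1.0;
        // 2^-25 is below half an ulp of every value of `x` in 1.0..=90.0
        // (half an ulp of 1.0 is already 2^-24), so under IEEE f32
        // semantics this addition rounds to no change at all.
        x += 1.0_f32 / 2.0_f32.powi(25);
        iterations += 1;
        // If extra precision leaked in, `x` would overshoot 90.0 and
        // this loop would never terminate on its own.
        assert!(iterations <= 90, "extra precision leaked in");
    }
    assert_eq!(iterations, 90);
    println!("terminated after {iterations} iterations");
}
```

On an x87 target the same source code diverges: the runtime `x` picks up the 2^-25 increments in extended precision, never equals 90.0 exactly, and the loop LLVM "proved" to run 90 times runs forever, invalidating the eliminated bounds check.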
Nominating for t-compiler discussion. This tracking issue shows that we have target configurations that cut across our platform-support tiers in different ways. For example, the i686 targets are tier 1, but their "non-SSE2" variants are tier 2 (and suffer from codegen unsoundness). None of these differences are apparent in our documentation. So, as discussed on Zulip, there are probably a number of questions:
(please feel free to add context/correct how I represent the problem, thanks!) @rustbot label I-compiler-nominated
In a sense the issue stems from LLVM, yeah -- x86 without SSE seems to be a poorly supported target. Even in the best of cases, f64 just behaves incorrectly on that target (and f32, to a lesser extent), and then it turns out things are actually a lot worse. Altogether I think there are enough issues with floating point on x86 without SSE (this one, and also #115567) that IMO we should say that tier 1 hardfloat targets require SSE, period. It is already the case that using feature flags to turn a hardfloat target into a softfloat target is unsound (Cc #116344), and we should simply hard-error in those cases (e.g. disabling the x87 feature on any hardfloat x86 target). IMO we should do the same when disabling SSE/SSE2 on an i686 target.
On x86 (32bit) targets that cannot use SSE2 instructions (this includes the tier 1 i686 targets with flags that disable SSE2 support, such as `-C target-cpu=pentium`), floating-point operations can return results that are rounded in different ways than they should be, and results can be "inconsistent": depending on whether const-propagation happened, the same computation can produce different results, leading to a program that seemingly contradicts itself. This is caused by using x87 instructions to perform floating-point arithmetic, which do not accurately implement IEEE floating-point semantics (not with the right precision, anyway). Worse, LLVM can use x87 registers to store values it thinks are floats, which resets the signalling bit and thus alters the value -- leading to miscompilations.
This is a known and long-standing problem, and very hard to fix. The purpose of this issue mostly is to document its existence and to give it a URL that can be referenced.
Some ideas that have been floated for fixing this problem:

- ~~We could set the FPU control register to 64bit precision for Rust programs, and require other code to set the register in that way before calling into a Rust library.~~ (This does not work.)

Related issues:
Prior issues: