
Tracking issue: 32bit x86 targets without SSE2 have unsound floating point behavior #114479

Open
RalfJung opened this issue Aug 4, 2023 · 25 comments
Labels
A-floating-point Area: Floating point numbers and arithmetic C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. I-compiler-nominated The issue / PR has been nominated for discussion during a compiler team meeting. I-prioritize Issue: Indicates that prioritization has been requested for this issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-x86_32 Target: x86 processors, 32 bit (like i686-*)

Comments

@RalfJung
Member

RalfJung commented Aug 4, 2023

On x86 (32bit) targets that cannot use SSE2 instructions (this includes the tier 1 i686 targets with flags that disable SSE2 support, such as -C target-cpu=pentium), floating-point operations can return results that are rounded differently than they should be, and results can be "inconsistent": depending on whether const-propagation happened, the same computation can produce different results, leading to a program that seemingly contradicts itself. This is caused by using x87 instructions to perform floating-point arithmetic, which do not accurately implement IEEE floating-point semantics (not with the right precision, anyway).

Worse, LLVM can use x87 registers to store values it thinks are floats, which quiets signaling NaNs (setting the quiet bit) and thus alters the value -- leading to miscompilations.
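For readers unfamiliar with NaN quieting: the bit-level effect can be shown portably (this sketch runs on any IEEE-compliant target; it does not itself trigger the x87 miscompilation, it only shows why quieting is an observable change of value):

```rust
fn main() {
    // f32 layout: 1 sign bit | 8 exponent bits | 23 significand bits.
    // A NaN has all exponent bits set and a nonzero significand; the top
    // significand bit (0x0040_0000) distinguishes quiet (1) from signaling (0).
    let snan = f32::from_bits(0x7F80_0001); // signaling NaN
    let qnan = f32::from_bits(0x7FC0_0001); // same payload, quiet bit set
    assert!(snan.is_nan() && qnan.is_nan());

    // "Quieting" a signaling NaN sets the quiet bit, so the bit pattern
    // observably changes even though both values are NaN:
    assert_ne!(snan.to_bits(), qnan.to_bits());
    assert_eq!(snan.to_bits() | 0x0040_0000, qnan.to_bits());
}
```

On x87, an `fld`/`fstp` round trip of such a signaling-NaN pattern performs exactly this quiet-bit change, which is what LLVM's float loads/stores fail to account for.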

This is a known and long-standing problem, and very hard to fix. The purpose of this issue mostly is to document its existence and to give it a URL that can be referenced.

Some ideas that have been floated for fixing this problem:

  • We could emit different instruction sequences for floating-point operations that emulate the expected rounding behavior using x87 instructions. This will likely require changes deep in LLVM's x86 backend. This is what Java does.
  • We could use softfloats.
  • We could set the FPU control register to 64bit precision for Rust programs, and require other code to set the register that way before calling into a Rust library. (This turns out not to work; see the discussion below.)

Related issues:

Prior issues:

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 4, 2023
@RalfJung RalfJung added C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. O-x86 A-floating-point Area: Floating point numbers and arithmetic and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 4, 2023
@DemiMarie
Contributor

Can we just drop support for x86 without SSE2 or fall back to software floating point?

@RalfJung
Member Author

RalfJung commented Aug 4, 2023

Can we just drop support for x86 without SSE2

We currently have the following targets in that category:

  • i586-pc-windows-msvc
  • i586-unknown-linux-gnu
  • i586-unknown-linux-musl

They are all tier 2. I assume people added them for a reason so I doubt they will be happy about having them dropped. Not sure to what extent floating point support is needed on those targets, but I don't think we have a concept of "target without FP support".

or fall back to software floating point?

In theory of course we can, not sure if that would be any easier than following the Java approach.

@programmerjake
Member

programmerjake commented Aug 4, 2023

or fall back to software floating point?

In theory of course we can, not sure if that would be any easier than following the Java approach.

it should be much easier since LLVM already supports that, unlike the Java scheme (afaik)

https://gcc.godbolt.org/z/14WdnKhhP

@RalfJung
Member Author

FWIW f32 on x86-32-noSSE should actually be fine, since double-rounding is okay as long as the precision gap between the two modes is big enough. Only f64 has a problem since it is "too close" to the 80bit-precision of the x87 FPU.
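The double-rounding hazard described here can be mimicked in portable code by using f64 as the "extended" format and f32 as the target (the analogue of x87 f80 arithmetic later stored to f64); this is a sketch of the mechanism, not a reproduction of x87 behavior:

```rust
fn main() {
    // True real value: v = 1 + 2^-24 + 2^-60.
    // First rounding (to the wider f64 format) drops the 2^-60 tail,
    // leaving exactly 1 + 2^-24: the midpoint between the adjacent
    // f32 values 1.0 and 1.0 + 2^-23.
    let hi: f64 = 1.0 + 2.0f64.powi(-24);
    assert_eq!(hi + 2.0f64.powi(-60), hi); // tail is already lost in f64

    // Second rounding (to f32) breaks the tie to even, giving 1.0 ...
    let double_rounded: f32 = hi as f32;
    assert_eq!(double_rounded, 1.0);

    // ... but a single correct rounding of v would round *up*, because
    // the 2^-60 tail puts v strictly above the midpoint.
    let correctly_rounded = f32::from_bits(0x3F80_0001); // 1 + 2^-23
    assert_ne!(double_rounded, correctly_rounded);
}
```

The reason f32-via-f80 is safe while f64-via-f80 is not: with an 11-bit precision gap or more between the two formats (24-bit f32 vs 64-bit x87 significands), the first rounding can never land exactly on (or cross) a midpoint of the narrower format, so the second rounding agrees with a single direct rounding.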

On another note, we even have code in the standard library that temporarily alters the x87 FPU control word to ensure exact 64bit precision...

@RalfJung RalfJung changed the title Tracking issue: 32bit x86 targets have non-compliant floating point behavior Tracking issue: 32bit x86 targets without SSE have non-compliant floating point behavior Sep 5, 2023
@RalfJung RalfJung changed the title Tracking issue: 32bit x86 targets without SSE have non-compliant floating point behavior Tracking issue: 32bit x86 targets without SSE2 have non-compliant floating point behavior Sep 5, 2023
@RalfJung
Member Author

RalfJung commented Sep 5, 2023

I wonder if there's something that could be done about the fact that this also affects tier 1 targets with custom (stable) flags such as -C target-cpu=pentium. Do we still want to consider that a tier 1 target in itself? Is there a way that we can just reject flags that would disable SSE2 support, and tell people to use a different target instead?

@DemiMarie
Contributor

@RalfJung What about forcing non-SSE2 targets to software floating point? That must be supported anyway because of kernel code.

@RalfJung
Member Author

RalfJung commented Sep 5, 2023

Yeah that's an option listed above, since you already proposed it before. I have no idea how feasible it is. People are very concerned about the softfloat support for f16/f128 leading to code bloat and whatnot, so the same concerns would likely also apply here.

AFAIK the kernel code just doesn't use floats, I don't think they have softfloats?

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 3, 2023

add notes about non-compliant FP behavior on 32bit x86 targets

Based on a ton of prior discussion (see all the issues linked from rust-lang/unsafe-code-guidelines#237), the consensus seems to be that these targets are simply cursed and we cannot implement the desired semantics for them. I hope I properly understood what exactly the extent of the curse is here, let's make sure people with more in-depth FP knowledge take a close look!

In particular for the tier 3 targets I have no clue which target is affected by which particular variant of the x86_32 FP curse. I assumed that `i686` meant SSE is used so the "floating point return value" is the only problem, while everything lower (`i586`, `i386`) meant x87 is used.

I opened rust-lang#114479 to concisely describe and track the issue.

Cc `@workingjubilee` `@thomcc` `@chorman0773`  `@rust-lang/opsem`
Fixes rust-lang#73288
Fixes rust-lang#72327
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 3, 2023
Rollup merge of rust-lang#113053 - RalfJung:x86_32-float, r=workingjubilee

@Nilstrieb Nilstrieb added O-x86_32 Target: x86 processors, 32 bit (like i686-*) and removed O-x86-all labels Oct 25, 2023
@briansmith

Just to clarify, this is really only for 32-bit x86 non-SSE targets, and doesn't affect x86-64 non-SSE2 targets like x86-64-unknown-none?

Can we just drop support for x86 without SSE2

I would guess we'll need to support i686-unknown-none like x86-64-unknown-none for use in operating system kernels like Linux that don't allow vector registers to be used (without extra work) even when the hardware has them.

or fall back to software floating point?

That appears to be what x86_64-unknown-none does, one of the non-SSE2-by-default x86-64 targets. At least the table at https://doc.rust-lang.org/beta/rustc/platform-support.html says "softfloat", though the detailed target documentation at https://doc.rust-lang.org/beta/rustc/platform-support/x86_64-unknown-none.html doesn't mention softfloat.

@RalfJung
Member Author

RalfJung commented Nov 12, 2023

Just to clarify, this is really only for 32-bit x86 non-SSE targets, and doesn't affect x86-64 non-SSE2 targets like x86-64-unknown-none?

x86-64-unknown-none uses softfloats so it should not be affected.
I don't think there are any x86-64 hardfloat targets without SSE.

I would guess we'll need to support i686-unknown-none like x86-64-unknown-none for use in operating system kernels like Linux that don't allow vector registers to be used (without extra work) even when the hardware has them.

I think they were asking about targets where the hardware doesn't have SSE.
Softfloat targets are fine, assuming the softfloat libraries implement the IEEE spec correctly.

@Jules-Bertholet
Contributor

Jules-Bertholet commented Nov 13, 2023

We could set the FPU control register to 64bit precision for Rust programs, and require other code to set the register in that way before calling into a Rust library.

This is not sufficient to ensure full IEEE754 compliance (for example see here).

@RalfJung
Member Author

What's described there sounds different. That is using volatile to force rounding from "extended precision" to "double precision" after each operation. But of course the actual operation is still done with extended precision, which leads to double-rounding, which leads to wrong results.

The proposal you quoted was to switch x87 precision such that the operation is performed with double-precision to begin with, entirely avoiding double rounding.

It is possible that the x87 has further issues that make this not work, but the link you posted does not even seem to mention the idea of changing the FPU control register to get a different precision, so as far as I can see it doesn't provide any evidence that setting the x87 to 64bit precision would lead to incorrect results.

@Jules-Bertholet
Contributor

Jules-Bertholet commented Nov 13, 2023

From the post (emphasis mine):

Floating-point calculations done with x87 FPU instructions are done in extended-precision registers, even when the processor is in double-precision mode and the variables being operated on are double-precision.

@RalfJung
Member Author

Hm, my understanding was that switching the FPU mode to 64bit would solve the problem. But of course it's possible that x87 screws up even when explicitly asked for IEEE 754 64bit arithmetic. 🤷

Requiring the FPU to be in 64bit mode is anyway not a realistic option, I listed it just for completeness' sake.

@programmerjake
Member

running x87 in 53-bit mode works except that denormal f64 results have too much precision and the exponent range is too big (e.g. it can express 2^-8000 but f64 can't)
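The exponent-range half of this point can be illustrated portably (this runs on any target with IEEE-compliant f64, e.g. x86-64; the x87 comparison in the comments is the claim being illustrated, not something the code exercises):

```rust
fn main() {
    // x87 extended precision has a 15-bit exponent field, so a register
    // can hold magnitudes like 2^-8000. f64's 11-bit exponent bottoms
    // out at the smallest denormal, 2^-1074 (bit pattern 0x1):
    assert_eq!(f64::from_bits(1), 5e-324);

    // Halving past that point underflows to exactly zero in f64, whereas
    // the same sequence kept in an x87 register would remain nonzero:
    let mut x: f64 = 1.0;
    for _ in 0..8000 {
        x *= 0.5;
    }
    assert_eq!(x, 0.0);
}
```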

@RalfJung
Member Author

Does the Java encoding behave correctly for denormals?

That it can express too many exponents wouldn't be a problem if the extra precision doesn't lead to different rounding results.

@Jules-Bertholet
Contributor

Yes, Java behaves correctly in all cases. It scales the exponent of one of the arguments before multiplication and division to ensure that where a 64-bit op would evaluate to a denormal, the 80-bit op does the same (and then it scales the exponent of the result back to what it should be). This is all described in the PDF linked from OP.

@beetrees
Contributor

beetrees commented Apr 23, 2024

Unfortunately, this doesn't just affect floating-point arithmetic, as LLVM will load to and store from the x87 floating-point stack even when merely moving floats around. Loading and storing f32s and f64s via the x87 floating-point stack quiets signalling NaNs, but LLVM assumes that it does not, so this can lead to miscompilations. For instance, the following program will segfault when run after compiling with optimisations for an i586 target (e.g. rustc -O --target=i586-unknown-linux-gnu code.rs).

#[derive(Copy, Clone)]
#[repr(u32)]
enum Inner {
    // Same bit pattern as a signalling NaN `f32`.
    A = (u32::MAX << 23) | 1,
    B,
}

#[derive(Copy, Clone)]
enum Data {
    I(Inner),
    F(f32),
}

#[inline(never)]
fn store_data(data: Data, data_out: &mut Data) {
    // Suggest to LLVM that the data payload is a float.
    std::hint::black_box(match data {
        Data::I(x) => 0.0,
        Data::F(x) => x,
    });
    // LLVM will optimise this to a float load and store (with a separate load/store for the discriminant).
    *data_out = match data {
        Data::I(x) => Data::I(x),
        Data::F(x) => Data::F(x),
    };
}

fn main() {
    let mut res = Data::I(Inner::A);
    store_data(Data::I(Inner::A), &mut res);
    if let Data::I(res) = res {
        // LLVM will optimise out the bounds check as the index should always be in range.
        let index = (res as u32 - (u32::MAX << 23)) as usize;
        dbg!([1, 2, 3, 4, 5][index]); // This will segfault.
    } else {
        unreachable!();
    }
}

@RalfJung
Member Author

Wow, that's a great example. It even works in entirely safe code. Impressive.

We probably have to upgrade this issue to I-unsound then. Is there an upstream LLVM issue for this miscompilation?

@RalfJung RalfJung changed the title Tracking issue: 32bit x86 targets without SSE2 have non-compliant floating point behavior Tracking issue: 32bit x86 targets without SSE2 have unsound floating point behavior Apr 23, 2024
@RalfJung RalfJung added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Apr 23, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Apr 23, 2024
@Muon

Muon commented Apr 23, 2024

That is impressive. It looks closely related to llvm/llvm-project#44218. Perhaps it's yet another example to toss on that pile? I'm honestly starting to think LLVM should just deprecate f32 and f64 support on x87 targets.

@taiki-e
Member

taiki-e commented Apr 23, 2024

As loading and storing f32s and f64s to and from the x87 floating point stack quiets signalling NaNs

(I have not tested it, but I wonder if this x87 behavior actually affects not only floats, but also the i586's 64-bit atomic load/store, which uses x87 load/store instructions.) EDIT: sorry, this comment of mine is not correct: see #114479 (comment)

@beetrees
Contributor

Atomic loads/stores use the integer-to-float/float-to-integer load/store instructions (fild/fistp) as opposed to the float-to-float load/store instructions (fld/fstp) and are therefore unaffected.

@beetrees
Contributor

The above example will segfault on current stable (1.77.2), but not on current nightly (2024-04-22). Making the data argument of the store_data function an &mut reference makes it segfault on both:

#[derive(Copy, Clone)]
#[repr(u32)]
enum Inner {
    // Same bit pattern as a signalling NaN `f32`.
    A = (u32::MAX << 23) | 1,
    B,
}

#[derive(Copy, Clone)]
enum Data {
    I(Inner),
    F(f32),
}

#[inline(never)]
fn store_data(data: &mut Data, data_out: &mut Data) {
    // Suggest to LLVM that the data payload is a float.
    std::hint::black_box(match *data {
        Data::I(x) => 0.0,
        Data::F(x) => x,
    });
    // LLVM will optimise this to a float load and store (with a separate load/store for the discriminant).
    *data_out = match *data {
        Data::I(x) => Data::I(x),
        Data::F(x) => Data::F(x),
    };
}

fn main() {
    let mut res = Data::I(Inner::A);
    store_data(&mut Data::I(Inner::A), &mut res);
    if let Data::I(res) = res {
        // LLVM will optimise out the bounds check as the index should always be in range.
        let index = (res as u32 - (u32::MAX << 23)) as usize;
        dbg!([1, 2, 3, 4, 5][index]); // This will segfault.
    } else {
        unreachable!();
    }
}

@beetrees
Contributor

beetrees commented Apr 23, 2024

It's also possible to cause miscompilations due to the difference between what floats evaluate to at compile time vs. at runtime. The following program, a very lightly modified version of @comex's example from a different issue (rust-lang/unsafe-code-guidelines#471 (comment), see that comment for more details on how this works), will segfault when run after compiling with optimisations on i586 targets:

#[inline(never)]
fn print_vals(x: f32, i: usize, vals_i: u32) {
    println!("x={x} i={i} vals[i]={vals_i}");
}

#[inline(never)]
pub fn evil(vals: &[u32; 300]) {
    // Loop variables:
    let mut x: f32 = 0.0; // increments by 1-and-a-bit every time
    let mut i: usize = 0; // increments by 2 every time

    while x != 90.0 {
        // LLVM will do a brute-force evaluation of this loop for up to 100
        // iterations to try to calculate an iteration count.  (See
        // `llvm/lib/Analysis/ScalarEvolution.cpp`.)  Under normal floating
        // point semantics, `x` will equal exactly 90.0 after 90 iterations;
        // LLVM discovers this by brute-force evaluation and concludes that the
        // iteration count is always 90.

        // Now, if this loop executes 90 times, then `i` must be in the range
        // `0..180`, so the bounds check in `vals[i]` should always pass, so
        // LLVM eliminates it.
        print_vals(x, i, vals[i]);

        // Update `x`.  The exact computation doesn't matter that much; it just
        // needs to:
        //   (a) be possible to constant-evaluate by brute force (i.e. by going
        //       through each iteration one at a time);
        //   (b) be too complex for IndVarSimplifyPass to simplify *without*
        //       brute force;
        //   (c) depend on floating point accuracy.

        // First increment `x`, to make sure it's not just the same value every
        // time (either in LLVM's opinion or in reality):
        x += 1.0;

        // This adds a small float to `x`. This should get rounded to no change
        // as the float being added is too small to make a difference to `f32`'s
        // 23-bit fraction. However, it will make a difference to the value of
        // the `f80` on the x87 floating point stack. This means that `x` will
        // no longer be a whole number and will never hit exactly 90.0.
        x += (1.0_f32 / 2.0_f32.powi(25));

        // Update `i`, the integer we use to index into `vals`.  Why increment
        // by 2 instead of 1?  Because if we increment by 1, then LLVM notices
        // that `i` happens to be equal to the loop count, and therefore it can
        // replace the loop condition with `while i != 90`.  With `i` as-is,
        // LLVM could hypothetically replace the loop condition with
        // `while i != 180`, but it doesn't.
        i += 2;

    }
}

pub fn main() {
    // Make an array on the stack:
    let mut vals: [u32; 300] = [0; 300];
    for i in 0..300 { vals[i as usize] = i; }
    evil(&vals);
}

@apiraino
Contributor

apiraino commented Apr 25, 2024

Nominating for t-compiler discussion.

This tracking issue shows that we have targets that intersect the tier platform support levels in different ways. For example, i686 targets are tier 1, but the "non-SSE2" targets are tier 2 (and suffer from codegen unsoundness). None of these differences are apparent in our documentation.

So as discussed on Zulip there are probably a number of questions:

  • targets that just don't have SSE2, like i586 -- these are tier 2, maybe critical codegen bugs are "fine" there?
  • tier 1 targets with SSE2 disabled. We don't have a way to say "that's actually a tier 2 situation", so maybe we should just emit a hard error? Or at least a warning?
  • (to which I also add): if these issues stem from LLVM, can we faithfully represent a realistic situation or should we just point at LLVM and say "we just do what they do"?

(please feel free to add context/correct how I represent the problem, thanks!)

@rustbot label I-compiler-nominated

@rustbot rustbot added the I-compiler-nominated The issue / PR has been nominated for discussion during a compiler team meeting. label Apr 25, 2024
@RalfJung
Member Author

In a sense the issue stems from LLVM, yeah -- x86 without SSE seems to be a poorly supported target. Even in the best of cases, f64 just behaves incorrectly on that target (and f32, to a lesser extent), and then it turns out things are actually a lot worse.

Altogether I think there are enough issues with floating point on x86 without SSE (this one, and also #115567) that IMO we should say that tier 1 hardfloat targets require SSE period. It is already the case that using feature flags to turn a hardfloat target into a softfloat target is unsound (Cc #116344), and we should simply hard-error in those cases (e.g. disabling the x87 feature on any hardfloat x86 target). IMO we should do the same when disabling SSE/SSE2 on an i686 target.
