
Tracking issue: 32bit x86 targets without SSE2 have unsound floating point behavior #114479

Open
RalfJung opened this issue Aug 4, 2023 · 25 comments
Labels
A-floating-point Area: Floating point numbers and arithmetic C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. I-compiler-nominated The issue / PR has been nominated for discussion during a compiler team meeting. I-prioritize Issue: Indicates that prioritization has been requested for this issue. I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness O-x86_32 Target: x86 processors, 32 bit (like i686-*)

Comments

@RalfJung
Member

RalfJung commented Aug 4, 2023

On x86 (32bit) targets that cannot use SSE2 instructions (this includes the tier 1 i686 targets with flags that disable SSE2 support, such as -C target-cpu=pentium), floating-point operations can return results that are rounded differently than they should be, and results can be "inconsistent": depending on whether const-propagation happened, the same computation can produce different results, leading to a program that seemingly contradicts itself. This is caused by using x87 instructions to perform floating-point arithmetic, which do not accurately implement IEEE floating-point semantics (not with the right precision, anyway).

Worse, LLVM can use x87 registers to store values it thinks are floats, which quiets signaling NaNs (setting the quiet bit) and thus alters the value -- leading to miscompilations.
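For readers unfamiliar with NaN quieting: the bit-level effect can be shown portably (this sketch runs on any IEEE-compliant target; it does not itself trigger the x87 miscompilation, it only shows why quieting is an observable change of value):

```rust
fn main() {
    // f32 layout: 1 sign bit | 8 exponent bits | 23 significand bits.
    // A NaN has all exponent bits set and a nonzero significand; the top
    // significand bit (0x0040_0000) distinguishes quiet (1) from signaling (0).
    let snan = f32::from_bits(0x7F80_0001); // signaling NaN
    let qnan = f32::from_bits(0x7FC0_0001); // same payload, quiet bit set
    assert!(snan.is_nan() && qnan.is_nan());

    // "Quieting" a signaling NaN sets the quiet bit, so the bit pattern
    // observably changes even though both values are NaN:
    assert_ne!(snan.to_bits(), qnan.to_bits());
    assert_eq!(snan.to_bits() | 0x0040_0000, qnan.to_bits());
}
```

On x87, an `fld`/`fstp` round trip of such a signaling-NaN pattern performs exactly this quiet-bit change, which is what LLVM's float loads/stores fail to account for.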

This is a known and long-standing problem, and very hard to fix. The purpose of this issue mostly is to document its existence and to give it a URL that can be referenced.

Some ideas that have been floated for fixing this problem:

  • We could emit different instruction sequences for floating-point operations that emulate the expected rounding behavior using x87 instructions. This will likely require changes deep in LLVM's x86 backend. This is what Java does.
  • We could use softfloats.
  • We could set the FPU control register to 64bit precision for Rust programs, and require other code to set the register that way before calling into a Rust library. (This turns out not to work; see the discussion below.)

Related issues:

Prior issues:

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Aug 4, 2023
@RalfJung RalfJung added C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. O-x86 A-floating-point Area: Floating point numbers and arithmetic and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Aug 4, 2023
@DemiMarie
Contributor

Can we just drop support for x86 without SSE2 or fall back to software floating point?

@RalfJung
Member Author

RalfJung commented Aug 4, 2023

Can we just drop support for x86 without SSE2

We currently have the following targets in that category:

  • i586-pc-windows-msvc
  • i586-unknown-linux-gnu
  • i586-unknown-linux-musl

They are all tier 2. I assume people added them for a reason so I doubt they will be happy about having them dropped. Not sure to what extent floating point support is needed on those targets, but I don't think we have a concept of "target without FP support".

or fall back to software floating point?

In theory of course we can, not sure if that would be any easier than following the Java approach.

@programmerjake
Member

programmerjake commented Aug 4, 2023

or fall back to software floating point?

In theory of course we can, not sure if that would be any easier than following the Java approach.

it should be much easier since LLVM already supports that, unlike the Java scheme (afaik)

https://gcc.godbolt.org/z/14WdnKhhP

@RalfJung
Member Author

FWIW f32 on x86-32-noSSE should actually be fine, since double-rounding is okay as long as the precision gap between the two modes is big enough. Only f64 has a problem since it is "too close" to the 80bit-precision of the x87 FPU.
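The double-rounding hazard described here can be mimicked in portable code by using f64 as the "extended" format and f32 as the target (the analogue of x87 f80 arithmetic later stored to f64); this is a sketch of the mechanism, not a reproduction of x87 behavior:

```rust
fn main() {
    // True real value: v = 1 + 2^-24 + 2^-60.
    // First rounding (to the wider f64 format) drops the 2^-60 tail,
    // leaving exactly 1 + 2^-24: the midpoint between the adjacent
    // f32 values 1.0 and 1.0 + 2^-23.
    let hi: f64 = 1.0 + 2.0f64.powi(-24);
    assert_eq!(hi + 2.0f64.powi(-60), hi); // tail is already lost in f64

    // Second rounding (to f32) breaks the tie to even, giving 1.0 ...
    let double_rounded: f32 = hi as f32;
    assert_eq!(double_rounded, 1.0);

    // ... but a single correct rounding of v would round *up*, because
    // the 2^-60 tail puts v strictly above the midpoint.
    let correctly_rounded = f32::from_bits(0x3F80_0001); // 1 + 2^-23
    assert_ne!(double_rounded, correctly_rounded);
}
```

The reason f32-via-f80 is safe while f64-via-f80 is not: with an 11-bit precision gap or more between the two formats (24-bit f32 vs 64-bit x87 significands), the first rounding can never land exactly on (or cross) a midpoint of the narrower format, so the second rounding agrees with a single direct rounding.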

On another note, we even have code in the standard library that temporarily alters the x87 FPU control word to ensure exact 64bit precision...

@RalfJung RalfJung changed the title Tracking issue: 32bit x86 targets have non-compliant floating point behavior Tracking issue: 32bit x86 targets without SSE have non-compliant floating point behavior Sep 5, 2023
@RalfJung RalfJung changed the title Tracking issue: 32bit x86 targets without SSE have non-compliant floating point behavior Tracking issue: 32bit x86 targets without SSE2 have non-compliant floating point behavior Sep 5, 2023
@RalfJung
Member Author

RalfJung commented Sep 5, 2023

I wonder if there's something that could be done about the fact that this also affects tier 1 targets with custom (stable) flags such as -C target-cpu=pentium. Do we still want to consider that a tier 1 target in itself? Is there a way that we can just reject flags that would disable SSE2 support, and tell people to use a different target instead?

@DemiMarie
Contributor

@RalfJung What about forcing non-SSE2 targets to software floating point? That must be supported anyway because of kernel code.

@RalfJung
Member Author

RalfJung commented Sep 5, 2023

Yeah that's an option listed above, since you already proposed it before. I have no idea how feasible it is. People are very concerned about the softfloat support for f16/f128 leading to code bloat and whatnot, so the same concerns would likely also apply here.

AFAIK the kernel code just doesn't use floats, I don't think they have softfloats?

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 3, 2023

add notes about non-compliant FP behavior on 32bit x86 targets

Based on a ton of prior discussion (see all the issues linked from rust-lang/unsafe-code-guidelines#237), the consensus seems to be that these targets are simply cursed and we cannot implement the desired semantics for them. I hope I properly understood what exactly the extent of the curse is here, let's make sure people with more in-depth FP knowledge take a close look!

In particular for the tier 3 targets I have no clue which target is affected by which particular variant of the x86_32 FP curse. I assumed that `i686` meant SSE is used so the "floating point return value" is the only problem, while everything lower (`i586`, `i386`) meant x87 is used.

I opened rust-lang#114479 to concisely describe and track the issue.

Cc `@workingjubilee` `@thomcc` `@chorman0773`  `@rust-lang/opsem`
Fixes rust-lang#73288
Fixes rust-lang#72327
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 3, 2023
Rollup merge of rust-lang#113053 - RalfJung:x86_32-float, r=workingjubilee

@Nilstrieb Nilstrieb added O-x86_32 Target: x86 processors, 32 bit (like i686-*) and removed O-x86-all labels Oct 25, 2023
@briansmith

Just to clarify, this is really only for 32-bit x86 non-SSE targets, and doesn't affect x86-64 non-SSE2 targets like x86-64-unknown-none?

Can we just drop support for x86 without SSE2

I would guess we'll need to support i686-unknown-none like x86-64-unknown-none for use in operating system kernels like Linux that don't allow vector registers to be used (without extra work) even when the hardware has them.

or fall back to software floating point?

That appears to be what x86_64-unknown-none does, one of the non-SSE2-by-default x86-64 targets. At least the table at https://doc.rust-lang.org/beta/rustc/platform-support.html says "softfloat", though the detailed target documentation at https://doc.rust-lang.org/beta/rustc/platform-support/x86_64-unknown-none.html doesn't mention softfloat.

@RalfJung
Member Author

RalfJung commented Nov 12, 2023

Just to clarify, this is really only for 32-bit x86 non-SSE targets, and doesn't affect x86-64 non-SSE2 targets like x86-64-unknown-none?

x86-64-unknown-none uses softfloats so it should not be affected.
I don't think there are any x86-64 hardfloat targets without SSE.

I would guess we'll need to support i686-unknown-none like x86-64-unknown-none for use in operating system kernels like Linux that don't allow vector registers to be used (without extra work) even when the hardware has them.

I think they were asking about targets where the hardware doesn't have SSE.
Softfloat targets are fine, assuming the softfloat libraries implement the IEEE spec correctly.

@Jules-Bertholet
Contributor

Jules-Bertholet commented Nov 13, 2023

We could set the FPU control register to 64bit precision for Rust programs, and require other code to set the register in that way before calling into a Rust library.

This is not sufficient to ensure full IEEE754 compliance (for example see here).

@RalfJung
Member Author

What's described there sounds different. That is using volatile to force rounding from "extended precision" to "double precision" after each operation. But of course the actual operation is still done with extended precision, which leads to double-rounding, which leads to wrong results.

The proposal you quoted was to switch x87 precision such that the operation is performed with double-precision to begin with, entirely avoiding double rounding.

It is possible that the x87 has further issues that make this not work, but the link you posted does not even seem to mention the idea of changing the FPU control register to get a different precision, so as far as I can see it doesn't provide any evidence that setting the x87 to 64bit precision would lead to incorrect results.

@Jules-Bertholet
Contributor

Jules-Bertholet commented Nov 13, 2023

From the post (emphasis mine):

Floating-point calculations done with x87 FPU instructions are done in extended-precision registers, even when the processor is in double-precision mode and the variables being operated on are double-precision.

@RalfJung
Member Author

Hm, my understanding was that switching the FPU mode to 64bit would solve the problem. But of course it's possible that x87 screws up even when explicitly asked for IEEE 754 64bit arithmetic. 🤷

Requiring the FPU to be in 64bit mode is anyway not a realistic option, I listed it just for completeness' sake.

@programmerjake
Member

running x87 in 53-bit mode works except that denormal f64 results have too much precision and the exponent range is too big (e.g. it can express 2^-8000 but f64 can't)
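The exponent-range half of this point can be illustrated portably (this runs on any target with IEEE-compliant f64, e.g. x86-64; the x87 comparison in the comments is the claim being illustrated, not something the code exercises):

```rust
fn main() {
    // x87 extended precision has a 15-bit exponent field, so a register
    // can hold magnitudes like 2^-8000. f64's 11-bit exponent bottoms
    // out at the smallest denormal, 2^-1074 (bit pattern 0x1):
    assert_eq!(f64::from_bits(1), 5e-324);

    // Halving past that point underflows to exactly zero in f64, whereas
    // the same sequence kept in an x87 register would remain nonzero:
    let mut x: f64 = 1.0;
    for _ in 0..8000 {
        x *= 0.5;
    }
    assert_eq!(x, 0.0);
}
```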

@RalfJung
Member Author

Does the Java encoding behave correctly for denormals?

That it can express too many exponents wouldn't be a problem if the extra precision doesn't lead to different rounding results.

@Jules-Bertholet
Contributor

Yes, Java behaves correctly in all cases. It scales the exponent of one of the arguments before multiplication and division to ensure that where a 64-bit op would evaluate to a denormal, the 80-bit op does the same (and then it scales the exponent of the result back to what it should be). This is all described in the PDF linked from OP.

@beetrees
Contributor

beetrees commented Apr 23, 2024

Unfortunately, this doesn't just affect floating-point arithmetic, as LLVM will load to and store from the x87 floating-point stack even when merely moving floats around. Loading and storing f32s and f64s via the x87 floating-point stack quiets signalling NaNs, but LLVM assumes that it does not, so this can lead to miscompilations. For instance, the following program will segfault when run after compiling with optimisations for an i586 target (e.g. rustc -O --target=i586-unknown-linux-gnu code.rs).

#[derive(Copy, Clone)]
#[repr(u32)]
enum Inner {
    // Same bit pattern as a signalling NaN `f32`.
    A = (u32::MAX << 23) | 1,
    B,
}

#[derive(Copy, Clone)]
enum Data {
    I(Inner),
    F(f32),
}

#[inline(never)]
fn store_data(data: Data, data_out: &mut Data) {
    // Suggest to LLVM that the data payload is a float.
    std::hint::black_box(match data {
        Data::I(x) => 0.0,
        Data::F(x) => x,
    });
    // LLVM will optimise this to a float load and store (with a separate load/store for the discriminant).
    *data_out = match data {
        Data::I(x) => Data::I(x),
        Data::F(x) => Data::F(x),
    };
}

fn main() {
    let mut res = Data::I(Inner::A);
    store_data(Data::I(Inner::A), &mut res);
    if let Data::I(res) = res {
        // LLVM will optimise out the bounds check as the index should always be in range.
        let index = (res as u32 - (u32::MAX << 23)) as usize;
        dbg!([1, 2, 3, 4, 5][index]); // This will segfault.
    } else {
        unreachable!();
    }
}

@RalfJung
Member Author

Wow, that's a great example. It even works in entirely safe code. Impressive.

We probably have to upgrade this issue to I-unsound then. Is there an upstream LLVM issue for this miscompilation?

@RalfJung RalfJung changed the title Tracking issue: 32bit x86 targets without SSE2 have non-compliant floating point behavior Tracking issue: 32bit x86 targets without SSE2 have unsound floating point behavior Apr 23, 2024
@RalfJung RalfJung added the I-unsound Issue: A soundness hole (worst kind of bug), see: https://en.wikipedia.org/wiki/Soundness label Apr 23, 2024
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Apr 23, 2024
@Muon

Muon commented Apr 23, 2024

That is impressive. It looks closely related to llvm/llvm-project#44218. Perhaps it's yet another example to toss on that pile? I'm honestly starting to think LLVM should just deprecate f32 and f64 support on x87 targets.

@taiki-e
Member

taiki-e commented Apr 23, 2024

As loading and storing f32s and f64s to and from the x87 floating point stack quiets signalling NaNs

(I have not tested it, but I wonder if this x87 behavior actually affects not only floats, but also the i586's 64-bit atomic load/store, which uses x87 load/store instructions.) EDIT: sorry, this comment of mine is not correct: see #114479 (comment)

@beetrees
Contributor

Atomic loads/stores use the integer-to-float/float-to-integer load/store instructions (fild/fistp) as opposed to the float-to-float load/store instructions (fld/fstp) and are therefore unaffected.

@beetrees
Contributor

The above example will segfault on current stable (1.77.2), but not on current nightly (2024-04-22). Making the data argument of the store_data function an &mut reference makes it segfault on both:

#[derive(Copy, Clone)]
#[repr(u32)]
enum Inner {
    // Same bit pattern as a signalling NaN `f32`.
    A = (u32::MAX << 23) | 1,
    B,
}

#[derive(Copy, Clone)]
enum Data {
    I(Inner),
    F(f32),
}

#[inline(never)]
fn store_data(data: &mut Data, data_out: &mut Data) {
    // Suggest to LLVM that the data payload is a float.
    std::hint::black_box(match *data {
        Data::I(x) => 0.0,
        Data::F(x) => x,
    });
    // LLVM will optimise this to a float load and store (with a separate load/store for the discriminant).
    *data_out = match *data {
        Data::I(x) => Data::I(x),
        Data::F(x) => Data::F(x),
    };
}

fn main() {
    let mut res = Data::I(Inner::A);
    store_data(&mut Data::I(Inner::A), &mut res);
    if let Data::I(res) = res {
        // LLVM will optimise out the bounds check as the index should always be in range.
        let index = (res as u32 - (u32::MAX << 23)) as usize;
        dbg!([1, 2, 3, 4, 5][index]); // This will segfault.
    } else {
        unreachable!();
    }
}

@beetrees
Contributor

beetrees commented Apr 23, 2024

It's also possible to cause miscompilations due to the difference between what floats evaluate to at compile time vs. at runtime. The following program, a very lightly modified version of @comex's example from a different issue (rust-lang/unsafe-code-guidelines#471 (comment), see that comment for more details on how this works), will segfault when run after compiling with optimisations on i586 targets:

#[inline(never)]
fn print_vals(x: f32, i: usize, vals_i: u32) {
    println!("x={x} i={i} vals[i]={vals_i}");
}

#[inline(never)]
pub fn evil(vals: &[u32; 300]) {
    // Loop variables:
    let mut x: f32 = 0.0; // increments by 1-and-a-bit every time
    let mut i: usize = 0; // increments by 2 every time

    while x != 90.0 {
        // LLVM will do a brute-force evaluation of this loop for up to 100
        // iterations to try to calculate an iteration count.  (See
        // `llvm/lib/Analysis/ScalarEvolution.cpp`.)  Under normal floating
        // point semantics, `x` will equal exactly 90.0 after 90 iterations;
        // LLVM discovers this by brute-force evaluation and concludes that the
        // iteration count is always 90.

        // Now, if this loop executes 90 times, then `i` must be in the range
        // `0..180`, so the bounds check in `vals[i]` should always pass, so
        // LLVM eliminates it.
        print_vals(x, i, vals[i]);

        // Update `x`.  The exact computation doesn't matter that much; it just
        // needs to:
        //   (a) be possible to constant-evaluate by brute force (i.e. by going
        //       through each iteration one at a time);
        //   (b) be too complex for IndVarSimplifyPass to simplify *without*
        //       brute force;
        //   (c) depend on floating point accuracy.

        // First increment `x`, to make sure it's not just the same value every
        // time (either in LLVM's opinion or in reality):
        x += 1.0;

        // This adds a small float to `x`. This should get rounded to no change
        // as the float being added is too small to make a difference to `f32`'s
        // 23-bit fraction. However, it will make a difference to the value of
        // the `f80` on the x87 floating point stack. This means that `x` will
        // no longer be a whole number and will never hit exactly 90.0.
        x += (1.0_f32 / 2.0_f32.powi(25));

        // Update `i`, the integer we use to index into `vals`.  Why increment
        // by 2 instead of 1?  Because if we increment by 1, then LLVM notices
        // that `i` happens to be equal to the loop count, and therefore it can
        // replace the loop condition with `while i != 90`.  With `i` as-is,
        // LLVM could hypothetically replace the loop condition with
        // `while i != 180`, but it doesn't.
        i += 2;

    }
}

pub fn main() {
    // Make an array on the stack:
    let mut vals: [u32; 300] = [0; 300];
    for i in 0..300 { vals[i as usize] = i; }
    evil(&vals);
}

@apiraino
Contributor

apiraino commented Apr 25, 2024

Nominating for t-compiler discussion.

This tracking issue shows that we have targets that intersect the tier platform support levels in different ways. For example, i686 targets are tier 1, but the "non-SSE2" targets are tier 2 (and suffer from codegen unsoundness). None of these differences are apparent in our documentation.

So as discussed on Zulip there are probably a number of questions:

  • targets that just don't have SSE2, like i586 -- these are tier 2, maybe critical codegen bugs are "fine" there?
  • tier 1 targets with SSE2 disabled. We don't have a way to say "that's actually a tier 2 situation", so maybe we should just emit a hard error? Or at least a warning?
  • (to which I also add): if these issues stem from LLVM, can we faithfully represent a realistic situation or should we just point at LLVM and say "we just do what they do"?

(please feel free to add context/correct how I represent the problem, thanks!)

@rustbot label I-compiler-nominated

@rustbot rustbot added the I-compiler-nominated The issue / PR has been nominated for discussion during a compiler team meeting. label Apr 25, 2024
@RalfJung
Member Author

In a sense the issue stems from LLVM, yeah -- x86 without SSE seems to be a poorly supported target. Even in the best of cases, f64 just behaves incorrectly on that target (and f32, to a lesser extent), and then it turns out things are actually a lot worse.

Altogether I think there are enough issues with floating point on x86 without SSE (this one, and also #115567) that IMO we should say that tier 1 hardfloat targets require SSE period. It is already the case that using feature flags to turn a hardfloat target into a softfloat target is unsound (Cc #116344), and we should simply hard-error in those cases (e.g. disabling the x87 feature on any hardfloat x86 target). IMO we should do the same when disabling SSE/SSE2 on an i686 target.
