Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should std::{f32,f64}::NAN be a QNAN or SNAN ? #52897

Open
gnzlbg opened this issue Jul 31, 2018 · 15 comments
Open

Should std::{f32,f64}::NAN be a QNAN or SNAN ? #52897

gnzlbg opened this issue Jul 31, 2018 · 15 comments
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools A-floating-point Area: Floating point numbers and arithmetic T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 31, 2018

Right now, whether they are a QNAN or an SNAN depends on the architecture. Currently, they are an SNAN on MIPS (EDIT: r5 and older) at least, and a QNAN on most others.

It probably makes sense to offer consistent behavior here independently of whether LLVM preservers payloads, signaling/quietness, etc.

cc @rkruppe @draganmladjenovic

@est31
Copy link
Member

est31 commented Jul 31, 2018

Note that "on MIPS at least" is imprecise: which bit pattern is considered quiet and which pattern is considered signaling depends on the used MIPS version and configuration. See #46246 for more info.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jul 31, 2018

@est31 indeed. #52666 suggests leaving the current mips targets for the "older" mips versions (r5 and older), and to add a new targets (e.g. mipsisa64r6-unknown-linux-gnuabi64) for the newer versions (r6 and newer).

MIPS in the issue refers to r5 and older since currently we don't have any targets for r6. I've added an edit to clarify that.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Jul 31, 2018

It probably makes sense to offer consistent behavior here independently of whether LLVM preservers payloads, signaling/quietness, etc.

Why? Considering how LLVM behaves (and the differences in how hardware handle payloads, too!), I can't think of any (edit: portable) use case that can reliably use any particular NAN constant for a bitwise equality check. It seems to me you'd always want to test with is_nan().

@jonas-schievink jonas-schievink added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jan 27, 2019
@varkor varkor added the A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools label May 28, 2019
@varkor
Copy link
Member

varkor commented May 28, 2019

The guarantees provided by std::{f32,f64}::NAN should be documented. At the moment, there is no information, but it might be helpful to be explicit about the fact that nothing is guaranteed other than being a NaN.

@silwol
Copy link

silwol commented Feb 5, 2020

In Debian, we hit what I believe to be the issue at hand.
I bisected the rustup versions of compilers, and it seems the behavior changed from rustc 1.36 to 1.37. Was this intentional, or maybe triggered by an update of a library used by rustc, e.g. llvm?

The issue surfaced in rust-num/num-traits#151 where the tests worked before and don't work any longer now in mipsel and mips64el.

@nikic
Copy link
Contributor

nikic commented Feb 5, 2020

@silwol I'd expect that minnum/maxnum use libcall legalization on mips, so you're likely seeing libm behavior here. Can you check whether that's the case?

Per the C standard, I believe fmin/fmax should be treating SNaN the same as QNaN (contrary to IEEE754).

@silwol
Copy link

silwol commented Feb 6, 2020

@nikic I hope I checked this the right way. If not, please give me a hint how / where to try next. The important information here is that it's not the num-traits code that fails, but instead there is a doctest inside the project which falls back to std's min and max implementation if their std feature is enabled, and that is what fails.

Here is what I tried:

$ cat example.c 
#include <math.h>
#include <stdio.h>

int main() {
    double a = 1;
    double b = NAN;
    printf("fmin(%f, %f) = %f\n", a, b, fmin(a, b));
    return 0;
}
$ clang -Wall -Werror -std=c99 -lm -fno-builtin -o example example.c 
$ ./example
fmin(1.000000, nan) = 1.000000
$

This behaves exactly the same on Debian mipsel and amd64.

In contrast, the following behaved the same on these architectures with rustc up to v1.36.0 (installed through rustup.rs), and fails on mipsel since rustc v1.37.0.

General information on my mipsel VM:

$ uname -a
Linux debian 5.4.0-3-4kc-malta #1 SMP Debian 5.4.13-1 (2020-01-19) mips GNU/Linux
$ cat src/lib.rs 
#[cfg(test)]
mod tests {
    #[test]
    fn min() {
        let nan = 0f64 / 0f64;
        assert_eq!(1f64.min(nan), 1f64);
    }

    #[test]
    fn max() {
        let nan = 0f64 / 0f64;
        assert_eq!(1f64.max(nan), 1f64);
    }
}

Trying with latest stable rustc installed with rustup, behaves the same with the rustc 1.40.0 installed from the Debian repository:

$ rustc --version
rustc 1.41.0 (5e1a79984 2020-01-27)
$ cargo test
    Finished test [unoptimized + debuginfo] target(s) in 0.57s
     Running target/debug/deps/floatextremestest-23756e6931b17aa8

running 2 tests
test tests::max ... FAILED
test tests::min ... FAILED

failures:

---- tests::max stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `NaN`,
 right: `1.0`', src/lib.rs:12:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

---- tests::min stdout ----
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `NaN`,
 right: `1.0`', src/lib.rs:6:9


failures:
    tests::max
    tests::min

test result: FAILED. 0 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out

Trying with latest known-to-work stable rustc:

$ rustc --version
rustc 1.36.0 (a53f9df32 2019-07-03)
$ cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 9.38s
     Running target/debug/deps/floatextremestest-3c9ba83b13a3b688

running 2 tests
test tests::max ... ok
test tests::min ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests floatextremestest

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

I'm not very familiar with compiler internals, so please let me know if that might be a different problem from this one, then I'll move it to a new issue.
For the record, this is tracked in Debian at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=950583

@est31
Copy link
Member

est31 commented Feb 6, 2020

@silwol which mips version does your cpu have? E.g. what does running cat /proc/cpuinfo | grep "model name" show?

@silwol
Copy link

silwol commented Feb 6, 2020

For checking if the error is reproducible, I started a qemu mipsel VM with qemu-system-mipsel -M malta …. /proc/cpuinfo doesn't contain a "model name" entry here, but instead has this line:

cpu model		: MIPS 24Kc V0.0  FPU V0.0

Regarding the Debian infrastructure, I don't have direct access to these machines. The failing test logs can be found on https://buildd.debian.org/status/package.php?p=rust-num-traits from where you can follow the link to the machine information in the Buildd column. There you have a "Build machine info" entry somewhere in the head, showing a list of links to the details of each machine.
The affected num-traits crate has been built multiple times on different machines. The details are on https://buildd.debian.org/status/logs.php?pkg=rust-num-traits&ver=0.2.11-1&arch=mips64el and https://buildd.debian.org/status/logs.php?pkg=rust-num-traits&ver=0.2.11-1&arch=mipsel (corresponding the "all" link in the "Logs" column).

@silwol
Copy link

silwol commented Feb 6, 2020

I just looked up the information about the physical machines that built the project. They're all listed with Processor: Rhino Labs UTM8. There are machines available with Processor: LS3A-RS780-1w (Quad Core Loongson 3A) and Processor: Lemote 3A ITX-A1101 (Quad Core Loongson 3A), but I don't know if there is any way to assign builds explicitly to a specific machine type.

@est31
Copy link
Member

est31 commented Feb 6, 2020

The f32::max docs say:

If one of the arguments is NaN, then the other argument is returned.

Which is clearly not the case here.

f32::max itself is only a wrapper of the maxnumf32 intrinsic, which maps to llvm.maxnum.f32.

The docs of that LLVM intrinsic say:

Follows the IEEE-754 semantics for maxNum except for the handling of signaling NaNs. This matches the behavior of libm’s fmax.

If either operand is a NaN, returns the other non-NaN operand.

So this promise seems to be violated as well. However, this does not imply a bug in llvm just yet, as we provide some intrinsics to llvm. It might be the intrinsics that are wrong. The Rust based libm implementation has this fmaxf code. So maybe the bug lurks there? The code looks alright though.

@silwol
Copy link

silwol commented Feb 7, 2020

Thanks @est31 for the hints.
I found f64::from_bits docs mentioning a difference between MIPS and the other platforms in the signaling bit. Also the fmaxf code you referenced has a comment telling that sNaN is not handled due to not being supported yet. Interestingly enough, the implementation that the num_traits crate uses for no-std environments succeeds in the test.
The question remains what changed from rustc 1.36 to 1.37 to reveal these circumstances.

@est31
Copy link
Member

est31 commented Feb 7, 2020

@silwol yeah there is a fundamental difference between older MIPS targets and newer MIPS targets in that the signaling bit is opposite to IEEE 754-2008 vs follows the standard. But even older MIPS targets follow IEEE 754-1985 which clearly specifies the bit patterns of NaNs (it only doesn't specify the bit pattern distinction between qNaN and sNaN).

@silwol if you have access to a MIPS machine you could try finding the nightly that caused the regression using cargo-bisect-rustc.

Either way, I think you should file a proper issue for your bug report.

@vinc17fr
Copy link

IEEE 754-2008 is obsolete. You should follow IEEE 754-2019, where the poorly designed minNum and maxNum operations (as being non-associative) have been replaced by other operations, which handle qNaN and sNaN in the same way (except for the signaled invalid operation exception in case of sNaN).

@est31
Copy link
Member

est31 commented Aug 19, 2021

@vinc17fr there is a discussion about which IEEE 754 version to use in #83984 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools A-floating-point Area: Floating point numbers and arithmetic T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants