Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x). #57353

Merged
merged 1 commit into from Jan 13, 2019

Member

huonw commented Jan 5, 2019

These can both rely on IEEE754 semantics to be made faster, by folding
away the sign with an `abs` (left private for now), and then comparing
to infinity, letting the NaN semantics of a direct float comparison
handle NaN input properly.

The `abs` bit-fiddling is simple (a single `and`), and so these new
forms compile down to a few instructions, without branches, e.g. for
f32:

```asm
is_infinite:
        andps   xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF
        ucomiss xmm0, dword ptr [rip + .LCPI2_1]   ; 0x7F80_0000
        setae   al
        ret

is_finite:
        andps   xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF
        movss   xmm1, dword ptr [rip + .LCPI1_1]   ; 0x7F80_0000
        ucomiss xmm1, xmm0
        seta    al
        ret
```

When used in loops/repeatedly, they get even better: the memory
operations (loading the mask 0x7FFF_FFFF for `abs`, and infinity
0x7F80_0000) are likely to be hoisted out of the individual calls, to
be shared, and the `seta`/`setae` are likely to be collapsed into
conditional jumps or moves (or similar).
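
As a rough illustration of the approach described above, here is a minimal sketch written as free functions rather than inherent methods. The names `abs_bits`, `is_infinite_abs`, `is_finite_abs` and `all_finite` are made up for this example and are not the names used in libcore.

```rust
// Illustrative sketch; these helper names are hypothetical, not libcore's.

/// Clear the sign bit of an `f32` with a single bitwise AND.
fn abs_bits(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0x7fff_ffff)
}

/// `|x| == inf`. Any comparison with NaN is false, so NaN gives `false`.
fn is_infinite_abs(x: f32) -> bool {
    abs_bits(x) == f32::INFINITY
}

/// `|x| < inf`. NaN gives `false` for the same reason.
fn is_finite_abs(x: f32) -> bool {
    abs_bits(x) < f32::INFINITY
}

/// Loop usage, similar in spirit to the benchmark closures.
fn all_finite(values: &[f32]) -> bool {
    values.iter().all(|&x| is_finite_abs(x))
}
```

Masking the sign bit leaves a NaN as a NaN, which is why neither function needs a separate NaN check.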

The old `is_infinite` did two comparisons, and the old `is_finite` did
three (with a branch), and both of them had to check the flags after
every one of those comparisons. These functions have had that old
implementation since they were added in 6284190, 7 years ago.
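
For comparison, a hedged reconstruction of what the old checks plausibly looked like. The shapes below are an assumption inferred from the comparison counts given above (two for `is_infinite`, three for `is_finite`), not code copied from the referenced commit, and all function names are hypothetical.

```rust
// Assumed shape of the old checks, for comparison only; names are hypothetical.

/// Two comparisons: one against each infinity.
fn is_infinite_old(x: f32) -> bool {
    x == f32::INFINITY || x == f32::NEG_INFINITY
}

/// Three comparisons: a NaN self-comparison first, then the two above.
fn is_finite_old(x: f32) -> bool {
    !(x != x || is_infinite_old(x))
}

/// New form for reference: one mask, one comparison.
fn is_finite_new(x: f32) -> bool {
    f32::from_bits(x.to_bits() & 0x7fff_ffff) < f32::INFINITY
}
```

Both forms should agree on every input; the difference is only in how many comparisons and flag checks the generated code needs.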

Benchmark (abs is the new form, std is the old):

```
test f32_is_finite_abs            ... bench:          55 ns/iter (+/- 10)
test f32_is_finite_std            ... bench:         118 ns/iter (+/- 5)

test f32_is_infinite_abs          ... bench:          53 ns/iter (+/- 1)
test f32_is_infinite_std          ... bench:          84 ns/iter (+/- 6)

test f64_is_finite_abs            ... bench:          52 ns/iter (+/- 12)
test f64_is_finite_std            ... bench:         128 ns/iter (+/- 25)

test f64_is_infinite_abs          ... bench:          54 ns/iter (+/- 5)
test f64_is_infinite_std          ... bench:          93 ns/iter (+/- 23)
```

```rust
// Requires a nightly toolchain: the `test` crate and `#[bench]` are unstable.
#![feature(test)]
extern crate test;

use std::{f32, f64};
use test::Bencher;

const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

#[bench]
fn f32_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite()));
}
#[bench]
fn f32_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs() == f32::INFINITY));
}
#[bench]
fn f32_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite()));
}
#[bench]
fn f32_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY));
}

const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

#[bench]
fn f64_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite()));
}
#[bench]
fn f64_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY));
}
#[bench]
fn f64_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite()));
}
#[bench]
fn f64_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY));
}
```
Collaborator

rust-highfive commented Jan 5, 2019

r? @KodrAus

(rust_highfive has picked a reviewer for you, use r? to override)

@@ -161,6 +161,11 @@ impl f32 {
        self != self
    }

    #[inline]
    fn abs_(self) -> f32 {
        f32::from_bits(self.to_bits() & 0x7fff_ffff)

@varkor

varkor Jan 5, 2019

Member

Why is this necessary, rather than using the existing abs method (which uses the LLVM fabsf32 intrinsic directly)?

@huonw

huonw Jan 5, 2019

Member

That's only available in std. #50145

@varkor

varkor Jan 5, 2019

Member

I'm bikeshedding here, but given typical naming conventions, I think it'd make sense to simply call this abs (which shouldn't cause any conflicts) and add a comment explaining why this method is private for discoverability, e.g.

// FIXME(#50145): `abs` is publicly unavailable in libcore due to concerns
// about portability, so this implementation is for private use internally.

@clarcharr

clarcharr Jan 5, 2019

Contributor

I agree about adding a comment, but I don't think that naming it abs is appropriate. I'd call it something like abs_hack so it's clear that it's not using the proper abs method.

@huonw

huonw Jan 6, 2019

Member

How about abs_private with a comment?

@huonw

huonw Jan 6, 2019

Member

Added that.
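
For readers following the naming discussion, a small sketch of how a private helper with the suggested comment might look. This is an assumption about the general shape only: the module name `float_checks` and the free-function form are hypothetical, and the `f64` variant is shown here with the 64-bit sign mask.

```rust
// Hypothetical module; only the two `pub` functions are visible to callers.
mod float_checks {
    // FIXME(#50145): `abs` is publicly unavailable in libcore due to concerns
    // about portability, so this implementation is for private use internally.
    #[inline]
    fn abs_private(x: f64) -> f64 {
        // Clear bit 63, the sign bit of an IEEE 754 double.
        f64::from_bits(x.to_bits() & 0x7fff_ffff_ffff_ffff)
    }

    /// NaN compares false against everything, so it is reported as not finite.
    pub fn is_finite(x: f64) -> bool {
        abs_private(x) < f64::INFINITY
    }

    /// True only for positive and negative infinity.
    pub fn is_infinite(x: f64) -> bool {
        abs_private(x) == f64::INFINITY
    }
}
```

Keeping the helper private means the name can change later without affecting any public API.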

@KodrAus

KodrAus approved these changes Jan 6, 2019

KodrAus left a comment

Thanks @huonw!

I've just left the same nits as @lzutao as suggestions, but r=me anytime.

Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).

@huonw huonw force-pushed the huonw:faster-finiteness-checks branch from 6a4473a to 6e742db Jan 7, 2019

Member

huonw commented Jan 7, 2019

Nice catch with the nit. I've updated to fix it.

KodrAus commented Jan 7, 2019

@bors r+

Contributor

bors commented Jan 7, 2019

📌 Commit 6e742db has been approved by KodrAus

Contributor

bors commented Jan 10, 2019

⌛️ Testing commit 6e742db with merge ab00b4b...

bors added a commit that referenced this pull request Jan 10, 2019

Auto merge of #57353 - huonw:faster-finiteness-checks, r=KodrAus
Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).

Contributor

bors commented Jan 10, 2019

💔 Test failed - status-appveyor

Member

pietroalbini commented Jan 10, 2019

@bors retry
AppVeyor... what's wrong with you today?

Centril added a commit to Centril/rust that referenced this pull request Jan 10, 2019

Rollup merge of rust-lang#57353 - huonw:faster-finiteness-checks, r=KodrAus

Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).


bors added a commit that referenced this pull request Jan 11, 2019

Auto merge of #57503 - Centril:rollup, r=Centril
Rollup of 19 pull requests

Successful merges:

 - #56425 (Redo the docs for Vec::set_len)
 - #56906 (Issue #56905)
 - #57042 (Don't call `FieldPlacement::count` when count is too large)
 - #57192 (Change std::error::Error trait documentation to talk about `source` instead of `cause`)
 - #57296 (Fixed the link to the ? operator)
 - #57353 (Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).)
 - #57368 (Use CMAKE_{C,CXX}_COMPILER_LAUNCHER for ccache)
 - #57400 (Rustdoc: update Source Serif Pro and replace Heuristica italic)
 - #57412 (Improve the wording)
 - #57417 (rustdoc: use text-based doctest parsing if a macro is wrapping main)
 - #57433 (Add link destination for `read-ownership`)
 - #57434 (Remove `CrateNum::Invalid`.)
 - #57441 (Supporting backtrace for x86_64-fortanix-unknown-sgx.)
 - #57450 (actually take a slice in this example)
 - #57454 (Some cleanups for core::fmt)
 - #57459 (Reference tracking issue for inherent associated types in diagnostic)
 - #57463 (docs: Fix some 'second-edition' links)
 - #57466 (Remove outdated comment)
 - #57493 (use structured suggestion when casting a reference)

Failed merges:

r? @ghost

pietroalbini added a commit to pietroalbini/rust that referenced this pull request Jan 12, 2019

Rollup merge of rust-lang#57353 - huonw:faster-finiteness-checks, r=KodrAus

Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).


bors added a commit that referenced this pull request Jan 12, 2019

Auto merge of #57554 - pietroalbini:rollup, r=pietroalbini
Rollup of 15 pull requests

Successful merges:

 - #57351 (Don't actually create a full MIR stack frame when not needed)
 - #57353 (Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).)
 - #57392 (Always calculate glob map but only for glob uses)
 - #57412 (Improve the wording)
 - #57436 (save-analysis: use a fallback when access levels couldn't be computed)
 - #57442 (Simplify `ConstValue::ScalarPair`)
 - #57453 (lldb_batchmode.py: try `import _thread` for Python 3)
 - #57454 (Some cleanups for core::fmt)
 - #57461 (Change `String` to `&'static str` in `ParseResult::Failure`.)
 - #57473 (std: Render large exit codes as hex on Windows)
 - #57474 (save-analysis: Get path def from parent in case there's no def for the path itself.)
 - #57494 (Speed up item_bodies for large match statements involving regions)
 - #57496 (re-do docs for core::cmp)
 - #57508 (rustdoc: Allow inlining of reexported crates and crate items)
 - #57547 (Use `ptr::eq` where applicable)

Failed merges:

r? @ghost

Centril added a commit to Centril/rust that referenced this pull request Jan 13, 2019

Rollup merge of rust-lang#57353 - huonw:faster-finiteness-checks, r=K…
…odrAus

Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).

These can both rely on IEEE754 semantics to be made faster, by folding
away the sign with an abs (left private for now), and then comparing
to infinity, letting the NaN semantics of a direct float comparison
handle NaN input properly.
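
As a rough sketch of the idea (not the actual patch; the names `abs_bits`, `is_infinite_sketch` and `is_finite_sketch` are made up for illustration), the sign bit can be cleared with one mask and the result compared against infinity. NaN compares false against everything, so both predicates return `false` for NaN without a separate check:

```rust
/// Hypothetical helper: clear the sign bit of an f32 with the mask 0x7FFF_FFFF.
fn abs_bits(x: f32) -> f32 {
    f32::from_bits(x.to_bits() & 0x7FFF_FFFF)
}

/// True only for +inf and -inf. For NaN, `NaN == INFINITY` is false.
fn is_infinite_sketch(x: f32) -> bool {
    abs_bits(x) == f32::INFINITY
}

/// True for every value that is neither infinite nor NaN.
/// For NaN, `NaN < INFINITY` is false, so no explicit NaN test is needed.
fn is_finite_sketch(x: f32) -> bool {
    abs_bits(x) < f32::INFINITY
}
```

The f64 variant would look the same with a 64-bit mask (`0x7FFF_FFFF_FFFF_FFFF`) and `f64::INFINITY`.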

The `abs` bit-fiddling is simple (a single `and`), and so these new
forms compile down to a few instructions, without branches, e.g. for
f32:

```asm
is_infinite:
        andps   xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF
        ucomiss xmm0, dword ptr [rip + .LCPI2_1]   ; 0x7F80_0000
        setae   al
        ret

is_finite:
        andps   xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF
        movss   xmm1, dword ptr [rip + .LCPI1_1]   ; 0x7F80_0000
        ucomiss xmm1, xmm0
        seta    al
        ret
```

When used in loops/repeatedly, they get even better: the memory
operations (loading the mask 0x7FFF_FFFF for abs, and infinity
0x7F80_0000) are likely to be hoisted out of the individual calls, to
be shared, and the `seta`/`setae` are likely to be collapsed into
conditional jumps or moves (or similar).

The old `is_infinite` did two comparisons, and the old `is_finite` did
three (with a branch), and both of them had to check the flags after
every one of those comparisons. These functions have had that old
implementation since they were added in
rust-lang@6284190
7 years ago.
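
For comparison, the older shape can be sketched roughly as below. This is a reconstruction from the description above (two comparisons for `is_infinite`, three for `is_finite`), not a quote of the replaced code, and the `_old` names are illustrative only:

```rust
/// Two comparisons: one against each infinity.
fn is_infinite_old(x: f32) -> bool {
    x == f32::INFINITY || x == f32::NEG_INFINITY
}

/// Three comparisons: a NaN check, then the two infinity checks.
/// The short-circuiting `||` is where a branch can appear.
fn is_finite_old(x: f32) -> bool {
    !(x.is_nan() || is_infinite_old(x))
}
```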

Benchmark (`abs` is the new form, `std` is the old):

```
test f32_is_finite_abs            ... bench:          55 ns/iter (+/- 10)
test f32_is_finite_std            ... bench:         118 ns/iter (+/- 5)

test f32_is_infinite_abs          ... bench:          53 ns/iter (+/- 1)
test f32_is_infinite_std          ... bench:          84 ns/iter (+/- 6)

test f64_is_finite_abs            ... bench:          52 ns/iter (+/- 12)
test f64_is_finite_std            ... bench:         128 ns/iter (+/- 25)

test f64_is_infinite_abs          ... bench:          54 ns/iter (+/- 5)
test f64_is_infinite_std          ... bench:          93 ns/iter (+/- 23)
```
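
The "2x" and "1.6x" figures in the title can be checked against these numbers. A small sketch, where `TIMINGS`, `speedup` and `report` are names chosen here for illustration and the timings are copied from the results above (the error bars are ignored, so the ratios are only approximate):

```rust
/// Benchmark timings in ns/iter from the results above: (name, std, abs).
const TIMINGS: [(&str, f64, f64); 4] = [
    ("f32_is_finite", 118.0, 55.0),
    ("f32_is_infinite", 84.0, 53.0),
    ("f64_is_finite", 128.0, 52.0),
    ("f64_is_infinite", 93.0, 54.0),
];

/// Ratio of old time to new time; values above 1.0 mean the new form is faster.
fn speedup(old_ns: f64, new_ns: f64) -> f64 {
    old_ns / new_ns
}

/// One formatted line per benchmark pair, e.g. "f32_is_finite: 2.15x".
fn report() -> Vec<String> {
    TIMINGS
        .iter()
        .map(|&(name, old, new)| format!("{}: {:.2}x", name, speedup(old, new)))
        .collect()
}
```

This gives roughly 2.15x and 2.46x for `is_finite`, and 1.58x and 1.72x for `is_infinite`.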

```rust
#![feature(test)]
extern crate test;

use std::{f32, f64};
use test::Bencher;

const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

#[bench]
fn f32_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite()));
}
#[bench]
fn f32_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs() == f32::INFINITY));
}
#[bench]
fn f32_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite()));
}
#[bench]
fn f32_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY));
}

const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];

#[bench]
fn f64_is_infinite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite()));
}
#[bench]
fn f64_is_infinite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY));
}
#[bench]
fn f64_is_finite_std(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite()));
}
#[bench]
fn f64_is_finite_abs(b: &mut Bencher) {
    b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY));
}
```

bors added a commit that referenced this pull request Jan 13, 2019

Auto merge of #57561 - Centril:rollup, r=Centril
Rollup of 15 pull requests

Successful merges:

 - #57351 (Don't actually create a full MIR stack frame when not needed)
 - #57353 (Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).)
 - #57392 (Always calculate glob map but only for glob uses)
 - #57412 (Improve the wording)
 - #57436 (save-analysis: use a fallback when access levels couldn't be computed)
 - #57453 (lldb_batchmode.py: try `import _thread` for Python 3)
 - #57454 (Some cleanups for core::fmt)
 - #57461 (Change `String` to `&'static str` in `ParseResult::Failure`.)
 - #57473 (std: Render large exit codes as hex on Windows)
 - #57474 (save-analysis: Get path def from parent in case there's no def for the path itself.)
 - #57494 (Speed up item_bodies for large match statements involving regions)
 - #57496 (re-do docs for core::cmp)
 - #57508 (rustdoc: Allow inlining of reexported crates and crate items)
 - #57547 (Use `ptr::eq` where applicable)
 - #57560 (hygiene: Do not treat `Self` ctor as a local variable)

Failed merges:

r? @ghost

bors added a commit that referenced this pull request Jan 13, 2019

Auto merge of #57568 - Centril:rollup, r=Centril
Rollup of 16 pull requests

Successful merges:

 - #57351 (Don't actually create a full MIR stack frame when not needed)
 - #57353 (Optimise floating point `is_finite` (2x) and `is_infinite` (1.6x).)
 - #57412 (Improve the wording)
 - #57436 (save-analysis: use a fallback when access levels couldn't be computed)
 - #57453 (lldb_batchmode.py: try `import _thread` for Python 3)
 - #57454 (Some cleanups for core::fmt)
 - #57461 (Change `String` to `&'static str` in `ParseResult::Failure`.)
 - #57473 (std: Render large exit codes as hex on Windows)
 - #57474 (save-analysis: Get path def from parent in case there's no def for the path itself.)
 - #57494 (Speed up item_bodies for large match statements involving regions)
 - #57496 (re-do docs for core::cmp)
 - #57508 (rustdoc: Allow inlining of reexported crates and crate items)
 - #57547 (Use `ptr::eq` where applicable)
 - #57557 (resolve: Mark extern crate items as used in more cases)
 - #57560 (hygiene: Do not treat `Self` ctor as a local variable)
 - #57564 (Update the const fn tracking issue to the new metabug)

Failed merges:

r? @ghost

@bors bors merged commit 6e742db into rust-lang:master Jan 13, 2019

1 of 2 checks passed

homu: Test failed
continuous-integration/travis-ci/pr: The Travis CI build passed

@huonw huonw deleted the huonw:faster-finiteness-checks branch Jan 15, 2019
