Optimise floating point is_finite (2x) and is_infinite (1.6x). #57353

Merged on Jan 13, 2019 (1 commit)

Commits on Jan 7, 2019

  1. Optimise floating point is_finite (2x) and is_infinite (1.6x).

    Both can be made faster by relying on IEEE 754 semantics: fold away
    the sign with an `abs` (left private for now), then compare against
    infinity, letting the NaN semantics of a direct float comparison
    handle NaN input properly.
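
    As a minimal sketch of the approach described above (using the public
    `f32::abs` rather than the private helper, and with illustrative
    function names that are not part of the standard library):

    ```rust
    /// Illustrative only: infinity check via sign folding.
    pub fn is_infinite_abs(x: f32) -> bool {
        // `abs` clears the sign bit, so +inf and -inf both become +inf.
        // Any `==` comparison involving NaN is false, so NaN is rejected.
        x.abs() == f32::INFINITY
    }

    /// Illustrative only: finiteness check via sign folding.
    pub fn is_finite_abs(x: f32) -> bool {
        // Every finite value has a magnitude strictly below infinity.
        // Any `<` comparison involving NaN is false, so NaN is rejected.
        x.abs() < f32::INFINITY
    }
    ```

    Neither function needs a separate NaN test: the comparison itself
    returns `false` for NaN in both cases.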
    
    The `abs` bit-fiddling is simple (a single and), and so these new
    forms compile down to a few instructions, without branches, e.g. for
    f32:
    
    ```asm
    is_infinite:
            andps   xmm0, xmmword ptr [rip + .LCPI2_0] ; 0x7FFF_FFFF
            ucomiss xmm0, dword ptr [rip + .LCPI2_1]   ; 0x7F80_0000
            setae   al
            ret
    
    is_finite:
            andps   xmm0, xmmword ptr [rip + .LCPI1_0] ; 0x7FFF_FFFF
            movss   xmm1, dword ptr [rip + .LCPI1_1]   ; 0x7F80_0000
            ucomiss xmm1, xmm0
            seta    al
            ret
    ```
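
    The two constants in the assembly can be mirrored directly in Rust.
    This is a sketch only: `abs_bits` and `is_infinite_bits` are
    hypothetical names, and the private `abs` in the commit may be written
    differently.

    ```rust
    /// Hypothetical helper: absolute value via a single bitwise AND.
    pub fn abs_bits(x: f32) -> f32 {
        // 0x7FFF_FFFF keeps the exponent and mantissa bits and clears
        // the sign bit (the top bit of the 32-bit pattern).
        f32::from_bits(x.to_bits() & 0x7FFF_FFFF)
    }

    /// Hypothetical helper: infinity check using the masked value.
    pub fn is_infinite_bits(x: f32) -> bool {
        // 0x7F80_0000 is the bit pattern of +infinity: all exponent
        // bits set, mantissa zero.
        abs_bits(x) == f32::from_bits(0x7F80_0000)
    }
    ```

    The comparison is still a float comparison, not an integer one, so
    NaN inputs (which keep a non-zero mantissa after masking) compare
    unequal to infinity.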
    
    When used in loops/repeatedly, they get even better: the memory
    operations (loading the mask 0x7FFF_FFFF for abs, and infinity
    0x7F80_0000) are likely to be hoisted out of the individual calls, to
    be shared, and the `seta`/`setae` are likely to be collapsed into
    conditional jumps or moves (or similar).
    
    The old `is_infinite` did two comparisons, and the old `is_finite` did
    three (with a branch), and both of them had to check the flags after
    every one of those comparisons. These functions have had that old
    implementation since they were added in rust-lang@6284190, 7 years
    ago.
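
    The old code is not quoted here, so the following is only a guess at
    a shape that matches the description (two comparisons for
    `is_infinite`, plus a NaN test for `is_finite`). All names are
    hypothetical. The last function checks that such a form agrees with
    the abs-based one.

    ```rust
    /// Guessed shape of the old check: two equality comparisons.
    pub fn is_infinite_old(x: f32) -> bool {
        x == f32::INFINITY || x == f32::NEG_INFINITY
    }

    /// Guessed shape of the old check: a NaN test plus the two
    /// comparisons above, three in total.
    pub fn is_finite_old(x: f32) -> bool {
        !(x.is_nan() || is_infinite_old(x))
    }

    /// Returns true when the old-style and abs-based forms agree on `x`.
    pub fn forms_agree(x: f32) -> bool {
        let abs_infinite = x.abs() == f32::INFINITY;
        let abs_finite = x.abs() < f32::INFINITY;
        is_infinite_old(x) == abs_infinite && is_finite_old(x) == abs_finite
    }
    ```

    Checking `forms_agree` on zeros, the largest and smallest finite
    values, both infinities and NaN covers the cases where the two forms
    could plausibly differ.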
    
    Benchmark (`abs` is the new form, `std` is the old):
    
    ```
    test f32_is_finite_abs            ... bench:          55 ns/iter (+/- 10)
    test f32_is_finite_std            ... bench:         118 ns/iter (+/- 5)
    
    test f32_is_infinite_abs          ... bench:          53 ns/iter (+/- 1)
    test f32_is_infinite_std          ... bench:          84 ns/iter (+/- 6)
    
    test f64_is_finite_abs            ... bench:          52 ns/iter (+/- 12)
    test f64_is_finite_std            ... bench:         128 ns/iter (+/- 25)
    
    test f64_is_infinite_abs          ... bench:          54 ns/iter (+/- 5)
    test f64_is_infinite_std          ... bench:          93 ns/iter (+/- 23)
    ```
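
    The headline speedups can be recomputed from the figures above. This
    small sketch (with hypothetical names `speedup`, `RESULTS` and
    `report`) divides the old timing by the new one and ignores the
    reported +/- spread, so the ratios are indicative only.

    ```rust
    /// Speedup of the new form over the old, from ns/iter timings.
    pub fn speedup(old_ns: f64, new_ns: f64) -> f64 {
        old_ns / new_ns
    }

    /// (label, old `std` ns/iter, new `abs` ns/iter), copied from the
    /// benchmark output above.
    pub const RESULTS: [(&str, f64, f64); 4] = [
        ("f32 is_finite", 118.0, 55.0),
        ("f32 is_infinite", 84.0, 53.0),
        ("f64 is_finite", 128.0, 52.0),
        ("f64 is_infinite", 93.0, 54.0),
    ];

    /// One formatted line per benchmark pair.
    pub fn report() -> Vec<String> {
        RESULTS
            .iter()
            .map(|(name, old, new)| format!("{}: {:.2}x", name, speedup(*old, *new)))
            .collect()
    }
    ```

    The ratios come out at roughly 2.1x to 2.5x for `is_finite` and about
    1.6x to 1.7x for `is_infinite`, in line with the figures in the title.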
    
    ```rust
    #![feature(test)]
    extern crate test;
    
    use std::{f32, f64};
    use test::Bencher;
    
    const VALUES_F32: &[f32] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];
    
    #[bench]
    fn f32_is_infinite_std(b: &mut Bencher) {
        b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.is_infinite()));
    }
    #[bench]
    fn f32_is_infinite_abs(b: &mut Bencher) {
        b.iter(|| test::black_box(VALUES_F32).iter().any(|x| x.abs() == f32::INFINITY));
    }
    #[bench]
    fn f32_is_finite_std(b: &mut Bencher) {
        b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.is_finite()));
    }
    #[bench]
    fn f32_is_finite_abs(b: &mut Bencher) {
        b.iter(|| test::black_box(VALUES_F32).iter().all(|x| x.abs() < f32::INFINITY));
    }
    
    const VALUES_F64: &[f64] = &[0.910, 0.135, 0.735, -0.874, 0.518, 0.150, -0.527, -0.418, 0.449, -0.158, -0.064, -0.144, -0.948, -0.103, 0.225, -0.104, -0.795, 0.435, 0.860, 0.027, 0.625, -0.848, -0.454, 0.359, -0.930, 0.067, 0.642, 0.976, -0.682, -0.035, 0.750, 0.005, -0.825, 0.731, -0.850, -0.740, -0.118, -0.972, 0.888, -0.958, 0.086, 0.237, -0.580, 0.488, 0.028, -0.552, 0.302, 0.058, -0.229, -0.166, -0.248, -0.430, 0.789, -0.122, 0.120, -0.934, -0.911, -0.976, 0.882, -0.410, 0.311, -0.611, -0.758, 0.786, -0.711, 0.378, 0.803, -0.068, 0.932, 0.483, 0.085, 0.247, -0.128, -0.839, -0.737, -0.605, 0.637, -0.230, -0.502, 0.231, -0.694, -0.400, -0.441, 0.142, 0.174, 0.681, -0.763, -0.608, 0.848, -0.550, 0.883, -0.212, 0.876, 0.186, -0.909, 0.401, -0.533, -0.961, 0.539, -0.298, -0.448, 0.223, -0.307, -0.594, 0.629, -0.534, 0.959, 0.349, -0.926, -0.523, -0.895, -0.157, -0.074, -0.060, 0.513, -0.647, -0.649, 0.428, 0.401, 0.391, 0.426, 0.700, 0.880, -0.101, 0.862, 0.493, 0.819, -0.597];
    
    #[bench]
    fn f64_is_infinite_std(b: &mut Bencher) {
        b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.is_infinite()));
    }
    #[bench]
    fn f64_is_infinite_abs(b: &mut Bencher) {
        b.iter(|| test::black_box(VALUES_F64).iter().any(|x| x.abs() == f64::INFINITY));
    }
    #[bench]
    fn f64_is_finite_std(b: &mut Bencher) {
        b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.is_finite()));
    }
    #[bench]
    fn f64_is_finite_abs(b: &mut Bencher) {
        b.iter(|| test::black_box(VALUES_F64).iter().all(|x| x.abs() < f64::INFINITY));
    }
    ```
    huonw committed Jan 7, 2019
    6e742db