u8::reverse_bits is too slow

While upgrading the `bitintr` crate I re-ran its benchmarks and found that the implementation there, which works on stable Rust, is much faster than the stabilized `u8::reverse_bits` available on nightly.

I'm comparing this implementation of `u8::reverse_bits`:

```rust
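fn rbit_u8(x: u8) -> u8 {
    // The second product exceeds 64 bits for some inputs, so wrap explicitly
    // instead of panicking in builds with overflow checks enabled.
    (((((x as u64) * 0x80200802_u64) & 0x0884422110_u64).wrapping_mul(0x0101010101_u64))
        >> 32) as u8
}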
```

vs `u8::reverse_bits`. 
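
Before comparing speed, it is worth confirming that the two agree on every input. The sketch below is a hedged check rather than part of `bitintr`: `first_mismatch` is a hypothetical helper, and this copy of `rbit_u8` uses `wrapping_mul` for the second product, an assumption made so the check also runs in debug builds, where that 64-bit product would otherwise overflow and panic for larger inputs.

```rust
/// Multiply-and-mask byte reversal, copied from the issue text.
/// Assumption: `wrapping_mul` on the second product, so overflow wraps
/// instead of panicking when overflow checks are enabled.
fn rbit_u8(x: u8) -> u8 {
    // Four shifted copies of the byte; 255 * 0x80200802 stays below 2^40.
    let spread = (x as u64) * 0x8020_0802_u64;
    // Keep one bit from each position so every input bit appears exactly once.
    let picked = spread & 0x08_8442_2110_u64;
    // Gather the selected bits into bits 32..40, then extract that byte.
    (picked.wrapping_mul(0x01_0101_0101_u64) >> 32) as u8
}

/// Hypothetical helper: returns the first input on which `rbit_u8` and
/// `u8::reverse_bits` disagree, or `None` if they match for all 256 values.
fn first_mismatch() -> Option<u8> {
    (0..=u8::MAX).find(|&x| rbit_u8(x) != x.reverse_bits())
}
```

Calling `first_mismatch()` from a unit test and asserting that it returns `None` covers the whole input domain, since there are only 256 values.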

My benchmark isn't super tight: each iteration calls `reverse_bits` on every integer in [0, 255]:

```rust
fn u8_runner<F: Fn(u8) -> u8>(bench: &mut Bencher, f: F) {
    bench.iter(|| {
        for v in 0..=u8::max_value() {
            bencher::black_box(f(bencher::black_box(v)));
        }
    })
}

#[bench]
fn rbit_u8_std(bench: &mut Bencher) {
    u8_runner(bench, |x| x.reverse_bits())
}

#[bench]
fn rbit_u8_self(bench: &mut Bencher) {
    u8_runner(bench, |x| rbit_u8(x))
}
```
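
For anyone who wants to reproduce the comparison without a benchmark harness, here is a rough sketch of the same loop using only the standard library. It is not the `bitintr` benchmark: `time_all_bytes` and `ns_per_call` are hypothetical names, the number of rounds is an arbitrary choice, and wall-clock timing like this is noisier than a proper bench runner.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` over all 256 byte values, `rounds` times,
/// and returns the total elapsed wall-clock time.
fn time_all_bytes<F: Fn(u8) -> u8>(rounds: u32, f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..rounds {
        for v in 0..=u8::MAX {
            // `black_box` keeps the optimizer from folding the call away.
            black_box(f(black_box(v)));
        }
    }
    start.elapsed()
}

/// Hypothetical helper: average nanoseconds per call, given that each
/// round performs exactly 256 calls.
fn ns_per_call(elapsed: Duration, rounds: u32) -> f64 {
    elapsed.as_nanos() as f64 / (rounds as f64 * 256.0)
}
```

Something like `ns_per_call(time_all_bytes(100_000, |x| x.reverse_bits()), 100_000)` gives a per-call estimate that is only meaningful in an optimized (`--release`) build.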

On my laptop (x86_64, 1.8 GHz i5), I'm getting 343 ns/iter for `rbit_u8` and 619 ns/iter for `u8::reverse_bits`. Dividing by 256, that's 1.34 ns (mine) vs 2.42 ns (libstd) per bit reversal.

All of this rings a bell: the `bitintr` crate had a benchmark specifically for this operation that compared its own implementation against `core::intrinsics::bitreverse`. It also had a workaround to use its own implementation even when the user was on nightly and had explicitly enabled `core::intrinsics` via an `unstable` cargo feature. I guess I should have written a comment back then.
