u8::reverse_bits is too slow #61989

@gnzlbg

While upgrading the bitintr crate I re-ran its benchmarks and found that its own implementation, which works on stable, is much faster than the stabilized u8::reverse_bits intrinsic available on nightly.

I'm comparing this implementation of u8::reverse_bits:

fn rbit_u8(x: u8) -> u8 {
    (((((x as u64) * 0x80200802_u64) & 0x0884422110_u64) * 0x0101010101_u64)
        >> 32) as u8
}

vs u8::reverse_bits.
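As a sanity check that the two functions are interchangeable, the sketch below compares the multiply-and-mask trick against u8::reverse_bits for every input. The name rbit_u8_wrapping is hypothetical and not part of bitintr. It uses wrapping_mul because the second product can exceed 64 bits (for example with x = 0x10), which panics in builds with overflow checks enabled, such as a default debug build; in release builds the plain `*` version and this one should behave the same.

```rust
/// Same multiply-and-mask trick as `rbit_u8`, written with explicit
/// wrapping arithmetic. The name is hypothetical, for illustration only.
fn rbit_u8_wrapping(x: u8) -> u8 {
    // Place four shifted copies of the byte in a 64-bit word.
    let spread = u64::from(x).wrapping_mul(0x80200802);
    // Keep eight bits, one per input bit, spaced so the next multiply
    // can gather them.
    let picked = spread & 0x0884422110;
    // Gather the selected bits into bits 32..=39. This product can
    // exceed 64 bits, so it has to wrap rather than panic.
    (picked.wrapping_mul(0x0101010101) >> 32) as u8
}

fn main() {
    // Exhaustive check over all 256 possible inputs.
    for x in 0..=u8::MAX {
        assert_eq!(rbit_u8_wrapping(x), x.reverse_bits());
    }
    println!("all 256 inputs match u8::reverse_bits");
}
```

Because the input space is only 256 values, an exhaustive comparison is cheap and leaves no untested cases.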

My benchmark there isn't super tight; each iteration calls reverse_bits on every integer in [0, 255]:

fn u8_runner<F: Fn(u8) -> u8>(bench: &mut Bencher, f: F) {
    bench.iter(|| {
        for v in 0..=u8::max_value() {
            bencher::black_box(f(bencher::black_box(v)));
        }
    })
}

#[bench]
fn rbit_u8_std(bench: &mut Bencher) {
    u8_runner(bench, |x| x.reverse_bits())
}

#[bench]
fn rbit_u8_self(bench: &mut Bencher) {
    u8_runner(bench, |x| rbit_u8(x))
}

On my laptop (x86_64, 1.8 GHz i5), I'm getting 343 ns/iter for rbit_u8 and 619 ns/iter for u8::reverse_bits. Dividing by 256, that's 1.34 ns (mine) vs 2.42 ns (libstd) per bit reversal.
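For reference, the per-call figures follow from the ns/iter numbers as sketched below. The helper ns_per_call is a hypothetical name used only for this illustration. Each iteration also pays for the loop and the black_box calls, so the per-call values are best read as upper bounds on the cost of the bit reversal itself.

```rust
/// Converts a per-iteration time into a per-call time, given how many
/// calls a single benchmark iteration makes. Hypothetical helper.
fn ns_per_call(ns_per_iter: f64, calls_per_iter: u32) -> f64 {
    ns_per_iter / f64::from(calls_per_iter)
}

fn main() {
    // One iteration covers every u8 value, i.e. 256 calls.
    let calls = 256;
    let own = ns_per_call(343.0, calls);
    let libstd = ns_per_call(619.0, calls);
    println!("rbit_u8:          {:.2} ns/call", own);
    println!("u8::reverse_bits: {:.2} ns/call", libstd);
    println!("slowdown:         {:.2}x", libstd / own);
}
```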

All of this rings a bell. The bitintr crate had a benchmark specifically for this operation, which compared its own implementation against core::intrinsics::bitreverse, and it had a workaround that used its own implementation even when the user was on nightly and had explicitly enabled core::intrinsics via an unstable cargo feature. I guess I should have written a comment back then.

Metadata

    Labels

    A-LLVM - Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
    C-bug - Category: This is a bug.
    I-slow - Issue: Problems and improvements with respect to performance of generated code.
    O-x86_32 - Target: x86 processors, 32 bit (like i686-*) (also known as IA-32, i386, i586, i686)
    O-x86_64 - Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64)
    T-compiler - Relevant to the compiler team, which will review and decide on the PR/issue.
