u8::reverse_bits is too slow #61989
Open
Labels: A-LLVM, C-bug, I-slow, O-x86_32, O-x86_64, T-compiler
While upgrading the `bitintr` crate I re-ran its benchmarks and found that the stable implementation there is much faster than the stabilized `u8::reverse_bits` intrinsic available on nightly.

I'm comparing `bitintr`'s implementation of `u8::reverse_bits` (`rbit_u8`) vs `u8::reverse_bits`.

My benchmark there isn't super tight: each iteration calls `reverse_bits` on all integers in [0, 255].
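A loop of that shape can be sketched roughly as below. This is only an illustration using `std::time::Instant`, not the benchmark from the crate; the helper names `reverse_all` and `ns_per_call` are made up for this sketch, and a real measurement would be better done with a proper benchmark harness.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical helper: apply `f` to every value in [0, 255] and fold the
/// results into a checksum so the compiler cannot discard the work.
fn reverse_all(f: &impl Fn(u8) -> u8) -> u32 {
    let mut acc = 0u32;
    for x in 0..=255u8 {
        // black_box hides the input so the result cannot be precomputed.
        acc = acc.wrapping_add(u32::from(f(black_box(x))));
    }
    acc
}

/// Hypothetical helper: average nanoseconds per call to `f`, measured over
/// `iters` passes of 256 calls each.
fn ns_per_call(f: impl Fn(u8) -> u8, iters: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(reverse_all(&f));
    }
    start.elapsed().as_nanos() as f64 / (f64::from(iters) * 256.0)
}

fn main() {
    let ns = ns_per_call(u8::reverse_bits, 100_000);
    println!("u8::reverse_bits: {ns:.2} ns per call");
}
```

Timing with `Instant` like this is noisy, so the numbers it prints should only be compared against each other on the same machine and build settings (for example `--release`).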
On my laptop (x86_64, 1.8 GHz i5), I'm getting 343 ns/iter for `rbit_u8`, while for `u8::reverse_bits` I'm getting 619 ns/iter. Dividing by 256, that's 1.34 ns (mine) vs 2.42 ns (libstd) per bit reversal.

All of this somehow rings a bell: the `bitintr` crate had a benchmark specifically for this operation, which previously compared its own implementations against `core::intrinsics::bitreverse`, and it had a workaround for using its own implementation even when the user was on nightly and had explicitly enabled `core::intrinsics` via an `unstable` cargo feature. I guess I should have written a comment back then.
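For reference, a portable byte reversal that does not go through the intrinsic can be sketched as follows. This is an assumption for illustration only: it is a common swap-based technique, not necessarily what `bitintr` does, and the name `rbit_u8_swap` is hypothetical.

```rust
/// Hypothetical helper (not the bitintr code): reverse the bits of a byte
/// with three swap steps of decreasing width.
fn rbit_u8_swap(x: u8) -> u8 {
    // Swap the two nibbles.
    let x = (x >> 4) | (x << 4);
    // Swap adjacent bit pairs within each nibble.
    let x = ((x & 0xCC) >> 2) | ((x & 0x33) << 2);
    // Swap adjacent single bits within each pair.
    ((x & 0xAA) >> 1) | ((x & 0x55) << 1)
}

fn main() {
    // Check the sketch against the standard library for every input.
    for x in 0..=255u8 {
        assert_eq!(rbit_u8_swap(x), x.reverse_bits());
    }
    println!("rbit_u8_swap matches u8::reverse_bits for all 256 inputs");
}
```

Because the input space is only 256 values, exhaustively checking any replacement against `u8::reverse_bits` is cheap and worth keeping next to the benchmark.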