_bzhi_u32/_bzhi_u64 has poor codegen without bmi2 feature #88566
Open
Labels
A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
A-codegen: Area: Code generation
C-bug: Category: This is a bug.
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
As reported in this article about some issues encountered using SIMD with Rust, calls to `bzhi` intrinsics made without the `bmi2` CPU target feature enabled give some odd codegen. The `bzhi` instruction isn't emulated and is still executed directly, but the intrinsic is never inlined, resulting in a completely unnecessary function call (which may be in a hot path).

e.g.
compiles to
(and this is what it looks like with optimizations enabled when it can't just `jmp` to the intrinsic: godbolt)

Other intrinsics get inlined after emulation all the time (e.g. `ctlz`); this one isn't emulated, but it's not inlined either.
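For readers without the original snippet at hand, here is a minimal sketch of the pattern under discussion. It is not the code from the report, and the names `bzhi_u32`, `bzhi_u32_bmi2` and `bzhi_u32_fallback` are hypothetical. It assumes the usual `std::arch` setup, where `_bzhi_u32` is compiled with the `bmi2` target feature enabled, so a caller built without that feature reaches it through a real call. Putting the intrinsic inside a wrapper that itself enables `bmi2` lets it inline there, and a runtime check keeps the sketch safe to run on CPUs without BMI2.

```rust
/// Portable equivalent of BZHI: zero every bit of `a` at position >= `index`.
/// Only the low 8 bits of `index` are used; if that value is 32 or more,
/// `a` is returned unchanged.
fn bzhi_u32_fallback(a: u32, index: u32) -> u32 {
    let n = index & 0xff;
    if n >= 32 {
        a
    } else {
        a & ((1u32 << n) - 1)
    }
}

/// Hypothetical wrapper compiled with `bmi2` enabled, so the intrinsic
/// can be inlined into this function.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "bmi2")]
unsafe fn bzhi_u32_bmi2(a: u32, index: u32) -> u32 {
    unsafe { std::arch::x86_64::_bzhi_u32(a, index) }
}

/// Hypothetical dispatcher: uses the instruction when the CPU reports BMI2,
/// otherwise falls back to the portable version.
pub fn bzhi_u32(a: u32, index: u32) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("bmi2") {
            // SAFETY: BMI2 support was just confirmed at runtime.
            return unsafe { bzhi_u32_bmi2(a, index) };
        }
    }
    bzhi_u32_fallback(a, index)
}

fn main() {
    println!("{:#x}", bzhi_u32(0xFFFF_FFFF, 8));
    println!("{:#x}", bzhi_u32(0x1234_5678, 32));
}
```

In a hot loop the feature check would normally be hoisted out, or the whole loop placed inside a `#[target_feature(enable = "bmi2")]` function, so that the call boundary is crossed once rather than per element.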
I'm not sure if there's a good reason for this or not, so please pardon me if I'm just missing something obvious. Is it intentional, to prevent a `#UD` in some odd cases where the mere presence of the unrecognized instruction, even if not called, is a problem, but where it can be moved into another function without a problem?

(Also, it would be amazing if we could discuss having a feature like `avx2` unlock guaranteed-available (but not synonymous) features like `bmi1` and `bmi2`, but this is not the place for that.)