_bzhi_u32/_bzhi_u64 has poor codegen without bmi2 feature #88566

@mqudsi

As reported in this article about some issues encountered using SIMD with Rust, calls to the bzhi intrinsics made without the bmi2 CPU target feature enabled produce some odd codegen. The bzhi instruction isn't emulated and is still executed directly, but the intrinsic is never inlined, resulting in a completely unnecessary function call (which may be in a hot path).

e.g.

#[target_feature(enable = "avx2")]
pub unsafe fn bzhi(num: u32) -> u32 {
    core::arch::x86_64::_bzhi_u32(num, 31)
}

compiles to

core::core_arch::x86::bmi2::_bzhi_u32:
        sub     rsp, 4
        bzhi    eax, edi, esi
        mov     dword ptr [rsp], eax
        mov     eax, dword ptr [rsp]
        add     rsp, 4
        ret

example::bzhi:
        push    rax
        mov     esi, 31
        call    core::core_arch::x86::bmi2::_bzhi_u32
        mov     dword ptr [rsp + 4], eax
        mov     eax, dword ptr [rsp + 4]
        pop     rcx
        ret

(and this is what it looks like with optimizations enabled when it can't just jmp to the intrinsic: godbolt)

Other intrinsics get inlined after emulation all the time (e.g. ctlz); this one isn't emulated but it's not inlined, either.

I'm not sure if there's a good reason for this or not, so please pardon me if I'm just missing something obvious. Is it intentional to prevent a #UD in some odd cases where the mere presence of the unrecognized instruction, even if not called, is a problem but it can be moved into another function without a problem?

(Also, it would be amazing if we could discuss having a feature like avx2 unlock guaranteed available (but not synonymous) features like bmi1 and bmi2 but this is not the place for that.)
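
For reference, here is a minimal sketch of a possible workaround, assuming the missing inlining comes from the caller not having bmi2 enabled (a function with extra target features generally can't be inlined into a caller that lacks them). It enables bmi2 explicitly next to avx2, checks both features at runtime, and falls back to a plain mask otherwise. The names `bzhi_hw` and `bzhi_u32_portable` are made up for illustration and are not part of core::arch; I haven't confirmed that this is the intended fix.

```rust
/// Hypothetical portable fallback: clears every bit of `num` at position
/// `index` and above. Only the low 8 bits of `index` are used, and an index
/// of 32 or more leaves `num` unchanged, mirroring the BZHI instruction.
pub fn bzhi_u32_portable(num: u32, index: u32) -> u32 {
    let idx = index & 0xff;
    if idx >= 32 {
        num
    } else {
        num & ((1u32 << idx) - 1)
    }
}

/// Same body as the example above, but with bmi2 enabled alongside avx2 so
/// the caller's feature set covers the one the intrinsic requires.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2,bmi2")]
pub unsafe fn bzhi_hw(num: u32) -> u32 {
    core::arch::x86_64::_bzhi_u32(num, 31)
}

/// Safe wrapper: uses the hardware path only when both features are
/// detected at runtime, otherwise the portable mask.
pub fn bzhi(num: u32) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("bmi2") {
            // Sound because both required features were just detected.
            return unsafe { bzhi_hw(num) };
        }
    }
    bzhi_u32_portable(num, 31)
}
```

Calling `bzhi(u32::MAX)` should give `0x7fff_ffff` on either path, since an index of 31 clears only the top bit. Whether the call is actually inlined still has to be checked in the generated assembly.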

Labels

    A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
    A-codegen: Area: Code generation
    C-bug: Category: This is a bug.
    T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
