Skip to content

branchless binary search is not branchless since rust 1.25 #57935

Closed
@fulmicoton

Description

The std binary search implementation is supposed to rely on a branchless implementation.
The implementation rely on the idea that the idiom

   let base = if cond { base } else { something_else }; 

compiles as a conditional move.

Until rust < 1.25, rustc was very aggressively relying on conditional moves to avoid branches.
With rust >= 1..25, rustc seem to consistently compile the expression above with conditional jump.

I haven't benchmarked it, but this should have resulted in a performance regression.

Example code

pub fn binary_search(arr: &[u32], mut base: usize, mut len: usize, target: u32) -> usize {
    while len > 1 {
        let half = len / 2;
        let mid = base + half;
        let pivot = *unsafe { arr.get_unchecked(mid) };
        let cmp = pivot>target;
        base = if cmp { base } else { mid };
        len -= half;
    }
    base + ((*unsafe { arr.get_unchecked(base) } < target) as usize)
}

Assembly on 1.24

Godbolt

example::binary_search:
        push    rbp
        mov     rbp, rsp
        cmp     rcx, 2
        jb      .LBB0_1
.LBB0_3:
        mov     rsi, rcx
        shr     rsi
        lea     rax, [rdx + rsi]
        cmp     dword ptr [rdi + 4*rax], r8d
        cmova   rax, rdx
        sub     rcx, rsi
        cmp     rcx, 1
        mov     rdx, rax
        ja      .LBB0_3
        jmp     .LBB0_2
.LBB0_1:
        mov     rax, rdx
.LBB0_2:
        cmp     dword ptr [rdi + 4*rax], r8d
        adc     rax, 0
        pop     rbp
        ret

Assembly on 1.25

Godbolt

example::binary_search:
        push    rbp
        mov     rbp, rsp
        cmp     rcx, 2
        jb      .LBB0_4
.LBB0_1:
        mov     rax, rcx
        shr     rax
        lea     rsi, [rdx + rax]
        cmp     dword ptr [rdi + 4*rsi], r8d
        ja      .LBB0_3
        mov     rdx, rsi
.LBB0_3:
        sub     rcx, rax
        cmp     rcx, 1
        ja      .LBB0_1
.LBB0_4:
        cmp     dword ptr [rdi + 4*rdx], r8d
        adc     rdx, 0
        mov     rax, rdx
        pop     rbp
        ret

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.I-slowIssue: Problems and improvements with respect to performance of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions