Simple Rust match vs C switch equivalent produce slower code #61961

@julianlalu

When comparing a simple i32 switch-case in C (compiled with Clang 8.0.0) with the equivalent match in Rust (1.35), the Rust compiler produces slower assembly (timing estimated with LLVM MCA).

Here is the Rust code:

pub fn func(x :i32) -> i32 {
    match x {
        1 => x+1,
        3 => x+3,
        5 => x+5,
        7 => x+7,
        _ => x*x,
    }
}

The Rust 1.35 compiler (-C opt-level=3) produces the following assembly (Intel syntax):

example::func:
        lea     eax, [rdi - 1]
        cmp     eax, 7
        jae     .LBB0_3
        mov     ecx, 85
        bt      ecx, eax
        jb      .LBB0_2
.LBB0_3:
        imul    edi, edi
        mov     eax, edi
        ret
.LBB0_2:
        cdqe
        lea     rcx, [rip + .Lswitch.table.example::func]
        mov     eax, dword ptr [rcx + 4*rax]
        ret

.Lswitch.table.example::func:
        .long   2
        .long   2
        .long   6
        .long   2
        .long   10
        .long   2
        .long   14

Here is the equivalent C code:

int func(int x) {
    switch(x)
    {
        case 1: return x+1;
        case 3: return x+3;
        case 5: return x+5;
        case 7: return x+7;
        default: return x*x;
    }
}

Clang 8.0.0 (-O3) produces the following assembly (Intel syntax):

func:                                   # @func
        lea     eax, [rdi - 1]
        cmp     eax, 7
        jae     .LBB0_3
        mov     ecx, 85
        bt      ecx, eax
        jb      .LBB0_2
.LBB0_3:
        imul    edi, edi
        mov     eax, edi
        ret
.LBB0_2:
        cdqe
        mov     eax, dword ptr [4*rax + .Lswitch.table.func]
        ret
.Lswitch.table.func:
        .long   2                       # 0x2
        .long   2                       # 0x2
        .long   6                       # 0x6
        .long   2                       # 0x2
        .long   10                      # 0xa
        .long   2                       # 0x2
        .long   14                      # 0xe

We can see that the two-instruction sequence

lea     rcx, [rip + .Lswitch.table.example::func]
mov     eax, dword ptr [rcx + 4*rax]

produced by Rust could be replaced by the single instruction that Clang emits:

mov eax, dword ptr [4*rax + .Lswitch.table.func]

According to LLVM MCA, Rust 1.35 produces slower code than C here for no apparent reason.
The LLVM MCA report for Rust 1.35 is:

Iterations: 100
Instructions: 1300
Total Cycles: 454
Total uOps: 1700
Dispatch Width: 6
uOps Per Cycle: 3.74
IPC: 2.86
Block RThroughput: 2.8

The LLVM MCA report for Clang 8.0.0 is:

Iterations: 100
Instructions: 1200
Total Cycles: 418
Total uOps: 1600
Dispatch Width: 6
uOps Per Cycle: 3.83
IPC: 2.87
Block RThroughput: 2.7
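
As a quick sanity check on the two reports, the derived figures (IPC, uOps per cycle) can be recomputed from the raw counts, along with the relative difference in cycles. The sketch below only copies the numbers quoted above; the `McaReport` type and its helpers are hypothetical names for this illustration and are not part of LLVM MCA.

```rust
/// Raw counts copied from an LLVM MCA summary (hypothetical helper type).
pub struct McaReport {
    pub iterations: u32,
    pub instructions: u32,
    pub total_cycles: u32,
    pub total_uops: u32,
}

impl McaReport {
    /// Instructions per cycle.
    pub fn ipc(&self) -> f64 {
        self.instructions as f64 / self.total_cycles as f64
    }

    /// Micro-ops per cycle.
    pub fn uops_per_cycle(&self) -> f64 {
        self.total_uops as f64 / self.total_cycles as f64
    }

    /// Estimated cycles for one pass through the function body.
    pub fn cycles_per_iteration(&self) -> f64 {
        self.total_cycles as f64 / self.iterations as f64
    }
}

pub const RUST_1_35: McaReport = McaReport {
    iterations: 100,
    instructions: 1300,
    total_cycles: 454,
    total_uops: 1700,
};

pub const CLANG_8: McaReport = McaReport {
    iterations: 100,
    instructions: 1200,
    total_cycles: 418,
    total_uops: 1600,
};

/// Fraction by which `a` needs more cycles per iteration than `b`.
pub fn extra_cycles(a: &McaReport, b: &McaReport) -> f64 {
    a.cycles_per_iteration() / b.cycles_per_iteration() - 1.0
}
```

With these numbers the Rust output costs 4.54 cycles per iteration against 4.18 for Clang, roughly 8.6% more, and executes one extra instruction and one extra uOp per iteration (13 vs 12 instructions, 17 vs 16 uOps).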

Labels

A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
I-slow: Issue: Problems and improvements with respect to performance of generated code.
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
