Simple Rust match produces slower code than the equivalent C switch #61961

When comparing a simple `int` switch in C (compiled with Clang 8.0.0) with the equivalent `i32` match in Rust (1.35), the Rust compiler produces slower assembly (timing estimated with LLVM MCA).

Here is the Rust code:
```
pub fn func(x: i32) -> i32 {
    match x {
        1 => x + 1,
        3 => x + 3,
        5 => x + 5,
        7 => x + 7,
        _ => x * x,
    }
}
```
The Rust compiler 1.35 (`-C opt-level=3`) produces the following x86-64 assembly (Intel syntax):

```
example::func:
        lea     eax, [rdi - 1]
        cmp     eax, 7
        jae     .LBB0_3
        mov     ecx, 85
        bt      ecx, eax
        jb      .LBB0_2
.LBB0_3:
        imul    edi, edi
        mov     eax, edi
        ret
.LBB0_2:
        cdqe
        lea     rcx, [rip + .Lswitch.table.example::func]
        mov     eax, dword ptr [rcx + 4*rax]
        ret

.Lswitch.table.example::func:
        .long   2
        .long   2
        .long   6
        .long   2
        .long   10
        .long   2
        .long   14
```
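To make the listing above easier to follow, here is a minimal sketch that models its control flow in plain Rust. The names `func_lowered`, `SWITCH_TABLE`, and `CASE_MASK` are illustrative only and do not come from the compiler output; the sketch assumes the table and the bitmask `85` are used exactly as the assembly suggests, and it says nothing about how the table address is formed.

```rust
/// The match from this issue, unchanged in behavior.
pub fn func(x: i32) -> i32 {
    match x {
        1 => x + 1,
        3 => x + 3,
        5 => x + 5,
        7 => x + 7,
        _ => x * x,
    }
}

/// Hand-written model of the generated assembly (hypothetical helper).
pub fn func_lowered(x: i32) -> i32 {
    // Contents of `.Lswitch.table`, indexed by `x - 1`.
    const SWITCH_TABLE: [i32; 7] = [2, 2, 6, 2, 10, 2, 14];
    // 85 == 0b101_0101: bits 0, 2, 4 and 6 mark the cases 1, 3, 5 and 7.
    const CASE_MASK: u32 = 85;

    // lea eax, [rdi - 1]; the following compare is unsigned.
    let idx = (x as u32).wrapping_sub(1);

    // cmp eax, 7 / jae  -> range check
    // bt ecx, eax / jb  -> is this index one of the real cases?
    if idx < 7 && ((CASE_MASK >> idx) & 1) == 1 {
        SWITCH_TABLE[idx as usize]
    } else {
        // imul edi, edi (wrapping, as in the machine code)
        x.wrapping_mul(x)
    }
}
```

The even indices of the table hold the results for 1, 3, 5 and 7; the odd indices are padding that the bit test never selects.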

Here is the equivalent C code:
```
int func(int x) {
    switch(x)
    {
        case 1: return x+1;
        case 3: return x+3;
        case 5: return x+5;
        case 7: return x+7;
        default: return x*x;
    }
}
```
Clang 8.0.0 (`-O3`) produces the following x86-64 assembly (Intel syntax):
```
func:                                   # @func
        lea     eax, [rdi - 1]
        cmp     eax, 7
        jae     .LBB0_3
        mov     ecx, 85
        bt      ecx, eax
        jb      .LBB0_2
.LBB0_3:
        imul    edi, edi
        mov     eax, edi
        ret
.LBB0_2:
        cdqe
        mov     eax, dword ptr [4*rax + .Lswitch.table.func]
        ret
.Lswitch.table.func:
        .long   2                       # 0x2
        .long   2                       # 0x2
        .long   6                       # 0x6
        .long   2                       # 0x2
        .long   10                      # 0xa
        .long   2                       # 0x2
        .long   14                      # 0xe
```

We can see that the two instructions

```
lea     rcx, [rip + .Lswitch.table.example::func]
mov     eax, dword ptr [rcx + 4*rax]
```

produced by Rust could be the single instruction that Clang emits:

`mov     eax, dword ptr [4*rax + .Lswitch.table.func]`

According to LLVM MCA, Rust 1.35 produces slower code than Clang here for no apparent reason.

The LLVM MCA report for Rust 1.35 is:

> Iterations:        100
> Instructions:      1300
> Total Cycles:      454
> Total uOps:        1700
> Dispatch Width:    6
> uOps Per Cycle:    3.74
> IPC:               2.86
> Block RThroughput: 2.8

The LLVM MCA report for Clang 8.0.0 is:

> Iterations:        100
> Instructions:      1200
> Total Cycles:      418
> Total uOps:        1600
> Dispatch Width:    6
> uOps Per Cycle:    3.83
> IPC:               2.87
> Block RThroughput: 2.7
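To put the two reports side by side, the following sketch derives per-iteration figures from the numbers quoted above. The `McaSummary` type and its methods are hypothetical helpers written for this comparison; they are not part of LLVM MCA, and only the three input fields are taken from the reports.

```rust
/// Summary figures copied by hand from an LLVM MCA report (hypothetical type).
pub struct McaSummary {
    pub iterations: u32,
    pub instructions: u32,
    pub total_cycles: u32,
}

impl McaSummary {
    /// Estimated cycles spent on one iteration of the analyzed block.
    pub fn cycles_per_iteration(&self) -> f64 {
        f64::from(self.total_cycles) / f64::from(self.iterations)
    }

    /// Instructions per cycle, the same quantity MCA prints as `IPC`.
    pub fn ipc(&self) -> f64 {
        f64::from(self.instructions) / f64::from(self.total_cycles)
    }
}

/// Values from the Rust 1.35 report.
pub const RUST_1_35: McaSummary = McaSummary {
    iterations: 100,
    instructions: 1300,
    total_cycles: 454,
};

/// Values from the Clang 8.0.0 report.
pub const CLANG_8: McaSummary = McaSummary {
    iterations: 100,
    instructions: 1200,
    total_cycles: 418,
};

/// How many percent more cycles `slow` needs compared to `fast`.
pub fn extra_cycles_percent(slow: &McaSummary, fast: &McaSummary) -> f64 {
    (f64::from(slow.total_cycles) / f64::from(fast.total_cycles) - 1.0) * 100.0
}
```

With these inputs the Rust version costs one extra instruction per iteration (13 versus 12) and roughly 8.6% more estimated cycles (4.54 versus 4.18 per iteration).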
