Description
When comparing a simple i32 switch-case in C (compiled with Clang 8.0.0) with the equivalent match in Rust (1.35), the Rust compiler produces slower assembly (timing estimated with LLVM MCA).
Here is the Rust code:
pub fn func(x: i32) -> i32 {
    match x {
        1 => x + 1,
        3 => x + 3,
        5 => x + 5,
        7 => x + 7,
        _ => x * x,
    }
}
The Rust 1.35 compiler (-C opt-level=3) produces the following assembly (Intel syntax):
example::func:
        lea     eax, [rdi - 1]
        cmp     eax, 7
        jae     .LBB0_3
        mov     ecx, 85
        bt      ecx, eax
        jb      .LBB0_2
.LBB0_3:
        imul    edi, edi
        mov     eax, edi
        ret
.LBB0_2:
        cdqe
        lea     rcx, [rip + .Lswitch.table.example::func]
        mov     eax, dword ptr [rcx + 4*rax]
        ret
.Lswitch.table.example::func:
        .long   2
        .long   2
        .long   6
        .long   2
        .long   10
        .long   2
        .long   14
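To make the listing above easier to follow, here is a rough sketch in Rust of what the generated code does. The names func_lowered, SWITCH_TABLE and CASE_MASK are hypothetical and only for illustration; they are not produced by the compiler. The constant 85 is 0b1010101, so bits 0, 2, 4 and 6 mark the handled cases x = 1, 3, 5, 7 after subtracting 1, and the odd table slots are filler entries that are never read.

```rust
/// Mirrors `.Lswitch.table.example::func`; the index is `x - 1`.
/// Slots 1, 3 and 5 are padding and are never selected by the mask.
const SWITCH_TABLE: [i32; 7] = [2, 2, 6, 2, 10, 2, 14];

/// 85 == 0b1010101: bits 0, 2, 4, 6 correspond to x = 1, 3, 5, 7.
const CASE_MASK: u32 = 85;

/// The match from the issue, with wrapping multiplication so that the
/// default arm behaves the same in debug and release builds.
pub fn func(x: i32) -> i32 {
    match x {
        1 => x + 1,
        3 => x + 3,
        5 => x + 5,
        7 => x + 7,
        _ => x.wrapping_mul(x),
    }
}

/// Hypothetical hand-written equivalent of the assembly (illustration only).
pub fn func_lowered(x: i32) -> i32 {
    // lea eax, [rdi - 1]
    let idx = x.wrapping_sub(1) as u32;
    // cmp eax, 7 / jae: unsigned range check; bt ecx, eax / jb: bit test
    if idx < 7 && (CASE_MASK >> idx) & 1 == 1 {
        // mov eax, dword ptr [table + 4*rax]
        SWITCH_TABLE[idx as usize]
    } else {
        // imul edi, edi
        x.wrapping_mul(x)
    }
}
```

Both functions should agree for every input; the only part that differs between the Rust and Clang output is how the table address is formed for the final load.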
Here is the equivalent C code:
int func(int x) {
    switch (x) {
        case 1: return x + 1;
        case 3: return x + 3;
        case 5: return x + 5;
        case 7: return x + 7;
        default: return x * x;
    }
}
The Clang 8.0.0 compiler (-O3) produces the following assembly (Intel syntax):
func:                                   # @func
        lea     eax, [rdi - 1]
        cmp     eax, 7
        jae     .LBB0_3
        mov     ecx, 85
        bt      ecx, eax
        jb      .LBB0_2
.LBB0_3:
        imul    edi, edi
        mov     eax, edi
        ret
.LBB0_2:
        cdqe
        mov     eax, dword ptr [4*rax + .Lswitch.table.func]
        ret
.Lswitch.table.func:
        .long   2                       # 0x2
        .long   2                       # 0x2
        .long   6                       # 0x6
        .long   2                       # 0x2
        .long   10                      # 0xa
        .long   2                       # 0x2
        .long   14                      # 0xe
We can see that the two instructions
lea rcx, [rip + .Lswitch.table.example::func]
mov eax, dword ptr [rcx + 4*rax]
produced by Rust could be the single instruction that Clang emits:
mov eax, dword ptr [4*rax + .Lswitch.table.func]
According to LLVM MCA, Rust 1.35 produces slower code than C here for no apparent reason.
The LLVM MCA report for Rust 1.35 is:
Iterations: 100
Instructions: 1300
Total Cycles: 454
Total uOps: 1700
Dispatch Width: 6
uOps Per Cycle: 3.74
IPC: 2.86
Block RThroughput: 2.8
The LLVM MCA report for Clang 8.0.0 is:
Iterations: 100
Instructions: 1200
Total Cycles: 418
Total uOps: 1600
Dispatch Width: 6
uOps Per Cycle: 3.83
IPC: 2.87
Block RThroughput: 2.7
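As a quick sanity check on the two reports, the derived figures can be recomputed from the raw counts. The sketch below is only arithmetic on the numbers quoted above; the McaSummary type and its method names are hypothetical and not part of llvm-mca. Keep in mind that llvm-mca models the instruction stream as a straight-line block repeated for the given number of iterations, so these are throughput estimates rather than measured run times.

```rust
/// Raw counters copied from an llvm-mca summary (hypothetical helper type).
pub struct McaSummary {
    pub instructions: u32,
    pub total_cycles: u32,
    pub total_uops: u32,
}

impl McaSummary {
    /// Instructions per cycle, as reported in the "IPC" line.
    pub fn ipc(&self) -> f64 {
        self.instructions as f64 / self.total_cycles as f64
    }

    /// Micro-ops per cycle, as reported in the "uOps Per Cycle" line.
    pub fn uops_per_cycle(&self) -> f64 {
        self.total_uops as f64 / self.total_cycles as f64
    }
}

/// Figures from the Rust 1.35 report above.
pub const RUST_1_35: McaSummary = McaSummary {
    instructions: 1300,
    total_cycles: 454,
    total_uops: 1700,
};

/// Figures from the Clang 8.0.0 report above.
pub const CLANG_8: McaSummary = McaSummary {
    instructions: 1200,
    total_cycles: 418,
    total_uops: 1600,
};

/// How many percent more cycles `a` needs compared with `b`.
pub fn extra_cycles_percent(a: &McaSummary, b: &McaSummary) -> f64 {
    (a.total_cycles as f64 / b.total_cycles as f64 - 1.0) * 100.0
}
```

With these inputs the estimated gap is 36 cycles over 100 iterations, roughly 8.6% more cycles for the Rust output, which comes from the one extra instruction (and one extra uOp) per iteration.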