-
Notifications
You must be signed in to change notification settings - Fork 13.6k
Open
Labels
A-codegenArea: Code generationArea: Code generationA-slice-patternsArea: Slice patterns, https://github.com/rust-lang/rust/issues/23121Area: Slice patterns, https://github.com/rust-lang/rust/issues/23121C-enhancementCategory: An issue proposing an enhancement or a PR with one.Category: An issue proposing an enhancement or a PR with one.C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchCategory: An issue highlighting optimization opportunities or PRs implementing suchI-slowIssue: Problems and improvements with respect to performance of generated code.Issue: Problems and improvements with respect to performance of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.Relevant to the compiler team, which will review and decide on the PR/issue.WG-llvmWorking group: LLVM backend code generationWorking group: LLVM backend code generation
Description
After reading this blog post recently, I was very happily surprised with the quality of the code generation for the following function :
fn middle(xs: &[u32]) -> Option<&u32> {
match xs {
[_, inner @ .., _] => middle(inner),
[x] => Some(x),
[] => None,
}
}
example::middle:
cmp rsi, 2
jb .LBB0_2
add rsi, -2
mov rax, rsi
and rax, -2
lea rdi, [rdi + 2*rax]
add rdi, 4
and esi, 1
.LBB0_2:
xor eax, eax
cmp rsi, 1
cmove rax, rdi
ret
It is amazing what the compiler is able to achieve !
But then, I tried this very slight variation :
pub fn middle(xs: &[u32]) -> Option<u32> {
match xs {
[_, inner @ .., _] => middle(inner),
[x] => Some(*x),
[] => None,
}
}
(the only difference is that we are returning an u32
instead of an &u32
)
And the generated code changes dramatically:
example::middle:
push rax
cmp rsi, 1
jbe .LBB0_1
add rdi, 4
add rsi, -2
call qword ptr [rip + example::middle@GOTPCREL]
pop rcx
ret
.LBB0_1:
cmp rsi, 1
jne .LBB0_2
mov edx, dword ptr [rdi]
mov eax, 1
pop rcx
ret
.LBB0_2:
xor eax, eax
pop rcx
ret
Is there something making the optimization harder to apply in the second case, or is it a bug somewhere in the compiler ?
YoukaiCat
Metadata
Metadata
Assignees
Labels
A-codegenArea: Code generationArea: Code generationA-slice-patternsArea: Slice patterns, https://github.com/rust-lang/rust/issues/23121Area: Slice patterns, https://github.com/rust-lang/rust/issues/23121C-enhancementCategory: An issue proposing an enhancement or a PR with one.Category: An issue proposing an enhancement or a PR with one.C-optimizationCategory: An issue highlighting optimization opportunities or PRs implementing suchCategory: An issue highlighting optimization opportunities or PRs implementing suchI-slowIssue: Problems and improvements with respect to performance of generated code.Issue: Problems and improvements with respect to performance of generated code.T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.Relevant to the compiler team, which will review and decide on the PR/issue.WG-llvmWorking group: LLVM backend code generationWorking group: LLVM backend code generation