Closed
Labels
A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-enhancement: Category: An issue proposing an enhancement or a PR with one.
I-slow: Issue: Problems and improvements with respect to performance of generated code.
T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
Description
The following functions:
pub fn xor_in_place(a: &mut [u8; 16], b: &[u8; 16]) {
    for (b1, b2) in a.iter_mut().zip(b.iter()) {
        *b1 ^= *b2;
    }
}
pub fn xor_in_place2(a: &mut [u8; 16], b: &[u8; 16]) {
    for i in 0..16 {
        a[i] ^= b[i];
    }
}
produce the following inefficient assembly (with -O):
mov al, byte ptr [rsi]
xor byte ptr [rdi], al
mov al, byte ptr [rsi + 1]
xor byte ptr [rdi + 1], al
mov al, byte ptr [rsi + 2]
xor byte ptr [rdi + 2], al
mov al, byte ptr [rsi + 3]
xor byte ptr [rdi + 3], al
mov al, byte ptr [rsi + 4]
xor byte ptr [rdi + 4], al
mov al, byte ptr [rsi + 5]
xor byte ptr [rdi + 5], al
mov al, byte ptr [rsi + 6]
xor byte ptr [rdi + 6], al
mov al, byte ptr [rsi + 7]
xor byte ptr [rdi + 7], al
mov al, byte ptr [rsi + 8]
xor byte ptr [rdi + 8], al
mov al, byte ptr [rsi + 9]
xor byte ptr [rdi + 9], al
mov al, byte ptr [rsi + 10]
xor byte ptr [rdi + 10], al
mov al, byte ptr [rsi + 11]
xor byte ptr [rdi + 11], al
mov al, byte ptr [rsi + 12]
xor byte ptr [rdi + 12], al
mov al, byte ptr [rsi + 13]
xor byte ptr [rdi + 13], al
mov al, byte ptr [rsi + 14]
xor byte ptr [rdi + 14], al
mov al, byte ptr [rsi + 15]
xor byte ptr [rdi + 15], al
instead of using movups and pxor.
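For comparison, here is a minimal sketch of one possible workaround that is not part of the original report: XOR the two 16-byte arrays as single 128-bit integers. The function name `xor_in_place_u128` is hypothetical. Because XOR operates on each bit independently, native-endian conversion gives the same bytes as the byte-wise loop on any platform. Whether the compiler lowers this to one vector load, XOR, and store is an assumption that should be checked in the generated assembly.

```rust
/// Hypothetical workaround (not from the original issue): treat both
/// 16-byte arrays as one u128 each and XOR them in a single operation.
pub fn xor_in_place_u128(a: &mut [u8; 16], b: &[u8; 16]) {
    // from_ne_bytes / to_ne_bytes reinterpret the bytes without reordering
    // relative to memory, so the result matches the byte-wise loop.
    let x = u128::from_ne_bytes(*a) ^ u128::from_ne_bytes(*b);
    *a = x.to_ne_bytes();
}
```

It takes the same arguments as the array versions above, so it can be swapped in at the call site to compare the emitted code.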
Meanwhile, if we pass slices as arguments, the function gets vectorized:
pub fn xor_in_place(a: &mut [u8], b: &[u8]) {
    for (b1, b2) in a.iter_mut().zip(b.iter()) {
        *b1 ^= *b2;
    }
}
movups xmm0, xmmword ptr [rdx]
movups xmm1, xmmword ptr [rdx + 16]
movups xmm2, xmmword ptr [rdi]
movups xmm3, xmmword ptr [rdi + 16]
xorps xmm2, xmm0
xorps xmm3, xmm1
movups xmmword ptr [rdi], xmm2
movups xmmword ptr [rdi + 16], xmm3
mov r9d, 32
UPD: See also this comment.
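The array and slice signatures compute the same result, so the difference reported here is purely one of code generation. A small sketch that checks this is given below. The names `xor_array`, `xor_slice`, and `results_match` are hypothetical, chosen only so both variants can live in one file.

```rust
/// Array version, same loop body as in the report.
pub fn xor_array(a: &mut [u8; 16], b: &[u8; 16]) {
    for (b1, b2) in a.iter_mut().zip(b.iter()) {
        *b1 ^= *b2;
    }
}

/// Slice version, same loop body as in the report.
pub fn xor_slice(a: &mut [u8], b: &[u8]) {
    for (b1, b2) in a.iter_mut().zip(b.iter()) {
        *b1 ^= *b2;
    }
}

/// Hypothetical helper: runs both versions on copies of the same input
/// and reports whether the outputs are byte-for-byte identical.
pub fn results_match(a: [u8; 16], b: [u8; 16]) -> bool {
    let mut via_array = a;
    let mut via_slice = a;
    xor_array(&mut via_array, &b);
    xor_slice(&mut via_slice, &b);
    via_array == via_slice
}
```

Note that calling the slice version on fixed-size arrays may be inlined with a known length, so this check says nothing about which assembly either call ends up with.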