Inefficient code generation for array xor'ing #43702

@newpavlov

The following functions:

pub fn xor_in_place(a: &mut [u8; 16], b: &[u8; 16]) {
    for (b1, b2) in a.iter_mut().zip(b.iter()) {
        *b1 ^= *b2;
    }
}

pub fn xor_in_place2(a: &mut [u8; 16], b: &[u8; 16]) {
    for i in 0..16 {
        a[i] ^= b[i];
    }
}

Produce the following inefficient assembly (with -O):

  mov al, byte ptr [rsi]
  xor byte ptr [rdi], al
  mov al, byte ptr [rsi + 1]
  xor byte ptr [rdi + 1], al
  mov al, byte ptr [rsi + 2]
  xor byte ptr [rdi + 2], al
  mov al, byte ptr [rsi + 3]
  xor byte ptr [rdi + 3], al
  mov al, byte ptr [rsi + 4]
  xor byte ptr [rdi + 4], al
  mov al, byte ptr [rsi + 5]
  xor byte ptr [rdi + 5], al
  mov al, byte ptr [rsi + 6]
  xor byte ptr [rdi + 6], al
  mov al, byte ptr [rsi + 7]
  xor byte ptr [rdi + 7], al
  mov al, byte ptr [rsi + 8]
  xor byte ptr [rdi + 8], al
  mov al, byte ptr [rsi + 9]
  xor byte ptr [rdi + 9], al
  mov al, byte ptr [rsi + 10]
  xor byte ptr [rdi + 10], al
  mov al, byte ptr [rsi + 11]
  xor byte ptr [rdi + 11], al
  mov al, byte ptr [rsi + 12]
  xor byte ptr [rdi + 12], al
  mov al, byte ptr [rsi + 13]
  xor byte ptr [rdi + 13], al
  mov al, byte ptr [rsi + 14]
  xor byte ptr [rdi + 14], al
  mov al, byte ptr [rsi + 15]
  xor byte ptr [rdi + 15], al

I would expect the compiler to use movups and pxor here instead of sixteen byte-wide loads and XORs.
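For illustration, the 16-byte-wide operation can be written out by hand in Rust. This is only a sketch of a possible workaround, not something proposed in the issue: the name `xor_in_place_u128` is made up here, and whether it actually lowers to movups/pxor depends on the compiler version and target, so the assembly should be checked.

```rust
/// Hypothetical workaround: XOR two 16-byte arrays as one 128-bit integer
/// instead of sixteen separate bytes.
pub fn xor_in_place_u128(a: &mut [u8; 16], b: &[u8; 16]) {
    // XOR works bit by bit, so the byte order of the conversion does not
    // matter as long as both operands and the result use the same one.
    let x = u128::from_ne_bytes(*a) ^ u128::from_ne_bytes(*b);
    *a = x.to_ne_bytes();
}
```

The result is byte-for-byte identical to the loop versions; only the shape of the code handed to the optimizer differs.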

Meanwhile, if we pass slices as arguments, the function gets vectorized:

pub fn xor_in_place(a: &mut [u8], b: &[u8]) {
    for (b1, b2) in a.iter_mut().zip(b.iter()) {
        *b1 ^= *b2;
    }
}
Resulting assembly:
  movups xmm0, xmmword ptr [rdx]
  movups xmm1, xmmword ptr [rdx + 16]
  movups xmm2, xmmword ptr [rdi]
  movups xmm3, xmmword ptr [rdi + 16]
  xorps xmm2, xmm0
  xorps xmm3, xmm1
  movups xmmword ptr [rdi], xmm2
  movups xmmword ptr [rdi + 16], xmm3
  mov r9d, 32
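Before comparing the assembly, it may help to confirm that the array and slice variants really compute the same thing, so that any difference is purely a codegen matter. A minimal sketch follows; the names `xor_array`, `xor_slice` and `variants_agree` are invented for this example and do not appear in the issue.

```rust
/// Fixed-size array variant, same body as in the report.
pub fn xor_array(a: &mut [u8; 16], b: &[u8; 16]) {
    for (b1, b2) in a.iter_mut().zip(b.iter()) {
        *b1 ^= *b2;
    }
}

/// Slice variant, same body as in the report.
pub fn xor_slice(a: &mut [u8], b: &[u8]) {
    for (b1, b2) in a.iter_mut().zip(b.iter()) {
        *b1 ^= *b2;
    }
}

/// Runs both variants on copies of the same input and compares the results.
pub fn variants_agree(a: [u8; 16], b: [u8; 16]) -> bool {
    let mut x = a;
    let mut y = a;
    xor_array(&mut x, &b);
    // The array references coerce to slices here.
    xor_slice(&mut y, &b);
    x == y
}
```

With the functions in a library crate, something like `rustc -O --crate-type lib --emit asm` on the file should produce assembly for both variants to compare side by side.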

UPD: See also this comment.

Labels

A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
