avoid libcall to memcpy harder #54535

Open
nickdesaulniers opened this issue Mar 24, 2022 · 7 comments

Comments

@nickdesaulniers
Member

nickdesaulniers commented Mar 24, 2022

Via this thread:

Consider the following example:

struct foo {
    unsigned long x0;
    unsigned long x1;
    unsigned long x2;
    unsigned long x3;
    unsigned long x4;
    unsigned long x5;
    unsigned long x6;
    unsigned long x7;
    unsigned long x8;
    unsigned long x9;
    unsigned long x10;
    unsigned long x11;
    unsigned long x12;
    unsigned long x13;
    unsigned long x14;
    unsigned long x15;
    // Comment out the members below.
    unsigned long x16;
    unsigned long x17;
    unsigned long x18;
    unsigned long x19;
} *x, *y;

struct foo* get_x(void);

struct foo* cpy(struct foo *y) {
    struct foo *x = get_x();
    if (y != x)
        *x = *y;
    return x;
}

When compiled with -O2 -mno-sse (as the Linux kernel does), we get:

cpy:
  ...
        movl    $160, %edx
        movq    %rbx, %rdi
        movq    %r14, %rsi
        callq   memcpy@PLT
...

but if we reduce the number of members in struct foo, we can get:

cpy:
  ...
        movl    $16, %ecx
        movq    %rax, %rdi
        movq    %rbx, %rsi
        rep;movsq (%rsi), %es:(%rdi)
...

which is going to be way faster. FWICT, it looks like isel is choosing between lowering @llvm.memcpy.p0i8.p0i8.i64() to a libcall to memcpy and inlining a simple memcpy.

I assume there's some limit on how many bytes rep;movsq can copy, but surely it's much larger than 16x8B?
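For reference, this is presumably what the reduced variant looks like (hypothetical names foo_small/get_x_small/cpy_small, assuming the first 16 members are kept): a 128-byte struct, matching the 16 quadwords loaded into %ecx in the rep;movsq version above.

struct foo_small {
    /* 16 * 8 = 128 bytes total */
    unsigned long x0, x1, x2,  x3,  x4,  x5,  x6,  x7;
    unsigned long x8, x9, x10, x11, x12, x13, x14, x15;
};

struct foo_small *get_x_small(void);

struct foo_small *cpy_small(struct foo_small *y) {
    struct foo_small *x = get_x_small();
    if (y != x)
        *x = *y;    /* struct assignment, lowered via @llvm.memcpy */
    return x;
}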

cc @phoebewang

@llvmbot
Collaborator

llvmbot commented Mar 24, 2022

@llvm/issue-subscribers-backend-x86

@efriedma-quic
Collaborator

/// Max. memset / memcpy size that is turned into rep/movs, rep/stos ops.

The value "128" was chosen 16 years ago in 03c1e6f . Maybe the correct default has changed since then. :)

@phoebewang
Contributor

It seems only icelake and later targets have fast rep: https://reviews.llvm.org/D85989
We already have patches for the replacement: https://reviews.llvm.org/D86883
https://godbolt.org/z/ovEf9Kxfz

@LebedevRI
Member

I've actually thought about a similar problem.
Let me at least come up with a benchmark (-mllvm -x86-use-fsrm-for-memcpy will simplify that :))
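For illustration, roughly the kind of comparison such a benchmark might run. This is not the attached benchmark_memcpy.cc.txt; the rep movsb inline asm, buffer size, and iteration count are made up for the sketch, assuming x86-64 with a GCC/Clang-style compiler and POSIX clock_gettime.

#include <stdio.h>
#include <string.h>
#include <time.h>

#define COPY_SIZE 160u      /* matches the 160-byte struct foo copy above */
#define ITERATIONS 10000000u

static unsigned char dst_buf[COPY_SIZE], src_buf[COPY_SIZE];

/* Force a rep movsb copy via inline asm (x86-64 only). */
static void rep_movsb_copy(void *d, const void *s, size_t n) {
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory");
}

/* Plain library memcpy; called through a function pointer below so the
   compiler keeps it as an actual call. */
static void libc_copy(void *d, const void *s, size_t n) {
    memcpy(d, s, n);
}

static double time_loop(void (*copy)(void *, const void *, size_t)) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < ITERATIONS; i++)
        copy(dst_buf, src_buf, COPY_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

int main(void) {
    printf("memcpy:    %f s\n", time_loop(libc_copy));
    printf("rep movsb: %f s\n", time_loop(rep_movsb_copy));
    return 0;
}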

@LebedevRI
Member

LebedevRI commented Mar 25, 2022

Ok, got the benchmark-ish:
benchmark_memcpy.cc.txt

On Zen3:

For fully unaligned pointers, memcpy always wins: res-align1.txt. This concludes my interest.

align 2 likewise: res-align2.txt

@RKSimon
Collaborator

RKSimon commented Mar 25, 2022

CC @RKSimon

@bdaase

bdaase commented Jan 3, 2023

> llvm-project/llvm/lib/Target/X86/X86Subtarget.h (Line 81 in a9b70a8):
>
> /// Max. memset / memcpy size that is turned into rep/movs, rep/stos ops.
>
> The value "128" was chosen 16 years ago in 03c1e6f. Maybe the correct default has changed since then. :)

I am a bit surprised that MaxInlineSizeThreshold is actually 128, because my experiments indicate that it stops inlining the memcpy at 256 bytes: https://godbolt.org/z/j4qaTvjb7

In the linked example, the memcpy is inlined even though one field is 16 * 16 = 256 bytes.
Uncommenting either line 21 or changing the 16 to a 17 in line 39 makes it call memcpy.
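A minimal standalone sketch in the same spirit (the struct layout and names here are illustrative, not the exact ones from the godbolt link): per the observation above, with clang -O2 -mno-sse the 256-byte copy should be expanded inline, and growing the array to 33 elements should produce a memcpy libcall instead.

struct blob {
    unsigned long words[32];  /* 32 * 8 = 256 bytes; try 33 to get the libcall */
};

struct blob *get_blob(void);

struct blob *copy_blob(struct blob *src) {
    struct blob *dst = get_blob();
    if (src != dst)
        *dst = *src;   /* struct assignment, lowered via @llvm.memcpy */
    return dst;
}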
