avoid libcall to memcpy harder #54535
Comments
@llvm/issue-subscribers-backend-x86
The value "128" was chosen 16 years ago in 03c1e6f. Maybe the correct default has changed since then. :)
It seems only icelake and later targets have fast rep: https://reviews.llvm.org/D85989
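Whether `rep movs` is fast on a given CPU is advertised through CPUID feature bits. As a minimal sketch (my own, not part of the issue: the helper names are invented, and the bit positions are the documented ERMS/FSRM bits, where FSRM is the feature the D85989 tuning keys on), one can query them like this on x86:

```c
#include <cpuid.h>

/* ERMS (Enhanced REP MOVSB/STOSB): CPUID.(EAX=7,ECX=0):EBX bit 9. */
int has_erms(void) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx >> 9) & 1;
}

/* FSRM (Fast Short REP MOV, icelake and later): CPUID.(EAX=7,ECX=0):EDX bit 4. */
int has_fsrm(void) {
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 4) & 1;
}
```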
I've actually thought about a similar problem.
Ok, got the benchmark-ish results. On Zen3, for fully unaligned pointers, and likewise for align 2: res-align2.txt
CC @RKSimon |
I am a bit surprised that the … In the above example, the …
Via this thread:

Consider the following example. Compiled with `-O2 -mno-sse` (as the Linux kernel does), we get a libcall to memcpy; but if we reduce the number of members in `struct foo`, we instead get an inline copy, which is going to be way faster. FWICT, it looks like isel is choosing whether to lower `@llvm.memcpy.p0i8.p0i8.i64()` to a libcall to memcpy vs. inlining a simple memcpy. I assume there's some limit on how many bytes `rep; movsq` can copy, but surely it's much larger than 16x8B?
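The original code block did not survive here, so the following is a hypothetical reconstruction of the kind of example being discussed: only `struct foo` and the 16x8B figure come from the text above, the member layout and function name are my own.

```c
#include <string.h>

/* Hypothetical reconstruction: 16 members of 8 bytes each, i.e. a
 * 128-byte struct, matching the "16x8B" figure in the report. */
struct foo {
    unsigned long f[16];
};

/* With clang -O2 -mno-sse, a plain struct assignment like this is
 * turned into an @llvm.memcpy intrinsic; the x86 backend then either
 * inlines the copy (e.g. as rep movsq or a short run of movs) or
 * emits a call to the memcpy libcall, depending on the size threshold
 * discussed in this issue. */
void copy_foo(struct foo *dst, const struct foo *src) {
    *dst = *src;
}
```

Dropping a member or two, so the struct falls under the threshold, is what flips the codegen from the libcall to the inline copy.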
cc @phoebewang