Generate better memcpy code for types with alignment padding #70779
Labels
A-LLVM
Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
C-enhancement
Category: An issue proposing an enhancement or a PR with one.
I-slow
Issue: Problems and improvements with respect to performance of generated code.
T-compiler
Relevant to the compiler team, which will review and decide on the PR/issue.
Consider the type `S0` below, which has 9 bytes of payload data, but because of alignment requirements its size is 16 bytes. It implements `Copy`, so it can be cloned and copied by `memcpy`. rustc tends to emit 9-byte `memcpy` calls for it in several cases, even though it would be legal to emit `memcpy` calls of size anywhere between 9 bytes and 16 bytes. On x86-64, a 9-byte memcpy is 2x slower than a 16-byte memcpy: the former requires 2 loads and 2 stores, whereas the latter can use just 1 load and 1 store. It would be nice for Rust to emit the more efficient memcpy calls. Currently I'm working around this issue by manually padding my type up to 16 bytes of payload data, like in `S1`.

Compare the generated assembly of `copy_s0` and `copy_s1`:

Similar issues show up for `clone_from_slice`.

(Playground)
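A minimal sketch of the kind of types and functions described above. The field names and layout (a `u64` plus a `u8`, and a `_pad` array in `S1`) are assumptions chosen to give 9 bytes of payload in a 16-byte type; they are not necessarily the definitions behind the Playground link. `clone_s0_slice` is a hypothetical helper added only to show the `clone_from_slice` case.

```rust
use std::mem::size_of;

// Assumed layout: 8 + 1 = 9 bytes of payload, rounded up to 16
// because the u64 field gives the struct 8-byte alignment.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct S0 {
    pub a: u64,
    pub b: u8,
}

// Workaround variant: explicit padding bytes make all 16 bytes payload,
// so a copy of the whole value covers exactly 16 bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct S1 {
    pub a: u64,
    pub b: u8,
    pub _pad: [u8; 7],
}

// Compile-time checks that both types occupy 16 bytes.
const _: () = assert!(size_of::<S0>() == 16);
const _: () = assert!(size_of::<S1>() == 16);

// Copying through references; these are the functions whose
// generated assembly would be compared.
pub fn copy_s0(dst: &mut S0, src: &S0) {
    *dst = *src;
}

pub fn copy_s1(dst: &mut S1, src: &S1) {
    *dst = *src;
}

// Hypothetical helper for the slice case mentioned above.
// Panics if the two slices differ in length.
pub fn clone_s0_slice(dst: &mut [S0], src: &[S0]) {
    dst.clone_from_slice(src);
}
```

Building this with optimizations and inspecting the assembly for `copy_s0` and `copy_s1` is one way to check whether a given rustc version still emits the narrower copy for `S0`.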