Generate better memcpy code for types with alignment padding #70779

Open
reinerp opened this issue Apr 4, 2020 · 0 comments
Labels:
A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
C-enhancement (Category: An issue proposing an enhancement or a PR with one.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)


reinerp commented Apr 4, 2020

Consider the type S0 below. It has 9 bytes of payload data (a u64 and a u8), but because u64 requires 8-byte alignment its size is rounded up to 16 bytes. It implements Copy, so both cloning and copying it can be done with a memcpy. In several cases rustc emits a 9-byte copy for it, even though any copy size from 9 to 16 bytes would be legal, since the trailing 7 bytes are only padding. On x86-64, the 9-byte copy does twice as many memory operations as a 16-byte one: it needs 2 loads and 2 stores (8 bytes plus 1 byte), whereas the 16-byte copy can use a single load and a single store. It would be nice for rustc to emit the wider, more efficient copy. Currently I'm working around this by manually padding my type up to 16 bytes of payload data, as in S1.
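The size arithmetic above can be checked directly. This is a minimal sketch, assuming a typical x86-64 target where u64 is 8-byte aligned: a struct's size is its payload rounded up to a multiple of its alignment, so S0's 9 bytes become 16, while S1's explicit `[u8; 7]` field fills that space with data. The helper `round_up` is hypothetical and only illustrates the rounding rule.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[derive(Clone, Copy)]
pub struct S0(u64, u8);

#[allow(dead_code)]
#[derive(Clone, Copy)]
pub struct S1(u64, u8, [u8; 7]);

/// Hypothetical helper: round `n` up to the next multiple of `align`
/// (`align` must be a power of two), mirroring how struct sizes are padded.
const fn round_up(n: usize, align: usize) -> usize {
    (n + align - 1) & !(align - 1)
}

fn main() {
    // S0 carries 9 bytes of actual data.
    let payload_s0 = size_of::<u64>() + size_of::<u8>();
    assert_eq!(payload_s0, 9);

    // Its alignment comes from the u64 field.
    assert_eq!(align_of::<S0>(), align_of::<u64>());

    // The size is the payload rounded up to the alignment; the rest is padding.
    assert_eq!(size_of::<S0>(), round_up(payload_s0, align_of::<S0>()));

    // S1 turns the padding into an explicit field, so every byte is payload.
    let payload_s1 = size_of::<u64>() + size_of::<u8>() + size_of::<[u8; 7]>();
    assert_eq!(size_of::<S1>(), payload_s1);

    println!("S0: {} bytes, S1: {} bytes", size_of::<S0>(), size_of::<S1>());
}
```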

Compare the generated assembly of copy_s0 and copy_s1:

playground::copy_s0:
	movq	(%rsi), %rax
	movb	8(%rsi), %cl
	movq	%rax, (%rdi)
	movb	%cl, 8(%rdi)
	retq

playground::copy_s1:
	movups	(%rsi), %xmm0
	movups	%xmm0, (%rdi)
	retq
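Since S0's padding bytes carry no data, copying all 16 bytes is a valid way to copy an S0. The sketch below is a possible manual workaround, not something rustc does today: it requests an untyped copy of the whole value with `std::ptr::copy_nonoverlapping`. The function name `copy_s0_whole` is hypothetical, and whether LLVM lowers it to the single 16-byte load/store pair seen in copy_s1 has not been verified here.

```rust
use std::ptr;

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct S0(u64, u8);

/// Hypothetical workaround: copy all `size_of::<S0>()` bytes (payload plus
/// padding) as one block instead of copying the two fields separately.
pub fn copy_s0_whole(dst: &mut S0, src: &S0) {
    // SAFETY: `src` and `dst` come from a shared and a mutable reference, so
    // both are valid and aligned, and they cannot overlap. The copy is
    // untyped, so copying the uninitialized padding bytes is allowed.
    unsafe { ptr::copy_nonoverlapping(src as *const S0, dst as *mut S0, 1) };
}

fn main() {
    let src = S0(0x0123_4567_89ab_cdef, 42);
    let mut dst = S0(0, 0);
    copy_s0_whole(&mut dst, &src);
    // Both fields arrive intact; the padding contents are irrelevant.
    assert_eq!(dst, src);
}
```

The plain `*dst = *src` stays the idiomatic form; the point of this issue is that the compiler could choose the wider copy on its own.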

Similar issues show up for clone_from_slice. The full set of examples, for both S0 and the manually padded S1:

#[derive(Clone, Copy)]
pub struct S0(u64, u8);

pub fn clone_s0(dst: &mut S0, src: &S0) {
    *dst = src.clone();
}

pub fn copy_s0(dst: &mut S0, src: &S0) {
    *dst = *src;
}

pub fn clone_s0_array(dst: &mut [S0; 8], src: &[S0; 8]) {
    *dst = src.clone();
}

pub fn copy_s0_array(dst: &mut [S0; 8], src: &[S0; 8]) {
    *dst = *src;
}

pub fn clone_s0_slice(dst: &mut [S0; 8], src: &[S0; 8]) {
    dst.clone_from_slice(src);
}

pub fn copy_s0_slice(dst: &mut [S0; 8], src: &[S0; 8]) {
    dst.copy_from_slice(src);
}

#[derive(Clone, Copy)]
pub struct S1(u64, u8, [u8; 7]);

pub fn clone_s1(dst: &mut S1, src: &S1) {
    *dst = src.clone();
}

pub fn copy_s1(dst: &mut S1, src: &S1) {
    *dst = *src;
}

pub fn clone_s1_array(dst: &mut [S1; 8], src: &[S1; 8]) {
    *dst = src.clone();
}

pub fn copy_s1_array(dst: &mut [S1; 8], src: &[S1; 8]) {
    *dst = *src;
}

pub fn clone_s1_slice(dst: &mut [S1; 8], src: &[S1; 8]) {
    dst.clone_from_slice(src);
}

pub fn copy_s1_slice(dst: &mut [S1; 8], src: &[S1; 8]) {
    dst.copy_from_slice(src);
}

(Playground)

jonas-schievink added the A-LLVM, C-enhancement, I-slow, and T-compiler labels on Apr 4, 2020.