Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memcpy generated for Vec::push(Default::default()) instead of initialization in-place #125632

Open
jhorstmann opened this issue May 27, 2024 · 2 comments
Labels
A-codegen Area: Code generation C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jhorstmann
Copy link
Contributor

I was investigating a performance issue in code that initializes some larger structs from a binary protocol. The code tries hard to avoid unnecessary moves, for example by first creating default values and then initializing them, instead of returning structs by value. After a slight change to one of the structs there was a significant performance drop, which boiled down to an unnecessary call to memcpy.

Optimization of this memcpy seems to depend on the size of the struct and also on the ratio of zero-initialized fields to other fields.

In the following example, std_push_default initializes a struct on the stack and then calls memcpy to append it to the vector, while inlined_push_default initializes the struct in-place. The implementation of the latter should match the internals of Vec::push.

#[derive(Default)]
pub struct Foo {
    a: [i64; 13],
    b: [i64; 17],
    c: Vec<i64>,
    d: Vec<i64>,
}

#[no_mangle]
pub fn std_push_default(v: &mut Vec<Foo>) {
    v.push(Default::default());
}

#[no_mangle]
pub fn inlined_push_default(v: &mut Vec<Foo>){
    v.reserve(1);
    unsafe {
        v.as_mut_ptr().add(v.len()).write(Default::default());
        v.set_len(v.len()+1);
    }
}

#[no_mangle]
pub fn size_of_foo() -> usize {
    std::mem::size_of::<Foo>()
}

Using rustc 1.78 in the compiler explorer. This might be a regression from 1.76, or the thresholds are slightly different.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 27, 2024
@jhorstmann
Copy link
Contributor Author

The difference seems to be not caused by inlining but by whether the default value is created before or after the capacity check and reserve_for_push.

This generates a memcpy:

    let value = Default::default();
    v.reserve(1);
    unsafe {
        v.as_mut_ptr().add(v.len()).write(value);
        v.set_len(v.len()+1);
    }

While this one does not:

    v.reserve(1);
    let value = Default::default();
    unsafe {
        v.as_mut_ptr().add(v.len()).write(value);
        v.set_len(v.len()+1);
    }

Unfortunately just adding a reserve does not help, this one still includes a memcpy (and checks the capacity twice):

    v.reserve(1);
    v.push(Default::default());

@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such A-codegen Area: Code generation and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 4, 2024
@jhorstmann
Copy link
Contributor Author

Another version that optimizes nicely, without any unsafe:

#[no_mangle]
pub fn extend_one_default(v: &mut Vec<Foo>){
    v.extend((0..1).map(|_| Foo::default()))
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

3 participants