Suboptimal code generation in 8byte copy loop #84813

@PSeitz

In this loop, the 8-byte copy is compiled into more complicated assembly than necessary, which is quite slow compared to the 16-byte version, whose assembly is much simpler.

https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=21b5eb51917947a25123dfafc9ed7959

pub fn wild_copy_from_src_8(mut source: *const u8, mut dst_ptr: *mut u8, num_items: usize) {
    // Note: if the compiler transforms this into a call to memcpy it'll hurt performance!
    unsafe {
        let dst_ptr_end = dst_ptr.add(num_items);
        while (dst_ptr as usize) < dst_ptr_end as usize {
            core::ptr::copy_nonoverlapping(source, dst_ptr, 8);
            source = source.add(8);
            dst_ptr = dst_ptr.add(8);
        }
    }
}
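
For comparison, a variant that writes each 8-byte step as an explicit unaligned `u64` load and store could be benchmarked against the loop above. This is only a sketch, not a confirmed fix: the name `wild_copy_from_src_8_u64` is hypothetical, and whether it yields simpler assembly depends on the compiler version. Like the original, it may write up to 7 bytes past `num_items`, so both buffers need that much slack.

```rust
/// Hypothetical variant of `wild_copy_from_src_8` (not from the original report):
/// each step is an explicit unaligned u64 load/store instead of
/// `copy_nonoverlapping(.., 8)`.
///
/// # Safety
/// `source` must be readable and `dst_ptr` writable for `num_items` rounded up
/// to the next multiple of 8 bytes, and the two regions must not overlap.
pub unsafe fn wild_copy_from_src_8_u64(
    mut source: *const u8,
    mut dst_ptr: *mut u8,
    num_items: usize,
) {
    unsafe {
        let dst_ptr_end = dst_ptr.add(num_items);
        while (dst_ptr as usize) < dst_ptr_end as usize {
            // Load 8 bytes without assuming alignment, then store them.
            let chunk = core::ptr::read_unaligned(source as *const u64);
            core::ptr::write_unaligned(dst_ptr as *mut u64, chunk);
            source = source.add(8);
            dst_ptr = dst_ptr.add(8);
        }
    }
}
```

Calling it with `num_items = 13` on 32-byte buffers copies the first 16 bytes (two full steps) and leaves the rest of the destination untouched.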

When the step size is 16 bytes, the generated assembly looks fine:

https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=9bb775fedfb94aec47252b943c28a6d6
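
The linked playground is not reproduced here. Assuming the 16-byte version differs from the loop above only in its step size, it would look roughly like the following sketch; the name `wild_copy_from_src_16` is an assumption, not taken from the gist.

```rust
/// Assumed shape of the 16-byte variant: identical to the 8-byte loop except
/// that each iteration copies and advances by 16 bytes. May write up to 15
/// bytes past `num_items`, so both buffers need that much slack.
pub fn wild_copy_from_src_16(mut source: *const u8, mut dst_ptr: *mut u8, num_items: usize) {
    unsafe {
        let dst_ptr_end = dst_ptr.add(num_items);
        while (dst_ptr as usize) < dst_ptr_end as usize {
            core::ptr::copy_nonoverlapping(source, dst_ptr, 16);
            source = source.add(16);
            dst_ptr = dst_ptr.add(16);
        }
    }
}
```

Placing the two functions side by side in a release build makes it straightforward to compare the assembly emitted for each step size.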

Labels

A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
A-mir-opt (Area: MIR optimizations)
C-optimization (Category: An issue highlighting optimization opportunities or PRs implementing such)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)
