A tuple of primitives function parameter is passed via the stack even though the tuple constituents could be passed as registers #64099

Open
Arnavion opened this issue Sep 2, 2019 · 1 comment
Labels: I-slow (problems and improvements with respect to performance of generated code); O-linux (operating system: Linux); O-x86_64 (target: x86-64 processors, like x86_64-*); T-compiler (relevant to the compiler team, which will review and decide on the PR/issue)

Arnavion commented Sep 2, 2019

On Rust 1.36.0 (x86_64-unknown-linux-gnu), the following code:


#[inline(never)]
pub fn foo((a, b, c, d): (usize, usize, usize, usize)) -> ((usize, usize), (usize, usize)) {
    if a == 0 {
        ((0, b), (0, d))
    }
    else {
        ((a, b), (c, d))
    }
}

pub fn bar() {
    foo((1, 2, 3, 4));
}

passes the tuple via the stack (bar writes it into its own frame and passes a pointer to it in rsi):

example::bar:
        sub     rsp, 72
        movaps  xmm0, xmmword ptr [rip + .LCPI1_0]
        movaps  xmmword ptr [rsp], xmm0
        movaps  xmm0, xmmword ptr [rip + .LCPI1_1]
        movaps  xmmword ptr [rsp + 16], xmm0
        lea     rdi, [rsp + 40]
        mov     rsi, rsp
        call    qword ptr [rip + example::foo@GOTPCREL]
        add     rsp, 72
        ret

However, if the function instead takes four separate usize parameters, they're passed in registers:

#[inline(never)]
pub fn foo(a: usize, b: usize, c: usize, d: usize) -> ((usize, usize), (usize, usize)) {
    if a == 0 {
        ((0, b), (0, d))
    }
    else {
        ((a, b), (c, d))
    }
}

example::bar:
        sub     rsp, 40
        lea     rdi, [rsp + 8]
        mov     esi, 1
        mov     edx, 2
        mov     ecx, 3
        mov     r8d, 4
        call    qword ptr [rip + example::foo@GOTPCREL]
        add     rsp, 40
        ret

It would be nice if the tuple case also used registers.
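
For the plain usize case, one possible workaround is a thin tuple-taking wrapper around the four-parameter form. This is only a sketch: the names `foo_regs` and `foo_tuple` are hypothetical, and it assumes the compiler honors `#[inline(always)]` so that the tuple is destructured in the caller and only the four scalars cross the call boundary. It does not change how the tuple itself is classified by the ABI.

```rust
// Hypothetical names: `foo_regs` and `foo_tuple` do not appear in the issue.

/// Four scalar parameters: on x86_64 Linux these fit in integer registers.
#[inline(never)]
pub fn foo_regs(a: usize, b: usize, c: usize, d: usize) -> ((usize, usize), (usize, usize)) {
    if a == 0 {
        ((0, b), (0, d))
    } else {
        ((a, b), (c, d))
    }
}

/// Tuple-taking front end. Marked `#[inline(always)]` in the hope that the
/// destructuring happens in the caller, so the out-of-line call is the
/// four-parameter one.
#[inline(always)]
pub fn foo_tuple((a, b, c, d): (usize, usize, usize, usize)) -> ((usize, usize), (usize, usize)) {
    foo_regs(a, b, c, d)
}
```

Calling `foo_tuple((1, 2, 3, 4))` returns the same value as `foo_regs(1, 2, 3, 4)`; whether the generated code actually avoids the stack copy would need to be checked in the assembly output.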


This is perhaps related to, or the same as, #63244. The tuple-taking foo has an aggregate parameter, so making multi-constituent tuples transparent (as suggested there) might help.

Tuple-taking foo:

define void @_ZN7example3foo17hffb69e1018702615E({ [0 x i64], { i64, i64 }, [0 x i64], { i64, i64 }, [0 x i64] }* noalias nocapture sret dereferenceable(32), { [0 x i64], i64, [0 x i64], i64, [0 x i64], i64, [0 x i64], i64, [0 x i64] }* noalias nocapture readonly dereferenceable(32) %arg0) unnamed_addr #0 !dbg !5 {

vs four-parameter foo:

define void @_ZN7example3foo17hb403de2e6269ef77E({ [0 x i64], { i64, i64 }, [0 x i64], { i64, i64 }, [0 x i64] }* noalias nocapture sret dereferenceable(32), i64 %a, i64 %b, i64 %c, i64 %d) unnamed_addr #0 !dbg !5 {

I discovered this with a function of type fn(Option<(NonNull<FatPointer>, NonNull<FatPointer>)>) -> (Option<NonNull<FatPointer>>, Option<NonNull<FatPointer>>), so I can't just rewrite it to take four separate parameters.
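
A minimal sketch of a function with that shape may make the constraint clearer. Everything here is an assumption for illustration: the issue does not show the function body or define `FatPointer`, so `[u8]` stands in as an unsized pointee (making each `NonNull` a pointer plus a length), and `split` is a hypothetical name. The `None` case has no pointers to pass at all, which is why four plain parameters are not a drop-in replacement.

```rust
use std::ptr::NonNull;

// Assumption: `[u8]` stands in for the issue's `FatPointer` pointee, so that
// each `NonNull` is a fat pointer (data pointer plus length).
type Fat = NonNull<[u8]>;

/// Hypothetical function with the signature shape described in the issue:
/// an optional pair of fat pointers in, a pair of optional fat pointers out.
#[inline(never)]
pub fn split(pair: Option<(Fat, Fat)>) -> (Option<Fat>, Option<Fat>) {
    match pair {
        Some((first, second)) => (Some(first), Some(second)),
        None => (None, None),
    }
}
```

A caller can build the arguments with `NonNull::from(&mut buf[..])` for some `u8` buffer `buf` and pass `Some((p, q))` or `None`.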

The two usize cases and the Option case are in this godbolt for reference.

@Centril added the I-slow, T-compiler, O-linux, and O-x86_64 labels on Sep 2, 2019.

mati865 commented Aug 17, 2021

Triage: still present in 2021, godbolt link
