Inefficient AArch64 frame generation with VLAs #167982

@smeenai

https://godbolt.org/z/9dj5eaxa9 shows a function with a mix of callee-saved registers, fixed-size stack objects, and variable-size stack objects, where Clang's code generation appears to be worse than GCC's:

  • Clang requires two more callee-saved registers.
  • x19 is presumably serving as the base pointer, per this frame-layout comment in LLVM's AArch64 frame lowering code:
    // |-----------------------------------| <- bp(not defined by ABI,
    // |.variable-sized.local.variables....| LLVM chooses X19)
    There are no overaligned local variables here though, so I'm not sure why the base pointer is needed.
  • x21 seemingly serves no purpose: the stack pointer is copied to it in line 11 and copied back to the stack pointer in line 22, but the restored value is overwritten two instructions later (line 24), so it's never used.

Is there potential to improve the frame setup here, or am I just missing something in what Clang is doing?
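
For readers who can't open the Godbolt link, here is a minimal sketch of the kind of function described above. It is not the source from the link: the names `consume` and `mixed_frame`, the buffer sizes, and the argument list are all hypothetical, and I haven't verified that it reproduces the same register usage. It only combines the three ingredients: values that stay live across calls (so they need callee-saved registers), a fixed-size stack object, and a variable-length array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. noinline (a GCC/Clang extension) keeps the calls
 * from being folded away, so the caller really has to preserve state
 * across them. */
__attribute__((noinline)) static size_t consume(const char *buf, size_t len) {
    size_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)buf[i];
    return sum;
}

/* Hypothetical function mixing the three kinds of frame contents.
 * n must be greater than zero, since a zero-length VLA is undefined. */
size_t mixed_frame(size_t n, size_t a, size_t b) {
    char fixed[64]; /* fixed-size stack object */
    char vla[n];    /* variable-size stack object */

    memset(fixed, 1, sizeof fixed);
    memset(vla, 2, n);

    /* n, a, and b are still needed after the first call, so the compiler
     * has to keep them somewhere that survives it. */
    size_t r = consume(fixed, sizeof fixed);
    r += consume(vla, n);
    return r + a + b;
}
```

Compiling something like this with `clang --target=aarch64-linux-gnu -O2 -S` and with an AArch64 GCC at the same optimization level should make it possible to compare the prologue and epilogue side by side.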
