This repository has been archived by the owner on Feb 29, 2024. It is now read-only.

reversed push/pop order breaks de facto ABI for -fno-omit-frame-pointer #194

Closed
sorear opened this issue Nov 3, 2022 · 4 comments

Comments

@sorear

sorear commented Nov 3, 2022

Background: gcc and llvm support a -fno-omit-frame-pointer compilation mode which causes fp to point to a linked list of stack frame structures. -4(fp) contains the return address associated with a frame and -8(fp) contains the previous fp value. (All examples in this document assume XLEN=32.) This was never quite ratified, but it is intentionally consistent between the toolchains (riscv-non-isa/riscv-elf-psabi-doc#18, see also the linked mailing list threads). It has downstream users in the sanitizer runtime, the Linux kernel space unwinder, and the Linux perf user space unwinder, and it is intentionally followed by userspace code (for example, libffi), likely among other users.
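To make the linked-list convention concrete, here is a minimal sketch of an unwinder that follows it. It runs against a simulated RV32 stack rather than real hardware; the memory model and all names (`stack_mem`, `push_frame`, `fp_unwind`, `demo_unwind`) and the addresses used are assumptions made for illustration, not code from any of the unwinders mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated RV32 stack covering [STACK_BASE, STACK_TOP); hypothetical values. */
#define STACK_BASE  0x1000u
#define STACK_WORDS 64u
#define STACK_TOP   (STACK_BASE + 4u * STACK_WORDS)

static uint32_t stack_mem[STACK_WORDS];

static int in_stack(uint32_t addr) {
    return addr >= STACK_BASE && addr < STACK_TOP && (addr & 3u) == 0;
}

static uint32_t load32(uint32_t addr) {
    assert(in_stack(addr));
    return stack_mem[(addr - STACK_BASE) / 4u];
}

static void store32(uint32_t addr, uint32_t value) {
    assert(in_stack(addr));
    stack_mem[(addr - STACK_BASE) / 4u] = value;
}

/* Emulate a conventional prologue: fp = sp on entry, ra at -4(fp),
 * caller's fp at -8(fp). Returns the new fp and lowers *sp. */
static uint32_t push_frame(uint32_t *sp, uint32_t ra, uint32_t caller_fp,
                           uint32_t frame_size) {
    uint32_t fp = *sp;
    store32(fp - 4u, ra);
    store32(fp - 8u, caller_fp);
    *sp = fp - frame_size;
    return fp;
}

/* Walk the fp chain, writing up to max_frames return addresses to ra_out.
 * Stops when fp leaves the stack or the chain stops moving toward
 * higher addresses (the stack grows down, so callers have a higher fp). */
size_t fp_unwind(uint32_t fp, uint32_t *ra_out, size_t max_frames) {
    size_t n = 0;
    while (n < max_frames && fp >= STACK_BASE + 8u && fp <= STACK_TOP &&
           (fp & 3u) == 0) {
        uint32_t ra = load32(fp - 4u);
        uint32_t prev_fp = load32(fp - 8u);
        ra_out[n++] = ra;
        if (prev_fp <= fp)
            break;
        fp = prev_fp;
    }
    return n;
}

/* Build three nested frames and unwind from the innermost one. */
size_t demo_unwind(uint32_t *ra_out) {
    uint32_t sp = STACK_TOP;
    uint32_t fp = 0; /* the outermost frame has no caller frame */
    fp = push_frame(&sp, 0x80000010u, fp, 16u);
    fp = push_frame(&sp, 0x80000124u, fp, 32u);
    fp = push_frame(&sp, 0x80000238u, fp, 16u);
    return fp_unwind(fp, ra_out, 3);
}
```

Calling `demo_unwind` with a three-element array yields the return addresses from innermost to outermost frame. The loop body is two loads, a store, and a comparison, in line with the small runtime cost claimed for this scheme.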

This adds three 16-bit instructions (without Zcmp) to each non-leaf function and requires fewer than a dozen instructions to unwind at runtime. If runtime stack traces are needed for a given application, no-omit-frame-pointer has the smallest text and rodata impact of any available option (at a small but nonzero cost in dynamic instructions and stack space compared to DWARF or SFrame). As such, I consider it in scope for code size reduction extensions.

If the push/pop instructions were modified to store s0 at the address immediately below ra (by reversing the entire stack image, or a simple swap), then Zcmp would reduce the marginal cost of no-omit-frame-pointer to 2 bytes per non-leaf function. However, with the current definition of the push/pop instructions, they cannot be used if no-omit-frame-pointer is in effect.

Unless I am misunderstanding the behavior of its frame lowering, the Zcmp LLVM port does not follow any usable ABI for no-omit-frame-pointer; the fp value for a given frame is at a variable offset from the saved ra and saved previous fp, making unwinding impossible without out-of-band information about the ISA and number of saved registers for each function.
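The variable-offset problem can be sketched numerically. The snippet below assumes that fp equals the value of sp on function entry and that cm.push places the highest-numbered saved register at the highest address, with ra at the lowest address of the save area (my reading of the push order this issue describes as reversed). The type and function names are hypothetical, and the sketch does not check which register lists are actually encodable.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets of the saved ra and saved s0 (the caller's fp), relative to fp. */
typedef struct {
    int32_t ra_off;
    int32_t s0_off;
} save_offsets;

/* De facto -fno-omit-frame-pointer layout on RV32: fixed offsets. */
save_offsets conventional_offsets(void) {
    return (save_offsets){ -4, -8 };
}

/* Assumed layout after cm.push {ra, s0-s(k-1)} on RV32, with fp taken to be
 * the entry sp: s(k-1) sits at -4, s0 at -4*k, and ra just below s0. */
save_offsets cm_push_offsets(unsigned k) {
    assert(k >= 1 && k <= 12);
    save_offsets o;
    o.s0_off = -4 * (int32_t)k;
    o.ra_off = -4 * (int32_t)(k + 1u);
    return o;
}
```

Under these assumptions, `cm_push_offsets(1)` gives ra at -8 and s0 at -4, while `cm_push_offsets(3)` gives ra at -16 and s0 at -12. An unwinder that only has fp cannot tell these cases apart without knowing k for each function.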

Paths forward:

  1. Do not change the ISA or ABI. Modify LLVM (and gcc if applicable) to suppress the use of push/pop instructions when no-omit-frame-pointer is in effect. Pro: No impact on ISA standardization or non-toolchain software. Con: Effective cost of no-omit-frame-pointer increases, since push/pop instructions cannot be used.

  2. Redefine push/pop instructions to store registers at addresses corresponding to the reverse order of the x-register numbers. Pro: the no-omit-frame-pointer ABI is preserved and efficiency is maximized for no-omit-frame-pointer at no cost to the default ABI. Con: requires changes to the ISA quite close to ratification, and makes behavior of future load/store multiple instructions slightly less intuitive.

  3. Define a new no-omit-frame-pointer ABI that is usable with the new Zcmp extension, and switch to the new ABI for both the base and Zcmp ISA. The fp register is defined to point to either (a) the start or (b) one-past-the-end of a stack frame record consisting of a saved return address first and a saved frame pointer second. Variant (a) saves 2 bytes in functions with a variable-sized stack frame (scalable vector local variables, VLAs, or alloca calls), since the variable-sized portion of the stack frame can be deallocated with c.mv sp, fp instead of addi sp, fp, -8. Variant (b), however, provides a degree of software compatibility: assuming the unwinder knows the bounds of the stack and that non-leaf functions are never located within the stack memory range (gcc nested function trampolines are leaves), it can examine both words at -4(fp) and -8(fp) and guess which word contains fp and which contains ra by means of address validity. This requires updates to all unwinders, but updated unwinders can be used with old/non-Zcmp compiled code, new compiled code, or any mixture thereof. Pro: no change whatsoever to the ISA or code generation in the default ABI; maximum efficiency with no-omit-frame-pointer. Con: breaking change to the no-omit-frame-pointer ABI.
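The address-validity heuristic from path 3 (the case where fp points one-past-the-end of the record) could look roughly like the sketch below. It again uses a simulated RV32 stack; the stack bounds, the `frame_record` type, and the function names are assumptions for illustration, and a real unwinder would obtain the stack bounds from the OS or runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated RV32 stack covering [STACK_BASE, STACK_TOP); hypothetical values. */
#define STACK_BASE  0x1000u
#define STACK_WORDS 64u
#define STACK_TOP   (STACK_BASE + 4u * STACK_WORDS)

static uint32_t stack_mem[STACK_WORDS];

uint32_t load32(uint32_t addr) {
    assert(addr >= STACK_BASE && addr < STACK_TOP && (addr & 3u) == 0);
    return stack_mem[(addr - STACK_BASE) / 4u];
}

void store32(uint32_t addr, uint32_t value) {
    assert(addr >= STACK_BASE && addr < STACK_TOP && (addr & 3u) == 0);
    stack_mem[(addr - STACK_BASE) / 4u] = value;
}

typedef struct {
    uint32_t ra;
    uint32_t prev_fp;
    int ok; /* 0 if the two words could not be told apart */
} frame_record;

/* A saved fp must be an aligned stack address above the current fp. */
static int looks_like_fp(uint32_t v, uint32_t cur_fp) {
    return v > cur_fp && v <= STACK_TOP && (v & 3u) == 0;
}

/* Read the words at -4(fp) and -8(fp) and decide which one is the saved fp.
 * Relies on code never living inside the stack address range, so a return
 * address can never look like a valid saved fp. */
frame_record classify_record(uint32_t fp) {
    frame_record r = { 0, 0, 0 };
    if (fp < STACK_BASE + 8u || fp > STACK_TOP || (fp & 3u) != 0)
        return r;
    uint32_t hi = load32(fp - 4u);
    uint32_t lo = load32(fp - 8u);
    int hi_is_fp = looks_like_fp(hi, fp);
    int lo_is_fp = looks_like_fp(lo, fp);
    if (lo_is_fp && !hi_is_fp) {        /* existing layout: ra above fp */
        r.ra = hi; r.prev_fp = lo; r.ok = 1;
    } else if (hi_is_fp && !lo_is_fp) { /* swapped layout: fp above ra */
        r.ra = lo; r.prev_fp = hi; r.ok = 1;
    }
    return r; /* neither or both look like fp: stop unwinding */
}
```

With this approach a single unwinder handles frames of either layout in the same call chain, at the cost of two extra range checks per frame. A frame where neither word looks like a stack address (for example, the outermost frame) simply terminates the walk.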

@tariqkurd-repo
Contributor

Interesting, thanks for pointing that out. From memory, the order of the stack frame was reversed to work better with future load/store multiple instructions. Is that your recollection, @aswaterman?

@aswaterman
Collaborator

aswaterman commented Nov 8, 2022

I don't have time to get into this in detail, but I will say it's unlikely we'll entertain changing the ISA definition on this basis.

I strongly favor Stefan's do-nothing option 1. -fno-omit-frame-pointer is bad for code size and performance, anyway; optimizing that case is almost like optimizing for unoptimized code. This is what DWARF is for.

@tariqkurd-repo
Contributor

agreed, let's do option 1.

@abukharmeh
Contributor

abukharmeh commented Nov 9, 2022

Just letting everyone know that option 1 was implemented in LLVM branch plctlab/llvm-project@bc13e10
