Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement leaf functions on POWER #12601

Merged
merged 2 commits into from
Sep 26, 2023
Merged

Conversation

xavierleroy
Copy link
Contributor

Previously, a stack frame was always allocated and the return address always saved in the stack frame. This is not necessary if the function is a leaf function (no calls to other functions) and has no stack-allocated variables. This PR should improve speed and reduce code size for these functions.

A bit of history. In OCaml 4, the POWER/PPC port implements leaf functions correctly in 32-bit mode. In 64-bit mode, a stack frame is always allocated (either wrongly or because it was required by the ELF64v1 big-endian ABI), but leaf functions do not save and restore the return address. When I adapted this port to OCaml 5, the systematic allocation of the stack frame remained, but became a systematic save/restore of the return address because of #12242 ...

I think the implementation in leaf functions in this PR is compatible with the ELF64v2 little-endian ABI that we currently use, but some OPAM-wide testing would reassure me.

@stedolan
Copy link
Contributor

I was surprised to see a change to Lprologue without a matching change to the epilogue code in Lreturn, but maybe it's fine and I just don't understand the invariants here. Is it always the case that f.fun_frame_required iff frame_size env > 0?

Previously, a stack frame was always allocated and the return address
always saved in the stack frame.  This is not necessary if the function
is a leaf function (no calls to other functions) and has no stack-allocated
variables.
@xavierleroy
Copy link
Contributor Author

Is it always the case that f.fun_frame_required iff frame_size env > 0?

Yes. An empty frame is not "required" :-) Symmetrically, f.fun_frame_required is true when either the return address needs to be saved on stack, or local variables were allocated to stack slots; in each case, the stack frame has nonzero size.

This said, the change to Lprologue was perhaps gratuitous, in that the original code happens to just work, and might be easier to evolve later. So, I reverted the Lprologue change. The diff is now very small indeed!

Copy link
Contributor

@stedolan stedolan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Do I understand correctly that the old code (with 6 extra instructions in prologue_size) was wrong, but not in a serious way since overestimating instruction size can only cause slightly poorer branch relaxation?

@xavierleroy
Copy link
Contributor Author

Do I understand correctly that the old code (with 6 extra instructions in prologue_size) was wrong, but not in a serious way since overestimating instruction size can only cause slightly poorer branch relaxation?

Right. I'm not even sure we ever jump with a conditional branch (not a tail call) to the first Lprologue instruction of a function, so it might have no influence whatsoever on branch relaxation. But it feels better to have the correct size for Lprologue.

@xavierleroy xavierleroy merged commit 5b8651b into ocaml:trunk Sep 26, 2023
3 of 6 checks passed
@xavierleroy xavierleroy deleted the powerpc-leaf branch September 26, 2023 13:57
sadiqj pushed a commit to sadiqj/ocaml that referenced this pull request Oct 8, 2023
Previously, a stack frame was always allocated and the return address
always saved in the stack frame.  This is not necessary if the function
is a leaf function (no calls to other functions) and has no stack-allocated
variables.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants