-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduced stack alignment for x86-64 #11239
Conversation
Before you ask: ARM64 also uses 16-aligned stack frames, but reducing the alignment to 8 would be dangerous: first, ocamlopt would no longer be able to use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the reasoning and I could check that the patch does what it says. Approved.
It occurs to me that there's a risk of confusing tools such as |
d1a26af
to
5f59077
Compare
This change makes sense to me, and removes a tricky constraint in the code generator. The call to |
16-alignment of the stack pointer is required for the C stack, but not for the stacks used to run OCaml code.
16-alignment of the stack pointer is required for the C stack, but not for the stacks used to run OCaml code.
We used to force the allocation of a stack frame whenever caml_ml_array_bound_error is called, so as to guarantee 16-alignment of the (joint OCaml / C) stack. Now that the (OCaml) stack doesn't require 16-alignment, this special case is no longer needed.
c814295
to
816706c
Compare
9f000cc
to
160663e
Compare
Why? It no longer checks stack alignment, but still checks that there remains enough space on the stack.
I removed all of them in amd64.S, as well as a dummy macro in s390x.S. I need to re-run tests on RISC-V. |
Apologies, it is as you say. Somehow I completely missed the changes to |
No problem, and thanks for the feedback. I ran more tests on RISC-V and observed no problems. One thing that gives me pause: it's not just ocamlopt-generated code that runs on the OCaml stack, but also stub code generated by the linkers (static or dynamic), e.g. the PLT business for x86, or the jump islands for RISC-V. We would be in trouble if these stubs accessed the stack with instructions requiring alignment > 8. This doesn't seem to be the case: on x86 and RISC-V stubs do not access the stack; on POWER, stubs do contain stack writes, but these are 8-byte writes and we keep the default 16-alignment anyway. So, I think we're safe. One thing that gives me hope is that the Linux kernel for x86 uses reduced stack alignment (from the default 16 alignment to 8), in order to save stack space... They had to work hard to convince GCC not to emit instructions that expect 16-alignment of the stack, but they managed. This makes me more confident that tools such as gdb and perf do not expect 16-alignment for x86 stacks. |
The x86-64 ABIs require that the C stack pointer is 16-aligned. In particular, this enables C compilers to emit SSE load and store instructions that require 16-alignment.
However, starting with OCaml 5, the x86-64 code generated by ocamlopt runs in its own stack, not on the C stack. Moreover, ocamlopt generates only 8-byte memory accesses for which 8-alignment is enough to get maximal performance.
So, there is no longer a need to align every ocamlopt-generated stack frame to a multiple of 16 bytes. This PR just removes this alignment.
The net result is a decrease in stack usage. For a silly example,
now uses 8 bytes of stack per call instead of 16, hence can run TWICE AS LONG before overflowing the stack :-)
For a less silly example,
List.mapi
uses 20% less stack space, so it can process a list of length 217000 before overflowing the default 1 Miword stack, instead of 175000 before.Smaller stack frames also mean even more locality in stack accesses and even better utilization of the caches.