Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduced stack alignment for x86-64 #11239

Merged
merged 7 commits into from
Jun 23, 2023

Conversation

xavierleroy
Copy link
Contributor

The x86-64 ABIs require that the C stack pointer is 16-aligned. In particular, this enables C compilers to emit SSE load and store instructions that require 16-alignment.

However, starting with OCaml 5, the x86-64 code generated by ocamlopt runs in its own stack, not on the C stack. Moreover, ocamlopt generates only 8-byte memory accesses for which 8-alignment is enough to get maximal performance.

So, there is no longer a need to align every ocamlopt-generated stack frame to a multiple of 16 bytes. This PR just removes this alignment.

The net result is a decrease in stack usage. For a silly example,

let rec f () = incr depth; 1 + f ()

now uses 8 bytes of stack per call instead of 16, hence can run TWICE AS LONG before overflowing the stack :-)

For a less silly example, List.mapi uses 20% less stack space, so it can process a list of length 217000 before overflowing the default 1 Miword stack, instead of 175000 before.

Smaller stack frames also mean even more locality in stack accesses and even better utilization of the caches.

@xavierleroy
Copy link
Contributor Author

Before you ask: ARM64 also uses 16-aligned stack frames, but reducing the alignment to 8 would be dangerous: first, ocamlopt would no longer be able to use stp and ldp double-word stores and loads; second, there's a bit in ARMv8 processors that causes them to monitor SP alignment at SP-relative memory accesses and trap if SP is not 16-aligned.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the reasoning and I could check that the patch does what it says. Approved.

@xavierleroy
Copy link
Contributor Author

It occurs to me that there's a risk of confusing tools such as perf or debuggers that could expect the stack to be 16-aligned. I need to check for this.

@stedolan
Copy link
Contributor

stedolan commented Jun 8, 2023

This change makes sense to me, and removes a tricky constraint in the code generator.

The call to caml_assert_stack_invariants in emit.mlp should also be removed, and possibly there are some now-redundant alignment checks in the runtime/$ARCH.S files also.

16-alignment of the stack pointer is required for the C stack,
but not for the stacks used to run OCaml code.
16-alignment of the stack pointer is required for the C stack,
but not for the stacks used to run OCaml code.
We used to force the allocation of a stack frame whenever
caml_ml_array_bound_error is called, so as to guarantee 16-alignment
of the (joint OCaml / C) stack.  Now that the (OCaml) stack doesn't
require 16-alignment, this special case is no longer needed.
@xavierleroy
Copy link
Contributor Author

The call to caml_assert_stack_invariants in emit.mlp should also be removed,

Why? It no longer checks stack alignment, but still checks that there remains enough space on the stack.

and possibly there are some now-redundant alignment checks in the runtime/$ARCH.S files also.

I removed all of them in amd64.S, as well as a dummy macro in s390x.S. I need to re-run tests on RISC-V.

@stedolan
Copy link
Contributor

The call to caml_assert_stack_invariants in emit.mlp should also be removed,

Why? It no longer checks stack alignment, but still checks that there remains enough space on the stack.

and possibly there are some now-redundant alignment checks in the runtime/$ARCH.S files also.

I removed all of them in amd64.S, as well as a dummy macro in s390x.S. I need to re-run tests on RISC-V.

Apologies, it is as you say. Somehow I completely missed the changes to amd64.S when reading this originally.

@xavierleroy
Copy link
Contributor Author

No problem, and thanks for the feedback.

I ran more tests on RISC-V and observed no problems.

One thing that gives me pause: it's not just ocamlopt-generated code that runs on the OCaml stack, but also stub code generated by the linkers (static or dynamic), e.g. the PLT business for x86, or the jump islands for RISC-V. We would be in trouble if these stubs accessed the stack with instructions requiring alignment > 8. This doesn't seem to be the case: on x86 and RISC-V stubs do not access the stack; on POWER, stubs do contain stack writes, but these are 8-byte writes and we keep the default 16-alignment anyway. So, I think we're safe.

One thing that gives me hope is that the Linux kernel for x86 uses reduced stack alignment (from the default 16 alignment to 8), in order to save stack space... They had to work hard to convince GCC not to emit instructions that expect 16-alignment of the stack, but they managed. This makes me more confident that tools such as gdb and perf do not expect 16-alignment for x86 stacks.

@xavierleroy xavierleroy merged commit 717d9ba into ocaml:trunk Jun 23, 2023
9 of 10 checks passed
@lthls lthls mentioned this pull request Aug 3, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants