Reduced stack alignment for x86-64 #11239

xavierleroy · 2022-05-04T15:25:11Z

The x86-64 ABIs require that the C stack pointer is 16-aligned. In particular, this enables C compilers to emit SSE load and store instructions that require 16-alignment.

However, starting with OCaml 5, the x86-64 code generated by ocamlopt runs on its own stack, not on the C stack. Moreover, ocamlopt generates only 8-byte memory accesses, for which 8-alignment is enough to get maximal performance.

So, there is no longer a need to align every ocamlopt-generated stack frame to a multiple of 16 bytes. This PR just removes this alignment.

The net result is a decrease in stack usage. For a silly example,

let depth = ref 0
let rec f () = incr depth; 1 + f ()

now uses 8 bytes of stack per call instead of 16, hence can run TWICE AS LONG before overflowing the stack :-)
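The 8-versus-16 figure can be reproduced with a small rounding model. This is only a sketch: `round_up` and `frame_bytes` are hypothetical helpers, not functions from the compiler, and the model assumes a frame is just its spilled locals plus the 8-byte return address pushed by `call`.

```ocaml
(* Round [n] up to the next multiple of [align], which must be a power of two. *)
let round_up n ~align = (n + align - 1) land (lnot (align - 1))

(* Hypothetical model of a frame: [locals] bytes of spilled data plus the
   8-byte return address, padded to the required alignment. *)
let frame_bytes ~locals ~align = round_up (locals + 8) ~align

let () =
  (* [f] keeps nothing live across its recursive call, so locals = 0. *)
  Printf.printf "16-aligned frames: %d bytes per call\n"
    (frame_bytes ~locals:0 ~align:16);
  Printf.printf " 8-aligned frames: %d bytes per call\n"
    (frame_bytes ~locals:0 ~align:8)
```

Under this model the padding is the whole difference: the return address alone fills an 8-aligned frame, while a 16-aligned frame needs 8 more bytes of filler.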

For a less silly example, List.mapi uses 20% less stack space, so it can process a list of length 217000 before overflowing the default 1 Miword stack, up from 175000 previously.
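As a rough cross-check of the List.mapi figures, the sketch below converts the two list lengths into an average stack cost per element. It assumes 8-byte words and that the whole stack is consumed by the recursion, so the per-element numbers are approximations; all names are made up for illustration.

```ocaml
(* Figures quoted above; the 8-byte word size is an assumption for x86-64. *)
let stack_bytes = 1024 * 1024 * 8   (* default 1 Miword stack *)
let len_before = 175_000            (* longest list with 16-aligned frames *)
let len_after = 217_000             (* longest list with 8-aligned frames *)

(* Approximate average number of stack bytes consumed per list element. *)
let bytes_per_elt len = float_of_int stack_bytes /. float_of_int len

(* Relative reduction in stack used per element. *)
let saving = 1.0 -. bytes_per_elt len_after /. bytes_per_elt len_before

let () =
  Printf.printf "before: %.1f bytes/element\n" (bytes_per_elt len_before);
  Printf.printf "after:  %.1f bytes/element\n" (bytes_per_elt len_after);
  Printf.printf "saving: %.1f%%\n" (100.0 *. saving)
```

This gives roughly 48 versus 39 bytes per element, a saving of about 19%, which is consistent with the quoted 20%.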

Smaller stack frames also mean even more locality in stack accesses and even better utilization of the caches.

xavierleroy · 2022-05-04T15:25:18Z

Before you ask: ARM64 also uses 16-aligned stack frames, but reducing the alignment to 8 would be dangerous: first, ocamlopt would no longer be able to use stp and ldp double-word stores and loads; second, there's a bit in ARMv8 processors that causes them to monitor SP alignment at SP-relative memory accesses and trap if SP is not 16-aligned.
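To make the ARM64 concern concrete, here is a minimal sketch of the alignment test involved. The `is_aligned` helper and the stack-pointer value are hypothetical; the point is only that an 8-byte frame pushed from a 16-aligned SP leaves SP 8-aligned but not 16-aligned, which is the state that SP alignment checking would trap on.

```ocaml
(* [is_aligned addr ~align] holds when [addr] is a multiple of [align],
   where [align] is a power of two. *)
let is_aligned addr ~align = addr land (align - 1) = 0

let () =
  let sp = 0x7fff_0000 in   (* hypothetical 16-aligned stack pointer *)
  let sp' = sp - 8 in       (* after allocating one 8-byte frame *)
  assert (is_aligned sp ~align:16);
  assert (is_aligned sp' ~align:8);
  (* This is the situation that is harmless on x86-64 but unsafe on ARM64. *)
  assert (not (is_aligned sp' ~align:16))
```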

gasche

I understand the reasoning and I could check that the patch does what it says. Approved.

xavierleroy · 2022-05-13T07:32:15Z

It occurs to me that there's a risk of confusing tools such as perf or debuggers that could expect the stack to be 16-aligned. I need to check for this.

stedolan · 2023-06-08T15:49:35Z

This change makes sense to me, and removes a tricky constraint in the code generator.

The call to caml_assert_stack_invariants in emit.mlp should also be removed, and possibly there are some now-redundant alignment checks in the runtime/$ARCH.S files also.

16-alignment of the stack pointer is required for the C stack, but not for the stacks used to run OCaml code.

We used to force the allocation of a stack frame whenever caml_ml_array_bound_error is called, so as to guarantee 16-alignment of the (joint OCaml / C) stack. Now that the (OCaml) stack doesn't require 16-alignment, this special case is no longer needed.

xavierleroy · 2023-06-09T12:20:32Z

> The call to caml_assert_stack_invariants in emit.mlp should also be removed,

Why? It no longer checks stack alignment, but still checks that there remains enough space on the stack.

> and possibly there are some now-redundant alignment checks in the runtime/$ARCH.S files also.

I removed all of them in amd64.S, as well as a dummy macro in s390x.S. I need to re-run tests on RISC-V.

stedolan · 2023-06-19T16:24:07Z

> > The call to caml_assert_stack_invariants in emit.mlp should also be removed,
>
> Why? It no longer checks stack alignment, but still checks that there remains enough space on the stack.
>
> > and possibly there are some now-redundant alignment checks in the runtime/$ARCH.S files also.
>
> I removed all of them in amd64.S, as well as a dummy macro in s390x.S. I need to re-run tests on RISC-V.

Apologies, it is as you say. Somehow I completely missed the changes to amd64.S when reading this originally.

xavierleroy · 2023-06-23T13:46:25Z

No problem, and thanks for the feedback.

I ran more tests on RISC-V and observed no problems.

One thing that gives me pause: it's not just ocamlopt-generated code that runs on the OCaml stack, but also stub code generated by the linkers (static or dynamic), e.g. the PLT business for x86, or the jump islands for RISC-V. We would be in trouble if these stubs accessed the stack with instructions requiring alignment > 8. This doesn't seem to be the case: on x86 and RISC-V stubs do not access the stack; on POWER, stubs do contain stack writes, but these are 8-byte writes and we keep the default 16-alignment anyway. So, I think we're safe.

One thing that gives me hope is that the Linux kernel for x86 uses reduced stack alignment (8 instead of the default 16), in order to save stack space. They had to work hard to convince GCC not to emit instructions that expect 16-alignment of the stack, but they managed. This makes me more confident that tools such as gdb and perf do not expect 16-alignment for x86 stacks.

gasche approved these changes May 4, 2022

xavierleroy force-pushed the reduced-stack-alignment branch from d1a26af to 5f59077 Compare May 10, 2023 17:00

xavierleroy added 4 commits June 9, 2023 14:04

ocamlopt x86-64: don't align stack frames to 16 bytes

52843f6

16-alignment of the stack pointer is required for the C stack, but not for the stacks used to run OCaml code.

ocamlopt RISC-V: don't align stack frames to 16 bytes

e0b29f8

16-alignment of the stack pointer is required for the C stack, but not for the stacks used to run OCaml code.

runtime/amd64.S: remove all stack alignment checks

816706c

xavierleroy force-pushed the reduced-stack-alignment branch from c814295 to 816706c Compare June 9, 2023 12:09

xavierleroy added 3 commits June 9, 2023 14:18

runtime/s390x.S: remove the dummy CHECK_STACK_ALIGNMENT macro

ecbb0b7

runtime/riscv.S: update a comment

462b122

Changes entry for 11239

160663e

xavierleroy force-pushed the reduced-stack-alignment branch from 9f000cc to 160663e Compare June 9, 2023 12:19

xavierleroy merged commit 717d9ba into ocaml:trunk Jun 23, 2023
9 of 10 checks passed

xavierleroy mentioned this pull request Jun 28, 2023

Dynamic linking of a shared library can cause segfaults #12328

Open

lthls mentioned this pull request Aug 3, 2023

Dynarrays, boxed #11882

Merged

5 tasks
