Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use mmap(MAP_STACK|...) for fibre stacks on OpenBSD #10875

Merged
merged 2 commits into from
Feb 4, 2022

Conversation

avsm
Copy link
Member

@avsm avsm commented Jan 11, 2022

OpenBSD 6.4 and higher (circa 2018ish) has a feature known as stack-register checking that requires that all memory used as a stack is allocated using mmap(MAP_STACK). The kernel checks upon syscall entry if the stack pointer is aimed at such memory, and kills the process otherwise. There is no mprotect or other post-allocation way to mark a piece of memory as intended for the stack, to avoid ROP widgets.

For OCaml, this means that we can't use the caml_stat_alloc allocator for fibre stacks, as that uses malloc behind the scenes and causes the native code compiler to be killed almost immediately when executed on OpenBSD 6.4+. This draft PR hides the OpenBSD-specific changes behind a #define to illustrate the changes required to pass the full OCaml test suite (apart from the domain parallel burn test, which seems to take hours and is killed with a timeout).

  • by switching to mmap, we need to track the size of the stack in order to free it. Previously this was taken care of by the malloc implementation. It's a little ugly to have to track the size in the stack_info struct though -- is there a better way to track it?
  • it appears that mmap(MAP_STACK) is also available on Linux, but not used for anything as far as I can see. It also exists on FreeBSD, where it does some automatic resizing of a guard page which we don't need in our OCaml usage. I could turn the __OpenBSD__ checks into a USE_MMAP_MAP_STACK config.h option, and turn that on automatically if OpenBSD is detected.
  • there are some hardcoded alignments to 64-bit which aren't a problem now, but will be if we resurrect 32-bit.

With this, the test suite and the fix from #10872, OpenBSD passes the test suite with the exception of domain-parallel-burn which looks like a fairness issue in the pthread implementation in OpenBSD itself.

@avsm
Copy link
Member Author

avsm commented Jan 12, 2022

Personal communication with @ctk21 indicates that we can calculate the stack length at free point by the Stack_end value -- I'll simplify this patch later on

@ctk21
Copy link
Contributor

ctk21 commented Jan 12, 2022

My thinking for the size, is that you could back it out from:

/* Stack layout for native code. Stack grows downwards.
*
* +------------------------+
* | struct stack_handler |
* +------------------------+ <--- Stack_high
* | caml_runstack |
* +------------------------+
* | |
* . OCaml frames . <--- sp
* | |
* +------------------------+ <--- Stack_threshold
* | |
* . Red Zone .
* | |
* +------------------------+ <--- Stack_base
* | struct stack_info |
* +------------------------+ <--- Caml_state->current_stack
* | HEADER WORD |
* +------------------------+
*/

using Stack_high and Stack_base.

However there is an alignment adjustment to ensure the stack is 16-byte aligned:

ocaml/runtime/fiber.c

Lines 124 to 128 in 1546e1f

/* Ensure 16-byte alignment because some architectures require it */
hand = (struct stack_handler*)
(((uintnat)stack + sizeof(struct stack_info) + sizeof(value) * wosize + 8)
& ((uintnat)-1 << 4));
stack->handler = hand;

and unfortunately it doesn't look simple to know the exact boundary of the area from Stack_high as things stand.

@mndrix
Copy link
Contributor

mndrix commented Jan 12, 2022

  • It's a little ugly to have to track the size in the stack_info struct though -- is there a better way to track it?

I like the clarity of tracking size explicitly. It's easy to understand and to convince myself that it's doing the right thing.

  • I could turn the __OpenBSD__ checks into a USE_MMAP_MAP_STACK config.h option, and turn that on automatically if OpenBSD is detected.

After re-reading the relevant man pages, I agree that MAP_STACK doesn't buy much outside of OpenBSD. So I don't think extra configuration complexity is worthwhile.

I'll report back if I encounter any problems while running on this PR.

@mndrix
Copy link
Contributor

mndrix commented Jan 21, 2022

I've been running this PR with some local (admittedly light) workloads. No problems so far beyond the domain-parallel-burn test. I manually merge this branch each morning when I rebuild trunk on my laptop, so I'd personally enjoy seeing it merged.

Is it possible to disable the domain-parallel-burn test on OpenBSD? I don't know enough about OpenBSD's scheduler to send a patch upstream.

@gasche
Copy link
Member

gasche commented Jan 21, 2022

This would need a review before we can consider merging. I just merged #10872, on which the present PR is based (it may be possible to rebase now?) thanks to @kayceesrk's review there.

runtime/fiber.c Outdated Show resolved Hide resolved
runtime/caml/fiber.h Outdated Show resolved Hide resolved
runtime/fiber.c Outdated Show resolved Hide resolved
@ctk21
Copy link
Contributor

ctk21 commented Jan 24, 2022

Is it possible to disable the domain-parallel-burn test on OpenBSD? I don't know enough about OpenBSD's scheduler to send a patch upstream.

Is it possible first to try rebasing the PR to include the performance fix in #10880?

I'm wanting to flush out if the domain-parallel-burn test is just slow or if there is a livelock/deadlock bug in the runtime that the test on OpenBSD is picking up.

@mndrix
Copy link
Contributor

mndrix commented Jan 24, 2022

Is it possible first to try rebasing the PR to include the performance fix in #10880?

I merged this PR into trunk (as of commit 488a28a, which includes the fix from #10880) then ran the domain parallel spawn burn test on OpenBSD. It passes for bytecode but ends with a timeout for native. Here's
the output, in case that helps.

@avsm avsm force-pushed the openbsd-mmap-stack branch 3 times, most recently from a038c3d to 90e4885 Compare January 25, 2022 18:08
@avsm
Copy link
Member Author

avsm commented Jan 26, 2022

I pushed a --enable-mmap-map-stack option to toggle this behaviour, and addressed all the other review comments from @gasche @kayceesrk @ctk21. The option forces mmaped stacks on OpenBSD, and lets it be used on Linux/FreeBSD and any other system where MAP_STACK is available.

Oddly enough, in a rare "OpenBSD feature speeds up Linux" situation, the use of mmaped stacks on Linux seems to be faster than the existing malloc strategy. A very rough benchmark was running Linux 'make tests' a few times with the option on/off and without this patch:

enabled mmap MAP_STACK:
real    10m55.258s
user    13m31.781s
sys     1m26.354s

real    10m31.754s
user    12m1.970s
sys     1m21.525s

disabled mmap:
real    11m51.425s
user    15m42.359s
sys     1m35.848s

real    11m55.043s
user    15m47.818s
sys     1m35.336s

trunk:
real    11m21.694s
user    14m38.814s
sys     1m30.819s

real    12m5.827s
user    16m32.831s
sys     1m39.710s

I think we need to run this on Sandmark to see if it really is faster or not without the variance.

Another thing to track down is that FreeBSD crashes with mmap(MAP_STACK). This is due to its very different semantics, where it allocates a guard page to automatically grow the stack as it heads downward in memory. I suspect that on FreeBSD we'll need to have an extra page as the guard page never disappears. However, MAP_STACK just isn't that much use on FreeBSD either since we do our own manual management.

@gasche
Copy link
Member

gasche commented Jan 26, 2022

Note: the non-mmap path uses caml_stat_alloc which keeps track of malloced block in a pool to properly free them when the runtime is shutdown. Currently the mmap path offers no such tracking, so it could leak memory if the runtime is stopped without calling caml_free_stack on some stacks. (This sounds plausible if it shuts down with some continuations stored in the heap but not active.)

I think that this not an issue for this PR that implements OpenBSD support ("eventually leaking some memory in corner cases" is way better than "being killed instantly"), but we think a bit harder about it before using the feature much beyond that. (I think that having a configure option is a good idea.)

I guess that a "proper" way to handle this would be to add a different kind of blocks to our stat_alloc pool for mmapped memory, to call munmap on cleanup. This could come in a separate PR in some hypothetical future.

Copy link
Member

@gasche gasche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks very reasonable to me, and I understand that it has been tested on OpenBSD.

(You still need a Changes entry and also check-typo complains)

./configure.ac:1928.81: [long-line] line is over 80 columns

@avsm
Copy link
Member Author

avsm commented Jan 26, 2022

I agree @gasche -- I looked into adding the tracking support into the caml_stat functions, but wanted to figure out this FreeBSD situation first. @shubhamkumar13 is going to run some Sandmark tests this week with this PR to get some more systematic results, and then I'll tidy up the Changes and check-typo and mark the PR as ready.

avsm added a commit to avsm/ocaml that referenced this pull request Feb 3, 2022
@avsm avsm marked this pull request as ready for review February 3, 2022 20:42
@avsm
Copy link
Member Author

avsm commented Feb 4, 2022

No appreciable difference vs trunk or with mmap on/off on Linux, so this looks safe to merge. Thanks to @shubhamkumar13 for the Sandmark run.

image

Changes Outdated Show resolved Hide resolved
This exposes a --enable-mmap-map-stack configure option that
switch stack allocation from malloc to use mmap(MAP_STACK).
This is always enabled on OpenBSD, and optionally available
when explicitly specified on at least Linux. FreeBSD's mmap
with MAP_STACK has different semantics and is not supported.
@kayceesrk kayceesrk merged commit 8fac59a into ocaml:trunk Feb 4, 2022
@kayceesrk
Copy link
Contributor

Thanks for all the reviews and the benchmarking efforts.

@avsm avsm deleted the openbsd-mmap-stack branch February 9, 2022 16:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants