-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: Variadic function cast causing segfaults on Apple ARM64 #2708
Fix: Variadic function cast causing segfaults on Apple ARM64 #2708
Conversation
2a88d90
to
43fbd08
Compare
Fix for Segfaults on Apple ARM64. Apple silicon does not support re-casting functions with fixed params as variadic, and instead requires the the function pointer to match the signature of the definition. If they do not match arguments will not be correctly marshaled, causing the function implementation to look for parameters in the wrong places. To enable Apple ARM64 support, we must explicitly cast to the correct function signature instead of using varargs `...`. Co-Authored-By: mjc <mjc@kernel.org>
43fbd08
to
af8a2f5
Compare
bif = (Eterm (*)(Process*, ...)) *pc++; | ||
t = (*bif)(build_proc, esp-1); | ||
bif = (BIF_RETTYPE (*)(BIF_ALIST)) *pc++; | ||
t = (*bif)(build_proc, esp-1, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I've got everything updated correctly. The one thing I'm not entirely sure of is if this should be copied into bif_args
, if this is an optimization for this specific case, or something else? I'm new to c
; so, if I'm understanding this correctly, esp-1
works here without copying into bif_args
because esp-1
and bif_args[0]
essentially point to the same value and both are pointers so semantically it's all good? Then by not copying into bif_args[0]
we save an allocation/step?
And, please feel free to tell me to just google more 😄 !
Thank you both for your time reviewing this, and please let me know if there's anything else I can change, improve, or have missed 😅 here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The copying of the arguments to bif_args
when there are 2 or 3 arguments is only necessary because the arguments would be in the wrong order if we would pass esp-2
or esp-3
, respectively. When there is a single argument, there is no need to copy it because it can't be in the wrong order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @bjorng
Many thanks for working on this. I have added this pull request to our daily builds. If everything goes well, we will merge it within a few days. |
Thanks for your pull request. |
Overview
Howdy!
This is a fix for segfaults happening on Apple ARM64. The segfaults are caused by differences in how Apple ARM64 treats variadic functions. x86 will only use the stack when necessary, after filling the available registers; while, 🍎 ARM64 will always places variadic arguments onto the stack regardless of many registers are available. The segfault manifests as what look like errors in bif functions (e.g.
is_integer_1
) and do not leave an erl crash dump (only available logs were from the system's crash report).This was originally developed against to
master
(otp-24) but has been back-ported here tomaint
. The updated build configuration provided by #2700 is required so it's currently included in this branch.I'm running the tests now and will get back with the results when they finish (in the morning).
FWIW, so far, Cowboy's test suite[1] and
cowlib
passed all tests without error on master (otp-24
). Using this branch (based offmaint
) with Elixir'smaster-otp-23
(from asdf I'll get the elixirsha
later) Phoenix's test suite passed all tests without error.Issue
The problem itself (seems) to boil down to variadic casts happening when calling bifs in
erts/emulator/beam/erl_db_util.c - db_prog_match
:bif
points to functions fromerts/emulator/beam/erl_bif_op.c
which have a fixed number of arguments e.g.Casting the fixed arguments to be variadic then causes the registers to be incorrectly marshaled and the VM segfaults.
The Fix
All of the functions in
erts/emulator/beam/erl_bif_op.c
are defined using a set of macros defined inerts/emulator/beam/bif.h
with the relevant parts being:Using these macros (on 🍎 ARM64), we were then able to declare function pointers in
db_prog_match
matching the expected function signature:and then call it:
Closing Remarks
I know, after battling these segfaults for a week. this is a very hot path of code execution , and I'll gladly admit I don't have the expertise required right now to know if there is a better approach for this fix or if this poses its own set of problems. This has been way out of my comfort zone (but damn it felt good! I'll be back for more); so, I'd like to emphasize I'm very receptive to feedback. The change in the PR as of opening it, is the simplest and most obvious (?) solution I could muster... and so that's exactly what's here.
I have a gigantic amount of symbolicated crash logs, I'd normally attach to help others without the platform see the issues, but... I'm not sure how much I can and can't share since I'm bound to the 🍎 developer kit NDA. I'm going to ask in their forums later if I can share some of the crash logs here or via email (if Apple needs that) for posterity.
Finally I'm happy, and waiting 😄 , to rebase after #2700 can be backported to
maint
as that's a dependency for this patch to work... Thank you for time![1] I'm not really sure how cowboy's test suite ran as it looks (based on the output) like it needs
go
and I definitely do not have a working instance ofgo
on the machine.Co-Authored-By: mjc mjc@kernel.org