New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HiPE code alloc fixes #584
Conversation
The HIPE_ALLOC_CODE macro in the HiPE runtime was introduced ages ago to allow x86 and amd64 to switch from erts_alloc() to an mmap() implementation with proper flag setting. Nowadays the macro is identical on all platforms, and serves no purpose. Delete the macro, move the hipe_alloc_code() prototype to hipe_arch.h, and simplify hipe_bifs_enter_code_2().
The hipe_bifs:make_native_stub/2 and hipe_bifs:get_emu_address/1 BIFs were originally used by hipe_unified_loader.erl, but the code been obsolete and disabled for ages. Remove the BIFs and all references to them. In hipe_unified_loader.erl, remove the no-op emu_make_stubs/1 function.
HiPE allocates stubs for BEAM functions called from native code. Error handling for that allocation is currently non-existent: if it fails the VM will crash with an abort() or a SEGV from a NULL pointer dereference. This updates generic code in hipe_bif0.c to check for and handle NULL returns from the platform-specific hipe_make_native_stub(). hipe_get_na_nofail() and hipe_get_na_nofail_locked() are renamed to remove the inaccurate _nofail part of their names. Obsolete #if 0 code in hipe_make_stub() is removed.
Fix platform-specific code to return NULL rather than crashing when code allocation fails.
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human |
I think 21cbea9 (handle stub allocation failures) does more harm than good. Commit ddae791 seems "more ok" as memory allocation for the native code is some of the first thing we do and the code loading will fail without leaving any inconsistencies. |
I realize that badarg is sub-optimal for allocation failures. Would the "handle stub allocation failures" part be more palatable if it threw system_limit instead? Doing so would require more changes to have the lower-level code return a proper error cookie (I don't think I want to abuse errno). Or do you have a different response to a code allocation failure in mind? |
The most common action at out-of-memory is VM "crash", either by abort (which may create core dump) or an erl_crash.dump. The topic "how to handle out-of-memory" has been discussed a number of times and the conclusion is always that it's not much point in trying to "handle it", but instead try to prevent it by some sort of overload protection as early as possible at the edges of the system. If the reason for the memory exhaustion is some sort of resource leakage, then it's probably better with a node reboot than having Erlang processes doing cyclic restarts. Code loading done as part of an upgrade could maybe be an exception to the above IF the failed upgrade can be totally reverted and the system keeps on running with the old software. But that would be a much larger work including transactional load/upgrade of entire applications My suggestion: Keep the old behavior with VM crash or call erl_exit like erts_alloc ends up doing. |
A controlled termination via erl_exit() is perfectly fine, and much better than crashing hard with an abort or a SEGV. Do you want the revision in a new pull req, or should I update the current branch and and just add a comment here when the revision has been done? |
Keeping the same pull request is preferred I think. Not sure how you do it though. When thinking of it, erl_exit would be the best action even when alloc_code() fails. I see no reason for that native code allocation to behave different than all other allocations done on behalf of code loading. |
Instead of signalling an Erlang-level exception on code or stub allocation failure, just terminate the VM with erl_exit(). This reverts most of 21cbea9 except for its dead code removal.
I've updated the branch to call erl_exit() when code or stub allocation fails, instead of signalling an Erlang-level exception, as per your suggestions. The previous renamings etc in hipe_bif0.c are reverted. |
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human |
One last nitpicking request. Could you include the requested number of bytes in the error message to erl_exit. It's nice to easy distinguish between a real out-of-memory and some bug causing an insane amount of memory being requested. |
Will do. That may require some rewrites because the exact amount to allocate depends on how many trampolines are present, and the target. But it should be doable. |
When code allocation fails in hipe_bifs_enter_code_2, log the number of bytes and trampolines in the erl_exit() message.
I've updated the erl_exit() message in hipe_bifs_enter_code_2 to mention the number of bytes we wanted to allocate for code, and the number of trampolines to other functions we wanted. The latter is an upper bound since some of them may already exist. This should allow us to distinguish between real out-of-memory and excessive allocations due to bugs. I didn't change the erl_exit() message in hipe_make_stub() since that's a small fixed-size allocation. |
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human |
merged on master |
This branch contains cleanups and fixes related to memory allocation for code in the HiPE runtime.
Tested with the HiPE testsuite on x86_64 (-m64 and -m32), armv7, sparc64 -m32, and ppc64 -m32.