HiPE code alloc fixes #584

mikpe · 2015-01-11T16:37:11Z

This branch contains cleanups and fixes related to memory allocation for code in the HiPE runtime.

it changes generic code to raise Erlang-level exceptions on allocation failures
it changes platform-specific code to signal allocation failures rather than crashing the VM (with abort() or SEGV from NULL pointer dereferences)
it removes obsolete and disabled BIFs related to code management
it eliminates a useless internal platform-specific macro

Tested with the HiPE testsuite on x86_64 (-m64 and -m32), armv7, sparc64 -m32, and ppc64 -m32.

The HIPE_ALLOC_CODE macro in the HiPE runtime was introduced ages ago to allow x86 and amd64 to switch from erts_alloc() to an mmap() implementation with proper flag setting. Nowadays the macro is identical on all platforms, and serves no purpose. Delete the macro, move the hipe_alloc_code() prototype to hipe_arch.h, and simplify hipe_bifs_enter_code_2().

The hipe_bifs:make_native_stub/2 and hipe_bifs:get_emu_address/1 BIFs were originally used by hipe_unified_loader.erl, but the code been obsolete and disabled for ages. Remove the BIFs and all references to them. In hipe_unified_loader.erl, remove the no-op emu_make_stubs/1 function.

HiPE allocates stubs for BEAM functions called from native code. Error handling for that allocation is currently non-existent: if it fails the VM will crash with an abort() or a SEGV from a NULL pointer dereference. This updates generic code in hipe_bif0.c to check for and handle NULL returns from the platform-specific hipe_make_native_stub(). hipe_get_na_nofail() and hipe_get_na_nofail_locked() are renamed to remove the inaccurate _nofail part of their names. Obsolete #if 0 code in hipe_make_stub() is removed.

Fix platform-specific code to return NULL rather than crashing when code allocation fails.

OTP-Maintainer · 2015-01-11T20:00:47Z

Patch has passed first testings and has been assigned to be reviewed

I am a script, I am not human

sverker · 2015-01-12T20:14:13Z

I think 21cbea9 (handle stub allocation failures) does more harm than good.
Either the allocation failure happens during code loading, in which case the error handling is incomplete leaving inconsistent data structures from a half loaded module. This is a problem we already have if the hipe_unified_loader fails due to an invalid beam file.
Or the allocation failure happens during execution of a dynamic MFA lookup in which case a badarg exception is thrown possibly crashing the Erlang process. This is not how we usually handle other out-of-memory scenarios.

Commit ddae791 seems "more ok" as memory allocation for the native code is some of the first thing we do and the code loading will fail without leaving any inconsistencies.

mikpe · 2015-01-12T21:55:38Z

I realize that badarg is sub-optimal for allocation failures. Would the "handle stub allocation failures" part be more palatable if it threw system_limit instead? Doing so would require more changes to have the lower-level code return a proper error cookie (I don't think I want to abuse errno). Or do you have a different response to a code allocation failure in mind?

sverker · 2015-01-13T15:15:46Z

The most common action at out-of-memory is VM "crash", either by abort (which may create core dump) or an erl_crash.dump. The topic "how to handle out-of-memory" has been discussed a number of times and the conclusion is always that it's not much point in trying to "handle it", but instead try to prevent it by some sort of overload protection as early as possible at the edges of the system.

If the reason for the memory exhaustion is some sort of resource leakage, then it's probably better with a node reboot than having Erlang processes doing cyclic restarts.

Code loading done as part of an upgrade could maybe be an exception to the above IF the failed upgrade can be totally reverted and the system keeps on running with the old software. But that would be a much larger work including transactional load/upgrade of entire applications

My suggestion: Keep the old behavior with VM crash or call erl_exit like erts_alloc ends up doing.

mikpe · 2015-01-13T18:38:05Z

A controlled termination via erl_exit() is perfectly fine, and much better than crashing hard with an abort or a SEGV.

Do you want the revision in a new pull req, or should I update the current branch and and just add a comment here when the revision has been done?

sverker · 2015-01-13T19:35:47Z

Keeping the same pull request is preferred I think. Not sure how you do it though.

When thinking of it, erl_exit would be the best action even when alloc_code() fails. I see no reason for that native code allocation to behave different than all other allocations done on behalf of code loading.

Instead of signalling an Erlang-level exception on code or stub allocation failure, just terminate the VM with erl_exit(). This reverts most of 21cbea9 except for its dead code removal.

mikpe · 2015-01-18T14:50:49Z

I've updated the branch to call erl_exit() when code or stub allocation fails, instead of signalling an Erlang-level exception, as per your suggestions. The previous renamings etc in hipe_bif0.c are reverted.

OTP-Maintainer · 2015-01-18T19:21:46Z

Patch has passed first testings and has been assigned to be reviewed

I am a script, I am not human

sverker · 2015-01-20T15:28:20Z

One last nitpicking request. Could you include the requested number of bytes in the error message to erl_exit. It's nice to easy distinguish between a real out-of-memory and some bug causing an insane amount of memory being requested.

mikpe · 2015-01-21T09:07:14Z

Will do. That may require some rewrites because the exact amount to allocate depends on how many trampolines are present, and the target. But it should be doable.

When code allocation fails in hipe_bifs_enter_code_2, log the number of bytes and trampolines in the erl_exit() message.

mikpe · 2015-01-24T18:19:49Z

I've updated the erl_exit() message in hipe_bifs_enter_code_2 to mention the number of bytes we wanted to allocate for code, and the number of trampolines to other functions we wanted. The latter is an upper bound since some of them may already exist. This should allow us to distinguish between real out-of-memory and excessive allocations due to bugs.

I didn't change the erl_exit() message in hipe_make_stub() since that's a small fixed-size allocation.

OTP-Maintainer · 2015-01-24T19:22:03Z

Patch has passed first testings and has been assigned to be reviewed

I am a script, I am not human

marcusarendt · 2015-01-30T09:18:38Z

merged on master

mikpe added 4 commits January 11, 2015 16:29

hipe: return NULL on code allocation failures

ddae791

Fix platform-specific code to return NULL rather than crashing when code allocation fails.

hipe: erl_exit() on code allocation failure

d7ef5d4

Instead of signalling an Erlang-level exception on code or stub allocation failure, just terminate the VM with erl_exit(). This reverts most of 21cbea9 except for its dead code removal.

hipe: log nrbytes when code allocation fails

160e861

When code allocation fails in hipe_bifs_enter_code_2, log the number of bytes and trampolines in the erl_exit() message.

marcusarendt closed this Jan 30, 2015

mikpe deleted the hipe-code-alloc-fixes branch March 15, 2020 13:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HiPE code alloc fixes #584

HiPE code alloc fixes #584

mikpe commented Jan 11, 2015

OTP-Maintainer commented Jan 11, 2015

sverker commented Jan 12, 2015

mikpe commented Jan 12, 2015

sverker commented Jan 13, 2015

mikpe commented Jan 13, 2015

sverker commented Jan 13, 2015

mikpe commented Jan 18, 2015

OTP-Maintainer commented Jan 18, 2015

sverker commented Jan 20, 2015

mikpe commented Jan 21, 2015

mikpe commented Jan 24, 2015

OTP-Maintainer commented Jan 24, 2015

marcusarendt commented Jan 30, 2015

HiPE code alloc fixes #584

HiPE code alloc fixes #584

Conversation

mikpe commented Jan 11, 2015

OTP-Maintainer commented Jan 11, 2015

sverker commented Jan 12, 2015

mikpe commented Jan 12, 2015

sverker commented Jan 13, 2015

mikpe commented Jan 13, 2015

sverker commented Jan 13, 2015

mikpe commented Jan 18, 2015

OTP-Maintainer commented Jan 18, 2015

sverker commented Jan 20, 2015

mikpe commented Jan 21, 2015

mikpe commented Jan 24, 2015

OTP-Maintainer commented Jan 24, 2015

marcusarendt commented Jan 30, 2015