Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HiPE code alloc fixes #584

Closed
wants to merge 6 commits into from
Closed

Conversation

mikpe
Copy link
Contributor

@mikpe mikpe commented Jan 11, 2015

This branch contains cleanups and fixes related to memory allocation for code in the HiPE runtime.

  • it changes generic code to raise Erlang-level exceptions on allocation failures
  • it changes platform-specific code to signal allocation failures rather than crashing the VM (with abort() or SEGV from NULL pointer dereferences)
  • it removes obsolete and disabled BIFs related to code management
  • it eliminates a useless internal platform-specific macro

Tested with the HiPE testsuite on x86_64 (-m64 and -m32), armv7, sparc64 -m32, and ppc64 -m32.

The HIPE_ALLOC_CODE macro in the HiPE runtime was introduced
ages ago to allow x86 and amd64 to switch from erts_alloc()
to an mmap() implementation with proper flag setting.  Nowadays
the macro is identical on all platforms, and serves no purpose.

Delete the macro, move the hipe_alloc_code() prototype to
hipe_arch.h, and simplify hipe_bifs_enter_code_2().
The hipe_bifs:make_native_stub/2 and hipe_bifs:get_emu_address/1
BIFs were originally used by hipe_unified_loader.erl, but the
code been obsolete and disabled for ages.

Remove the BIFs and all references to them.

In hipe_unified_loader.erl, remove the no-op emu_make_stubs/1
function.
HiPE allocates stubs for BEAM functions called from native code.
Error handling for that allocation is currently non-existent: if
it fails the VM will crash with an abort() or a SEGV from a NULL
pointer dereference.

This updates generic code in hipe_bif0.c to check for and handle
NULL returns from the platform-specific hipe_make_native_stub().

hipe_get_na_nofail() and hipe_get_na_nofail_locked() are renamed
to remove the inaccurate _nofail part of their names.

Obsolete #if 0 code in hipe_make_stub() is removed.
Fix platform-specific code to return NULL rather than crashing
when code allocation fails.
@OTP-Maintainer
Copy link

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


@sverker
Copy link
Contributor

sverker commented Jan 12, 2015

I think 21cbea9 (handle stub allocation failures) does more harm than good.
Either the allocation failure happens during code loading, in which case the error handling is incomplete leaving inconsistent data structures from a half loaded module. This is a problem we already have if the hipe_unified_loader fails due to an invalid beam file.
Or the allocation failure happens during execution of a dynamic MFA lookup in which case a badarg exception is thrown possibly crashing the Erlang process. This is not how we usually handle other out-of-memory scenarios.

Commit ddae791 seems "more ok" as memory allocation for the native code is some of the first thing we do and the code loading will fail without leaving any inconsistencies.

@mikpe
Copy link
Contributor Author

mikpe commented Jan 12, 2015

I realize that badarg is sub-optimal for allocation failures. Would the "handle stub allocation failures" part be more palatable if it threw system_limit instead? Doing so would require more changes to have the lower-level code return a proper error cookie (I don't think I want to abuse errno). Or do you have a different response to a code allocation failure in mind?

@sverker
Copy link
Contributor

sverker commented Jan 13, 2015

The most common action at out-of-memory is VM "crash", either by abort (which may create core dump) or an erl_crash.dump. The topic "how to handle out-of-memory" has been discussed a number of times and the conclusion is always that it's not much point in trying to "handle it", but instead try to prevent it by some sort of overload protection as early as possible at the edges of the system.

If the reason for the memory exhaustion is some sort of resource leakage, then it's probably better with a node reboot than having Erlang processes doing cyclic restarts.

Code loading done as part of an upgrade could maybe be an exception to the above IF the failed upgrade can be totally reverted and the system keeps on running with the old software. But that would be a much larger work including transactional load/upgrade of entire applications

My suggestion: Keep the old behavior with VM crash or call erl_exit like erts_alloc ends up doing.

@mikpe
Copy link
Contributor Author

mikpe commented Jan 13, 2015

A controlled termination via erl_exit() is perfectly fine, and much better than crashing hard with an abort or a SEGV.

Do you want the revision in a new pull req, or should I update the current branch and and just add a comment here when the revision has been done?

@sverker
Copy link
Contributor

sverker commented Jan 13, 2015

Keeping the same pull request is preferred I think. Not sure how you do it though.

When thinking of it, erl_exit would be the best action even when alloc_code() fails. I see no reason for that native code allocation to behave different than all other allocations done on behalf of code loading.

Instead of signalling an Erlang-level exception on code or stub
allocation failure, just terminate the VM with erl_exit().

This reverts most of 21cbea9
except for its dead code removal.
@mikpe
Copy link
Contributor Author

mikpe commented Jan 18, 2015

I've updated the branch to call erl_exit() when code or stub allocation fails, instead of signalling an Erlang-level exception, as per your suggestions. The previous renamings etc in hipe_bif0.c are reverted.

@OTP-Maintainer
Copy link

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


@sverker
Copy link
Contributor

sverker commented Jan 20, 2015

One last nitpicking request. Could you include the requested number of bytes in the error message to erl_exit. It's nice to easy distinguish between a real out-of-memory and some bug causing an insane amount of memory being requested.

@mikpe
Copy link
Contributor Author

mikpe commented Jan 21, 2015

Will do. That may require some rewrites because the exact amount to allocate depends on how many trampolines are present, and the target. But it should be doable.

When code allocation fails in hipe_bifs_enter_code_2, log the
number of bytes and trampolines in the erl_exit() message.
@mikpe
Copy link
Contributor Author

mikpe commented Jan 24, 2015

I've updated the erl_exit() message in hipe_bifs_enter_code_2 to mention the number of bytes we wanted to allocate for code, and the number of trampolines to other functions we wanted. The latter is an upper bound since some of them may already exist. This should allow us to distinguish between real out-of-memory and excessive allocations due to bugs.

I didn't change the erl_exit() message in hipe_make_stub() since that's a small fixed-size allocation.

@OTP-Maintainer
Copy link

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


@marcusarendt
Copy link

merged on master

@mikpe mikpe deleted the hipe-code-alloc-fixes branch March 15, 2020 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants