8331208: Memory stress test that checks OutOfMemoryError stack trace fails #18925
👋 Welcome back dnsimon! A progress list of the required criteria for merging this PR into
@dougxc This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 199 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the
8ea7188 to 07d2306
07d2306 to 35e6b50
```cpp
 public:
  SandboxedOOMEMark(JavaThread* thread, bool disable_events=false) {
    if (thread != nullptr) {
      _outer = thread->sandboxed_oome_mark();
```
The need to support recursion is shown by this stack trace:
```
V [libjvm.dylib+0x4c9fe4] CompileBroker::compile_method(methodHandle const&, int, int, methodHandle const&, int, CompileTask::CompileReason, DirectiveSet*, JavaThread*)+0x6b0
V [libjvm.dylib+0x4c98d0] CompileBroker::compile_method(methodHandle const&, int, int, methodHandle const&, int, CompileTask::CompileReason, JavaThread*)+0xcc
V [libjvm.dylib+0x4a7434] CompilationPolicy::event(methodHandle const&, methodHandle const&, int, int, CompLevel, nmethod*, JavaThread*)+0x2e0
V [libjvm.dylib+0x355c14] Runtime1::counter_overflow(JavaThread*, int, Method*)+0x268
v ~RuntimeStub::counter_overflow Runtime1 stub 0x0000000116276c3c
J 4004 c1 jdk.internal.loader.URLClassPath.getLoader(I)Ljdk/internal/loader/URLClassPath$Loader; java.base@23-internal (194 bytes) @ 0x000000010f55a7bc [0x000000010f558cc0+0x0000000000001afc]
J 3651 jvmci jdk.internal.loader.URLClassPath.getResource(Ljava/lang/String;Z)Ljdk/internal/loader/Resource; java.base@23-internal (74 bytes) @ 0x0000000116ba628c [0x0000000116ba6200+0x000000000000008c]
J 3649 jvmci jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(Ljava/lang/String;)Ljava/lang/Class; java.base@23-internal (64 bytes) @ 0x0000000116ba4ffc [0x0000000116ba4c40+0x00000000000003bc]
J 3640 jvmci jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(Ljava/lang/String;Z)Ljava/lang/Class; java.base@23-internal (143 bytes) @ 0x0000000116ba26c0 [0x0000000116ba2440+0x0000000000000280]
J 3638 jvmci jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(Ljava/lang/String;Z)Ljava/lang/Class; java.base@23-internal (40 bytes) @ 0x0000000116ba17c0 [0x0000000116ba1680+0x0000000000000140]
J 3636 jvmci java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class; java.base@23-internal (7 bytes) @ 0x0000000116ba137c [0x0000000116ba1300+0x000000000000007c]
v ~StubRoutines::call_stub 0x00000001160f0190
V [libjvm.dylib+0x856918] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x420
V [libjvm.dylib+0x855618] JavaCalls::call_virtual(JavaValue*, Klass*, Symbol*, Symbol*, JavaCallArguments*, JavaThread*)+0x218
V [libjvm.dylib+0x8558b0] JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Handle, JavaThread*)+0x70
V [libjvm.dylib+0x1024ca8] SystemDictionary::load_instance_class_impl(Symbol*, Handle, JavaThread*)+0x114
V [libjvm.dylib+0x10226c8] SystemDictionary::load_instance_class(Symbol*, Handle, JavaThread*)+0x28
V [libjvm.dylib+0x1021a7c] SystemDictionary::resolve_instance_class_or_null(Symbol*, Handle, Handle, JavaThread*)+0x69c
V [libjvm.dylib+0xf5ea64] SignatureStream::as_klass(Handle, Handle, SignatureStream::FailureMode, JavaThread*)+0x60
V [libjvm.dylib+0xd57894] Method::load_signature_classes(methodHandle const&, JavaThread*)+0xf0
V [libjvm.dylib+0x4c9c8c] CompileBroker::compile_method(methodHandle const&, int, int, methodHandle const&, int, CompileTask::CompileReason, DirectiveSet*, JavaThread*)+0x358
V [libjvm.dylib+0x4c98d0] CompileBroker::compile_method(methodHandle const&, int, int, methodHandle const&, int, CompileTask::CompileReason, JavaThread*)+0xcc
V [libjvm.dylib+0x4a7434] CompilationPolicy::event(methodHandle const&, methodHandle const&, int, int, CompLevel, nmethod*, JavaThread*)+0x2e0
V [libjvm.dylib+0x355c14] Runtime1::counter_overflow(JavaThread*, int, Method*)+0x268
v ~RuntimeStub::counter_overflow Runtime1 stub 0x0000000116276c3c
J 4003 c1 CountUppercase.identity(ILCountUppercase$Unloaded;)I (2 bytes) @ 0x000000010f558874 [0x000000010f5587c0+0x00000000000000b4]
j CountUppercase.main([Ljava/lang/String;)V+61
v ~StubRoutines::call_stub 0x00000001160f0190
V [libjvm.dylib+0x856918] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x420
V [libjvm.dylib+0x940bf8] jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, JavaThread*)+0x14c
V [libjvm.dylib+0x9475c0] jni_CallStaticVoidMethod+0x16c
C [libjli.dylib+0xa260] invokeStaticMainWithArgs+0x84
C [libjli.dylib+0xaa9c] JavaMain+0x588
C [libjli.dylib+0xd4a0] ThreadJavaMain+0xc
```
Note the recursive call to CompileBroker::compile_method (which uses a SandboxedOOMEMark).
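A minimal sketch of why the mark has to tolerate nesting, assuming a simplified stand-in for `JavaThread` that carries only the flag being toggled. `FakeThread`, `ScopedOOMEFlag` and the recursive `compile_method` below are hypothetical names for illustration, not the HotSpot code. Each mark records the flag value it found on entry (the role `_outer` plays in the reviewed constructor) and restores that value on exit, so returning from the inner, recursive scope leaves the outer scope still marked.

```cpp
// Hypothetical stand-in for JavaThread: only the flag that the mark toggles.
struct FakeThread {
  bool in_mark = false;
};

// Sketch of a nesting-safe scope mark (illustrative, not the HotSpot class).
class ScopedOOMEFlag {
  FakeThread* _thread;  // nullptr means "no-op mark"
  bool _outer;          // flag value observed when this scope was entered
 public:
  explicit ScopedOOMEFlag(FakeThread* thread) : _thread(thread), _outer(false) {
    if (_thread != nullptr) {
      _outer = _thread->in_mark;  // remember the enclosing scope's state
      _thread->in_mark = true;
    }
  }
  ~ScopedOOMEFlag() {
    if (_thread != nullptr) {
      _thread->in_mark = _outer;  // restore instead of blindly clearing
    }
  }
  ScopedOOMEFlag(const ScopedOOMEFlag&) = delete;
  ScopedOOMEFlag& operator=(const ScopedOOMEFlag&) = delete;
};

// Hypothetical model of compile_method re-entering itself via class loading.
// Returns whether this scope is still marked after any recursive call returns.
bool compile_method(FakeThread* t, int depth) {
  ScopedOOMEFlag mark(t);
  if (depth > 0) {
    compile_method(t, depth - 1);  // recursive entry, as in the stack trace
  }
  return t->in_mark;
}
```

If the destructor simply reset the flag to `false`, the outer `compile_method` call would run unmarked for the remainder of its execution once the recursive call returned.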
… SandboxedOOMEMark
c4cc8d3 to 137ab23
/cc hotspot-gc hotspot-runtime
@dougxc The
Webrevs
So you are generalising (and seemingly simplifying) the notion of a "retryable allocation" so that internally an OOME can be ignored for a range of reasons. It seems a rather elaborate response to the test failure (especially when generating a stacktrace under OOM conditions could itself fail anyway), but I can see the general utility of expanding things this way. I really dislike the name SandboxedOOMEMark though (sorry). Suggestions: InternalOOMEMark, ScopedOOMEMark, ConfinedOOMEMark?
My main concerns relate to me not understanding the details of the existing retryable allocation, so some of the new code seems a little odd. Comments below.
Thanks
```cpp
// -XX:+HeapDumpOnOutOfMemoryError and -XX:OnOutOfMemoryError support
report_java_out_of_memory(message);
```
Not obvious we now need this to be unconditional.
I think it was a mistake to make it conditional when RetryableAllocationMark was first introduced. The purpose of RAM was only to resolve a correctness issue with respect to JVMTI (it was seeing the "same" exception being reported twice). The -XX actions do not change the semantics of the exception throwing so can be done unconditionally.
But if this is a hidden/internal OOME then why would we treat it as a normal OOME and trigger the XX action? If the allocation routines returned null instead, we would never consider triggering the XX actions for OOME.
It depends on what the purpose of the -XX actions is. As far as I can tell, they are for understanding when and why the JVM hits a memory limit from an external perspective. For example, until something like https://bugs.openjdk.org/browse/JDK-8328639 exists, I don't think it would be easy to discover an OOME caused by the string constant resolution done by the JIT. But maybe that doesn't matter? I'm fine with keeping the XX actions conditional if you'd prefer.
I think it is better to keep them conditional - thanks. The -XX:OnOutOfMemoryError is an action to take when a user-visible OOME would be thrown. We don't run these actions for VM allocation failures.
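A minimal sketch of the policy settled on here, assuming a hypothetical per-thread flag and a stub in place of `report_java_out_of_memory` (which, per the snippet under review, provides the `-XX:+HeapDumpOnOutOfMemoryError` and `-XX:OnOutOfMemoryError` support). `ThreadState`, `oome_actions_run`, `report_java_out_of_memory_stub` and `on_heap_allocation_failure` are illustrative names only. The actions run only for an OOME that can reach user code; an internal OOME is still thrown, it just does not trigger them.

```cpp
#include <string>

// Hypothetical thread state: whether we are inside an internal OOME scope.
struct ThreadState {
  bool in_internal_oome_mark = false;
};

// Counts how often the -XX OOME actions would have run (illustration only).
int oome_actions_run = 0;

// Stub standing in for report_java_out_of_memory().
void report_java_out_of_memory_stub(const std::string& /*message*/) {
  ++oome_actions_run;
}

// Sketch of the agreed policy: the -XX actions fire only for an OOME that
// may become visible to user code.
void on_heap_allocation_failure(const ThreadState& thread, const std::string& message) {
  if (!thread.in_internal_oome_mark) {
    report_java_out_of_memory_stub(message);
  }
  // In both cases an OOME is still thrown; only the reporting differs.
}
```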
```cpp
} else {
  _outer = false;
  _thread = nullptr;
}
```
It isn't obvious to me how this part is intended to be used. I see it ties back to the retryable allocation "activate" mode, but I'm unclear what that means as well.
By "this part", do you mean the else branch? It exists for the !activate case of RetryableAllocationMark which is used when the null_on_fail parameter of JVMCIRuntime::new_instance_common is true. That is, the runtime call is from compiled code that does not want to trigger throwing of an OOME. Graal will deopt in such cases and let the interpreter throw the exception. This ensures the OOME is reported exactly once to JVMTI.
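A sketch of the null-thread convention described here, under simplifying assumptions: `FakeThread`, `InternalMarkSketch` and `RetryableScopeSketch` are hypothetical stand-ins for `JavaThread`, `InternalOOMEMark` and `RetryableAllocationMark`. When `activate` is false, the retryable scope builds its inner mark with a null thread, which makes the mark a no-op, so any OOME is initialized normally and JVMTI events are fired. A null `thread()` is also how the scope can tell which case it is in.

```cpp
// Hypothetical stand-in for JavaThread.
struct FakeThread {
  bool in_internal_oome_mark = false;
};

// Simplified model of the mark: a null thread turns it into a no-op.
class InternalMarkSketch {
  FakeThread* _thread;
  bool _outer;
 public:
  explicit InternalMarkSketch(FakeThread* thread) : _thread(thread), _outer(false) {
    if (_thread != nullptr) {
      _outer = _thread->in_internal_oome_mark;
      _thread->in_internal_oome_mark = true;
    }
  }
  ~InternalMarkSketch() {
    if (_thread != nullptr) {
      _thread->in_internal_oome_mark = _outer;
    }
  }
  // Returns nullptr iff this is a no-op mark.
  FakeThread* thread() const { return _thread; }
};

// Hypothetical retryable-allocation scope that delegates to the mark.
class RetryableScopeSketch {
  InternalMarkSketch _iom;
 public:
  RetryableScopeSketch(FakeThread* thread, bool activate)
      : _iom(activate ? thread : nullptr) {}
  bool active() const { return _iom.thread() != nullptr; }
};
```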
"this part" means the "else branch", which means the null-receiving constructor. Yeah, that whole "null_on_fail" thing had me a bit perplexed, and I see there is now a JBS issue filed to kill it off, as we always want null-on-fail.
src/hotspot/share/oops/klass.cpp (outdated)
```diff
 if (length > max_length) {
-  if (!THREAD->in_retryable_allocation()) {
-    report_java_out_of_memory("Requested array size exceeds VM limit");
+  report_java_out_of_memory("Requested array size exceeds VM limit");
```
Again not obvious this should now be unconditional
Same reasoning as for MemAllocator::Allocation::check_out_of_memory.
Yes, it's somewhat elaborate but also resolves a long-standing suboptimal behavior in HotSpot. That is, when an OOME is thrown while reallocating objects in deoptimization (for example), it uses up one of the precious pre-allocated OOMEs. This increases the chance that an OOME that actually makes it out to user code will not have a stack trace.
Aren't "sandboxed", "scoped" and "confined" kind of all the same concept? I don't mind using a different name but want to better understand the specific objection to "sandboxed" first.
I don't think "sandbox" fits in this context:
Ok, I will rename it to
```cpp
  JavaThread* _thread;

 public:
  InternalOOMEMark(JavaThread* thread) {
```
Suggestion: add a comment:
```cpp
// Passing a null thread allows for a no-op implementation for contexts that will suppress
// throwing of the OOME - see RetryableAllocationMark.
```
I was wondering if we really need this. AFAICS it would be harmless to always pass in the current thread and set the thread's field because when we would have passed null then no exception would be thrown anyway. It seems the null thread is only used as a means for RAM to track whether activate was false. But I guess a no-op IOM achieves the same goal.
Throwing of the OOME is never suppressed by InternalOOMEMark. It only changes how the OOME is initialized.
When RetryableAllocationMark passes thread == null, it wants the normal OOME initialization to be done and JVMTI events to be fired.
In the context of https://bugs.openjdk.org/browse/JDK-8331429, I propose to leave this PR as is. That issue will remove activate altogether (cc @mur47x111 ).
```cpp
  }
}

// Returns nullptr iff `activate` was false in the constructor.
```
This comment is out of place: activate is in the RAM constructor.
```diff
-JavaThread* THREAD = _thread; // For exception macros.
+JavaThread* THREAD = _iom.thread();
```
Please restore comment: // For exception macros.
```cpp
if (ex->is_a(vmClasses::OutOfMemoryError_klass())) {
  CLEAR_PENDING_EXCEPTION;
}
```
Just an observation, but the original code will clear all exceptions except for an "async" exception, which these days is only the InternalError thrown by unsafe-access errors. The new code, however, will only clear OOME, thus allowing the InternalError to remain (as expected) but also any other VirtualMachineError that may have arisen, e.g. StackOverflowError. I actually think this is more correct, but it does seem a change in behaviour that we may need to be wary of.
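The behavioural difference can be stated as two small filters. This is a hypothetical model, not HotSpot code: `PendingKind` is an invented enum standing in for the class of the pending exception, `AsyncInternalError` represents the `InternalError` raised for unsafe-access errors, and each function returns what is left pending after clearing.

```cpp
// Hypothetical model of the class of a thread's pending exception.
enum class PendingKind { None, OutOfMemoryError, StackOverflowError, AsyncInternalError };

// Old behaviour as described in review: clear everything except an async exception.
PendingKind clear_old(PendingKind pending) {
  if (pending != PendingKind::AsyncInternalError) {
    return PendingKind::None;
  }
  return pending;
}

// New behaviour: clear only an OutOfMemoryError and leave anything else pending.
PendingKind clear_new(PendingKind pending) {
  if (pending == PendingKind::OutOfMemoryError) {
    return PendingKind::None;
  }
  return pending;
}
```

The two filters agree on OOME and on the async `InternalError`; they differ only for other errors such as `StackOverflowError`, which the new code leaves pending.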
In the context of Graal, it doesn't really make much of a difference as the Graal stub that calls this runtime routine will clear all exceptions anyway. But yes, I think limiting the clearing here to OOME is better.
I took a look at this because it touches GC code, and therefore have a few nits / style requests related to that. However, don't consider this a full review since I'm not familiar with the part of the code / issues this PR intends to solve.
```cpp
  JavaThread* _thread;

 public:
  InternalOOMEMark(JavaThread* thread) {
```
```diff
-  InternalOOMEMark(JavaThread* thread) {
+  explicit InternalOOMEMark(JavaThread* thread) {
```
```cpp
class DeoptResourceMark;
class JNIHandleBlock;
class JVMCIRuntime;
class InternalOOMEMark;
```
It would be nice to get this sorted like the other forward declarations.
```cpp
oop Universe::out_of_memory_error_java_heap(bool omit_backtrace) {
  oop oome = out_of_memory_errors()->obj_at(_oom_java_heap);
  if (!omit_backtrace) {
    oome = gen_out_of_memory_error(oome);
  }
  return oome;
```
It could be nice to get rid of the double negation here:
```diff
 oop Universe::out_of_memory_error_java_heap(bool omit_backtrace) {
   oop oome = out_of_memory_errors()->obj_at(_oom_java_heap);
-  if (!omit_backtrace) {
-    oome = gen_out_of_memory_error(oome);
-  }
-  return oome;
+  if (omit_backtrace) {
+    return oome;
+  }
+  return gen_out_of_memory_error(oome);
```
```diff
 // may or may not have a backtrace. If error has a backtrace then the stack trace is already
 // filled in.
-static oop out_of_memory_error_java_heap();
+static oop out_of_memory_error_java_heap(bool omit_backtrace=false);
```
```diff
-static oop out_of_memory_error_java_heap(bool omit_backtrace=false);
+static oop out_of_memory_error_java_heap(bool omit_backtrace = false);
```
src/hotspot/share/oops/klass.cpp (outdated)
```diff
   THROW_OOP(Universe::out_of_memory_error_array_size());
 } else {
-  THROW_OOP(Universe::out_of_memory_error_retry());
+  THROW_OOP(Universe::out_of_memory_error_java_heap(/* omit_backtrace*/ true));
```
```diff
-THROW_OOP(Universe::out_of_memory_error_java_heap(/* omit_backtrace*/ true));
+THROW_OOP(Universe::out_of_memory_error_java_heap(/* omit_backtrace */ true));
```
```diff
   THROW_OOP_(exception, true);
 } else {
-  THROW_OOP_(Universe::out_of_memory_error_retry(), true);
+  THROW_OOP_(Universe::out_of_memory_error_java_heap(/* omit_backtrace*/ true), true);
```
Given that the only explicitly passed-in value for omit_backtrace is true, I think it would be nicer to create a separate function for this case instead of having a comment always explaining what true stands for. Maybe out_of_memory_error_java_heap_omit_backtrace()?
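A sketch of the suggested API shape, using a hypothetical `FakeOOME` value in place of an `oop` and a simplified `fake_gen_out_of_memory_error` that always attaches a backtrace (per the PR description, the real code falls back to the shared, backtrace-less instance once the preallocated OOMEs are exhausted). Two named entry points replace a boolean argument that call sites would otherwise annotate with `/* omit_backtrace */`.

```cpp
// Hypothetical stand-in for an oop: records only whether a backtrace is attached.
struct FakeOOME {
  bool has_backtrace;
};

// Shared instance without a backtrace (simplified model).
const FakeOOME shared_java_heap_oome{false};

// Simplified model of gen_out_of_memory_error(): always attaches a backtrace.
FakeOOME fake_gen_out_of_memory_error(FakeOOME base) {
  base.has_backtrace = true;
  return base;
}

// Common case: callers pass no flag at all.
FakeOOME out_of_memory_error_java_heap() {
  return fake_gen_out_of_memory_error(shared_java_heap_oome);
}

// Dedicated entry point suggested in review, replacing
// out_of_memory_error_java_heap(/* omit_backtrace */ true).
FakeOOME out_of_memory_error_java_heap_omit_backtrace() {
  return shared_java_heap_oome;
}
```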
```cpp
bool in_internal_oome_mark() const { return _in_internal_oome_mark; }
void set_in_internal_oome_mark(bool b) { _in_internal_oome_mark = b; }
```
Should all of these be prefixed with `is`, like:
```cpp
bool is_in_VTMS_transition() const { return _is_in_VTMS_transition; }
bool is_in_tmp_VTMS_transition() const { return _is_in_tmp_VTMS_transition; }
bool is_in_any_VTMS_transition() const { return _is_in_VTMS_transition || _is_in_tmp_VTMS_transition; }
```
Ok.
The new version looks great to me.
Any remaining concerns, @dholmes-ora?
Updates look good. Thanks
Thanks for the reviews. /integrate
Going to push as commit aafa15f.
Your commit was automatically rebased without conflicts.
This pull request mitigates failures in memory stress tests that check the stack trace of an `OutOfMemoryError` for certain expected entries.

The stack trace of an OOME will not be allocated once all preallocated OOMEs are used up. If the only heap allocations performed in stressful conditions are those of the stress test, then the 4 preallocated OOMEs would be sufficient. However, it's possible for VM internal allocations to also occur during stressful conditions, especially in `-Xcomp` mode. For example, `CompileBroker::compile_method` will try to resolve the string constants in the constant pool of the method about to be compiled. This can fail as shown here:

These internal allocations can occur before the allocations of the test and thus use up the preallocated OOMEs. As a result, the OOMEs triggered by the stress test may end up throwing the default, shared OOME instance, which has no stack trace.

This PR mitigates this by introducing a scope (see `InternalOOMEMark` in `memAllocator.hpp`) in which a failed heap allocation results in the shared, stack-trace-less OOME instance being thrown. This scope is used for guarding VM internal allocations where an OOME will not be propagated to user code. In addition, JVMTI "resource exhausted" events are disabled in the scope of an `InternalOOMEMark`.

Note that this change also improves diagnosability because internal OOMEs will no longer use up the limited number of preallocated OOMEs.
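To make the failure mode concrete, here is a toy model of the preallocated OOME pool, assuming a single pool of 4 instances (the number quoted in the description) and ignoring everything else HotSpot does. `OOME`, `OOMEPool` and `user_oome_has_stack_trace` are hypothetical names. Failures inside an internal mark get the shared instance without touching the pool; other failures consume a preallocated instance while any remain.

```cpp
// Hypothetical model of an OOME instance: only whether it carries a stack trace.
struct OOME {
  bool has_stack_trace;
};

// Toy model of the preallocated OOME pool (not the HotSpot implementation).
class OOMEPool {
  int _remaining;  // preallocated OOMEs that can still receive a stack trace
 public:
  explicit OOMEPool(int preallocated) : _remaining(preallocated) {}

  // Called when a heap allocation fails.
  OOME on_allocation_failure(bool in_internal_oome_mark) {
    if (in_internal_oome_mark) {
      return OOME{false};  // internal OOME: shared instance, pool untouched
    }
    if (_remaining > 0) {
      --_remaining;
      return OOME{true};   // preallocated instance with a stack trace
    }
    return OOME{false};    // pool exhausted: shared, stack-trace-less instance
  }

  int remaining() const { return _remaining; }
};

// Simulates `internal_failures` VM-internal allocation failures followed by
// one failure in user code; reports whether the user-visible OOME has a trace.
bool user_oome_has_stack_trace(int preallocated, int internal_failures, bool use_mark) {
  OOMEPool pool(preallocated);
  for (int i = 0; i < internal_failures; i++) {
    pool.on_allocation_failure(use_mark);
  }
  return pool.on_allocation_failure(false).has_stack_trace;
}
```

With four internal failures and no mark, the test's own OOME arrives after the pool is drained and has no stack trace; with the mark, the pool is untouched and the test's OOME keeps its stack trace.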
Reviewing
Using `git`

Checkout this PR locally:
```shell
$ git fetch https://git.openjdk.org/jdk.git pull/18925/head:pull/18925
$ git checkout pull/18925
```
Update a local copy of the PR:
```shell
$ git checkout pull/18925
$ git pull https://git.openjdk.org/jdk.git pull/18925/head
```
Using Skara CLI tools

Checkout this PR locally:
```shell
$ git pr checkout 18925
```
View PR using the GUI difftool:
```shell
$ git pr show -t 18925
```
Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18925.diff
Webrev
Link to Webrev Comment