8366865: Allocation GC Pauses Triggered after JVM has started shutdown #27190
👋 Welcome back iwalulya! A progress list of the required criteria for merging this PR into |
@walulyai This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 11 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
/cc hotspot-gc
@albertnetymk
// If the VM is shutting down, we may have skipped VM_CollectForAllocation.
// To avoid returning nullptr (which could cause premature OOME), we stall
// allocation requests here until the VM shutdown is complete.
MonitorLocker ml(VMExit_lock);
I wonder if an always-zero Semaphore works.
The PR title shouldn't have the G1: prefix, as it changes other GCs as well.
// Check invocations
- if (skip_operation()) {
+ if (skip_operation() || Universe::heap()->is_shutting_down()) {
Can the new condition be inlined inside skip_operation()?
In the first iteration I had it inlined, but then decided to make it more explicit. I can change it back if you prefer.
No particular preference, but the comments in skip_operation() need to be adjusted to also mention this additional condition.
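For readers following the thread, here is a minimal sketch of the check under discussion: a collection request is dropped either because another GC already ran since the request was made, or because shutdown has begun. HeapModel, CollectOperation and prologue() are hypothetical stand-ins rather than HotSpot types, and the count comparison inside skip_operation() is an assumption about what that method checks.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the heap state consulted by a GC request.
struct HeapModel {
  std::atomic<bool> shutting_down{false};
  unsigned total_collections = 0;
};

// Hypothetical stand-in for a GC VM operation.
struct CollectOperation {
  HeapModel& heap;
  unsigned gc_count_before;  // collection count sampled when the request was created

  // Assumed meaning: another collection has already run since the request.
  bool skip_operation() const {
    return heap.total_collections != gc_count_before;
  }

  // Returns true only if the collection should actually proceed.
  bool prologue() const {
    if (skip_operation() || heap.shutting_down.load(std::memory_order_acquire)) {
      return false;
    }
    return true;
  }
};
```

Keeping the shutdown test next to skip_operation(), rather than inside it, is the "more explicit" variant mentioned above; inlining it would move the flag load into skip_operation() and require updating that method's comment.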
Monitor* Terminator_lock = nullptr;
Monitor* InitCompleted_lock = nullptr;
Monitor* BeforeExit_lock = nullptr;
Monitor* VMExit_lock = nullptr;
It is unfortunate that we need yet another exit-related mutex.
// If the VM is shutting down, we may have skipped VM_CollectForAllocation.
// To avoid returning nullptr (which could cause premature OOME), we stall
// allocation requests here until the VM shutdown is complete.
If I understand the shutdown sequence correctly, I don't think you can do this without risking a hang. The thread doing the shutdown can potentially execute code that requires allocation after is_shutting_down() returns true. This is due to the JVMTI events posted from before_exit:
if (JvmtiExport::should_post_thread_life()) {
JvmtiExport::post_thread_end(thread);
}
// Always call even when there are not JVMTI environments yet, since environments
// may be attached late and JVMTI must track phases of VM execution
JvmtiExport::post_vm_death();
I think if you can't GC during shutdown then you have to simply let the allocation fail.
Thanks, I completely overlooked the JVMTI callbacks. It’s probably better to stall with a timeout and then return an allocation failure.
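A minimal sketch of the "stall with a timeout, then fail" idea, assuming a condition variable that the exiting thread could signal. The names exit_mutex, exit_cv, exit_complete and stall_then_fail are hypothetical and do not come from the patch; the point is only that the wait is bounded, so a thread that must allocate during shutdown cannot hang forever.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-ins for VM shutdown state; none of these names are HotSpot's.
std::mutex exit_mutex;
std::condition_variable exit_cv;
bool exit_complete = false;

// Called on the allocation slow path once GCs are being skipped.
// Waits until shutdown is signalled or the timeout expires, then reports
// failure (nullptr) so the caller is never blocked indefinitely.
void* stall_then_fail(std::chrono::milliseconds timeout) {
  std::unique_lock<std::mutex> lock(exit_mutex);
  exit_cv.wait_for(lock, timeout, [] { return exit_complete; });
  return nullptr;  // allocation failure; the caller decides how to report it
}
```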
I moved the Universe::before_exit() call and also added a timed wait.
@dholmes-ora do you have any concerns with this approach?
Yes I do have concerns - sorry. Any change to the shutdown sequence needs very careful analysis. You have now changed the circumstances whereby the JVMTI events get posted. Maybe it won't matter, maybe it will - the issue is that it is very hard to determine the impact of such a change until you get notified that someone's code is now broken.
I have undone that part of the change. We can revisit it separately; that way it is easier to back out if it is problematic.
Created JDK-8367902
I think this is mostly good, just some minor nitpicks.
}

void Universe::before_exit() {
  log_cpu_time();
Why did you move log_cpu_time()? During the review of the CPUTimeUsage refactor (#26621) we discussed this choice. Given that it still includes more than just GC, I think it should stay in Universe. Also, the PR title does not reflect that it would include a refactor of CPUTimeUsage.
The main reason was to have log_cpu_time and AtomicAccess::release_store(&_is_shutting_down, true) under the same critical section. Otherwise, we have no guarantee that GCs don't continue after log_cpu_time.
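The ordering argument can be sketched as follows, assuming a single lock that both the exit path and every GC request must take. heap_lock_model, before_exit_model, try_collect_model and the counters are hypothetical names used only for illustration; a std::atomic store stands in for AtomicAccess::release_store.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

std::mutex heap_lock_model;                    // stand-in for the lock GC requests take
std::atomic<bool> shutting_down_model{false};  // stand-in for _is_shutting_down
int gcs_run = 0;                               // collections performed so far
int gcs_at_log_time = -1;                      // what the CPU-time log observed

// Stand-in for log_cpu_time(): records how many GCs had run when it was called.
void log_cpu_time_model() { gcs_at_log_time = gcs_run; }

// Logging and setting the flag happen in one critical section, so no GC
// can slip in between the two steps.
void before_exit_model() {
  std::lock_guard<std::mutex> guard(heap_lock_model);
  log_cpu_time_model();
  shutting_down_model.store(true, std::memory_order_release);
}

// A GC request takes the same lock and gives up once shutdown has begun.
bool try_collect_model() {
  std::lock_guard<std::mutex> guard(heap_lock_model);
  if (shutting_down_model.load(std::memory_order_acquire)) {
    return false;
  }
  ++gcs_run;
  return true;
}
```

In this model, any collection either completes before the log is taken or is rejected afterwards, which is the guarantee the comment above is asking for.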
I had put log_cpu_time right before calling stop(). stop() is the method that terminates GC threads, so no synchronization should be needed if I'm not mistaken.
Please correct me if you think I got it wrong here.
Nevertheless, any user of gc_threads_do might still iterate over terminated GC worker threads. Could we consider adding a check or assert in that method?
We can have GCs between log_cpu_time and stop(). Having log_cpu_time under the same lock as setting _is_shutting_down reduces the chances of that happening.
> Nevertheless, any user of gc_threads_do might still iterate over terminated GC worker threads. Could we consider adding a check or assert in that method?
Yes, we can have the assert in gc_threads_do; I thought this was going to be done as a follow-up.
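One way such a guard could look, sketched with hypothetical types: GcThreadsModel and WorkerModel are not HotSpot classes, and the placement of the assert inside the iteration method is only the option raised in this exchange, not necessarily where the check ended up.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a GC worker thread.
struct WorkerModel {
  int id;
};

// Hypothetical stand-in for the owner of the GC worker threads.
class GcThreadsModel {
  std::vector<WorkerModel> _workers;
  bool _stopped = false;

 public:
  explicit GcThreadsModel(int n) {
    for (int i = 0; i < n; ++i) {
      _workers.push_back(WorkerModel{i});
    }
  }

  // Models stop(): after this call the workers count as terminated.
  void stop() { _stopped = true; }
  bool stopped() const { return _stopped; }

  // Iterates over the workers; in debug builds, asserts that they have
  // not been terminated yet.
  void threads_do(const std::function<void(const WorkerModel&)>& f) const {
    assert(!_stopped && "must not iterate over terminated GC threads");
    for (const WorkerModel& w : _workers) {
      f(w);
    }
  }
};
```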
I moved _is_shutting_down to Universe, which allowed me to restore this log_cpu_time. I also added the assert before reading os::thread_cpu_time.
I think we want |
@walulyai this pull request can not be integrated into
git checkout shutting_down_gcs
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
Should |
As this doesn't seem to perturb the general shutdown process I have no further comment and will leave it to GC folk to approve.
Though I still wonder if this aspect of shutdown should be part of "Heap" (or Universe) rather than G1 specific.
Why is it G1 specific? The new |
static bool _fully_initialized; // true after universe_init and initialize_vtables called

// Shutdown
static volatile bool _is_shutting_down;
If _is_shutting_down is in Universe, should ZGC/ZAbort hook into this too? It could be confusing that ZAbort reports that we are shutting down but the universe field does not report this. I would expect them both to report true.
My mistake, I didn't track the changes closely enough and was just looking at all the G1 files involved.
MonitorLocker ml(VMExit_lock);
ml.wait(2 * MILLIUNITS);
I think one can use ThreadBlockInVM + sleep to achieve the blocking-current-thread purpose. Then there is no need for a new lock, as there is no critical region anyway.
Changed to just using sleep.
// triggers a GC.
- MonitorLocker ml(VMExit_lock);
- ml.wait(2 * MILLIUNITS);
+ JavaThread::current()->sleep(2 * MILLIUNITS);
I prefer this to the Monitor::wait too. Thanks.
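The lock-free variant agreed on here can be sketched in standard C++ as a plain sleep guarded by the shutdown flag. vm_shutting_down_model and stall_if_shutting_down are hypothetical names, and std::this_thread::sleep_for stands in for the JavaThread sleep call shown in the diff; since no shared state is protected, no mutex or monitor is involved.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stand-in for the VM's shutdown flag.
std::atomic<bool> vm_shutting_down_model{false};

// Pauses the calling thread only when shutdown has begun.
// Returns true if the caller was stalled, false if it may proceed normally.
bool stall_if_shutting_down(std::chrono::milliseconds pause) {
  if (!vm_shutting_down_model.load(std::memory_order_acquire)) {
    return false;
  }
  // Nothing is guarded here, so sleeping is enough; the thread simply
  // yields time to the shutdown sequence.
  std::this_thread::sleep_for(pause);
  return true;
}
```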
Thanks for the reviews!
/integrate
Going to push as commit 3e5094e.
Your commit was automatically rebased without conflicts.
Please review this patch to skip VM_GC_Collect_Operations if initiated after the VM shutdown process has begun. We add a _is_shutting_down flag to CollectedHeap, which is set while holding the Heap_lock. This ensures mutual exclusion with VM_GC_Collect_Operations, which also require the Heap_lock.
Skipping VM_GC_Collect_Operation would otherwise cause allocation requests to fail (resulting in OutOfMemoryError) if the requesting daemon threads were allowed to continue. We instead block these threads on a monitor, where they remain stalled until they are terminated as part of the VM shutdown sequence.
Testing: Tier 1-7
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27190/head:pull/27190
$ git checkout pull/27190
Update a local copy of the PR:
$ git checkout pull/27190
$ git pull https://git.openjdk.org/jdk.git pull/27190/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27190
View PR using the GUI difftool:
$ git pr show -t 27190
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27190.diff
Using Webrev