8366865: Allocation GC Pauses Triggered after JVM has started shutdown #27190

walulyai · 2025-09-10T09:57:18Z

Please review this patch to skip VM_GC_Collect_Operations if initiated after the VM shutdown process has begun. We add a _is_shutting_down flag to CollectedHeap, which is set while holding the Heap_lock. This ensures mutual exclusion with VM_GC_Collect_Operations, which also require the Heap_lock.

Skipping a VM_GC_Collect_Operation would cause the allocation request to fail (resulting in an OutOfMemoryError) if the requesting daemon threads were allowed to continue. Instead, we block these threads on a monitor, where they remain stalled until they are terminated as part of the VM shutdown sequence.
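
As a rough illustration of the locking scheme described above (not the actual HotSpot code), the sketch below uses the C++ standard library. The names `heap_lock`, `is_shutting_down`, `begin_shutdown` and `try_collect` are hypothetical stand-ins: the flag is only written and read while the lock is held, so a collection request either completes before shutdown begins or observes the flag and is skipped.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for Heap_lock (std::mutex, not HotSpot's Mutex).
static std::mutex heap_lock;
// Hypothetical stand-in for _is_shutting_down; guarded by heap_lock.
static bool is_shutting_down = false;
// Counts collections that actually ran, for illustration only.
static int collections_run = 0;

// Shutdown path: set the flag while holding the lock.
void begin_shutdown() {
  std::lock_guard<std::mutex> guard(heap_lock);
  is_shutting_down = true;
}

// Collection path: takes the same lock, so it cannot interleave with
// begin_shutdown(). Returns true if the collection ran, false if skipped.
bool try_collect() {
  std::lock_guard<std::mutex> guard(heap_lock);
  if (is_shutting_down) {
    return false;  // skipped: shutdown has already begun
  }
  ++collections_run;  // placeholder for the real collection work
  return true;
}
```

Calling `try_collect()` before `begin_shutdown()` runs the collection; any call made afterwards returns false without collecting.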

Testing: Tier 1-7

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8366865: Allocation GC Pauses Triggered after JVM has started shutdown (Enhancement - P4)

Reviewers

Albert Mingkun Yang (@albertnetymk - Reviewer)
Thomas Schatzl (@tschatzl - Reviewer) Review applies to edad0efe

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27190/head:pull/27190
$ git checkout pull/27190

Update a local copy of the PR:
$ git checkout pull/27190
$ git pull https://git.openjdk.org/jdk.git pull/27190/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27190

View PR using the GUI difftool:
$ git pr show -t 27190

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27190.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-09-10T09:59:33Z

👋 Welcome back iwalulya! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-09-10T09:59:52Z

@walulyai This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8366865: Allocation GC Pauses Triggered after JVM has started shutdown

Reviewed-by: ayang, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 11 new commits pushed to the master branch:

0ba4141: 8366878: Improve flags of compiler/loopopts/superword/TestAlignVectorFuzzer.java
e8db14f: 8349910: Implement JEP 517: HTTP/3 for the HTTP Client API
433d2ec: 8367409: G1: Remove unused G1MonotonicArena::Segment::copy_to()
... and 8 more: https://git.openjdk.org/jdk/compare/cc65836d00de7041e7d32e7f15d98108b1ae47a0...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk · 2025-09-10T10:00:53Z

@walulyai The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-09-10T10:19:42Z

Webrevs

albertnetymk · 2025-09-10T13:33:50Z

/cc hotspot-gc

openjdk · 2025-09-10T13:35:21Z

@albertnetymk
The hotspot-gc label was successfully added.

albertnetymk · 2025-09-10T13:50:23Z

src/hotspot/share/gc/shared/collectedHeap.cpp

+  // If the VM is shutting down, we may have skipped VM_CollectForAllocation.
+  // To avoid returning nullptr (which could cause premature OOME), we stall
+  // allocation requests here until the VM shutdown is complete.
+  MonitorLocker ml(VMExit_lock);


I wonder if an always-zero-Semaphore works.

albertnetymk

The PR title shouldn't have G1: prefix, as it changes other GCs as well.

albertnetymk · 2025-09-10T13:51:50Z

src/hotspot/share/gc/shared/gcVMOperations.cpp


  // Check invocations
-  if (skip_operation()) {
+  if (skip_operation() || Universe::heap()->is_shutting_down()) {


Can the new condition be inlined inside skip_operation()?

In the first iteration I had it inlined, but then decided to make it more explicit. I can change it back if you prefer.

No particular preference, but the comments in skip_operation() need to be adjusted to also indicate that as additional condition.
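
For readers unfamiliar with the two options in this thread, here is a simplified sketch of the inlined variant. It is not the HotSpot implementation: `HeapState` and `GcRequest` are hypothetical types, and the assumption that a request is skipped when the collection count has changed since it was made is an illustration of the existing check, not a quote of it.

```cpp
#include <cassert>

// Hypothetical, simplified heap state; not the real CollectedHeap.
struct HeapState {
  unsigned total_collections = 0;
  bool shutting_down = false;
};

// Hypothetical GC request that remembers the collection count the
// requesting thread observed when it decided a GC was needed.
struct GcRequest {
  unsigned gc_count_before;

  // Skip the operation if (assumed existing condition) another GC has
  // already run since the request was made, or (new condition) the VM
  // has begun shutting down.
  bool skip_operation(const HeapState& heap) const {
    return gc_count_before != heap.total_collections || heap.shutting_down;
  }
};
```

With the condition inlined, every caller of `skip_operation()` picks up the shutdown check, which is why the comment on the method would need to mention it.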

dholmes-ora · 2025-09-11T01:30:53Z

src/hotspot/share/runtime/mutexLocker.cpp

 Monitor* Terminator_lock              = nullptr;
 Monitor* InitCompleted_lock           = nullptr;
 Monitor* BeforeExit_lock              = nullptr;
+Monitor* VMExit_lock                  = nullptr;


It is unfortunate that we need yet another exit-related mutex.

dholmes-ora · 2025-09-11T01:41:46Z

src/hotspot/share/gc/shared/collectedHeap.cpp

+  // If the VM is shutting down, we may have skipped VM_CollectForAllocation.
+  // To avoid returning nullptr (which could cause premature OOME), we stall
+  // allocation requests here until the VM shutdown is complete.


If I understand the shutdown sequence correctly, I don't think you can do this without risking a hang. The thread doing the shutdown can potentially execute code that requires allocation after is_shutting_down() returns true. This is due to the JVMTI events posted from before_exit:

  if (JvmtiExport::should_post_thread_life()) {
    JvmtiExport::post_thread_end(thread);
  }

  // Always call even when there are not JVMTI environments yet, since environments
  // may be attached late and JVMTI must track phases of VM execution
  JvmtiExport::post_vm_death();

I think if you can't GC during shutdown then you have to simply let the allocation fail.

Thanks, I completely overlooked the JVMTI callbacks. It’s probably better to stall with a timeout and then return an allocation failure.

I moved the Universe::before_exit() call and also added a timed wait.
@dholmes-ora do you have any concerns with this approach?

Yes I do have concerns - sorry. Any change to the shutdown sequence needs very careful analysis. You have now changed the circumstances whereby the JVMTI events get posted. Maybe it won't matter, maybe it will - the issue is that it is very hard to determine the impact of such a change until you get notified that someone's code is now broken.

I have undone that part of the change. We can revisit it separately; that way it is easier to back out if it is problematic.

Created JDK-8367902
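
The stall-with-timeout idea discussed in this thread can be sketched roughly as follows. This is a standalone illustration using the C++ standard library, not HotSpot code: `shutting_down`, `always_fail` and `allocate_or_stall` are hypothetical names, and the normal "request a GC and retry" path is deliberately left out. The point is that the stall is bounded, so a thread that must make progress during shutdown (such as one running JVMTI callbacks) eventually gets an allocation failure instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical shutdown flag; set by the thread that starts VM shutdown.
static std::atomic<bool> shutting_down{false};

// Hypothetical allocator that always fails, used to exercise the slow path.
void* always_fail(std::size_t) { return nullptr; }

// Hypothetical allocation slow path. If the fast attempt fails and the VM is
// shutting down, stall for a bounded time and then report failure (nullptr)
// rather than blocking forever.
void* allocate_or_stall(void* (*try_allocate)(std::size_t),
                        std::size_t bytes,
                        std::chrono::milliseconds stall) {
  void* p = try_allocate(bytes);
  if (p != nullptr) {
    return p;
  }
  if (shutting_down.load(std::memory_order_acquire)) {
    std::this_thread::sleep_for(stall);  // bounded stall, no lock needed
    return nullptr;                      // caller handles allocation failure
  }
  // A real slow path would request a GC and retry here; omitted in this sketch.
  return nullptr;
}
```

A daemon thread that hits this path is typically terminated by the shutdown sequence during the stall; a thread that survives the stall sees the failure and can proceed.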

tschatzl

I think this is mostly good, just some minor nitpicks.

tschatzl · 2025-09-15T13:09:01Z

src/hotspot/share/gc/shared/gcVMOperations.cpp


  // Check invocations
-  if (skip_operation()) {
+  if (skip_operation() || Universe::heap()->is_shutting_down()) {


No particular preference, but the comments in skip_operation() need to be adjusted to also indicate that as additional condition.

JonasNorlinder · 2025-09-16T11:06:58Z

src/hotspot/share/memory/universe.cpp

-}

 void Universe::before_exit() {
-  log_cpu_time();


Why did you move log_cpu_time()? During the review of CPUTimeUsage refactor (#26621) we discussed this choice. Given that it still includes more than just GC I think it should stay in Universe. Also the PR title does not reflect that it would include a refactor of CPUTimeUsage.

The main reason was to have log_cpu_time and AtomicAccess::release_store(&_is_shutting_down, true) in the same critical section. Otherwise, we have no guarantee that GCs do not continue after log_cpu_time.

I had put log_cpu_time right before calling stop(). The stop() is the method that terminates GC threads, so no synchronization should be needed if I'm not mistaken.

Please correct me if you think I got it wrong here.

Nevertheless, any user of gc_threads_do might still iterate over terminated GC worker threads. Could we consider adding a check or assert in that method?

We can have GCs between log_cpu_time and stop(). Having log_cpu_time under the same lock as setting _is_shutting_down reduces the chances of that happening.

Nevertheless, any user of gc_threads_do might still iterate over terminated GC worker threads. Could we consider adding a check or assert in that method?

Yes, we can have the assert in gc_threads_do. I thought this was going to be done as a follow-up.

I moved _is_shutting_down to Universe, which allowed me to restore this log_cpu_time. I also added the assert before reading os::thread_cpu_time

JonasNorlinder · 2025-09-16T11:16:17Z

I think we want gc_threads_do (which we might want to avoid for performance reasons?) to check is_shutting_down() such that we can't query terminated GC workers.
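
A minimal sketch of the kind of guard suggested here, assuming a much simplified model: `Worker`, `WorkerSet` and `threads_do` are hypothetical and only mirror the shape of the idea, not the real gc_threads_do. Note that a plain check-then-iterate like this still leaves a window between the check and the iteration, so in a real VM it would need to be paired with the same synchronization that sets the flag.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical worker handle; stands in for a GC worker thread.
struct Worker {
  int id;
};

// Hypothetical container of workers with a shutdown flag.
class WorkerSet {
  std::vector<Worker> workers_;
  std::atomic<bool> shutting_down_{false};

public:
  explicit WorkerSet(int n) {
    for (int i = 0; i < n; ++i) {
      workers_.push_back(Worker{i});
    }
  }

  void begin_shutdown() {
    shutting_down_.store(true, std::memory_order_release);
  }

  // Visits every worker unless shutdown has begun, in which case the workers
  // may already be terminated and nothing is visited.
  // Returns the number of workers visited.
  int threads_do(const std::function<void(const Worker&)>& visit) const {
    if (shutting_down_.load(std::memory_order_acquire)) {
      return 0;
    }
    int visited = 0;
    for (const Worker& w : workers_) {
      visit(w);
      ++visited;
    }
    return visited;
  }
};
```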

openjdk · 2025-09-17T09:49:44Z

@walulyai this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout shutting_down_gcs
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

albertnetymk · 2025-09-17T15:53:58Z

Should CollectedHeap::satisfy_failed_metadata_allocation also call stall_for_vm_shutdown? (Looking through all subclasses of VM_GC_Collect_Operation.)

dholmes-ora

As this doesn't seem to perturb the general shutdown process I have no further comment and will leave it to GC folk to approve.

Though I still wonder if this aspect of shutdown should be part of "Heap" (or Universe) rather than G1 specific.

albertnetymk · 2025-09-18T09:04:53Z

Though I still wonder if this aspect of shutdown should be part of "Heap" (or Universe) rather than G1 specific.

Why is it G1 specific? The new bool _is_shutting_down lives inside CollectedHeap -- Serial/Parallel/G1 are all updated to use it.

JonasNorlinder · 2025-09-18T21:58:22Z

src/hotspot/share/memory/universe.hpp

  static bool _fully_initialized;                     // true after universe_init and initialize_vtables called

+  // Shutdown
+  static volatile bool _is_shutting_down;


If _is_shutting_down is in Universe, should ZGC/ZAbort hook into this too? It could be confusing that ZAbort reports that we are shutting down but the universe field does not report this. I would expect them both to report true.

dholmes-ora · 2025-09-19T04:58:05Z

Though I still wonder if this aspect of shutdown should be part of "Heap" (or Universe) rather than G1 specific.

Why is it G1 specific? The new bool _is_shutting_down lives inside CollectedHeap -- Serial/Parallel/G1 are all updated to use it.

My mistake, I didn't track the changes closely enough and was just looking at all the G1 files involved.

albertnetymk · 2025-09-19T10:01:43Z

src/hotspot/share/gc/shared/collectedHeap.cpp

+  MonitorLocker ml(VMExit_lock);
+  ml.wait(2 * MILLIUNITS);


I think one can use ThreadBlockInVM + sleep to block the current thread. Then there is no need for a new lock, as there is no critical region anyway.

Changed to just using sleep.

dholmes-ora · 2025-09-22T07:56:29Z

src/hotspot/share/gc/shared/collectedHeap.cpp

  //     triggers a GC.
-  MonitorLocker ml(VMExit_lock);
-  ml.wait(2 * MILLIUNITS);
+  JavaThread::current()->sleep(2 * MILLIUNITS);


I prefer this to the Monitor::wait too. Thanks

walulyai · 2025-09-23T08:15:57Z

Thanks for the reviews!

/integrate

openjdk · 2025-09-23T08:19:13Z

Going to push as commit 3e5094e.
Since your change was applied there have been 35 commits pushed to the master branch:

360b6af: 8364657: Crash for SecureRandom.generateSeed(0) on Windows x86-64
47ed1a8: 8368213: ZGC: Cleanup dead code, unimplemented declarations, unused private fields
7ed72d9: 8368212: ZGC: Fix spelling and typos in comments
... and 32 more: https://git.openjdk.org/jdk/compare/cc65836d00de7041e7d32e7f15d98108b1ae47a0...master

Your commit was automatically rebased without conflicts.

openjdk · 2025-09-23T08:19:23Z

@walulyai Pushed as commit 3e5094e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

walulyai added 6 commits September 8, 2025 13:57

init

eb95d49

log_cpu_time

4f512f6

remove debug logs

89cf3cd

remove debug logs

1c20e32

space

b3e1000

Merge remote-tracking branch 'upstream/master' into shutting_down_gcs

8a05ec4

openjdk bot added the hotspot hotspot-dev@openjdk.org label Sep 10, 2025

walulyai marked this pull request as ready for review September 10, 2025 10:14

openjdk bot added the rfr Pull request is ready for review label Sep 10, 2025

openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Sep 10, 2025

albertnetymk reviewed Sep 10, 2025

dholmes-ora reviewed Sep 11, 2025

walulyai added 3 commits September 12, 2025 11:22

timed wait

d2f45cc

Merge remote-tracking branch 'upstream/master' into shutting_down_gcs

97c2108

return on timeout

187a463

walulyai mentioned this pull request Sep 15, 2025

8366328: G1: Crash on reading os::thread_cpu_time #27087

Closed

3 tasks

tschatzl suggested changes Sep 15, 2025

Thomas Review

0e45912

walulyai changed the title ~~8366865: G1: Allocation GC Pauses Triggered after JVM has started shutdown~~ 8366865: Allocation GC Pauses Triggered after JVM has started shutdown Sep 15, 2025

Revert

2ebff06

JonasNorlinder suggested changes Sep 16, 2025

Merge remote-tracking branch 'upstream/master' into shutting_down_gcs

edad0ef

tschatzl approved these changes Sep 17, 2025

openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Sep 17, 2025

Merge remote-tracking branch 'upstream/master' into shutting_down_gcs

ab94789

openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Sep 17, 2025

dholmes-ora reviewed Sep 18, 2025

walulyai added 2 commits September 18, 2025 13:48

Merge remote-tracking branch 'upstream/master' into shutting_down_gcs

e09a55f

make universal

1a53c20

JonasNorlinder reviewed Sep 18, 2025

remove assert

46add7a

openjdk bot removed the hotspot-gc hotspot-gc-dev@openjdk.org label Sep 19, 2025

albertnetymk reviewed Sep 19, 2025

walulyai added 2 commits September 21, 2025 08:34

Remove lock

354e53c

Merge remote-tracking branch 'upstream/master' into shutting_down_gcs

87c8019

dholmes-ora reviewed Sep 22, 2025

albertnetymk reviewed Sep 22, 2025

Albert suggestion

f1ec969

albertnetymk approved these changes Sep 22, 2025

openjdk bot added the ready Pull request is ready to be integrated label Sep 22, 2025

openjdk bot added the integrated Pull request has been integrated label Sep 23, 2025

openjdk bot closed this Sep 23, 2025

openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 23, 2025
