8057586: Explicit GC ignored if GCLocker is active #13191

Closed
wants to merge 13 commits into from

Member

@walulyai walulyai commented Mar 27, 2023

Hi All,

Please review this change to guarantee that at least one Full GC is executed between the invocation and return of an explicit Full GC call, even if the call is concurrent with an active GCLocker. We define explicit GCs as GCs triggered by the end user in some form (jcmd, System.gc(), or WhiteBox testing).
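The guarantee can be illustrated with a toy Java model. Everything in it is hypothetical. `ExplicitFullGcModel`, its counter and the `enterCritical`/`exitCritical` pair only stand in for HotSpot's Full GC counter, Heap_lock and GCLocker. The stall-then-retry strategy is an assumption made for illustration, not the PR's actual C++ code. In the model, an explicit request records the Full GC count on entry. It returns only after that count has changed, whether its own attempt or another thread's collection moved it.

```java
// Toy model only: all names are hypothetical and this is not HotSpot code.
final class ExplicitFullGcModel {
    private final Object heapLock = new Object();   // stands in for Heap_lock
    private int totalFullCollections = 0;
    private int activeCriticalSections = 0;         // stands in for the GCLocker state

    void enterCritical() {
        synchronized (heapLock) { activeCriticalSections++; }
    }

    void exitCritical() {
        synchronized (heapLock) {
            activeCriticalSections--;
            heapLock.notifyAll();                   // wake requesters stalled on the locker
        }
    }

    int fullCollections() {
        synchronized (heapLock) { return totalFullCollections; }
    }

    // One collection attempt: skipped (returns false) while a critical section is active.
    private boolean tryFullGc() {
        synchronized (heapLock) {
            if (activeCriticalSections > 0) {
                return false;
            }
            totalFullCollections++;
            return true;
        }
    }

    // Explicit Full GC: does not return until the Full GC count has moved past its starting value.
    void explicitFullGc() throws InterruptedException {
        int countAtStart = fullCollections();
        while (!tryFullGc()) {
            synchronized (heapLock) {
                if (totalFullCollections != countAtStart) {
                    return;                         // a Full GC by another thread also satisfies the request
                }
                while (activeCriticalSections > 0) {
                    heapLock.wait();                // stall until the locker clears, then retry
                }
            }
        }
    }
}
```

While a critical section is held, the requester waits rather than returning without a collection. When `exitCritical()` notifies it, the retry runs the Full GC.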

The change should also handle the issues reported in JDK-8299276.

Split into 3 commits, one commit for changes to each GC in [G1, Parallel, Serial].

Testing: Tier 1-5.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8057586: Explicit GC ignored if GCLocker is active

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13191/head:pull/13191
$ git checkout pull/13191

Update a local copy of the PR:
$ git checkout pull/13191
$ git pull https://git.openjdk.org/jdk.git pull/13191/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 13191

View PR using the GUI difftool:
$ git pr show -t 13191

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13191.diff

Webrev

Link to Webrev Comment

bridgekeeper bot commented Mar 27, 2023

👋 Welcome back iwalulya! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Mar 27, 2023

@walulyai The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Mar 27, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 27, 2023

mlbridge bot commented Mar 27, 2023


println("Starting " + allocThreadNum + " allocating threads");
for (int i = 0; i < allocThreadNum; i += 1) {
new Thread(new AllocatingWorker()).start();

Member

What's the point of having an alloc thread? I'd expect the WhiteBox API to be enough to trigger GC cycles regardless of heap state.

Member Author

https://bugs.openjdk.org/browse/JDK-8057573

The reproducer is taken from there, with minor changes to fit jtreg.

The alloc thread makes for more robust testing, but it is not required.

Member

I don't get why having an alloc thread would improve/degrade the robustness of this test -- jni-call-thread and systemgc-thread should be enough.

Member Author

The allocations trigger GCLocker Initiated GCs, so I prefer to keep them.

Contributor

Please add a comment that this is intentional to increase the variety of the GC types that can happen here.

}

long gcCountAfter = collector.getCollectionCount();
Asserts.assertLessThanOrEqual(gcCountBefore + fullGcCounts, gcCountAfter, "Triggered more Full GCs than expected");

Member

Could this assert be placed right after wb.fullGC(); so that the failure is closer to its cause?

Member Author

Yes, it can.

Comment on lines 225 to 236
long durationMS = (long) (1000 * durationSec);
long start = System.currentTimeMillis();
long now = start;
long soFar = now - start;
while (soFar < durationMS) {
    try {
        Thread.sleep(durationMS - soFar);
    } catch (Exception e) {
    }
    now = System.currentTimeMillis();
    soFar = now - start;
}

Member

I believe it's unexpected for the current thread to be interrupted. If it is, the test should log the exception and exit peacefully, because there's not enough information here to handle it properly.

try {
    Thread.sleep(durationSec * 1000);
} catch (InterruptedException e) {
    e.printStackTrace();
    return;
}

System.out.println("SYSTEM_GC AFTER");

long gcCountAfter = collector.getCollectionCount();
Asserts.assertLessThanOrEqual(gcCountBefore + fullGcCounts, gcCountAfter, "Triggered more Full GCs than expected");

Member

Does this work?

var before = collector.getCollectionCount();
wb.fullGC();
var after = collector.getCollectionCount();
assert(before < after);
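Below is a hedged variant of this per-call check that also fails when Java assertions are disabled. A plain `assert` statement is skipped unless the JVM runs with `-ea`. `PerCallGcCheck` and `checkedFullGc` are hypothetical names. In the test, the count supplier would presumably be `collector::getCollectionCount` and the action something like `() -> wb.fullGC()`. The allocating threads may trigger extra collections concurrently, so the sketch checks `after > before` rather than an exact delta.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: runs one Full GC request and fails right at the call site if no collection happened.
final class PerCallGcCheck {
    static long checkedFullGc(LongSupplier collectionCount, Runnable fullGc) {
        long before = collectionCount.getAsLong();
        fullGc.run();
        long after = collectionCount.getAsLong();
        if (after <= before) {
            throw new AssertionError("Full GC request returned without a collection: before="
                    + before + ", after=" + after);
        }
        return after - before;   // may exceed 1 if other threads triggered GCs meanwhile
    }
}
```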

Comment on lines 2117 to 2119
if (counters_before.total_full_collections() != full_gc_count) {
return true;
}

Member

The other two GCs do this check inside the locker scope. I don't think there's any practical difference -- why the inconsistency?

Member Author

The only difference is when the changes were made (G1 first). Also, I couldn't decide which version is cleaner.

Member

I think the style in Serial is cleaner.

VM_ParallelGCSystemGC op(gc_count, full_gc_count, cause);
VMThread::execute(&op);

if (!VM_ParallelGCSystemGC::is_cause_full(cause) || op.full_gc_succeeded()) {

Member

Why full_gc_succeeded here? Doesn't the following full_gc_count != total_full_collections achieve the same purpose?

Member Author

The full_gc_count != total_full_collections check is done inside the locker scope. We can skip taking that lock here because at this point we are sure that gc_count was incremented.
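The trade-off discussed here can be sketched in Java. One option is a lock-free fast path when the VM operation reports success. The other is a counter comparison under the heap lock. `CollectFastPathSketch` is hypothetical. A Java `ReentrantLock` stands in for HotSpot's Heap_lock, and `lockedChecks` exists only to show when the slow path is taken.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; names are illustrative, not HotSpot's.
final class CollectFastPathSketch {
    private final ReentrantLock heapLock = new ReentrantLock(); // stands in for Heap_lock
    private int totalFullCollections = 0;
    private int lockedChecks = 0;                                // counts slow-path lock acquisitions

    void recordFullGc() {
        heapLock.lock();
        try {
            totalFullCollections++;
        } finally {
            heapLock.unlock();
        }
    }

    int lockedChecks() { return lockedChecks; }

    // True if the caller may return: either our own operation ran a Full GC
    // (fast path, no lock taken), or the Full GC count changed since the request started.
    boolean fullGcObserved(boolean opSucceeded, int fullGcCountAtStart) {
        if (opSucceeded) {
            return true;
        }
        heapLock.lock();
        try {
            lockedChecks++;
            return totalFullCollections != fullGcCountAtStart;
        } finally {
            heapLock.unlock();
        }
    }
}
```

Both paths reach the same answer when the own operation succeeded. The reviewer's point is that the locked comparison alone would be simpler and consistent across collectors, at the cost of one extra lock acquisition.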

Member

True, but this is not a performance-critical method; being consistent with the other GCs is more important, IMO.

Member Author

Only Serial doesn't have this early return. The only reason I didn't add it is that the PR was getting too big, and Serial required a lot more changes to implement it. So I think it will be added later for Serial GC.

Contributor

Please file an issue for this.

Comment on lines 221 to 234
long durationMS = (long) (1000 * durationSec);
long start = System.currentTimeMillis();
long now = start;
long soFar = now - start;
while (soFar < durationMS) {
    try {
        Thread.sleep(durationMS - soFar);
    } catch (InterruptedException e) {
        e.printStackTrace();
        return;
    }
    now = System.currentTimeMillis();
    soFar = now - start;
}

Member

A single Thread.sleep should be fine -- the error margin should be negligible.
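The two review suggestions can be combined into one helper: sleep once, and on an unexpected interrupt, log it and return. The helper name `sleepOnce` is hypothetical. Restoring the interrupt flag is a common Java idiom added here; the reviewer did not ask for it. Note that `TimeUnit.SECONDS.sleep` returns immediately for a non-positive duration.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical test helper: a single sleep; returns false if the thread was interrupted.
final class TestSleepHelper {
    static boolean sleepOnce(long durationSec) {
        try {
            TimeUnit.SECONDS.sleep(durationSec);
            return true;
        } catch (InterruptedException e) {
            e.printStackTrace();                 // log the unexpected interrupt
            Thread.currentThread().interrupt();  // keep the interrupt status visible to callers
            return false;                        // caller returns peacefully
        }
    }
}
```

In the test's main method, this would read as `if (!TestSleepHelper.sleepOnce(durationSec)) return;`.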

public:
VM_ParallelGCSystemGC(uint gc_count, uint full_gc_count, GCCause::Cause gc_cause);
virtual VMOp_Type type() const { return VMOp_ParallelGCSystemGC; }
virtual void doit();
bool full_gc_succeeded() const { return _full_gc_succeeded; }

Contributor

Could we name this gc_succeeded like in g1?

openjdk bot commented Apr 17, 2023

@walulyai This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8057586: Explicit GC ignored if GCLocker is active

Reviewed-by: tschatzl, ayang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 329 new commits pushed to the master branch:

  • 20b1d19: 8305746: InitializeEncoding should cache Charset object instead of charset name
  • 955abca: 8306483: (ch) Channels.newReader(ReadableByteChannel,Charset) refers to csName
  • c6a288d: 8305945: (zipfs) Opening a directory to get input stream produces incorrect exception message
  • 73018b3: 8306284: G1: Remove assertion in G1ScanHRForRegionClosure::do_claimed_block
  • 33a7978: 8306538: Zero variant build failure after JDK-8257967
  • 9c2e5b3: 8306459: s390x: Replace NULL to nullptr
  • 6a7dff3: 8305880: Loom: Avoid putting stale object pointers in oops
  • 310aa93: 8304291: [AIX] Broken build after JDK-8301998
  • 64ed816: 8305943: Open source few AWT Focus related tests
  • b8f0a66: 8041676: remove the java.compiler system property
  • ... and 319 more: https://git.openjdk.org/jdk/compare/f96aee74010476a850175f7012c196e40a31c188...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 17, 2023

long durationSec = Long.parseLong(args[0]);
int allocThreadNum = Integer.parseInt(args[1]);
int jniCriticalThreadNum = Integer.parseInt(args[2]);

Member

Why is this always 1 in all test cases? Wouldn't it be more stressful to use something larger, e.g. the same value as allocThreadNum?

Member Author

One is enough to trigger the error if one exists; any higher number we pick would be arbitrary. You can suggest a number if you like.

Member

one is enough to trigger the error if one exists

Well, there could also be concurrency issues. I'd feel more comfortable if it's > 1. (I suggest using 4, just to match its neighbor.)

VM_ParallelGCSystemGC op(gc_count, full_gc_count, cause);
VMThread::execute(&op);

if (!VM_ParallelGCSystemGC::is_cause_full(cause) || op.gc_succeeded()) {

Member

!is_cause_full would cover more cases than System.gc and whitebox-fullgc, right? Is this really intended?

The introduction of op.gc_succeeded() is not well motivated -- the semantics of the return value of invoke() are also not obvious at first glance. Therefore, I'd prefer keeping the existing signature.

Member Author

!is_cause_full would cover more cases than System.gc and whitebox-fullgc, right? Is this really intended?

Yes, it is intended.

The introduction of op.gc_succeeded() is not well motivated -- the semantics of the return value of invoke() are also not obvious at first glance. Therefore, I'd prefer keeping the existing signature.

Yeah, I did think about this. I can revert it.

Member

Yes, it is intended.

Then even non-explicit GCs (e.g. _metadata_GC_threshold or some other GC cause) would get the "guarantee that at least a Full GC is executed", which does not match the title of this ticket.

Yeah, I did think about this. I can revert it.

Thank you.

Member Author

Then even non-explicit GCs (e.g. _metadata_GC_threshold or some other GC cause) would get the "guarantee that at least a Full GC is executed", which does not match the title of this ticket.

Thanks for catching that! Fixed

Comment on lines 98 to 103
inline static bool is_explicit_gc(GCCause::Cause cause) {
  return (cause == GCCause::_java_lang_system_gc ||
          cause == GCCause::_dcmd_gc_run ||
          cause == GCCause::_wb_full_gc);
}

Contributor

Not exactly sure what "explicit GCs" are, but I would expect something more like this:

Suggested change:
inline static bool is_explicit_gc(GCCause::Cause cause) {
  return (is_user_requested_gc(cause) ||
          is_serviceability_requested_gc(cause) ||
          cause == GCCause::_wb_young_gc ||
          cause == GCCause::_wb_full_gc);
}

because serviceability GCs are also explicitly requested by the user (from the command line), and I believe all WhiteBox GCs are "explicit". Maybe also the "_allocation_profiler" and "wb_breakpoint" ones (not sure right now about these).
At least the serviceability ones shouldn't be eaten by the GCLocker either, but maybe there is no guarantee that they actually occur.

Member Author

For is_explicit_full_gc, which ones would fall into that category? Those are the only ones we are interested in.

Contributor

All but GCCause::_wb_young_gc from this list: everything that's triggered by the end user in some form (serviceability GCs typically come from jcmd, so they fit in, imho). But after some discussion with you, I found that all STW collectors always suppress young GCs that failed due to the GCLocker, so such a change does not really matter. Keep it as is.
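The predicate kept in the PR can be mirrored in Java for illustration, together with the early-return condition the Parallel GC path builds from it (`!GCCause::is_explicit_gc(cause) || !VM_ParallelGCSystemGC::is_cause_full(cause) || op.full_gc_succeeded()`). `GcCause` is a hypothetical enum covering only a few of HotSpot's causes. `causeIsFull` is passed in as a plain boolean because the real `is_cause_full` logic is not shown in this thread.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical Java mirror of a few HotSpot GCCause values; not a real Java API.
enum GcCause {
    JAVA_LANG_SYSTEM_GC,     // System.gc()
    DCMD_GC_RUN,             // jcmd <pid> GC.run
    WB_FULL_GC,              // WhiteBox fullGC()
    WB_YOUNG_GC,             // WhiteBox youngGC()
    METADATA_GC_THRESHOLD,   // VM-internal trigger
    ALLOCATION_FAILURE;      // VM-internal trigger

    private static final Set<GcCause> EXPLICIT =
            EnumSet.of(JAVA_LANG_SYSTEM_GC, DCMD_GC_RUN, WB_FULL_GC);

    // Same membership as the is_explicit_gc() predicate kept in the PR.
    boolean isExplicitGc() {
        return EXPLICIT.contains(this);
    }

    // Only explicit, full-GC causes whose own operation did not run a Full GC need to retry.
    static boolean mayReturnWithoutRetry(GcCause cause, boolean causeIsFull, boolean fullGcSucceeded) {
        return !cause.isExplicitGc() || !causeIsFull || fullGcSucceeded;
    }
}
```

With this filter, a VM-internal cause such as `METADATA_GC_THRESHOLD` keeps its old behavior, which addresses the reviewer's concern about non-explicit GCs receiving the guarantee.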

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Apr 18, 2023

Contributor

@tschatzl tschatzl left a comment

lgtm.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 20, 2023
Comment on lines 556 to 559
if (!GCCause::is_explicit_gc(cause) ||
!VM_ParallelGCSystemGC::is_cause_full(cause) ||
op.full_gc_succeeded()) {
return;

Member

Looking at the caller context, I believe all callers expect that a GC cycle has just run (by this thread or another) when collect(cause) returns. Therefore, the terminating condition should compare the number of GC cycles against the intended GC type (young or not), something like:

{
  MutexLocker ml(Heap_lock);
  if (is_young_gc) {
    if (gc_count != total_collections()) {
      return;
    }
  } else {
    if (full_gc_count != total_full_collections()) {
      return;
    }
  }
}

(Originally, I was thinking only about full GCs, i.e. System.gc and _wb_full_gc, but it also seems reasonable to provide the same guarantee, that a GC cycle has run, for _wb_young_gc.)
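The suggested termination check can be modeled with two counters. The sketch assumes, as HotSpot's `CollectedHeap` does, that every collection increments the total count and a Full GC also increments the Full count. Under that assumption, a Full GC satisfies a young-GC request too. `GcCounters` and its methods are hypothetical.

```java
// Hypothetical model of the two heap counters and the reviewer's termination check.
final class GcCounters {
    private int totalCollections = 0;
    private int totalFullCollections = 0;

    int totalCollections() { return totalCollections; }
    int totalFullCollections() { return totalFullCollections; }

    // Assumption: every GC bumps the total; a Full GC additionally bumps the Full count.
    void recordCollection(boolean full) {
        totalCollections++;
        if (full) {
            totalFullCollections++;
        }
    }

    // Young requests are satisfied by any GC; full requests only by a Full GC.
    boolean intendedGcHasRun(boolean isYoungGc, int gcCountAtStart, int fullGcCountAtStart) {
        return isYoungGc ? totalCollections != gcCountAtStart
                         : totalFullCollections != fullGcCountAtStart;
    }
}
```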

Member Author

I suggest we do that in a follow-up CR. For Parallel (and I guess for Serial) that is easy; for G1 it requires a bit more code movement.

@walulyai
Member Author

Thanks, @albertnetymk and @tschatzl for the reviews!

/integrate

openjdk bot commented Apr 24, 2023

Going to push as commit 4a9f8ef.
Since your change was applied there have been 356 commits pushed to the master branch:

  • ce493dd: 8306435: Juggle04/TestDescription.java should be a booleanArr test and not a byteArr one
  • f7d45b8: 8306076: Open source AWT misc tests
  • 4900517: 8306636: Disable compiler/c2/Test6905845.java with -XX:TieredStopAtLevel=3
  • 0f51e63: 8305590: Remove nothrow exception specifications from operator new
  • 8d696ae: 8306575: Clean up and open source four Dialog related tests
  • 9ed456f: 8306634: Open source AWT Event related tests
  • b2240bf: 8304696: Duplicate class names in dynamicArchive tests can lead to test failure
  • cb158ff: 8296153: Bump minimum boot jdk to JDK 20
  • 117c5b1: 8279216: Investigate implementation of premultiplied alpha in the Little-CMS 2.13
  • 723037a: 8298048: Combine CDS archive heap into a single block
  • ... and 346 more: https://git.openjdk.org/jdk/compare/f96aee74010476a850175f7012c196e40a31c188...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 24, 2023
@openjdk openjdk bot closed this Apr 24, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 24, 2023

openjdk bot commented Apr 24, 2023

@walulyai Pushed as commit 4a9f8ef.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
3 participants