-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8057586: Explicit GC ignored if GCLocker is active #13191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back iwalulya! A progress list of the required criteria for merging this PR into |
Webrevs
|
|
||
println("Starting " + allocThreadNum + " allocating threads"); | ||
for (int i = 0; i < allocThreadNum; i += 1) { | ||
new Thread(new AllocatingWorker()).start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of having an alloc thread? I'd expect whiteboxapi to be enough to trigger gc cycles regardless of heap state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://bugs.openjdk.org/browse/JDK-8057573
Reproducer taken from here. Added minor changes to fit JTREG.
The alloc thread test makes for more robust testing, but not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get why having an alloc thread would improve/degrade the robustness of this test -- jni-call-thread and systemgc-thread should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The allocations trigger GCLocker Initiated GCs, so I prefer to keep them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment that this is intentional to increase the variety of the GC types that can happen here.
} | ||
|
||
long gcCountAfter = collector.getCollectionCount(); | ||
Asserts.assertLessThanOrEqual(gcCountBefore + fullGcCounts, gcCountAfter, "Triggered more Full GCs than expected"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this assert be placed right after wb.fullGC();
so that the failure is closer to its cause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it can.
long durationMS = (long) (1000 * durationSec); | ||
long start = System.currentTimeMillis(); | ||
long now = start; | ||
long soFar = now - start; | ||
while (soFar < durationMS) { | ||
try { | ||
Thread.sleep(durationMS - soFar); | ||
} catch (Exception e) { | ||
} | ||
now = System.currentTimeMillis(); | ||
soFar = now - start; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it's unexpected for the current thread to be interrupted; if so, it should log the exception and exit peacefully, as there's no enough info here to properly process it.
try {
Thread.sleep(durationSec * 1000);
} catch (InterruptedException e) {
e.printStackTrace();
return;
}
|
||
println("Starting " + allocThreadNum + " allocating threads"); | ||
for (int i = 0; i < allocThreadNum; i += 1) { | ||
new Thread(new AllocatingWorker()).start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get why having an alloc thread would improve/degrade the robustness of this test -- jni-call-thread and systemgc-thread should be enough.
System.out.println("SYSTEM_GC AFTER"); | ||
|
||
long gcCountAfter = collector.getCollectionCount(); | ||
Asserts.assertLessThanOrEqual(gcCountBefore + fullGcCounts, gcCountAfter, "Triggered more Full GCs than expected"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work?
var before = collector.getCollectionCount();
wb.fullGC();
var after = collector.getCollectionCount();
assert(before < after);
if (counters_before.total_full_collections() != full_gc_count) { | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other two GCs does this check inside the locker-scope. I don't think there's any practical diff -- why the inconsistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only difference is "time" at which changes were made (Initially on G1), then also I couldn't decide which version is cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the style in Serial is cleaner.
VM_ParallelGCSystemGC op(gc_count, full_gc_count, cause); | ||
VMThread::execute(&op); | ||
|
||
if (!VM_ParallelGCSystemGC::is_cause_full(cause) || op.full_gc_succeeded()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why full_gc_succeeded
here? Doesn't the following full_gc_count != total_full_collections
achieve the same purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
full_gc_count != total_full_collections
is under locker-scope, so we can avoid that locking since we are sure at this point that gc_count was incremented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but this is not a perf-critical method, be consistent (with other gcs) is more important, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only Serial doesn't have this early return, and only reason I didn't add it is because PR was getting too big and serial required a lot more changes to have this implemented. So i think It will added later for Serial GC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please file an issue for this.
long durationMS = (long) (1000 * durationSec); | ||
long start = System.currentTimeMillis(); | ||
long now = start; | ||
long soFar = now - start; | ||
while (soFar < durationMS) { | ||
try { | ||
Thread.sleep(durationMS - soFar); | ||
} catch (InterruptedException e) { | ||
e.printStackTrace(); | ||
return; | ||
} | ||
now = System.currentTimeMillis(); | ||
soFar = now - start; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A single Thread.sleep
should be fine -- the error margin should be negligible.
public: | ||
VM_ParallelGCSystemGC(uint gc_count, uint full_gc_count, GCCause::Cause gc_cause); | ||
virtual VMOp_Type type() const { return VMOp_ParallelGCSystemGC; } | ||
virtual void doit(); | ||
bool full_gc_succeeded() const { return _full_gc_succeeded; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we name this gc_succeeded
like in g1?
|
||
println("Starting " + allocThreadNum + " allocating threads"); | ||
for (int i = 0; i < allocThreadNum; i += 1) { | ||
new Thread(new AllocatingWorker()).start(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment that this is intentional to increase the variety of the GC types that can happen here.
VM_ParallelGCSystemGC op(gc_count, full_gc_count, cause); | ||
VMThread::execute(&op); | ||
|
||
if (!VM_ParallelGCSystemGC::is_cause_full(cause) || op.full_gc_succeeded()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please file an issue for this.
@walulyai This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 329 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
|
||
long durationSec = Long.parseLong(args[0]); | ||
int allocThreadNum = Integer.parseInt(args[1]); | ||
int jniCriticalThreadNum = Integer.parseInt(args[2]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this always one in all test cases? Wouldn't it be more "stressing" to use sth larger? Same as allocThreadNum
for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one is enough to trigger the error if one exists, whichever number we pick higher will be random. You can suggest a number if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one is enough to trigger the error if one exists
Well, there could be concurrent issues also. I'd feel more comfortable if it's > 1. (I suggest using 4, just to match its neighbor.)
VM_ParallelGCSystemGC op(gc_count, full_gc_count, cause); | ||
VMThread::execute(&op); | ||
|
||
if (!VM_ParallelGCSystemGC::is_cause_full(cause) || op.gc_succeeded()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!is_cause_full
would cover more cases than System.gc
and whitebox-fullgc, right? Is this really intended?
The introduce of op.gc_succeeded()
is not well motivated -- the semantics of the return-val of invoke()
is also not obvious at first glance. Therefore, I'd prefer keeping the existing signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
!is_cause_full would cover more cases than System.gc and whitebox-fullgc, right? Is this really intended?
Yes, it is intended.
The introduce of
op.gc_succeeded()
is not well motivated -- the semantics of the return-val ofinvoke()
is also not obvious at first glance. Therefore, I'd prefer keeping the existing signature.
Yeah, I did think about this. I can revert it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is intended.
Then, even non-explicit-gc (e.g. _metadata_GC_threshold
or some other gc-cause) would get "guarantee that at least a Full GC is executed", not matching the title of this ticket.
Yeah, I did think about this. I can revert it.
Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, even non-explicit-gc (e.g.
_metadata_GC_threshold
or some other gc-cause) would get "guarantee that at least a Full GC is executed", not matching the title of this ticket.
Thanks for catching that! Fixed
inline static bool is_explicit_gc(GCCause::Cause cause) { | ||
return (cause == GCCause::_java_lang_system_gc || | ||
cause == GCCause::_dcmd_gc_run || | ||
cause == GCCause::_wb_full_gc); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not exactly sure what "explict gc"s are, but I would expect something more like this:
inline static bool is_explicit_gc(GCCause::Cause cause) { | |
return (cause == GCCause::_java_lang_system_gc || | |
cause == GCCause::_dcmd_gc_run || | |
cause == GCCause::_wb_full_gc); | |
} | |
inline static bool is_explicit_gc(GCCause::Cause cause) { | |
return (is_user_requested_gc(cause) || | |
is_serviceability_requested_gc(cause) || | |
cause == GCCause::_wb_young_gc) || | |
cause == GCCause::_wb_full_gc); | |
} | |
because serviceability gcs are also explicitly requested by the user (from command line), and I believe all whitebox gcs are "explicit". Maybe also "_allocation_profiler" and "wb_breakpoint" ones (not sure right now about these ones).
At least the serviceability ones shouldn't be eaten by gc locker either, but maybe there is no guarantee about them to actually occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_explicit_full_gc
which ones would fall in that category? Those are the only ones we are be interested in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All but GCCause::_wb_young_gc
from this list; everything that's triggered by the end user in some form (serviceability gcs are from jcmd typically, so they fit in imho). But after some discussion with you I found that all stw collectors always suppress young gcs that failed due to gclocker, so such a change does not really matter. Keep it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
if (!GCCause::is_explicit_gc(cause) || | ||
!VM_ParallelGCSystemGC::is_cause_full(cause) || | ||
op.full_gc_succeeded()) { | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the caller context, I'd believe all callers expect a GC cycle has just run (by this thread or another) when collect(cause)
returns. Therefore, the terminating condition should be to check #gc_cycles against the intended gc-type (young or not), sth like:
{
MutexLocker ml(Heap_lock);
if (is_young_gc) {
if (gc_count != total_collections()) {
return;
}
} else {
if (full_gc_count != total_full_collections()) {
return;
}
}
}
(Originally, I was thinking only about full-gc, i.e. systemgc and _wb_full_gc
, but it also seems reasonable to provide such guarantee, a gc-cycle has run, for _wb_young_gc
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we do that in a follow up CR. For parallel (and I guess for Serial) is that easy, for G1 it requires a bit more code movements.
Thanks, @albertnetymk and @tschatzl for the reviews! /integrate |
Going to push as commit 4a9f8ef.
Your commit was automatically rebased without conflicts. |
Hi All,
Please review this change to guarantee that at least a Full GC is executed between the invocation and return of an explicit Full GC call, even if the call is concurrent with an active
GCLocker
. We specify explicit GCs as GCs triggered by the end user in some form (jcmd, System.GC, or WhiteBox testing).The change should also handle the issues reported in JDK-8299276.
Split into 3 commits, one commit for changes to each GC in [G1, Parallel, Serial].
Testing: Tier 1-5.
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/13191/head:pull/13191
$ git checkout pull/13191
Update a local copy of the PR:
$ git checkout pull/13191
$ git pull https://git.openjdk.org/jdk.git pull/13191/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 13191
View PR using the GUI difftool:
$ git pr show -t 13191
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/13191.diff
Webrev
Link to Webrev Comment