Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8300727: java/awt/List/ListGarbageCollectionTest/AwtListGarbageCollectionTest.java failed with "List wasn't garbage collected" #12594

Closed
wants to merge 2 commits into from

Conversation

aivanov-jdk
Copy link
Member

@aivanov-jdk aivanov-jdk commented Feb 16, 2023

The test has become unstable recently, there were quite a few failures, on Windows mostly. I was lucky enough to find a host where the test failed consistently.

I call System.gc() directly as suggested in comments to the bug. I used PhantomReference instead of WeakReference.

Now the test calls System.gc() in a loop and waits for the reference to be enqueued. In majority of cases, the test exits the loop at the second attempt.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8300727: java/awt/List/ListGarbageCollectionTest/AwtListGarbageCollectionTest.java failed with "List wasn't garbage collected"

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/12594/head:pull/12594
$ git checkout pull/12594

Update a local copy of the PR:
$ git checkout pull/12594
$ git pull https://git.openjdk.org/jdk pull/12594/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12594

View PR using the GUI difftool:
$ git pr show -t 12594

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12594.diff

…tionTest.java failed with "List wasn't garbage collected"
@bridgekeeper
Copy link

bridgekeeper bot commented Feb 16, 2023

👋 Welcome back aivanov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Feb 16, 2023
@openjdk
Copy link

openjdk bot commented Feb 16, 2023

@aivanov-jdk The following label will be automatically applied to this pull request:

  • client

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the client client-libs-dev@openjdk.org label Feb 16, 2023
@mlbridge
Copy link

mlbridge bot commented Feb 16, 2023

Webrevs

Reference<? extends List> ref;
do {
System.out.println("Attempt " + count);
System.gc();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I remember the call of System.gc() could be ignored, and the only way to trigger GC is to trigger the OOM. Probably we can combine referenceQueue.remove(ENQUEUE_TIMEOUT) + OOM?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this reason, GC logs are enabled. If, for whatever reason, the call to System.gc() is ignored, we'll see it in the test log. Then a specific GC could be selected, for example, or another fix implemented.

OOME does not guarantee a full GC cycle either.

Having the above in mind, I'd rather keep it simple.

Copy link
Member

@mrserb mrserb Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this reason, GC logs are enabled. If, for whatever reason, the call to System.gc() is ignored, we'll see it in the test log. Then a specific GC could be selected, for example, or another fix implemented.

As of now this test can be executed with different GC, and some of them can skip System.gc().

OOME does not guarantee a full GC cycle either.

But it guarantee that at least some GC will be always run, unlike System.gc().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the piece of advice in the JBS comment.

If using System.gc() is good enough for testing references:

boolean enqueued = false;
System.gc();
for (int i = 0; i < iterations; i++) {
System.gc();
enqueued = (queue.remove(100) == ref);
if (enqueued) break;
}

// Delete root -> O1, collect, verify P1 notified, P2 not notified.
O1 = null;
System.gc();
if (Q1.remove(ENQUEUE_TIMEOUT) == null) {
throw new RuntimeException("P1 not notified by O1 deletion");
} else if (Q2.remove(ENQUEUE_TIMEOUT) != null) {
throw new RuntimeException("P2 notified by O1 deletion.");
}

Then it should be good enough for this test too.

Perhaps, the same effect could be achieved by causing OOME in a loop. However, using System.gc() makes the intention clearer: run GC, wait for the phantom reference to be cleared and enqueued.

I can explicitly select a GC: -XX:+UseG1GC (default) or -XX:+UseSerialGC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lots of tests that rely on System.gc() actually triggering a gc.
SFAIK all collectors we have today will obey it unless you use
-XX:+DisableExplicitGC
If you can make the test work with that then that would be interesting but calling just System.gc() is no worse than all those other tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many client tests generate OOM for that, we have a special Util, see the usage of Util.generateOOME() or various implementation of generateOOME.

If the OOM + sleep(to give the GC a chance to clean the weakrefs in case of slow systems) does not work, then could it be considerred as a GC bug?

OutOfMemoryError: The Java Virtual Machine implementation has run out of either virtual or physical memory, and the automatic storage manager was unable to reclaim enough memory to satisfy an object creation request.
https://docs.oracle.com/javase/specs/jvms/se19/html/jvms-6.html#jvms-6.3

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I saw this method, and it periodically calls System.gc().

I still don't understand your concern. How is generating OutOfMemoryError better than calling System.gc()?

Calling System.gc() conveys the intent in a clearer way. No OOME is required.

@mrserb
Copy link
Member

mrserb commented Feb 26, 2023

Yes, I saw this method, and it periodically calls System.gc().

It generates OOM to make sure that gc will be called, unlike System.gc() which per the spec can be ignored.

I still don't understand your concern. How is generating OutOfMemoryError better than calling System.gc()?

That method uses both, which I suggested doing at the start. It is better to follow one approach than implement different patterns here and there.

@aivanov-jdk
Copy link
Member Author

Yes, I saw this method, and it periodically calls System.gc().

It generates OOM to make sure that gc will be called, unlike System.gc() which per the spec can be ignored.

Can be but it's not ignored, it's respected.

I provided you at least two tests which solely rely on System.gc() and they don't fail.

I still don't understand your concern. How is generating OutOfMemoryError better than calling System.gc()?

That method uses both, which I suggested doing at the start.

You didn't suggest, at least explicitly, using Util.generateOOME.

Nevertheless, I think generating OOME is redundant, it adds another level of complexity and makes the test slower.

The flag -Xmx100m can be safely removed from the test now: it doesn't affect how quick the reference object is cleared.

It is better to follow one approach than implement different patterns here and there.

Perhaps, the time has come to change the approach: ditch generating OOME and just use System.gc() . We've always done this, but it doesn't mean it's the best approach, does it?

If the developers write tests to verify Reference objects are cleared without generating OOME by relying only on System.gc(), I see no reason why client-libs shouldn't follow the same pattern. The old tests which use Util.generateOOME could be retrofitted to the new pattern.

@mrserb
Copy link
Member

mrserb commented Feb 27, 2023

Can be but it's not ignored, it's respected.

It could be ignored per the specification.

Nevertheless, I think generating OOME is redundant, it adds another level of complexity and makes the test slower.

If it is redundant then let's update all other tests which generate OOM right now and prove it. it would be better than having different approaches here and there.

@aivanov-jdk
Copy link
Member Author

Can be but it's not ignored, it's respected.

It could be ignored per the specification.

Yet it's not.

Nevertheless, I think generating OOME is redundant, it adds another level of complexity and makes the test slower.

If it is redundant then let's update all other tests which generate OOM right now and prove it. it would be better than having different approaches here and there.

I would absolutely explore this route and would be happy to update other tests but this is out of scope for this issue. The java/awt/List/ListGarbageCollectionTest/AwtListGarbageCollectionTest.java test often fails in the CI, which creates noise. It can be problem-listed, yet I'd rather avoid it by providing a timely fix which resolves the failure.

@prrace
Copy link
Contributor

prrace commented Feb 28, 2023

I'm OK with the fix.
If the default collector started ignoring System.gc() it would trigger lots of test failures which would make
it a JDK-wide problem to discuss first.

This one test doesn't make things markedly worse, so let's approve it and you can start a discussion on
core-libs-dev about this and perhaps later on hotspot-gc-dev.

@openjdk
Copy link

openjdk bot commented Feb 28, 2023

@aivanov-jdk This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8300727: java/awt/List/ListGarbageCollectionTest/AwtListGarbageCollectionTest.java failed with "List wasn't garbage collected"

Reviewed-by: prr, tr, serb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 310 new commits pushed to the master branch:

  • a7e308a: 8303576: addIdentitiesToKeystore in KeystoreImpl.m needs CFRelease call in early potential CHECK_NULL return
  • dd79410: 8303509: Socket setTrafficClass does not work for IPv4 connections when IPv6 enabled
  • dc523a5: 8300258: C2: vectorization fails on simple ByteBuffer loop
  • 5e232cf: 8303564: C2: "Bad graph detected in build_loop_late" after a CMove is wrongly split thru phi
  • 8cfd74f: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter
  • 02875e7: 8171156: Class java.util.LocaleISOData has outdated information for country Code NP
  • ad326fc: 8299570: [JVMCI] Insufficient error handling when CodeBuffer is exhausted
  • 05ceb37: 8303833: java.util.LocaleISOData has wrong comments for 'Norwegian Bokmål' and 'Volapük'
  • 25de222: 8303839: [BACKOUT] JDK-8302799 and JDK-8302189
  • 5b43804: 8282201: Consider removal of expiry check in VerifyCACerts.java test
  • ... and 300 more: https://git.openjdk.org/jdk/compare/28f5250fa5cfd7938bb0899a2c17847b7458536c...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Feb 28, 2023
Copy link
Contributor

@TejeshR13 TejeshR13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@mrserb
Copy link
Member

mrserb commented Mar 2, 2023

If the default collector started ignoring System.gc() it would trigger lots of test failures which would make

It is up to you, but I disagree. There was an agreement in the team to use one approach to trigger the gc, that approach was used in most of our tests, and was used in this test as well.

I would absolutely explore this route and would be happy to update other tests but this is out of scope for this issue.

I agree that it is out of the scope of this issue, which actually fixed by adding the "retry" step to the test. The bug is unrelated to the replacing of OOM by the System.gc(). When why did you change it?

@aivanov-jdk
Copy link
Member Author

If the default collector started ignoring System.gc() it would trigger lots of test failures which would make

It is up to you, but I disagree. There was an agreement in the team to use one approach to trigger the gc, that approach was used in most of our tests, and was used in this test as well.

Do you agree that using only System.gc is more concise than generating OOME?

If System.gc is enough for testing Reference classes, why does the client libs team need to generate OOME? Our tests use the Reference classes to check if an object is garbage-collected.

So far, your argument is that this test is now different from other tests which employ both System.gc and OOME.

I would absolutely explore this route and would be happy to update other tests but this is out of scope for this issue.

I agree that it is out of the scope of this issue, which actually fixed by adding the "retry" step to the test. The bug is unrelated to the replacing of OOM by the System.gc(). When why did you change it?

I already answered this question several times. I followed the piece of advice in the JBS comment: “This is not the way to provoke a full GC in a test reliably. Just use System.gc(). See e.g. test/jdk/java/lang/ref/PhantomReferentClearing.java.”

That's what I did. Using System.gc() proved to be successful and, in my opinion, it looks cleaner than generating OOME.

I had a handful of attempts which still failed.

@mrserb
Copy link
Member

mrserb commented Mar 3, 2023

Do you agree that using only System.gc is more concise than generating OOME?
If System.gc is enough for testing Reference classes, why does the client libs team need to generate OOME? Our tests use the Reference classes to check if an object is garbage-collected.

I prefer to use OOM just based on the text of the specification I referred above. As per the spec, both should trigger some cleanup, but System.gc could be ignored. If we would like to change that approach then let's add a retry to this test first. Maybe the test will continue to fail if we will add just a retry. Isn't it suspicious that it started to fail now? and then replace the usage/implementation of generateOOM method by system.gc() everywhere.

I already answered this question several times. I followed the piece of advice in the JBS comment: “This is not the way to provoke a full GC in a test reliably. Just use System.gc(). See e.g. test/jdk/java/lang/ref/PhantomReferentClearing.java.”

That's what I did. Using System.gc() proved to be successful and, in my opinion, it looks cleaner than generating OOME.

But why we did not ask why "This is not the way to provoke a full GC in a test reliably"? It does not work as specified?

@aivanov-jdk
Copy link
Member Author

Eventually, core-libs have their own helper method in jdk.test.lib.util.ForceGC. I replaced my custom waiter with ForceGC.

I asked a question on core-libs-dev and hotspot-gc-dev. Two approaches were suggested: ForceGC and WhiteBox. I chose ForceGC because it is simpler and easier to use.

Stuart Marks provided a detailed reply with a couple of points:

“I'd strongly recommend using this [ForceGC] in preference to allocating a lot of memory in order to provoke OutOfMemoryError. That technique has historically been a cause of test flakiness, and it still is, as you've discovered.”

“It's true that System.gc() may sometimes be ignored -- for instance if Epsilon GC is enabled -- but for practical purposes, on Hotspot using a standard collector, calling it will eventually cause garbage collection and reference processing.”

Thomas also advises against using OOME:

“I recommend to not use OOME for anything unless in extremely specific situations (test OOME is triggered or something). At/after OOME the VM is in a very precarious state that can give unexpected VM bailouts.”


The latest revision uses a standard helper ForceGC class which is used in a number of tests.

Copy link
Member

@mrserb mrserb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage of ForceGC looks fine.

@aivanov-jdk
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Mar 13, 2023

Going to push as commit f835aaa.
Since your change was applied there have been 366 commits pushed to the master branch:

  • 466ffeb: 8303965: java.net.http.HttpClient should reset the stream if response headers contain malformed header fields
  • 431e702: 8303213: Avoid AtomicReference in TextComponentPrintable
  • 4cf4c59: 8303824: Parallel: Use more strict card table API
  • 8e41bf2: 8303238: Create generalizations for existing LShift ideal transforms
  • 805a4e6: 8303883: Confusing parameter name in G1UpdateRemSetTrackingBeforeRebuild::distribute_marked_bytes
  • 25e7ac2: 8294966: Convert jdk.jartool/sun.tools.jar.FingerPrint to use the ClassFile API to parse JAR entries
  • 3018b47: 8303969: Limit printed failures within an object during G1 heap verification
  • b575e54: 8303963: Replace various encodings of UINT/SIZE_MAX in gc code
  • c183fce: 8300926: Several startup regressions ~6-70% in 21-b6 all platforms
  • 31e1e39: 8303646: [JVMCI] Add possibility to lookup ResolvedJavaType from jclass.
  • ... and 356 more: https://git.openjdk.org/jdk/compare/28f5250fa5cfd7938bb0899a2c17847b7458536c...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 13, 2023
@openjdk openjdk bot closed this Mar 13, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Mar 13, 2023
@openjdk
Copy link

openjdk bot commented Mar 13, 2023

@aivanov-jdk Pushed as commit f835aaa.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@aivanov-jdk aivanov-jdk deleted the 8300727-gcList branch September 11, 2023 18:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client client-libs-dev@openjdk.org integrated Pull request has been integrated
4 participants