Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8266576: dynamicArchive/ParallelLambdaLoadTest.java crashes in tier2 testing #3910

Closed
wants to merge 1 commit into from

Conversation

yminqi
Copy link
Contributor

@yminqi yminqi commented May 7, 2021

Hi, Please review

In dynamic dump, the lambda invoker holder classes are regenerated in DynamicArchive::dump, which is after shutdown hook executed. The returned objects from the regeneration may contain invalid contents which caused crash like in this bug. It is late to execute java code, the fix is to move the call into MetaspaceShared::link_and_cleanup_shared_classes which is before shutdown hook, before halt.
JDK-8266585 and JDK-8266594 failed in different patterns, could be the same reason.

Tests: tier1,tier2

Thanks
Yumin


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8266576: dynamicArchive/ParallelLambdaLoadTest.java crashes in tier2 testing

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3910/head:pull/3910
$ git checkout pull/3910

Update a local copy of the PR:
$ git checkout pull/3910
$ git pull https://git.openjdk.java.net/jdk pull/3910/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 3910

View PR using the GUI difftool:
$ git pr show -t 3910

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3910.diff

@bridgekeeper
Copy link

bridgekeeper bot commented May 7, 2021

👋 Welcome back minqi! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 7, 2021
@openjdk
Copy link

openjdk bot commented May 7, 2021

@yminqi To determine the appropriate audience for reviewing this pull request, one or more labels corresponding to different subsystems will normally be applied automatically. However, no automatic labelling rule matches the changes in this pull request. In order to have an "RFR" email sent to the correct mailing list, you will need to add one or more applicable labels manually using the /label pull request command.

Applicable Labels
  • 2d
  • awt
  • beans
  • build
  • compiler
  • core-libs
  • hotspot
  • hotspot-compiler
  • hotspot-gc
  • hotspot-jfr
  • hotspot-runtime
  • i18n
  • javadoc
  • jdk
  • jmx
  • kulla
  • net
  • nio
  • security
  • serviceability
  • shenandoah
  • sound
  • swing

@yminqi
Copy link
Contributor Author

yminqi commented May 7, 2021

/label add hotspot-runtime

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label May 7, 2021
@openjdk
Copy link

openjdk bot commented May 7, 2021

@yminqi
The hotspot-runtime label was successfully added.

@mlbridge
Copy link

mlbridge bot commented May 7, 2021

Webrevs

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Yumin,

So there are three "exit" paths covered by this:

  • System.exit/halt
  • last thread terminates VM
  • -Xshare:dump

so that seems reasonable.

But I see a pre-existing problem that is now worse with this change. In JavaThread::invoke_shutdown_hooks we have this code:

// We could get here with a pending exception, if so clear it now.
if (this->has_pending_exception()) {
this->clear_pending_exception();
}

but that appears after the call to link_and_cleanup_shared_classes, so if in fact there is a pre-existing exception we will return from link_and_cleanup_shared_classes at the first CHECK (which is now on the newly moved code).

In order to get the current crashes fixed we can address the exception issue in a follow up bug.

Thanks,
David

@openjdk
Copy link

openjdk bot commented May 7, 2021

@yminqi This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8266576: dynamicArchive/ParallelLambdaLoadTest.java crashes in tier2 testing

Reviewed-by: dholmes

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 33 new commits pushed to the master branch:

  • e0c8688: 8262992: Improve @see output
  • d2b5350: 8263507: Improve structure of package summary pages
  • a65021e: 8266618: Remove broken -XX:-OptoRemoveUseless
  • 94c6177: 8266536: Provide a variant of os::iso8601_time which works with arbitrary timestamps
  • 71b8ad4: 8266609: AArch64: include FP/LR space in LIR_Assembler::initial_frame_size_in_bytes()
  • ebb68d2: 8049700: Deprecate obsolete classes and methods in javax/swing/plaf/basic
  • 3a474d9: 8265612: revise the help info for jmap histo command
  • c97f56c: 8266172: -Wstringop-overflow happens in vmError.cpp
  • 43ad24f: 8265465: jcmd VM.cds should keep already dumped archive when exception happens
  • 66191ff: 8266193: BasicJMapTest does not include testHistoParallel methods
  • ... and 23 more: https://git.openjdk.java.net/jdk/compare/6018336fc5e93675482b92df76594712c238adda...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 7, 2021
@dcubed-ojdk
Copy link
Member

JDK-8266585 and JDK-8266594 failed in different patterns, could be the same reason.

This is a disturbing statement given the breakage in the CI. Has sufficient testing
been done to determine whether these other two bugs continue to happen or not.

I'm really starting to wonder why the original fix wasn't backed out on Wednesday.

@yminqi
Copy link
Contributor Author

yminqi commented May 7, 2021

JDK-8266585 and JDK-8266594 failed in different patterns, could be the same reason.

This is a disturbing statement given the breakage in the CI. Has sufficient testing
been done to determine whether these other two bugs continue to happen or not.

I'm really starting to wonder why the original fix wasn't backed out on Wednesday.

Hi Yumin,

So there are three "exit" paths covered by this:

  • System.exit/halt
  • last thread terminates VM
  • -Xshare:dump

so that seems reasonable.

But I see a pre-existing problem that is now worse with this change. In JavaThread::invoke_shutdown_hooks we have this code:

// We could get here with a pending exception, if so clear it now.
if (this->has_pending_exception()) {
this->clear_pending_exception();
}

but that appears after the call to link_and_cleanup_shared_classes, so if in fact there is a pre-existing exception we will return from link_and_cleanup_shared_classes at the first CHECK (which is now on the newly moved code).

In order to get the current crashes fixed we can address the exception issue in a follow up bug.

Hi, David
Thanks for review. Yes, if the regeneration of holder classes returned with first CHECK, the exception is cleaned here. So the dump will be finished without finishing the original functions which link_and_cleanup_shared_classes intends to do. This requests LambdaFormInvokers::regnerate_holder_classes suppress all exceptions except for OOM (which exit directly from VM). This part needs an overall checking for various cases. Thanks for catching this.

Yumin

@yminqi
Copy link
Contributor Author

yminqi commented May 7, 2021

JDK-8266585 and JDK-8266594 failed in different patterns, could be the same reason.

This is a disturbing statement given the breakage in the CI. Has sufficient testing
been done to determine whether these other two bugs continue to happen or not.

I'm really starting to wonder why the original fix wasn't backed out on Wednesday.

Hi, Dan The overnight testing never finished, even part of the tests never got started. I did not back out first since the fix is quick, but did not expect the test take so long. Thanks for the reminding.
Yumin

@yminqi
Copy link
Contributor Author

yminqi commented May 7, 2021

There also is concurrent issue at dynamic dump time (offline discussion with Ioi). We could encounter memory stomp when multiple threads adding to the global list, LambdaFormInvokers::_lambdaform_lines. That potentially screws some of the list entry up with inconsistent contents. The list should not grow once the dumping in progress. I will update with a lock to protect the list.

@iklam
Copy link
Member

iklam commented May 7, 2021

JDK-8266585 and JDK-8266594 failed in different patterns, could be the same reason.

This is a disturbing statement given the breakage in the CI. Has sufficient testing
been done to determine whether these other two bugs continue to happen or not.
I'm really starting to wonder why the original fix wasn't backed out on Wednesday.

Hi Yumin,
So there are three "exit" paths covered by this:

  • System.exit/halt
  • last thread terminates VM
  • -Xshare:dump

so that seems reasonable.
But I see a pre-existing problem that is now worse with this change. In JavaThread::invoke_shutdown_hooks we have this code:
// We could get here with a pending exception, if so clear it now.
if (this->has_pending_exception()) {
this->clear_pending_exception();
}
but that appears after the call to link_and_cleanup_shared_classes, so if in fact there is a pre-existing exception we will return from link_and_cleanup_shared_classes at the first CHECK (which is now on the newly moved code).
In order to get the current crashes fixed we can address the exception issue in a follow up bug.

Hi, David
Thanks for review. Yes, if the regeneration of holder classes returned with first CHECK, the exception is cleaned here. So the dump will be finished without finishing the original functions which link_and_cleanup_shared_classes intends to do. This requests LambdaFormInvokers::regnerate_holder_classes suppress all exceptions except for OOM (which exit directly from VM). This part needs an overall checking for various cases. Thanks for catching this.

Yumin

I created https://bugs.openjdk.java.net/browse/JDK-8266770 "Clean pending exception before running dynamic CDS dump"

@yminqi
Copy link
Contributor Author

yminqi commented May 7, 2021

Since the original fix is backed out, closed this PR without further progress, the redo of original bug will take care of this issue.

@yminqi yminqi closed this May 7, 2021
@yminqi yminqi deleted the jdk-8266576 branch May 7, 2021 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org ready Pull request is ready to be integrated rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

4 participants