Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8308341: JNI_GetCreatedJavaVMs returns a partially initialized JVM #14139

Closed

Conversation

dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented May 25, 2023

We now track the in-progress and completed states of VM creation and only return a VM from JNI_GetCreatedJavaVMs when there is a fully initialized VM.

Testing:

  • new regression test
  • tiers 1-3 (sanity)

Thanks


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change requires CSR request JDK-8308816 to be approved

Issues

  • JDK-8308341: JNI_GetCreatedJavaVMs returns a partially initialized JVM
  • JDK-8308816: JNI_GetCreatedJavaVMs returns a partially initialized JVM (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14139/head:pull/14139
$ git checkout pull/14139

Update a local copy of the PR:
$ git checkout pull/14139
$ git pull https://git.openjdk.org/jdk.git pull/14139/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14139

View PR using the GUI difftool:
$ git pr show -t 14139

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14139.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented May 25, 2023

👋 Welcome back dholmes! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added csr Pull request needs approved CSR before integration rfr Pull request is ready for review labels May 25, 2023
@openjdk
Copy link

openjdk bot commented May 25, 2023

@dholmes-ora The following labels will be automatically applied to this pull request:

  • build
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added build build-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels May 25, 2023
@mlbridge
Copy link

mlbridge bot commented May 25, 2023

Webrevs

Copy link
Contributor

@jdksjolen jdksjolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, over all looks good, but I have some questions regarding the tests and I wonder whether you missed a case.

src/hotspot/share/prims/jni.cpp Show resolved Hide resolved
// We can be called by native libraries in the JDK during VM
// initialization, so only bail-out if something seems very wrong.
// Though how would we get here in that case?
if (vm_created == NOT_CREATED) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we also handle IN_PROGRESS?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my comment was not clear enough. :) It is normal to get here while IN_PROGRESS due to calls from the JDK native code during initialization. So we only consider it an error to call then when NOT_CREATED.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sloppy reading from me! That makes sense, and the code doing the work (L3960-L3965) seems to take all precautions necessary to ensure that nothing goes wrong if we're IN_PROGRESS.

Just a minor suggestion: "during VM initialization" could be "while VM initialization is in progress" to mirror the language in the enum, but that's up to you.

@@ -0,0 +1,104 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this test behave even worse than it already does and have you caused a crash with this test before applying your changes? If it has caused a crash, then I think we're OK with the current form of the test.

Two ideas to make it more likely to crash:

  1. Have more threads racing
  2. Let's say you spawn more threads, is it possible for printf to prevent data races as it's MT-Safe? More threads would increase likelihood of contention on the implicit lock, leading to some threads having been significantly slowed down by having to perform a more expensive locking procedure, is my thought process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test crashes an unpatched VM. We only need the two threads. One will initiate the creation of the VM and the other then tries to attach to it. We want the second thread to call GetCreatedJavaVMs very quickly to make it more likely the VM is not initialized, but not so quickly that GetCreatedJavaVMs reports zero VMs (regardless of the patch). The time it takes the main thread to create the second thread seems to handle that nicely. Of course it depends on number of CPUs etc but on my local machine the test, and the original version from the JBS issue, reliably crashes every time before the fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, then I'm very happy with this.

@dholmes-ora
Copy link
Member Author

Thanks for looking at this @jdksjolen !

@openjdk openjdk bot removed the csr Pull request needs approved CSR before integration label May 26, 2023
Copy link
Contributor

@jdksjolen jdksjolen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this patch, everything looks good to me )including the build changes).

// We can be called by native libraries in the JDK during VM
// initialization, so only bail-out if something seems very wrong.
// Though how would we get here in that case?
if (vm_created == NOT_CREATED) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sloppy reading from me! That makes sense, and the code doing the work (L3960-L3965) seems to take all precautions necessary to ensure that nothing goes wrong if we're IN_PROGRESS.

Just a minor suggestion: "during VM initialization" could be "while VM initialization is in progress" to mirror the language in the enum, but that's up to you.

@@ -0,0 +1,104 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, then I'm very happy with this.

@openjdk
Copy link

openjdk bot commented May 26, 2023

@dholmes-ora This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8308341: JNI_GetCreatedJavaVMs returns a partially initialized JVM

Reviewed-by: jsjolen, gziemski

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 326 new commits pushed to the master branch:

  • fb0b1f0: 8051725: Improve expansion of Conv2B nodes in the middle-end
  • 3eced01: 8304074: [JMX] Add an approximation of total bytes allocated on the Java heap by the JVM
  • 15e0285: 8309110: Build failure after JDK-8307795 due to warnings in micro-benchmark StoreMaskTrueCount.java
  • 4526282: 8308977: gtest:codestrings fails on riscv
  • f600d03: 8307795: AArch64: Optimize VectorMask.truecount() on Neon
  • 07f2070: 8309095: Remove UTF-8 character from TaskbarPositionTest.java
  • 2b186e2: 8309042: MemorySegment::reinterpret cleanup action is not called for all overloads
  • 78aac24: 8308881: Strong CLD oop handle roots are demoted to non-roots concurrently
  • 1f1f604: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack
  • d35a550: 8309077: Problemlist compiler/jvmci/TestUncaughtErrorInCompileMethod.java
  • ... and 316 more: https://git.openjdk.org/jdk/compare/d9052b946682d1c0f2629455d73fe4e6b95b29db...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 26, 2023
Copy link

@gerard-ziemski gerard-ziemski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me.

I have one question though: why wouldn't it be enough to move vm_created = 1 from where we had it before, down to where we now have vm_created = COMPLETE ?

I'm not 100% sure why we need the intermediate step IN_PROGRESS

/*
* @test
* @library /test/lib
* @requires os.family == "Linux"
Copy link

@gerard-ziemski gerard-ziemski May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it Linux only?

Does ProcessTools.createNativeTestProcessBuilder("GetCreatedJavaVMs"); only work on Linux?

Filed https://bugs.openjdk.org/browse/CODETOOLS-7903477 to have it implemented on macOS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was Linux only primarily because I copied another test. It should work on any platform that uses pthreads - so not Windows. I can expand the makefile logic to build the test code on all platforms.

@dholmes-ora
Copy link
Member Author

I have one question though: why wouldn't it be enough to move vm_created = 1 from where we had it before, down to where we now have vm_created = COMPLETE ?

Because we need to prevent two threads from concurrently loading and initializing a VM in the same process.

@dholmes-ora
Copy link
Member Author

Thanks for the review @jdksjolen

@dholmes-ora
Copy link
Member Author

Thanks for looking at this @gerard-ziemski . The test now runs on all non-Windows platforms.

@gerard-ziemski
Copy link

I have one question though: why wouldn't it be enough to move vm_created = 1 from where we had it before, down to where we now have vm_created = COMPLETE ?

Because we need to prevent two threads from concurrently loading and initializing a VM in the same process.

Wouldn't it be less complex to use a lock or some "init once" mechanism (pthread_once) instead of a 3 stage atomic field? Do we really need to know whether the process is in the middle of initialization for any reason other than whether it's actually done?

Just to clarify - I'm OK with the fix you have proposed, I was just curious if you considered any other alternatives.

@dholmes-ora
Copy link
Member Author

Thanks for the review @gerard-ziemski

Just to clarify - I'm OK with the fix you have proposed, I was just curious if you considered any other alternatives.

Atomics are the simplest cross-platform solution available at this stage of VM init (which is "nothing is initialized yet") - even Atomic use is limited to Atomic::xchg as we can't require any stubs. We have no VM locks available and anything external to the VM (like pthread_once) would require as OS-specific solution in shared code.

@dholmes-ora
Copy link
Member Author

I'm aware from #14209 that this new test will fail on AIX but as it will need the same fix as being discussed there I'd rather patch it once we know we have a cross-platform solution. I will file a follow up bug for that (unless we can fix it in #14209.)

Thanks for the reviews.

/integrate

@openjdk
Copy link

openjdk bot commented May 30, 2023

Going to push as commit 1e6770f.
Since your change was applied there have been 335 commits pushed to the master branch:

  • cb40db0: 8309134: Augment test/langtools/tools/javac/versions/Versions.java for JDK 21 language changes
  • de7fd1c: 8307944: ClassFileDumper should only load java.nio.file.Path if enabled
  • 7891de3: 8297885: misc sun/security/pkcs11 tests timed out
  • 323d6ce: 8308960: Decouple internal Version and OperatingSystem classes
  • 1b8e6bf: 8308987: Update java.lang.Class to use javadoc snippets
  • 04b0e78: 8307648: java/net/httpclient/ExpectContinueTest.java timed out
  • 6b90b05: 8297878: KEM: Implementation
  • 21af8ba: 8290499: new File(parent, "/") breaks normalization – creates File with slash at the end
  • 804f198: 8308992: New test TestHFA fails with zero
  • fb0b1f0: 8051725: Improve expansion of Conv2B nodes in the middle-end
  • ... and 325 more: https://git.openjdk.org/jdk/compare/d9052b946682d1c0f2629455d73fe4e6b95b29db...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 30, 2023
@openjdk openjdk bot closed this May 30, 2023
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label May 30, 2023
@openjdk openjdk bot removed the rfr Pull request is ready for review label May 30, 2023
@openjdk
Copy link

openjdk bot commented May 30, 2023

@dholmes-ora Pushed as commit 1e6770f.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dholmes-ora dholmes-ora deleted the 8308341-created-javavm branch May 30, 2023 22:47
@gerard-ziemski
Copy link

Thanks for the review @gerard-ziemski

Just to clarify - I'm OK with the fix you have proposed, I was just curious if you considered any other alternatives.

Atomics are the simplest cross-platform solution available at this stage of VM init (which is "nothing is initialized yet") - even >Atomic use is limited to Atomic::xchg as we can't require any stubs. We have no VM locks available and anything external to the >VM (like pthread_once) would require as OS-specific solution in shared code.

I guess it's a compromise either way. You have chosen to use an OS independent mechanism, at the cost of exposing the implementation to the outside world, by introducing a new stage (needs CSR). I think, I would have prefer (with what I know at this time) with keeping the implementation details hidden from outside (no need for CSR), at the cost of using platform dependent locks here, or some sort of initialize once mechanism.

@dholmes-ora
Copy link
Member Author

You have chosen to use an OS independent mechanism, at the cost of exposing the implementation to the outside world, by introducing a new stage (needs CSR).

That is not an accurate characterisation of this change. The CSR request is needed because of the change in behaviour it introduces (by only returning a fully initialized VM), and has nothing at all to do with the implementation. We still need to be able to distinguish when VM creation has started (to prevent concurrent attempt) and when it has completed (so GetCreatedJavaVMs can return a valid VM) - the mechanism by which that is achieved is immaterial.

@gerard-ziemski
Copy link

You have chosen to use an OS independent mechanism, at the cost of exposing the implementation to the outside world, >by introducing a new stage (needs CSR).

That is not an accurate characterisation of this change. The CSR request is needed because of the change in behaviour it introduces (by only returning a fully initialized VM), and has nothing at all to do with the implementation. We still need to be able to distinguish when VM creation has started (to prevent concurrent attempt) and when it has completed (so GetCreatedJavaVMs can return a valid VM) - the mechanism by which that is achieved is immaterial.

Sorry, I should have read https://bugs.openjdk.org/browse/JDK-8308816 instead of assuming the impact of this change on the outside.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build build-dev@openjdk.org hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants