
8256641: CDS VM operations do not lock the heap #1661

@tschatzl
Contributor

@tschatzl tschatzl commented Dec 7, 2020

Hi all,

can I get reviews for this change that adds missing synchronization of CDS-related VM operations with other heap operations?

VM_PopulateDumpSharedSpace, VM_PopulateDynamicDumpSharedSpace and VM_Verify are used during CDS operations: one for creating the CDS archive (eventually doing a GC), one for mapping the CDS archive into the heap, and the last one for verification.

(Fwiw, imho the first two are awfully close and should be renamed to be more easily distinguishable, but that's another matter.)

They all, in one way or another, need to synchronize with garbage collection, since they either do a GC themselves or verify the heap. An actual (STW) GC returns an uninitialized block of memory that is not parseable. Without synchronization, another VM operation such as one of those mentioned could start before that block has been initialized, see the uninitialized memory, and crash.
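To make the interleaving concrete, here is a toy, single-threaded model of it, assuming the lock is held from the moment the GC hands out the block until that block has been initialized. ToyHeap, its methods and the VerifyResult values are hypothetical names for illustration only; this is not how HotSpot represents the heap.

```cpp
#include <cassert>
#include <vector>

// Toy model (hypothetical, not HotSpot code). Each entry of blocks_ is one
// allocated block; false means "handed out by the GC but not yet initialized".
class ToyHeap {
 public:
  explicit ToyHeap(bool verify_takes_lock) : verify_takes_lock_(verify_takes_lock) {}

  // Phase 1 of an allocation that triggered a STW GC: take the lock and
  // receive an uninitialized (unparseable) block.
  void begin_allocation_after_gc() {
    assert(!locked_);
    locked_ = true;
    blocks_.push_back(false);
  }

  // Phase 2: initialize the block, then release the lock.
  void finish_allocation() {
    assert(locked_);
    blocks_.back() = true;
    locked_ = false;
  }

  enum class VerifyResult { kDeferred, kParseable, kSawUninitialized };

  // A verification operation that wants to walk the whole heap.
  VerifyResult try_verify() const {
    if (verify_takes_lock_ && locked_) {
      return VerifyResult::kDeferred;  // has to wait until the lock is released
    }
    for (bool initialized : blocks_) {
      if (!initialized) return VerifyResult::kSawUninitialized;  // a real heap walk would crash here
    }
    return VerifyResult::kParseable;
  }

 private:
  bool verify_takes_lock_;
  bool locked_ = false;       // stand-in for "Heap_lock is held"
  std::vector<bool> blocks_;
};
```

With verify_takes_lock set to false, a verification started between the two allocation phases reports kSawUninitialized; with it set to true, the same verification is deferred until the allocation has finished and then finds a parseable heap.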

The existing mechanism to prevent this kind of interference is taking the Heap_lock, so the suggested solution has all of these VM operations descend from a new VM_Operation subclass, VM_GC_Sync_Operation, which takes the Heap_lock (and does only that) and is split out from VM_GC_Operation.
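A minimal sketch of the proposed class shape, assuming a std::mutex as a stand-in for Heap_lock and a trivial execute() function in place of the VM thread. The *Sketch class names, heap_lock_held and execute() are hypothetical and only mirror the names discussed here; the real HotSpot classes have more members and run doit() at a safepoint in the VM thread.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch, not HotSpot code: std::mutex stands in for Heap_lock.
std::mutex heap_lock;
bool heap_lock_held = false;  // bookkeeping for illustration only

class VM_OperationSketch {
 public:
  virtual ~VM_OperationSketch() = default;
  virtual bool doit_prologue() { return true; }  // requesting thread, before the operation
  virtual void doit() = 0;                       // VM thread, at the safepoint
  virtual void doit_epilogue() {}                // requesting thread, afterwards
};

// Does the synchronization and nothing else: the lock is held from the
// prologue until the epilogue, so no other such operation can interleave.
class VM_GC_Sync_OperationSketch : public VM_OperationSketch {
 public:
  bool doit_prologue() override {
    heap_lock.lock();
    heap_lock_held = true;
    return true;
  }
  void doit_epilogue() override {
    assert(heap_lock_held);
    heap_lock_held = false;
    heap_lock.unlock();
  }
};

// A verify-like operation inherits the locking instead of doing it itself.
class VM_VerifySketch : public VM_GC_Sync_OperationSketch {
 public:
  bool saw_lock_held = false;
  void doit() override { saw_lock_held = heap_lock_held; }  // heap verification would go here
};

// Simplified stand-in for submitting an operation to the VM thread.
bool execute(VM_OperationSketch& op) {
  if (!op.doit_prologue()) return false;
  op.doit();
  op.doit_epilogue();
  return true;
}
```

The point of the inheritance is that subclasses cannot forget the locking; a subclass that adds its own prologue logic would call the base-class prologue first.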

There are some points in this change that may be contentious and that I would like to bring up in advance:

  • each VM operation could handle the Heap_lock by itself, but I considered that too error-prone.
  • the need for VM_Verify to coordinate with garbage collections is new and was introduced with JDK-8253081: since then a Java thread might execute it, which is why this has not been a problem before. That could be undone (removed), but I believe that, with more changes to the CDS mechanism expected in the future, the additional full-heap verification after loading the archive is worth the additional effort.
    One (implementation) drawback is that, since ZGC also uses VM_Verify, that operation now takes the Heap_lock there too. It thereby participates in part of the "set of operations related to GC" in general, which it did not do before, having been kept almost completely separate. Testing did not show an issue, and I looked carefully through the code for potential problems without finding any. Obviously I'd like to ask you to look over this again.
  • so this change adds a new VM operation class called VM_GC_Sync_Operation that splits off the handling of the Heap_lock (i.e. the actual synchronization) from VM_GC_Operation. The reason is that I do not think the logic in the GC VM operation that prevents multiple back-to-back GC operations is a good fit for any of the VM_Populate* or even VM_Verify operations.
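The back-to-back prevention mentioned in the last point can be pictured roughly as follows. This sketch is loosely modeled on the _gc_count_before, skip_operation and prologue_succeeded names that come up in the review; the toy collector, its counter and request_gc() are hypothetical, and the real VM_GC_Operation does more (for example it also tracks a full-GC count).

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch, not HotSpot code: a toy collector counting completed GCs.
struct ToyCollector {
  std::mutex heap_lock;            // stand-in for Heap_lock
  unsigned total_collections = 0;  // guarded by heap_lock
};

class GCOperationSketch {
 public:
  GCOperationSketch(ToyCollector& c, unsigned gc_count_before)
      : collector_(c), gc_count_before_(gc_count_before) {}

  // Besides taking the lock, the prologue decides whether the GC is still
  // needed: if another GC completed since the request was made, skip it.
  bool doit_prologue() {
    collector_.heap_lock.lock();
    if (skip_operation()) {
      collector_.heap_lock.unlock();
      prologue_succeeded_ = false;
    } else {
      prologue_succeeded_ = true;
    }
    return prologue_succeeded_;
  }

  void doit() { ++collector_.total_collections; }  // "collect"

  void doit_epilogue() {
    assert(prologue_succeeded_);
    collector_.heap_lock.unlock();
  }

  bool prologue_succeeded() const { return prologue_succeeded_; }

 private:
  bool skip_operation() const { return gc_count_before_ != collector_.total_collections; }

  ToyCollector& collector_;
  unsigned gc_count_before_;  // GC count observed when the request was made
  bool prologue_succeeded_ = false;
};

// Runs one request the way the VM thread would; returns true if a GC ran.
bool request_gc(GCOperationSketch& op) {
  if (!op.doit_prologue()) return false;
  op.doit();
  op.doit_epilogue();
  return true;
}
```

Two requests created against the same GC count result in only one collection; the second one is skipped in its prologue. A VM_Verify- or VM_Populate*-like operation has no such count to compare, which is why this logic sits awkwardly in their base class.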

Testing: tier1-5; test case attached to the CR; other known reproducers (runtime/valhalla/inlinetypes/InlineOops.java in the Valhalla repo)


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/1661/head:pull/1661
$ git checkout pull/1661

@bridgekeeper

@bridgekeeper bridgekeeper bot commented Dec 7, 2020

👋 Welcome back tschatzl! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

@openjdk openjdk bot commented Dec 7, 2020

@tschatzl The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot label Dec 7, 2020
@tschatzl tschatzl force-pushed the 8256641-missing-verification-heap-locking branch from 2316775 to fca60a2 Dec 9, 2020
@tschatzl tschatzl force-pushed the 8256641-missing-verification-heap-locking branch from fca60a2 to 1b5b5a8 Dec 9, 2020
@tschatzl
Contributor Author

@tschatzl tschatzl commented Dec 9, 2020

/label add hotspot-gc
/label add hotspot-runtime
/label remove hotspot

@openjdk openjdk bot added the hotspot-gc label Dec 9, 2020
@openjdk

@openjdk openjdk bot commented Dec 9, 2020

@tschatzl
The hotspot-gc label was successfully added.

@openjdk

@openjdk openjdk bot commented Dec 9, 2020

@tschatzl
The hotspot-runtime label was successfully added.

@openjdk openjdk bot removed the hotspot label Dec 9, 2020
@openjdk

@openjdk openjdk bot commented Dec 9, 2020

@tschatzl
The hotspot label was successfully removed.

@tschatzl tschatzl marked this pull request as ready for review Dec 9, 2020
@openjdk openjdk bot added the rfr label Dec 9, 2020
@mlbridge
Copy link

@mlbridge mlbridge bot commented Dec 9, 2020

Webrevs

}

void VM_GC_Sync_Operation::doit_epilogue() {
if (Universe::has_reference_pending_list()) {

Why is the pending list handling moved here, rather than remaining in VM_GC_Operation::doit_epilogue? This doesn't have anything to do with syncing between operations, and seems odd for VM_Verify (for example) to do.

Contributor Author

@tschatzl tschatzl Dec 9, 2020

Also answering the next question: these two items (i.e. including the prologue_succeeded handling) were mostly kept there to allow simple reuse in VM_GC_Operation. I'll remove them and (maybe) just break the inheritance chain.
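The epilogue snippet quoted above checks Universe::has_reference_pending_list() before releasing the lock. A rough sketch of keeping that GC-specific notification at the GC level while the plain release stays in the synchronization part follows; the ToyReferenceState struct, the notifications counter and both function names are hypothetical, and a std::condition_variable stands in for waiting on Heap_lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <vector>

// Hypothetical sketch, not HotSpot code: a pending list of discovered
// references, guarded by a lock that a reference-handler thread waits on.
struct ToyReferenceState {
  std::mutex heap_lock;           // stand-in for Heap_lock
  std::condition_variable cv;     // waiters for new pending references
  std::vector<int> pending_list;  // guarded by heap_lock
  int notifications = 0;          // counts notify_all calls, for illustration
};

// Plain synchronization epilogue: all a VM_Verify-like operation needs.
void sync_doit_epilogue(std::unique_lock<std::mutex>& lock) {
  assert(lock.owns_lock());
  lock.unlock();
}

// GC-level epilogue: notify waiters if the GC produced pending references,
// then delegate the unlock to the synchronization part.
void gc_doit_epilogue(ToyReferenceState& s, std::unique_lock<std::mutex>& lock) {
  assert(lock.owns_lock());
  if (!s.pending_list.empty()) {
    s.cv.notify_all();  // wake threads waiting for pending references
    ++s.notifications;
  }
  sync_doit_epilogue(lock);
}
```

With this split, an operation that never discovers references never touches the pending list at all.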

_prologue_succeeded = true;
}
return _prologue_succeeded;
}

This invocation checking doesn't seem right at this level. That is, skip_operation and prologue_succeeded seem to me to have nothing to do with syncing; they belong at the VM_GC_Operation level and should remain there.


// Acquire the reference synchronization lock
virtual bool doit_prologue();
// Do notifyAll (if needed) and release held lock


s/notifyAll/notify_all/


@kimbarrett kimbarrett left a comment

Code changes look good. A couple of places where comments could use some improvement.

@@ -150,10 +144,11 @@ class VM_GC_Operation: public VM_GC_Sync_Operation {

// Acquire the reference synchronization lock
virtual bool doit_prologue();

This does a lot more than just acquiring the lock. It also handles the prevention of multiple gc requests.

@@ -118,6 +110,7 @@ class VM_GC_Operation: public VM_GC_Sync_Operation {
uint _gc_count_before; // gc count before acquiring PLL
uint _full_gc_count_before; // full gc count before acquiring PLL

[pre-existing] "PLL"? I think that might be obsolete terminology, referring to the "pending list lock"? I think it should be Heap_lock now.

@openjdk

@openjdk openjdk bot commented Dec 9, 2020

@tschatzl This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8256641: CDS VM operations do not lock the heap

Reviewed-by: kbarrett, iklam

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 30 new commits pushed to the master branch:

  • 0a0691e: 8257901: ZGC: Take virtual memory usage into account when sizing heap
  • 29ffffa: 8257997: sun/security/ssl/SSLSocketImpl/SSLSocketLeak.java again reports leaks after JDK-8257884
  • db5da96: 8257876: Avoid Reference.isEnqueued in tests
  • 4a839e9: 8256459: java/net/httpclient/ManyRequests.java and java/net/httpclient/LineBodyHandlerTest.java fail infrequently with java.net.ConnectException: Connection timed out: no further information
  • d93293f: 8256730: Code that uses Object.checkIndex() range checks doesn't optimize well
  • 869dcb6: 8257806: Optimize x86 allTrue and anyTrue vector mask operations of Vector API
  • 34650f5: 8257872: UL: -Xlog does not check number of options
  • 6847bbb: 8255918: XMLStreamFilterImpl constructor consumes XMLStreamException
  • d2f9e31: 8257638: Update usage of "type" terminology in javax.lang.model
  • f631a99: 8256888: Client manual test problem list update
  • ... and 20 more: https://git.openjdk.java.net/jdk/compare/616b1f12bd60f3d820205d8fdf811abd44c32d98...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Dec 9, 2020
Member

@iklam iklam left a comment

The CDS part looks good to me.

I also scanned the GC code and it looks reasonable to me, but I don't understand all the details to give an official review.

iklam approved these changes Dec 9, 2020
@tschatzl
Contributor Author

@tschatzl tschatzl commented Dec 10, 2020

I fixed the mentioned comments, but would like to defer further cleanup of the classes, particularly of those VM_GC_Operations that do not actually participate in the skipping protocol, to JDK-8258029, which I just filed.

@tschatzl
Contributor Author

@tschatzl tschatzl commented Dec 11, 2020

This change was originally meant for JDK16, but the fork occurred before integration, so I am re-requesting a pull there.

@tschatzl tschatzl closed this Dec 11, 2020
@tschatzl tschatzl deleted the 8256641-missing-verification-heap-locking branch Dec 15, 2020
3 participants