8264640: CMS ParScanClosure misses a barrier #165

Closed
wants to merge 1 commit into from

@AntonKozlov
Member

@AntonKozlov AntonKozlov commented Apr 2, 2021

Hi, please review an original fix for a GC crash. jdk13u is the latest supported release that still contains the buggy code; the code was removed in JDK 14 as part of JEP 363: Remove the Concurrent Mark Sweep (CMS) Garbage Collector. So I'm proposing the fix here.

The fix is low-risk: on x86-64 it only introduces a compiler barrier that prevents the two reads from being reordered, keeping them in the order the surrounding comments require. On CPUs with weaker memory models it introduces a CPU barrier as well.
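
To make the ordering argument concrete, here is a minimal C++ sketch of the two reads and of the writer they race with. It is not the actual patch and does not use HotSpot's own barrier primitives; `FakeObj`, `klass_if_not_forwarded` and `try_forward_and_push` are hypothetical names, and the mark word is reduced to 0 (not forwarded) or a forwardee address. The reader loads the klass first and the mark second, with an acquire fence between them acting as the load-load barrier; the writer forwards the object before reusing its klass field as an overflow link, with a release fence between those two steps. On x86-64 an acquire fence emits no instruction and only restrains the compiler; on AArch64 or POWER it typically becomes a real barrier instruction.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified object header (NOT HotSpot's oopDesc layout).
struct FakeObj {
  std::atomic<std::uintptr_t> mark{0};      // 0 = not forwarded, otherwise the forwardee address
  std::atomic<const void*> klass{nullptr};  // real klass, or an overflow-list link once queued
};

// Reader (scanning thread): read the klass FIRST, then the mark.
// Returns a klass that is safe to use for sizing the object, or nullptr if the
// object has already been forwarded (the caller would then use the forwardee).
const void* klass_if_not_forwarded(const FakeObj& obj) {
  const void* k = obj.klass.load(std::memory_order_relaxed);
  // Load-load barrier: the mark load below may not be satisfied before the
  // klass load above, neither by the compiler nor by the CPU.
  std::atomic_thread_fence(std::memory_order_acquire);
  std::uintptr_t m = obj.mark.load(std::memory_order_relaxed);
  if (m != 0) {
    return nullptr;  // forwarded: k may already be an overflow link (or NULL)
  }
  // Not forwarded when the mark was read. The klass is overwritten only after
  // forwarding, so k must still be the real klass.
  return k;
}

// Writer (copying thread): win the race to forward, then reuse the klass
// field as the overflow-list link.
bool try_forward_and_push(FakeObj& obj, std::uintptr_t forwardee, const void* next_link) {
  assert(forwardee != 0);
  std::uintptr_t expected = 0;
  if (!obj.mark.compare_exchange_strong(expected, forwardee, std::memory_order_relaxed)) {
    return false;  // another thread forwarded the object first
  }
  // Pairs with the reader's acquire fence: a reader whose klass load returns
  // next_link is guaranteed to see the forwarded mark afterwards.
  std::atomic_thread_fence(std::memory_order_release);
  obj.klass.store(next_link, std::memory_order_relaxed);
  return true;
}
```

Swapping the two loads (mark first, then klass), which is the reordering gcc performed, breaks the argument: the reader can see `mark == 0`, the writer can then forward the object and overwrite its klass, and the later klass load returns `next_link`.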


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk13u-dev pull/165/head:pull/165
$ git checkout pull/165

Update a local copy of the PR:
$ git checkout pull/165
$ git pull https://git.openjdk.java.net/jdk13u-dev pull/165/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 165

View PR using the GUI difftool:
$ git pr show -t 165

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk13u-dev/pull/165.diff

@bridgekeeper

@bridgekeeper bridgekeeper bot commented Apr 2, 2021

👋 Welcome back akozlov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Apr 2, 2021
@mlbridge

@mlbridge mlbridge bot commented Apr 2, 2021

Webrevs

@mlbridge

@mlbridge mlbridge bot commented Apr 2, 2021

Mailing list message from Anton Kozlov on jdk-updates-dev:

Adding hotspot-gc-dev. It would be great to receive comments from GC experts, even though the fix is not relevant to mainline JDK.

Thanks,
Anton

On 4/2/21 11:51 AM, Anton Kozlov wrote:

@mlbridge

@mlbridge mlbridge bot commented Apr 8, 2021

Mailing list message from John Cuthbertson on jdk-updates-dev:

Hi Anton,

This looks good to me. I think I'm still a reviewer for the jdk-updates project.

For the benefit of everyone else...

We were seeing this as a crash when obtaining the size of an object to be copied. The klass was observed to be transiently NULL. We found that the object, reached through another reference path, had already been copied and the from-space oop placed on the task queue for subsequent reference field scanning. The task queue, however, had overflowed, so the from-space oop was placed on the shared overflow queue, where objects are chained together through their klass field. If the reads are ordered as they are in the code (the klass is read before the mark word), then everything is OK as per the comment at line 105 (in ParScanClosure::do_oop_work), but we found that gcc had reordered the reads in the non-compressed oops case.

So the mark word is read and the object is observed to be not forwarded (yet). Then, via another reference path, the object is copied, forwarded, and placed on the overflow task queue, overwriting the from-space object's klass. Then, in the original path, the klass is read and observed to be NULL or the next overflow entry, leading to the crash. When the from-space oop is dequeued, its klass is restored, which is what was observed in the core file.
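
The overflow mechanism described above can be sketched as an intrusive list that borrows each queued object's klass field as its "next" pointer. This is a single-threaded, simplified model: `Klass`, `Obj` and `OverflowList` are hypothetical types, the forwarding pointer is a plain field instead of being encoded in the mark word, and the real shared list is updated concurrently by several GC workers, which this sketch does not attempt. Restoring the klass from the to-space copy on dequeue is an assumption about how the restore works, chosen because the copy still carries the original klass.

```cpp
#include <cassert>

struct Klass { const char* name; };

// Hypothetical simplified object: klass_slot normally holds a Klass*, but
// while the object sits on the overflow list it holds the next Obj* instead.
struct Obj {
  Obj* forwardee = nullptr;    // stands in for the forwarding pointer in the mark word
  void* klass_slot = nullptr;
};

// Intrusive overflow list chained through the klass slot (NOT thread-safe).
class OverflowList {
 public:
  // Only already-forwarded from-space objects are queued.
  void push(Obj* from) {
    assert(from->forwardee != nullptr);
    from->klass_slot = head_;  // the real klass is overwritten here
    head_ = from;
  }

  // Dequeue and restore the klass from the to-space copy.
  Obj* pop() {
    Obj* from = head_;
    if (from == nullptr) {
      return nullptr;
    }
    head_ = static_cast<Obj*>(from->klass_slot);
    from->klass_slot = from->forwardee->klass_slot;
    return from;
  }

 private:
  Obj* head_ = nullptr;
};
```

Pushing onto an empty list stores NULL into the klass slot, and pushing onto a non-empty list stores the previous head; these are the two values the racing reader was observed to read (NULL or the next overflow entry).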

Using worker-thread-local overflow queues (-XX:+ParGCUseLocalOverflow) seems to work around the problem.
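
For contrast, a per-worker overflow stack can keep the queued references in separate storage, so the object header is never touched. That is a plausible reason why -XX:+ParGCUseLocalOverflow avoids the crash, but it is an inference, not something confirmed in this thread. `Obj` and `LocalOverflowStack` are hypothetical names in this simplified C++ sketch, and the stack is assumed to be used by a single worker.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Obj {
  const void* klass = nullptr;  // never rewritten while the object is queued
};

// Hypothetical per-worker overflow stack: used by one worker in this sketch,
// and queuing an object does not reuse any of the object's fields.
class LocalOverflowStack {
 public:
  void push(Obj* from) {
    assert(from != nullptr);
    stack_.push_back(from);
  }

  Obj* pop() {
    if (stack_.empty()) {
      return nullptr;
    }
    Obj* from = stack_.back();
    stack_.pop_back();
    return from;
  }

  std::size_t size() const { return stack_.size(); }

 private:
  std::vector<Obj*> stack_;
};
```

The trade-off is that this stack needs memory outside the objects it tracks, whereas chaining through the klass field needs no extra storage.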

Thanks,

John Cuthbertson

On Apr 2, 2021, at 2:02 AM, Anton Kozlov <akozlov at azul.com> wrote:

@mlbridge

@mlbridge mlbridge bot commented Apr 9, 2021

Mailing list message from Anton Kozlov on jdk-updates-dev:

John, thank you for the review and the comment!

Thanks,
Anton

On 4/8/21 9:33 PM, John Cuthbertson wrote:

@AntonKozlov
Member Author

@AntonKozlov AntonKozlov commented Apr 9, 2021

/reviewer credit johnc

@openjdk

@openjdk openjdk bot commented Apr 9, 2021

@AntonKozlov
Reviewer johnc successfully credited.

@yan-too
yan-too approved these changes Apr 9, 2021
Collaborator

@yan-too yan-too left a comment

After John, it's easy to say lgtm!

@openjdk

@openjdk openjdk bot commented Apr 9, 2021

@AntonKozlov This change now passes all automated pre-integration checks.

After integration, the commit message for the final commit will be:

8264640: CMS ParScanClosure misses a barrier

Reviewed-by: yan, johnc

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 12 new commits pushed to the master branch:

  • f1e4e0b: 8257988: Remove JNF dependency from libsaproc/MacosxDebuggerLocal.m
  • 2e9b3a0: 8260616: Removing remaining JNF dependencies in the java.desktop module
  • df2818b: 8259869: [macOS] Remove desktop module dependencies on JNF Reference APIs
  • 47ec3b8: 8259651: [macOS] Replace JNF_COCOA_ENTER/EXIT macros
  • cdb993e: 8259343: [macOS] Update JNI error handling in Cocoa code.
  • ad45c98: 8257853: Remove dependencies on JNF's JNI utility functions in AWT and 2D code
  • 98baf88: 8240487: Cleanup whitespace in .cc, .hh, .m, and .mm files
  • 4d5f2ab: 8261585: Restore HandleArea used in Deoptimization::uncommon_trap
  • 9d0d73e: 8257858: [macOS]: Remove JNF dependency from libosxsecurity/KeystoreImpl.m
  • 039003d: 8256501: libTestMainKeyWindow fails to build with Xcode 12.2
  • ... and 2 more: https://git.openjdk.java.net/jdk13u-dev/compare/fecca4ec49033d4f084aeea6cfdf6a187c6a0ea9...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@yan-too) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Apr 9, 2021
@AntonKozlov
Member Author

@AntonKozlov AntonKozlov commented Apr 9, 2021

Yura, thanks!

/integrate

@openjdk openjdk bot added the sponsor label Apr 9, 2021
@openjdk

@openjdk openjdk bot commented Apr 9, 2021

@AntonKozlov
Your change (at version 4cff5d5) is now ready to be sponsored by a Committer.

@yan-too
Collaborator

@yan-too yan-too commented Apr 9, 2021

/sponsor

@openjdk openjdk bot closed this Apr 9, 2021
@openjdk

@openjdk openjdk bot commented Apr 9, 2021

@yan-too @AntonKozlov Since your change was applied there have been 12 commits pushed to the master branch (the same 12 commits listed in the earlier bot comment).

Your commit was automatically rebased without conflicts.

Pushed as commit efc81a3.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.
