8259808: Add JFR event to detect GC locker stall #2088

Closed
wants to merge 6 commits into from


D-D-H
Contributor

@D-D-H D-D-H commented Jan 15, 2021

The GC locker can stall GC operations, so some Java threads cannot continue running until the GC locker is released, which affects the response time of the application. Adding a JFR event that reports this information helps detect the issue.

For testing purposes, I added two WhiteBox methods to lock/unlock critical sections.
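To make the mechanism above concrete, here is a minimal, single-process Java sketch of the bookkeeping such an event needs. It is a toy model, not HotSpot code: the class `ToyGcLocker`, its methods, and the `StallEvent` record are hypothetical names chosen for illustration, loosely mirroring `GCLocker::check_active_before_gc`, `GCLocker::stall_until_clear`, and the `lockCount`/`stallCount` fields discussed later in this thread. It assumes Java 16+ (for records), and it assumes the event is recorded when the last thread leaves its critical region.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of GC-locker stall accounting. Hypothetical names; not HotSpot code.
final class ToyGcLocker {
    // Hypothetical stand-in for the payload of a "GC Locker" event.
    record StallEvent(int lockCount, int stallCount) {}

    private int criticalCount = 0;    // threads currently inside a critical region
    private boolean needsGc = false;  // a GC was requested while the locker was active
    private int lockCountAtStart = 0; // criticalCount when the stall period began
    private int stallCount = 0;       // threads that had to wait for the locker to clear
    private final List<StallEvent> events = new ArrayList<>();

    // Enter a critical region; new entries are held back while a GC is pending.
    synchronized void enterCritical() throws InterruptedException {
        while (needsGc) {
            wait();
        }
        criticalCount++;
    }

    // Leave a critical region; the last thread out ends the stall period.
    synchronized void exitCritical() {
        criticalCount--;
        if (criticalCount == 0 && needsGc) {
            // In this model the pending GC would run here, so record the event.
            events.add(new StallEvent(lockCountAtStart, stallCount));
            needsGc = false;
            stallCount = 0;
            notifyAll();
        }
    }

    // Returns true if a GC must be deferred because the locker is active.
    synchronized boolean checkActiveBeforeGc() {
        if (criticalCount > 0 && !needsGc) {
            needsGc = true;
            lockCountAtStart = criticalCount; // snapshot, like passing _jni_lock_count
        }
        return needsGc;
    }

    // Block the caller (e.g. a thread whose allocation needs a GC) until the locker clears.
    synchronized void stallUntilClear() throws InterruptedException {
        if (needsGc) {
            stallCount++;
        }
        while (needsGc) {
            wait();
        }
    }

    synchronized int currentStallCount() {
        return stallCount;
    }

    synchronized List<StallEvent> events() {
        return List.copyOf(events);
    }
}
```

In this model `lockCount` is a snapshot taken when the first GC request is deferred; threads that enter or leave critical regions afterwards do not change it, which is the limitation raised in the review of `check_active_before_gc`.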


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Download

$ git fetch https://git.openjdk.java.net/jdk pull/2088/head:pull/2088
$ git checkout pull/2088

@bridgekeeper

@bridgekeeper bridgekeeper bot commented Jan 15, 2021

👋 Welcome back ddong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

@openjdk openjdk bot commented Jan 15, 2021

@D-D-H The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot rfr labels Jan 15, 2021
@D-D-H
Contributor Author

@D-D-H D-D-H commented Jan 15, 2021

/label add hotspot-jfr

@openjdk openjdk bot added the hotspot-jfr label Jan 15, 2021
@openjdk

@openjdk openjdk bot commented Jan 15, 2021

@D-D-H
The hotspot-jfr label was successfully added.

@mlbridge

@mlbridge mlbridge bot commented Jan 15, 2021

@D-D-H
Contributor Author

@D-D-H D-D-H commented Jan 15, 2021

/label add hotspot-gc

@openjdk openjdk bot added the hotspot-gc label Jan 15, 2021
@openjdk

@openjdk openjdk bot commented Jan 15, 2021

@D-D-H
The hotspot-gc label was successfully added.

@D-D-H
Contributor Author

@D-D-H D-D-H commented Jan 18, 2021

Greetings,
please help review this patch:)

Thanks

Member

@stefank stefank left a comment

Not a review, but a few comments about what probably needs to be cleaned up before a proper review starts.

src/hotspot/share/gc/shared/gcLocker.cpp (outdated)
src/hotspot/share/jfr/metadata/metadata.xml (outdated)
src/hotspot/share/prims/whitebox.cpp (outdated)
src/hotspot/share/gc/shared/gcLocker.cpp (outdated)
src/hotspot/share/utilities/ticks.hpp (outdated)
@D-D-H
Contributor Author

@D-D-H D-D-H commented Jan 18, 2021

Refactored.

Testing: all jdk/jfr tests passed.

Contributor

@kstefanj kstefanj left a comment

Some comments.

src/hotspot/share/gc/shared/gcTraceSend.cpp
src/hotspot/share/gc/shared/gcTraceSend.cpp (outdated)
src/hotspot/share/gc/shared/gcTraceSend.cpp
src/hotspot/share/gc/shared/gcTraceSend.cpp
@D-D-H
Contributor Author

@D-D-H D-D-H commented Jan 22, 2021

> Some comments.

Thanks for the review :)

Contributor

@kstefanj kstefanj left a comment

I think this looks good now but please await a second reviewer.

I took it for a spin in our internal testing: tier1-2 look OK, as do the JFR tests.

@openjdk

@openjdk openjdk bot commented Jan 25, 2021

@D-D-H This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8259808: Add JFR event to detect GC locker stall

Reviewed-by: sjohanss, tschatzl, egahlin

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 194 new commits pushed to the master branch:

  • f353fcf: 8258894: C2: Forbid GCM to move stores into loops
  • ac276bb: 8257074: Update the ByteBuffers micro benchmark
  • 7ed591c: 8260314: Replace border="1" on tables with CSS
  • e696baa: 8260448: Simplify ManagementFactory$PlatformMBeanFinder
  • b3c8a52: 8259050: Error recovery in lexer could be improved
  • bf15c70: 8260460: GitHub actions still fail on Linux x86_32 with "Could not configure libc6:i386"
  • 3e4194c: 8260022: [ppc] os::print_function_and_library_name shall resolve function descriptors transparently
  • fa40a96: 8253420: Refactor HeapRegionManager::find_highest_free
  • 4d004c9: 8260449: Remove stale declaration of SATBMarkQueue::apply_closure_and_empty()
  • fd2641e: 8260236: better init AnnotationCollector _contended_group
  • ... and 184 more: https://git.openjdk.java.net/jdk/compare/ae9187d757b6ed585d85c6e66105ca20bebe7bc7...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@stefank, @kstefanj, @tschatzl, @egahlin) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Jan 25, 2021

import sun.hotspot.WhiteBox;

/**
Contributor

@tschatzl tschatzl Jan 26, 2021

This block should be the first thing in the test after the copyright notice.

Contributor Author

@D-D-H D-D-H Jan 26, 2021

Fixed. But I noticed that many other tests don't comply with this rule.

for (var i = 0; i < STALL_THREAD_COUNT; i++) {
ts[i] = new Thread(() -> {
STALL_COUNT_SIGNAL.countDown();
for (int j = 0; j < LOOP; j++) {
Contributor

@tschatzl tschatzl Jan 26, 2021

Since the test already uses WhiteBox, please use WhiteBox to trigger a GC instead of this dodgy method.

Contributor Author

@D-D-H D-D-H Jan 26, 2021

Triggering a GC is not enough: I want these threads to be stalled by the GC locker (i.e. to call GCLocker::stall_until_clear) so that the test can make a correct assertion about the stall count.
I don't think this can be done with WhiteBox::youngGC/fullGC; please correct me if I'm wrong.
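The point above, that the test threads must actually be blocked before a stall count can be asserted, can be illustrated with plain `java.util.concurrent`. In this hedged sketch a `CountDownLatch` gate is a hypothetical stand-in for the GC locker (it is not how the real test stalls threads), and the helper `StallProbe.startStalledThreads` (also hypothetical) polls `Thread.getState()` so that it returns only after every thread is parked, rather than merely started.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical helper; the latch stands in for GCLocker::stall_until_clear.
final class StallProbe {
    // Starts n threads that block on the gate and returns once all of them are WAITING.
    // The gate must still be closed (count > 0) when this is called.
    static Thread[] startStalledThreads(int n, CountDownLatch gate) throws InterruptedException {
        Thread[] ts = new Thread[n];
        for (int i = 0; i < n; i++) {
            ts[i] = new Thread(() -> {
                try {
                    gate.await(); // parks the thread until the gate opens
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            ts[i].start();
        }
        // A "started" signal (e.g. countDown() before blocking) is not enough:
        // wait until each thread has really reached the blocking call.
        for (Thread t : ts) {
            while (t.getState() != Thread.State.WAITING) {
                Thread.sleep(1);
            }
        }
        return ts;
    }
}
```

Counting down a latch before the blocking call, as the test's `STALL_COUNT_SIGNAL` does, only proves a thread has started; checking the thread state is one way to confirm it has reached the stall point.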

Contributor

@tschatzl tschatzl Jan 27, 2021

I agree, although this is more a theoretical concern since it's not checked. Let's keep this for now as is though.

@@ -97,6 +98,7 @@ bool GCLocker::check_active_before_gc() {
if (is_active() && !_needs_gc) {
verify_critical_count();
_needs_gc = true;
GCLockerTracer::start_gc_locker(_jni_lock_count);
Contributor

@tschatzl tschatzl Jan 26, 2021

I'm not really convinced that passing _jni_lock_count here gives a lot of information: it is the number of threads in a critical section at the point when the first thread needs a GC. It's probably better than nothing. At the very least, this information should be added to the description of the event (if that is possible).

Contributor Author

@D-D-H D-D-H Jan 26, 2021

I think this field can be used to judge whether many threads are frequently in a critical section, but I am not sure whether it really helps to analyze the problem; as you said, it's better than nothing. An appropriate description of this field has been added.

@@ -0,0 +1,126 @@
/*
* Copyright (c) 2021 Alibaba Group Holding Limited. All Rights Reserved.
Contributor

@tschatzl tschatzl Jan 26, 2021

Would it be possible to follow the general format of copyright notices in other code, i.e. "Copyright (c) , . ..."? That is, if possible, please add a comma after the year.

Contributor Author

@D-D-H D-D-H Jan 26, 2021

Fixed.

@@ -1090,6 +1090,11 @@
<Field type="boolean" name="onOutOfMemoryError" label="On Out of Memory Error" />
</Event>

<Event name="GCLocker" category="Java Virtual Machine, GC, Detailed" description="GC Locker Information" label="GC Locker" startTime="true" thread="true" stackTrace="true">
<Field type="uint" name="lockCount" label="Lock Count" />
<Field type="uint" name="stallCount" label="Stall Count" />
Contributor

@tschatzl tschatzl Jan 26, 2021

Please add descriptions to the fields as mentioned above.

Contributor Author

@D-D-H D-D-H Jan 26, 2021

added

@@ -1091,8 +1091,8 @@
</Event>

<Event name="GCLocker" category="Java Virtual Machine, GC, Detailed" description="GC Locker Information" label="GC Locker" startTime="true" thread="true" stackTrace="true">
<Field type="uint" name="lockCount" label="Lock Count" />
<Field type="uint" name="stallCount" label="Stall Count" />
<Field type="uint" name="lockCount" label="Lock Count" description="the number of java threads in a critical section when the GC locker is started" />
Member

@egahlin egahlin Jan 26, 2021

I would suggest changing this to:

"The number of Java threads in a critical section when the GC locker is started"
"The number of Java threads stalled by the GC locker"

Contributor Author

@D-D-H D-D-H Jan 27, 2021

changed.

@@ -1091,8 +1091,8 @@
</Event>

<Event name="GCLocker" category="Java Virtual Machine, GC, Detailed" description="GC Locker Information" label="GC Locker" startTime="true" thread="true" stackTrace="true">
Member

@egahlin egahlin Jan 26, 2021

"GC Locker Information" is not very useful. Remove the description completely or provide more information.

Contributor Author

@D-D-H D-D-H Jan 27, 2021

removed.
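For readers more familiar with the Java-level JFR API than with `metadata.xml`, the final shape of the event can be approximated by a user-defined `jdk.jfr.Event` subclass. This is only an illustrative analogue: the real event is declared in `metadata.xml` and emitted natively by HotSpot, and the class name `GcLockerLikeEvent` and event name `demo.GCLockerLike` are hypothetical. The annotations (`@Name`, `@Label`, `@Category`, `@Description`, `@Unsigned`) come from the `jdk.jfr` package; the `uint` fields are modelled as `int` marked `@Unsigned`, the field descriptions are the wording suggested above, and, as in the final version of the PR, the event itself carries no description.

```java
import jdk.jfr.Category;
import jdk.jfr.Description;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Unsigned;

// Hypothetical Java-level analogue of the GCLocker event defined in metadata.xml.
@Name("demo.GCLockerLike")
@Label("GC Locker")
@Category({"Java Virtual Machine", "GC", "Detailed"})
class GcLockerLikeEvent extends Event {
    @Label("Lock Count")
    @Description("The number of Java threads in a critical section when the GC locker is started")
    @Unsigned
    int lockCount;

    @Label("Stall Count")
    @Description("The number of Java threads stalled by the GC locker")
    @Unsigned
    int stallCount;
}
```

A Java event like this is a duration event: `begin()` would be called when the stall period starts and `commit()` when it ends, which corresponds to `startTime="true"` in the XML definition.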

@D-D-H
Contributor Author

@D-D-H D-D-H commented Jan 27, 2021

/integrate

@openjdk openjdk bot added the sponsor label Jan 27, 2021
@openjdk

@openjdk openjdk bot commented Jan 27, 2021

@D-D-H
Your change (at version 7efac60) is now ready to be sponsored by a Committer.

@kstefanj
Contributor

@kstefanj kstefanj commented Jan 27, 2021

Since @tschatzl requested changes yesterday I will wait for him to sponsor this.

Contributor

@tschatzl tschatzl left a comment

Looks good.

@tschatzl
Contributor

@tschatzl tschatzl commented Jan 27, 2021

/sponsor

@openjdk openjdk bot closed this Jan 27, 2021
@openjdk openjdk bot added integrated and removed sponsor ready rfr labels Jan 27, 2021
@openjdk

@openjdk openjdk bot commented Jan 27, 2021

@tschatzl @D-D-H Since your change was applied there have been 194 commits pushed to the master branch (the same list as in the earlier bot comment: https://git.openjdk.java.net/jdk/compare/ae9187d757b6ed585d85c6e66105ca20bebe7bc7...master).

Your commit was automatically rebased without conflicts.

Pushed as commit 311a0a9.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@D-D-H D-D-H deleted the jfr_gclocker branch Feb 5, 2021
Labels
hotspot hotspot-gc hotspot-jfr integrated
5 participants