@pengxiaolong pengxiaolong commented Mar 21, 2025

Root cause

Shenandoah has its own way to generate GC ids (link, link), but when it runs a specific GC cycle it still uses the default GCIdMark (link) to generate a GC id and set it in NamedThread::_gc_id. Once that GC cycle finishes, NamedThread::_gc_id is restored to its original value, which is undefined. This triggers the asserts when JFR is enabled; in release builds it should result in invalid GC ids in some JFR events.
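
The save-and-restore behavior described above can be sketched with a small stand-alone model. This is only an illustration: `ScopedIdMark`, `thread_gc_id`, `next_default_id` and `kUndefinedId` are made-up stand-ins for the scoped mark, NamedThread::_gc_id, the default id generator and the undefined id, and they are not the HotSpot implementation.

```cpp
#include <cassert>

// Simplified stand-ins for illustration only; these are not the HotSpot classes.
constexpr unsigned kUndefinedId = ~0u;               // models an "undefined" GC id
static unsigned next_default_id = 0;                 // models the default id generator
thread_local unsigned thread_gc_id = kUndefinedId;   // models NamedThread::_gc_id

// Scoped mark: installs a freshly generated id on the current thread and
// restores whatever was there before when the scope ends.
class ScopedIdMark {
  unsigned _previous;
public:
  ScopedIdMark() : _previous(thread_gc_id) { thread_gc_id = next_default_id++; }
  ~ScopedIdMark() { thread_gc_id = _previous; }
  ScopedIdMark(const ScopedIdMark&) = delete;
  ScopedIdMark& operator=(const ScopedIdMark&) = delete;
};

// Returns true if the thread's id is undefined again once the cycle scope closes.
bool id_is_undefined_after_cycle() {
  {
    ScopedIdMark mark;                      // the GC cycle runs inside this scope
    assert(thread_gc_id != kUndefinedId);   // the id is defined while the mark is live
  }
  return thread_gc_id == kUndefinedId;      // restored to the pre-cycle value
}
```

In this model, any code that reads the id after the inner scope closes sees the undefined value, which corresponds to the situation the asserts catch.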

Solution

It is confusing that Shenandoah generates its own GC id but does not use it for GC logging and JFR. The solution is fairly simple: the control thread just needs to inject the GC id it generates, using GCIdMark(gc_id), in ShenandoahControlThread::run_service and ShenandoahGenerationalControlThread::run_gc_cycle.
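
A minimal sketch of the idea, assuming a simplified model rather than the actual patch: the collector owns the id counter, and a scoped mark publishes a snapshot of it for the duration of one cycle. `ControlLoopModel`, `ScopedIdMark` and `thread_gc_id` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-ins for illustration only; these are not the HotSpot classes.
constexpr unsigned kUndefinedId = ~0u;               // models an "undefined" GC id
thread_local unsigned thread_gc_id = kUndefinedId;   // models NamedThread::_gc_id

// Scoped mark that installs a caller-supplied id and restores the old one on exit.
class ScopedIdMark {
  unsigned _previous;
public:
  explicit ScopedIdMark(unsigned id) : _previous(thread_gc_id) {
    assert(id != kUndefinedId);   // the injected id must be a real one
    thread_gc_id = id;
  }
  ~ScopedIdMark() { thread_gc_id = _previous; }
  ScopedIdMark(const ScopedIdMark&) = delete;
  ScopedIdMark& operator=(const ScopedIdMark&) = delete;
};

// Hypothetical control loop: the mark is a snapshot of the collector's own id.
class ControlLoopModel {
  unsigned _gc_id = 0;               // collector-owned id, 0-based
  std::vector<unsigned> _reported;   // ids seen by logging/JFR-style consumers
public:
  void run_cycle() {
    ScopedIdMark mark(_gc_id);           // inject the collector's own id
    _reported.push_back(thread_gc_id);   // cycle work reports under this id
    report_end_of_cycle();               // follow-up reporting is still in scope
    _gc_id++;                            // the next cycle gets the next id
  }
  void report_end_of_cycle() { _reported.push_back(thread_gc_id); }
  const std::vector<unsigned>& reported() const { return _reported; }
};
```

The point of the design is that every consumer inside the scope reads the same id the collector generated, so logging and event reporting cannot drift apart from the collector's own numbering.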

While testing, I also noticed that the gc_id generated by the Shenandoah control thread starts from 1, which differs from the default behavior of GCIdMark, which generates ids starting from 0. This PR also fixes that.
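
One common way such an off-by-one arises is the order of "bump" and "use" on a counter. The sketch below only illustrates that general pattern with hypothetical helper functions; it is an assumption for illustration and does not claim to show how the Shenandoah control thread was actually written.

```cpp
#include <atomic>
#include <cstddef>

// Bump the counter first, then use its value: the first cycle is labeled 1.
std::size_t first_id_when_bumping_before_use() {
  std::atomic<std::size_t> counter{0};
  counter.fetch_add(1);    // incremented at the start of the cycle
  return counter.load();   // value used to label the cycle
}

// Use the current value, then bump: the first cycle is labeled 0,
// which lines up with a generator that hands out ids starting from 0.
std::size_t first_id_when_bumping_after_use() {
  std::atomic<std::size_t> counter{0};
  std::size_t id = counter.load();   // value used to label the cycle
  counter.fetch_add(1);              // incremented once the id has been taken
  return id;
}
```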

Test

  • TEST=gc/shenandoah/TestWithLogLevel.java TEST_VM_OPTS="-XX:StartFlightRecording"
  • TEST=hotspot_gc_shenandoah
  • GHA

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8352588: GenShen: Enabling JFR asserts when getting GCId (Bug - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24166/head:pull/24166
$ git checkout pull/24166

Update a local copy of the PR:
$ git checkout pull/24166
$ git pull https://git.openjdk.org/jdk.git pull/24166/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24166

View PR using the GUI difftool:
$ git pr show -t 24166

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24166.diff

Using Webrev

Link to Webrev Comment

@pengxiaolong
Author

/issue JDK-8352588

@bridgekeeper

bridgekeeper bot commented Mar 21, 2025

👋 Welcome back xpeng! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Mar 21, 2025

@pengxiaolong This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8352588: GenShen: Enabling JFR asserts when getting GCId

Reviewed-by: wkemper, ysr

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 68 new commits pushed to the master branch:

  • dbc620f: 8352299: GenShen: Young cycles that interrupt old cycles cannot be cancelled
  • f5a0db4: 8315447: Invalid Type Annotation attached to a method instead of a lambda
  • 60544a1: 8164714: Constructor.newInstance creates instance of inner class with null outer class
  • c856b34: 8352587: C2 SuperWord: we must avoid Multiversioning for PeelMainPost loops
  • 993eae4: 8346948: Update CLDR to Version 47.0
  • e98838f: 8352065: [PPC64] C2: Implement PopCountVL, CountLeadingZerosV and CountTrailingZerosV nodes
  • 03105fc: 8351601: [JMH] test UnixSocketChannelReadWrite failed for 2 threads config
  • fe03e2e: 8351897: Extra closing curly brace typos in Javadoc
  • fa0b18b: 8352509: Update jdk.test.lib.SecurityTools jar method to accept List parameter
  • 3ac9678: 8351224: Deprecate com.sun.tools.attach.AttachPermission for removal
  • ... and 58 more: https://git.openjdk.org/jdk/compare/06ba6cf3a137a6cdf572a876a46d18e51c248451...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@earthling-amzn, @ysramakrishna) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk

openjdk bot commented Mar 21, 2025

@pengxiaolong This issue is referenced in the PR title - it will now be updated.

@openjdk

openjdk bot commented Mar 21, 2025

@pengxiaolong The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added hotspot-gc hotspot-gc-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Mar 21, 2025
@pengxiaolong pengxiaolong marked this pull request as ready for review March 21, 2025 22:03
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 21, 2025
@mlbridge

mlbridge bot commented Mar 21, 2025

Webrevs

return Atomic::load(&_gc_count);
}

size_t ShenandoahController::get_gc_id() {
Contributor

Do we need to keep this method? Can't everything just use get_gc_count now?

Author

We don't have to keep it, though removing it needs a few more changes. I think it is better to remove it to avoid confusion with the gc_id() method.

Author

I have removed method ShenandoahController::get_gc_id() in the update.

"At end of Concurrent Young GC";
if (_heap->collection_set()->has_old_regions()) {
- mmu_tracker->record_mixed(get_gc_id());
+ mmu_tracker->record_mixed(gc_id());
Contributor

Should these be get_gc_count now?

Author

Shouldn't we always use the GC id for MMUTracker? That said, Shenandoah's internal GC counter is also fine here.

I'm OK with changing it back to get_gc_count, but I will also update the declarations of the relevant methods, as below, to keep them consistent:

void record_global(size_t gc_count)

@pengxiaolong
Author

Removed all code related to the refactor of ShenandoahController::_gc_id; the change should now be a pure fix for the bug.

@ysramakrishna
Member

ysramakrishna commented Mar 24, 2025

I haven't started reviewing, but in cases where we have a "mark" (a thread-local, stack-scoped constant variable, such as those used for logging) and an underlying "true value", the expectation is that the "mark" is a snapshot of the "true value" and represents a label for the work being done in that specific scope. Once you keep this model/idiom in mind, the code should become clean, and the same 0-based conventions should cleanly apply.

I hope to review the code soon'ish. Sorry for the delay.

@ysramakrishna
Member

> Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is undefined, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events.

This would be by design and, as you discovered, was because a suitable GCIdMark scope was missing which would have supplied the correct ID. It is important that the JFR event issues from the intended scope for the corresponding ID for which the metrics/event are being generated. In particular, if there are multiple concurrent GC ID's in progress, with a common pool of worker threads that multiplex this work, any appropriate event metrics should be correctly attributed to the right ID in question.

I am making general comments here without knowledge of the specific details, sorry! :-)

@pengxiaolong
Author

pengxiaolong commented Mar 24, 2025

> > Once the specific GC cycle finishes, the NamedThread::_gc_id is restored to the original value which is undefined, which causes the asserts when Enabling JFR, in release build it should cause invalid GC id in some of JFR events.
>
> This would be by design and, as you discovered, was because a suitable GCIdMark scope was missing which would have supplied the correct ID. It is important that the JFR event issues from the intended scope for the corresponding ID for which the metrics/event are being generated. In particular, if there are multiple concurrent GC ID's in progress, with a common pool of worker threads that multiplex this work, any appropriate event metrics should be correctly attributed to the right ID in question.
>
> I am making general comments here without knowledge of the specific details, sorry! :-)

Thank you @ysramakrishna for reviewing the PR, appreciate it!

Yes, it is a simple bug related to the GCIdMark scope, so the fix is to make sure the GCIdMark scope is correct. For a common pool of worker threads, each thread should copy the gc_id into its own thread-local state with the constructor GCIdMark(gc_id). There are some existing examples of this in HotSpot, e.g. https://github.com/openjdk/jdk/blob/master/src/hotspot/share/gc/shared/workerThread.cpp#L68
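
The worker-pool pattern mentioned above can be sketched roughly as follows. This is a simplified model built on `std::thread`, not the HotSpot worker code: `TaskModel`, `ScopedIdMark`, `thread_gc_id` and `run_on_workers` are hypothetical names, and the assumption is that a task remembers the id that was current where it was created and each worker installs that id for the duration of its work.

```cpp
#include <thread>
#include <vector>

// Simplified stand-ins for illustration only; these are not the HotSpot classes.
constexpr unsigned kUndefinedId = ~0u;               // models an "undefined" GC id
thread_local unsigned thread_gc_id = kUndefinedId;   // per-thread current id

// Scoped mark that installs a caller-supplied id and restores the old one on exit.
class ScopedIdMark {
  unsigned _previous;
public:
  explicit ScopedIdMark(unsigned id) : _previous(thread_gc_id) { thread_gc_id = id; }
  ~ScopedIdMark() { thread_gc_id = _previous; }
  ScopedIdMark(const ScopedIdMark&) = delete;
  ScopedIdMark& operator=(const ScopedIdMark&) = delete;
};

// Hypothetical task: remembers the id that was current on the creating thread.
class TaskModel {
  const unsigned _gc_id;
public:
  TaskModel() : _gc_id(thread_gc_id) {}
  unsigned gc_id() const { return _gc_id; }
};

// Runs `workers` threads against one task and returns the id each worker observed.
std::vector<unsigned> run_on_workers(const TaskModel& task, unsigned workers) {
  std::vector<unsigned> seen(workers, kUndefinedId);
  std::vector<std::thread> pool;
  for (unsigned i = 0; i < workers; i++) {
    pool.emplace_back([&task, &seen, i] {
      ScopedIdMark mark(task.gc_id());   // copy the task's id into this thread
      seen[i] = thread_gc_id;            // work done here is attributed to that id
    });
  }
  for (std::thread& t : pool) t.join();
  return seen;
}
```

Because the id lives in thread-local state, a worker never sees the coordinating thread's id unless it is explicitly copied in, which is why the per-worker mark is needed.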

Contributor

@earthling-amzn earthling-amzn left a comment

LGTM

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 25, 2025
Member

@ysramakrishna ysramakrishna left a comment

🚢

@pengxiaolong
Author

/integrate

@pengxiaolong
Author

Thanks for the reviews and suggestions!

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Mar 25, 2025
@openjdk

openjdk bot commented Mar 25, 2025

@pengxiaolong
Your change (at version 57c43ef) is now ready to be sponsored by a Committer.

@phohensee
Member

/sponsor

@openjdk

openjdk bot commented Mar 26, 2025

Going to push as commit a2a64da.
Since your change was applied there have been 84 commits pushed to the master branch:

  • 79bffe2: 8349361: C2: RShiftL should support all applicable transformations that RShiftI does
  • eef6aef: 8352623: MultiExchange should cancel exchange impl if responseFilters throws
  • e2a461b: 8351332: Line breaks in search tag descriptions corrupt JSON search index
  • c14bbea: 8352740: Introduce new factory method HtmlTree.IMG
  • 84d3dc7: 8352965: [BACKOUT] 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure
  • b4dc364: 8346931: Replace divisions by zero in sharedRuntimeTrans.cpp
  • bc5cde1: 8352692: Add support for extra jlink options
  • 059f190: 8352490: Fatal error message for unhandled bytecode needs more detail
  • ee710fe: 8345169: Implement JEP 503: Remove the 32-bit x86 Port
  • eb6e828: 8351002: com/sun/management/OperatingSystemMXBean cpuLoad tests fail intermittently
  • ... and 74 more: https://git.openjdk.org/jdk/compare/06ba6cf3a137a6cdf572a876a46d18e51c248451...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Mar 26, 2025
@openjdk openjdk bot closed this Mar 26, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Mar 26, 2025
@openjdk

openjdk bot commented Mar 26, 2025

@phohensee @pengxiaolong Pushed as commit a2a64da.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels

hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated shenandoah shenandoah-dev@openjdk.org

4 participants