Skip to content

Conversation

@shipilev
Member

@shipilev shipilev commented Jul 25, 2024

While testing an unrelated Shenandoah patch, I caught a GC assert that fired when Leak Profiler was running (JDK-8337194).

Leak Profiler is notorious for using the mark words for its own needs. We have been trying to mitigate its impact on GCs by moving to separate bitsets for tracking marked objects, or by treating "marked without fwdptr" as "JFR marked" and handling it. But this is not reliable, since other uses, such as storing indexes in the mark word, keep sneaking in. This is okay for Leak Profiler alone, since it restores the mark words after the operation completes, but that is not enough when a GC cycle is already running and can observe the modified mark words.
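
To make the "restores the mark words" point concrete, here is a minimal C++ sketch of a save/tag/restore pattern. Everything in it (`ToyObj`, `ScopedMarkTagger`, a mark word reduced to a bare 64-bit integer) is a hypothetical simplification written for illustration, not the Leak Profiler's actual code. Restoring at the end of the scope makes the heap look untouched afterwards, but any reader of the mark word while the scope is still open, such as a concurrently running GC, sees the tag instead of the original value.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified object: nothing but a 64-bit mark word.
struct ToyObj {
  uint64_t mark = 0;
};

// Hypothetical RAII helper: overwrites mark words with a tag, remembers the
// original values, and restores them when the scope ends.
class ScopedMarkTagger {
 public:
  ScopedMarkTagger() = default;
  ScopedMarkTagger(const ScopedMarkTagger&) = delete;             // avoid double restore
  ScopedMarkTagger& operator=(const ScopedMarkTagger&) = delete;

  ~ScopedMarkTagger() {
    // Undo in reverse order, so an object tagged twice ends with its original mark.
    for (auto it = saved_.rbegin(); it != saved_.rend(); ++it) {
      it->first->mark = it->second;
    }
  }

  void tag(ToyObj& obj, uint64_t tag_bits) {
    saved_.emplace_back(&obj, obj.mark);
    obj.mark = tag_bits;  // from here until the destructor runs, readers see the tag
  }

 private:
  std::vector<std::pair<ToyObj*, uint64_t>> saved_;
};
```

For example, tagging an object inside a block leaves the tag visible for the whole block, and the original mark word is back once the block exits.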

I propose we sidestep this whack-a-mole by cleanly bailing out of the JFR operation when we know it is unsafe to run it. I considered doing the check in VM_Operation::doit_prologue, but I think a GC cycle may start between the check in the prologue and the start of the operation.
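
The prologue race is a classic time-of-check/time-of-use problem. The C++ sketch below models it with hypothetical types (`ToyHeap`, `ToyLeakOp`, `execute`) that are not the HotSpot `VM_Operation` API. The `between` callback stands in for anything that can run after the prologue check and before the operation body, such as a GC pause that starts a concurrent cycle. Only a check performed inside the operation body sees the same state the body then works with.

```cpp
#include <cassert>
#include <functional>

// Hypothetical heap state: true while a GC cycle may observe mark words.
struct ToyHeap {
  bool gc_in_progress = false;
};

// Hypothetical stand-in for a leak-profiler VM operation.
struct ToyLeakOp {
  ToyHeap& heap;
  bool check_in_doit;          // re-check the condition inside the body?
  bool ran = false;            // did the body do its work?
  bool ran_during_gc = false;  // did it do so while a GC was in progress?

  // Runs on the requesting thread, before the operation executes.
  bool prologue() const { return !heap.gc_in_progress; }

  // The operation body; in this model nothing changes heap state while it runs.
  void doit() {
    if (check_in_doit && heap.gc_in_progress) {
      return;  // bail out cleanly: unsafe to touch mark words now
    }
    ran = true;
    ran_during_gc = heap.gc_in_progress;
  }
};

// Runs the op; `between` simulates events between the prologue and the body.
void execute(ToyLeakOp& op, const std::function<void()>& between) {
  if (!op.prologue()) return;
  between();
  op.doit();
}
```

With a GC starting in `between`, a prologue-only op runs during the GC, while an op that re-checks in `doit()` bails out.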

Realistically, this only affects Shenandoah. None of the STW collectors would ever see what Leak Profiler did with the mark words. ZGC would not see it either, since it does not rely on mark words for its own operation.

Additional testing:

  • jdk_jfr passes with default options
  • jdk_jfr now passes with -XX:+UseShenandoahGC

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8279016: JFR Leak Profiler is broken with Shenandoah (Enhancement - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20328/head:pull/20328
$ git checkout pull/20328

Update a local copy of the PR:
$ git checkout pull/20328
$ git pull https://git.openjdk.org/jdk.git pull/20328/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20328

View PR using the GUI difftool:
$ git pr show -t 20328

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20328.diff

@bridgekeeper

bridgekeeper bot commented Jul 25, 2024

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Jul 25, 2024

@shipilev This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8279016: JFR Leak Profiler is broken with Shenandoah

Reviewed-by: egahlin, rkennke, mgronlun, wkemper

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 40 new commits pushed to the master branch:

  • 6811a11: 8341408: Implement JEP 488: Primitive Types in Patterns, instanceof, and switch (Second Preview)
  • 72a45dd: 8341834: C2 compilation fails with "bad AD file" due to Replicate
  • 57c3bb6: 8343068: C2: CastX2P Ideal transformation not always applied
  • 83f3d42: 8339303: C2: dead node after failing to match cloned address expression
  • ead0116: 8331341: secondary_super_cache does not scale well: C1 and interpreter
  • 06d8216: 8318442: java/net/httpclient/ManyRequests2.java fails intermittently on Linux
  • bdd6816: 8343502: RISC-V: SIGBUS in updateBytesCRC32 after JDK-8339738
  • 4431852: 8342943: Replace predicate walking and cloning code for main/post loops with a predicate visitor
  • 1b0281d: 8333427: langtools/tools/javac/newlines/NewLineTest.java is failing on Japanese Windows
  • 471f112: 8342577: Clean up JVMTI breakpoint support
  • ... and 30 more: https://git.openjdk.org/jdk/compare/23fa1a33274d279a53fa6dde683900450561957b...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk

openjdk bot commented Jul 25, 2024

@shipilev The following label will be automatically applied to this pull request:

  • hotspot-jfr

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-jfr hotspot-jfr-dev@openjdk.org label Jul 25, 2024
Fix

Make JFR tests more reliable with new behavior with Shenandoah

More precisely, only skip when Shenandoah already has forwarded objects

Revert "More precisely, only skip when Shenandoah already has forwarded objects"

This reverts commit 403824d.
@shipilev shipilev force-pushed the JDK-8279016-jfr-leak-profiler-shenandoah branch from 403824d to 0219ad6 July 25, 2024 13:59
@shipilev shipilev marked this pull request as ready for review July 25, 2024 14:21
@openjdk openjdk bot added the rfr Pull request is ready for review label Jul 25, 2024
@shipilev
Member Author

/label add shenandoah

@openjdk openjdk bot added the shenandoah shenandoah-dev@openjdk.org label Jul 25, 2024
@openjdk

openjdk bot commented Jul 25, 2024

@shipilev
The shenandoah label was successfully added.

@mlbridge

mlbridge bot commented Jul 25, 2024

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 26, 2024
@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jul 26, 2024
Contributor

@rkennke rkennke left a comment

Why is it ok to simply skip PathToGcRootsOperation? AFAICT, this is (only) used in EventEmitter::emit(..), to emit events with reference chains. What is the consequence of not doing so during GC?
Also, why does JFR see from-space objects to begin with? This should not be allowed. Does JFR use raw loads of references to figure out chains? If so, should it use the proper Access API instead? If not - how does it see from-space refs?

@shipilev
Member Author

shipilev commented Jul 29, 2024

Why is it ok to simply skip PathToGcRootsOperation? AFAICT, this is (only) used in EventEmitter::emit(..), to emit events with reference chains. What is the consequence of not doing so during GC?

I think this op is opportunistic, and we bail out here much like we already bail out on other conditions in the same method.

Also, why does JFR see from-space objects to begin with? This should not be allowed. Does JFR use raw loads of references to figure out chains? If so, should it use the proper Access API instead? If not - how does it see from-space refs?

I don't think JFR sees from-space refs. What happens is that JFR sees a to-space object (possibly one that already passed through the load reference barrier (LRB), since we can be in evacuation), tags it in the mark word, and that tag starts to look like a forwarding pointer to Shenandoah. So when JFR code later goes around and applies the LRB to that to-space object, the LRB gets confused. This is why JDK-8337194 shows "Multiple forwardings" as the failure mode.
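
The failure mode can be illustrated with a toy model. The C++ sketch below is a simplification written for this discussion, not Shenandoah code: the low two bits `0b11` of a mark word mean "marked", a marked word with a non-null pointer part is read as a forwarding pointer, and a marked word with a null pointer part is read as a JFR mark (the "marked without fwdptr" mitigation from the PR description). `jfr_tag` and its payload are hypothetical. Once the tag on the to-space copy carries a non-zero payload, the copy itself looks forwarded, and a check in the spirit of the "Multiple forwardings" assert fails.

```cpp
#include <cassert>
#include <cstdint>

// Toy object: 8-byte aligned, so the low 2 bits of its address are free.
struct alignas(8) ToyObj {
  uintptr_t mark = 0;
};

constexpr uintptr_t kLockMask = 0b11;
constexpr uintptr_t kMarked = 0b11;

// GC-style forwarding: store the forwardee's address with the "marked" bits.
void forward_to(ToyObj& from, ToyObj& to) {
  from.mark = reinterpret_cast<uintptr_t>(&to) | kMarked;
}

// Hypothetical leak-profiler tag: "marked", with a payload in the upper bits.
void jfr_tag(ToyObj& obj, uintptr_t payload) {
  obj.mark = (payload << 2) | kMarked;
}

// Decode with the mitigation: "marked" with a null pointer part is a JFR mark,
// so the object is treated as not forwarded.
ToyObj* forwardee(ToyObj& obj) {
  if ((obj.mark & kLockMask) != kMarked) return &obj;
  uintptr_t ptr = obj.mark & ~kLockMask;
  return ptr == 0 ? &obj : reinterpret_cast<ToyObj*>(ptr);
}

// In the spirit of a "Multiple forwardings" check: the forwardee must not
// itself look forwarded. A bogus pointer is only compared, never dereferenced.
bool has_single_forwarding(ToyObj& obj) {
  ToyObj* fwd = forwardee(obj);
  return forwardee(*fwd) == fwd;
}
```

A plain tag on the to-space copy is tolerated by the mitigation; a tag carrying a payload makes the copy look forwarded a second time.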

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 29, 2024
@ysramakrishna
Member

Leak Profiler is notorious in using the mark words for its own needs.

It seems to already use external storage for some of the leak profiling work. Can it be taught -- at least when Shenandoah is being used -- to just use external storage for its book-keeping/tagging as well to avoid this interference?

@shipilev
Member Author

Yes, but as I mentioned in the PR body, this is a continuous whack-a-mole game. At this point, I believe bailing out of unsafe modes is a saner tactic.

@rkennke
Contributor

rkennke commented Jul 30, 2024

The 'unsafe' part is that JFR meddles with the mark word, and it can be argued that this is none of JFR's business. Avoiding touching the mark word altogether would be the sanest tactic, IMO. JFR could use a bitmap for 'marking' objects, much like JVMTI does for a very similar purpose. See:

if (!_bitset->is_marked(obj)) visit_stack()->push(obj);

The ObjectBitSet used there is well suited for that purpose, because it only allocates chunks as needed and thus avoids allocating a full bitmap that covers the whole heap.
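
The bitmap approach can be sketched as a lazily allocated side table, in the spirit of the ObjectBitSet mentioned above but not copied from it. In this hypothetical C++ `LazyMarkBitmap`, object addresses are plain integers, one bit covers one 8-byte heap word, and each chunk covering 1 MiB of address space is allocated on the first mark that falls into it. Nothing is written into the objects themselves, so a GC reading mark words never sees these marks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical sparse marking bitmap: chunks are allocated only when first used.
class LazyMarkBitmap {
 public:
  static constexpr uint64_t kWordBytes = 8;                          // one bit per heap word
  static constexpr uint64_t kChunkBytes = uint64_t{1} << 20;         // 1 MiB of heap per chunk
  static constexpr uint64_t kBitsPerChunk = kChunkBytes / kWordBytes;
  static constexpr uint64_t kWordsPerChunk = kBitsPerChunk / 64;     // 64-bit words per chunk

  bool is_marked(uint64_t addr) const {
    auto it = chunks_.find(addr / kChunkBytes);
    if (it == chunks_.end()) return false;  // untouched region: nothing marked
    const uint64_t bit = bit_index(addr);
    return ((it->second[bit / 64] >> (bit % 64)) & 1) != 0;
  }

  void mark(uint64_t addr) {
    std::unique_ptr<uint64_t[]>& chunk = chunks_[addr / kChunkBytes];
    if (!chunk) chunk = std::make_unique<uint64_t[]>(kWordsPerChunk);  // zero-initialized
    const uint64_t bit = bit_index(addr);
    chunk[bit / 64] |= uint64_t{1} << (bit % 64);
  }

  std::size_t allocated_chunks() const { return chunks_.size(); }

 private:
  static uint64_t bit_index(uint64_t addr) { return (addr % kChunkBytes) / kWordBytes; }

  std::unordered_map<uint64_t, std::unique_ptr<uint64_t[]>> chunks_;
};
```

A traversal would then follow the quoted JVMTI pattern, for example `if (!bm.is_marked(addr)) { bm.mark(addr); stack.push_back(addr); }`. Keying chunks by a hash map keeps memory proportional to the regions actually touched, at the cost of a hash lookup per query.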

@shipilev
Member Author

Nope.

/open

@openjdk openjdk bot reopened this Sep 24, 2024
@openjdk

openjdk bot commented Sep 24, 2024

@shipilev This pull request is now open

@shipilev
Member Author

shipilev commented Oct 24, 2024

I redid the PR to summarily disable Leak Profiler with Shenandoah. This patch makes sure enabling JFR with Shenandoah does not break the VM. Moving forward, we would probably rewrite Leak Profiler to avoid dependency on mark words (https://bugs.openjdk.org/browse/JDK-8342951), but that would be a larger endeavor, and I would like to have a reliable VM sooner :)

@shipilev
Member Author

shipilev commented Nov 5, 2024

Local testing passes. Please re-review!

@egahlin
Member

egahlin commented Nov 5, 2024

I'm not sure why all these tests are run with Shenandoah (or any specific GC). The purpose of these unit tests is to check the Leak Profiler implementation, for example, that the object age is written correctly or that array information is serialized properly. It doesn't matter which GC, compiler, etc. is being used.

When the JFR tests were initially written, the purpose of the jtreg "jfr" tag was to filter out the JFR tests so they don't receive external flags. Since then, I think vm.flagless has been added. It may be more appropriate (or not?).

If the interaction with a certain GC needs to be tested, it's better to write a dedicated test for that, like TestG1.java and TestZ.java. If such a test doesn't work, it can be put on the ProblemList.

@shipilev
Member Author

shipilev commented Nov 5, 2024

I'm not sure why all these tests are run with Shenandoah (or any specific GC).

It is common to run tests with specific VM options overridden or amended for more extensive testing, for example: make test TEST=jdk_jfr TEST_VM_OPTS=-XX:+UseShenandoahGC. This is why OpenJDK tests generally carry @requires filters that skip the test when it runs in a configuration it is not supposed to work in.

@shipilev
Member Author

shipilev commented Nov 5, 2024

I can redo this with Shenandoah-specific problem lists, for sure. ZGC already has GC-specific problem lists, and Shenandoah can have some as well. But we will always have to remember to add new LeakProfiler tests there.

@shipilev
Member Author

shipilev commented Nov 5, 2024

Did so, see new commit.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Nov 5, 2024
Contributor

@rkennke rkennke left a comment

Looks good to me, thanks!

@shipilev
Member Author

shipilev commented Nov 6, 2024

@mgronlun -- are you good with the current version?

@mgronlun

mgronlun commented Nov 6, 2024

Ok. Thank you.

@shipilev
Member Author

shipilev commented Nov 6, 2024

Thanks!

/integrate

@openjdk

openjdk bot commented Nov 6, 2024

Going to push as commit 0be7118.
Since your change was applied there have been 40 commits pushed to the master branch (the same commits listed in the earlier openjdk bot comment).

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 6, 2024
@openjdk openjdk bot closed this Nov 6, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Nov 6, 2024
@openjdk

openjdk bot commented Nov 6, 2024

@shipilev Pushed as commit 0be7118.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@shipilev shipilev deleted the JDK-8279016-jfr-leak-profiler-shenandoah branch January 8, 2025 12:09
Labels

  • hotspot-jfr (hotspot-jfr-dev@openjdk.org)
  • integrated (Pull request has been integrated)
  • shenandoah (shenandoah-dev@openjdk.org)

6 participants