-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8279016: JFR Leak Profiler is broken with Shenandoah #20328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8279016: JFR Leak Profiler is broken with Shenandoah #20328
Conversation
|
👋 Welcome back shade! A progress list of the required criteria for merging this PR into |
|
@shipilev This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 40 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
403824d to
0219ad6
Compare
|
/label add shenandoah |
|
@shipilev |
Webrevs
|
src/hotspot/share/jfr/leakprofiler/chains/pathToGcRootsOperation.cpp
Outdated
Show resolved
Hide resolved
rkennke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it ok to simply skip PathToGcRootsOperation? AFAICT, this is (only) used in EventEmitter::emit(..), to emit events with reference chains. What is the consequence of not doing so during GC?
Also, why does JFR see from-space objects to begin with? This should not be allowed. Does JFR use raw loads of references to figure out chains? If so, should it use the proper Access API instead? If not - how does it see from-space refs?
I think this op is opportunistic, and we bail in the similar way we bail on other conditions in the same method.
I don't think JFR sees from-space refs. What happens that JFR sees a to-space object (maybe already passed through LRB, since we can be in evac), tags it in markword, and that tag starts to look like a forwarding pointer to Shenandoah. So when JFR code goes around and does LRB on that already-to-space object, LRB gets confused. This is why JDK-8337194 shows "Multiple forwardings" as the failure mode. |
It seems to already use external storage for some of the leak profiling work. Can it be taught -- at least when Shenandoah is being used -- to just use external storage for its book-keeping/tagging as well to avoid this interference? |
Yes, but as I mentioned in PR body, this is a continuous whack-a-mole game. At this point, I believe bailing from unsafe modes is a saner tactics. |
The 'unsafe' part is that JFR meddles with the mark-word, and it can be argued that it shouldn't be any of JFR's business to do so. Avoiding to touch the mark-word altogether would be the sanest tactic, IMO. JFR could use a bitmap for 'marking' objects, pretty much like is done in JVMTI for very similar purpose. (See jdk/src/hotspot/share/prims/jvmtiTagMap.cpp Line 1487 in 156f0b4
|
|
Nope. /open |
|
@shipilev This pull request is now open |
|
I redid the PR to summarily disable Leak Profiler with Shenandoah. This patch makes sure enabling JFR with Shenandoah does not break the VM. Moving forward, we would probably rewrite Leak Profiler to avoid dependency on mark words (https://bugs.openjdk.org/browse/JDK-8342951), but that would be a larger endeavor, and I would like to have a reliable VM sooner :) |
|
Local testing passes. Please re-review! |
|
I'm not sure why all these tests are run with Shenandoah (or any specific GC). The purpose of these unit tests is to check the Leak Profiler implementation, for example, that the object age is written correctly or that array information is serialized properly. It doesn't matter which GC, compiler, etc. is being used. When the JFR tests were initially written, the purpose of the jtreg "jfr" tag was to filter out the JFR tests so they don't receive external flags. Since then, I think vm.flagless has been added. It may be more appropriate (or not?). If the interaction with a certain GC needs to be tested, it's better to write a dedicated test for that, like TestG1.java and TestZ.java. If such a test doesn't work, it can be put on the ProblemList. |
It is common to run tests with specific VM options overridden/amended for extensive testing. For example, |
|
I can redo this for Shenandoah-specific problem lists, for sure. ZGC does GC-specific problem lists, Shenandoah can do some as well. But we will always have to remember to add new LeakProfiler tests there. |
Did so, see new commit. |
rkennke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thanks!
|
@mgronlun -- are you good with the current version? |
Ok. Thank you. |
|
Thanks! /integrate |
|
Going to push as commit 0be7118.
Your commit was automatically rebased without conflicts. |
While testing unrelated Shenandoah patch, I caught a GC assert when Leak Profiler was running (JDK-8337194).
Leak Profiler is notorious in using the mark words for its own needs. We have been trying to mitigate its impact on GCs by moving to separate bitsets for tracking marked objects, or by treating "marked without fwdptr" as "JFR marked" and handling it. But this is not reliable, since things like putting indexes in mark word sneak in. This is okay for Leak Profiler alone, since it restores the mark words after the operation completes, but that is still not enough when GC is already running.
I say we side-step this whack-a-mole by cleanly bailing from JFR op, when we know it is unsafe to do. I thought to use
VM_Operation::doit_prologue, but I think GC start may sneak in between checking in prologue and op start.This realistically only affects Shenandoah. All other STW collectors would never see what Leak Profiler did with mark words. ZGC would not see it, since it does not care about mark words for its own operation.
Additional testing:
jdk_jfrpass by defaultjdk_jfrnow passes with-XX:+UseShenandoahProgress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20328/head:pull/20328$ git checkout pull/20328Update a local copy of the PR:
$ git checkout pull/20328$ git pull https://git.openjdk.org/jdk.git pull/20328/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 20328View PR using the GUI difftool:
$ git pr show -t 20328Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20328.diff
Using Webrev
Link to Webrev Comment