Skip to content

8299125: UnifiedOopRef in JFR leakprofiler should treat thread local oops correctly #11741

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

xmas92
Copy link
Member

@xmas92 xmas92 commented Dec 20, 2022

In a similar vain to JDK-8299089 there are different semantics for OopStorage and thread local oops. UnifiedOopRef in the jfr leakprofiler must be able to distinguish these and treat them appropriately.

Testing: Running test at the moment. Has been running on the generational branch of ZGC for over three months. Not many tests exercise the leakprofiler. x86_32 has mostly been tested with GHA.

Note: The accessBackend changes are only there to facilitate comparing OopLoadProxy objects directly with nullptr instead of creating and comparing with an oop() / oop(NULL). Can be backed out of this PR and put in a different RFE instead.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8299125: UnifiedOopRef in JFR leakprofiler should treat thread local oops correctly

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11741/head:pull/11741
$ git checkout pull/11741

Update a local copy of the PR:
$ git checkout pull/11741
$ git pull https://git.openjdk.org/jdk pull/11741/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11741

View PR using the GUI difftool:
$ git pr show -t 11741

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11741.diff

Co-authored-by: Stefan Karlsson <stefank@openjdk.org>
@bridgekeeper
Copy link

bridgekeeper bot commented Dec 20, 2022

👋 Welcome back aboldtch! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 20, 2022
@openjdk
Copy link

openjdk bot commented Dec 20, 2022

@xmas92 The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Dec 20, 2022
@mlbridge
Copy link

mlbridge bot commented Dec 20, 2022

Webrevs

Copy link
Contributor

@fisk fisk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. So we have IN_NATIVE accesses tagged as encode_in_native, and essentially AS_RAW accesses encoded as encode_non_barriered. I usually don't pick on naming, but the term "non-barriered" seems to stray away from already established terminology. I would prefer if the mentions of "non_barriered" changed to "as_raw" for consistency.

@xmas92
Copy link
Member Author

xmas92 commented Jan 2, 2023

Hmm. So we have IN_NATIVE accesses tagged as encode_in_native, and essentially AS_RAW accesses encoded as encode_non_barriered. I usually don't pick on naming, but the term "non-barriered" seems to stray away from already established terminology. I would prefer if the mentions of "non_barriered" changed to "as_raw" for consistency.

I agree, changed the terms used.

Copy link
Contributor

@fisk fisk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk
Copy link

openjdk bot commented Jan 2, 2023

@xmas92 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8299125: UnifiedOopRef in JFR leakprofiler should treat thread local oops correctly

Reviewed-by: eosterlund, mgronlun

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 171 new commits pushed to the master branch:

  • 8b0133f: 8299259: C2: Div/Mod nodes without zero check could be split through iv phi of loop resulting in SIGFPE
  • d68de02: 8299845: Remove obsolete comments in DirtyCardToOopClosure::get_actual_top
  • debe587: 8298381: Improve handling of session tickets for multiple SSLContexts
  • eab1e62: 8297487: G1 Remark: no need to keep alive oop constants of nmethods on stack
  • c8a8388: 8299789: Compilation of gtest causes build to fail if runtime libraries are in different dirs
  • 5f37cbe: 8297572: Remove unused PrecisionStyle::Precise
  • f95346e: 8299261: Clean up AWT D3D exports
  • 195f313: 8287925: AArch64: intrinsics for compareUnsigned method in Integer and Long
  • 3a66737: 8299525: RISC-V: Add backend support for half float conversion intrinsics
  • b7eb4e2: 8297306: Incorrect brackets in Javadoc for a constructor of IteratorSpliterator
  • ... and 161 more: https://git.openjdk.org/jdk/compare/de8153cab76606350eb0ecc4302b23c52f0565a6...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jan 2, 2023
@mgronlun
Copy link

mgronlun commented Jan 3, 2023

This is introducing some new, very complicated tagging scheme. I don't understand it. What is 0b011 ?

@@ -48,7 +48,7 @@ template <typename Delegate>
void RootSetClosure<Delegate>::do_oop(oop* ref) {
assert(ref != NULL, "invariant");
assert(is_aligned(ref, HeapWordSize), "invariant");
if (*ref != NULL) {
if (NativeAccess<>::oop_load(ref) != nullptr) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GC requires that load barriers to access oops in native storage. Currently the raw load works on all GCs as it only checks for null and all GCs encode null as NULL. But it is semantically wrong, and will not work with generational ZGC where there exist colored null which is not encoded as NULL.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Should not the same construct be used in RawRootClosure? Or is it not needed there because the raw references to oops are not loaded through the Access Api?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raw storage has no GC barriers so a null oop there must be NULL. The RawAccess API is really only used when dealing with narrowOop or templated generic code which can be either. RawAccess<>::oop_load(oop* ref) will just become a *ref. I think I used RawAccess everywhere initially ,but @fisk thought that it should not be used for raw oop*, not sure if there were some other reason behind it than it being unnecessary / obfuscating.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what I might have said is that RawAccess helps with compressed oop conversion and memory ordering sprinklings. If you don't need to do anything special with either of those and you just want a plain dereference of an oop*, then just doing a plain dereference achieves the same thing with a lot less templates. Using RawAccess can make it look like there is more to it than there really is. It's perfectly correct, but maybe just not needed and makes you scratch your head a bit when really there is nothing special to see at all.

@@ -29,21 +29,33 @@
#include "utilities/globalDefinitions.hpp"

struct UnifiedOopRef {
static const uintptr_t tag_mask = LP64_ONLY(0b111) NOT_LP64(0b011);
static const uintptr_t native_tag = 0b001;
static const uintptr_t raw_tag = 0b010;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is "raw", compared to "native"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native is native storage (not in java heap) which requires GC barriers. Raw is raw storage which does not require GC barriers. This is thread local storage, such as the handle area, stack, etc.

@@ -42,47 +42,60 @@ inline T UnifiedOopRef::addr() const {
// this specialization provides a workaround.
template<>
inline uintptr_t UnifiedOopRef::addr<uintptr_t>() const {
return _value & ~uintptr_t(3);
return (_value & ~tag_mask) LP64_ONLY(>> 1);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a shift now as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not remember exactly. But I think there was some ref which on 64 bit which had the third bit set. That is this assert failed:
assert((reinterpret_cast<uintptr_t>(ref) & UnifiedOopRef::tag_mask) == 0, "Unexpected low-order bits");
I have to remove the shift and test again. I think it is strange given that ObjectAlignmentInBytes >= 8 but we added the shift for some reason. Or maybe I am just missing something obvious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ObjectAlignmentInBytes is 4 by default on 32 bit VMs.

Copy link

@mgronlun mgronlun Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fisk Why are the shifts needed on 64-bits?

Copy link

@mgronlun mgronlun Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the shift is needed in order to handle unaligned oops, as for code blob oops? The existing root processing logic has up to now skipped code blob oops, mainly because they were hard to encode. The rationale was that they are not roots in and of themselves. Now, we have seen several instances where chains are not found on iterations, so maybe the assumption on skipping code blob oops is wrong. With this encoding scheme in place, it would be possible to pass a closure for processing the code blob oops as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the tardy response.

I think I forgot what the code was actually doing at the time of writing my previous answer. We are encoding narrowOop* and oop* here not narrowOop/oop. And narrowOop* are 4 byte aligned.

Copy link

@mgronlun mgronlun Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, makes sense now.

assert(((reinterpret_cast<uintptr_t>(ref) LP64_ONLY(<< 1)) & UnifiedOopRef::tag_mask) == 0, "Unexpected low-order bits");
LP64_ONLY(assert(((reinterpret_cast<uintptr_t>(ref)) & (1ull << 63)) == 0, "Unexpected high-order bit"));
UnifiedOopRef result = { (reinterpret_cast<uintptr_t>(ref) LP64_ONLY(<< 1)) | tag };
assert(result.addr<T>() == ref, "sanity");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is crazy hard to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Refactoring out reinterpret_cast<uintptr_t>(ref) into a variable might help a bit at least. The assert just want to check that the lowest two bits are not set (the tagging does not destroy information) and that on 64 bit platforms the highest bit is not set (the shift does not destroy information).

Not sure if there is a better way to express those two invariants.

Copy link
Member

@stefank stefank Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be easier to read the code if it was rewritten as (uncompiled):

  uintptr_t value = reinterpret_cast<uintptr_t>(ref);

#ifdef _LP64
  // tag_mask is 3 bits. When ref is a narrowOop* we only have 2 alignment
  // bits, because of the 4 byte alignment of compressed oops addresses.
  // Shift up to make way for one more bit.
  assert((value & (1ull << 63)) == 0, "Unexpected high-order bit");
  value <<= 1;
#endif

  UnifiedOopRef result = { value | tag };
  assert(result.addr<T>() == ref, "sanity");

@@ -1256,6 +1256,14 @@ namespace AccessInternal {
inline bool operator !=(const T& other) const {
return load<decorators | INTERNAL_VALUE_IS_OOP, P, T>(_addr) != other;
}

inline bool operator ==(std::nullptr_t) const {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why std::nullptr_t?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is only here to facilitate being able to compare the OopLoadProxy object directly to nullptr instead of having to create and use an oop() / oop(NULL).

@xmas92
Copy link
Member Author

xmas92 commented Jan 3, 2023

This is introducing some new, very complicated tagging scheme. I don't understand it. What is 0b011 ?

We have three storage/access types in the vm, Native, Raw, Heap. The two lower bits encode this. The only valid low bits are Heap: 0b00, Native: 0b01 and Raw: 0b10. Then each storage area can be use compressed oops or not. Which is the third bit.

0b011 would be an invalid tag.

@mgronlun
Copy link

mgronlun commented Jan 3, 2023

This is introducing some new, very complicated tagging scheme. I don't understand it. What is 0b011 ?

We have three storage/access types in the vm, Native, Raw, Heap. The two lower bits encode this. The only valid low bits are Heap: 0b00, Native: 0b01 and Raw: 0b10. Then each storage area can be use compressed oops or not. Which is the third bit.

0b011 would be an invalid tag.

I see now it is the mask on 32-bit.

@stefank
Copy link
Member

stefank commented Jan 9, 2023

Hmm. So we have IN_NATIVE accesses tagged as encode_in_native, and essentially AS_RAW accesses encoded as encode_non_barriered. I usually don't pick on naming, but the term "non-barriered" seems to stray away from already established terminology. I would prefer if the mentions of "non_barriered" changed to "as_raw" for consistency.

The following is not blocking this review, but could maybe be a starting point for a future discussion:

FWIW, I think that as_raw doesn't explain "why" we are not using a normal IN_NATIVE access. Maybe that's also why Markus said:

What is "raw", compared to "native"?

It's also an "AS" property and not an "IN" property, so it talks about "how" to perform the access and not what kind of root it is. I wanted to hint about what kind of root it was when I chose the name non-barriered.

Taking a step back, ZGC deals with two categories of roots:

  1. Roots that requires the normal GC barriers (OopStorage, Global JNI Handle)
  2. Roots that requires an "externalized" GC barrier to be applied before the root is accessed (Threads, frames, nmethods)

We use different terminology for these, like:
a) colored vs uncolored roots
b) barriered vs non-barriered (Maybe I'm the only one who is using these terms)

(a) is ZGC centric, and (b) tries to be more generic.

Neither of these are good names, but I think it would be beneficial to have good names for these. It would help explain why we can use "raw" access for the latter, and hopefully help the readability of the code. Maybe this is taking it too far, but if we have good names we could split IN_NATIVE into two, and tag each call site appropriately depending on root type.

@xmas92
Copy link
Member Author

xmas92 commented Jan 12, 2023

/integrate

@openjdk
Copy link

openjdk bot commented Jan 12, 2023

Going to push as commit 89d3f3d.
Since your change was applied there have been 208 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jan 12, 2023
@openjdk openjdk bot closed this Jan 12, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 12, 2023
@openjdk
Copy link

openjdk bot commented Jan 12, 2023

@xmas92 Pushed as commit 89d3f3d.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants