-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8283710: JVMTI: Use BitSet for object marking #7964
Conversation
👋 Welcome back rkennke! A progress list of the required criteria for merging this PR into |
Webrevs
|
Hi Roman, I have a question about this code, unrelated to your patch. I'd like to make sure I understand it correctly. When walking the heap for JVMTI:
I try to understand why we zero out the upper 62 bits of the mark word at all at (2). It seems to me the JVMTI heap walker does not use those bits. AFAIU its read only, it only reports the objects it did find without modifying them. Would it not be sufficient just to set the lock bits to "marked" and leave the rest of the mark word intact? IIUC the only thing we really need to restore are the lock bits, for locked objects. So whenever lock bits had not been "01". But a single byte would be sufficient for that, no need to save the whole 63bits, or? So _saved_mark_stack and _saved_oop_stack could be made a lot smaller. Another question, what is the effect of age bits getting reset by JVMTI heap walking? Won't that interfere with the next GC? Thanks a a lot, ..Thomas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks straightforward enough.
One question, ObjectMarker::mark() had been a static inline function before, now we call via the new Controller, so it is not inlined and a virtual function call. Won't that cost performance?
Cheers, Thomas
@@ -427,6 +430,8 @@ class CollectedHeap : public CHeapObj<mtGC> { | |||
|
|||
virtual void initialize_serviceability() = 0; | |||
|
|||
virtual ObjectMarker* init_object_marker(); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does creation happen in collectedHeap.cpp, not objectMarker.cpp? And why expose this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention here was to be able for a GC to provide an alternative implementation, especially in the context of Lilliput. See for example here: openjdk/lilliput#32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I get that :) I just wondered whether implementing this with, say, metaprogramming may be faster. E.g. make the controller a template and plugging in a Marker implementation that provides a "mark" function. Which then could be inlined. That way you don't pay for virtual calls at runtime. For a function that is invoked for every object this may be significant.
May also be done as a future RFE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I get that :) I just wondered whether implementing this with, say, metaprogramming may be faster. E.g. make the controller a template and plugging in a Marker implementation that provides a "mark" function. Which then could be inlined. That way you don't pay for virtual calls at runtime. For a function that is invoked for every object this may be significant.
May also be done as a future RFE.
Right. Let me thinker with this idea for a bit.
This is correct.
Yes, that would be sufficient, and actually better:
Absolutely! First step for improving this would be to use preservedMarks.hpp|cpp infrastructure. This would avoid the headache of two distinct lists going out of sync (they don't, really, but brain is used anyway), and improve code sharing (GC code uses this for full-GCs). Second step might be to only preserve mark-bits and make the structure smaller, and avoid thrashing upper bits (although, GCs would overwrite some upper bits with the fwdptr anyway, so may not be worth).
It would make all somewhat-aged objects young again. Yes, it's an interference, but not catastrophic. If we can avoid it easily, then it would be better, though. Thanks, |
Possibly. I don't think this is particularily performance-critical code, though, and calling those methods most likely not what really matters. |
This was not about performance, just about keeping interfaces small. I like to keep things local if possible. |
If all one needs to preserve are the two lock bits, could one hide them in the lowest bit of the oop? Then you'd get away with a single array of 64-bit elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm finally now looking at this. What I was hoping you'd do instead is have JVMTI not mark objects at all, and use an {oop} hashtable to determine which objects are marked. It already has to do the growable array to restore the marks, so it already costs space.
Use ResourceHashtable in preference to hashtable.hpp Hashtable though.
That might actually be the better solution: it leaves the header alone and would not require any interaction with the GC. I will try that instead. Thank you! |
Hi Coleen,
I don't understand. You would put every visited oop into that hashset? So, every object, if the walker walks the whole heap? Would that not be very expensive? You'd pay at least 16 bytes per object (oop + whatever the table entry costs). Plus, we need a good hash distribution. There would be a cutoff where the Romans bitmap proposal in Lilliput would be cheaper. And the bitmap would have O(1) lookup. |
Wouldn't the hashtable contain only the oops that you wanted to mark? That's not all of them? Maybe it is and it's a bad idea. |
AFAIU the JVMTI heap walk iterates over whole heap, and marks visited oops. So, eventually, that would be all of them. The RA allocated stacks are just for a subset of objects (locked or hashed). |
I'm told by @fisk that there's code in jfr/leakprofiler/chains/bitset.hpp that does some non-gc dependent oop marking. Maybe something like this can be used instead? At one point in rewriting the JvmtiTagMap table we were doing obj->hash() on the oops during the tagging so marking them turned out to be a huge pain. It would be nice not to mark them. |
This is an excellent suggestion! I moved bitset.* to share/utilities and changed JVMTI to use that class instead. It seems to do the job just fine. |
And yes, I will change the title and description text once we settled on how we want to do it. ;-) |
Yes, I think templatizing for MEMFLAGS makes sense (concurrentHashTable.hpp is too). Wow, this came out really well! Thank you @fisk again for the idea. |
Happy to help! |
BTW, one neat solution to the MEMFLAGS problem would be to make it a property of the thread, and stackable. With sort of a MemflagMark. Each Thread could have a property "default MEMFLAGS", set with a MemFlagMark, and that gets used by default with every malloc in this thread. This way we may be able to get rid of a lot (most?) manual MEMFLAG declarations. |
Please don't do this. Further GCs are using I also do think that the use cases for a dense I am also not sure that the automatic lazy backing of the OS isn't sufficient for these use cases (e.g. JVMTI) too. Thanks, |
Hi Thomas,
Yes, you are right. I had some more time to think about this and came to the same conclusion. Also abstracting bitwise getters and setters as I thought originally would probably not be enough.
I'm not sure about this. We often run against the commit charge of the underlying OS, which seems to be more frequent than running out of physical memory. I think voluntary uncommit does make sense. Whether the libc actually uncommits on free() is a different question, but in this case, I think they do, the fragments are large enough. Thanks, Thomas |
The trouble is that ObjectBitSet is a subclass of CHeapObj, which takes a template parameter for memflags itself. How could I avoid templatizing the ObjectBitSet itself? |
Here's an idea. Add an mtflag called mtBitMap and use that. There's plenty of bits in mt flags. |
Naa, I don't think this is justified. It isn't used as widely or frequently to warrant its own flag. |
Plus, it would muddy the water, since what we actually want is the memory accounted toward the subsystem on behalf of which we create the data structure. mtBitMap would be a bit like "mtArray" for GrowableArray. Note that there was the idea, and an associated JBS issue (https://bugs.openjdk.java.net/browse/JDK-8281819) of making NMT categories groupable in a one- or multi-layered hierarchy. That way, one could account to "mtGC" -> "mtBitSet", for instance. For now, I opt to either live with what we have with the typedef'ed BitSet, and leave MEMFLAG cleanups to a later RFE. Alternatively, make the MEMFLAG a runtime parameter of the bitset, use that for the heap allocated fragments, but use "mtInternal" for the actual class. I mean, those are just a few bytes. |
Third alternative would be not to derive from CHeapObj at all, since we don't new it anywhere if I see correctly. |
We must call new on it somewhere. I am not opposed to making this an mtFlags template then. This is a small point in this improvement. |
Yes, at some point we need to allocate the fragments and fragments table. We could perhaps work around it by passing MEMFLAGS as constexpr, but I don't think this would change generated code much. |
(all bikeshedding since we all agree the current version is fine, no?) No, sorry, I meant make BitSet a normal class. Not derived from CHeapObj, since it gets not newed, only used via composition or on a stack. And give it a runtime parm to set the MEMFLAG. So:
and use those flags for the c-heap allocation of the fragments. |
Yes, I get that. But the fragments and fragment-table are themselves inner classes that take a MEMFLAGS template. I could (perhaps) either use a constexpr MEMFLAGS arg and pass this through, or do at some point a switch like:
Which seems kinda-ugly but would work (I think), and avoid making the outer class template-ized. |
I see what you mean. This MEMFLAGS template parameter is deeply interwoven into everything. I'd just live with the current solution. It uses established pattern, so at least nobody is surprised. I think the basic problem is that CHeapObj itself is a template class. Rethinking MEMFLAGS seems worthwhile for a future RFE. As I wrote, one approach could be to make them a property of the current thread, and switchable and stackable via a Mark class. That way, everything allocated within a given range of frames would count toward a given category. No need to decide on a fine-granular basis. No need for templates. Maybe no need even to have a MEMFLAGS argument for every allocation. |
Mailing list message from Kim Barrett on hotspot-dev:
While working on something else I ran into a similar problem and found a different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks great and thank you for all your work resolving comments and bringing this in from the lilliput changes you were going to make. This is a big improvement imo and lets us potentially fix other problems in the jvmti tag map implementation. The template instantiation doesn't bother me. There's only 2 and it's not a big code bloat and it's the way the code has been dealing with memflags. Maybe the mechanism should be revisited, like the GC change that Kim pointed to, but I don't see the motivation to make it something different. Maybe a later RFE can propose a different mechanism for memflags in general. We picked this because we thought it was simple.
Thank you for your work on this, Roman and your comments Thomas.
// NOTE: This is a little weird. NULL is not treated any special: ObjectBitSet will happily | ||
// allocate a fragement for the memory range starting at 0 and mark the first bit when passing NULL. | ||
// In the absense of any error handling, I am not sure what would possibly be a reasonable better | ||
// way to do it, though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment. If it matters someday we'll have the comment..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good
Thank you all! GHA are green too, so I'll /integrate |
Going to push as commit abfd2f9.
Your commit was automatically rebased without conflicts. |
#ifndef SHARE_JFR_LEAKPROFILER_JFRBITMAP_HPP | ||
#define SHARE_JFR_LEAKPROFILER_JFRBITMAP_HPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong - it should be BITSET not BITMAP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong - it should be BITSET not BITMAP
Oh, good find! I filed a follow-up issue:
https://bugs.openjdk.java.net/browse/JDK-8284725
JVMTI heap walking marks objects in order to track which have been visited already. In order to do that, it uses bits in the object header. Those are the same bits that are also used by some GCs to mark objects (the lowest two bits, also used by locking code). Some GCs also use the bits in order to indicate 'forwarded' objects, where the upper bits of the header represent the forward-pointer. In the case of Shenandoah, it's even more problematic because this happens concurrently, even while JVMTI heap walks can intercept. So far we carefully worked around that problem, but it becomes very problematic in Lilliput, where accesses to the Klass* also requires to decode the header, and figure out what bits means what.
In addition to that, marking objects in their header requires that the original header gets saved and restored. We only do that for 'interesting' headers, that is headers that have a stack-lock, monitor or hash-code. All other headers are reset to their default value. This means we are losing object's GC age. This is not catastrophic, but nontheless interferes with GC.
JFR already has a datastructure called BitSet to support object marking without messing with object's headers. We can use that in JVMTI too.
Testing:
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7964/head:pull/7964
$ git checkout pull/7964
Update a local copy of the PR:
$ git checkout pull/7964
$ git pull https://git.openjdk.java.net/jdk pull/7964/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7964
View PR using the GUI difftool:
$ git pr show -t 7964
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7964.diff