
8157023: Integrate NMT with JFR #11449

Closed
wants to merge 13 commits into master

Conversation

kstefanj
Contributor

@kstefanj kstefanj commented Dec 1, 2022

Please review this enhancement to include NMT information in JFR recordings.

Summary
Native Memory Tracking summary information can be obtained from a running VM using jcmd if started with -XX:NativeMemoryTracking=summary/detail. Using jcmd requires you to run a separate process and to parse the output to get the needed information. This change adds JFR events for NMT information to enable additional ways to consume the NMT data.

There are two new events added:

  • NativeMemoryUsage - The total native memory usage.
  • NativeMemoryUsagePart - The native memory usage for each component.

These events are sent periodically and by default the interval is 1s. This can of course be discussed, but that is the starting point. When NMT is not enabled, no events will be sent.

Testing

  • Added a simple test to verify that the events are sent as expected depending on whether NMT is enabled or not.
  • Mach5 sanity testing

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk pull/11449/head:pull/11449
$ git checkout pull/11449

Update a local copy of the PR:
$ git checkout pull/11449
$ git pull https://git.openjdk.org/jdk pull/11449/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 11449

View PR using the GUI difftool:
$ git pr show -t 11449

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/11449.diff

@bridgekeeper

bridgekeeper bot commented Dec 1, 2022

👋 Welcome back sjohanss! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Dec 1, 2022
@openjdk

openjdk bot commented Dec 1, 2022

@kstefanj The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Dec 1, 2022
@mlbridge

mlbridge bot commented Dec 1, 2022

Webrevs

@kstefanj kstefanj changed the title 8157023: jfr events for nmt 8157023: Integrate NMT with JFR Dec 1, 2022
src/hotspot/share/services/memReporter.cpp (outdated)
src/hotspot/share/services/memReporter.cpp (outdated)
@@ -706,6 +706,19 @@
<Field type="OldObjectGcRoot" name="root" label="GC Root" />
</Event>

<Event name="NativeMemoryUsagePart" category="Java Virtual Machine, Memory" label="Component Native Memory Usage" description="Native memory usage for a component" stackTrace="false" thread="false"
Contributor

I found it odd that the native memory tracking "parts" aren't named consistently. "Category" might be more obvious. I don't have a strong opinion, just thought I'd mention my observation.

Contributor Author

Yes, these names have changed a few times during development and that's why they are out of sync. They should certainly be consistent, but it is really hard to come up with a great name for this, especially since the different "parts" are so different; compare "GC" and "Object Monitors". Using "Category" has been discussed and I'm not against it; as always, naming is very hard.

Member

@tstuefe tstuefe left a comment

Hi Stefan,

I think this looks interesting, and potentially very useful. But I am not yet convinced that exposing all tags is the way to go. For those who are interested, the original ML thread: https://mail.openjdk.org/pipermail/core-libs-dev/2022-December/097404.html.

The number of values may expand considerably in the future: we may want to use tags in a far more fine-granular manner than we do now, and/or change their encoding - eg to work in a hierarchy, or groups, or in a combined, UL-like fashion. So their number may expand, and their meaning change, which could render this report obsolete quickly. E.g. if we add tag hierarchies, do we then only report leaf tags? How useful would that be? If we allow tag combinations, how would we report that?

Also note that I currently work on a patch for showing NMT peak malloc values, see https://bugs.openjdk.org/browse/JDK-8297958. Peak values are very useful to have. So, do we expose them too? One more value per category. But leaving them out would render the JFR NMT report less useful.

Bottom line: I am not yet convinced that reporting all NMT categories is that useful. And it exposes implementation details that may cause breakage in the future. We could restrict them to a subset of useful ones, and only report that.

Another thought, for virtual memory mappings you report reserved and committed. But I doubt that "reserved" is really of much use. In itself, it does not cost anything, at least not on 64-bit. For a select few categories, it can signify the largest amount of committable memory (e.g. heap and code space) but those are already reported in JFR. So I think we could omit "reserved" and save a bunch of code and make the NMT JFR report less overwhelming.

Cheers, Thomas

src/hotspot/share/services/memReporter.cpp
@@ -706,6 +706,19 @@
<Field type="OldObjectGcRoot" name="root" label="GC Root" />
</Event>

<Event name="NativeMemoryUsagePart" category="Java Virtual Machine, Memory" label="Component Native Memory Usage" description="Native memory usage for a component" stackTrace="false" thread="false"
startTime="false" period="everyChunk">
<Field type="string" name="type" label="Memory Type" description="Component allocating the native memory" />
Member

Is there a better way than to re-transmit the category name with every event?

Contributor Author

There is a way to create special JFR types and instead just send an ID. This might make sense to save some extra memory. @egahlin, are there any drawbacks to creating a custom type, apart from the parsing needing an extra step?

Member

Not really. See https://github.com/openjdk/jdk/blob/master/src/hotspot/share/jfr/metadata/jfrSerializer.hpp for how to serialize constants. Since the number of events is small and they are opt-in via a command line flag, I don't think it will make much of a difference from a performance perspective, but I don't think it will hurt either.
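The idea of interning the category name and sending only an id can be illustrated with a generic sketch (plain C++; this is not the actual JfrSerializer API, and all names here are hypothetical):

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: intern each NMT category name once and send only
// its id with every event; the id->name table is then serialized a single
// time per chunk, like a JFR constant set.
class CategoryPool {
    std::unordered_map<std::string, int> ids_;
    std::vector<std::string> names_;
public:
    // Returns a stable id for the category, interning it on first use.
    int id_for(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        int id = static_cast<int>(names_.size());
        names_.push_back(name);
        ids_.emplace(name, id);
        return id;
    }
    // The constant table, written once instead of with every event.
    const std::vector<std::string>& constants() const { return names_; }
};
```

Per-event payload then shrinks from a string to an integer, at the cost of the consumer resolving ids through the constant table.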

@@ -22,20 +22,22 @@
*
*/
#include "precompiled.hpp"
#include "jfr/jfrEvents.hpp"
Member

@tstuefe tstuefe Dec 1, 2022

I think this is an intermixing of layers. It would be cleaner if JFR accessed the current values from the outside, instead of JFR knowledge leaking into NMT.

Contributor Author

Not sure I agree; I see the MemReporter as the reporting layer, and it now knows about JFR to be able to report through it.

We have in other places created small helpers that send the events to avoid including jfrEvents.hpp all over the place, and we could do that here as well. But I don't see a problem with having this reporter reside alongside the other reporters.

Member

Okay.

<Event name="NativeMemoryUsagePart" category="Java Virtual Machine, Memory" label="Component Native Memory Usage" description="Native memory usage for a component" stackTrace="false" thread="false"
startTime="false" period="everyChunk">
<Field type="string" name="type" label="Memory Type" description="Component allocating the native memory" />
<Field type="ulong" contentType="bytes" name="reserved" label="Reserved Memory" description="Reserved bytes by this component" />
Member

See my comment above. I am not sure we need reserved. If not, we could cut out a lot of code.

}

MemBaseline usage;
usage.baseline(true);
Member

Note that this is quite expensive. So it depends on how often we do this. How often are these samples taken?

Eg. Baseline.baseline does walk all thread stacks (so, probing all of them via mincore()). It also copies a lot of counters around.

It is also not threadsafe. Are we at a safepoint here? Normally NMT reports are only done at safepoints.

Contributor Author

Haven't looked at the details of baseline(bool summaryOnly) that much, but since summaryOnly = true I don't think it actually walks the thread stacks, right?

We don't do this at a safepoint, but looking at MallocMemorySnapshot::copy_to(...), it uses ThreadCritical to avoid things being cleaned out at the same time. Not sure if there are other thread-safety problems though. I would expect this to have the same problems as a summary report triggered through jcmd, because that isn't run at a safepoint either. But I do see that when used from jcmd we take a lock to serialize the NMT query, so we should probably do the same here.

  // Query lock is used to synchronize the access to tracking data.
  // So far, it is only used by JCmd query, but it may be used by
  // other tools.
  static inline Mutex* query_lock() {
    assert(NMTQuery_lock != NULL, "not initialized!");
    return NMTQuery_lock;
  }

So we certainly need to look closer at this. I would like to understand why the query lock is needed.
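The locking question can be sketched in isolation (plain C++ with std::mutex, not HotSpot's Mutex/MutexLocker API; all names are hypothetical stand-ins): the lock only guards the shared stored baseline used by jcmd, so a sampler that fills a baseline it owns never needs it.

```cpp
#include <mutex>

// Hypothetical stand-in for an NMT summary baseline.
struct Baseline {
    long committed_bytes = 0;
};

// Sketch: the query lock guards the *stored* baseline that jcmd queries
// share; a JFR sampler that snapshots into a caller-owned baseline never
// touches that shared state, so it would not need the lock.
class MemTrackerSketch {
    std::mutex query_lock_;    // stands in for NMTQuery_lock
    Baseline stored_baseline_; // stands in for MemTracker::_baseline
public:
    // jcmd path: refreshes and returns the shared stored baseline.
    Baseline jcmd_summary(long current_committed) {
        std::lock_guard<std::mutex> guard(query_lock_);
        stored_baseline_.committed_bytes = current_committed;
        return stored_baseline_;
    }
    // JFR path: snapshots into a caller-owned baseline, lock-free.
    void jfr_snapshot(Baseline* out, long current_committed) const {
        out->committed_bytes = current_committed;
    }
};
```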

Contributor Author

Looking a bit more at this, it seems to me that what we are protecting with the lock is the stored baseline:

class MemTracker : AllStatic {
...
  // Stored baseline
  static MemBaseline      _baseline;
...
};

which the NMTDCmd code uses through MemTracker::get_baseline(). So if the JFR code keeps its own baseline we should be fine without the lock. Or am I missing something here?

Member

Hi @kstefanj ,

Haven't looked at the details of baseline(bool summaryOnly) that much, but since summaryOnly = true I don't think it actually walks the thread stacks, right?

I think we do, since summary info contains information about committed thread stack size. I think the call chain is

MemBaseline::baseline()->
MemBaseline::baseline_summary()->
VirtualMemorySummary::snapshot()->
VirtualMemoryTracker::snapshot_thread_stacks()->
VirtualMemoryTracker::walk_virtual_memory()->
SnapshotThreadStackWalker::do_allocation_site()->
RegionIterator::next_committed()->
os::committed_in_range()->
mincore loop
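The expensive step at the bottom of that chain can be sketched in isolation. The following Linux-only sketch (my own simplification, not the actual os::committed_in_range code) shows why the walk is costly: learning how much of a reserved range is committed means an mincore() probe over every page of it.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <vector>

// Linux-only simplification: count the committed (resident) bytes in a
// page-aligned reserved range by probing residency with mincore(). Cost
// is proportional to the number of pages in the range, which is what
// makes probing every thread stack O(threads * stack size).
size_t committed_bytes_in_range(void* base, size_t size) {
    size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
    size_t pages = (size + page - 1) / page;
    std::vector<unsigned char> vec(pages);
    if (mincore(base, size, vec.data()) != 0) return 0;
    size_t committed = 0;
    for (unsigned char b : vec) {
        if (b & 1) committed += page; // bit 0 set: page resident in memory
    }
    return committed;
}
```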

We don't do this at a safepoint but looking at MallocMemorySnapshot::copy_to(...) it uses ThreadCritical to avoid things being cleaned out at the same time. Not sure if there are other thread safety problems though. I would expect this to have the same problems as a summary report triggered through jcmd because that isn't run at a safepoint either.

You are right, I could have sworn we do this with a VM op. Must have confused this with metaspace printing.

But I do see that when used from jcmd we take a lock to serialize the NMT query, so we should probably do the same here.

  // Query lock is used to synchronize the access to tracking data.
  // So far, it is only used by JCmd query, but it may be used by
  // other tools.
  static inline Mutex* query_lock() {
    assert(NMTQuery_lock != NULL, "not initialized!");
    return NMTQuery_lock;
  }

So we certainly need to look closer at this. I would like to understand why the query lock is needed.

To serialize jcmds running in parallel. Since jcmd is handled by the attach listener thread, the only way I can think of would be if someone executes the command in parallel via an MBean? Not sure.

Note that there is no way to get consistent counter values across all categories anyway. All the counters are updated atomically without lock, and that is all we can afford in malloc. Counters are not synchronized among themselves.

Contributor Author

I think we do, since summary info contains information about committed thread stack size.

You are totally correct; I somehow missed that part of VirtualMemorySummary::snapshot(). So yes, we should really look at only doing one snapshot per period. Any feeling for exactly how expensive this walk is?

To serialize jcmds running in parallel. Since jcmd is done by the attach listener thread, the only way I can think of would be if someone in parallel executes the command via mbean? Not sure.

Ok, so if executed via the MBean it would be executed on another thread but still use DCmd::execute() and the lock to make sure those are synchronized. Anyhow, do you agree with my understanding above that what the lock protects is the "cached baseline"? So as long as JFR doesn't use it, the lock should not be needed?

Note that there is no way to get consistent counter values across all categories anyway. All the counters are updated atomically without lock, and that is all we can afford in malloc. Counters are not synchronized among themselves.

Agreed, this is a snapshot at a given point of a moving target.

Member

Looking a bit more at this, it seems to me that what we are protecting with the lock is the stored baseline:

class MemTracker : AllStatic {
...
  // Stored baseline
  static MemBaseline      _baseline;
...
};

which the NMTDCmd code uses through MemTracker::get_baseline(). So if the JFR code keeps its own baseline we should be fine without the lock. Or am I missing something here?

I think this should be safe.

But I am still thinking about thread stack probing. Your sampler will compete via ThreadCritical with thread creation and destruction (and a number of other unrelated things) because it needs to walk the virtual memory region chains. I think NMT reporting was never intended to be done at high volume.

I experimented a lot with a large number of threads and NMT thread stack probing. Its results are sometimes surprising, e.g. I regularly get much higher numbers from NMT than for process RSS. E.g. NMT tells me we have 4g committed thread stack when RSS is just 630M. This is without swapping.

So I am not sure how reliable thread stack probing actually is in reality. I know that Zhengyu put a lot of work into it and it needed a number of iterations. Since all the rest together is just reading out counters and reporting them, thread stack probing is the only unknown time-wise. It is O(n) over both number of threads and thread stack size and will potentially lock. We can improve matters somewhat with smarter locking, but the O(n) part remains.

Maybe leave that stuff out and just report the normal counters? Or can we live with large jitters here? This is a JFR question and I'm not a JFR developer :)

Contributor Author

Yes, maybe creating a special baseline-call that skips the thread-stacks would be ok. As you say, if these are expensive it might be problematic to do them in the periodic thread.

@tstuefe
Member

tstuefe commented Dec 1, 2022

A case for selecting a static set of values to report via JFR, rather than the "raw" NMT category mass, is that it not only hides implementation details but also makes defining a compound JFR event possible.

@kstefanj
Contributor Author

kstefanj commented Dec 2, 2022

The number of values may expand considerably in the future: we may want to use tags in a far more fine-granular manner than we do now, and/or change their encoding - eg to work in a hierarchy, or groups, or in combined an UL-like fashion. So their number may expand, and their meaning change, which could render this report obsolete quickly. E.g. if we add tag hierarchies, do we then only report leaf tags? How useful would that be? If we allow tag combinations, how would we report that?

As you know, I would love to see a better way to tag allocations than the current one. It is hard to answer how we would handle it when it comes, but the current event structure could already handle, for example, multiple tags. The type field could look like "gc,remset", leaving it to the parser to decide how to handle this data. I'm not saying this is perfect or the way to go, but I would not like to engineer the initial solution to fit a possible future improvement. That said, we should of course try to avoid making design decisions that will restrict us going forward. One way would be to let these new events be experimental at first.
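The "gc,remset" idea could be handled entirely on the consumer side with a trivial split, sketched here (a hypothetical helper, not part of the patch):

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical consumer-side helper: if the event's "type" field ever
// carried multiple comma-separated tags, e.g. "gc,remset", the event
// format would stay unchanged and the parser would split the field.
std::vector<std::string> split_tags(const std::string& type) {
    std::vector<std::string> tags;
    std::istringstream in(type);
    std::string tag;
    while (std::getline(in, tag, ',')) {
        if (!tag.empty()) tags.push_back(tag);
    }
    return tags;
}
```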

Also note that I currently work on a patch for showing NMT peak malloc values, see https://bugs.openjdk.org/browse/JDK-8297958. Peak values are very useful to have. So, do we expose them too? One more value per category. But leaving them out would render the JFR NMT report less useful.

This is another good question; my first thought was to include more information, like the split between malloc'd and "mapped", but after some discussions we landed on just the two current values. I don't see a problem adding the peak value, but would it also require the "normal" malloc value to make sense?

Bottomline, I am not yet convinced that reporting all NMT categories is that useful. And it exposes implementation details that may cause breakage in the future. We could restrict them to a subset of useful ones, and only report that.

I'm not against this; for obvious reasons I care mostly about GC, and I could see a benefit in just reporting a few "subsets". The problem I see with this is that someone might need something other than what we defined. We can of course add more later on, but right now, when it is just around 30 events, I think sending all is ok.

Another thought, for virtual memory mappings you report reserved and committed. But I doubt that "reserved" is really of much use. In itself, it does not cost anything, at least not on 64-bit. For a select few categories, it can signify the largest amount of committable memory (e.g. heap and code space) but those are already reported in JFR. So I think we could omit "reserved" and save a bunch of code and make the NMT JFR report less overwhelming.

I see your point, and I agree that committed is the most interesting one. The way I use it is exactly as you describe, it is an easy way to determine the max and even if that information is available through other events it is quite nice to have it bundled together.

@tstuefe
Member

tstuefe commented Dec 2, 2022

Okay, you are convincing me :) You pre-selected a range of values to transmit, and that selection is necessarily biased and subjective, and others may find other selections valuable. But since it's a matter of taste anyway, let's go with printing all tags as they are now. If we ever rebuild the tag system we'll think of something. I think omitting malloc vs mmap and the malloc count is okay, and you are right, malloc peak only makes sense if we report malloc.

I would still argue for leaving out reserved, but won't press the point. If you and other reviewers think it's useful, let's report it.

@egahlin
Member

egahlin commented Dec 2, 2022

I would still argue for leaving out reserved, but won't press the point. If you and other reviewers think it's useful, let's report it.

If we feel the events or some fields are not mature, we could mark them experimental. For the events to be emitted, users will need to pass a command line flag, so I think 99% of all users will never enable them. I think these events will be mostly useful for Hotspot developers and support engineers.

@jpbempel
Member

jpbempel commented Dec 2, 2022

I think these events will be mostly useful for Hotspot developers and support engineers.

I respectfully disagree on that. In a container world, it's common to be hit by an OOMKill when trying to size the JVM correctly. Most people think only about the Java heap size and forget about the native part. And when you do, there are few controls or ways to troubleshoot why the native part is growing beyond what you expect. The only useful tool in this situation is NMT, to be able to triage which space is the culprit.
See https://blog.arkey.fr/2020/11/30/off-heap-reconnaissance/

@tstuefe
Member

tstuefe commented Dec 2, 2022

I think these events will be mostly useful for Hotspot developers and support engineers.

I respectfully disagree on that. In a container world, it's common to be hit by an OOMKill when trying to size the JVM correctly. Most people think only about the Java heap size and forget about the native part. And when you do, there are few controls or ways to troubleshoot why the native part is growing beyond what you expect. The only useful tool in this situation is NMT, to be able to triage which space is the culprit. See https://blog.arkey.fr/2020/11/30/off-heap-reconnaissance/

Hi @jpbempel,

this is currently being discussed as part of https://mail.openjdk.org/pipermail/core-libs-dev/2022-December/097391.html and https://mail.openjdk.org/pipermail/core-libs-dev/2022-November/096369.html. Feel free to take part! I am especially interested in seeing concrete examples where NMT did help customers to lower memory footprint.

Cheers, Thomas

@kstefanj
Contributor Author

kstefanj commented Dec 2, 2022

I would still argue for leaving out reserved, but won't press the point. If you and other reviewers think it's useful, let's report it.

If we feel the events or some fields are not mature, we could mark them experimental. For the events to be emitted, users will need to pass a command line flag, so I think 99% of all users will never enable them. I think these events will be mostly useful for Hotspot developers and support engineers.

For experimental events, I think they are emitted as normal events, just not shown by default in JMC, right?

@bric3

bric3 commented Dec 2, 2022

Feel free to take part! I am especially interested in seeing concrete examples where NMT did help customers to lower memory footprint.

Actually it's not so much about reducing the memory footprint as about properly sizing the memory constraints in a deployment unit (either the container and/or the JVM itself). That said, this story goes beyond components' memory as reported by NMT, to the actual process RSS as the dominant metric against OOMKills.

@kstefanj
Contributor Author

kstefanj commented Dec 2, 2022

Pushed some changes inspired by the discussion here. Not all things have been handled, but I think this is a step in the right direction.

  • The baseline is re-used if newer than 50ms, so if the two events are configured to occur with the same period they should be using the same baseline. I also save the time of the baseline and use that in all events using that baseline.
  • I swapped the event names, or rather NativeMemoryUsage is now the event with a type, and we instead have NativeMemoryUsageTotal for the total event. This way we don't have to come up with something better for "part", and "total" is good.
  • Also moved the JFR reporting to its own file to get better separation.

Still some more open questions, like:

  • Should we include reserved?
  • Should we skip thread probing when doing the baseline?

@tstuefe
Member

tstuefe commented Dec 2, 2022

Pushed some changes inspired by the discussion here. Not all things have been handled, but I think this is a step in the right direction.

* The baseline is re-used if newer than 50ms, so if the two events are configured to occur with the same period they should be using the same baseline. I also save the  time of the baseline and use that in all events using that baseline.

I did some tests and on my box probing 20k threads - if we get ThreadCritical right away - takes about 50-100ms.

* I swapped the event-names, or rather `NativeMemoryUsage` is now the event with a type, and we instead have `NativeMemoryUsageTotal` for the total event. This way we don't have to come up with something better for "part", and "total" is good.

* Also moved the JFR-reporting to its own file to get better separation.

I like this, thank you.

@egahlin
Member

egahlin commented Dec 2, 2022

For experimental events, I think they are emitted as normal events, just not shown by default in JMC, right?

Yes, you need to enable a checkbox in a preference dialog.

@egahlin
Member

egahlin commented Dec 2, 2022

I respectfully disagree on that. In a container world, it's common to be hit by an OOMKill when trying to size the JVM correctly. Most people think only about the Java heap size and forget about the native part. And when you do, there are few controls or ways to troubleshoot why the native part is growing beyond what you expect. The only useful tool in this situation is NMT, to be able to triage which space is the culprit. See https://blog.arkey.fr/2020/11/30/off-heap-reconnaissance/

Thanks, good to know.

@jpbempel
Member

jpbempel commented Dec 2, 2022


Also, some are desperate enough to get NMT stats/metrics dynamically that they have built their own tooling:
https://github.com/mcabaj/nmt-metrics
https://github.com/opentable/otj-jvm/blob/master/src/main/java/com/opentable/jvm/Nmt.java

I think I have also seen these published as a JMX MBean.

@tstuefe
Member

tstuefe commented Dec 2, 2022

SapMachine can generate it dynamically on low error situations. https://github.com/SAP/SapMachine/wiki/SapMachine-High-Memory-Reports

@kstefanj
Contributor Author

kstefanj commented Dec 6, 2022

Updated the change as discussed above. Summary of the changes:

  • New MemSnapshot class that can be configured to not walk the thread stacks
  • Using this for JFR events (thread stacks are not walked)
  • Skipping mtNone when sending events
  • Not grouping mtThread and mtThreadStack as the text report does; they are reported as individual events.
  • Handling mtChunk as MemBaseline does to avoid double counting, which means holding a ThreadCritical over the "malloc snapshot" to make sure we have consistent values.

I'm not totally convinced there need to be options to skip the malloc and vm snapshots, but I included them since they were proposed and discussed. I have no strong opinion, but I think they could always be included.
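The mtChunk handling in the last bullet can be sketched with made-up numbers (the class and all values below are illustrative only): arena chunks are malloc'ed under mtChunk, while each component also reports its own arena usage, so the accumulated arena size has to be deducted from mtChunk or those bytes would be counted twice.

```java
public class ChunkAccountingSketch {
    // Arena chunk memory is malloc'ed under mtChunk, but also attributed to
    // each component as arena usage; deduct the accumulated arena size so
    // the same bytes are not counted twice.
    static long adjustedChunk(long chunkMallocBytes, long totalArenaBytes) {
        return chunkMallocBytes - totalArenaBytes;
    }

    public static void main(String[] args) {
        long compilerUsage = 100 + 400; // mtCompiler: malloc + arena
        long threadUsage   = 50 + 500;  // mtThread: malloc + arena
        long chunkUsage    = adjustedChunk(900, 400 + 500); // mtChunk left over: 0

        // 100 + 50 bytes of plain malloc plus 900 bytes of chunk-backed
        // arenas = 1050, with each byte showing up in exactly one bucket.
        System.out.println(compilerUsage + threadUsage + chunkUsage);
    }
}
```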

Member

@tstuefe tstuefe left a comment

Hi @kstefanj ,

Getting closer. See remarks. I did not look closely at the JFR side of things, just the NMT code.

Cheers, Thomas

// the two JFR events that are using the same data.
class MemJFRSnapshot : public AllStatic {
private:
// The baseline age threshold in millie seconds. If older
Member

s/millie seconds/milliseconds?

_malloc_snapshot[i] = mm->malloc_size() + mm->arena_size();
total_arena_size += mm->arena_size();
}
assert(total_arena_size == ms->total_arena(), "Mismatch in accounting");
Member

Could fail intermittently since all counters are updated asynchronously. The snapshot can be modified below us while we assemble this MemSnapshot. I think that's fine, but I would just use the accumulated counter here.
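The race this comment describes can be made concrete with a deterministic, single-threaded stand-in (names and numbers are invented; real NMT counters are plain fields updated by allocating threads):

```java
public class SnapshotConsistencySketch {
    // Stand-ins for NMT's per-type counters and the accumulated total,
    // which are updated asynchronously by other threads.
    static long[] byType = {100, 200};
    static long total = 300;

    // Simulates an allocation recorded by another thread mid-snapshot.
    static void concurrentUpdate() {
        byType[1] += 50;
        total += 50;
    }

    public static void main(String[] args) {
        long sum = byType[0] + byType[1]; // assemble the snapshot: 300
        concurrentUpdate();               // a counter update sneaks in
        // An assert comparing the assembled sum against the live total
        // would now fail intermittently: 300 != 350.
        System.out.println(sum + " vs " + total);
    }
}
```

Using the accumulated value from the snapshot being assembled, rather than the live total, avoids the spurious mismatch.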

Contributor Author

Good point, I was about to add other verification as well but stayed away just because of this issue. But I didn't think about it when I added this one.

bool include_vm;
};

class MemSnapshot : public CHeapObj<mtNMT> {
Member

@tstuefe tstuefe Dec 6, 2022

Can we give this a name other than Snapshot? We already have MallocMemorySnapshot and methods like "as_snapshot", which have nothing to do with this layer.

proposals:

NMTData
NMTStatistics
NMTStatSet

....
....

Contributor Author

I assume you don't think NMTSnapshot is much better =) I can't come up with anything much better than your suggestions; NMTUsage is not perfect either (sounds like the usage caused by NMT). If we don't go with the NMT prefix, I think something like MemUsage or NativeMemoryUsage would make sense. Looking at the other reporting classes connected to NMT, they are named Mem-something. So I think MemUsage, what do you think?

Member

Good. Up to you.

I wish we would allow C++ namespaces, having all in something like "namespace nmt". Then it does not hurt so much if names are a bit more generic.

Contributor Author

I'll go with MemUsage, normally I would have spent some more time trying to come up with something better, but Thursday is getting closer :)

Member

We always can revamp.

void MemSnapshot::snap() {
if (_snapshot_options.update_thread_stacks) {
walk_thread_stacks();
}
Member

It does not make sense to walk_thread_stacks independently from the VM snapshot, since it piggy-backs on it. I would move this into the _snapshot_options.include_vm condition.

Contributor Author

True, and this is why I kind of think malloc and vm should not be optional. But I can move this into the "vm" condition.

// ThreadCritical needed to keep values in sync; total arena size
// is deducted from mtChunk in the end to give correct values.
ThreadCritical tc;
MallocMemorySnapshot* ms = MallocMemorySummary::as_snapshot();
Member

@tstuefe tstuefe Dec 6, 2022

make const MallocMemorySnapshot* - this is the one true copy of the real counters, since as_snapshot does not copy. We don't want to modify it.

size_t total_arena_size = 0;
for (int i = 0; i < mt_number_of_types; i++) {
MEMFLAGS flag = NMTUtil::index_to_flag(i);
MallocMemory* mm = ms->by_type(flag);
Member

Make const too, and could you please add a const variant for MallocMemorySnapshot::by_type?


// Total virtual memory size.
_vm_total.reserved = vms->total_reserved();
_vm_total.committed = vms->total_committed();
Member

Here too, this is the live snapshot which can be modified while we iterate over its counters. If you want to appear consistent, I would add up committed and reserved above and use those instead of the total counter from vms.

@kstefanj
Contributor Author

kstefanj commented Dec 6, 2022

Another thing @tstuefe, have you double checked my accounting around mtChunk? I tried to mimic what's done for the baseline and normal print-reports. Looks correct?

@tstuefe
Member

tstuefe commented Dec 6, 2022

Another thing @tstuefe, have you double checked my accounting around mtChunk? I tried to mimic what's done for the baseline and normal print-reports. Looks correct?

Forget my last comment about mtChunk. I think what you do is correct.

@kstefanj
Contributor Author

kstefanj commented Dec 6, 2022

Another thing @tstuefe, have you double checked my accounting around mtChunk? I tried to mimic what's done for the baseline and normal print-reports. Looks correct?

Forget my last comment about mtChunk. I think what you do is correct.

Thanks for double checking it and for the explanation in the other PR.

@kstefanj
Contributor Author

kstefanj commented Dec 6, 2022

I hope all comments have been addressed now. I went with NMTUsage after all; it turns out there was already a MemoryUsage class in the same folder.

Member

@tstuefe tstuefe left a comment

NMT changes look good to me. Thanks for taking my input :)

@openjdk

openjdk bot commented Dec 6, 2022

@kstefanj This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8157023: Integrate NMT with JFR

Reviewed-by: stuefe, mgronlun, egahlin

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 173 new commits pushed to the master branch:

  • 6ed3683: 8297209: Serial: Refactor GenCollectedHeap::full_process_roots
  • 86270e3: 8269820: C2 PhaseIdealLoop::do_unroll get wrong opaque node
  • cf63f2e: 8298184: Incorrect record component type in record patterns
  • 58170f6: 8298035: Provide better descriptions for JIT compiler JFR events
  • bfcc238: 8297964: Jetty.java fails "assert(_no_handle_mark_nesting == 0) failed: allocating handle inside NoHandleMark"
  • 3e041eb: 8298248: Limit sscanf output width in cgroup file parsers
  • 4da8411: 8298108: Add a regression test for JDK-8297684
  • 80cbfab: 8298169: Remove unused methods in space.hpp
  • 221e1a4: 8286666: JEP 429: Implementation of Scoped Values (Incubator)
  • ccc69af: 8296672: Implementation of Virtual Threads (Second Preview)
  • ... and 163 more: https://git.openjdk.org/jdk/compare/2deb318c9f047ec5a4b160d66a4b52f93688ec42...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 6, 2022
@kstefanj
Contributor Author

kstefanj commented Dec 6, 2022

Thanks @tstuefe, will try to get a JFR reviewer on board as well and get this in before the fork.

}

void MemJFRReporter::send_type_event(const Ticks& starttime, const char* type, size_t reserved, size_t committed) {
EventNativeMemoryUsage event;

Since you are explicitly setting the startTime field yourself, you can pass 'UNTIMED' to the event constructor. This will bypass the automatic setting of _starttime as part of the constructor.

EventNativeMemoryUsage event(UNTIMED);

@@ -706,6 +706,19 @@
<Field type="OldObjectGcRoot" name="root" label="GC Root" />
</Event>

<Event name="NativeMemoryUsage" category="Java Virtual Machine, Memory" label="Native Memory Usage Per Type"
description="Native memory usage for a given memory type in the JVM" period="everyChunk">
<Field type="string" name="type" label="Memory Type" description="Type used for the native memory allocation" />

Suggest turning this type "string" into a JFR-constant type, like "JfrNMTType" to only send id's, instead of the entire strings. Can be done in a follow-up change.

Contributor Author

I prefer to do this at a later point, created JDK-8298278 to track this.

NMTUsage* usage = MemJFRCurrentUsage::get_usage();
Ticks timestamp = MemJFRCurrentUsage::get_timestamp();

EventNativeMemoryUsageTotal event;

Since you are explicitly setting the startTime field yourself, you can pass 'UNTIMED' to the event constructor. This will bypass the automatic setting of _starttime as part of the constructor.

EventNativeMemoryUsageTotal event(UNTIMED);


// Helper class to avoid refreshing the NMTUsage too often and allow
// the two JFR events to use the same data.
class MemJFRCurrentUsage : public AllStatic {
@mgronlun mgronlun Dec 7, 2022

Move to the .cpp file because it is only used internally. It does not even have to be a class; it could be just static fields and accessors. But no big deal, at your will.

Contributor Author

Moved it but kept the class structure to better show how things hang together :)

@@ -512,6 +512,16 @@
<setting name="cutoff" control="old-objects-cutoff">0 ns</setting>
</event>

<event name="jdk.NativeMemoryUsage">
<setting name="enabled" control="gc-enabled-normal">true</setting>

gc-enabled-normal? Should this really be tied to a gc control? An NMT control to be introduced?

Member

Since users need to pass a command line flag to enable the event, I think it can be on by default and use the included gc-enabled-normal

Contributor Author

Talked to @egahlin about this; it is maybe not optimal to tie it to a gc control, but it is fine for now. This does not affect the VM, just JMC.
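For context, the end-to-end flow once this is integrated looks roughly like this (a sketch: `MyApp` and the recording file name are placeholders; the event names follow the metadata shown in this PR):

```sh
# NMT must be enabled on the command line for the events to carry data
java -XX:NativeMemoryTracking=summary \
     -XX:StartFlightRecording=duration=60s,filename=nmt.jfr \
     MyApp

# Afterwards, inspect the periodic events with the jfr tool
jfr print --events NativeMemoryUsage,NativeMemoryUsageTotal nmt.jfr
```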


private static void generateEvents(Recording recording) throws Exception {
// Enable the two types of events for "everyChunk", it will give
// an event att the beginning of the chunk as well as the end.
@mgronlun mgronlun Dec 7, 2022

"att" -> "at"

@kstefanj
Contributor Author

kstefanj commented Dec 7, 2022

Thanks for the review @mgronlun, will wait a bit for GH actions and some other testing to finish before integrating.

@kstefanj
Contributor Author

kstefanj commented Dec 7, 2022

Thanks for the review @egahlin

@tstuefe
Member

tstuefe commented Dec 7, 2022

Still good. I like that the usage classes moved out of the interface.

@kstefanj
Contributor Author

kstefanj commented Dec 7, 2022

/integrate

@openjdk

openjdk bot commented Dec 7, 2022

Going to push as commit 3b8c7ef.
Since your change was applied there have been 180 commits pushed to the master branch:

  • e86f31b: 8298301: C2: assert(main_cmp->in(2)->Opcode() == Op_Opaque1) failed: main loop has no opaque node?
  • 8edb98d: 8165943: LineBreakMeasurer does not measure correctly if TextAttribute.TRACKING is set.
  • 3934484: 8298205: Prefer Member Initialization Lists for JFR classes in os_perf.hpp
  • 389b8f4: 8297298: SequenceInputStream should override transferTo
  • dd7385d: 8298202: [AIX] Dead code elimination removed jfr constructor used by AIX
  • 29f1c3c: 8298274: Problem list TestSPISigned on Windows
  • 3de7750: 8298177: Various java.lang.invoke cleanups
  • 6ed3683: 8297209: Serial: Refactor GenCollectedHeap::full_process_roots
  • 86270e3: 8269820: C2 PhaseIdealLoop::do_unroll get wrong opaque node
  • cf63f2e: 8298184: Incorrect record component type in record patterns
  • ... and 170 more: https://git.openjdk.org/jdk/compare/2deb318c9f047ec5a4b160d66a4b52f93688ec42...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Dec 7, 2022
@openjdk openjdk bot closed this Dec 7, 2022
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Dec 7, 2022
@openjdk

openjdk bot commented Dec 7, 2022

@kstefanj Pushed as commit 3b8c7ef.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@dholmes-ora
Member

The new test is failing in our tier3 testing.
