-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8203359: Container level resources events #3126
Conversation
👋 Welcome back jbachorik! A progress list of the required criteria for merging this PR into |
@jbachorik The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
@jbachorik Would it make sense for |
Does each getter call result in parsing /proc, or do things aggregated over several calls or hooks? Do you have any data how expensive the invocations are? You could for example try to measure it by temporary making the events durational, and fetch the values between begin() and end(), and perhaps show a 'jfr print --events Container* recording.jfr' printout. If possible, it would be interesting to get some idea about the startup cost as well If not too much overhead, I think it would be nice to skip the "flag" in the .jfcs, and always record the events in a container environment. I know there is a way to test JFR using Docker, maybe @mseledts could provide information? Some sanity tests would be good to have. |
From the looks of it the event emitting code uses On the hotspot side, we implemented some caching for frequent calls (JDK-8232207, JDK-8227006), but we didn't do that yet for the Java side since there wasn't any need (so far). If calls are becoming frequent with this it should be reconsidered. So +1 on getting some data on what the perf penalty of this is. |
Thanks to all for chiming in! I have added the tests to As for the performance - as expected, extracting the data from Caching of cgroups parsed data would help if the emission period is shorter than the cache TTL. This is exacerbated by the fact that (almost) each container event type requires data from a different cgroups control file - hence the data will not be shared between the event type instances even if cached. Realistically, caching benefits would become visible only for sub-second emission periods. If the caching is still required I would suggest having a follow up ticket just for that - it will require setting up some benchmarks to justify the changes that would need to be done in the metrics implementation. |
I tried to measure the startup regression and here are my observations:
In order to minimize the effect this change will have on the startup I would suggest using conditional registration unless I hear strong objections to that. |
src/jdk.jfr/share/classes/jdk/jfr/internal/instrument/JDKEvents.java
Outdated
Show resolved
Hide resolved
f4372e2
to
67a61bd
Compare
src/jdk.jfr/share/classes/jdk/jfr/events/ContainerCPUThrottlingEvent.java
Show resolved
Hide resolved
src/jdk.jfr/share/classes/jdk/jfr/events/ContainerCPUThrottlingEvent.java
Outdated
Show resolved
Hide resolved
src/jdk.jfr/share/classes/jdk/jfr/events/ContainerCPUUsageEvent.java
Outdated
Show resolved
Hide resolved
src/jdk.jfr/share/classes/jdk/jfr/events/ContainerConfigurationEvent.java
Outdated
Show resolved
Hide resolved
src/jdk.jfr/share/classes/jdk/jfr/events/ContainerConfigurationEvent.java
Outdated
Show resolved
Hide resolved
src/jdk.jfr/share/classes/jdk/jfr/events/ContainerIOUsageEvent.java
Outdated
Show resolved
Hide resolved
src/jdk.jfr/share/classes/jdk/jfr/events/ContainerMemoryUsageEvent.java
Outdated
Show resolved
Hide resolved
I wonder if something similar to below could be added to jdk.jfr.internal.Utils:
Then we could add checks to jdk.jfr.internal.JVMUpcalls::bytesForEagerInstrumentation(...)
and jdk.jfr.internal.JVMUpcalls:onRetransform(...)
This way we would not pay for generating bytecode for events in a non-container environment. Not sure if it works, but could perhaps make startup faster? We would still pay for generating the event handlers during registration, but it's much trickier to avoid since we need to store the event type somewhere. |
@egahlin Sounds good. |
f766faf
to
04c3f09
Compare
|
||
private static Metrics getMetrics() { | ||
if (metrics == null) { | ||
metrics = Metrics.systemMetrics(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this not lead to a lookup every time in an non-container environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Now I see why you used Metrics[]
- will fix.
@@ -719,6 +725,20 @@ public static String formatDuration(Duration d) { | |||
} | |||
} | |||
|
|||
public static boolean shouldSkipBytecode(String eventName, Class<?> superClass) { | |||
if (!superClass.getName().equals("jdk.jfr.events.AbstractJDKEvent")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a problem checking against the class instance? If so, perhaps you could add a check that the class is in the boot class loader (null).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, AbstractJDKEvent
is package private so it is not accessible from here.
@jbachorik Has this been tested on cgroups v1 and cgroups v2 Linux systems? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbachorik The test needs fixing.
OK. I've tested the latest iteration on both (cgroup v2 and cgroup v1). Testing looks good other than the |
04c3f09
to
fddd172
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but if there are test issues they should be fixed.
@jbachorik This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 161 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the |
@Label("Memory Pressure") | ||
@Description("(attempts per second * 1000), if enabled, that the operating system tries to satisfy a memory request for any " + | ||
"process in the current container when no free memory is readily available.") | ||
public double memoryPressure; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this memoryPressure
field go from ContainerMemoryUsageEvent
class? It's not set anywhere is it? would be cgroup v1 only api so I'm not sure it should be there for a generic event like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Removing.
fddd172
to
c3fa274
Compare
c3fa274
to
e0f5855
Compare
Thanks for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@egahlin Unfortunately, I had to make one late change in the periodic event hook registration. |
It's not unfortunate :-) I think we should always register the metadata, even if you can't get the event. That's how we handle different GCs. Users must always be able to explore events. For example, you should be able to configure container events in JMC (with correct labels/descriptions) without actually connecting to a JVM running in a Docker container. I think you need to add the hook, for the event metadata to be correct. Otherwise, the "period" setting will not show up. |
Yes. The failed test log would indicate also the rest of the metadata not being in a good shape. But with the hook registered everything works fine. |
/integrate |
@jbachorik Since your change was applied there have been 186 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit ee2651b. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
With this change it becomes possible to surface various cgroup level metrics (available via
jdk.internal.platform.Metrics
) as JFR events.Only a subset of the metrics exposed by
jdk.internal.platform.Metrics
is turned into JFR events to start with.For each of those subsystems a configuration data will be emitted as well. The initial proposal is to emit the configuration data events at least once per chunk and the metrics values at 30 seconds interval.
By using these values the emitted events seem to contain useful information without increasing overhead (the metrics values are read from
/proc
filesystem so that should not be done too frequently).Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/3126/head:pull/3126
$ git checkout pull/3126
Update a local copy of the PR:
$ git checkout pull/3126
$ git pull https://git.openjdk.java.net/jdk pull/3126/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 3126
View PR using the GUI difftool:
$ git pr show -t 3126
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/3126.diff