-
Notifications
You must be signed in to change notification settings - Fork 782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Draft] Initial System metrics proposal #1323
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1323 +/- ##
============================================
- Coverage 87.80% 85.81% -1.99%
Complexity 1215 1215
============================================
Files 155 156 +1
Lines 4558 4666 +108
Branches 418 421 +3
============================================
+ Hits 4002 4004 +2
- Misses 400 507 +107
+ Partials 156 155 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First question why do you need the ask dependency?
Shouldn't this be a pure API contrib-module? Or be a part of the instrumentation project? |
Not sure what you mean. Can you elaborate @bogdandrutu ? |
That's a good question. We need the SDK only for creating the |
makes sense to me |
@@ -137,6 +137,7 @@ configure(opentelemetryProjects) { | |||
jmh_core : "org.openjdk.jmh:jmh-core:${jmhVersion}", | |||
jmh_bytecode : "org.openjdk.jmh:jmh-generator-bytecode:${jmhVersion}", | |||
jsr305 : "com.google.code.findbugs:jsr305:${findBugsJsr305Version}", | |||
oshi : "com.github.oshi:oshi-core:5.1.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JNA libraries tend to be pretty bloated since they basically duplicate a native interface into Java, and for cross-platform ones have a ton of these. This seems to be 600K along with JNA itself which is 3MB+ (SDK is just 250K for reference). JDK's monitoring beans provide a lot of useful information so maybe just focus on them for now and can consider adding the huge dependency in a followup if the delta in terms of metrics seems useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to use the monitoring beans if we get all the metrics defined here available. Oshi is a very well known library but I can try to not use it because of its size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess if it's just about system metrics this could be helpful. But it seems more useful to users to focus on JVM metrics first? None of the popular Java metrics libraries seem to use oshi but provide very useful runtime metrics without much binary cost - such metrics should basically "always be enabled" on an instrumented app while this seems much more specialized and should just add the metrics that can't be gotten for free from the JDK.
https://github.com/dropwizard/metrics/tree/release/4.1.x/metrics-jvm/src/main/java/com/codahale/metrics/jvm
https://github.com/prometheus/client_java/tree/master/simpleclient_hotspot/src/main/java/io/prometheus/client/hotspot
https://github.com/micrometer-metrics/micrometer/tree/b486e1261cade606eb0ff826fbae318f6d0c9e86/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/system
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems much more specialized and should just add the metrics that can't be gotten for free from the JDK.
That's correct, and this is the precisely the idea: to provide a set of complete system-wide metrics (and was the reason why I wanted to initially put it here, as a separate, extra artifact that some users can use).
@trask is there a plan to support external instrumentation jars? In that case, this could be provided at runtime. Also, what's your take on having Oshi as a dependency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of the metrics here, like system cpu load, are available from the JDK though. And we definitely wouldn't want double instrumentation of the same metric if people added both, e.g., instrumentation-jvm
and instrumentation-system
. So recommendation was to start with the less complex, free stuff since that gives a better picture of what's needed in the complex, dependency-heavy version. So this feels like it's the wrong order to be addressing the issue in IMO. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be a rather small codebase, so I don't see the need to postpone the discussion on what to use/report ;) I'd argue that it's not a wrong order situation, as I mentioned before, the idea is precisely to provide system-wide metrics.
we definitely wouldn't want double instrumentation of the same metric if people added both
It's an interesting scenario to have them separated. But in that case I'd rather have the system-wide disabled or something similar. Unless there's a clear path that justifies having them separated (in that case I'b stick to that, of course).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trask is there a plan to support external instrumentation jars?
ya, we need to support users writing/bringing their own instrumentation somehow, but we haven't gotten to that yet
Also, what's your take on having Oshi as a dependency?
well, the auto agent jar is already 28mb, so what's another 3mb? 😭
(side note: maybe we should use the all
classifier after all, to be clear that it's the kitchen sink of distributions)
I agree that it would be nice to split this into two modules, one that only has only JVM (mbean) metrics, and one that has only system (oshi) metrics.
The JVM metric module could detect if the system metrics module is on the classpath and disable anything that's duplicative with the system metrics (e.g. don't capture cpu total
via JVM metrics, since system metrics captures cpu system
+ user
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having the jvm module disable metrics if oshi is being used is a nice idea.
Size-wise, let's not forget we are trying to move towards also supporting manual instrumentation, not just agent. Though 10% of the entire agent seems pretty hefty FWIW. System metrics also have the issue that multiple jvms on the same system will also provide redundant metrics, common with containers. That should be taken into account too when evaluating the cost, it does take actual RAM to load large ffi modules. I would expect if packaged into the agent jar it's not enabled by default because of this potentially exploding cost to users.
That's enough high level context, for this PR how about removing the GC metrics and only take stuff from oshi? Can add GC metrics to a separate jvm metrics module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah found that we actually already have such a class too
Line 66 in 596617d
String[] label = {GC_LABEL_KEY, gc.getName()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it would be nice to split this into two modules, one that only has only JVM (mbean) metrics, and one that has only system (oshi) metrics.
That's fine by me, as long as one module is for JVM-specific metrics (the current metrics
module in contrib, with a few metric additions) and have this one is for System-wide metrics (which could be disabled by default). However, skipping the Oshi dependency wouldn't be avoided ;)
Btw, related effort: open-telemetry/opentelemetry-specification#651
@@ -137,6 +137,7 @@ configure(opentelemetryProjects) { | |||
jmh_core : "org.openjdk.jmh:jmh-core:${jmhVersion}", | |||
jmh_bytecode : "org.openjdk.jmh:jmh-generator-bytecode:${jmhVersion}", | |||
jsr305 : "com.google.code.findbugs:jsr305:${findBugsJsr305Version}", | |||
oshi : "com.github.oshi:oshi-core:5.1.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OSHI 4.x and 5.x require minimum Java 8 compatibility
. We still support Java7, don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API and the SDK do, but that doesn't mean that all instrumentation has to.
I think it would make sense to have this in the instrumentation repo. |
r.observe(mem.getTotal(), TYPE_LABEL_KEY, "total"); | ||
r.observe(mem.getAvailable(), TYPE_LABEL_KEY, "available"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JVM mbeans provide available
, but no total
that i know of
r.observe(cpuTicks[TickType.SYSTEM.getIndex()] * 1000, TYPE_LABEL_KEY, "system"); | ||
r.observe(cpuTicks[TickType.USER.getIndex()] * 1000, TYPE_LABEL_KEY, "user"); | ||
r.observe(cpuTicks[TickType.IDLE.getIndex()] * 1000, TYPE_LABEL_KEY, "idle"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JVM mbeans provide system cpu as a single number (i assume system
+ user
?), no breakdown of system
/user
that i know of
r.observe(recv, TYPE_LABEL_KEY, "bytes_recv"); | ||
r.observe(sent, TYPE_LABEL_KEY, "bytes_sent"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JVM mbeans do not provide that i know of
r.observe(processInfo.getResidentSetSize(), TYPE_LABEL_KEY, "rss"); | ||
r.observe(processInfo.getVirtualSize(), TYPE_LABEL_KEY, "vms"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JVM mbeans provide used
memory, but i don't think this correlates to either rss
or vsz
(btw, i think vsz
is typical abbreviation not vms
)
r.observe(processInfo.getUserTime() * 1000, TYPE_LABEL_KEY, "user"); | ||
r.observe(processInfo.getKernelTime() * 1000, TYPE_LABEL_KEY, "system"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JVM mbeans provide process cpu as a single number (i assume system
+ user
?), no breakdown of system
/user
that i know of
@@ -137,6 +137,7 @@ configure(opentelemetryProjects) { | |||
jmh_core : "org.openjdk.jmh:jmh-core:${jmhVersion}", | |||
jmh_bytecode : "org.openjdk.jmh:jmh-generator-bytecode:${jmhVersion}", | |||
jsr305 : "com.google.code.findbugs:jsr305:${findBugsJsr305Version}", | |||
oshi : "com.github.oshi:oshi-core:5.1.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trask is there a plan to support external instrumentation jars?
ya, we need to support users writing/bringing their own instrumentation somehow, but we haven't gotten to that yet
Also, what's your take on having Oshi as a dependency?
well, the auto agent jar is already 28mb, so what's another 3mb? 😭
(side note: maybe we should use the all
classifier after all, to be clear that it's the kitchen sink of distributions)
I agree that it would be nice to split this into two modules, one that only has only JVM (mbean) metrics, and one that has only system (oshi) metrics.
The JVM metric module could detect if the system metrics module is on the classpath and disable anything that's duplicative with the system metrics (e.g. don't capture cpu total
via JVM metrics, since system metrics captures cpu system
+ user
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just some comments related to what's available (or not available) via JVM (mbean) metrics
Closing as we will be moving this one to the java-instrumentation repo. |
This is an initial draft PR for having an SDK-contrib library that automatically reports a few system metrics (initial implementations for other languages exist, such as Python) using observers.
The Oshi library is used to gather hardware information, and the reported metrics are (in either
bytes
orseconds
, depending on the metric itself):Internally the class is creating its own
IntervalMetricReader
. Also, these metric names will be updated when we have the related semantic conventions.Tests, configuration support and code lifting code (such as maybe moving the small
Callback
code to their own file) is needed, but wanted to get initial feedback on this feature.