Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] Initial System metrics proposal #1323

Closed

Conversation

carlosalberto
Copy link
Contributor

This is an initial draft PR for having an SDK-contrib library that automatically reports a few system metrics (initial implementations for other languages exist, such as Python) using observers.

The Oshi library is used to gather hardware information, and the reported metrics are (in either bytes or seconds, depending on the metric itself):

  • System memory (total and available).
  • System CPU (system, user time and idle time)
  • System Network Bytes (sent and received)
  • Java Process memory (rss and vms)
  • Java Process CPU (system and user time)
  • Java GC (only GC count at this moment).

Internally the class is creating its own IntervalMetricReader. Also, these metric names will be updated when we have the related semantic conventions.

Tests, configuration support and code lifting code (such as maybe moving the small Callback code to their own file) is needed, but wanted to get initial feedback on this feature.

@codecov
Copy link

codecov bot commented Jun 9, 2020

Codecov Report

Merging #1323 into master will decrease coverage by 1.98%.
The diff coverage is 0.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #1323      +/-   ##
============================================
- Coverage     87.80%   85.81%   -1.99%     
  Complexity     1215     1215              
============================================
  Files           155      156       +1     
  Lines          4558     4666     +108     
  Branches        418      421       +3     
============================================
+ Hits           4002     4004       +2     
- Misses          400      507     +107     
+ Partials        156      155       -1     
Impacted Files Coverage Δ Complexity Δ
...etry/sdk/contrib/metrics/system/SystemMetrics.java 0.00% <0.00%> (ø) 0.00 <0.00> (?)
...telemetry/sdk/trace/export/BatchSpanProcessor.java 95.91% <0.00%> (+1.36%) 8.00% <0.00%> (ø%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update afabbd1...617350f. Read the comment docs.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First question why do you need the ask dependency?

@jkwatson
Copy link
Contributor

jkwatson commented Jun 9, 2020

Shouldn't this be a pure API contrib-module? Or be a part of the instrumentation project?

@carlosalberto
Copy link
Contributor Author

why do you need the ask dependency?

Not sure what you mean. Can you elaborate @bogdandrutu ?

@carlosalberto
Copy link
Contributor Author

Shouldn't this be a pure API contrib-module? Or be a part of the instrumentation project?

That's a good question. We need the SDK only for creating the IntervalMetricReader, but otherwise fine - precisely this overlaps with whether you think this should end up in the Instrumentation repo (probably YES). @trask @tylerbenson you guys have an opinion on this?

@trask
Copy link
Member

trask commented Jun 10, 2020

Or be a part of the instrumentation project?

makes sense to me

@@ -137,6 +137,7 @@ configure(opentelemetryProjects) {
jmh_core : "org.openjdk.jmh:jmh-core:${jmhVersion}",
jmh_bytecode : "org.openjdk.jmh:jmh-generator-bytecode:${jmhVersion}",
jsr305 : "com.google.code.findbugs:jsr305:${findBugsJsr305Version}",
oshi : "com.github.oshi:oshi-core:5.1.1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JNA libraries tend to be pretty bloated since they basically duplicate a native interface into Java, and for cross-platform ones have a ton of these. This seems to be 600K along with JNA itself which is 3MB+ (SDK is just 250K for reference). JDK's monitoring beans provide a lot of useful information so maybe just focus on them for now and can consider adding the huge dependency in a followup if the delta in terms of metrics seems useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to use the monitoring beans if we get all the metrics defined here available. Oshi is a very well known library but I can try to not use it because of its size.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if it's just about system metrics this could be helpful. But it seems more useful to users to focus on JVM metrics first? None of the popular Java metrics libraries seem to use oshi but provide very useful runtime metrics without much binary cost - such metrics should basically "always be enabled" on an instrumented app while this seems much more specialized and should just add the metrics that can't be gotten for free from the JDK.

https://github.com/dropwizard/metrics/tree/release/4.1.x/metrics-jvm/src/main/java/com/codahale/metrics/jvm
https://github.com/prometheus/client_java/tree/master/simpleclient_hotspot/src/main/java/io/prometheus/client/hotspot
https://github.com/micrometer-metrics/micrometer/tree/b486e1261cade606eb0ff826fbae318f6d0c9e86/micrometer-core/src/main/java/io/micrometer/core/instrument/binder/system

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems much more specialized and should just add the metrics that can't be gotten for free from the JDK.

That's correct, and this is the precisely the idea: to provide a set of complete system-wide metrics (and was the reason why I wanted to initially put it here, as a separate, extra artifact that some users can use).

@trask is there a plan to support external instrumentation jars? In that case, this could be provided at runtime. Also, what's your take on having Oshi as a dependency?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of the metrics here, like system cpu load, are available from the JDK though. And we definitely wouldn't want double instrumentation of the same metric if people added both, e.g., instrumentation-jvm and instrumentation-system. So recommendation was to start with the less complex, free stuff since that gives a better picture of what's needed in the complex, dependency-heavy version. So this feels like it's the wrong order to be addressing the issue in IMO. Does that make sense?

Copy link
Contributor Author

@carlosalberto carlosalberto Jun 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a rather small codebase, so I don't see the need to postpone the discussion on what to use/report ;) I'd argue that it's not a wrong order situation, as I mentioned before, the idea is precisely to provide system-wide metrics.

we definitely wouldn't want double instrumentation of the same metric if people added both

It's an interesting scenario to have them separated. But in that case I'd rather have the system-wide disabled or something similar. Unless there's a clear path that justifies having them separated (in that case I'b stick to that, of course).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trask is there a plan to support external instrumentation jars?

ya, we need to support users writing/bringing their own instrumentation somehow, but we haven't gotten to that yet

Also, what's your take on having Oshi as a dependency?

well, the auto agent jar is already 28mb, so what's another 3mb? 😭

(side note: maybe we should use the all classifier after all, to be clear that it's the kitchen sink of distributions)

I agree that it would be nice to split this into two modules, one that only has only JVM (mbean) metrics, and one that has only system (oshi) metrics.

The JVM metric module could detect if the system metrics module is on the classpath and disable anything that's duplicative with the system metrics (e.g. don't capture cpu total via JVM metrics, since system metrics captures cpu system + user).

Copy link
Contributor

@anuraaga anuraaga Jun 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the jvm module disable metrics if oshi is being used is a nice idea.

Size-wise, let's not forget we are trying to move towards also supporting manual instrumentation, not just agent. Though 10% of the entire agent seems pretty hefty FWIW. System metrics also have the issue that multiple jvms on the same system will also provide redundant metrics, common with containers. That should be taken into account too when evaluating the cost, it does take actual RAM to load large ffi modules. I would expect if packaged into the agent jar it's not enabled by default because of this potentially exploding cost to users.

That's enough high level context, for this PR how about removing the GC metrics and only take stuff from oshi? Can add GC metrics to a separate jvm metrics module

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah found that we actually already have such a class too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be nice to split this into two modules, one that only has only JVM (mbean) metrics, and one that has only system (oshi) metrics.

That's fine by me, as long as one module is for JVM-specific metrics (the current metrics module in contrib, with a few metric additions) and have this one is for System-wide metrics (which could be disabled by default). However, skipping the Oshi dependency wouldn't be avoided ;)

Btw, related effort: open-telemetry/opentelemetry-specification#651

@@ -137,6 +137,7 @@ configure(opentelemetryProjects) {
jmh_core : "org.openjdk.jmh:jmh-core:${jmhVersion}",
jmh_bytecode : "org.openjdk.jmh:jmh-generator-bytecode:${jmhVersion}",
jsr305 : "com.google.code.findbugs:jsr305:${findBugsJsr305Version}",
oshi : "com.github.oshi:oshi-core:5.1.1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OSHI 4.x and 5.x require minimum Java 8 compatibility. We still support Java7, don't we?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API and the SDK do, but that doesn't mean that all instrumentation has to.

@tylerbenson
Copy link
Member

I think it would make sense to have this in the instrumentation repo.

Comment on lines +86 to +87
r.observe(mem.getTotal(), TYPE_LABEL_KEY, "total");
r.observe(mem.getAvailable(), TYPE_LABEL_KEY, "available");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JVM mbeans provide available, but no total that i know of

Comment on lines +102 to +104
r.observe(cpuTicks[TickType.SYSTEM.getIndex()] * 1000, TYPE_LABEL_KEY, "system");
r.observe(cpuTicks[TickType.USER.getIndex()] * 1000, TYPE_LABEL_KEY, "user");
r.observe(cpuTicks[TickType.IDLE.getIndex()] * 1000, TYPE_LABEL_KEY, "idle");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JVM mbeans provide system cpu as a single number (i assume system + user?), no breakdown of system/user that i know of

Comment on lines +126 to +127
r.observe(recv, TYPE_LABEL_KEY, "bytes_recv");
r.observe(sent, TYPE_LABEL_KEY, "bytes_sent");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JVM mbeans do not provide that i know of

Comment on lines +142 to +143
r.observe(processInfo.getResidentSetSize(), TYPE_LABEL_KEY, "rss");
r.observe(processInfo.getVirtualSize(), TYPE_LABEL_KEY, "vms");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JVM mbeans provide used memory, but i don't think this correlates to either rss or vsz (btw, i think vsz is typical abbreviation not vms)

Comment on lines +158 to +159
r.observe(processInfo.getUserTime() * 1000, TYPE_LABEL_KEY, "user");
r.observe(processInfo.getKernelTime() * 1000, TYPE_LABEL_KEY, "system");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JVM mbeans provide process cpu as a single number (i assume system + user?), no breakdown of system/user that i know of

@@ -137,6 +137,7 @@ configure(opentelemetryProjects) {
jmh_core : "org.openjdk.jmh:jmh-core:${jmhVersion}",
jmh_bytecode : "org.openjdk.jmh:jmh-generator-bytecode:${jmhVersion}",
jsr305 : "com.google.code.findbugs:jsr305:${findBugsJsr305Version}",
oshi : "com.github.oshi:oshi-core:5.1.1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trask is there a plan to support external instrumentation jars?

ya, we need to support users writing/bringing their own instrumentation somehow, but we haven't gotten to that yet

Also, what's your take on having Oshi as a dependency?

well, the auto agent jar is already 28mb, so what's another 3mb? 😭

(side note: maybe we should use the all classifier after all, to be clear that it's the kitchen sink of distributions)

I agree that it would be nice to split this into two modules, one that only has only JVM (mbean) metrics, and one that has only system (oshi) metrics.

The JVM metric module could detect if the system metrics module is on the classpath and disable anything that's duplicative with the system metrics (e.g. don't capture cpu total via JVM metrics, since system metrics captures cpu system + user).

Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just some comments related to what's available (or not available) via JVM (mbean) metrics

@carlosalberto
Copy link
Contributor Author

Closing as we will be moving this one to the java-instrumentation repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants