Add label and mapping support in dropwizard metric exporter #98
Conversation
This component maps a metric name to a target metric name, label names, and label values using a YAML configuration.
Allows using a mapping configuration in the DropWizard collector.
@@ -39,6 +39,11 @@
      <artifactId>simpleclient</artifactId>
      <version>0.0.12-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.yaml</groupId>
      <artifactId>snakeyaml</artifactId>
We can't add dependencies in here, it's used by the pushgateway and servlet and this may clash with other versions the user is using.
I can move the MetricMapper code into the dropwizard package, then.
The Prometheus standard for dealing with dotted strings is glob based rather than regex based, see https://github.com/prometheus/graphite_exporter#metric-mapping-and-configuration Accordingly I'd like the dropwizard collector to use the same approach for consistency. We only use regexes as a last resort when metrics are too unstructured to deal with in any other way.
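For reference, a glob-based mapping in the style the linked graphite_exporter README describes looks roughly like this (a YAML sketch; the field names follow the linked README and may differ between versions):

```yaml
mappings:
- match: test.dispatcher.*.*.*       # glob: one wildcard per dotted component
  name: dispatcher_events_total      # the resulting Prometheus metric name
  labels:
    processor: $1                    # each $n refers to the n-th wildcard
    action: $2
    outcome: $3
```

Each dotted component either matches literally or is captured by a `*`, so there is no need for regex escaping or back-references in the common case.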
final String snakeCaseMetricName = snakeCasePattern.matcher(metricName).replaceAll("$1_$2").toLowerCase();

for (Rule rule : rules) {
    Matcher matcher = null;
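The snake-case conversion in the snippet above can be sketched as a self-contained helper (class and method names are invented for illustration; the PR inlines this in the collector):

```java
import java.util.regex.Pattern;

// Sketch of the snake-case conversion used in the PR: insert "_" between a
// lower-case letter or digit and a following upper-case letter, then
// lower-case the whole name.
class SnakeCaser {
    private static final Pattern SNAKE_CASE_PATTERN = Pattern.compile("([a-z0-9])([A-Z])");

    static String toSnakeCase(String metricName) {
        return SNAKE_CASE_PATTERN.matcher(metricName).replaceAll("$1_$2").toLowerCase();
    }
}
```

Note that dots are left untouched, since `.` matches neither capture group; only camel-case boundaries gain underscores.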
Maybe move the body of the loop into a sub-method? (That would allow you to write a unit test covering a single iteration.)
Dropwizard metrics are not always dot-separated names. Consequently, regex mapping allows a wider range of mappings. Also note that metrics exported over JMX are often exposed with the Dropwizard library (Kafka, for example), which is why I chose the jmx_exporter code as inspiration.
Can you give an example of this where the labels aren't dot-separated?
Using the MetricRegistry.name() helper is not mandatory. The helper creates graphite-style metric names using a dot as the separator, but many users don't use it. Many users also use MixedCaseMetricNames as the last metric component when they do use the helper. In these cases, regexes can be particularly useful.
Mixed case metric names shouldn't present a problem; worst case, you just leave them as-is. Dealing with other instrumentation and monitoring systems is always a question of compromises. I'd prefer to keep things simple where possible. Do you have an example of something that'd be impossible if we used the approach the graphite exporter uses? If so, then we likely also need to change both the graphite and statsd exporters.
From the Kafka code (using Yammer metrics, the old name for Dropwizard): val failedProduceRequestRate = newMeter("FailedProduceRequestsPerSec", "requests", TimeUnit.SECONDS, tags) val totalProduceRequestRate = newMeter("TotalProduceRequestsPerSec", "requests", TimeUnit.SECONDS, tags) With the approach used in the graphite exporter, merging both metrics into a single Prometheus metric is not possible. With the proposed approach, snake-casing the metric name plus a regex allows mapping (.+)_produce_requests_per_sec to the same metric with a status label (status: $1).
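A rule for that Kafka case could look roughly like this (a hypothetical YAML sketch; the rule and field names are illustrative, not the PR's actual schema):

```yaml
rules:
  # After snake-casing, "FailedProduceRequestsPerSec" becomes
  # "failed_produce_requests_per_sec", so the capture group below
  # picks up "failed" or "total" as the status label value.
- pattern: "(.+)_produce_requests_per_sec"
  name: "kafka_produce_requests_per_sec"   # hypothetical target metric name
  labels:
    status: "$1"
```

Both Kafka meters would then land in one time series family, distinguished only by the `status` label.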
I don't see any fundamental problem there; you can still get at the data with the right labels where there are labels (in this case there aren't any). Many times when working with 3rd party monitoring, things that might make sense as labels aren't practical to do as labels. It's not a show-stopper, it's just a bit annoying. In addition, what you're trying to do is bad practice: separate failed and total counters are recommended as they're easier to work with.
Not a show-stopper, but it'd be cool to have internal consistency between the metric names of a given deployment. This makes discoverability better and improves the user experience. OTOH I agree care must be given to keeping Prometheus consistent.

I suspect that one of the reasons behind the design of the rules applied to graphite metrics was to make them reversible (so as to be able to read historical data if/when we support reading back from graphite). Given that understanding, I want to argue that the dropwizard exporter is more similar to the JMX exporter than to the graphite potentially-reversible exporter. With this in mind, it seems that the dropwizard exporter should mimic the JMX integration, not the graphite one. This also means using YAML config files, which is similar to what happens elsewhere in the project.

Would you be willing to allow using the same rules for the JMX exporter and the dropwizard exporter?
You can always instrument directly with Prometheus :) You can pull out the metrics from the java client into other monitoring systems too.
I'm not aware of that being a reason. That data path wouldn't involve the graphite exporter, as it's about collection rather than long-term storage.
The JMX exporter uses regexes as there were four different types of information with effectively no standardisation that could be part of a metric. Attempting to produce a DSL with the flexibility required would not have been sane. Thus we use our last resort, throw it all into a string and use regexes. With dropwizard metrics following the dotted string format, this isn't a problem as there's only one type of information and a standard format. We can offer a simple DSL.
The graphite exporter and dropwizard collector are both dealing with dotted strings; there's no real similarity with the JMX exporter, which is dealing with mBeans. If you want to propose that the JMX exporter style configuration is the right way to handle dropwizard metrics, then you must also argue that it's the right way to handle graphite and statsd, as they're dealing with the exact same type of data.
Thanks Brian. After reading your arguments, I agree that a DSL like the graphite exporter's is appropriate for dropwizard metrics.
If you want to use dropwizard metrics but with rich (name, labels) metric names, you could also take a look at my open-source project stagemonitor. Stagemonitor has its own metric-name builder: metricRegistry.counter(name("foo_bar").tag("baz", "qux").build()).inc(); I just added support for Prometheus: stagemonitor/stagemonitor@d4fa87a
More integrations are always good. Do you want to send a PR to add that one to the docs? I'd generally recommend Prometheus's Java metrics over Dropwizard. Our API and semantics are a bit better, even ignoring labels.
Sure. What would be the right place to document that? Could you elaborate why your semantics are better? |
It'd be a 3rd-party item on https://prometheus.io/docs/instrumenting/exporters/. http://www.robustperception.io/how-does-a-prometheus-counter-work/ and https://prometheus.io/docs/practices/histograms/ cover the main aspects.
From looking at the source of Summary and Histogram, they don't seem to support recency of response times. What I mean by that is that they seem to take into account all response times since the application started, not only recent response times as Dropwizard's Timer with its forward-decaying priority reservoir does. Or did I overlook something?
It works the same way as the Prometheus Counter. They export cumulative stats, and from those Prometheus can calculate the latency over any time window without bias towards some samples. For example, for a summary named request_latency (an illustrative name), rate(request_latency_sum[5m]) / rate(request_latency_count[5m]) gives the average latency over the last five minutes.
I guess I see. But the Summary, which could be compared to Dropwizard's Timer, does not support quantiles, or does it?
It doesn't currently, but it is being worked on. The problem with client-side quantiles is that they're expensive (locking) and can't be aggregated. Histograms don't have those problems. |
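To illustrate why fixed cumulative buckets aggregate where client-side quantiles don't, here is a minimal self-contained sketch of the cumulative-bucket idea (not the actual simpleclient implementation; class and field names are invented):

```java
// Sketch of a cumulative-bucket histogram. Observations only increment
// per-bucket counters, so two instances with identical bucket bounds can be
// summed server-side -- which is exactly why all clients need identical
// buckets for aggregation to work.
class SimpleHistogram {
    final double[] upperBounds;      // must match across instances to aggregate
    final long[] cumulativeCounts;   // one counter per bound, plus the +Inf bucket
    double sum;                      // running total of observed values

    SimpleHistogram(double... upperBounds) {
        this.upperBounds = upperBounds;
        this.cumulativeCounts = new long[upperBounds.length + 1];
    }

    void observe(double value) {
        sum += value;
        // Cumulative semantics: every bucket whose bound is >= value counts it.
        for (int i = 0; i < upperBounds.length; i++) {
            if (value <= upperBounds[i]) {
                cumulativeCounts[i]++;
            }
        }
        cumulativeCounts[upperBounds.length]++; // the implicit +Inf bucket
    }
}
```

Since `observe` is just additions, it needs no locking beyond atomic increments, and quantiles can be estimated server-side from the summed buckets of any number of instances.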
The problem with the histogram is that you have to know the distribution and range beforehand. For a general-purpose monitoring solution like stagemonitor, which strongly believes in simplicity for users (i.e. I don't want to require users to configure that), histograms don't seem feasible. (Have you thought about dynamically adjusting the buckets at runtime?) How will the implementation of snapshots look? Similar to Dropwizard's decaying reservoir?
Doesn't work, as all clients need to have identical buckets for aggregation to work.
It'll store samples for the last 5m or so, and calculate from there.
Please reopen if you implement the glob-based approach.
Introduce a metric mapper (highly inspired by the jmx_exporter code) that allows mapping Dropwizard metrics to a target metric and labels.
This code will be re-used in jmx_exporter in a further review.