
Add label and mapping support in dropwizard metric exporter #98

Closed

Conversation


@dopuskh3 commented Jan 4, 2016

Introduce a metric mapper (highly inspired by the jmx_exporter code) that allows mapping dropwizard metrics to a target metric name and labels.

This code will be reused in jmx_exporter in a follow-up review.

Francois Visconte added 2 commits January 4, 2016 13:44
This component maps a metric name to a new metric name, label names,
and label values by using a YAML configuration.
Allow using a mapping configuration in the DropWizard collector.
@@ -39,6 +39,11 @@
<artifactId>simpleclient</artifactId>
<version>0.0.12-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
Contributor

We can't add dependencies here; this module is used by the pushgateway and servlet, and the new dependency may clash with other versions the user is using.

Author

I can move the MetricMapper code into the dropwizard package then.

@brian-brazil
Contributor

The Prometheus standard for dealing with dotted strings is glob based rather than regex based; see https://github.com/prometheus/graphite_exporter#metric-mapping-and-configuration. Accordingly, I'd like the dropwizard collector to use the same approach for consistency.

We only use regexes as a last resort when metrics are too unstructured to deal with in any other way.
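
Roughly, the glob approach means each * matches exactly one dot-separated component, and the matched components become label values. A minimal sketch of the idea (illustrative only, not the graphite_exporter implementation; the pattern and names are made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GlobSketch {
    // Turn e.g. "org.company.controller.*.status.*" into a regex where each
    // "*" matches exactly one dot-separated component.
    static Pattern globToPattern(String glob) {
        StringBuilder regex = new StringBuilder("^");
        for (String part : glob.split("\\.")) {
            if (regex.length() > 1) {
                regex.append("\\.");
            }
            regex.append(part.equals("*") ? "([^.]+)" : Pattern.quote(part));
        }
        return Pattern.compile(regex.append("$").toString());
    }

    public static void main(String[] args) {
        Matcher m = globToPattern("org.company.controller.*.status.*")
                .matcher("org.company.controller.login.status.500");
        if (m.matches()) {
            // The captured components become label values,
            // e.g. controller="login", status="500".
            System.out.println("controller=" + m.group(1) + " status=" + m.group(2));
        }
    }
}
```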

final String snakeCaseMetricName = snakeCasePattern.matcher(metricName).replaceAll("$1_$2").toLowerCase();

for (Rule rule : rules) {
Matcher matcher = null;

Maybe move the inside of the loop into a sub-method? (This will allow you to write a unit test covering only one iteration.)
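
Something along these lines, for example (just a sketch; Rule, targetName, and applyRule are placeholders for whatever this PR actually defines):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleSketch {
    // Hypothetical stand-in for the PR's rule type.
    static class Rule {
        final Pattern pattern;
        final String targetName; // may contain $1, $2, ... back-references

        Rule(Pattern pattern, String targetName) {
            this.pattern = pattern;
            this.targetName = targetName;
        }
    }

    // The extracted loop body: returns the mapped name for one rule,
    // or null if the rule does not match. Easy to unit-test in isolation.
    static String applyRule(Rule rule, String snakeCaseMetricName) {
        Matcher matcher = rule.pattern.matcher(snakeCaseMetricName);
        return matcher.matches() ? matcher.replaceAll(rule.targetName) : null;
    }
}
```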

@dopuskh3
Author

dopuskh3 commented Jan 4, 2016

Dropwizard metrics are not always dot-separated names, so regex mapping allows a wider range of mappings. Also note that metrics exported over JMX are often exposed with the dropwizard library (Kafka, for example); that's why I chose to use the jmx_exporter code for inspiration.
Finally, note that with caching, the regex code will be called only once per source metric and will have negligible overhead.
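
As a sketch of the caching idea (hypothetical names, Java 8 syntax for brevity, not the actual PR code): the regex work happens at most once per distinct source metric name, and later lookups are plain map reads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class MappingCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<String, String>();
    private final Function<String, String> expensiveRegexMapping;

    // The mapping function is expected to always return a name (e.g. the
    // snake-cased original when no rule matches) so every result is cacheable.
    MappingCacheSketch(Function<String, String> expensiveRegexMapping) {
        this.expensiveRegexMapping = expensiveRegexMapping;
    }

    String map(String dropwizardName) {
        // Runs the regex mapping only the first time a name is seen.
        return cache.computeIfAbsent(dropwizardName, expensiveRegexMapping);
    }
}
```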

@brian-brazil
Contributor

Dropwizard metrics are not always dot-separated names.

Can you give an example of this where the labels aren't dot-separated?

@dopuskh3
Author

dopuskh3 commented Jan 4, 2016

Using the MetricRegistry.name() helper is not mandatory. This helper creates graphite-style metric names using a dot as a separator, but a lot of users are not using it. Many users also use MixedCaseMetricNames as the last metric component when using this helper. In this case, using a regex can be particularly useful.
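
For reference, the helper just joins the components with dots, and nothing forces users to go through it (the class and metric names below are only examples):

```java
import com.codahale.metrics.MetricRegistry;
import com.codahale.metrics.Timer;

public class NamingExample {
    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();

        // Graphite-style, dot-separated name built with the helper:
        // "com.example.UserResource.getUsers.requests"
        Timer dotted = registry.timer(
                MetricRegistry.name("com.example.UserResource", "getUsers", "requests"));

        // But arbitrary strings, including mixed case, are also valid names.
        Timer mixedCase = registry.timer("UserResourceGetUsersRequests");
    }
}
```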

@brian-brazil
Contributor

Mixed-case metric names shouldn't present a problem; worst case you just leave them as-is. Dealing with other instrumentation and monitoring systems is always a question of compromises.

I'd prefer to keep things simple where possible. Do you have an example of something that'd be impossible if we used the approach the graphite exporter uses? If so, then we likely also need to change both the graphite and statsd exporters.

@dopuskh3
Author

dopuskh3 commented Jan 4, 2016

From the Kafka code (using Yammer Metrics, the old name for dropwizard metrics):

val failedProduceRequestRate = newMeter("FailedProduceRequestsPerSec", "requests", TimeUnit.SECONDS, tags)

val totalProduceRequestRate = newMeter("TotalProduceRequestsPerSec", "requests", TimeUnit.SECONDS, tags)

With the approach used in the graphite exporter, merging both metrics into a single Prometheus metric is not possible.

With the proposed approach, snake-casing the metric name plus a regex allows mapping (.+)_produce_requests_per_sec to the same metric with a status label (status: $1).
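
A rough sketch of that mapping (the snake-casing pattern and the output format are illustrative rather than the exact PR code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KafkaMappingSketch {
    private static final Pattern CAMEL = Pattern.compile("([a-z0-9])([A-Z])");
    private static final Pattern RULE =
            Pattern.compile("(.+)_produce_requests_per_sec");

    public static void main(String[] args) {
        for (String source : new String[] {
                "FailedProduceRequestsPerSec", "TotalProduceRequestsPerSec"}) {
            // Snake-case the Dropwizard/Yammer name first ...
            String snake = CAMEL.matcher(source).replaceAll("$1_$2").toLowerCase();
            // ... then a capture group turns the prefix into a label value, so
            // both meters collapse into one metric differing only by "status".
            Matcher m = RULE.matcher(snake);
            if (m.matches()) {
                // Prints e.g. produce_requests_per_sec{status="failed"}
                System.out.println(
                        "produce_requests_per_sec{status=\"" + m.group(1) + "\"}");
            }
        }
    }
}
```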

@brian-brazil
Contributor

I don't see any fundamental problem there; you can still get at the data, with the right labels where there are labels (in this case there aren't any). Many times when working with 3rd-party monitoring, things that might make sense as labels aren't practical to express as labels. It's not a show-stopper, it's just a bit annoying.

In addition, what you're trying to do is bad practice. Separate failed and total counters are recommended as they're easier to work with.

@unbrice

unbrice commented Jan 6, 2016

Not a show stopper, but it'd be cool to have internal consistency between metric names of a given deployment. This makes discoverability better and improves the user experience.

OTOH I agree care must be given to making Prometheus consistent.

I suspect that one of the reasons behind the design of the rules applied to graphite metrics was to make them reversible (so as to be able to read historical data if/when we can read back from graphite).
By contrast, the JMX exporter used a different syntax since it did not need to be reversible, and using a different syntax improved the internal consistency of a given deployment (and therefore discoverability and user experience).

I want to make the argument that given this understanding above, the dropwizard exporter is more similar to the JMX exporter than to the graphite potentially-reversible exporter. With this in mind, it seems that the dropwizard exporter should mimic the JMX integration, not the graphite one. This also means using YAML config files, which is similar to what happens elsewhere in the project.

Would you be willing to allow using the same rules for the JMX exporter and the dropwizard exporter?

@brian-brazil
Contributor

Not a show stopper, but it'd be cool to have internal consistency between metric names of a given deployment.

You can always instrument directly with Prometheus :) You can pull out the metrics from the java client into other monitoring systems too.

I suspect that one of the reasons behind the design of the rules applied to graphite metrics was to make them reversible (so as to be able to read historical data if/when we can read back from graphite).

I'm not aware of that being a reason. That data path wouldn't involve the graphite exporter, as it's about collection rather than long-term storage.

By contrast, the JMX exporter used a different syntax since it did not need to be reversible

The JMX exporter uses regexes as there are four different types of information that can be part of a metric, with effectively no standardisation. Attempting to produce a DSL with the flexibility required would not have been sane. Thus we use our last resort: throw it all into a string and use regexes.

With dropwizard metrics following the dotted string format, this isn't a problem as there's only one type of information and a standard format. We can offer a simple DSL.

I want to make the argument that given this understanding above, the dropwizard exporter is more similar to the JMX exporter than to the graphite potentially-reversible exporter.

The graphite exporter and dropwizard collector are both dealing with dotted strings, there's no real similarity with the JMX exporter which is dealing with mBeans.

If you want to propose that the JMX exporter style configuration is the right way to handle dropwizard metrics, then you must also argue that it's the right way to handle graphite and statsd as they're dealing with the exact same type of data.

@unbrice

unbrice commented Jan 11, 2016

Thanks Brian. After reading your arguments, I agree that you are right to say that a DSL like the graphite exporter's is appropriate for dropwizard metrics.

@felixbarny
Contributor

If you want to use dropwizard metrics but with rich (name, labels) metric names, you could also take a look at my open source project stagemonitor.

Stagemonitor has its own MetricRegistry but uses the Metric objects from dropwizard:

metricRegistry.counter(name("foo_bar").tag("baz", "qux").build()).inc();

I just added support for Prometheus: stagemonitor/stagemonitor@d4fa87a

@brian-brazil
Contributor

More integrations are always good. Do you want to send a PR to add that one to the docs?

I'd generally recommend Prometheus's Java metrics over Dropwizard. Our API and semantics are a bit better, even ignoring labels.

@felixbarny
Contributor

Sure. What would be the right place to document that?

Could you elaborate why your semantics are better?

@brian-brazil
Contributor

@felixbarny
Contributor

From looking at the source of Summary and Histogram, they don't seem to support recency of response times. What I mean by that is they seem to take all response times since the application started into account, and not only recent response times like dropwizard's Timer with its forward-decaying priority reservoir does. Or did I overlook something?

@brian-brazil
Contributor

brian-brazil commented May 20, 2016

It works the same way as the Prometheus Counter. They export cumulative stats, and from that Prometheus can calculate the latency over any time window without bias towards some samples.
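
For illustration, something along these lines with the Java simpleclient (the metric name is just an example): only the cumulative count and sum are exported, and the server derives averages or rates over whatever window it wants.

```java
import io.prometheus.client.Summary;

public class SummaryExample {
    // Exposes cumulative requests_latency_seconds_count and
    // requests_latency_seconds_sum; e.g. the average latency over a window
    // can be computed server-side from the rates of these two series.
    static final Summary requestLatency = Summary.build()
            .name("requests_latency_seconds")
            .help("Request latency in seconds.")
            .register();

    public static void main(String[] args) {
        long start = System.nanoTime();
        // ... handle a request ...
        requestLatency.observe((System.nanoTime() - start) / 1e9);
    }
}
```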

@felixbarny
Contributor

I guess I see. But the Summary, which could be compared to dropwizard's Timer, does not support quantiles, or does it?

@brian-brazil
Contributor

It doesn't currently, but it is being worked on.

The problem with client-side quantiles is that they're expensive (locking) and can't be aggregated. Histograms don't have those problems.
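
For example, something like this with the simpleclient Histogram (the bucket boundaries are just an example): buckets are chosen up front, and as long as every instance uses the same boundaries the per-bucket counters can simply be summed when aggregating.

```java
import io.prometheus.client.Histogram;

public class HistogramExample {
    // Bucket boundaries are fixed at registration time; identical boundaries
    // across instances let the per-bucket counters be summed for aggregation.
    static final Histogram requestDuration = Histogram.build()
            .name("request_duration_seconds")
            .help("Request duration in seconds.")
            .buckets(0.01, 0.05, 0.1, 0.5, 1, 5)
            .register();

    public static void main(String[] args) {
        // An observation of 42ms increments every bucket with le >= 0.05.
        requestDuration.observe(0.042);
    }
}
```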

@felixbarny
Contributor

The problem with the histogram is that you have to know the distribution and range beforehand. For a general-purpose monitoring solution like stagemonitor, which strongly believes in simplicity for users (i.e. I don't want to require users to configure that), histograms don't seem feasible. (Have you thought about dynamically adjusting the buckets at runtime?)

What will the implementation of snapshots look like? Similar to dropwizard's decaying reservoir?

@brian-brazil
Contributor

(Have you thought about dynamically adjusting the buckets at runtime?)

Doesn't work, as all clients need to have identical buckets for aggregation to work.

What will the implementation of snapshots look like? Similar to dropwizard's decaying reservoir?

It'll store samples for the last 5m or so, and calculate from there.

@brian-brazil
Contributor

Please reopen if you implement the glob-based approach.
