Change http_request metric names to match go #15

kirussel · 2015-09-02T16:56:39Z

I recently used prometheus to monitor web servers in go and ruby/rake that are using the official prometheus client libraries. I noticed that the go and ruby/rake client libs were using different names for the http metrics:

go: http_request_duration_microseconds
ruby/rake: http_request_durations_microseconds

I assumed that the prometheus clients should be using the same metric names.

brian-brazil · 2015-09-02T17:21:35Z

If we're going to change this we should probably switch to seconds while we're at it, and standardise this similar to the standard exports.

beorn7 · 2015-09-02T17:32:27Z

http_request_durations_microseconds < http_request_duration_microseconds < http_request_duration_seconds :)

In the upcoming revamp of client_golang, I'll press for seconds everywhere, too.

beorn7 · 2015-09-02T17:33:07Z

@juliusv @grobie Who is the canonical approver for client_ruby PRs?

juliusv · 2015-09-02T22:40:33Z

It's @grobie.

kirussel · 2015-09-09T13:41:38Z

I had assumed this would be an easy change. It would appear that keeping the metrics standardized is tricky. If there is anything I can do to help, let me know.

brian-brazil · 2015-09-09T13:50:50Z

Thinking on it, I'm not sure it's going to be possible to completely standardise as most languages will have multiple HTTP server libraries or middleware each of which we should probably encourage to export similarly (metric type, labels, which if any buckets/quantiles to expose) - but we also need to avoid clashing on names.

Considering this is rack instrumentation, rack should probably appear in the metric name.

What do others think?

beorn7 · 2015-09-09T15:01:58Z

I think in general we need the semi-automated HTTP instrumentation to be configurable. Still, we need sane defaults. That will need a more general discussion. What about this change? Should we not change anything until we have settled on something, or should we go for http_request_duration_seconds now as a pretty safe bet for the final name?

Having rack in the metric name might not always be desired as you might want to aggregate across HTTP request durations from different libraries or middlewares.

brian-brazil · 2015-09-09T15:30:09Z

Having rack in the metric name might not always be desired as you might want to aggregate across HTTP request durations from different libraries or middlewares.

Having different metric names doesn't prevent that, and that seems like a very rare use case.

beorn7 · 2015-09-09T16:06:26Z

I challenge that it is a rare use case.

The naming would also be misleading as I don't care on a high level service view what middleware served the HTTP requests I want to monitor.

brian-brazil · 2015-09-09T16:26:40Z

The problem is though that a given binary may have multiple of such middlewares, so it's not safe for them to share metric names. I care a lot about which piece of code is reporting a given metric, as otherwise I don't know exactly what it's measuring.

beorn7 · 2015-09-09T16:59:00Z

Put it in a label, then? At least the architecture of client_golang allows that in a way that still guarantees that no collisions will happen.

brian-brazil · 2015-09-09T23:55:32Z

That'd violate the rule that metrics should be meaningful if summed or averaged. With very few exceptions, a given metric should exist only within a single file.

beorn7 · 2015-09-10T08:53:26Z

It does not violate the rule. It's perfectly reasonable to sum up all HTTP requests within a binary.

brian-brazil · 2015-09-10T09:02:57Z

It's reasonable to count all HTTP requests going through a given bit of code. If you're summing it across several bits of code in different parts of the stack you'll end up counting some requests multiple times, so it's not valid.

Each bit of middleware or other http processing code needs it's own metric name. The question here is do we try and want to standardise the general structure of those metrics.

beorn7 · 2015-09-10T09:41:44Z

I'm not referring to that case. I'm referring to the case where some requests go through one middleware and some through another.

I'm not arguing that there might be the case where you want to encode the middleware in the metric name. I'm just arguing it's not a good default to do so.

brian-brazil · 2015-09-10T09:57:02Z

You'd have to be very careful with that case - you'd want both middlewares to have exactly the semantics for that to be correct, which is very unlikely (I've seen exactly one use case in this area, where there was an odd interplay between plugins and the framework on top of them). If you wanted that in the same metric the right way would be to do it in the common code that called both middlewares. Otherwise such a use should be strongly discouraged, and not something we should be offering any library-level support for.

beorn7 · 2015-09-10T11:01:45Z

OK, now we have put the whole discussion upside-down, implying that a generic name for the HTTP request duration metric is equivalent to library-level support for collecting metrics at various places in the code.

After going down that rabbit hole, I still think we should not by default expose details of the used middleware in something as fundamental as the default metric name for HTTP request duration. Please note that it would make utterly no sense in Go as all the middlewares I know work with the same interface, so the function to instrument HTTP calls we have right now could be used on any of them, which makes a reasonable default name that would include the name of the middlename pointless.

While convenience HTTP instrumentation should definitely be configurable, it should use a very generic metric name by default. http_request_duration_seconds sounds most reasonable to me.

brian-brazil · 2015-09-10T11:47:20Z

I still think we should not by default expose details of the used middleware in something as fundamental as the default metric name for HTTP request duration.

I think we should, as that's critical debugging information. Knowing exactly which level of the call stack the latency is reported from helps you figure out which level is introducing the latency.

Only where there's a standard http server for a given language/runtime should it get the name http_request_duration_seconds (which is the case in Go, whereas very few people use the one provided with Java). In all other cases you need a namespace indicating what other piece of code is involved.

that it would make utterly no sense in Go as all the middlewares I know work with the same interface, so the function to instrument HTTP calls we have right now could be used on any of them, which makes a reasonable default name that would include the name of the middlename pointless.

It sounds like we should expose a namespace option there. A given metric name should only be used by a single file/class/module, with very very few exceptions. Something having the same interface doesn't imply it should have the same metric name.

In any case, here we're talking about rack metrics. Rack is a middleware that has a number of functions, so it should have rack in the metric name somewhere and we should probably also instrument the time taken by the user's request handler, so the user can determine what the rack processing overhead is and how many requests actually get through to the handler.

beorn7 · 2015-09-10T12:33:06Z

We can do all that, just not by default. In a microservice architecture (which is the one Prometheus was built for), I will have many languages and in each many different ways of serving HTTP requests. Sure, in many situations, I want very specific metric names, and that has to be possible in an easy way. But we need a sane default that doesn't depend on the language I have implemented my microservice in.

brian-brazil · 2015-09-10T12:48:00Z

If that's the default, then it's going to cause breakage as two distinct bits of code will be exporting the same timeseries as they both will think they know the latency of say GET /myapi but in reality they only know it relative to their own subsystem/layer. As soon as there's more than one place that you can measure http_request_duration_seconds from across all code written in a given language/runtime, you need to use different metric names to distinguish each of them.
The current Go behaviour is not safe, and we shouldn't propagate it to other languages.

We can standardise that the metrics be called something_http_request_duration_seconds (with one prefix-less metric permitted per language/runtime) and that the help string should clarify exactly what it's measuring, but that's about it.

beorn7 · 2015-09-10T16:56:54Z

I have to object again: The current behavior of client_golang is safe as the library will not allow the colliding metrics to register. It's possible that other client libraries are less mature and cannot guarantee consistency.

brian-brazil · 2015-09-10T20:26:50Z

That's still unsafe in Go, the Go client gives you an error so you find out sooner that it's broken. If a user pulls in two middlewares that both happen to define the same metric, then it'll break. This is not a good outcome for the user, and will not give them a good impression of Prometheus.

grobie · 2015-09-10T20:41:07Z

I can't follow your argumentation. The ruby client doesn't let you register the same metric twice either. Rack is de facto the standard for ruby HTTP libraries. It's not a web server or a HTTP library though, it's simply an interface description saying a middleware has to respond to call(env). Adding rack to the metric name doesn't help in any way to understand better where the metric comes from.

The ruby client does not automatically use this middleware. It's provided as additional option for users to make their lives easier. It has to be explicitly embedded in a chain.

Change http_request metric names to match go

beorn7 · 2015-09-11T08:37:03Z

@grobie Thanks for ending the stalemate. 😄 I think we should still go from microseconds -> seconds eventually. That seemed to be the consensus by now.

@brian-brazil We seem to have very different definitions of "safe". in golang_client, we have acknowledged that we cannot catch everything on compile time. So the whole library is designed to fail on startup time (rather than scrape time, which would be a bad and possibly dangerous experience indeed) in case there is a misconfiguration. With the current function to instrument an HTTP handler, you have to pass in a handler name, essentially telling the system "I'm instrumenting the handler for this endpoint." If you instrument it twice accidentally, than that's an error and will be reported to you. That's not a bad outcome for you as the user, that's a very very good outcome. If you do advanced stuff, like instrumenting various layers of middleware for the same endpoint, you can still configure different metric names or labels. But I really don't want to make things harder for the normal use case to help advanced users avoiding mistakes, especially if those mistakes will be detected and not go unnoticed.

In completely different news, my current work on client_golang has already taught me how hard it is to provide a generally useful convenience instrumentation of HTTP endpoints (now that we have histograms and summaries with different use cases, plus a number of more issues). Not sure yet where I'll end up, but at the moment, I'm trying to find a way to make a configurable instrumentation very easy instead of providing one preconfigured one...

brian-brazil · 2015-09-11T08:51:41Z

If rack is the defacto standard, then it getting the unprefixed name makes sense. We'll still need some standardisation around that (e.g. it's the metric for the first http layer typically hit).

If you do advanced stuff, like instrumenting various layers of middleware for the same endpoint, you can still configure different metric names or labels.

I think here's the confusion, I'm not worried about "you" doing this as that's obviously your error and one you can easily fix. I'm worried about multiple independent library authors doing it which then makes it impossible to use those libraries together - instrumentation's lite version of dependency hell. Catching it at compile or startup time is better than at scrape time, but it's still the same fundamental problem.

If the "you" can't change those libraries, then it's a very bad outcome. I'd also like to avoid this all ending up being configurable (as that's not scalable in different ways), and aim for library metrics to be globally unique across an entire language.

My vision is that library authors provide useful prometheus instrumentation by default, without any work required by the user other than choosing which monitoring system they want to hook it into.

In completely different news, my current work on client_golang has already taught me how hard it is to provide a generally useful convenience instrumentation of HTTP endpoints

You should try Python, at least Go has a common interface :)

beorn7 · 2015-09-11T10:16:15Z

On Fri, Sep 11, 2015 at 10:51 AM, Brian Brazil notifications@github.com wrote:

If the "you" can't change those libraries, then it's a very bad outcome. I'd also like to avoid this all ending up being configurable (as that's not scalable in different ways), and aim for library metrics to be globally unique across an entire language.

My vision is that library authors provide useful prometheus instrumentation by default, without any work required by the user other than choosing which monitoring system they want to hook it into.

That's all true, but you are talking about a library doing
"something", and that "something" happens to be instrumented. That
instrumentation should be properly namespaced. (Which will BTW still
not safely avoid collisions unless we go for Java style domain
namespacing... so I guess an option to change the namespace for
instrumentation in a library is still good to have... We should
include best practices for library instrumentation in the docs.)

However, this PR is not about how to instrument a library, this is
about the library to instrument (other code or other libraries).

Björn Rabenstein, Engineer
http://soundcloud.com/brabenstein

kirussel added 3 commits September 2, 2015 12:47

Change http_request metric names to match go

7ef2ccc

Update continuous build to match metric changes

5e48019

Remove unecessary spacing continuous build errors

9d88e00

grobie added a commit that referenced this pull request Sep 10, 2015

Merge pull request #15 from kirussel/rename_rack_metrics

02a57fe

Change http_request metric names to match go

grobie merged commit 02a57fe into prometheus:master Sep 10, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change http_request metric names to match go #15

Change http_request metric names to match go #15

kirussel commented Sep 2, 2015

brian-brazil commented Sep 2, 2015

beorn7 commented Sep 2, 2015

beorn7 commented Sep 2, 2015

juliusv commented Sep 2, 2015

kirussel commented Sep 9, 2015

brian-brazil commented Sep 9, 2015

beorn7 commented Sep 9, 2015

brian-brazil commented Sep 9, 2015

beorn7 commented Sep 9, 2015

brian-brazil commented Sep 9, 2015

beorn7 commented Sep 9, 2015

brian-brazil commented Sep 9, 2015

beorn7 commented Sep 10, 2015

brian-brazil commented Sep 10, 2015

beorn7 commented Sep 10, 2015

brian-brazil commented Sep 10, 2015

beorn7 commented Sep 10, 2015

brian-brazil commented Sep 10, 2015

beorn7 commented Sep 10, 2015

brian-brazil commented Sep 10, 2015

beorn7 commented Sep 10, 2015

brian-brazil commented Sep 10, 2015

grobie commented Sep 10, 2015

beorn7 commented Sep 11, 2015

brian-brazil commented Sep 11, 2015

beorn7 commented Sep 11, 2015

Change http_request metric names to match go #15

Change http_request metric names to match go #15

Conversation

kirussel commented Sep 2, 2015

brian-brazil commented Sep 2, 2015

beorn7 commented Sep 2, 2015

beorn7 commented Sep 2, 2015

juliusv commented Sep 2, 2015

kirussel commented Sep 9, 2015

brian-brazil commented Sep 9, 2015

beorn7 commented Sep 9, 2015

brian-brazil commented Sep 9, 2015

beorn7 commented Sep 9, 2015

brian-brazil commented Sep 9, 2015

beorn7 commented Sep 9, 2015

brian-brazil commented Sep 9, 2015

beorn7 commented Sep 10, 2015

brian-brazil commented Sep 10, 2015

beorn7 commented Sep 10, 2015

brian-brazil commented Sep 10, 2015

beorn7 commented Sep 10, 2015

brian-brazil commented Sep 10, 2015

beorn7 commented Sep 10, 2015

brian-brazil commented Sep 10, 2015

beorn7 commented Sep 10, 2015

brian-brazil commented Sep 10, 2015

grobie commented Sep 10, 2015

beorn7 commented Sep 11, 2015

brian-brazil commented Sep 11, 2015

beorn7 commented Sep 11, 2015