
Improve rack middlewares #32

Merged: 2 commits merged into master from grobie/middleware-rewrite on Feb 25, 2017

Conversation

@grobie (Member) commented Sep 9, 2016

  • Move from prometheus/client/rack/* to prometheus/middleware/* (a mounting sketch follows this list)
  • Use histogram instead of summary in collector middleware
  • Make collector metrics more specific by changing the prefix to http_server_*
  • Remove HTTP Host header from default label builder
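
For illustration, mounting the renamed middlewares in a config.ru could look roughly like this (the Collector and Exporter class names are assumed from the new Prometheus::Middleware namespace rather than quoted from the diff):

# config.ru -- minimal sketch, assuming the middlewares are exposed as
# Prometheus::Middleware::Collector and Prometheus::Middleware::Exporter.
require 'prometheus/middleware/collector'
require 'prometheus/middleware/exporter'

use Prometheus::Middleware::Collector # records the http_server_* metrics shown below
use Prometheus::Middleware::Exporter  # exposes the registry, by default under /metrics

run ->(env) { [200, { 'Content-Type' => 'text/plain' }, ['OK']] }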

Fixes #27.

Pinging @tmc and @mikeurbach as users of the library, and @brian-brazil for comments on the default metric names and help strings.

# TYPE http_server_requests_total counter
# HELP http_server_requests_total The total number of HTTP requests handled by the Rack application.
http_server_requests_total{method="get",path="/",code="200"} 74
http_server_requests_total{method="post",path="/",code="200"} 64
http_server_requests_total{method="delete",path="/",code="404"} 18
http_server_requests_total{method="delete",path="/",code="200"} 61
http_server_requests_total{method="get",path="/",code="404"} 8
http_server_requests_total{method="post",path="/",code="404"} 15
# TYPE http_server_request_latency_seconds histogram
# HELP http_server_request_latency_seconds The HTTP response latency of the Rack application.
http_server_request_latency_seconds{method="get",path="/",code="200",le="0.005"} 74
http_server_request_latency_seconds{method="get",path="/",code="200",le="0.01"} 74
http_server_request_latency_seconds{method="get",path="/",code="200",le="0.025"} 74
...
# TYPE http_server_exceptions_total counter
# HELP http_server_exceptions_total The total number of exceptions raised by the Rack application.
http_server_exceptions_total{exception="NoMethodError"} 10

I was thinking about removing http_server_requests_total as it's already included in the histogram, but I've seen beginners get heavily confused when using http_server_request_latency_seconds_count as a counter. I guess it's not too harmful to keep it.

@coveralls commented Sep 9, 2016

Coverage decreased (-0.002%) to 99.087% when pulling 271d793 on grobie/middleware-rewrite into 5e16541 on master.

def init_request_metrics
  @requests = @registry.counter(
    :http_server_requests_total,
    'The total number of HTTP requests handled by the Rack application.',

@brian-brazil:

If it's from rack, then the metric name should contain "rack".

@grobie (Member Author):

Rack is the de facto standard interface for ruby HTTP servers. It can be implemented by many different servers (unicorn, thin, puma, ...). I thought about including it, but concluded after some discussions that including it in the metric name is neither expected nor helpful.

@olleolleolle (Contributor):

Thanks a lot for improving the Rack middleware.

As a novice Prometheus user providing data from a Rack app, it was nice to have a middleware available.

Once I got it up and presenting data, I learned that the "summary" metric type was wrong for what we'd tried to use it for. We now find ourselves in the wrong column of the "Histogram vs Summary" table.

Can we move this forward in any way? Are all of the understanding-improving/renaming comments blocking a merge?

@grobie (Member Author) commented Dec 20, 2016 via email

@perlun commented Jan 19, 2017

@grobie - any updates on this one?

@grobie (Member Author) commented Jan 30, 2017

I've been traveling in the Amazon jungle for the last few weeks, but I'll have a more reliable internet connection over the next couple of days. I'll try to pick this up again.

@coveralls commented Feb 14, 2017

Coverage remained the same at 99.09% when pulling 26b229e on grobie/middleware-rewrite into 1b458a0 on master.


@coveralls:

Coverage decreased (-0.1%) to 98.965% when pulling 2caa1bf on grobie/middleware-rewrite into 1b458a0 on master.


@coveralls:

Coverage decreased (-0.1%) to 98.965% when pulling ac14e38 on grobie/middleware-rewrite into 1b458a0 on master.


@grobie (Member Author) left a comment

Alright, I finally addressed almost all of the feedback. The only outstanding issue is how to handle the response code label.

The term request is pretty overloaded, and I've always understood it here to describe the full cycle of an HTTP request and response. For example, the latency metric measures the full round trip, not just the time until the request reaches the server (which would be very hard to define anyway).

It's a very common need to have a metric broken out by request method, path and response code. Neither http_request_... nor http_response_... metrics would include all three. I'm open to suggestions, but the current approach seems to be the best option IMHO.

As these metrics were implemented based on client_golang, I'd like to get @beorn7's and @juliusv's input as well.

@grobie (Member Author) commented Feb 14, 2017

The metrics generated by the included HTTP tracer would look like this:

# TYPE http_server_requests_total counter
# HELP http_server_requests_total The total number of HTTP requests handled by the Rack application.
http_server_requests_total{code="200",method="get",path="/"} 76.0
http_server_requests_total{code="200",method="post",path="/"} 67.0
http_server_requests_total{code="404",method="delete",path="/"} 16.0
http_server_requests_total{code="200",method="delete",path="/"} 62.0
http_server_requests_total{code="404",method="get",path="/"} 7.0
http_server_requests_total{code="404",method="post",path="/"} 10.0
# TYPE http_server_request_latency_seconds histogram
# HELP http_server_request_latency_seconds The HTTP response latency of the Rack application.
http_server_request_latency_seconds{method="get",path="/",le="0.005"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.01"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.025"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.05"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.1"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.25"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.5"} 83
http_server_request_latency_seconds{method="get",path="/",le="1"} 83
http_server_request_latency_seconds{method="get",path="/",le="2.5"} 83
...
http_server_request_latency_seconds{method="delete",path="/",le="5"} 78
http_server_request_latency_seconds{method="delete",path="/",le="10"} 78
http_server_request_latency_seconds{method="delete",path="/",le="+Inf"} 78
http_server_request_latency_seconds_sum{method="delete",path="/"} 0.003424651000000001
http_server_request_latency_seconds_count{method="delete",path="/"} 78
# TYPE http_server_exceptions_total counter
# HELP http_server_exceptions_total The total number of exceptions raised by the Rack application.
http_server_exceptions_total{exception="NoMethodError"} 12.0

@coveralls commented Feb 14, 2017

Coverage decreased (-0.1%) to 98.965% when pulling 9e03c52 on grobie/middleware-rewrite into 1b458a0 on master.

@grobie (Member Author) commented Feb 14, 2017

Request means request, response means response.

And still, it's commonly understood that a statement like "The request latency was 400ms" includes both the time to send an HTTP request and the time to receive the HTTP response, without the need to say something like "The request and response latency was ...".

I'd suggest switching the request counter to being called response, and dropping the code label from the latency.

As commented earlier, I had already dropped the code label from the latency histogram.

Following your argument, a http_server_responses_total metric wouldn't be allowed to include method and path labels either, as an HTTP response doesn't have such fields by definition.

I understand the naming isn't perfect and remember the same discussion with Matt years ago. Considering the colloquial use of the term request, the similar problems with labels on a http_responses_total metric, the benefit of having the same prefix on all HTTP metrics and the existing usage of http_requests_total metrics in various client libraries, I'm for keeping the current naming pattern.

Are there other committers voting to change the naming? @beorn7 @juliusv

@beorn7 (Member) commented Feb 14, 2017

The naming used by https://godoc.org/github.com/prometheus/client_golang/prometheus#InstrumentHandler in client_golang should not serve as a role model in any way. It's strongly deprecated (and coming up with a replacement that doesn't immediately run into similar or other problems turns out to be hard).

The problem with the metric name http_requests_total, as I understood it so far, is the one described in https://prometheus.io/docs/instrumenting/writing_exporters/#naming :

Generally metric names should allow someone who’s familiar with Prometheus but not a particular system to make a good guess as to what a metric means. A metric named http_requests_total is not extremely useful - are these being measured as they come in, in some filter or when they get to the user’s code?

Thus, the name should include some kind of subsystem. (In this case, perhaps rack_http_requests_total? But I'm not too familiar with the Ruby code here.)

The discussion around request vs. response is new to me. I certainly want to see how many of my DELETE requests got a 200 response. If I separated those two from each other, how would I do this?

@juliusv (Member) commented Feb 14, 2017

While I've heard the request vs. response argument before, I agree that it's often useful to have both the method (or path) and the status code in one metric, and that the colloquial use of "request" covers not only the request, but also the response. @brian-brazil, would there be a better metric name wording?

@grobie (Member Author) commented Feb 14, 2017

Regarding the inclusion of rack in metric names.

The problem with the metric name http_requests_total, as I understood it so far, is the one described in https://prometheus.io/docs/instrumenting/writing_exporters/#naming :

That's the reason this PR changes the metric names from http_requests_... to http_server_requests_.... I considered including rack in the name as well, but after some thinking and discussions with others opted against it. Copy&pasting my answer to Brian from above:

Rack is the de facto standard interface for ruby HTTP servers. It can be implemented by many different servers (unicorn, thin, puma, ...). I thought about including it, but concluded after some discussions that including it in the metric name is neither expected nor helpful.

Rack is really just an interface definition, not an implementation of its own. Literally every Ruby server uses Rack these days, so that servers (unicorn, thin, ...), frameworks (sinatra, rails, ...) and plugins (Rack middlewares) can be freely combined. Including rack in the metric name doesn't provide any additional information, as it's still unclear where the Prometheus middlewares sit in the chain of middlewares. For reference, a typical Rails application uses 20-30 middlewares.
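
To make that concrete: a Rack middleware is nothing more than an object wrapping the next app in the chain. A minimal, purely illustrative example of the interface (TimingMiddleware is a made-up name, not part of this PR):

# Minimal sketch of the Rack middleware contract described above.
class TimingMiddleware
  def initialize(app)
    @app = app # the next middleware or application in the chain
  end

  def call(env)
    started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env) # delegate down the chain
    headers['X-Runtime'] = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at).to_s
    [status, headers, body] # the standard Rack response triple
  end
end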

@beorn7 (Member) commented Feb 15, 2017

The reasoning about the name appears sound to me. However, it also shows how difficult generic naming is. My current plan for the HTTP instrumentation helpers in client_golang is to always let the user provide a name, without a default. But as I said, we are still in an exploratory state there.

@matthiasr:

Request means request, response means response.

They are not independent. The response means nothing without the request, and the request has no result without the response.

In an ideal nanoservice world, where each binary does exactly one thing, the responses may make sense without further context, but out here in the real world we need the response breakdown by (method, path, status). We actively graph and alert on this, as otherwise high-volume low-value endpoints swamp out the signal from low-volume high-value ones. Consider a CRUD service: if the R request rate is 1000x the CUD rate (not uncommon in consumer websites cough), and a 0.1% error rate on reads is acceptable, it is impossible to detect a completely failing C/U/D endpoint without method and path labels.

@grobie (Member Author) commented Feb 20, 2017

In order to unblock this PR, I'd ask for a vote on how to proceed. The arguments for and against http_server_requests_total vs. http_server_responses_total are written above.

Summary of the changes in this PR:

  • changed the prefix of all HTTP tracer metrics from http_... to http_server_...
  • replaced the summary with a histogram for the http_server_request_latency metric
  • removed the code label from the histogram (formerly a summary)
  • provided two separate options to change the labels added to http_server_requests_total and http_server_request_latency (formerly both label sets were changed simultaneously); a usage sketch follows this list
  • renamed the middlewares from Prometheus::Client::Rack::XXX to Prometheus::Middleware::XXX to make the breaking changes clear
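
A rough sketch of how the two separate label options might be used (the option names counter_label_builder/duration_label_builder and the lambda signatures are assumptions for illustration, not quotes from the diff):

# Hypothetical usage sketch -- option names and lambda arity are assumed.
use Prometheus::Middleware::Collector,
    counter_label_builder: ->(env, code) {
      { code: code, method: env['REQUEST_METHOD'].downcase, path: env['PATH_INFO'] }
    },
    duration_label_builder: ->(env, _code) {
      # the code label stays off the histogram, as discussed above
      { method: env['REQUEST_METHOD'].downcase, path: env['PATH_INFO'] }
    }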

The newly generated metrics look like this:

# TYPE http_server_requests_total counter
# HELP http_server_requests_total The total number of HTTP requests handled by the Rack application.
http_server_requests_total{code="200",method="get",path="/"} 76.0
http_server_requests_total{code="200",method="post",path="/"} 67.0
http_server_requests_total{code="404",method="delete",path="/"} 16.0
http_server_requests_total{code="200",method="delete",path="/"} 62.0
http_server_requests_total{code="404",method="get",path="/"} 7.0
http_server_requests_total{code="404",method="post",path="/"} 10.0
# TYPE http_server_request_latency_seconds histogram
# HELP http_server_request_latency_seconds The HTTP response latency of the Rack application.
http_server_request_latency_seconds{method="get",path="/",le="0.005"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.01"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.025"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.05"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.1"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.25"} 83
http_server_request_latency_seconds{method="get",path="/",le="0.5"} 83
http_server_request_latency_seconds{method="get",path="/",le="1"} 83
http_server_request_latency_seconds{method="get",path="/",le="2.5"} 83
...
http_server_request_latency_seconds{method="delete",path="/",le="5"} 78
http_server_request_latency_seconds{method="delete",path="/",le="10"} 78
http_server_request_latency_seconds{method="delete",path="/",le="+Inf"} 78
http_server_request_latency_seconds_sum{method="delete",path="/"} 0.003424651000000001
http_server_request_latency_seconds_count{method="delete",path="/"} 78
# TYPE http_server_exceptions_total counter
# HELP http_server_exceptions_total The total number of exceptions raised by the Rack application.
http_server_exceptions_total{exception="NoMethodError"} 12.0

Please vote with 👍 / 👎 if you agree with these changes. @brian-brazil @beorn7 @juliusv @matthiasr

@matthiasr:

👍

@beorn7 (Member) commented Feb 21, 2017

On a slightly different note: we often use …_duration_seconds instead of …_latency_seconds. I would prefer that one be used consistently (or that there were rules for when to use which). Does anybody have thoughts on this?

@grobie (Member Author) commented Feb 21, 2017 via email

@beorn7 (Member) commented Feb 21, 2017

But is …_duration_seconds the more common one? I always thought so, but I'm not sure anymore…

@brian-brazil:

I don't think there's a clear convention there, I'd say follow whatever the Go client has.

@beorn7 (Member) commented Feb 21, 2017

As of now, the Go client has …_duration_seconds (actually …_duration_microseconds, but changing that to seconds is one of the easy and obvious changes that will happen).

@grobie (Member Author) commented Feb 24, 2017

Ok, I changed latency to duration.
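
For reference, the renamed histogram registration would then look roughly like this (name and help string follow the discussion above; the exact registration call and bucket arguments are assumed from the earlier init_request_metrics snippet, not quoted from the diff):

# Sketch of the rename from ..._latency_seconds to ..._duration_seconds.
@durations = @registry.histogram(
  :http_server_request_duration_seconds,
  'The HTTP response duration of the Rack application.'
)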

* Move from prometheus/client/rack/* to prometheus/middleware/*
* Use histogram instead of summary in collector middleware
* Make collector metrics more specific by changing the prefix to http_server_*
* Remove HTTP Host header from default label builder
@coveralls:

Coverage decreased (-0.1%) to 98.987% when pulling 43f01b1 on grobie/middleware-rewrite into 6606b59 on master.


@beorn7 (Member) commented Feb 24, 2017

👍 from a pure "what the metrics look like" perspective.


def call(env)
  if env['PATH_INFO'] == @path
    format = negotiate(env['HTTP_ACCEPT'], @acceptable)
@Strech (Contributor):

I think it would be more efficient to fetch the key from env with a default, like env.fetch('HTTP_ACCEPT', ''), and to assume that the negotiate method always receives a string in its accept argument.

Then we can simplify the negotiate method and replace accept.to_s.empty? with accept.empty?
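
A sketch of that suggestion applied to the call snippet above (fragment only, names as in the quoted code):

# Suggested change: always hand negotiate a String, using a default value.
format = negotiate(env.fetch('HTTP_ACCEPT', ''), @acceptable)
# negotiate can then check accept.empty? instead of accept.to_s.empty?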

@Strech (Contributor):

I don't like the idea of setting a default value here, but it could also be a strategy to always provide the accept argument; then it could be env.fetch('HTTP_ACCEPT', '*/*') or something like env.fetch('HTTP_ACCEPT', DEFAULT_HTTP_ACCEPT). I'm afraid that would shift some of the negotiation responsibility here, though.

@grobie (Member Author):

Thanks @Strech! I incorporated your suggestions and simplified the code. Note that most of this was only necessary when the Ruby client library supported more than one exposition format (JSON and text). It might become necessary again in the future if #10 gets implemented.

@Strech (Contributor) commented Feb 25, 2017:

@grobie copy that, I hope I'll have a chance to contribute more in the next few months 👍

end

def parse(header)
  header.to_s.split(/\s*,\s*/).map do |type|
@Strech (Contributor) commented Feb 24, 2017:

This method is only used by negotiate, and we will never pass a non-string value as header, so we can omit the to_s here.

@coveralls:

Coverage decreased (-0.1%) to 98.986% when pulling 5c38163 on grobie/middleware-rewrite into 6606b59 on master.


grobie merged commit 9e28f0d into master on Feb 25, 2017
grobie deleted the grobie/middleware-rewrite branch on February 25, 2017 01:42
Successfully merging this pull request may close these issues.

Use histogram in rack collector.
10 participants