Prometheus client metrics #1961

Merged

8 commits merged into http4s:master on Aug 7, 2018

Conversation

3 participants
@zcox
Contributor

zcox commented Jul 23, 2018

Client middleware that collects a few Prometheus metrics, very similar to the prometheus-server-metrics module. Hopefully this will be useful to the http4s community. This is my first http4s PR, so feedback would be greatly appreciated.

Currently, four client metrics are collected:

  • Counter for total number of responses received
  • Histogram for response duration/latency
  • Gauge for number of active requests
  • Counter for total number of client errors
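As a rough sketch of what registering such collectors looks like with the Prometheus Java simpleclient (metric and label names here are illustrative, not necessarily the ones this PR uses):

```scala
import io.prometheus.client.{CollectorRegistry, Counter, Gauge, Histogram}

val registry = new CollectorRegistry()

// total responses received, labelled by destination (names illustrative)
val responseCounter: Counter = Counter
  .build("client_responses_total", "Total responses received")
  .labelNames("destination")
  .register(registry)

// response latency in seconds
val responseDuration: Histogram = Histogram
  .build("client_response_duration_seconds", "Response latency")
  .labelNames("destination")
  .register(registry)

// requests currently in flight
val activeRequests: Gauge = Gauge
  .build("client_active_requests", "Active requests")
  .register(registry)

responseCounter.labels("payments").inc()
```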

At Banno, we often use the same Client instance to send requests to multiple destinations (e.g. endpoints, services) and want to track metrics separately based on destination, so several of the metrics use a destination label. This PR uses the value of the request attribute PrometheusClientMetrics.Destination as the destination label value, if it is set. User code can set this attribute to some meaningful value when creating the Request. Otherwise, a fallback value can be specified when creating the middleware. This may be a bit specific to our own use cases, but hopefully is a common need for client metric collection.

A few things I'm a bit unsure about:

  • Is the destination label sufficiently generic and useful to many users of http4s-client?
  • Is the destination label stuff a misuse of the attribute system? Is there a better way to track metrics separately per-destination?
  • Should onClientError also update the response counter and histogram?
  • Are there other client metrics that should be collected?
@rossabaker

Thanks, @zcox! I like it.

Another counter that may be interesting is bytes, though that's tricky: you need to observe the response body (inaccurate if the entire thing isn't consumed) or rely on Content-Length (not available for chunked responses). I think it's not on our server metrics yet, due to the same complication. I wouldn't block progress on that, but throwing it out there.
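For reference, reading Content-Length in http4s looks roughly like this (a sketch; as noted above it yields nothing for chunked responses):

```scala
import org.http4s.Response
import org.http4s.headers.`Content-Length`

// None when the response is chunked and carries no Content-Length header,
// so a bytes counter based on this would undercount chunked traffic
def responseBytes[F[_]](resp: Response[F]): Option[Long] =
  resp.headers.get(`Content-Length`).map(_.length)
```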

}
/** The value of this key in the request's attributes is used as the value for the destination metric label. */
val Destination = AttributeKey[String]

@rossabaker

rossabaker Jul 26, 2018

Member

I sometimes want to phase out attributes, but I can't get rid of all the use cases for them. And here we have another.

Alternative: maybe a function Request => Option[String] could be provided, so the destination could be calculated from the request (probably the Uri?) itself.
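A sketch of such a function, deriving the label from the request's host (the function name is hypothetical, just to illustrate the shape):

```scala
import org.http4s.Request

// destination derived from the URI's host; None when the URI is relative
def destinationFromHost[F[_]](req: Request[F]): Option[String] =
  req.uri.host.map(_.value)
```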

@zcox

zcox Jul 26, 2018

Contributor

I like the idea of a generic Request => Option[String], will try that out. Maybe even Request => List[String] and let user specify a List[String] of label names when creating the middleware?

request: Request[F]
): F[DisposableResponse[F]] =
for {
start <- Sync[F].delay(System.nanoTime())

@rossabaker

rossabaker Jul 26, 2018

Member

Would the new cats-effect Timer be a better fit here?
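For reference, timing with cats-effect's Timer looks roughly like this (a sketch; the exact method names shifted between the 1.0 release candidates, e.g. `clockMonotonic` vs `clock.monotonic`):

```scala
import java.util.concurrent.TimeUnit
import cats.Monad
import cats.effect.Timer
import cats.implicits._

// measure elapsed nanoseconds around an effect using the monotonic clock,
// instead of suspending System.nanoTime in Sync[F].delay
def timed[F[_]: Monad: Timer, A](fa: F[A]): F[(Long, A)] =
  for {
    start <- Timer[F].clock.monotonic(TimeUnit.NANOSECONDS)
    a     <- fa
    end   <- Timer[F].clock.monotonic(TimeUnit.NANOSECONDS)
  } yield (end - start, a)
```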

@zcox

zcox Jul 26, 2018

Contributor

I've never used cats.effect.Timer before, but I gave it my best shot in 9b635c6. Does that look right?

.labels(reportDestination(request.attributes, metrics.destination))
.inc())
responseAttempt <- client.open(request).attempt
end <- Sync[F].delay(System.nanoTime())

@rossabaker

rossabaker Jul 26, 2018

Member

This is going to measure time-to-first-byte, and not time to process the body. Both can be interesting, though the latter may be greatly affected by how aggressively the caller consumes the stream. If you want to add the latter, it would be a finalizer on the response body.
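The finalizer approach can be sketched like this, assuming fs2's `onFinalize` and a hypothetical `recordBodyTime` effect that observes the histogram:

```scala
import cats.effect.Sync
import org.http4s.Response

// record the body-processed time when the body stream terminates;
// recordBodyTime is a hypothetical effect that observes the histogram
def instrument[F[_]: Sync](resp: Response[F], recordBodyTime: F[Unit]): Response[F] =
  resp.copy(body = resp.body.onFinalize(recordBodyTime))
```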

@zcox

zcox Jul 26, 2018

Contributor

I attempted to separate response duration into "response received" and "body processed" phases in eff0654. @rossabaker @ChristopherDavenport would you mind taking a look at the TODO comments? Seems to work correctly in our service, but this is starting to exceed my http4s knowledge.

private def onClientError[F[_]: Sync](request: Request[F], metrics: ClientMetrics): F[Unit] =
Sync[F].delay {
//not updating responseDuration or responseCounter, since we did not receive a response

@rossabaker

rossabaker Jul 26, 2018

Member

Maybe add an error counter?

@zcox

zcox Jul 26, 2018

Contributor

clientErrorsCounter is incremented a few lines below, would that be enough?

@rossabaker rossabaker requested a review from ChristopherDavenport Jul 26, 2018

@rossabaker

Member

rossabaker commented Jul 26, 2018

Paging Mr. Prometheus, @ChristopherDavenport.

@zcox

Contributor

zcox commented Jul 26, 2018

99cc76f introduces a Request[F] => String function to obtain the destination label value, and includes the DestinationAttribute module with utils to store destination in request attributes.

The destination function could be as simple as _.uri.host.map(_.value).getOrElse("").

Usage with DestinationAttribute would look something like this:

//somewhere things get initialized
for {
  ...
  cr = CollectorRegistry.defaultRegistry //or create your own
  client <- Http1Client[F]().flatMap(c => PrometheusClientMetrics(cr, destination = DestinationAttribute.getDestination()).run(c))
  ...
  x = ThingThatUsesClient1(DestinationAttribute.setRequestDestination(client, "d1"), ...)
  y = ThingThatUsesClient2(DestinationAttribute.setRequestDestination(client, "d2"), ...)
  z = ThingThatUsesClient3(client)
  ...
}

//ThingThatUsesClient3
for {
  req <- createRequest1(...).withAttribute(DestinationAttribute.Destination, "d3")
  resp <- client.fetch(req)
}

for {
  req <- createRequest2(...).withAttribute(DestinationAttribute.Destination, "d4")
  resp <- client.fetch(req)
}
@ChristopherDavenport

I think this looks largely correct to me. I wonder about using the attributes this way, but it should likely work well initially.

}
}
object DestinationAttribute {

@ChristopherDavenport

ChristopherDavenport Jul 26, 2018

Member

Could we get this moved to its own file please?

zcox added some commits Jul 26, 2018

@zcox

Contributor

zcox commented Jul 26, 2018

I agree the attribute stuff is a bit weird/clunky, I would be totally OK with not including that in http4s (and maybe just using it internally).

The one thing I like about it is that it puts the destination labelling close to the point of request creation. I could see the Request[F] => String turning into a big pattern match on hosts/paths/etc, but maybe I just need to try using that instead of attributes in our use case and see how it looks.

zcox added some commits Jul 26, 2018

@rossabaker

That's a fair point on the destination attribute. I'd probably rather pass a function and not have to think about it, but one would have to be careful to choose one with a bounded set of outputs, because Prometheus creates a new time series for every distinct label value. Naively setting it to host, for a client that connects to many hosts, would be hard on the Prometheus server.
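One way to keep the label set bounded is an explicit allow-list, falling back to a catch-all value (a sketch; names are hypothetical):

```scala
import org.http4s.Request

// only well-known hosts become label values; everything else collapses
// into "other" so the number of Prometheus time series stays bounded
val knownHosts = Map(
  "payments.example.com" -> "payments",
  "accounts.example.com" -> "accounts")

def boundedDestination[F[_]](req: Request[F]): String =
  req.uri.host.map(_.value).flatMap(knownHosts.get).getOrElse("other")
```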

result <- responseAttempt.fold(
e =>
onClientError(request, metrics) *>
Sync[F].raiseError[DisposableResponse[F]](e),
r => onResponse(request, r.response, start, end, metrics) *> r.pure[F]
//TODO can we just swap out DisposableResponse.response like this? what about dispose?

@rossabaker

rossabaker Jul 28, 2018

Member

This is correct. The dispose is just an F[Unit] that releases the underlying connection.
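So swapping out the response while preserving dispose is safe, roughly:

```scala
import org.http4s.Response
import org.http4s.client.DisposableResponse

// replace the wrapped Response but keep the original dispose action,
// which still releases the underlying connection
def mapResponse[F[_]](dr: DisposableResponse[F])(
    f: Response[F] => Response[F]): DisposableResponse[F] =
  dr.copy(response = f(dr.response))
```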

startTime: Long,
responseReceivedTime: Long,
metrics: ClientMetrics[F]): F[Response[F]] =
//TODO is this correct? update most of the metrics early, in case response body is discarded

@rossabaker

rossabaker Jul 28, 2018

Member

Yes. There is no guarantee that BodyProcessed will get updated, but there's no harm.

I suppose we could make it reliable if we hooked into the dispose, rather than the completion of the body. There shouldn't be a lot of time between the completion of the body and the dispose. It would trade a bit of accuracy for completeness. I don't have a strong opinion. 🤔

@zcox

zcox Jul 28, 2018

Contributor

IIUC, this code will produce two metrics if the body is fully consumed, with both response_phase=response_received and response_phase=body_processed, but if the body is not fully consumed then there is only one metric with response_phase=response_received. But if the histogram was updated with body_processed on dispose then there would always be both metrics?

I kinda slightly lean towards not producing the body_processed metric if the body is not fully consumed, but I don't have a strong opinion either. I'm happy to try out the dispose approach if that is preferred.

@rossabaker

This looks good to me in current form. I guess the debate is then the attribute destination vs. a function destination. If we still prefer the ergonomics of the former, I guess I'm on board.

@zcox

Contributor

zcox commented Jul 31, 2018

PrometheusClientMetrics.apply currently takes this function parameter, which allows users to compute destination however they want, with a default of just empty string:

destination: Request[F] => String = { _: Request[F] => "" }

All of the attribute stuff was extracted into the DestinationAttribute module, which I'd be totally happy to just use internally in our own code base if it's not the right thing to include in http4s. Long-term, we may not even end up using it, and may get destination from the request some other way besides attributes.

@rossabaker

Wow, I missed that refactoring. That seems a fair compromise.

@ChristopherDavenport ChristopherDavenport merged commit 9bdd7fe into http4s:master Aug 7, 2018

2 checks passed

continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed