observability for endpoints by kubinio123 · Pull Request #1151 · softwaremill/tapir

kubinio123 · 2021-04-09T13:35:14Z

metrics interceptor
prometheus metrics

prometheus metrics

adamw · 2021-04-12T07:41:39Z


+// metrics
+
+lazy val prometheusMetrics: ProjectMatrix = (projectMatrix in file ("metrics/prometheus-metrics"))


also needs to be added to aggregates in the beginning

adamw · 2021-04-12T08:46:00Z

  def queryParameters: QueryParams

  lazy val acceptsContentTypes: Either[String, Seq[ContentTypeRange]] = Accepts.parse(headers)
+  val timestamp: Deadline = Deadline.now


is that the right type to capture a timestamp?

maybe let's call it simply requestStart? It's not clear what this timestamp is

It states that it main purpose is for repeated attempts, but on the other hand it's has now which captures current nanoseconds time, and +/- methods which is handy for getting time diff.
I think most appropriate type here would be plain Long from System.nanoTime but it carries no time context, so maybe Duration ?

adamw · 2021-04-12T08:50:58Z

+        _ <- collectMetrics { case Metric(m, Some(onRequest), _, _) => onRequest(request, m) }
+        response <- next(None)
+        _ <- collectMetrics { case Metric(m, _, Some(onResponse), _) => onResponse(request, response, m) }
+      } yield response


we should also probably somehow notify the metrics if the effect fails, that is, there's an exception that's being thrown?

that also depends, if the metrics interceptor should wrap the exception-handling one, or the other way round

Maybe MetricsInterceptor should be always called after ExceptionInterceptor?

yeah, well ... unless there's an exception during metrics collection ;-) but maybe that should be handled in the server.

If metrics should be called first/last (around the exc interceptor), they have to be added to the front of the interceptors list

Another problem: the response is only really "done" (well not even then, but it's more "done" than currently) when the whole response is sent. In terms of akka-http/http4s: when the response stream is completed. I think we won't run away from needing some way of interpreter-specific integration, which would allow us to "plug into" the response stream, getting a callback once it is complete.

In other words - a ServerResponse can be created quickly, but with a stream that takes a long time to evaluate

adamw · 2021-04-12T08:52:01Z

 `.appendInterceptor`.
+
+
+## Observability


let's move this to a separate top-level file

adamw · 2021-04-12T12:09:16Z

+    onResponse: Option[(ServerRequest, ServerResponse[_], M) => F[Unit]] = None,
+    onDecodeFailure: Option[(ServerRequest, M) => F[Unit]] = None
+) {
+  def onRequest(f: (ServerRequest, M) => F[Unit]): Metric[F, M] = copy(onRequest = Some(f))


I think we should pass in here an implicit MonadError here, as in EndpointInterceptor - that way users don't have to provide one, when creating e.g. PrometheusMetrics. This is supposed to be mainly an internal thing (the MonadError)

adamw · 2021-04-12T12:16:28Z

+        .help("Unserved HTTP requests")
+        .labelNames("path", "method")
+        .register(registry)
+    ).onDecodeFailure { (req, counter) => monad.unit(counter.labels(path(req), method(req)).inc()) }


decode failure is not a request failure. A single request can cause mulitple decode failures,until one endpoint matches and gives a decode success. So a failed request is rather one where the status code is non-2xx/3xx

adamw · 2021-04-12T12:17:34Z

+      monad.unit(histogram.labels(path(req), method(req), status(res)).observe((Deadline.now - req.timestamp).toMillis.toDouble / 1000.0))
+    }
+
+  private def path(request: ServerRequest): String = request.pathSegments.mkString("/")


We shouldn't use the full request's path, but rather the endpoint's path template. That way, any dynamic url segments are replaced with path captures

adamw · 2021-04-12T12:24:32Z

+    Metric[F, Counter](
+      Counter
+        .build()
+        .namespace("tapir")


These should be somehow configurable - e.g. for most apps this namespace doesn't make sense. It would be best if each metric would coudl be configured taking in the request/endpoint, and yielding a set of labels & label values to apply. With sane defaults, of course, so that adding default metrics is just a couple lines of code

adamw · 2021-04-12T12:26:38Z

+        }
+      }
+
+  implicit val monad: FutureMonad = new FutureMonad()


our goal should be to be able to remove this from the example - that shouldn't be necessary for users to use the metrics (if possible - maybe not?)

# Conflicts: # core/src/test/scala/sttp/tapir/server/interpreter/ServerInterpreterTest.scala

adamw · 2021-04-20T16:10:21Z

  def queryParameters: QueryParams

  lazy val acceptsContentTypes: Either[String, Seq[ContentTypeRange]] = Accepts.parse(headers)
+  val requestStart: Deadline = Deadline.now


maybe instead of adding this here, we could use a RequestInterceptor, capture the starting timestamp there, and create an EndpointInterceptor with it?

That should work, if this RequestInterceptor would be head of interceptors list, then EndpointInteceptor created by it would be called last which is what we need for metrics.
But requestStart would have to be added as next argument to Metrics callback.

wait in metrics test

adamw · 2021-04-23T14:09:56Z

+package sttp.tapir.server.interpreter
+
+trait BodyListener[F[_], B] {
+  def onComplete(body: B)(cb: => F[Unit]): F[B]


maybe it would be useful to also have an onFailure callback - so that we can also capture situations, where producing the body fails (which results in a broken connection, as the headers are already sent out)

this could be a separate callback, or a single one taking a Try[Unit] parameter. Maybe see the Future/Try API for an inspiration?

adamw · 2021-04-23T14:12:29Z

+
+case class Metric[M](
+    metric: M,
+    onRequest: Option[(Endpoint[_, _, _, _], ServerRequest, M) => Unit] = None,


idea: maybe a metric could be initialised per-request (in the request interceptor) using onRequest: (ServerRequest, M) => P, where P is the payload type. This payload would then be passed into onEndpointRequest: (Endpoint, ServerRequest, M, P) => F[Unit] & similarly to onEndpointResponse.

This would allow us to generalise over the quite specific Deadline parameter.

And another question: shouldn't the metric function return F[Unit], not Unit? Capturing a metric could be a side-effecting operation (it probably is ;) )

Or maybe simpler, we could go back to the idea of "initialising" a metric on request start, which would produce a "proper" metric (a constant instance in most cases, but for capturing timings this would include the initial timestamp). So here we'd only have a function onRequest: (ServerRequest, M) => EndpointMetric

adamw · 2021-04-23T14:12:44Z

+case class Metric[M](
+    metric: M,
+    onRequest: Option[(Endpoint[_, _, _, _], ServerRequest, M) => Unit] = None,
+    onResponse: Option[(Endpoint[_, _, _, _], ServerRequest, ServerResponse[_], Deadline, M) => Unit] = None


Wouldn't Instant be a simpler choice over Deadline?

# Conflicts: # core/src/main/scala/sttp/tapir/server/interpreter/ServerInterpreter.scala # core/src/test/scala/sttp/tapir/server/interpreter/ServerInterpreterTest.scala # server/akka-http-server/src/main/scala/sttp/tapir/server/akkahttp/AkkaHttpServerOptions.scala # server/finatra-server/src/main/scala/sttp/tapir/server/finatra/FinatraServerInterpreter.scala # server/finatra-server/src/main/scala/sttp/tapir/server/finatra/FinatraServerOptions.scala # server/http4s-server/src/main/scala/sttp/tapir/server/http4s/Http4sServerOptions.scala # server/vertx/src/main/scala/sttp/tapir/server/vertx/VertxCatsServerOptions.scala # server/vertx/src/main/scala/sttp/tapir/server/vertx/VertxFutureServerOptions.scala # server/vertx/src/main/scala/sttp/tapir/server/vertx/VertxZioServerOptions.scala

kubinio123 · 2021-04-26T15:29:22Z

+            }
+          } yield withMetrics
+
+          responseWithMetrics.handleError { case e: Exception =>


onException metrics are collected here, so in withBodyOnComplete case Failure is left empty, since it could collect onException twice i think. If body is a stream which fails before the response is returned, then it could be collected in body listener and here in handleError if the exception is thrown?

I think both should be handled, as these are two different scenarios:

the effect creating the ServerResponse completes successfully, yielding an instance of ServerResponse with a streaming body. However, the streaming body might fail when streamed to the client.

the above effect fails, so we never even get a ServerResponse

Note that a request can be considered "complete" only after 1. successfully finishes (the stream is done). But it can fail either with 1. or 2., but not with both.

adamw · 2021-04-27T07:06:03Z

+    }
+
+  private def collectMetrics(pf: PartialFunction[EndpointMetric[F], F[Unit]])(implicit monad: MonadError[F]): F[Unit] = {
+    def collect(metrics: List[EndpointMetric[F]]): F[Unit] = {


that's usually called sequence (a function List[F[T]] => F[List[T]] - which we kind of have here)

adamw · 2021-04-27T07:15:59Z

+      )(implicit monad: MonadError[F], bodyListener: BodyListener[F, B]): F[Option[ServerResponse[B]]] = {
+        if (ignoreEndpoints.contains(ctx.endpoint)) endpointHandler.onDecodeFailure(ctx)
+        else {
+          ctx.failure match {


I think we should attach run the metrics not based on the nature of the decode failure, but basing on its handling (we don't really know if a mismatch will be handled differently than other errors). So if downstream returns a response, then we need to run the on request & on response callbacks

adamw · 2021-04-27T07:17:07Z

+  def onComplete(body: B)(cb: Try[Unit] => F[Unit]): F[B]
+}
+
+object BodyListenerSyntax {


I think if you add the implicit in the companion object (BodyListener), then you won't have to import the syntax, it will be automatically searched

I did, and it still requires both

import sttp.tapir.server.interpreter.BodyListener import sttp.tapir.server.interpreter.BodyListener._

ah, I thought that this is in the implicit scope. But apparently not :)

adamw · 2021-04-27T07:19:03Z

    is match {
-      case Nil => (request: ServerRequest) => firstNotNone(request, ses, eisAcc.reverse)
+      case Nil =>
+        new RequestHandler[F, B] {


maybe it would make sense to add a factory method to RequestHandler which would accept a plain Request => F[Response] function?

adamw · 2021-04-27T07:24:25Z

+  val collectorRegistry = CollectorRegistry.defaultRegistry
+
+  // Metric for counting responses labeled by path, method and status code
+  val responsesTotal = Metric[Future, Counter](


I think in the example we should show how to use default metrics, and move the custom metric creation to the docs

adamw · 2021-04-27T07:30:01Z

+        m.unit {
+          EndpointMetric()
+            .onRequest { ep => m.eval(gauge.labels(labels.forRequest(ep, req): _*).inc()) }
+            .onResponse { (ep, _) => m.eval(gauge.labels(labels.forRequest(ep, req): _*).dec()) }


we should decrease on exception as well

adamw · 2021-04-27T07:32:22Z

+`Metric` wraps an aggregation object (like counter or gauge), and requires a function
+returning `EndpointMetric` with metric collection logic. There are three callbacks in `EndpointMetric`:
+
+1. `onRequest` - called after successful request decoding


would be good to describe the two-stage onRequest (with metric initialisation)

adamw · 2021-04-27T07:34:42Z

+  })
+```
+
+Labels for default metrics can be customized, any attribute from `Endpoint`, `ServerRequest` and `ServerResponse` could


would be good to mention what are the default labels (with a path template, status code etc.)

I mentioned that couple lines above, before prometheus example, but I'll move it here

I think it would be good to give an example of what the labels will be

adamw · 2021-04-27T07:35:21Z

+      onRequest = { (req, counter, m) =>
+        m.unit {
+          EndpointMetric()
+            .onResponse { (ep, res) => m.eval(counter.labels(labels.forRequest(ep, req) ++ labels.forResponse(res): _*).inc()) }


here I guess onException should be counted as a 500?

adamw · 2021-04-27T07:38:08Z

+
+import scala.util.{Failure, Success, Try}
+
+class Http4SBodyListener[F[_], G[_]](gToF: G ~> F)(implicit m: MonadError[G], a: Applicative[F])


[minor] name: Http4sBodyListener (small s :) )

adamw · 2021-04-27T07:43:05Z

+          .body("""{"invalid":"body",}""")
+          .send(backend)
+          .map { _ =>
+            wait(200) {


why do we have to wait for the counters to be updated?

Without that it sometimes fail for akka server, maybe transformDataBytes where the callback is executed on Future[Done] completion is called after response is returned, but it shouldn't right?

Ah, onComplete can run asynchronously, after the request completes and is sent. Hm... maybe we can use eventuallyhere? I wouldn't want to always sleep 200ms, this can be a cause of flaky tests

adamw · 2021-04-27T07:43:26Z

+}
+
+object ServerMetricsTest {
+  class Counter(var value: Int = 0) {


probably should be an AtomicInteger as it can be updated from multiple threads

adamw · 2021-04-27T07:44:27Z

+    m.unit {
+      { (ctx: RoutingContext) =>
+        body {
+          ctx.addBodyEndHandler(_ => cb(Success(())))


no way to intercept an exception?

addBodyEndHandler callback will not be executed on exception, there's addEndHandler which will be executed but will be called when the response is disposed or an exception has been encountered to allow consistent cleanup, so it's not the moment we want to capture here right, since it has to be a body failure?

can't we use both, one for success, another for error?

adamw · 2021-04-27T07:47:42Z

  val in_input_stream_out_input_stream: Endpoint[InputStream, Unit, InputStream, Any] =
    endpoint.post.in("api" / "echo").in(inputStreamBody).out(inputStreamBody).name("echo input stream")

+  val in_empty_out_empty: Endpoint[Unit, Unit, Unit, Any] = endpoint.post.in("api" / "empty").in(emptyInput).out(emptyOutput)


isn't this just endpoint.post.in("api" / "empty")? Also, maybe we can simply use e.g. in_root_path

kubinio123 added 2 commits April 9, 2021 15:31

metrics interceptor

51ace9a

prometheus metrics

docs

0ff4154

kubinio123 requested a review from adamw April 12, 2021 07:37

adamw reviewed Apr 12, 2021

View reviewed changes

adamw requested changes Apr 12, 2021

View reviewed changes

adamw reviewed Apr 12, 2021

View reviewed changes

labels customization

223b7a8

kubinio123 linked an issue Apr 13, 2021 that may be closed by this pull request

Observability for endpoints #952

Closed

kubinio123 added 11 commits April 14, 2021 12:57

response listener

05bbedd

Merge branch 'master' into observability-for-endpoints

e373380

# Conflicts: # core/src/test/scala/sttp/tapir/server/interpreter/ServerInterpreterTest.scala

merge

229c8f3

akka response listener tests

28b9f1f

listeners for specific servers

bcceae2

docs update

e4326e2

unit tests

effefe5

body listener updates

fb61e3e

clean up

3859c78

effect to body listener

e6daffc

metrics after exception handling

e71a341

kubinio123 requested a review from adamw April 19, 2021 13:31

adamw reviewed Apr 20, 2021

View reviewed changes

kubinio123 added 3 commits April 22, 2021 13:10

request start moved to interceptor

4e26648

finatra compile fix

49905ef

metrics fix compile 2_12

11e4089

wait in metrics test

kubinio123 requested a review from adamw April 22, 2021 12:55

adamw reviewed Apr 23, 2021

View reviewed changes

kubinio123 added 4 commits April 26, 2021 13:46

metrics update

f2e0725

body listener update

ca8699f

merge

78ce6d0

kubinio123 commented Apr 26, 2021

View reviewed changes

adamw reviewed Apr 27, 2021

View reviewed changes

kubinio123 and others added 6 commits April 27, 2021 12:05

review updates

64cccf2

docs update

a783f7a

Rename method

d879f64

Simpler server metrics endpoint

23beec9

Docs

5e6c207

Fix tests

eecb475

adamw merged commit 5f456a2 into master Apr 29, 2021

mergify Bot deleted the observability-for-endpoints branch April 29, 2021 11:22


		// metrics

		lazy val prometheusMetrics: ProjectMatrix = (projectMatrix in file ("metrics/prometheus-metrics"))


		import scala.util.{Failure, Success, Try}

		class Http4SBodyListener[F[_], G[_]](gToF: G ~> F)(implicit m: MonadError[G], a: Applicative[F])

Conversation

kubinio123 commented Apr 9, 2021

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kubinio123 Apr 12, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kubinio123 Apr 26, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kubinio123 Apr 27, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

kubinio123 Apr 12, 2021 •

edited

Loading

kubinio123 Apr 26, 2021 •

edited

Loading

kubinio123 Apr 27, 2021 •

edited

Loading

kubinio123 Apr 27, 2021 •

edited

Loading