Skip to content

Conversation

@rbarazzutti
Copy link
Contributor

Hi @brian-brazil,

As other users mentioned in issue 381, it would be quite convenient to have the servlet's MetricsFilter reporting on HTTP status code.

I implemented it and wrote a dedicated unit test.

Thanks.

@brian-brazil
Copy link
Contributor

Thanks for your PR. It is recommended not to break out latency by success/failure, as that often leads to users only graphing success and missing important data. This also is a breaking change, and may have cardinality issues. If you want to add this it should be as a separate counter metric.

@rbarazzutti
Copy link
Contributor Author

Hi @brian-brazil,

Thanks for your answer, the issues you mentioned do make sense. It is right that this change might be somehow breaking.

IMHO, it'd be nice to have a metric that contains both latencies and status codes. To illustrate this, I'd say that 500s responses might be served very quickly or pretty slowly (i.e. database connectivity loss).

Would it ok to be able to enable this with a parameter on the MetricsFilter?

Thanks.

@brian-brazil
Copy link
Contributor

brian-brazil commented Aug 9, 2018 via email

@rbarazzutti
Copy link
Contributor Author

Hi @brian-brazil,

Thanks for sharing your point of view.

I understand that a metric that changes its labels values across processes might be misleading. But in this specific case, the labels aren't changing; one of them, the status, is attributed once the HTTP response is ready. This behavior seems to be pretty common and is even presented in the querying example of the Prometheus documentation. And I'm pretty sure that this feature might interest other programmers according to issue 381.

Following your remark that this change might be breaking, I redefined this feature as an option of the MetricsFilter (that is disabled by default). So it won't be breaking for anyone.

Thanks.
Cheers.

@brian-brazil
Copy link
Contributor

I redefined this feature as an option of the MetricsFilter (that is disabled by default).

Metrics still shouldn't vary their labels across processes. This should be a separate counter metric.

Signed-off-by: Raphael P. Barazzutti <raphael@barazzutti.net>
Signed-off-by: Raphael P. Barazzutti <raphael@barazzutti.net>
Signed-off-by: Raphael P. Barazzutti <raphael@barazzutti.net>
Signed-off-by: Raphael P. Barazzutti <raphael@barazzutti.net>
Signed-off-by: Raphael P. Barazzutti <raphael@barazzutti.net>
@ghost
Copy link

ghost commented Sep 30, 2019

So now that it is a separate metric, what's the view on this PR?

@brian-brazil, you mention that this

is a breaking change

Did you have something specific in mind (e.g. "A given metric shouldn't really vary it's labels across processes"), or something else?

For people instrumenting legacy applications, this PR introduces a dependency on Servlet API 3.0 that would be breaking (for us).

@brian-brazil
Copy link
Contributor

Did you have something specific in mind

Adding instrumentation labels is a breaking change, and this still appears not to be a separate metric.

@rbarazzutti
Copy link
Contributor Author

Adding instrumentation labels is a breaking change, and this still appears not to be a separate metric.

Hi @brian-brazil, the filter defined in the class StatusAwareMetricFilter provides a separate metric. The user has to choose between the two approaches/metrics, between MetricFilter and StatusAwareMetricFilter.

StatusAwareMetricFilter also builds a histogram, because, in my opinion, it is useful when somebody wants to record SLIs about the latency of a web service. I.e. to be able to select GETs with status 200 and not take into account GETs with status 3xx (redirects that are likely not part of the service to be evaluated).
Like in https://engineering.bitnami.com/articles/implementing-slos-using-prometheus.html

Last point: as mentioned @MattiasSeeman the retrieval of status (HttpServletResponse::getStatus()) was not in early specifications of servlet-api (came with version 2.5), so it may be problematic when instrumenting legacy applications, this change is not breaking for these users as long as they stick to the usual MetricFilter.

@brian-brazil
Copy link
Contributor

The user has to choose between the two approaches/metrics, between MetricFilter and StatusAwareMetricFilter.

The user shouldn't have to choose. This is intended as a simple way to get some quick metrics, not a full-on solution.

I.e. to be able to select GETs with status 200 and not take into account GETs with status 3xx

As I mentioned elsewhere this is a bad idea, as while that is a rare use case which exists and what actually generally happens is users fail to include any failures in their latency graphs.

@rbarazzutti
Copy link
Contributor Author

Okay. Got your point. Thanks, I'll see how I can adapt this.

@rbarazzutti
Copy link
Contributor Author

Looks like #560 fits the requirements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants