
feat: add prom metrics for connector->server requests #3200

Merged: 2 commits into main from mxyng/connector-metrics on Sep 19, 2022

Conversation

@mxyng (Collaborator) commented on Sep 12, 2022

Summary

Observe the requests the connector makes to the server and report them as Prometheus metrics.
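For context, here is a minimal sketch of the kind of metric such a change could report, assuming a histogram registered from the connector package; the metric name, namespace, labels, and registration shown here are illustrative, not necessarily what this PR merges.

    package connector

    import "github.com/prometheus/client_golang/prometheus"

    // requestDuration is an illustrative histogram for connector->server
    // requests, labeled by HTTP method, path, and response status.
    var requestDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
        Namespace: "infra",
        Subsystem: "connector",
        Name:      "request_duration_seconds",
        Help:      "Duration of requests from the connector to the server.",
    }, []string{"method", "path", "status"})

    func init() {
        // Register with the default registry so the series are exported
        // alongside the connector's other metrics.
        prometheus.MustRegister(requestDuration)
    }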

Checklist

  • Wrote appropriate unit tests
  • Considered security implications of the change
  • Updated associated docs where necessary
  • Updated associated configuration where necessary
  • Change is backwards compatible if it needs to be (user can upgrade without manual steps?)
  • Nothing sensitive logged
  • Considered data migrations for smooth upgrades

Related Issues

Resolves #2664

@dnephin (Contributor) left a comment


The change seems reasonable to me. I guess someone can use this to solve #2664 by building an alert that checks the time since the last 200 response code from this metric?

I think this metric is valuable for other purposes, but generally how I would expect to monitor something like #2664 would be with a gauge metric that emits the last time the reconciler ran successfully. If the reconcile loop is failing for any reason (not just calls to the infra API), that's a signal that the role bindings are out of date. I think monitoring the success of the reconcile operation is the most accurate signal. I'd use the gauge metric in a monitor that checks whether the value is older than some threshold and sends an alert. Not a blocker, just something to consider.
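As a rough sketch of that alternative (the metric name, reconcile function, and ticker interval below are hypothetical and not part of this PR; imports of context, time, and prometheus assumed, registration omitted), the gauge only advances on success, so staleness becomes the alert signal:

    // lastReconcileSuccess holds the Unix time of the most recent successful
    // reconcile; an alert can fire when time() - this value exceeds a threshold.
    var lastReconcileSuccess = prometheus.NewGauge(prometheus.GaugeOpts{
        Namespace: "infra",
        Subsystem: "connector",
        Name:      "last_successful_reconcile_timestamp_seconds",
        Help:      "Unix time of the last successful role binding reconcile.",
    })

    func runReconcileLoop(ctx context.Context, reconcile func(context.Context) error) {
        ticker := time.NewTicker(30 * time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                if err := reconcile(ctx); err != nil {
                    continue // on failure the gauge simply stops advancing
                }
                lastReconcileSuccess.SetToCurrentTime()
            }
        }
    }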

@mxyng (Collaborator, Author) commented on Sep 14, 2022

@dnephin you're right that this does not capture the sync status of the cluster's roles but that was never the intent. My understanding of #2664 is that it's specifically concerned with the connectivity between the server and the connector, which tracking requests will address. The sync status of cluster roles is out of scope.

api/client.go (outdated), comment on lines 120 to 122:
    if client.ObserveFunc != nil {
        client.ObserveFunc(start, req, resp)
    }
Contributor:

Nice! I really like this approach for being able to instrument the api client.
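For illustration, a caller could wire this hook into a metric such as the requestDuration histogram sketched under Summary. This is an assumption about usage rather than the PR's exact wiring (imports of strconv, time, and net/http assumed), and note that the signature later grows an err parameter, as the snippet further down shows.

    client.ObserveFunc = func(start time.Time, req *http.Request, resp *http.Response) {
        status := ""
        if resp != nil {
            status = strconv.Itoa(resp.StatusCode)
        }
        requestDuration.WithLabelValues(req.Method, req.URL.Path, status).
            Observe(time.Since(start).Seconds())
    }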

@mxyng force-pushed the mxyng/connector-metrics branch 2 times, most recently from e000c63 to e914921, on September 15, 2022 02:58
internal/connector/connector.go (outdated):
    ObserveFunc: func(start time.Time, request *http.Request, response *http.Response, err error) {
        errorLabel := ""
        if err != nil {
            errorLabel = err.Error()
Contributor:

Won't this be too high cardinality, since these are effectively infinite error strings?

I was thinking of just a true or false value for the error label.

Collaborator (Author):

Technically yes. My concern with setting just true or false is that it provides no information as to why the request failed. With the status code, there's at least some context. Setting a true or false error label is almost like setting status to -1 on error instead of a meaningful HTTP code.

Contributor:

That's true. We could attempt to detect the error and reduce it to a few well-known constant values, but I'm not sure it's worth the effort right now. We can always add that later as a separate errorClass or some other label.

Generally I would not expect metrics to tell me about the error. If I see the number of errors increasing I would go to logs to figure out more details about the errors.

Collaborator (Author):

That's true. It's nice when metrics tell you what the problem is, but it's not necessary. In this case, I'd prefer setting status to -1, since status and error=true are (probably) mutually exclusive; adding a separate label seems unnecessary.
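A minimal sketch of what that resolution could look like, reusing the illustrative requestDuration histogram from the Summary section: transport errors map to a sentinel status of -1 instead of a free-form error label (the merged change may differ in detail).

    ObserveFunc: func(start time.Time, request *http.Request, response *http.Response, err error) {
        // A transport error means there is no response; record a sentinel
        // status of -1 rather than a high-cardinality error string.
        status := "-1"
        if err == nil && response != nil {
            status = strconv.Itoa(response.StatusCode)
        }
        requestDuration.WithLabelValues(request.Method, request.URL.Path, status).
            Observe(time.Since(start).Seconds())
    },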

@mxyng mxyng merged commit 03afc71 into main Sep 19, 2022
@mxyng mxyng deleted the mxyng/connector-metrics branch September 19, 2022 21:45