-
Notifications
You must be signed in to change notification settings - Fork 137
Use Prometheus collector pattern for stateless metrics #270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for the PR! Can you create a tracking issue for the defunct metrics you removed? |
|
Done in #271 |
pkg/controller/metrics.go
Outdated
| } | ||
| } | ||
|
|
||
| type NodeController struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be NodeCollector not NodeController right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a typo. 😇
pkg/controller/metrics.go
Outdated
|
|
||
| nodes: prometheus.NewDesc( | ||
| metricsPrefix+"nodes", | ||
| "The number of nodes created by a machine", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am actually not sure if this metric makes sense, because it will include all nodes, not only the ones that were created by a machine-controller, we could get the same info from kube state metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, had the same thought yesterday. Then I also looked into kube-state-metrics again and found kube_node_created.
The reason I added them back were
- We already had node metrics.
- You still get node metrics, even though you have not deployed kube-state-metrics.
I'm fine with removing them.
pkg/controller/metrics.go
Outdated
| ), | ||
| nodeCreated: prometheus.NewDesc( | ||
| metricsPrefix+"node_created", | ||
| "The number of nodes created by a machine", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should read something like nodeCreationTimestamp or am I misunderstanding something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not 100% sure on that one either.
I searched for created and deleted in our Prometheus metrics and found that kube-state-metrics simply uses e.g. kube_node_created and others.
The Prometheus docs don't say anything specific about naming for timestamps either.
https://prometheus.io/docs/instrumenting/writing_exporters/#naming
251e7e1 to
08e8f17
Compare
What this PR does / why we need it:
If a machine can't be deleted we want to alert on it.
With the old approach we had state in the metrics that we couldn't delete. Hence a machine that once existed still had metrics which made us unable to alert on the metric, even when the machine was long gone.
By using the Prometheus collector pattern we simply query the informers to give us a list of machines and nodes. We iterate over those lists and forget about the state afterwards.
This improvement allows metrics of deleted machines to not show up anymore.
We can now tell, that if a
machine_controller_machine_deletedmetric is there for more than 10min the machine is not able to be deleted.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Special notes for your reviewer:
I've also removed the histograms for the operations as they were utterly broken.
Example histogram:
In this example, all buckets were hit 64 making them useless. For all the other machine-controller histograms I looked at I saw the same results.