add metrics.md to document exposed metrics
This change adds a document to describe the metrics available from this operator.
Showing 1 changed file with 53 additions and 0 deletions.
@@ -0,0 +1,53 @@
# CAO Metrics

The Cluster Autoscaler Operator (CAO) reports the following metrics:

## Metrics provided by the controller runtime

The [controller runtime](https://github.com/kubernetes-sigs/controller-runtime)
integration with the operator provides metrics about the operator's controllers,
its webhook admission server, REST requests, and work queues.
You can find more information about these metric names and their labels through
the following links:

### Kubernetes controller metrics

The labels `controller="cluster_autoscaler_controller"` and
`controller="machine_autoscaler_controller"` can be used to refine queries against these metrics.

* [Controller runtime reconciliation metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/internal/controller/metrics/metrics.go)

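As an illustration of refining a query with the `controller` label, the sketch below builds a PromQL expression as a string. It assumes the reconciliation counter is named `controller_runtime_reconcile_total`, which may differ between controller runtime versions; the helper names `promql_selector` and `reconcile_rate` are hypothetical and exist only for this example.

```python
def promql_selector(metric: str, **labels: str) -> str:
    """Render `metric{label="value",...}`, escaping label values for PromQL."""
    def escape(value: str) -> str:
        # Backslashes first, then double quotes, so escapes are not doubled up.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    pairs = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
    return f"{metric}{{{pairs}}}" if pairs else metric


def reconcile_rate(controller: str, window: str = "5m") -> str:
    """Per-second reconcile rate for one controller (metric name is an assumption)."""
    selector = promql_selector("controller_runtime_reconcile_total", controller=controller)
    return f"rate({selector}[{window}])"


if __name__ == "__main__":
    for name in ("cluster_autoscaler_controller", "machine_autoscaler_controller"):
        print(reconcile_rate(name))
```

The resulting strings can be pasted into the Prometheus expression browser or sent to its HTTP API.
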
### Admission webhook metrics

The label `webhook="/validate-clusterautoscalers"` can be used to refine the
queries for these metrics.

* [Controller runtime webhook metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/webhook/internal/metrics/metrics.go)

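A query filtered by the `webhook` label can be sent to the Prometheus HTTP API's instant query endpoint (`/api/v1/query`). The sketch below only constructs the request URL. The host `prometheus.example:9090` is a placeholder, and the metric name `controller_runtime_webhook_requests_total` with its `code` label is an assumption based on the linked implementation, so check it against the version you run.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed metric name; the `webhook` label value comes from this document.
WEBHOOK_QUERY = (
    "sum by (code) "
    '(controller_runtime_webhook_requests_total{webhook="/validate-clusterautoscalers"})'
)


def instant_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL, percent-encoding the PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


if __name__ == "__main__":
    url = instant_query_url("http://prometheus.example:9090", WEBHOOK_QUERY)
    print(url)
    # Decoding the query string recovers the original expression.
    print(parse_qs(urlsplit(url).query)["query"][0] == WEBHOOK_QUERY)
```

Encoding the expression matters here because the label value contains `/`, `"`, and `=` characters that would otherwise corrupt the query string.
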
### Prometheus REST server metrics

The `url` label is useful for querying these metrics. A few of the available
URL values are `"https://172.30.0.1:443/%7Bprefix%7D"` and `"https://172.30.0.1:443/apis?timeout=32s"`.
The `verb` label can also be used with common HTTP request verbs (e.g. `"GET"`).

* [Controller runtime Prometheus REST server metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/metrics/client_go_adapter.go)

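The `url` values are stored percent-encoded, so `%7Bprefix%7D` is the literal text `{prefix}`. When matching on the label in a query, the encoded form shown above is what has to be used. The following sketch splits a label value into readable parts; the function name `describe_url_label` is hypothetical.

```python
from urllib.parse import unquote, urlsplit


def describe_url_label(value: str) -> dict:
    """Split a `url` label value into host, decoded path, and raw query string."""
    parts = urlsplit(value)
    return {
        "host": parts.netloc,
        "path": unquote(parts.path),  # e.g. "/%7Bprefix%7D" becomes "/{prefix}"
        "query": parts.query,
    }


if __name__ == "__main__":
    for label in (
        "https://172.30.0.1:443/%7Bprefix%7D",
        "https://172.30.0.1:443/apis?timeout=32s",
    ):
        print(describe_url_label(label))
```
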
### Prometheus work queue metrics

The labels `name="cluster_autoscaler_controller"` and
`name="machine_autoscaler_controller"` can be used to refine queries against these metrics.

* [Controller runtime Prometheus work queue metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/metrics/workqueue.go)

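The `name` label can also be used to filter the raw text scraped from the operator's metrics endpoint. The sketch below does this with a deliberately simplified parser that ignores timestamps and escaped characters inside label values. The sample text and its values are invented for illustration, and the metric names `workqueue_depth` and `workqueue_adds_total` are assumptions to verify against your deployment.

```python
import re

# Invented sample in the Prometheus text exposition format.
SAMPLE = """\
# TYPE workqueue_depth gauge
workqueue_depth{name="cluster_autoscaler_controller"} 0
workqueue_depth{name="machine_autoscaler_controller"} 2
workqueue_adds_total{name="machine_autoscaler_controller"} 17
"""

# metric_name{labels} value  (no timestamp, no braces inside label values)
LINE = re.compile(r'^(?P<metric>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def samples_for_queue(text: str, queue_name: str) -> dict:
    """Return {metric: value} for samples whose `name` label equals queue_name."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            continue  # comment lines and label-less samples are skipped
        labels = dict(LABEL.findall(match.group("labels")))
        if labels.get("name") == queue_name:
            result[match.group("metric")] = float(match.group("value"))
    return result


if __name__ == "__main__":
    print(samples_for_queue(SAMPLE, "machine_autoscaler_controller"))
```

For anything beyond a quick inspection, a full exposition-format parser is a safer choice than these regular expressions.
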
## Metrics about the Prometheus collectors

Prometheus provides some default metrics about the internal state
of the running process and the metric collection. You can find more information
about these metric names and their labels through the following links:

* [Prometheus documentation, Standard and runtime collectors](https://prometheus.io/docs/instrumenting/writing_clientlibs/#standard-and-runtime-collectors)
* [Prometheus client Go language collectors](https://github.com/prometheus/client_golang/blob/master/prometheus/go_collector.go)

# Cluster Autoscaler Metrics

The Cluster Autoscaler Operator is responsible for lifecycle management of the
[Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler) on OpenShift. The metrics
described previously in this document come specifically from that operator. If you would
like to gather metrics from the cluster autoscaler itself, please see the
[Cluster Autoscaler Monitoring](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/metrics.md)
documentation.