add metrics.md to document exposed metrics

This change adds a document to describe the metrics available from this operator.
openshift · Jun 10, 2020 · 147311f
1 parent 798a64b
commit 147311f
Showing 1 changed file with 53 additions and 0 deletions.
diff --git a/docs/dev/metrics.md b/docs/dev/metrics.md
@@ -0,0 +1,53 @@
+# CAO Metrics
+
+The Cluster Autoscaler Operator reports the following metrics:
+
+## Metrics provided by the controller runtime
+
+The [controller runtime](https://github.com/kubernetes-sigs/controller-runtime)
+integration with the operator provides metrics about the controllers, the admission
+webhook, REST requests, and work queues. You can find more information about these
+metric names and their labels through the following links:
+
+### Kubernetes controller metrics
+
+The labels `controller="cluster_autoscaler_controller"` and
+`controller="machine_autoscaler_controller"` can be used to refine queries against these metrics.
+* [Controller runtime reconciliation metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/internal/controller/metrics/metrics.go)
+
+### Admission webhook metrics
+
+The label `webhook="/validate-clusterautoscalers"` can be used to refine the
+queries for these metrics.
+* [Controller runtime webhook metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/webhook/internal/metrics/metrics.go)
+
+### Prometheus REST server metrics
+
+The `url` label can be quite useful for querying these metrics. A few of the available
+URL values are `"https://172.30.0.1:443/%7Bprefix%7D"` and `"https://172.30.0.1:443/apis?timeout=32s"`.
+The `verb` label can also be used with common HTTP request verbs (e.g. `"GET"`).
+* [Controller runtime Prometheus REST server metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/metrics/client_go_adapter.go)
+
+### Prometheus work queue metrics
+
+The labels `name="cluster_autoscaler_controller"` and
+`name="machine_autoscaler_controller"` can be used to refine queries against these metrics.
+* [Controller runtime Prometheus work queue metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/metrics/workqueue.go)
+
+## Metrics about the Prometheus collectors
+
+Prometheus provides some default metrics about the internal state
+of the running process and the metric collection. You can find more information
+about these metric names and their labels through the following links:
+
+* [Prometheus documentation, Standard and runtime collectors](https://prometheus.io/docs/instrumenting/writing_clientlibs/#standard-and-runtime-collectors)
+* [Prometheus client Go language collectors](https://github.com/prometheus/client_golang/blob/master/prometheus/go_collector.go)
+
+# Cluster Autoscaler Metrics
+
+The Cluster Autoscaler Operator is responsible for lifecycle management of the
+[Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler) on OpenShift. The metrics
+described earlier in this document come specifically from that operator. If you would
+like to gather metrics from the cluster autoscaler itself, please see the
+[Cluster Autoscaler Monitoring](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/metrics.md)
+documentation.
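
The label filters described in the added document could be turned into PromQL selectors along the following lines. This is only a minimal sketch: the `selector` helper is hypothetical, and the metric names `controller_runtime_reconcile_total` and `workqueue_depth` are assumptions that should be checked against the controller runtime source files linked above. Only the label names and values come from the document itself.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selector is a hypothetical helper that renders a PromQL selector of the
// form metric{key="value",...}. Label keys are sorted so output is stable.
func selector(metric string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q wraps the value in double quotes, which is sufficient for
		// the plain ASCII label values used in this document.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return metric + "{" + strings.Join(pairs, ",") + "}"
}

func main() {
	// Metric names below are assumed, not taken from the document.
	fmt.Println(selector("controller_runtime_reconcile_total",
		map[string]string{"controller": "cluster_autoscaler_controller"}))
	fmt.Println(selector("workqueue_depth",
		map[string]string{"name": "machine_autoscaler_controller"}))
}
```

The resulting strings can be pasted into the Prometheus expression browser, or wrapped in a function such as `rate(...[5m])` for counter metrics.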