add metrics.md to document exposed metrics
This change adds a document to describe the metrics available from this
operator.
elmiko committed Jun 10, 2020
1 parent 798a64b commit 147311f
Showing 1 changed file with 53 additions and 0 deletions.
53 changes: 53 additions & 0 deletions docs/dev/metrics.md
@@ -0,0 +1,53 @@
# CAO Metrics

The Cluster Autoscaler Operator reports the following metrics:

## Metrics provided by the controller runtime

The [controller runtime](https://github.com/kubernetes-sigs/controller-runtime)
integration with the operator provides metrics about controller reconciliation,
the webhook admission server, REST requests, and work queues.
You can find more information about these metric names and their labels through
the following links:

### Kubernetes controller metrics

The labels `controller="cluster_autoscaler_controller"` and
`controller="machine_autoscaler_controller"` can be used to refine queries against these metrics.
* [Controller runtime reconciliation metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/internal/controller/metrics/metrics.go)
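
As a rough illustration of refining a query with the `controller` label, the sketch below builds a PromQL selector string. The `selector` helper is hypothetical, and the metric name `controller_runtime_reconcile_total` is an assumption based on the linked implementation; check it against the version of controller runtime in use.

```python
def selector(metric: str, **labels: str) -> str:
    """Return a PromQL selector such as metric{label="value"}."""
    def escape(value: str) -> str:
        # PromQL string literals need backslashes and double quotes escaped.
        return value.replace("\\", "\\\\").replace('"', '\\"')

    if not labels:
        return metric
    matchers = ",".join(
        f'{key}="{escape(val)}"' for key, val in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


# The metric name is an assumption taken from the linked metrics.go.
reconciles = selector(
    "controller_runtime_reconcile_total",
    controller="cluster_autoscaler_controller",
)
# Per-second reconcile rate over the last five minutes.
query = f"rate({reconciles}[5m])"
print(query)
```

Swapping the label value for `machine_autoscaler_controller` gives the same query for the other controller.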

### Admission webhook metrics

The label `webhook="/validate-clusterautoscalers"` can be used to refine the
queries for these metrics.
* [Controller runtime webhook metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/webhook/internal/metrics/metrics.go)

### Prometheus REST server metrics

The `url` label is useful for narrowing queries against these metrics. A few
of the available URL values are: `"https://172.30.0.1:443/%7Bprefix%7D"`, `"https://172.30.0.1:443/apis?timeout=32s"`.
The `verb` label can also be used to filter by common HTTP request verbs (e.g. `"GET"`).
* [Controller runtime Prometheus REST server metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/metrics/client_go_adapter.go)
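
Because `url` label values contain characters such as `?` and `=`, a query that uses them has to be URL-encoded before it is sent to the Prometheus HTTP API. The sketch below shows one way to do that for the instant query endpoint (`/api/v1/query`). The Prometheus address and the `instant_query_url` helper are hypothetical, and the metric name `rest_client_request_latency_seconds_count` is an assumption based on the linked implementation.

```python
from urllib.parse import urlencode

# Hypothetical Prometheus address; replace with the one for your cluster.
PROMETHEUS = "http://localhost:9090"


def instant_query_url(promql: str, base: str = PROMETHEUS) -> str:
    """Return the instant-query URL for a PromQL expression."""
    # urlencode percent-encodes quotes, braces, '?' and '=' in the expression.
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


# The metric name is an assumption; the label values come from the text above.
promql = (
    "rest_client_request_latency_seconds_count"
    '{verb="GET",url="https://172.30.0.1:443/apis?timeout=32s"}'
)
url = instant_query_url(promql)
print(url)
```

The resulting URL can be passed to any HTTP client, along with whatever authentication your Prometheus instance requires.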

### Prometheus work queue metrics

The labels `name="cluster_autoscaler_controller"` and
`name="machine_autoscaler_controller"` can be used to refine queries against these metrics.
* [Controller runtime Prometheus work queue metrics implementation](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/metrics/workqueue.go)

## Metrics about the Prometheus collectors

Prometheus provides some default metrics about the internal state
of the running process and the metric collection. You can find more information
about these metric names and their labels through the following links:

* [Prometheus documentation, Standard and runtime collectors](https://prometheus.io/docs/instrumenting/writing_clientlibs/#standard-and-runtime-collectors)
* [Prometheus client Go language collectors](https://github.com/prometheus/client_golang/blob/master/prometheus/go_collector.go)
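
To see which of these default metrics an endpoint exposes, one option is to filter a scrape of the metrics endpoint by name prefix. The sketch below assumes the standard collectors use the `go_` and `process_` prefixes, as in the Go client library. The sample text and the `collector_metric_names` helper are illustrative only, not output captured from the operator.

```python
import re

# Illustrative sample in the Prometheus text exposition format.
SAMPLE = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 42
process_cpu_seconds_total 1.5
workqueue_depth{name="cluster_autoscaler_controller"} 0
"""

NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def collector_metric_names(exposition: str) -> list:
    """Return sorted names of go_ and process_ metrics found in a scrape."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = NAME.match(line)
        if match and match.group(0).startswith(("go_", "process_")):
            names.add(match.group(0))
    return sorted(names)


print(collector_metric_names(SAMPLE))
```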

# Cluster Autoscaler Metrics

The Cluster Autoscaler Operator is responsible for lifecycle management of the
[Kubernetes Cluster Autoscaler](https://github.com/kubernetes/autoscaler) on OpenShift. The metrics
described previously in this document come specifically from that operator. If you would
like to gather metrics from the cluster autoscaler itself, please see the
[Cluster Autoscaler Monitoring](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/proposals/metrics.md)
documentation.
