## Instrumenting Kubernetes

The following references and outlines general guidelines for metric instrumentation
in Kubernetes components. Components are instrumented using the
[Prometheus Go client library](https://github.com/prometheus/client_golang). For non-Go
components, [libraries in other languages](https://prometheus.io/docs/instrumenting/clientlibs/)
are available.

The metrics are exposed via HTTP in the
[Prometheus metric format](https://prometheus.io/docs/instrumenting/exposition_formats/),
which is open and well understood by a wide range of third-party applications and vendors
outside of the Prometheus ecosystem.
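
As a minimal sketch (hypothetical, not taken from a Kubernetes component), a Go
program can expose all registered metrics in this format using the client
library's `promhttp` handler:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// Serve all metrics registered with the default registry on /metrics
	// in the Prometheus text exposition format.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```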

The [general instrumentation advice](https://prometheus.io/docs/practices/instrumentation/)
from the Prometheus documentation applies. This document reiterates common pitfalls and some
Kubernetes-specific considerations.

Prometheus metrics are cheap, as they have minimal internal memory state. Set and increment
operations are thread-safe and take 10-25 nanoseconds (Go & Java).
Thus, instrumentation can and should cover all operationally relevant aspects of an application,
internal and external.
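
For instance, metric updates can be performed in hot paths and from concurrent
goroutines without extra locking. A sketch with a hypothetical counter:

```go
package main

import (
	"sync"

	"github.com/prometheus/client_golang/prometheus"
)

var opsTotal = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "example_ops_total",
	Help: "Total number of processed operations.",
})

func main() {
	prometheus.MustRegister(opsTotal)

	// Increment the counter from many goroutines concurrently;
	// no additional locking is required.
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			opsTotal.Inc()
		}()
	}
	wg.Wait()
}
```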

## Quick Start

The following describes the basic steps required to add a new metric (in Go).

1. Import "github.com/prometheus/client_golang/prometheus".
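
In code:

```go
import "github.com/prometheus/client_golang/prometheus"
```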

2. Create a top-level var to define the metric. For this, you need to:

   1. Pick the type of metric: use a Gauge for things you want to set to a
      particular value, a Counter for things you want to increment, or a
      Histogram or Summary for distributions of values (typically latencies).
      Histograms are better if you're going to aggregate the values across
      jobs, while summaries are better if you just want the job to give you a
      useful summary of the values.
   2. Give the metric a name and description.
   3. Pick whether you want to distinguish different categories of things
      using labels on the metric. If so, add "Vec" to the name of the type of
      metric you want and add a slice of the label names to the definition.

[Example](https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L53)
```go
requestCounter = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "apiserver_request_count",
		Help: "Counter of apiserver requests broken out for each verb, API resource, client, and HTTP response code.",
	},
	[]string{"verb", "resource", "client", "code"},
)
```

3. Register the metric so that Prometheus will know to export it.

[Example](https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L78)
```go
func init() {
	prometheus.MustRegister(requestCounter)
	prometheus.MustRegister(requestLatencies)
	prometheus.MustRegister(requestLatenciesSummary)
}
```

4. Use the metric by calling the appropriate method for your metric type (Set,
Inc/Add, or Observe, respectively for Gauge, Counter, or Histogram/Summary),
first calling WithLabelValues if your metric has any labels.

[Example](https://github.com/kubernetes/kubernetes/blob/cd3299307d44665564e1a5c77d0daa0286603ff5/pkg/apiserver/apiserver.go#L87)
```go
requestCounter.WithLabelValues(*verb, *resource, client, strconv.Itoa(*httpCode)).Inc()
```
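
The Quick Start example above uses a counter. As a hedged sketch with
hypothetical names (not code from the Kubernetes tree), a latency histogram
would be defined, registered, and observed similarly:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var requestLatency = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "example_request_latency_seconds",
		Help:    "Distribution of request latencies in seconds, by verb.",
		Buckets: prometheus.DefBuckets, // default buckets; tune per use case
	},
	[]string{"verb"},
)

func init() {
	prometheus.MustRegister(requestLatency)
}

// observeLatency records one request's duration for the given verb.
func observeLatency(verb string, seconds float64) {
	requestLatency.WithLabelValues(verb).Observe(seconds)
}
```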


## Instrumentation Types

First, components have metrics capturing events and states that are inherent to
their application logic. Examples are request and error counters, request latency
histograms, or internal garbage collection cycles. Those metrics are instrumented
directly in the application code.

Second, there are business-logic metrics. Those are not about observed application
behavior but about abstract system state, such as the desired replicas for a
deployment. They are not directly instrumented but collected from otherwise
exposed data.

In Kubernetes, they are generally captured in the [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
component, which reads them from the API server.
For this type of metric exposition, the
[exporter guidelines](https://prometheus.io/docs/instrumenting/writing_exporters/)
apply additionally.

## Naming

Metrics added directly by application or package code should have a unique name.
This avoids collisions with metrics added via dependencies and clearly
distinguishes metrics collected with different semantics. This is achieved
through prefixes:

```
<component_name>_<metric>
```

For example, suppose the kubelet instruments its HTTP requests but also uses
an HTTP router package that provides its own instrumentation. Both expose metrics
on total HTTP requests. They should be distinguishable, as in:

```
kubelet_http_requests_total{path="/some/path",status="200"}
routerpkg_http_requests_total{path="/some/path",status="200",method="GET"}
```

As we can see, they expose different labels, so a naming collision could not
have been resolved even if both metrics counted the exact same requests.

Resource objects that occur in names should inherit the spelling that is used
in kubectl, i.e. daemon sets are `daemonset` rather than `daemon_set`.

## Dimensionality & Cardinality

Metrics can often replace more expensive logging, as they are time-aggregated
over a sampling interval. The [multidimensional data model](https://prometheus.io/docs/concepts/data_model/)
enables deep insights, and all metrics should use those label dimensions
where appropriate.

A common error, which often causes performance issues in the ingesting metric
system, is using label dimensions that are too specific and thus inhibit or
eliminate time aggregation. Typical examples are user IDs or error messages.
More generally: one should know a comprehensive list of all possible values
for a label at instrumentation time.
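
A hedged sketch of the distinction, with hypothetical metric and label names:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Good: "code" has a small, known set of possible values (HTTP status codes),
// so the number of time series stays bounded.
var requestErrors = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "example_request_errors_total",
		Help: "Number of request errors by HTTP status code.",
	},
	[]string{"code"},
)

// Bad (avoid): a label such as "user_id" or "error_message" is unbounded and
// would create a new time series for every distinct value ever observed.

func init() {
	prometheus.MustRegister(requestErrors)
}

func main() {
	requestErrors.WithLabelValues("500").Inc()
}
```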

A notable exception are exporters like kube-state-metrics, which expose per-pod
or per-deployment metrics. Those are theoretically unbounded over time, as one
could constantly create new objects with new names. In practice, however, they
have a reasonable upper bound for a given size of infrastructure they refer to
and its typical frequency of changes.

In general, "external" labels like the pod or node name do not belong in the
instrumentation itself. They are to be attached to metrics by the collecting
system, which has the external knowledge ([blog post](https://www.robustperception.io/target-labels-are-for-life-not-just-for-christmas/)).

## Normalization

Metrics should be normalized with respect to their dimensions. They should
expose the minimal set of labels, each of which provides additional information.
Labels whose values are composed from the values of other labels are not desirable.
For example:

```
example_metric{pod="abc",container="proxy",container_long="abc/proxy"}
```

It often seems tempting to add additional meta information about an object
to all metrics about that object, e.g.:

```
kube_pod_container_restarts{namespace=...,pod=...,container=...}
```

A common use case is looking at such metrics with respect to the node the
pod is scheduled on, so it seems convenient to add a "node" label:

```
kube_pod_container_restarts{namespace=...,pod=...,container=...,node=...}
```

This, however, only caters to one specific query use case. There are many more
pieces of metadata that could be added, effectively blowing up the instrumentation.
They are also not guaranteed to be stable over time: what if pods can at some
point be live-migrated?
Those pieces of information should instead be normalized into an info-level metric
([blog post](https://www.robustperception.io/exposing-the-software-version-to-prometheus/)),
which is always set to 1. For example:

```
kube_pod_info{pod=...,namespace=...,pod_ip=...,host_ip=...,node=...,...}
```
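
For example, a metric system like Prometheus can then join the `node` label
onto another metric at query time. A sketch, using the metric names from above:

```
kube_pod_container_restarts * on(namespace, pod) group_left(node) kube_pod_info
```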

More generally, the metric system can later denormalize those along the
identifying `pod` and `namespace` labels, as sketched above. This leads to...

## Resource Referencing

It is often desirable to correlate different metrics about a common object,
such as a pod. Label dimensions can be used to match up different metrics.
This is easiest if label names and values follow a common pattern.
For metrics exposed by the same application, that often happens naturally.

For a system composed of several independent and pluggable components, it
makes sense to set cross-component standards that allow easy querying in
metric systems without extensive post-processing of data.
In Kubernetes, those are the resource objects, such as deployments,
pods, or services, and the namespace they belong to.

The following should be consistently used:

```
example_metric_ccc{pod="example-app-5378923", namespace="default"}
```

An object is referenced by its unique name in a label named after the resource
itself (i.e. `pod`/`deployment`/... and not `pod_name`/`deployment_name`)
and the namespace it belongs to in the `namespace` label.

Note: namespace/name combinations are only unique at a certain point in time.
For time series, this is given by the timestamp associated with any data point.
UUIDs are truly unique but not convenient to use in user-facing time series
queries.
They can still be incorporated using an info-level metric as described above for
`kube_pod_info`. A query to a metric system selecting by UUID via the info-level
metric could look as follows:

```
kube_pod_restarts and on(namespace, pod) kube_pod_info{uuid="ABC"}
```

These are the metric type definitions if you're curious to learn about them or
need more information:

- https://github.com/prometheus/client_golang/blob/master/prometheus/gauge.go
- https://github.com/prometheus/client_golang/blob/master/prometheus/counter.go
- https://github.com/prometheus/client_golang/blob/master/prometheus/histogram.go
- https://github.com/prometheus/client_golang/blob/master/prometheus/summary.go

