Add documentation for Component SLIs feature (#37767)
* add component SLIs documentation

* Update content/en/docs/reference/instrumentation/slis.md

Co-authored-by: Tim Bannister <tim@scalefactory.com>

* Update content/en/docs/reference/instrumentation/slis.md

Co-authored-by: Tim Bannister <tim@scalefactory.com>

* Update content/en/docs/reference/instrumentation/slis.md

Co-authored-by: Tim Bannister <tim@scalefactory.com>

* Update content/en/docs/reference/instrumentation/slis.md

Co-authored-by: Tim Bannister <tim@scalefactory.com>

* Update content/en/docs/reference/instrumentation/slis.md

Co-authored-by: Tim Bannister <tim@scalefactory.com>

* Update content/en/docs/reference/instrumentation/slis.md

Co-authored-by: Tim Bannister <tim@scalefactory.com>

* Update content/en/docs/reference/instrumentation/slis.md

Co-authored-by: Tim Bannister <tim@scalefactory.com>

* remove prometheus metric definitions and shell colorization

* Update content/en/docs/reference/instrumentation/slis.md

Co-authored-by: Tim Bannister <tim@scalefactory.com>

* Update content/en/docs/reference/instrumentation/slis.md

Co-authored-by: Rey Lejano <rlejano@gmail.com>

Co-authored-by: Tim Bannister <tim@scalefactory.com>
Co-authored-by: Rey Lejano <rlejano@gmail.com>
3 people committed Nov 8, 2022
1 parent b0fa875 commit 1591d7d
Showing 2 changed files with 79 additions and 0 deletions.
@@ -65,6 +65,7 @@ different Kubernetes components.
| `AnyVolumeDataSource` | `true` | Beta | 1.24 | |
| `AppArmor` | `true` | Beta | 1.4 | |
| `CheckpointContainer` | `false` | Alpha | 1.25 | |
| `ComponentSLIs` | `false` | Alpha | 1.26 | |
| `CPUManager` | `false` | Alpha | 1.8 | 1.9 |
| `CPUManager` | `true` | Beta | 1.10 | |
| `CPUManagerPolicyAlphaOptions` | `false` | Alpha | 1.23 | |
@@ -669,6 +670,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
for more details.
- `CheckpointContainer`: Enables the kubelet `checkpoint` API.
See [Kubelet Checkpoint API](/docs/reference/node/kubelet-checkpoint-api/) for more details.
- `ComponentSLIs`: Enables the `/metrics/slis` endpoint on Kubernetes components.
See [Kubernetes Component SLIs](/docs/reference/instrumentation/slis/) for more details.
- `ControllerManagerLeaderMigration`: Enables Leader Migration for
[kube-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#initial-leader-migration-configuration) and
[cloud-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#deploy-cloud-controller-manager)
76 changes: 76 additions & 0 deletions content/en/docs/reference/instrumentation/slis.md
@@ -0,0 +1,76 @@
---
reviewers:
- logicalhan
title: Kubernetes Component SLI Metrics
linkTitle: Service Level Indicator Metrics
content_type: reference
weight: 20
---

<!-- overview -->

{{< feature-state for_k8s_version="v1.26" state="alpha" >}}

As an alpha feature, Kubernetes lets you configure Service Level Indicator (SLI) metrics
for each Kubernetes component binary. The SLI metrics endpoint is exposed on the serving
HTTPS port of each component, at the path `/metrics/slis`. You must enable the
`ComponentSLIs` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
for every component from which you want to scrape SLI metrics.

<!-- body -->

## SLI Metrics

With SLI metrics enabled, each Kubernetes component exposes two metrics,
labeled per healthcheck:

- a gauge (which represents the current state of the healthcheck)
- a counter (which records the cumulative counts observed for each healthcheck state)

You can use the metric information to calculate per-component availability statistics.
For example, the API server checks the health of etcd. You can work out and report how
available or unavailable etcd has been, as reported by its client, the API server.

The Prometheus gauge data looks like this:

```
# HELP kubernetes_healthcheck [ALPHA] This metric records the result of a single healthcheck.
# TYPE kubernetes_healthcheck gauge
kubernetes_healthcheck{name="autoregister-completion",type="healthz"} 1
kubernetes_healthcheck{name="autoregister-completion",type="readyz"} 1
kubernetes_healthcheck{name="etcd",type="healthz"} 1
kubernetes_healthcheck{name="etcd",type="readyz"} 1
kubernetes_healthcheck{name="etcd-readiness",type="readyz"} 1
kubernetes_healthcheck{name="informer-sync",type="readyz"} 1
kubernetes_healthcheck{name="log",type="healthz"} 1
kubernetes_healthcheck{name="log",type="readyz"} 1
kubernetes_healthcheck{name="ping",type="healthz"} 1
kubernetes_healthcheck{name="ping",type="readyz"} 1
```

The counter data looks like this:

```
# HELP kubernetes_healthchecks_total [ALPHA] This metric records the results of all healthcheck.
# TYPE kubernetes_healthchecks_total counter
kubernetes_healthchecks_total{name="autoregister-completion",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="etcd",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="etcd-readiness",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="informer-sync",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="informer-sync",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="log",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="log",status="success",type="readyz"} 15
kubernetes_healthchecks_total{name="ping",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="ping",status="success",type="readyz"} 15
```
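
As a rough illustration of how the counter samples could feed an availability figure, the
following Python sketch parses text in the format shown above and computes, per healthcheck
name and type, the share of observations with `status="success"`. The function name
`success_ratios` and the regular expressions are hypothetical helpers written for this
example; they are not part of Kubernetes. The sketch also assumes that every status other
than `success` counts against availability.

```python
import re
from collections import defaultdict

# Matches one counter sample line, for example:
#   kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
SAMPLE = re.compile(
    r'^kubernetes_healthchecks_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')


def success_ratios(exposition_text):
    """Return {(name, type): successes / all observations} (hypothetical helper)."""
    totals = defaultdict(lambda: defaultdict(float))
    for line in exposition_text.splitlines():
        match = SAMPLE.match(line.strip())
        if not match:
            continue  # skips "# HELP" / "# TYPE" lines and unrelated metrics
        labels = dict(LABEL.findall(match.group("labels")))
        key = (labels["name"], labels["type"])
        totals[key][labels["status"]] += float(match.group("value"))

    ratios = {}
    for key, by_status in totals.items():
        observed = sum(by_status.values())
        if observed > 0:
            # Assumption: only status="success" counts as available.
            ratios[key] = by_status.get("success", 0.0) / observed
    return ratios


# A few lines copied from the counter output above.
sample = """
# TYPE kubernetes_healthchecks_total counter
kubernetes_healthchecks_total{name="etcd",status="success",type="healthz"} 15
kubernetes_healthchecks_total{name="informer-sync",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="informer-sync",status="success",type="readyz"} 14
"""

ratios = success_ratios(sample)
for (name, check_type), ratio in sorted(ratios.items()):
    print(f"{name} ({check_type}): {ratio:.3f}")
```

Counters accumulate from the moment the component starts, so a ratio computed from a single
scrape covers the whole process lifetime. To report on a specific window, you would subtract
the counter values of an earlier scrape from those of a later one before dividing.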

## Using this data

The component SLIs metrics endpoint is intended to be scraped at a high frequency. Scraping
at a high frequency gives you finer granularity in the gauge's signal, which you can then
use to calculate SLOs. The `/metrics/slis` endpoint provides the raw data necessary
to calculate an availability SLO for the respective Kubernetes component.
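
To make the link between scrape frequency and granularity concrete, here is a minimal Python
sketch that estimates availability from a series of scraped gauge values. The `GaugeSample`
type and the `availability` function are hypothetical names introduced for this example. The
sketch assumes that a gauge value of `1` means the healthcheck passed, that any other value
means it failed, and that each scraped value holds until the next scrape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GaugeSample:
    """One scraped value of kubernetes_healthcheck for a single name/type pair."""
    timestamp: float  # scrape time in seconds
    value: float      # assumption: 1 = passing, anything else = failing


def availability(samples):
    """Time-weighted fraction of the window during which the gauge was 1."""
    ordered = sorted(samples, key=lambda s: s.timestamp)
    if len(ordered) < 2:
        raise ValueError("need at least two samples to cover a time window")
    window = ordered[-1].timestamp - ordered[0].timestamp
    if window <= 0:
        raise ValueError("samples must span a non-zero time window")

    up_seconds = 0.0
    for current, following in zip(ordered, ordered[1:]):
        # Assumption: the scraped value holds until the next scrape.
        if current.value == 1:
            up_seconds += following.timestamp - current.timestamp
    return up_seconds / window


# Scraping every 5 seconds catches a failure observed at t=10.
frequent = [
    GaugeSample(0, 1),
    GaugeSample(5, 1),
    GaugeSample(10, 0),
    GaugeSample(15, 1),
    GaugeSample(20, 1),
]
# Scraping every 20 seconds over the same period misses it entirely.
sparse = [GaugeSample(0, 1), GaugeSample(20, 1)]

print(availability(frequent))
print(availability(sparse))
```

With the sparse series, the short failure never appears in the data, so the computed
availability overstates how healthy the check was. This is the loss of granularity that a
higher scrape frequency is meant to avoid.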
