Add documentation for Component SLIs feature (#37767)

* add component SLIs documentation
* Update content/en/docs/reference/instrumentation/slis.md
  Co-authored-by: Tim Bannister <tim@scalefactory.com>
* Update content/en/docs/reference/instrumentation/slis.md
  Co-authored-by: Tim Bannister <tim@scalefactory.com>
* Update content/en/docs/reference/instrumentation/slis.md
  Co-authored-by: Tim Bannister <tim@scalefactory.com>
* Update content/en/docs/reference/instrumentation/slis.md
  Co-authored-by: Tim Bannister <tim@scalefactory.com>
* Update content/en/docs/reference/instrumentation/slis.md
  Co-authored-by: Tim Bannister <tim@scalefactory.com>
* Update content/en/docs/reference/instrumentation/slis.md
  Co-authored-by: Tim Bannister <tim@scalefactory.com>
* Update content/en/docs/reference/instrumentation/slis.md
  Co-authored-by: Tim Bannister <tim@scalefactory.com>
* remove prometheus metric definitions and shell colorization
* Update content/en/docs/reference/instrumentation/slis.md
  Co-authored-by: Tim Bannister <tim@scalefactory.com>
* Update content/en/docs/reference/instrumentation/slis.md
  Co-authored-by: Rey Lejano <rlejano@gmail.com>

Co-authored-by: Tim Bannister <tim@scalefactory.com>
Co-authored-by: Rey Lejano <rlejano@gmail.com>
kubernetes · Nov 8, 2022 · 1591d7d
1 parent b0fa875
commit 1591d7d

Showing 2 changed files with 79 additions and 0 deletions.
diff --git a/content/en/docs/reference/command-line-tools-reference/feature-gates.md b/content/en/docs/reference/command-line-tools-reference/feature-gates.md
@@ -65,6 +65,7 @@ different Kubernetes components.
 | `AnyVolumeDataSource` | `true` | Beta | 1.24 | |
 | `AppArmor` | `true` | Beta | 1.4 | |
 | `CheckpointContainer` | `false` | Alpha | 1.25 | |
+| `ComponentSLIs` | `false` | Alpha | 1.26 | |
 | `CPUManager` | `false` | Alpha | 1.8 | 1.9 |
 | `CPUManager` | `true` | Beta | 1.10 | |
 | `CPUManagerPolicyAlphaOptions` | `false` | Alpha | 1.23 | |
@@ -669,6 +670,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
   for more details.
 - `CheckpointContainer`: Enables the kubelet `checkpoint` API.
   See [Kubelet Checkpoint API](/docs/reference/node/kubelet-checkpoint-api/) for more details.
+- `ComponentSLIs`: Enables the SLI metrics endpoint (`/metrics/slis`) on each Kubernetes component.
+  See [Kubernetes Component SLIs](/docs/reference/instrumentation/slis/) for more details.
 - `ControllerManagerLeaderMigration`: Enables Leader Migration for
   [kube-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#initial-leader-migration-configuration) and
   [cloud-controller-manager](/docs/tasks/administer-cluster/controller-manager-leader-migration/#deploy-cloud-controller-manager)

diff --git a/content/en/docs/reference/instrumentation/slis.md b/content/en/docs/reference/instrumentation/slis.md
@@ -0,0 +1,76 @@
+---
+reviewers:
+- logicalhan
+title: Kubernetes Component SLI Metrics
+linkTitle: Service Level Indicator Metrics
+content_type: reference
+weight: 20
+---
+
+<!-- overview -->
+
+{{< feature-state for_k8s_version="v1.26" state="alpha" >}}
+
+As an alpha feature, Kubernetes lets you configure Service Level Indicator (SLI) metrics
+for each Kubernetes component binary. The SLI metrics endpoint is exposed on the serving
+HTTPS port of each component, at the path `/metrics/slis`. You must enable the
+`ComponentSLIs` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
+for every component from which you want to scrape SLI metrics.
+
+<!-- body -->
+
+## SLI Metrics
+
+With SLI metrics enabled, each Kubernetes component exposes two metrics,
+labeled per healthcheck:
+
+- a gauge (which represents the current state of the healthcheck)
+- a counter (which records the cumulative counts observed for each healthcheck state)
+
+You can use the metric information to calculate per-component availability statistics.
+For example, the API server checks the health of etcd. You can work out and report how
+available or unavailable etcd has been, as reported by its client, the API server.
+
+
+The Prometheus gauge data looks like this:
+
+```
+# HELP kubernetes_healthcheck [ALPHA] This metric records the result of a single healthcheck.
+# TYPE kubernetes_healthcheck gauge
+kubernetes_healthcheck{name="autoregister-completion",type="healthz"} 1
+kubernetes_healthcheck{name="autoregister-completion",type="readyz"} 1
+kubernetes_healthcheck{name="etcd",type="healthz"} 1
+kubernetes_healthcheck{name="etcd",type="readyz"} 1
+kubernetes_healthcheck{name="etcd-readiness",type="readyz"} 1
+kubernetes_healthcheck{name="informer-sync",type="readyz"} 1
+kubernetes_healthcheck{name="log",type="healthz"} 1
+kubernetes_healthcheck{name="log",type="readyz"} 1
+kubernetes_healthcheck{name="ping",type="healthz"} 1
+kubernetes_healthcheck{name="ping",type="readyz"} 1
+```
+
+The counter data looks like this:
+
+```
+# HELP kubernetes_healthchecks_total [ALPHA] This metric records the results of all healthcheck.
+# TYPE kubernetes_healthchecks_total counter
+kubernetes_healthchecks_total{name="autoregister-completion",status="error",type="readyz"} 1
+kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="healthz"} 15
+kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="readyz"} 14
+kubernetes_healthchecks_total{name="etcd",status="success",type="healthz"} 15
+kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
+kubernetes_healthchecks_total{name="etcd-readiness",status="success",type="readyz"} 15
+kubernetes_healthchecks_total{name="informer-sync",status="error",type="readyz"} 1
+kubernetes_healthchecks_total{name="informer-sync",status="success",type="readyz"} 14
+kubernetes_healthchecks_total{name="log",status="success",type="healthz"} 15
+kubernetes_healthchecks_total{name="log",status="success",type="readyz"} 15
+kubernetes_healthchecks_total{name="ping",status="success",type="healthz"} 15
+kubernetes_healthchecks_total{name="ping",status="success",type="readyz"} 15
+```
+
+## Using this data
+
+The component SLIs metrics endpoint is intended to be scraped at a high frequency. Scraping
+at a high frequency gives you finer granularity in the gauge's signal, which can then be
+used to calculate SLOs. The `/metrics/slis` endpoint provides the raw data necessary
+to calculate an availability SLO for the respective Kubernetes component.
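
Note on the "Using this data" section: the availability calculation it describes can be sketched in a few lines of Python. The example below is only an illustration, not part of this commit. It assumes the counter exposes just the `success` and `error` statuses seen in the sample output, and it treats every non-`success` status as a failure. The function name `success_ratios` and the constant `SAMPLE` are hypothetical, and the input is a pasted subset of the counter output rather than a live scrape of `/metrics/slis`.

```python
import re
from collections import defaultdict

# A subset of the kubernetes_healthchecks_total samples shown in slis.md.
SAMPLE = """\
# TYPE kubernetes_healthchecks_total counter
kubernetes_healthchecks_total{name="autoregister-completion",status="error",type="readyz"} 1
kubernetes_healthchecks_total{name="autoregister-completion",status="success",type="readyz"} 14
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
"""

# Matches one counter sample line and captures its label set and value.
SAMPLE_RE = re.compile(
    r'^kubernetes_healthchecks_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
# Matches individual key="value" label pairs inside the braces.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def success_ratios(text):
    """Return {(name, type): successes / all observations} for each healthcheck."""
    counts = defaultdict(lambda: defaultdict(float))
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # skip comments, blank lines, and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        key = (labels["name"], labels["type"])
        counts[key][labels["status"]] += float(match.group("value"))

    ratios = {}
    for key, by_status in counts.items():
        total = sum(by_status.values())
        if total > 0:
            # Assumption: anything that is not "success" counts against availability.
            ratios[key] = by_status.get("success", 0.0) / total
    return ratios


if __name__ == "__main__":
    for (name, check_type), ratio in sorted(success_ratios(SAMPLE).items()):
        print(f"{name} ({check_type}): {ratio:.3f}")
```

Because the counter is cumulative, this ratio covers everything observed since the component started. For an SLO over a fixed window, you would subtract the counter values of an earlier scrape from those of a later one before dividing.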