feat: Add support for keeping only positive/true values.
This adds a new flag `--metric-keep-true` that allows series with a
false/0 value to be dropped. This is useful for reducing the cardinality
of metrics where many of the values are 0. In larger clusters, the
returned metrics can be quite large, which can be costly to process and
store in some environments.

The plumbing adds support for filtering series by label name, label value
or value, but this PR only sets up filtering specifically for metrics
with "condition" labels and various state metrics (e.g. kube_pod_status_phase).

The idea for this feature was discussed in kubernetes#2380.

Related kubernetes#2116
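
As a rough illustration of the effect described above, the following sketch applies a drop-zero filter to a multi-state metric shaped like `kube_pod_status_phase`. It is not the code from this commit: `Metric` is a local stand-in for `metric.Metric`, the `MetricFilterFunc` type and the behavior of `KeepAllFilter` are assumed from how they are used in the diff, and `dropZeroFilter` and `phaseSeries` are hypothetical names.

```go
package main

import "fmt"

// Metric is a local stand-in for the metric.Metric fields used here.
type Metric struct {
	LabelKeys   []string
	LabelValues []string
	Value       float64
}

// MetricFilterFunc reports whether a series should be dropped.
// Assumed shape, matching the func(*metric.Metric) bool that
// WithMetricFilter accepts in the diff.
type MetricFilterFunc func(*Metric) bool

// KeepAllFilter drops nothing (assumed behavior of the default filter).
func KeepAllFilter(*Metric) bool { return false }

// dropZeroFilter is a hypothetical keep-true filter: it drops
// every series whose value is 0.
func dropZeroFilter(m *Metric) bool { return m.Value == 0 }

// phaseSeries emits one series per pod phase, with value 1 for the
// current phase and 0 for the others, then applies the filter.
func phaseSeries(current string, filter MetricFilterFunc) []*Metric {
	phases := []string{"Pending", "Running", "Succeeded", "Failed", "Unknown"}
	ms := make([]*Metric, 0, len(phases))
	for _, p := range phases {
		m := &Metric{LabelKeys: []string{"phase"}, LabelValues: []string{p}}
		if p == current {
			m.Value = 1
		}
		if filter(m) {
			continue // series is dropped
		}
		ms = append(ms, m)
	}
	return ms
}

func main() {
	fmt.Println(len(phaseSeries("Running", KeepAllFilter)))  // all five phases
	fmt.Println(len(phaseSeries("Running", dropZeroFilter))) // only the active phase
}
```

Under these assumptions, each pod contributes one phase series instead of five, which is where the cardinality reduction comes from. Queries that rely on the zero-valued series being present would need to be adjusted.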
jwilder committed May 3, 2024
1 parent 2b8eea4 commit 528dbb3
Showing 21 changed files with 251 additions and 81 deletions.
2 changes: 2 additions & 0 deletions README.md.tpl
@@ -207,6 +207,8 @@ In a 100 node cluster scaling test the latency numbers were as follows:

By default, kube-state-metrics exposes several metrics for events across your cluster. If you have a large number of frequently-updating resources on your cluster, you may find that a lot of data is ingested into these metrics. This can incur high costs on some cloud providers. Please take a moment to [configure what metrics you'd like to expose](docs/developer/cli-arguments.md), as well as consult the documentation for your Kubernetes environment in order to avoid unexpectedly high costs.

To reduce cardinality, you can set `--metric-keep-true` to keep only series with positive values. For metrics with "condition" labels and for multi-state metrics (e.g. `kube_pod_status_phase`), this drops the series whose value is 0, which reduces the number of time series created by kube-state-metrics. This may affect queries that expect a zero value.

### kube-state-metrics vs. metrics-server

The [metrics-server](https://github.com/kubernetes-incubator/metrics-server)
65 changes: 65 additions & 0 deletions docs/cli-arguments.md
@@ -0,0 +1,65 @@
# Command line arguments

kube-state-metrics can be configured through command line arguments.

Those arguments can be passed during startup when running locally:

`kube-state-metrics --telemetry-port=8081 --kubeconfig=<KUBE-CONFIG> --apiserver=<APISERVER> ...`

Or configured in the `args` section of your deployment configuration in a Kubernetes / OpenShift context:

```yaml
spec:
  template:
    spec:
      containers:
        - args:
          - '--telemetry-port=8081'
          - '--kubeconfig=<KUBE-CONFIG>'
          - '--apiserver=<APISERVER>'
```

## Available options:

[embedmd]:# (../help.txt)
```txt
$ kube-state-metrics -h
Usage of ./kube-state-metrics:
--add_dir_header If true, adds the file directory to the header of the log messages
--alsologtostderr log to standard error as well as files
--apiserver string The URL of the apiserver to use as a master
--enable-gzip-encoding Gzip responses when requested by clients via 'Accept-Encoding: gzip' header.
-h, --help Print Help text
--host string Host to expose metrics on. (default "::")
--kubeconfig string Absolute path to the kubeconfig file
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory
--log_file string If non-empty, use this log file
--log_file_max_size uint Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--logtostderr log to standard error instead of files (default true)
--metric-allowlist string Comma-separated list of metrics to be exposed. This list comprises of exact metric names and/or regex patterns. The allowlist and denylist are mutually exclusive.
--metric-annotations-allowlist string Comma-separated list of Kubernetes annotations keys that will be used in the resource' labels metric. By default the metric contains only name and namespace labels. To include additional annotations provide a list of resource names in their plural form and Kubernetes annotation keys you would like to allow for them (Example: '=namespaces=[kubernetes.io/team,...],pods=[kubernetes.io/team],...)'. A single '*' can be provided per resource instead to allow any annotations, but that has severe performance implications (Example: '=pods=[*]').
--metric-denylist string Comma-separated list of metrics not to be enabled. This list comprises of exact metric names and/or regex patterns. The allowlist and denylist are mutually exclusive.
--metric-labels-allowlist string Comma-separated list of additional Kubernetes label keys that will be used in the resource' labels metric. By default the metric contains only name and namespace labels. To include additional labels provide a list of resource names in their plural form and Kubernetes label keys you would like to allow for them (Example: '=namespaces=[k8s-label-1,k8s-label-n,...],pods=[app],...)'. A single '*' can be provided per resource instead to allow any labels, but that has severe performance implications (Example: '=pods=[*]').
--metric-opt-in-list string Comma-separated list of metrics which are opt-in and not enabled by default. This is in addition to the metric allow- and denylists
--metric-keep-true Only keep series with positive values for conditions or states. By default, all metric values are kept.
--namespaces string Comma-separated list of namespaces to be enabled. Defaults to ""
--namespaces-denylist string Comma-separated list of namespaces not to be enabled. If namespaces and namespaces-denylist are both set, only namespaces that are excluded in namespaces-denylist will be used.
--one_output If true, only write logs to their native severity level (vs also writing to each lower severity level)
--pod string Name of the pod that contains the kube-state-metrics container. When set, it is expected that --pod and --pod-namespace are both set. Most likely this should be passed via the downward API. This is used for auto-detecting sharding. If set, this has preference over statically configured sharding. This is experimental, it may be removed without notice.
--pod-namespace string Name of the namespace of the pod specified by --pod. When set, it is expected that --pod and --pod-namespace are both set. Most likely this should be passed via the downward API. This is used for auto-detecting sharding. If set, this has preference over statically configured sharding. This is experimental, it may be removed without notice.
--port int Port to expose metrics on. (default 8080)
--resources string Comma-separated list of Resources to be enabled. Defaults to "certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments"
--shard int32 The instances shard nominal (zero indexed) within the total number of shards. (default 0)
--skip_headers If true, avoid header prefixes in the log messages
--skip_log_headers If true, avoid headers when opening log files
--stderrthreshold severity logs at or above this threshold go to stderr (default 2)
--telemetry-host string Host to expose kube-state-metrics self metrics on. (default "::")
--telemetry-port int Port to expose kube-state-metrics self metrics on. (default 8081)
--tls-config string Path to the TLS configuration file
--total-shards int The total number of shards. Sharding is disabled when total shards is set to 1. (default 1)
--use-apiserver-cache Sets resourceVersion=0 for ListWatch requests, using cached resources from the apiserver instead of an etcd quorum read.
-v, --v Level number for the log level verbosity
--version kube-state-metrics build version information
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
```
27 changes: 19 additions & 8 deletions internal/store/builder.go
@@ -41,6 +41,7 @@ import (
clientset "k8s.io/client-go/kubernetes"
"k8s.io/client-go/tools/cache"
"k8s.io/klog/v2"
"k8s.io/kube-state-metrics/v2/pkg/metric"

ksmtypes "k8s.io/kube-state-metrics/v2/pkg/builder/types"
"k8s.io/kube-state-metrics/v2/pkg/customresource"
@@ -83,11 +84,14 @@ type Builder struct {
allowLabelsList map[string][]string
useAPIServerCache bool
utilOptions *options.Options
metricFilterFunc MetricFilterFunc
}

// NewBuilder returns a new builder.
func NewBuilder() *Builder {
b := &Builder{}
b := &Builder{
metricFilterFunc: KeepAllFilter,
}
return b
}

@@ -258,6 +262,13 @@ func (b *Builder) WithAllowLabels(labels map[string][]string) error {
return err
}

// WithMetricFilter configures a filter function to allow dropping metrics
// based on values, label keys or label values. If fn returns true, the metric
// will be dropped.
func (b *Builder) WithMetricFilter(fn func(*metric.Metric) bool) {
b.metricFilterFunc = fn
}

// Build initializes and registers all enabled stores.
// It returns metrics writers which can be used to write out
// metrics from the stores.
@@ -374,7 +385,7 @@ func (b *Builder) buildDaemonSetStores() []cache.Store {
}

func (b *Builder) buildDeploymentStores() []cache.Store {
return b.buildStoresFunc(deploymentMetricFamilies(b.allowAnnotationsList["deployments"], b.allowLabelsList["deployments"]), &appsv1.Deployment{}, createDeploymentListWatch, b.useAPIServerCache)
return b.buildStoresFunc(deploymentMetricFamilies(b.allowAnnotationsList["deployments"], b.allowLabelsList["deployments"], b.metricFilterFunc), &appsv1.Deployment{}, createDeploymentListWatch, b.useAPIServerCache)
}

func (b *Builder) buildEndpointsStores() []cache.Store {
@@ -386,15 +397,15 @@ func (b *Builder) buildEndpointSlicesStores() []cache.Store {
}

func (b *Builder) buildHPAStores() []cache.Store {
return b.buildStoresFunc(hpaMetricFamilies(b.allowAnnotationsList["horizontalpodautoscalers"], b.allowLabelsList["horizontalpodautoscalers"]), &autoscaling.HorizontalPodAutoscaler{}, createHPAListWatch, b.useAPIServerCache)
return b.buildStoresFunc(hpaMetricFamilies(b.allowAnnotationsList["horizontalpodautoscalers"], b.allowLabelsList["horizontalpodautoscalers"], b.metricFilterFunc), &autoscaling.HorizontalPodAutoscaler{}, createHPAListWatch, b.useAPIServerCache)
}

func (b *Builder) buildIngressStores() []cache.Store {
return b.buildStoresFunc(ingressMetricFamilies(b.allowAnnotationsList["ingresses"], b.allowLabelsList["ingresses"]), &networkingv1.Ingress{}, createIngressListWatch, b.useAPIServerCache)
}

func (b *Builder) buildJobStores() []cache.Store {
return b.buildStoresFunc(jobMetricFamilies(b.allowAnnotationsList["jobs"], b.allowLabelsList["jobs"]), &batchv1.Job{}, createJobListWatch, b.useAPIServerCache)
return b.buildStoresFunc(jobMetricFamilies(b.allowAnnotationsList["jobs"], b.allowLabelsList["jobs"], b.metricFilterFunc), &batchv1.Job{}, createJobListWatch, b.useAPIServerCache)
}

func (b *Builder) buildLimitRangeStores() []cache.Store {
@@ -406,19 +417,19 @@ func (b *Builder) buildMutatingWebhookConfigurationStores() []cache.Store {
}

func (b *Builder) buildNamespaceStores() []cache.Store {
return b.buildStoresFunc(namespaceMetricFamilies(b.allowAnnotationsList["namespaces"], b.allowLabelsList["namespaces"]), &v1.Namespace{}, createNamespaceListWatch, b.useAPIServerCache)
return b.buildStoresFunc(namespaceMetricFamilies(b.allowAnnotationsList["namespaces"], b.allowLabelsList["namespaces"], b.metricFilterFunc), &v1.Namespace{}, createNamespaceListWatch, b.useAPIServerCache)
}

func (b *Builder) buildNetworkPolicyStores() []cache.Store {
return b.buildStoresFunc(networkPolicyMetricFamilies(b.allowAnnotationsList["networkpolicies"], b.allowLabelsList["networkpolicies"]), &networkingv1.NetworkPolicy{}, createNetworkPolicyListWatch, b.useAPIServerCache)
}

func (b *Builder) buildNodeStores() []cache.Store {
return b.buildStoresFunc(nodeMetricFamilies(b.allowAnnotationsList["nodes"], b.allowLabelsList["nodes"]), &v1.Node{}, createNodeListWatch, b.useAPIServerCache)
return b.buildStoresFunc(nodeMetricFamilies(b.allowAnnotationsList["nodes"], b.allowLabelsList["nodes"], b.metricFilterFunc), &v1.Node{}, createNodeListWatch, b.useAPIServerCache)
}

func (b *Builder) buildPersistentVolumeClaimStores() []cache.Store {
return b.buildStoresFunc(persistentVolumeClaimMetricFamilies(b.allowAnnotationsList["persistentvolumeclaims"], b.allowLabelsList["persistentvolumeclaims"]), &v1.PersistentVolumeClaim{}, createPersistentVolumeClaimListWatch, b.useAPIServerCache)
return b.buildStoresFunc(persistentVolumeClaimMetricFamilies(b.allowAnnotationsList["persistentvolumeclaims"], b.allowLabelsList["persistentvolumeclaims"], b.metricFilterFunc), &v1.PersistentVolumeClaim{}, createPersistentVolumeClaimListWatch, b.useAPIServerCache)
}

func (b *Builder) buildPersistentVolumeStores() []cache.Store {
@@ -462,7 +473,7 @@ func (b *Builder) buildStorageClassStores() []cache.Store {
}

func (b *Builder) buildPodStores() []cache.Store {
return b.buildStoresFunc(podMetricFamilies(b.allowAnnotationsList["pods"], b.allowLabelsList["pods"]), &v1.Pod{}, createPodListWatch, b.useAPIServerCache)
return b.buildStoresFunc(podMetricFamilies(b.allowAnnotationsList["pods"], b.allowLabelsList["pods"], b.metricFilterFunc), &v1.Pod{}, createPodListWatch, b.useAPIServerCache)
}

func (b *Builder) buildCsrStores() []cache.Store {
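
The builder changes above follow a simple wiring pattern: a no-op filter is installed by default, so generators can call the filter unconditionally, and `WithMetricFilter` swaps it out. A minimal self-contained sketch of that pattern is shown below. The `Metric` type is a local stand-in, the `MetricFilterFunc` definition and the `KeepAllFilter` body are assumptions based on their use in the diff, and the `dropped` helper is hypothetical.

```go
package main

import "fmt"

// Metric is a local stand-in for metric.Metric; only Value is needed here.
type Metric struct {
	Value float64
}

// MetricFilterFunc returns true when a metric should be dropped
// (assumed definition; the real one lives in a file not shown here).
type MetricFilterFunc func(*Metric) bool

// KeepAllFilter never drops anything (assumed behavior).
func KeepAllFilter(*Metric) bool { return false }

// Builder holds the configured filter, mirroring the new field in the diff.
type Builder struct {
	metricFilterFunc MetricFilterFunc
}

// NewBuilder installs KeepAllFilter so the field is never nil.
func NewBuilder() *Builder {
	return &Builder{metricFilterFunc: KeepAllFilter}
}

// WithMetricFilter replaces the default filter.
func (b *Builder) WithMetricFilter(fn MetricFilterFunc) {
	b.metricFilterFunc = fn
}

// dropped is a hypothetical helper that shows what a generator would see.
func (b *Builder) dropped(m *Metric) bool {
	return b.metricFilterFunc(m)
}

func main() {
	zero := &Metric{Value: 0}

	b := NewBuilder()
	fmt.Println(b.dropped(zero)) // default filter keeps the series

	b.WithMetricFilter(func(m *Metric) bool { return m.Value == 0 })
	fmt.Println(b.dropped(zero)) // custom filter drops it
}
```

Defaulting to a no-op function instead of leaving the field nil avoids a nil check at every call site, at the cost of one extra function call per series.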
15 changes: 10 additions & 5 deletions internal/store/deployment.go
@@ -41,7 +41,7 @@ var (
descDeploymentLabelsDefaultLabels = []string{"namespace", "deployment"}
)

func deploymentMetricFamilies(allowAnnotationsList, allowLabelsList []string) []generator.FamilyGenerator {
func deploymentMetricFamilies(allowAnnotationsList, allowLabelsList []string, filterFn MetricFilterFunc) []generator.FamilyGenerator {
return []generator.FamilyGenerator{
*generator.NewFamilyGeneratorWithStability(
"kube_deployment_created",
@@ -166,17 +166,22 @@ func deploymentMetricFamilies(allowAnnotationsList, allowLabelsList []string) []
basemetrics.STABLE,
"",
wrapDeploymentFunc(func(d *v1.Deployment) *metric.Family {
ms := make([]*metric.Metric, len(d.Status.Conditions)*len(conditionStatuses))
ms := make([]*metric.Metric, 0, len(d.Status.Conditions)*len(conditionStatuses))

for i, c := range d.Status.Conditions {
for _, c := range d.Status.Conditions {
conditionMetrics := addConditionMetrics(c.Status)

for j, m := range conditionMetrics {
for _, m := range conditionMetrics {
metric := m

metric.LabelKeys = []string{"condition", "status"}
metric.LabelValues = append([]string{string(c.Type)}, metric.LabelValues...)
ms[i*len(conditionStatuses)+j] = metric

if filterFn(metric) {
continue
}

ms = append(ms, metric)
}
}
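
The switch from a full-length slice written by index to a zero-length slice with reserved capacity matters once series can be skipped. The sketch below, which is an illustration and not code from the repository, contrasts the two patterns. `Metric` is a local stand-in, and `indexed`, `appended`, and `countNil` are hypothetical helpers.

```go
package main

import "fmt"

// Metric is a local stand-in for metric.Metric.
type Metric struct {
	Value float64
}

// indexed mirrors the pre-change pattern: allocate the full length
// and write each element by position. Skipped positions stay nil.
func indexed(values []float64, drop func(*Metric) bool) []*Metric {
	ms := make([]*Metric, len(values))
	for i, v := range values {
		m := &Metric{Value: v}
		if drop(m) {
			continue
		}
		ms[i] = m
	}
	return ms
}

// appended mirrors the post-change pattern: length 0, capacity
// reserved up front, and only kept elements are appended.
func appended(values []float64, drop func(*Metric) bool) []*Metric {
	ms := make([]*Metric, 0, len(values))
	for _, v := range values {
		m := &Metric{Value: v}
		if drop(m) {
			continue
		}
		ms = append(ms, m)
	}
	return ms
}

// countNil counts nil entries left behind in a slice.
func countNil(ms []*Metric) int {
	n := 0
	for _, m := range ms {
		if m == nil {
			n++
		}
	}
	return n
}

func main() {
	values := []float64{1, 0, 0}
	dropZero := func(m *Metric) bool { return m.Value == 0 }

	a := indexed(values, dropZero)
	b := appended(values, dropZero)
	fmt.Println(len(a), countNil(a)) // full length, with nil holes
	fmt.Println(len(b), countNil(b)) // only the kept series
}
```

With index-based writes, any consumer that dereferences the entries would hit the nil holes, so appending into a slice with reserved capacity keeps the single allocation while returning only the kept series.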

4 changes: 2 additions & 2 deletions internal/store/deployment_test.go
@@ -194,8 +194,8 @@ func TestDeploymentStore(t *testing.T) {
}

for i, c := range cases {
c.Func = generator.ComposeMetricGenFuncs(deploymentMetricFamilies(c.AllowAnnotationsList, nil))
c.Headers = generator.ExtractMetricFamilyHeaders(deploymentMetricFamilies(c.AllowAnnotationsList, nil))
c.Func = generator.ComposeMetricGenFuncs(deploymentMetricFamilies(c.AllowAnnotationsList, nil, KeepAllFilter))
c.Headers = generator.ExtractMetricFamilyHeaders(deploymentMetricFamilies(c.AllowAnnotationsList, nil, KeepAllFilter))
if err := c.run(); err != nil {
t.Errorf("unexpected collecting result in %vth run:\n%s", i, err)
}
10 changes: 7 additions & 3 deletions internal/store/horizontalpodautoscaler.go
@@ -53,7 +53,7 @@ var (
targetMetricLabels = []string{"metric_name", "metric_target_type"}
)

func hpaMetricFamilies(allowAnnotationsList, allowLabelsList []string) []generator.FamilyGenerator {
func hpaMetricFamilies(allowAnnotationsList, allowLabelsList []string, filterFn MetricFilterFunc) []generator.FamilyGenerator {
return []generator.FamilyGenerator{
createHPAInfo(),
createHPAMetaDataGeneration(),
@@ -65,7 +65,7 @@ func hpaMetricFamilies(allowAnnotationsList, allowLabelsList []string) []generat
createHPAStatusDesiredReplicas(),
createHPAAnnotations(allowAnnotationsList),
createHPALabels(allowLabelsList),
createHPAStatusCondition(),
createHPAStatusCondition(filterFn),
}
}

@@ -386,7 +386,7 @@ func createHPALabels(allowLabelsList []string) generator.FamilyGenerator {
)
}

func createHPAStatusCondition() generator.FamilyGenerator {
func createHPAStatusCondition(filterFn MetricFilterFunc) generator.FamilyGenerator {
return *generator.NewFamilyGeneratorWithStability(
"kube_horizontalpodautoscaler_status_condition",
"The condition of this autoscaler.",
@@ -403,6 +403,10 @@ func createHPAStatusCondition() generator.FamilyGenerator {
metric := m
metric.LabelKeys = []string{"condition", "status"}
metric.LabelValues = append([]string{string(c.Type)}, metric.LabelValues...)

if filterFn(metric) {
continue
}
ms = append(ms, metric)
}
}
4 changes: 2 additions & 2 deletions internal/store/horizontalpodautoscaler_test.go
@@ -425,8 +425,8 @@ func TestHPAStore(t *testing.T) {
},
}
for i, c := range cases {
c.Func = generator.ComposeMetricGenFuncs(hpaMetricFamilies(c.AllowAnnotationsList, c.AllowLabelsList))
c.Headers = generator.ExtractMetricFamilyHeaders(hpaMetricFamilies(c.AllowAnnotationsList, c.AllowLabelsList))
c.Func = generator.ComposeMetricGenFuncs(hpaMetricFamilies(c.AllowAnnotationsList, c.AllowLabelsList, KeepAllFilter))
c.Headers = generator.ExtractMetricFamilyHeaders(hpaMetricFamilies(c.AllowAnnotationsList, c.AllowLabelsList, KeepAllFilter))
if err := c.run(); err != nil {
t.Errorf("unexpected collecting result in %vth run:\n%s", i, err)
}
10 changes: 9 additions & 1 deletion internal/store/job.go
@@ -42,7 +42,7 @@ var (
jobFailureReasons = []string{"BackoffLimitExceeded", "DeadlineExceeded", "Evicted"}
)

func jobMetricFamilies(allowAnnotationsList, allowLabelsList []string) []generator.FamilyGenerator {
func jobMetricFamilies(allowAnnotationsList, allowLabelsList []string, filterFn MetricFilterFunc) []generator.FamilyGenerator {
return []generator.FamilyGenerator{
*generator.NewFamilyGeneratorWithStability(
descJobAnnotationsName,
@@ -279,6 +279,10 @@ func jobMetricFamilies(allowAnnotationsList, allowLabelsList []string) []generat
for _, m := range metrics {
metric := m
metric.LabelKeys = []string{"condition"}

if filterFn(metric) {
continue
}
ms = append(ms, metric)
}
}
@@ -304,6 +308,10 @@ func jobMetricFamilies(allowAnnotationsList, allowLabelsList []string) []generat
for _, m := range metrics {
metric := m
metric.LabelKeys = []string{"condition"}

if filterFn(metric) {
continue
}
ms = append(ms, metric)
}
}
4 changes: 2 additions & 2 deletions internal/store/job_test.go
@@ -254,8 +254,8 @@ func TestJobStore(t *testing.T) {
},
}
for i, c := range cases {
c.Func = generator.ComposeMetricGenFuncs(jobMetricFamilies(nil, nil))
c.Headers = generator.ExtractMetricFamilyHeaders(jobMetricFamilies(nil, nil))
c.Func = generator.ComposeMetricGenFuncs(jobMetricFamilies(nil, nil, KeepAllFilter))
c.Headers = generator.ExtractMetricFamilyHeaders(jobMetricFamilies(nil, nil, KeepAllFilter))
if err := c.run(); err != nil {
t.Errorf("unexpected collecting result in %vth run:\n%s", i, err)
}