reduce KSM cardinality by denylisting unused metrics #1076

Merged
merged 1 commit into from Apr 16, 2021


paulfantom
Member

In most scenarios KSM ships a lot of metrics, and it is one of the main sources of cardinality issues in Prometheus. This addon allows running KSM in "lite" mode by denylisting metrics that aren't used in mixins and don't have much operational value (mostly because the information they carry is already included in other metrics).

I decided to make this an optional addon because such a configuration may be disruptive to end users, and IMHO it should be opt-in.

… unused metrics

Signed-off-by: paulfantom <pawel@krupa.net.pl>
@paulfantom
Member Author

paulfantom commented Apr 7, 2021

The following metrics are removed with this addon:

kube_*_created - There is not much value (from a monitoring perspective) in knowing when each object (for example, a pod) was created, but this data has high cardinality
kube_*_metadata_resource_version - There is not much value (from a monitoring perspective) in knowing the resource version of each object, but this data has high cardinality
kube_replicaset_metadata_generation - In most cases kube_deployment_metadata_generation is more useful
kube_replicaset_status_observed_generation - Same as above
kube_pod_restart_policy - This is obtainable from manifests and doesn't carry much value at runtime
kube_pod_init_container_status_terminated - kube_pod_init_container_status_terminated_reason carries similar information with additional data about the reason
kube_pod_init_container_status_running - kube_pod_init_container_status_ready covers similar data
kube_pod_container_status_terminated - kube_pod_container_status_terminated_reason carries similar information with additional data about the reason
kube_pod_container_status_running - kube_pod_container_status_ready covers similar data
kube_pod_completion_time - Completion time is useful mostly with cronjobs, and in that case kube_job_status_completion_time reflects it better
kube_pod_status_scheduled - kube_pod_status_scheduled_time carries similar information with additional data about the time
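
To make the effect of the list above easier to reason about, here is a minimal Python sketch (not the addon's actual implementation) that checks whether a metric name would be covered by these patterns and joins them into a single argument string. It assumes the `*` in the patterns behaves like a shell-style wildcard and that the list is passed through a comma-separated `--metric-denylist` argument; KSM itself may interpret the patterns differently, and the helper names `is_denied` and `denylist_arg` are hypothetical.

```python
import fnmatch

# Patterns copied from the list above. Treating "*" as a glob wildcard
# is an assumption made for illustration only.
DENYLIST = [
    "kube_*_created",
    "kube_*_metadata_resource_version",
    "kube_replicaset_metadata_generation",
    "kube_replicaset_status_observed_generation",
    "kube_pod_restart_policy",
    "kube_pod_init_container_status_terminated",
    "kube_pod_init_container_status_running",
    "kube_pod_container_status_terminated",
    "kube_pod_container_status_running",
    "kube_pod_completion_time",
    "kube_pod_status_scheduled",
]


def is_denied(metric_name, patterns=DENYLIST):
    """Return True if the metric name fully matches any denylist pattern."""
    return any(fnmatch.fnmatchcase(metric_name, p) for p in patterns)


def denylist_arg(patterns=DENYLIST):
    """Join the patterns into one comma-separated argument string (hypothetical helper)."""
    return "--metric-denylist=" + ",".join(patterns)


if __name__ == "__main__":
    for name in ("kube_pod_created", "kube_pod_status_scheduled_time"):
        print(name, "denied" if is_denied(name) else "kept")
```

Because the match is on the full name, the replacement metrics mentioned above (for example kube_pod_status_scheduled_time or kube_pod_container_status_terminated_reason) are not caught by the shorter denied names.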

There are more metrics that aren't used directly in kube-prometheus, but I decided to remove only the ones above as those had a high impact on cardinality, and removing them shouldn't create issues during normal operations.
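
For anyone who wants to check the cardinality impact on their own cluster first, one rough approach is to count how many series each metric name contributes in a scrape of the KSM metrics endpoint. The sketch below assumes the scrape is available as Prometheus text exposition format; the function name `series_per_metric` and the sample data are made up for illustration.

```python
from collections import Counter


def series_per_metric(exposition_text):
    """Count sample lines per metric name in Prometheus text-format output."""
    counts = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or at whitespace (value).
        name = line.split("{", 1)[0].split()[0]
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample, not real cluster output.
    sample = """\
# TYPE kube_pod_created gauge
kube_pod_created{namespace="a",pod="x"} 1617800000
kube_pod_created{namespace="a",pod="y"} 1617800100
kube_pod_restart_policy{namespace="a",pod="x",type="Always"} 1
"""
    for name, count in series_per_metric(sample).most_common():
        print(name, count)
```

Metrics with one series per pod or per container tend to dominate such a count, which is why the list above concentrates on them.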

The idea is not to force all users to use this patch, but to let those who want it opt in, and to gradually introduce these changes into the default installation.

/cc @prometheus-operator/kube-prometheus-reviewers

@paulfantom paulfantom changed the title WIP: reduce KSM cardinality by denylisting unused metrics reduce KSM cardinality by denylisting unused metrics Apr 7, 2021
@simonpasquier
Contributor

lgtm

@paulfantom paulfantom merged commit 8b62749 into prometheus-operator:main Apr 16, 2021
@paulfantom paulfantom deleted the ksm-lite branch April 16, 2021 10:36
simonpasquier added a commit to simonpasquier/cluster-monitoring-operator that referenced this pull request May 3, 2021
Some metrics exposed by kube-state-metrics have high cardinality and
they aren't used in any alerting/recording rule or dashboard.

This patch is based on this upstream kube-prometheus pull request:
prometheus-operator/kube-prometheus#1076

Signed-off-by: Simon Pasquier <spasquie@redhat.com>