
Filter out non-running pods in Prometheus #3073

Merged · 1 commit into fluxcd:main · Sep 6, 2022

Conversation

@acondrat (Contributor) commented Sep 5, 2022

The Prometheus job generated by the PodMonitor does not exclude non-running pods, so all "completed" Pods are still listed as targets in Prometheus and marked as down. This issue is related to the PodMonitor implementation and is discussed in prometheus-operator/prometheus-operator#4816.

Signed-off-by: Arcadie Condrat <arcadie.condrat@gmail.com>

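A minimal sketch of the kind of PodMonitor relabeling that addresses this, assuming the prometheus-operator PodMonitor API (the metadata names, selector, and port below are illustrative, not necessarily this PR's exact values):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: flux-system
  namespace: flux-system
spec:
  namespaceSelector:
    matchNames:
      - flux-system
  selector:
    matchExpressions:
      - key: app
        operator: Exists
  podMetricsEndpoints:
    - port: http-prom
      relabelings:
        # Keep only targets whose pod phase is Running, so "completed"
        # (Succeeded/Failed) pods are dropped from the scrape targets
        # instead of being listed and marked as down.
        - sourceLabels: [__meta_kubernetes_pod_phase]
          action: keep
          regex: Running
```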
@acondrat (Contributor, Author) commented Sep 5, 2022

Same change for the Flux2 Helm chart: fluxcd-community/helm-charts#121

@stefanprodan (Member) commented Sep 5, 2022

@acondrat would this prevent users from being notified when a Flux controller is in a crash loop?

@acondrat (Contributor, Author) commented Sep 5, 2022

A pod that is in a crash loop would still be captured by the `up == 0` query. I suppose most (if not all) users have an alerting rule built around that. This is also captured by the kube-state-metrics exporter via the KubeDeploymentReplicasMismatch alert.
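For reference, such a rule could be expressed as a prometheus-operator PrometheusRule along these lines (the alert name, namespace, and threshold below are illustrative assumptions, not something this PR ships):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux-alerts
  namespace: flux-system
spec:
  groups:
    - name: flux-system
      rules:
        - alert: FluxControllerDown
          # A crash-looping controller keeps its target registered but
          # fails scrapes, so it still fires here; completed pods no
          # longer do once they are filtered out of the target list.
          expr: up{namespace="flux-system"} == 0
          for: 5m
          labels:
            severity: critical
```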

@stefanprodan added the area/monitoring label on Sep 5, 2022
@stefanprodan (Member) left a comment

LGTM

Thanks @acondrat

@stefanprodan merged commit 73668d1 into fluxcd:main on Sep 6, 2022
@SuperQ commented Sep 16, 2022

You may also want to include Pending. I think in some cases, the Pod IP has been assigned, but not all containers have started.

Including Pending should also shorten the time between a Pod starting and Prometheus getting the configuration updated for the first scrape.
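Assuming the merged relabeling keeps only Running pods, that adjustment could look like the sketch below:

```yaml
relabelings:
  - sourceLabels: [__meta_kubernetes_pod_phase]
    action: keep
    # Keeping Pending in addition to Running picks up pods whose IP is
    # already assigned but whose containers have not all started, so
    # Prometheus learns about the target before the first scrape.
    regex: (Pending|Running)
```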
