eliminate kube_daemonset_status_number_misscheduled fluctuation due to autoscaling #812

Closed
garo opened this issue Jul 9, 2019 · 6 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

garo commented Jul 9, 2019

/kind feature

What happened:

The metric kube_daemonset_status_number_misscheduled is often used with Prometheus alerting to alert when a DaemonSet pod cannot be scheduled on every node where it is supposed to run.

However, in a cluster with an active cluster autoscaler, new machines constantly come and go, so there are often several nodes that haven't yet had time to schedule the required DaemonSet pods. This causes alerts based on kube_daemonset_status_number_misscheduled to trigger without an actual error condition.
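
For illustration, the kind of rule in question is roughly the following (a minimal sketch; the alert name, "for" duration, and severity are just examples, and the exact expression I use is quoted later in this thread):

```yaml
# Sketch of the kind of alert that flaps while the autoscaler adds nodes.
# The alert name, "for" duration, and severity are illustrative examples.
- alert: KubeDaemonSetMisScheduled
  expr: kube_daemonset_status_number_misscheduled{job="kube-state-metrics"} > 0
  for: 10m
  labels:
    severity: warning
```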

What you expected to happen:

I would expect either that the kube_daemonset_status_number_misscheduled metric would not count DaemonSet pods on nodes that haven't yet had time to get the pod running, or that the metric would inspect each DaemonSet pod, determine why it is not running, and ignore the cases where the pod is simply still starting up.

k8s-ci-robot added the kind/feature label on Jul 9, 2019
tariq1890 (Contributor) commented

Have you tried blacklisting this metric?

brancz (Member) commented Jul 9, 2019

I agree the situation is not good, but I don't think this is something to be fixed in kube-state-metrics. kube-state-metrics takes this value directly from the DaemonSet object, so if anything, that is what should change, or the alerting rule should be made less sensitive, which is probably appropriate either way. Can you share the alerting rule? (If you didn't write it yourself, it probably makes most sense to open an issue in the project the alerting rule comes from. :) )

garo (Author) commented Jul 9, 2019

The rule I'm currently using comes from the Prometheus Operator project by default, and it's simply:

"kube_daemonset_status_number_misscheduled{job="kube-state-metrics"} > 0"

I can make this less and less sensitive so that it alerts later, but if the cluster is constantly scaling, the problem never goes away.

You can think of it as each node hosting a DaemonSet pod having its own timeline. kube_daemonset_status_number_misscheduled simply combines them all, ignoring the individuality of each node. If your cluster is constantly scaling, there will always be one node that is just starting up, and that alone will trigger the alert. Relaxing the threshold can then easily miss nodes that are problematic for reasons unrelated to scaling.

I would gladly modify the alert in any way possible to remove the scaling-related false positives while retaining visibility into problems unrelated to scaling, but to my knowledge there isn't another metric I could use.

One fix could be to create a new metric like kube_daemonset_pod_status, which would have a daemonset label, another label for the node where the pod is scheduled, and the pod's status as the value. This way I could create an alert that treats each pod of each DaemonSet as an individual and triggers only if a single pod stays unscheduled for more than x minutes.
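
To make that concrete, an alert on such a metric could look roughly like the sketch below. Both the metric name kube_daemonset_pod_status and its labels are only my proposal, modelled after kube_pod_status_phase; nothing like this exists in kube-state-metrics today:

```yaml
# Hypothetical: assumes a per-pod metric of the shape
#   kube_daemonset_pod_status{daemonset="...", node="...", phase="..."} == 1
# which kube-state-metrics does not expose today.
- alert: DaemonSetPodNotRunning
  expr: kube_daemonset_pod_status{phase!="Running"} == 1
  for: 15m
  labels:
    severity: warning
```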

Might kube-state-metrics be the right place for this kind of new metric?

brancz (Member) commented Jul 9, 2019

The rule I'm currently using comes from the Prometheus Operator project by default, and it's simply:

The Prometheus Operator itself pretty much doesn't define any alerting rules (if anything, kube-prometheus does); it imports all of them through the kubernetes-mixin.

kube-state-metrics only mirrors Kubernetes API objects as Prometheus metrics; it does no correlation, pre-aggregation, or anything like that. So if this data is supposed to end up in kube-state-metrics, it first has to be in the object itself.

In terms of the alerting rule, I think increasing the "for" duration could be a first step, and raising the threshold a second. At the end of the day, if the alerting rule doesn't actually help you, it is worth either modifying it or removing it entirely.
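
For example, a less sensitive variant of the rule you quoted could look roughly like this; the 30m duration and the threshold of 1 are arbitrary placeholder values to tune against your cluster's scaling behaviour, not recommendations:

```yaml
# Sketch of a less sensitive version of the rule quoted above.
# Both the threshold and the "for" duration are placeholder values.
- alert: KubeDaemonSetMisScheduled
  expr: kube_daemonset_status_number_misscheduled{job="kube-state-metrics"} > 1
  for: 30m
  labels:
    severity: warning
```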

garo (Author) commented Jul 9, 2019

Thank you for the clear answer that kube-state-metrics is not the place to do aggregations, which makes my feature request invalid.

As for my use case, this particular metric doesn't help me at this stage, so I'm going to remove the alert. What I will be missing is knowing whether a single pod belonging to a DaemonSet has been unable to start for a longer period of time, but I will need to find another way to express that alert.

Thank you.

aantn commented Dec 28, 2021

@garo you can do it with Robusta by using an on_daemonset_update trigger.
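
Roughly, a custom playbook could hook that trigger as sketched below; the action name is a placeholder for whatever notification or enrichment you want to run, not a real built-in:

```yaml
# Sketch of a Robusta customPlaybooks entry using the on_daemonset_update trigger.
# "your_action_here" is a placeholder, not an actual built-in Robusta action.
customPlaybooks:
- triggers:
  - on_daemonset_update: {}
  actions:
  - your_action_here: {}
```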
