
kube_pod_container_status_waiting_reason strange behaviour #468

Closed
cabrinoob opened this issue May 28, 2018 · 4 comments

@cabrinoob

Hi,
I'm using Prometheus to watch the state of the pods running on my k8s cluster. I'm using the kube_pod_container_status_waiting_reason metric to do this.

For testing purposes, I created a deployment with a non-existent image to force an error:

$ kubectl run foo --image foo

Then, in my Prometheus UI, I run this query:

kube_pod_container_status_waiting_reason{reason=~"ContainerCreating|CrashLoopBackOff|ErrImagePull|ImagePullBackOff"} > 0

During the first minute I get this result:

kube_pod_container_status_waiting_reason{app="prometheus",chart="prometheus-6.3.0",component="kube-state-metrics",container="foo",heritage="Tiller",instance="100.97.57.7:8080",job="kubernetes-service-endpoints",kubernetes_name="my-release-prometheus-kube-state-metrics",kubernetes_namespace="prometheus",namespace="jung",pod="foo-6db855bd79-wb2rs",reason="ContainerCreating",release="my-release"}

So kube-state-metrics reports that my pod is in the "ContainerCreating" state.

Then, for about a minute, I get this result:

kube_pod_container_status_waiting_reason{app="prometheus",chart="prometheus-6.3.0",component="kube-state-metrics",container="foo",heritage="Tiller",instance="100.97.57.7:8080",job="kubernetes-service-endpoints",kubernetes_name="my-release-prometheus-kube-state-metrics",kubernetes_namespace="prometheus",namespace="jung",pod="foo-6db855bd79-wb2rs",reason="ErrImagePull",release="my-release"}

kube-state-metrics reports that my pod is now in the "ErrImagePull" state (as expected).

My problem is that this status does not persist for more than one or two minutes: if I refresh my query, I get a "no data" response even though my deployment is still in the ImagePullBackOff state.

Is this normal behaviour?

Thank you for your help

@brancz
Member

brancz commented May 28, 2018

Yes, this is the correct behavior: kube-state-metrics always reflects the state of the Kubernetes API, so when the Kubernetes API changes the state of the Pod, the metric for that Pod changes with it.

You can solve this by using max_over_time in combination with a for clause in an alerting rule. That way you can express things like: "alert if this pod has had the ErrImagePull condition at any point in the last 5 minutes, continuously for 20 minutes."
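Something roughly like this (an untested sketch; the alert name is a placeholder, and you should adjust the reason, thresholds, and labels to your setup):

- alert: PodWaitingErrImagePull   # hypothetical name
  expr: max_over_time(kube_pod_container_status_waiting_reason{reason="ErrImagePull"}[5m]) > 0
  for: 20m

The max_over_time(...[5m]) part keeps the expression firing even while the API briefly cycles the Pod through other waiting reasons, and the for: 20m part makes sure the alert only triggers once the condition has been sustained.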

@cabrinoob
Author

OK, but if kube-state-metrics always reflects the state of the K8s API, I don't really understand why my pod's metric is not stuck in the "ImagePullBackOff" state.

When I run kubectl get pod on my cluster I always get:

foo-6db855bd79-lznmk                     0/1       ImagePullBackOff   0          14m

So why doesn't kube-state-metrics report this status as well?

@brancz
Member

brancz commented May 28, 2018

It might be due to the version of kube-state-metrics you are using; the ImagePullBackOff state was only added in v1.3.0 (in this commit).
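To check which version you are running, something like this should print the image tag (the deployment name and namespace here are just an example, adjust them to your setup):

$ kubectl get deployment kube-state-metrics -n kube-system -o jsonpath='{.spec.template.spec.containers[0].image}'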

@cabrinoob
Author

cabrinoob commented May 28, 2018

Ah OK... I'm using k8s.gcr.io/kube-state-metrics:v1.2.0. I need to update ;)
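(Something like this, assuming the image is set directly on a deployment named kube-state-metrics; with a Helm-managed release you would bump the image tag in the chart values instead:)

$ kubectl -n kube-system set image deployment/kube-state-metrics kube-state-metrics=k8s.gcr.io/kube-state-metrics:v1.3.0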

Edit: That solved the problem!

Thank you
