KubernetesPodNotHealthy expr problem #94
Comments
Hi @youpai, thanks for reporting this issue. I tested the initial query, and I got the following error:
Regarding this message, I think min_over_time does not support subqueries. Adding ':' does not work either on my Prometheus instance. What version of Prometheus are you using?
The Prometheus version is 2.16.0.
OK, my Prometheus server was too old then. I'll prepare a PR ;)
Hi! I think this expression still needs some love. We just began using it and we're getting false positives: short-lived pods that go through the phases Pending, Running, Succeeded, Failed, Unknown get marked as unhealthy even if they did their work. Example:
From: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase
In our case they are terminated by the system, but everything was correct. I'll think about a better expression, but if you have any ideas I'm all eyes!
@lgg42 I'm running into this too. I have GitLab CI runners and unit tests running on my cluster, and they all trigger this because they are short-lived. Did you ever come up with a better query?
@djhoese Nope, I haven't had the time yet. But it's good to see I'm not the only one; sorry you're also suffering from it 🙃
I want to use this, but the "expr" doesn't seem right. I get an error like:
If I use
min_over_time(sum by (namespace, pod, env, stage) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[1h:])
the result is OK.
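For anyone landing here later, a minimal sketch of why the ':' matters. The original error text isn't preserved in this thread, so the "failing" form below is an assumption (a plain `[1h]` range applied to the aggregated expression), not a quote. As far as I know, subqueries were added in Prometheus 2.7, which would explain why an older server rejects the `[1h:]` form too.

```promql
# Assumed failing form (hypothetical, not quoted in this thread): a range
# selector such as [1h] is only valid directly after a vector selector,
# so applying it to the result of sum by (...) is a parse error.
#   min_over_time(sum by (namespace, pod, env, stage) (kube_pod_status_phase{phase=~"Pending|Unknown|Failed"})[1h])

# Subquery form reported above as working on Prometheus 2.16.0: the
# trailing ':' makes [1h:] a subquery over the aggregated expression,
# evaluated at the default resolution (the global evaluation interval).
min_over_time(
  sum by (namespace, pod, env, stage) (
    kube_pod_status_phase{phase=~"Pending|Unknown|Failed"}
  )[1h:]
)
```

An explicit resolution can be written after the colon (for example `[1h:1m]`) if the default step is too coarse or too fine for the rule.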