Metric kube_pod_container_status_terminated_reason doesn't detect all events #2153
Comments
cc @CatherineF-dev
Have you tried
Yes, but I cannot set an alert on this metric: if the pod goes OOM, the last reason is always OOMKilled and the value is always 1, even if this happens multiple times.
It would be nice to find a good alert query for this. Maybe when querying kube_pod_container_status_last_terminated_reason, you also need to check whether the pod has restarted recently?
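The idea of combining the last terminated reason with a recent restart can be illustrated with a small simulation. This is a minimal sketch, not the project's recommended alert rule. It makes three assumptions:

- Each sample is one scrape of a single container.
- kube_pod_container_status_restarts_total is modeled as a plain counter with no resets.
- The names `Sample` and `oom_alerts` are hypothetical.

The alert fires only when the restart counter grew within the look-back window while the last terminated reason is OOMKilled. A repeated OOM kill is therefore still noticed, even though the reason series stays at 1 the whole time.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical scrape of the two kube-state-metrics series for one container."""
    timestamp: int       # seconds since an arbitrary start
    restarts_total: int  # value of kube_pod_container_status_restarts_total
    last_reason: str     # reason label whose last_terminated_reason series is 1 ("" if none)


def oom_alerts(samples, window):
    """Return the timestamps at which the simulated OOM alert would fire.

    Fires when the restart counter increased within the last `window` seconds
    AND the last terminated reason is OOMKilled. Counter resets (e.g. a
    recreated pod) are deliberately ignored in this sketch.
    """
    fired = []
    for i, current in enumerate(samples):
        # Oldest sample still inside the look-back window (always exists: current itself).
        start = next(s for s in samples[: i + 1]
                     if current.timestamp - s.timestamp <= window)
        restarted = current.restarts_total > start.restarts_total
        if restarted and current.last_reason == "OOMKilled":
            fired.append(current.timestamp)
    return fired


if __name__ == "__main__":
    history = [
        Sample(0, 0, ""),
        Sample(60, 1, "OOMKilled"),   # first OOM kill
        Sample(120, 1, "OOMKilled"),  # reason still 1, but no new restart
        Sample(180, 1, "OOMKilled"),
        Sample(240, 2, "OOMKilled"),  # second OOM kill
    ]
    print(oom_alerts(history, window=60))  # [60, 240]
```

Alerting on the reason series alone would fire at every sample from t=60 onward. The restart check limits it to the two scrapes where a new OOM kill happened.

In PromQL, this roughly corresponds to combining `increase()` over the restart counter with an `and on(namespace, pod, container)` match against the `reason="OOMKilled"` series. Note that Prometheus's `increase()` also handles counter resets and extrapolates, which this sketch does not.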
It seems to be working! I'm testing it. Thanks.
QQ: Is it working? If you don't have any other questions, we will close this issue.
Confirmed, it's working for me. Thanks!
FWIW, there is also a new metric in the kubelet to better detect OOMKilled containers: kubernetes/kubernetes#108004. Closing since the initial problem seems to have been resolved.
What happened:
The metric kube_pod_container_status_terminated_reason has been marked experimental for a very long time.
This metric could be very useful for alerting, but it does not detect all "Errors" or "OOMKilled" events; often the event is not collected at all.
What you expected to happen:
A trace of every termination event.
How to reproduce it (as minimally and precisely as possible):
I don't think it is reproducible, but, for example, OOMKilled is very often not detected by the metric.
Anything else we need to know?:
Environment:
kubectl version: 1.25.6