Create an alert when a single pod is using up the majority of CPU or memory of a node.
The following query returns per-container average number of CPUs used during the last 5 minutes:
rate(container_cpu_usage_seconds_total{container!~"POD|"}[5m])
The lookbehind window in square brackets (5m in the case above) can be changed to the needed value. See possible time duration values here.
The container!~"POD|" filter removes metrics related to cgroups hierarchy (see this answer for more details) and metrics for e.g. pause containers (see these docs).
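To make the filter concrete: PromQL label regexes are fully anchored, so `POD|` matches either the exact value `POD` or the empty string (a missing `container` label counts as empty). The sketch below mimics that behaviour in Python with `re.fullmatch`. The helper name `keeps_series` and the sample label values are hypothetical, chosen for illustration only.

```python
import re

# PromQL anchors label regexes on both ends, so re.fullmatch is the
# closest Python analogue of the matcher used in the query above.
EXCLUDED = re.compile(r"POD|")


def keeps_series(container_label: str) -> bool:
    """Return True if a series with this `container` value passes container!~"POD|"."""
    return EXCLUDED.fullmatch(container_label) is None


# Hypothetical label values; an absent label is treated as the empty string.
for value in ["", "POD", "nginx", "POD-helper"]:
    print(repr(value), keeps_series(value))
```

Only the empty value and the exact value `POD` are dropped. A container named `POD-helper` is kept, because the regex must match the whole label value.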
Since each pod can contain multiple containers, the following query returns the per-pod average number of CPUs used during the last 5 minutes:
sum(
  rate(container_cpu_usage_seconds_total{container!~"POD|"}[5m])
) by (namespace,pod)
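The arithmetic behind the per-pod query can be sketched in Python as a rough illustration. It is a simplification: the real `rate()` also handles counter resets and extrapolates to the window boundaries, which this sketch ignores. The sample data, the `NODE_CPU_CORES` value, and the "more than half of the node" threshold are assumptions made for the example, not values from this issue.

```python
from collections import defaultdict


def cpu_rate(first, last):
    """Average CPUs used between two (timestamp_seconds, cpu_seconds_total) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def per_pod_cpus(samples):
    """Sum per-container rates by (namespace, pod), like sum(...) by (namespace,pod)."""
    totals = defaultdict(float)
    for (namespace, pod, _container), (first, last) in samples.items():
        totals[(namespace, pod)] += cpu_rate(first, last)
    return dict(totals)


# Hypothetical first and last counter samples inside a 300-second (5m) window.
samples = {
    ("default", "web-0", "app"): ((0, 100.0), (300, 700.0)),     # 2.0 CPUs
    ("default", "web-0", "sidecar"): ((0, 10.0), (300, 160.0)),  # 0.5 CPUs
    ("default", "job-1", "worker"): ((0, 50.0), (300, 125.0)),   # 0.25 CPUs
}

NODE_CPU_CORES = 4  # assumed node size, for illustration only

usage = per_pod_cpus(samples)
# Pods using more than half of the node's CPU cores (assumed alert threshold).
hogs = sorted(key for key, cpus in usage.items() if cpus > NODE_CPU_CORES / 2)
print(usage)
print(hogs)
```

Here `web-0` adds up to 2.5 CPUs across its two containers and crosses the assumed threshold, while neither of its containers would be flagged against a higher per-container limit. This is why the alert needs the per-pod sum.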
Related to:
#4491