
CPUThrottlingHigh (Prometheus Alert)


Alert explanation

A pod is being CPU throttled: it tried to use the CPU and was blocked by its own CPU limit.

Special Cases

If this occurs on metrics-server, see this page.

Why CPU throttling can occur despite low CPU usage

Imagine a pod with a CPU limit of 100m, which is equivalent to 1/10 of a vCPU. The pod does nothing for 10 minutes. Then it uses the CPU nonstop for 200 ms. Averaged over the second in which the burst happens, that is 2/10 vCPU, double the pod's limit, so the pod is throttled. On the other hand, the average CPU usage over the whole window is incredibly low. The burst is so short (200 milliseconds) that it won't show up in any graphs.
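To make the numbers concrete, here is a rough sketch of the arithmetic, assuming the default 100 ms CFS enforcement period (the kernel default, which kubelet uses unless configured otherwise):

$$
\text{quota per period} = 0.1\ \text{CPU} \times 100\ \text{ms} = 10\ \text{ms of CPU time per } 100\ \text{ms period}
$$

$$
\text{average usage} \approx \frac{200\ \text{ms}}{10\ \text{min}} = \frac{0.2\ \text{s}}{600\ \text{s}} \approx 0.03\%\ \text{of one CPU}
$$

During the 200 ms burst the pod wants the CPU continuously but is only allowed 10 ms per period, so it spends most of each period throttled, even though the 10-minute average rounds to zero on a dashboard.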

Recommended remediation

  1. Remove this pod's CPU limit entirely.
  2. Make sure you have a CPU request set, preferably a relatively accurate one (see the example manifest below).
  3. If you still have CPU throttling, raise the CPU request.
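As a rough sketch, steps 1 and 2 might look like the following for a hypothetical deployment. Names and values here are illustrative only; set the CPU request based on your pod's real usage.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app               # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              cpu: 200m      # step 2: a request close to actual usage
              memory: 256Mi
            limits:
              memory: 256Mi  # step 1 removes only the CPU limit; a memory limit is still fine
```

With a spec like this, the pod is guaranteed 200m under contention and can borrow any spare CPU on the node for bursts.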

Why you don't need CPU limits

As long as your other pods have CPU requests set, Kubernetes maintainers like Tim Hockin recommend not using limits at all. That way pods are free to use spare CPU instead of letting it sit idle.

Contrary to common belief, even if you remove this pod's CPU limit, other pods are still guaranteed the CPU they requested. Removing the limit only affects how spare (unrequested) CPU is distributed.

We wrote a full blog post on CPU throttling and why it's safe to remove limits.

Dealing with noisy neighbors without limits

The recommended strategy to prevent one pod from using up the entire CPU is not to use limits, but rather to give each pod a proper request. Even without limits, a pod can never use up CPU that another pod requested and needs.
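Under the hood, Kubernetes turns each container's CPU request into a CFS weight (cpu.shares on cgroup v1), and when the node's CPU is contended the kernel divides CPU time roughly in proportion to those weights. As a simplified sketch with hypothetical numbers, ignoring system-reserved CPU: two pods on a 1-CPU node, A requesting 300m and B requesting 100m, both trying to use as much CPU as they can, end up with roughly

$$
\text{CPU}_A \approx \frac{300}{300+100} \times 1\ \text{CPU} = 750\text{m},
\qquad
\text{CPU}_B \approx \frac{100}{300+100} \times 1\ \text{CPU} = 250\text{m}
$$

Each pod gets at least what it requested, and if B goes idle, A is free to use the whole CPU, which is exactly the flexibility a limit would take away.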

Remediating this alert without removing limits

If you prefer not to remove the limit, you can increase the limit and/or request instead. This will reduce CPU throttling by giving the pod more access to CPU.
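For example, with purely illustrative values (keep the limit comfortably above the pod's real peak usage), the container's resources block might become:

```yaml
resources:
  requests:
    cpu: 500m    # raised so the scheduler reserves more CPU for the pod
  limits:
    cpu: "1"     # raised limit gives the pod more headroom before throttling kicks in
```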

Alternatively, if you just want the alert to stop firing, you can raise the alert's threshold. The pod will remain throttled, which is not ideal.
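If you go that route, one sketch is to deploy your own PrometheusRule with a higher threshold. The exact expression and labels vary between kube-prometheus versions, so copy the rule that is actually firing in your cluster and change only the threshold; you will also need to disable or remove the stock CPUThrottlingHigh rule so both copies don't fire.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: cpu-throttling-high-custom    # hypothetical name
  namespace: monitoring
spec:
  groups:
    - name: kubernetes-resources-custom
      rules:
        - alert: CPUThrottlingHigh
          # Fire only when more than 50% (instead of the usual 25%) of CFS periods were throttled.
          expr: |
            sum(increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (container, pod, namespace)
              /
            sum(increase(container_cpu_cfs_periods_total[5m])) by (container, pod, namespace)
              > (50 / 100)
          for: 15m
          labels:
            severity: info
          annotations:
            summary: Processes experience elevated CPU throttling.
```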

Slack Discussion

Have questions about this alert in your environment? Join the #alert-cpu-throttling-high channel on the Robusta Slack and ask our Kubernetes team anything you like! Want to help others? Join the same channel and help answer questions!

Saving Time

We research alerts to save you time, but sometimes the correct solution depends on your environment. Send Robusta your Prometheus alerts by webhook and get better alerts with actionable advice. Robusta runs checks on your environment and gives you the bottom line, along with a button to apply a fix.

Additional reading