Warn about CPU limits in teleport-cluster Helm chart #36251
The PR changelog entry failed validation: Changelog entry not found in the PR body. Please add a "no-changelog" label to the PR, or changelog lines starting with |
🤖 Vercel preview here: https://docs-b1f0b3wxe-goteleport.vercel.app/docs/ver/preview
❤️
[the Static CPU management policy](https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/#static-policy),
a multithreaded workload with CPU limits will very likely not behave the way you expect when approaching its CPU limit.

Teleport will become unstable once throttling starts. We recommend not to set CPU limits.
Should we add a paragraph about the implications of such actions?
Since people don't seem to know how it works, it's probably good to give them an idea that CPU limits control the CPU time of the process and not the actual CPU cores reserved. This leads to huge latencies because Teleport will quickly consume its quota and will not be scheduled on any cores for long periods of time.
I added a link to this PR.
From prev experience, no one will read it.
Thanks for the follow up!
🤖 Vercel preview here: https://docs-r51p5u345-goteleport.vercel.app/docs/ver/preview
@hugoShaka See the table below for backport results.
* Warn about CPU limits
* fixup! Warn about CPU limits
Because people keep shooting themselves in the foot with CFS quotas and this causes S1s.
Technical explanation as to why CPU limits are not the best idea:
Setting `requests.cpu: 1` and `limits.cpu: 1` absolutely does not mean that Teleport will run on a single CPU, nor that a CPU will be reserved for it. On an 8-core node, it means that Teleport, when busy on all cores, will run about 13% of the time on all CPUs and then not be scheduled for the remaining 87% of the observed period. The Static CPU management policy does what people expect: it statically allocates CPUs to each workload (plus you get a nice CPU affinity that can help a lot with single-threaded workloads).
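To make the arithmetic concrete, here is a rough sketch of how a CFS quota plays out within one scheduling period. It assumes the default 100 ms CFS period and a workload that keeps a fixed number of threads busy for the whole time it is allowed to run; the function name `cfs_schedule` and its parameters are made up for illustration and are not part of Teleport or Kubernetes.

```python
CFS_PERIOD_MS = 100.0  # assumed default CFS period (cpu.cfs_period_us = 100000)


def cfs_schedule(cpu_limit, busy_threads, period_ms=CFS_PERIOD_MS):
    """Return (run_ms, throttled_ms) per CFS period.

    cpu_limit    -- value of limits.cpu, in cores (for example 1 or 0.5)
    busy_threads -- number of cores the workload keeps busy while it runs
    """
    if cpu_limit <= 0 or busy_threads <= 0:
        raise ValueError("cpu_limit and busy_threads must be positive")
    # The quota is CPU time summed over all cores, not a number of cores.
    quota_ms = cpu_limit * period_ms
    # Busy threads drain the quota in parallel, so it runs out sooner.
    run_ms = min(period_ms, quota_ms / busy_threads)
    return run_ms, period_ms - run_ms


# limits.cpu: 1 on an 8-core node, with all 8 cores busy
run_ms, throttled_ms = cfs_schedule(cpu_limit=1, busy_threads=8)
print(f"runs {run_ms:.1f} ms, throttled {throttled_ms:.1f} ms per period")
```

Under these assumptions the workload runs for 12.5 ms and is then throttled for 87.5 ms in every 100 ms period, which is where the roughly 13% / 87% split comes from. A single-threaded workload with the same limit would never be throttled, which is why the problem mostly shows up with multithreaded processes.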