cpu manager: handle reduced cpuset in root cgroup. #87522
Hi @ipuustin. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: ipuustin
/cc @ConnorDoyle
/ok-to-test
If kubelet is started with a root cgroup that has a reduced cpuset (meaning it does not cover all of the CPUs in the system), CPU manager may give out exclusive CPUs to containers from outside the allowed cpuset. Read the root cgroup cpuset value to prevent this from happening.
62c4d17 to f8deca2
Fixed a bunch of tests and added some new ones.
/retest
/retest
Codewise looks good to me, just one minor comment
@@ -121,7 +121,7 @@ func (s *sourcesReadyStub) AddSource(source string) {}
 func (s *sourcesReadyStub) AllReady() bool { return true }
 
 // NewManager creates new cpu manager based on provided policy
-func NewManager(cpuPolicyName string, reconcilePeriod time.Duration, machineInfo *cadvisorapi.MachineInfo, numaNodeInfo topology.NUMANodeInfo, specificCPUs cpuset.CPUSet, nodeAllocatableReservation v1.ResourceList, stateFileDirectory string, affinity topologymanager.Store) (Manager, error) {
+func NewManager(cpuPolicyName string, reconcilePeriod time.Duration, machineInfo *cadvisorapi.MachineInfo, numaNodeInfo topology.NUMANodeInfo, specificCPUs cpuset.CPUSet, nodeAllocatableReservation v1.ResourceList, stateFileDirectory string, affinity topologymanager.Store, cgroupPath string) (Manager, error) {
Just thinking: would cpusetPath or similar be a better name for the argument?
@ipuustin: PR needs rebase.
Issues go stale after 90d of inactivity.
Stale issues rot after 30d of inactivity.
Rotten issues close after 30d of inactivity.
@fejta-bot: Closed this PR.
If kubelet is started with a root cgroup which has a reduced cpuset (meaning it does not cover all of the CPUs in the system), CPU manager may allocate CPUs to containers that are not part of the available cpuset. The container runtime's cpuset write then fails with a "permission denied" error and the container fails to launch. This PR enables the CPU manager static policy to read the root cgroup cpuset value to prevent this from happening.
This PR can be tested by creating a new cgroup (foobar) and adjusting its top-level cpuset to be smaller than the number of available CPUs in the system. Kubelet can then be launched with the following options:
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
This PR is missing tests. I was thinking of adding them once I get a green light for this fix.
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: