Delegate cgroup exists to systemd and libcontainer #102250
When systemd cgroup driver is used, systemd is responsible for ensuring the correct controllers are propagated. If we try to propagate a controller controlled by systemd, it can remove it for us, creating a mess.
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv2 |
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv1 |
Still WIP and testing, but:
/cc @harche
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv1 |
By the way, since opencontainers/runc@9087f2e, all of runc/libcontainer's cgroup controllers have an Exists() method, which might be used here. The difference, though, is that libcontainer's Exists() does not check for the presence of all the controllers; it assumes that if Apply() was called, they either all exist or all do not exist. If it is possible to relax the "exists" check here, this might result in removing a lot of code :)
I would also audit all the uses of Say, calling Calling |
Ahh, I see. I just saw the issue opencontainers/runc#1440 and thought it wasn't fixed.
Well, it kinda depends. Calling
Well, yeah. If we want to create it, we would of course need to verify if we should run create or update. |
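A rough Go sketch of the create-or-update decision discussed above, driven by a single relaxed Exists() call. The `cgroupManager` interface, `ensureCgroup`, and `fakeManager` are hypothetical stand-ins written for this example; they are not the libcontainer or kubelet interfaces.

```go
package main

import "errors"

// cgroupManager is a hypothetical, trimmed-down stand-in for a cgroup
// manager. It is NOT the libcontainer interface; it only models the
// three operations this discussion is about.
type cgroupManager interface {
	Exists() bool
	Create() error
	Update() error
}

// ensureCgroup decides between create and update based on Exists(),
// and reports which action it took.
func ensureCgroup(m cgroupManager) (string, error) {
	if m == nil {
		return "", errors.New("nil cgroup manager")
	}
	if m.Exists() {
		return "update", m.Update()
	}
	return "create", m.Create()
}

// fakeManager is an in-memory implementation used to exercise the logic.
type fakeManager struct {
	exists  bool
	created int
	updated int
}

func (f *fakeManager) Exists() bool { return f.exists }

func (f *fakeManager) Create() error {
	f.created++
	f.exists = true
	return nil
}

func (f *fakeManager) Update() error {
	f.updated++
	return nil
}
```

Calling `ensureCgroup` twice on a fresh `fakeManager` creates on the first call and updates on the second. The point of the sketch is that a false negative from Exists() sends an already existing cgroup down the create path, which is the situation this PR tries to avoid.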
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv1 |
/milestone v1.22
/assign @mrunalp @Random-Liu
Test failures are, by the way, because those test suites have a huuuge amount of failing tests (that we are still working on). This one helps a bit on the overall health.
/skip
/test pull-kubernetes-node-kubelet-serial-crio-cgroupv2 |
Hi (@bobbypage, @yujuhong, @harche, @dims) |
This regression affects me. Since it is a regression I don't think it should be affected by the code freeze. During a code freeze, bug fixes can still go in. |
If this is fixing a release-blocking regression, that should be linked prominently in the PR description so it doesn't get bumped from the release incorrectly.
/milestone clear
From the SIG Node CI subgroup today: this isn't release-critical; we will handle this via the runc bump.
We are now on runc v1.0.2, and that should hopefully have fixed all these major issues.
/hold cancel
/remove-kind failing-test |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/lifecycle rotten
/close
@k8s-triage-robot: Closed this PR. In response to this:
What type of PR is this?
/kind bug
/kind failing-test
What this PR does / why we need it:
When systemd cgroup driver is used, systemd is responsible for ensuring
the correct controllers are propagated. If we try to propagate a
controller controlled by systemd, it can remove it for us, creating a
mess.
This fixes issues when burstable.slice exists, but systemd has removed the cpuset controller from it because the controller isn't in use. In this case, trying to recreate the slice causes a fatal error in the kubelet.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: