Ability to know what cgroup subsystems runc supports #1440

Open
derekwaynecarr opened this issue May 8, 2017 · 6 comments

Comments

@derekwaynecarr
Contributor

Users running a containerized kubelet on the following Linux environment (Linux moby 4.9.8-moby #1 SMP Wed Feb 8 09:56:43 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux) have reported issues related to the presence of the openrc cgroup subsystem (https://wiki.gentoo.org/wiki/OpenRC/CGroups).

The kubelet does cgroup creation via runc, but has an Exists code check that verifies the desired cgroup actually exists. It does this by iterating over each subsystem and ensuring the cgroup exists as expected (see: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cgroup_manager_linux.go#L226).

The kubelet can manually filter out this cgroup for now, but it would be nice if runc had a way to return the list of subsystems it actually supports, so that additional unsupported subsystems do not cause confusion.

@derekwaynecarr
Contributor Author

For example, if GetCgroupMounts had an option to return only the list of mounts natively understood by runc, it would not have tripped up the kubelet. I am open to other ideas as well.
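As a rough sketch of that idea, the filtering could look something like the following. Everything here is hypothetical: the `Mount` type is a local stand-in rather than the libcontainer type, `filterSupported` is not an existing runc function, and the `supported` set and the `"openrc"` subsystem label are assumptions made for illustration only.

```go
package main

import "fmt"

// Mount is a hypothetical stand-in for one cgroup mount entry;
// it is not the libcontainer type.
type Mount struct {
	Mountpoint string
	Subsystems []string
}

// supported is an illustrative set of subsystems that the library is
// assumed to understand. The real list would come from runc itself.
var supported = map[string]bool{
	"cpu":     true,
	"cpuacct": true,
	"cpuset":  true,
	"memory":  true,
}

// filterSupported keeps only mounts that carry at least one supported
// subsystem, and drops unsupported subsystem names from each kept mount.
func filterSupported(mounts []Mount) []Mount {
	var out []Mount
	for _, m := range mounts {
		var subs []string
		for _, s := range m.Subsystems {
			if supported[s] {
				subs = append(subs, s)
			}
		}
		if len(subs) > 0 {
			out = append(out, Mount{Mountpoint: m.Mountpoint, Subsystems: subs})
		}
	}
	return out
}

func main() {
	mounts := []Mount{
		{Mountpoint: "/sys/fs/cgroup/memory", Subsystems: []string{"memory"}},
		{Mountpoint: "/sys/fs/cgroup/cpu,cpuacct", Subsystems: []string{"cpu", "cpuacct"}},
		{Mountpoint: "/sys/fs/cgroup/openrc", Subsystems: []string{"openrc"}},
	}
	// Only the memory and cpu,cpuacct mounts survive the filter.
	for _, m := range filterSupported(mounts) {
		fmt.Println(m.Mountpoint, m.Subsystems)
	}
}
```

With a filter like this, a caller iterating over the returned mounts would never see the openrc hierarchy in the first place.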

FYI @mrunalp @vishh @sjenning

@crosbymichael
Member

Do you know why it does the exists check after creation?

@derekwaynecarr
Contributor Author

@crosbymichael -- the kubelet uses runc libraries to create a cgroup hierarchy like the following:

/kubepods/pod1/container1
/kubepods/burstable/pod2/container2
/kubepods/besteffort/pod3/container3

There is a control loop that basically asks "does the kubepods cgroup exist, and if not, create it", and that control loop is confused by the presence of additional subsystems not known to runc. For now, I will probably just limit the exists check to a list of subsystems we care about, but it seemed generally useful for runc to be able to report only the mountpoints it cares about, rather than all of them.

@crosbymichael
Member

@derekwaynecarr do you know how the two are out of sync? If it's the same code, why are they reporting different things?

@derekwaynecarr
Contributor Author

A flow may help.

The kubelet launches and checks if /kubepods exists. It verifies this by looking over each reported mount and seeing if /sys/fs/cgroup/subsystem-here/kubepods is present. If one of them is empty, it assumes it should recreate the cgroup and calls the libcontainer manager.Apply(-1).

At a separate step in the code, the kubelet then tries to create the quality of service cgroups for burstable and besteffort. It first verifies that the kubepods cgroup exists as expected, and that existence check fails because there is no folder present in /sys/fs/cgroup/openrc/kubepods. We run into the same general issue when restarting kubelets.

I am basically trying to determine the most reliable path for verifying a cgroup exists once runc has applied it. For the moment, I have just filtered our existence checks to a set of supported subsystems, but it would be nice if runc had a function for this use case.

@crosbymichael
Member

Ok, I understand now. Thanks.

Maybe it's better to add an Exists() method on the manager to check whether a path exists and is supported by the code, instead of doing a stat or whatever else is going on. Since each subsystem is a type in the code, we can add the Exists method there, and if libcontainer does not support the subsystem you will get an ErrNotSupportSubsystem error or something.

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue May 11, 2017
Automatic merge from submit-queue (batch tested with PRs 45515, 45579)

Ignore openrc cgroup

**What this PR does / why we need it**:
It is a work-around for the following: opencontainers/runc#1440

**Special notes for your reviewer**:
I am open to a cleaner way to do this, but we have many developer users on Macs who run containerized kubelets and are not able to run them right now because the inclusion of openrc trips up our existence checks. Ideally, runc can give us a call to say "does this exist according to what runc knows about". Or we could add a whitelist check. Right now, this was the smallest hack pending more discussion.

2 participants