Constrain the host sandbox cgroup for devices and cpusets by default when sandbox-cgroup-only selected #2793
Conversation
Thank you for raising your pull request. Please note that the main development of Kata Containers has moved to the 2.0-dev branch of https://github.com/kata-containers/kata-containers repository. The kata-containers/runtime repository is kept for 1.x release maintenance. Please check twice if your change should go to the 2.0-dev branch directly. If it is strongly required for adding the change to Kata Containers 1.x releases, please ping @kata-containers/runtime to assign a dedicated developer to be responsible for porting the change to 2.0-dev branch. Thanks! |
dc66f8f
to
88c1a04
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @egernst. Please could you change the log calls to remove your name :)
}).Debug("EGERNST") | ||
} | ||
m.Lock() | ||
cgroups.CpusetCpus = cpuset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No validation on cpuset
? Can it be blank? Can we check the expected format, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An empty string is normal/expected. Validation is done @ the sandbox via CPUSets package, and I expect we'll just have an error, if the format somehow was broken, when we try to 'apply' the cgroup (see err handling there).
9d442d3
to
78ed479
Compare
@devimc @amshinde @jodh-intel @jcvenegas updated - PTAL. This is verified using cgroupfs. I will begin testing for systemd based group handling, but that will involve more cgroup handling updates outside the scope of 'fixing' the sandbox constraints. |
78ed479
to
26fc577
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @egernst - but I think that for security reasons the device cgroup is a must
(still WIP until I have tests running in k8s) |
26fc577
to
1eade24
Compare
Tested with containerd/k8s and with Docker CLI with cgroupfs. This should be about ready from my perspective. |
d3a828b
to
a4a42b9
Compare
@jodh-intel @bergwolf @devimc PTAL. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I left some questions
virtcontainers/sandbox.go
Outdated
resources = *spec.Linux.Resources | ||
// engine by default. The exception is for devices whitelist as well as sandbox-level | ||
// CPUSet. | ||
resources.Devices = spec.Linux.Resources.Devices |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, I understand why cpu and memory are not being honoured, but what about blockIO ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple issues open for this ability now, and there's work in progress on K8S side to add this. Are you okay with holding off until that discussion (both on Kata, [1], and k8s side) complete to address this subsystem?
[1] - #2160
d0cffb4
to
97b41ce
Compare
Updated to address feedback. PTAL @devimc et al |
Ci is not happy
|
97b41ce
to
73412a8
Compare
73412a8
to
1017646
Compare
/test |
@chavafg @GabyCT any insights into this failure? http://jenkins.katacontainers.io/blue/organizations/jenkins/kata-containers-runtime-opensuse-15-PR/detail/kata-containers-runtime-opensuse-15-PR/1219/pipeline#log-3049 on SuSe test? I expect my changes to only by exercised when dealing with sandbox group only flag set. |
not really, have restarted it |
@jodh-intel Can you take another look? |
@bergwolf @jcvenegas Can you take a look at this PR? |
It is looking good probably update kata docs to document what is expected and why around cpusets would be great to point to users. |
took a look and seems that the opensuse CI is only failing with this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
@chavafg The PR should only have impacts on cpuset/devices cgroups related tests. |
@chavafg if you agree, can you help get this merged (skip the travis)? |
So I looked at the configuration for the opensuse CI and I see that we use This is the error I see in the kata-runtime logs of the CI job:
You can see the full log by downloading the kata-runtime from the artifacts of the CI job: http://jenkins.katacontainers.io/job/kata-containers-runtime-opensuse-15-PR/1221/artifact/artifacts/kata-runtime_06.gz You should be able to reproduce in your environment using this flag and running the functional tests from the tests repo:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @egernst.
lgtm
f86a1a8
to
52af634
Compare
Calculate sandbox's CPUSet as the union of each of the container's CPUSets. Signed-off-by: Eric Ernst <eric@amperecomputing.com>
added for calculating union of cpusets Signed-off-by: Eric Ernst <eric@amperecomputing.com>
add function for applying a CPUset change to a cgroup Signed-off-by: Eric Ernst <eric@amperecomputing.com>
Allow for constraining the cpuset as well as the devices-whitelist . Revert sandbox constraints for cpu/memory, as they break the K8S use case. Can re-add behind a non-default flag in the future. The sandbox CPUSet should be updated every time a container is created, updated, or removed. To facilitate this without rewriting the 'non constrained cgroup' handling, let's add to the Sandbox's cgroupsUpdate function. Fixes: #2792 Signed-off-by: Eric Ernst <eric@amperecomputing.com>
52af634
to
8b4c299
Compare
addressed issue. /test |
@chavafg PTAL? |
Thanks @egernst, ARM CI failure is unrelated and SLES CI is also having unrelated issues. We can merge this. |
Fixes #2811 |
This PR pulls in package from Kubernetes for working with CPUSets.
This will calculate the union of the containers to create a sandbox-level CPUSet.