processes from /system.slice using cores assigned to guaranteed Pods when --cpu-manager-policy=static is set #85764
Comments
I looked a bit further, and it seems that the 10-core request is converted into CPU shares.
In particular, https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/types.go#L25, which is used to set the cgroup limit, doesn't have a field for integer CPU requests. Also, the container manager only knows that "10" CPUs are reserved; it isn't told which ones those are.
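The conversion can be sketched as follows. This is an assumption based on the upstream helper (MilliCPUToShares in pkg/kubelet/cm/helpers_linux.go; the path may move between releases): a request in millicores maps to cgroup cpu.shares as millicores * 1024 / 1000, so nothing about which cores is preserved.

```shell
# Sketch of the conversion discussed above (assumed from upstream
# pkg/kubelet/cm/helpers_linux.go, MilliCPUToShares): a CPU request
# in millicores becomes cgroup cpu.shares via millicores * 1024 / 1000.
request_millicpu=10000                        # a 10-CPU request
shares=$(( request_millicpu * 1024 / 1000 ))
echo "cpu.shares: $shares"                    # prints "cpu.shares: 10240"
```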
Thanks.
Can you please let me know how to specify this? From https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/, --system-reserved=cpu is meant to be a total number of cores, not CPU numbers like 1-10. Do you see any issue with the "--system-reserved=cpu=10" param we have given? I can see another param, --reserved-cpus, but this option was added only in k8s 1.17. Also, from the documentation it looks like an alternative to --system-reserved=cpu. Regarding your previous response about 6:cpuset:/, I can check why it's coming out as "/" and not "/system.slice".
I tested with the following configuration and I am seeing the same issue. I deployed a Pod with 4 cores. /var/lib/kubelet/cpu_manager_state shows:

{"policyName":"static","defaultCpuSet":"0-3,6-15,18-23","entries":{"80135d993f44b38528069c26a050c402807f10f89aee354ef04af75671b04e83":"4-5,16-17"},"checksum":1025617988}

This shows that the Pod got 4 cores: 4, 5, 16, 17. But, same issue: ceph-osd is scheduled on core 5, which is assigned to the app Pod. The app Pod is not using any external volume.

ps -eLF | grep -i ceph
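A quick way to spot such overlaps is to filter thread listings by the PSR column against the Pod's exclusive cores (4-5 and 16-17 here). A minimal sketch; the printf line is a made-up sample shaped like the ps output quoted in this thread, and on a real node you would pipe ps -eLF output through the awk filter instead:

```shell
# Sketch: flag threads whose current core (field 1, PSR) is in the Pod's
# exclusive set 4-5,16-17. The printf line is sample data standing in for
# real "ps -eLF"-style output (PSR, command, args, PID, PPID).
printf '5 /usr/bin/ceph-osd --foreground 12345 1\n' \
  | awk '$1==4 || $1==5 || $1==16 || $1==17 {print "core " $1 ": " $2 " (PID " $4 ")"}'
# prints: core 5: /usr/bin/ceph-osd (PID 12345)
```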
I didn't get it. Could you please let me know how to do that?
I have a PR which should help with your problem here, by automating the cpuset resource setting for the system-reserved cgroup: #87452. You can set a cpuset on a cgroup by writing the cpuset.cpus file inside the cgroup directory.
Note that if you are reducing the cpuset, you need to set it on all subdirectories first (a parent cgroup can't have a smaller cpuset than its children); otherwise you will get "permission denied" errors. The docs are here: https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt
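As a manual sketch of that workaround, assuming a cgroup v1 host with the cpuset hierarchy mounted at /sys/fs/cgroup/cpuset and a system.slice directory under it (neither is guaranteed; systemd does not create one there by default), run as root:

```shell
# Sketch: confine /system.slice to CPUs 0-3 via the v1 cpuset controller.
# When shrinking, child cgroups must be updated before the parent, or the
# parent write fails with "permission denied" (see cpusets.txt above).
CG=/sys/fs/cgroup/cpuset/system.slice
if [ -d "$CG" ]; then
    for child in "$CG"/*/cpuset.cpus; do
        [ -f "$child" ] && echo 0-3 > "$child"
    done
    echo 0-3 > "$CG/cpuset.cpus"
else
    echo "no cpuset cgroup at $CG (cgroup v2 host, or cpuset not set up)"
fi
```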
@subrnath @ipuustin The --reserved-cpus option is meant to be used together with either the isolcpus kernel arg or the systemd CPUAffinity control. Using systemd CPUAffinity as an example: in /etc/systemd/system.conf, specify CPUAffinity=0 1 2 3; reboot the machine to make sure systemd takes it; then for kubelet, specify --reserved-cpus=0-3. See if this works for you, at least as a workaround. As @ipuustin mentioned, enabling the cpuset cgroup setting might be another way to solve this; we need to check if it is realistic, and we will discuss that in the other PR.
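Spelled out, the workaround above amounts to two settings (0-3 are just the example CPU numbers from the comment):

```
# /etc/systemd/system.conf: pin all systemd-managed processes to CPUs 0-3,
# then reboot so systemd applies it
CPUAffinity=0 1 2 3

# kubelet flag (available from k8s 1.17): reserve the same CPUs
--reserved-cpus=0-3
```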
@jianzzha @ipuustin I read the discussion above and am trying to understand your conclusion: "--reserved-cpus" can keep Guaranteed Pods from landing on the given cpuset, and "CPUAffinity" can make all systemd-managed processes stay on the given cpuset.
@harper1011 yes that's right. |
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is still very much a problem. Any chance this can get reopened? Otherwise I will probably make a duplicate. |
What happened:
The following is the kubelet configuration used to enable the --cpu-manager-policy=static feature:
--cpu-manager-policy=static --system-reserved=cpu=10,memory=10Gi --system-reserved-cgroup=/system.slice
There is no separate cgroup for the kubelet, hence --kube-reserved is not used.
./user.slice
./system.slice
./system.slice/system-getty.slice
The PSR field of "ps -eLF" on the host shows that processes from /system.slice are using cores assigned to application Pods that are guaranteed and have integer cores assigned.
Here is /var/lib/kubelet/cpu_manager_state, which shows that this Pod is assigned cores 8-11 and 32-35:
{"policyName":"static","defaultCpuSet":"0-7,12-31,36-47","entries":{"f7cc858318f3a84d9cbd340e058126379292f09578ca362680e3a350c8edad63":"8-11,32-35"},"checksum":462128154}
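For anyone reproducing this, the shared pool (the CPUs system processes are expected to stay on) can be pulled out of that file with a one-liner. A sketch using the JSON quoted above; on a node, cat /var/lib/kubelet/cpu_manager_state instead of the inline string:

```shell
# Sketch: extract defaultCpuSet (the shared, non-exclusive pool) from the
# cpu_manager_state JSON quoted above.
state='{"policyName":"static","defaultCpuSet":"0-7,12-31,36-47","entries":{"f7cc858318f3a84d9cbd340e058126379292f09578ca362680e3a350c8edad63":"8-11,32-35"},"checksum":462128154}'
printf '%s\n' "$state" | sed -n 's/.*"defaultCpuSet":"\([^"]*\)".*/\1/p'
# prints: 0-7,12-31,36-47
```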
On the host:
ps -eLF | awk '{print $9 " " $13 " " $14 " " $2 " " $3}' | sort | grep -i glusterfs
8 /usr/sbin/glusterfs --log-level=ERROR 8870 1
Here 8 is the core, 8870 the PID, and 1 the PPID. The following shows that the cgroup of PID 8870 is /system.slice:
$ cat /proc/8870/cgroup
11:hugetlb:/
10:freezer:/
9:blkio:/system.slice/run-rf63a79fe920945f497cdb8ed37874e36.scope
8:memory:/system.slice/run-rf63a79fe920945f497cdb8ed37874e36.scope
7:pids:/system.slice/run-rf63a79fe920945f497cdb8ed37874e36.scope
6:cpuset:/
5:net_cls,net_prio:/
4:perf_event:/
3:cpu,cpuacct:/system.slice/run-rf63a79fe920945f497cdb8ed37874e36.scope
2:devices:/system.slice/run-rf63a79fe920945f497cdb8ed37874e36.scope
1:name=systemd:/system.slice/run-rf63a79fe920945f497cdb8ed37874e36.scope
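The 6:cpuset:/ line is the telling one: the process sits in the root cpuset, so the kernel may schedule it on any online CPU regardless of what the CPU manager assigned. A sketch of checking just that field; the here-doc stands in for the real /proc/8870/cgroup:

```shell
# Sketch: print the cpuset cgroup for a process; "/" means the root cpuset,
# i.e. the process is not confined to any CPU subset.
awk -F: '$2 == "cpuset" { print $3 }' <<'EOF'
11:hugetlb:/
6:cpuset:/
3:cpu,cpuacct:/system.slice/run-rf63a79fe920945f497cdb8ed37874e36.scope
EOF
# prints: /
```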
Similarly, other processes from /system.slice, such as the docker daemon, are also using the cores assigned to this Pod.
What you expected to happen:
No system process from /system.slice should be scheduled on the cores assigned to this guaranteed Pod with integer CPUs.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
Kubernetes version (kubectl version): 1.15.3
OS (cat /etc/os-release):
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.4 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
Kernel (uname -a): 4.10.0-27-generic
/area kubelet
/sig node