
Guaranteed Pods' CPUs are shared by other processes running on the same host #99895

Closed
DapengJiao opened this issue Mar 6, 2021 · 10 comments
Labels
kind/support Categorizes issue or PR as a support question. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@DapengJiao

What happened:

Guaranteed Pods' CPUs are shared by other processes running on the same host.

What you expected to happen:

A Pod running in the Guaranteed QoS class should be granted exclusive CPUs.

How to reproduce it (as minimally and precisely as possible):

  • Deploy a K8S cluster (1.20.5) on OpenStack with 3 masters and 2 workers; each worker VM has 32 vCPUs
  • Set "--cpu-manager-policy=static" and configure "--reserved-cpus=0-1" in each worker's kubelet configuration
  • Set "CPUAffinity=0-1" in the "/etc/systemd/system.conf" configuration file
  • Deploy a set of dummy "nginx" Pods with a Deployment of 20 replicas; each replica requests & limits 4 CPUs
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 20
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        resources:
          limits:
            memory: "200Mi"
            cpu: "4"
          requests:
            memory: "200Mi"
            cpu: "4"
  • After the Pods are created, log in to each worker and check its "cpu_manager_state" and process status
worker-pool1-vam0h3j8-eccd-cluster-dapeng:/home/eccd # cat /var/lib/kubelet/cpu_manager_state | jq .
{
  "policyName": "static",
  "defaultCpuSet": "0-1,34-35",
  "entries": {
    "31c83fdd-1468-4877-95e5-4d774586eb0d": {
      "nginx": "2-5"
    },
    "32ea3dd9-d886-448f-a11d-ab0ec4ba0652": {
      "nginx": "14-17"
    },
    "566567ce-a71c-44c9-9052-036b4351c056": {
      "nginx": "30-33"
    },
    "6c753f14-57f4-48d0-90a2-e6d554a3bb49": {
      "nginx": "26-29"
    },
    "a3d9c419-2e59-4ac1-87e9-879f4e9e8fc7": {
      "nginx": "10-13"
    },
    "cc18e830-0129-479a-ad15-94a71efdeb8b": {
      "nginx": "18-21"
    },
    "f2d73bc5-8264-4931-bbc0-6c8b91c3db18": {
      "nginx": "6-9"
    },
    "f9717485-d049-4296-8322-b3fab800bf90": {
      "nginx": "22-25"
    }
  },
  "checksum": 3030712563
}
worker-pool1-vam0h3j8-eccd-cluster-dapeng:/home/eccd # ps -Ao user,uid,comm,pid,pcpu,psr | awk '{if ($5!=0.0) {print}}' | awk '{if ($6!=0) {print}}' | awk '{if ($6!=20) {print}}' | awk '{if ($6!=40) {print}}' | awk '{if ($6!=60) {print}}'
USER       UID COMMAND           PID %CPU PSR
root         0 systemd             1  0.6   1
root         0 rcu_sched           8  0.2  32
root         0 ksoftirqd/1        16  0.1   1
message+   499 dbus-daemon      1158  0.3   1
root         0 docker-containe  1694  0.1   1
26          26 python           1753  0.4   1
root         0 dockerd          1795  5.5   1
root         0 diag_coll_worke  1798  0.2   1
root         0 docker-containe  2663  0.8   1
root         0 calico-node      5594  1.8  35
root         0 node-cache       5992  0.3  22
root         0 pause            6999  0.2   4
root         0 pause            7068  0.2   3
root         0 pause            7099  0.2   8
root         0 pause            7112  0.2  17
root         0 pause            7169  0.1   5
eccd      1001 systemd          9310  0.7   1
53222    53222 java            10031  0.5   4
101        101 nginx-ingress-c 10436  4.0   4
root         0 pause           12180  0.6   5
root         0 pause           12207  0.6   9
root         0 pause           12224  0.6   7
root         0 sadc            15778  3.0   1
root         0 ps              15781  200   1
9685      9685 registry        15885  0.2  14
21414    21414 java            16250  0.6   5
47040    47040 prometheus      22122  4.5   3
root         0 kworker/1:0     24405  0.1   1
root         0 node_exporter   25094  0.7   1
root         0 alertmanager    25311  0.3  23
root         0 node-cert-expor 28684  0.1  28

From the output we can see that quite a few processes are running on CPUs which belong to the guaranteed CPU sets.
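
As a cross-check (not part of the original report), the CPU affinity of any of the listed processes can be inspected directly from /proc; the PID 5594 below is just the calico-node entry from the listing above and is only an example:

# Show the CPUs a given process is allowed to run on (calico-node PID 5594 taken from the listing above, purely as an example)
grep Cpus_allowed_list /proc/5594/status
# taskset can query (and change) the same affinity
taskset -cp 5594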

Anything else we need to know?:

If we reboot the worker node, the processes that were running on the guaranteed CPU sets will be moved to the defaultCpuSet.

Environment:

  • Kubernetes version (use kubectl version):
kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"ec2760d6d916781de466541a6babb4309766c995", GitTreeState:"clean", BuildDate:"2021-02-27T17:24:15Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"ec2760d6d916781de466541a6babb4309766c995", GitTreeState:"clean", BuildDate:"2021-02-27T17:18:03Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration:
    OpenStack
  • OS (e.g: cat /etc/os-release):
NAME="SLES"
VERSION="15-SP1"
VERSION_ID="15.1"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP1"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp1"
  • Kernel (e.g. uname -a):
    Linux director-0-eccd-cluster-dapeng 4.12.14-197.83-default #1 SMP Thu Feb 11 22:01:45 UTC 2021 (547a203) x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
@DapengJiao DapengJiao added the kind/bug Categorizes issue or PR as related to a bug. label Mar 6, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 6, 2021
@k8s-ci-robot
Contributor

@DapengJiao: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@DapengJiao
Author

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 6, 2021
@maxlaverse
Contributor

Hi @DapengJiao,
What you describe is how the feature is supposed to work, isn't it? (asking because of the bug label you added)

https://kubernetes.io/docs/tasks/administer-cluster/cpu-management-policies/

Note: System services such as the container runtime and the kubelet itself can continue to run on these exclusive CPUs. The exclusivity only extends to other pods.

The --reserved-cpus=0-1 parameter tells the Kubelet not to schedule any Pod on those CPUs, but this doesn't mean processes outside Kubernetes won't run on CPUs 0 and 1. It's "exclusive" with regard to other Pods, not with regard to any other process running on Linux (at least that's how I understood it).
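
For reference, a minimal sketch of the equivalent file-based kubelet configuration (assuming the KubeletConfiguration file is used instead of the command-line flags; the fields below are the documented equivalents of --cpu-manager-policy and --reserved-cpus):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# static policy enables exclusive CPU assignment for Guaranteed Pods with integer CPU requests
cpuManagerPolicy: static
# CPUs kept out of the exclusive pool; the kubelet and system daemons are expected to run here
reservedSystemCPUs: "0-1"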

I suppose you actually added "CPUAffinity=0-1" in "/etc/systemd/system.conf" to achieve this, and to prevent processes that are not Kubernetes Pods from running on those cores.
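
For completeness, a rough sketch of what that setting looks like (note that older systemd versions expect a space-separated list such as CPUAffinity=0 1 rather than a range, and the setting only applies to processes started after systemd re-reads the configuration, typically after a reboot):

# /etc/systemd/system.conf
[Manager]
# Pin systemd and every service it spawns to CPUs 0-1, keeping them off the Pod CPUs
CPUAffinity=0-1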

I was wondering myself how to dedicate CPUs even more strictly. Have you tried something around isolcpus?
https://unix.stackexchange.com/questions/326579/how-to-ensure-exclusive-cpu-availability-for-a-running-process
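
As a rough sketch only (the range 2-33 is an assumption matching the exclusive CPUs in this report, and as noted further down in this thread isolcpus does not combine well with --reserved-cpus), the isolcpus approach is a kernel boot parameter:

# Appended to the kernel command line, e.g. via GRUB_CMDLINE_LINUX in /etc/default/grub
isolcpus=2-33
# The scheduler then keeps ordinary tasks off those CPUs; only explicitly pinned tasks run there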

@DapengJiao
Author

Hi @maxlaverse

Thanks for your answer.
To be honest, I wanted to label it as "question" instead of "bug".

I am aware of the statement (Note:) you pasted. But I was expecting the kubelet not to allocate exclusive CPUs from a cpuset that is already being used by other processes (the container runtime, the kubelet, or other processes from the cloud infra layer).

For isolcpus, I remember there was some discussion about that, and the conclusion was that isolcpus does not work together with --reserved-cpus. #87862

@ehashman
Member

/kind support
/remove-kind bug

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. and removed kind/bug Categorizes issue or PR as related to a bug. labels Mar 16, 2021
@xiaoxubeii
Member

The kubelet CPU manager currently cannot keep any processes other than Pods off the exclusively assigned cores. I think you should try a workaround first.

@cynepco3hahue

The CPU manager does not guarantee that only the pod will run on a specific set of CPUs (via cgroup cpuset).

  1. Non-container processes: these should be handled via OS configuration (moving interrupts, specifying CPUAffinity for system services, ...).
  2. Pause containers: these should be solved at the CRI level, because the kubelet does not monitor the pause container (pod wrapper) at all. I provided the functionality for CRI-O (Provide functionality to start infra containers on the specified set of CPUs cri-o/cri-o#4459), but containerd should probably do something similar; see the sketch below.
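
To illustrate point 2, a sketch of the CRI-O side once that functionality is available (the option name infra_ctr_cpuset comes from cri-o/cri-o#4459; treat the exact section and syntax as an assumption to verify against your CRI-O version):

# /etc/crio/crio.conf (or a drop-in file under /etc/crio/crio.conf.d/)
[crio.runtime]
# Start all infra (pause) containers on the reserved CPUs instead of the Pod CPUs
infra_ctr_cpuset = "0-1"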

@cynepco3hahue

/close
Please feel free to open a KEP or feature request for future CPU manager improvements.

@k8s-ci-robot
Contributor

@cynepco3hahue: Closing this issue.

In response to this:

/close
Please feel free to open a KEP or feature request for future CPU manager improvements.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@maxpain

maxpain commented Jul 29, 2022

I have a pod that has two containers.
I want one container to use 1 CPU exclusively with affinity and another container to use a part of a CPU from a shared pool.
Is it possible?
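
With the static CPU manager policy this is decided per container, so a sketch along the following lines should express it (assuming the whole Pod stays in the Guaranteed QoS class, i.e. requests equal limits for every container; only the container with a whole-number CPU request gets exclusive CPUs, the fractional one stays on the shared pool):

apiVersion: v1
kind: Pod
metadata:
  name: mixed-cpu-pod   # hypothetical name
spec:
  containers:
  - name: pinned        # integer CPU request/limit: exclusive CPUs under the static policy
    image: nginx:latest
    resources:
      requests:
        cpu: "1"
        memory: "200Mi"
      limits:
        cpu: "1"
        memory: "200Mi"
  - name: shared        # fractional CPU request/limit: runs on the shared (default) cpuset
    image: nginx:latest
    resources:
      requests:
        cpu: "500m"
        memory: "200Mi"
      limits:
        cpu: "500m"
        memory: "200Mi"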
