
--kube-reserved and --system-reserved are not working #72762

Closed
y-koseki opened this issue Jan 10, 2019 · 9 comments
Labels
area/kubelet · kind/bug · lifecycle/rotten · sig/node

Comments

@y-koseki

What happened:

I ran kubelet with the following parameters:

--kube-reserved=cpu=2,memory=2Gi,ephemeral-storage=1Gi
--system-reserved=cpu=500m,memory=1Gi,ephemeral-storage=3Gi
--eviction-hard=memory.available<500Mi,nodefs.available<10%

The capacity of the k8s node VM is as follows.

Capacity:
 cpu:                16
 ephemeral-storage:  31444004Ki
 hugepages-2Mi:      0
 memory:             32780296Ki
 pods:               110
Allocatable:
 cpu:                13500m
 ephemeral-storage:  24683826743
 hugepages-2Mi:      0
 memory:             29122568Ki
 pods:               110
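
For reference, the Allocatable values above match the standard formula Allocatable = Capacity - kube-reserved - system-reserved - hard eviction thresholds. The arithmetic below is a back-of-the-envelope check derived from the flags above, not node output (the ephemeral-storage figure is approximate):

cpu:                16 - 2 - 0.5 = 13.5 cores = 13500m
memory:             32780296Ki - 2Gi (2097152Ki) - 1Gi (1048576Ki) - 500Mi (512000Ki) = 29122568Ki
ephemeral-storage:  32198660096 bytes - 1Gi - 3Gi - 10% of capacity ≈ 24.7 GB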

Problem 1

Pods can use ephemeral-storage over Allocatable.
The result of curl https://${master_name}/api/v1/nodes/${node_name}/proxy/stats/summary | jq .node.fs is as follows.

{
  "time": "2019-01-09T11:46:12Z",
  "availableBytes": 4850827264,
  "capacityBytes": 32198660096,
  "usedBytes": 27347832832,
  "inodesFree": 9475345,
  "inodes": 9504864,
  "inodesUsed": 29519
}
  • Allocatable is 24683826743 bytes.
  • usedBytes is 27347832832 bytes.

Problem 2

Pods can use CPU over Allocatable.
The result of kubectl top pods is as follows.

NAME                CPU(cores)   MEMORY(bytes)
stress-pod-cpu1-1   990m         2Mi
stress-pod-cpu1-2   970m         2Mi
stress-pod-cpu13    12811m       257Mi
test-pd-3           0m           10Mi
  • Allocatable is 13500m.
  • The total CPU across these pods is 14771m (see the one-liner below).
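
The total can be reproduced with a quick one-liner (a convenience sketch, assuming the default kubectl top pods column layout; it is not part of the original report):

kubectl top pods --no-headers | awk '{sub(/m$/, "", $2); total += $2} END {print total "m"}'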

The result of curl https://${master_name}/api/v1/nodes/${node_name}/proxy/stats/summary | jq .node.cpu is as follows.

{
  "time": "2019-01-09T11:55:45Z",
  "usageNanoCores": 14544109279,
  "usageCoreNanoSeconds": 24011030451539372
}
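
usageNanoCores is in nanocores, so this is roughly 14.54 cores, again above the 13500m Allocatable. A quick conversion, reusing the same endpoint and jq as above:

curl https://${master_name}/api/v1/nodes/${node_name}/proxy/stats/summary | jq '.node.cpu.usageNanoCores / 1000000000'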

What you expected to happen:

I expected that pods could NOT use ephemeral-storage or CPU beyond Allocatable.
It seems that --kube-reserved and --system-reserved are not working.
I also tried running kubelet with these parameters:

--kube-reserved=cpu=2,memory=2Gi,ephemeral-storage=1Gi
--system-reserved=cpu=500m,memory=1Gi,ephemeral-storage=3Gi
--eviction-hard=memory.available<500Mi,nodefs.available<10%
--kube-reserved-cgroup=/system.slice
--system-reserved-cgroup=/system.slice
--enforce-node-allocatable=pods,system-reserved,kube-reserved

However, it did not resolve the problems.

How to reproduce it (as minimally and precisely as possible):

Run kubelet with the parameters above.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:17:39Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-21T09:05:37Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/amd64"}
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a):
Linux {hostname} 3.10.0-514.10.2.el7.x86_64 #1 SMP Fri Mar 3 00:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools:
  • Others:
@y-koseki y-koseki added the kind/bug Categorizes issue or PR as related to a bug. label Jan 10, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jan 10, 2019
@y-koseki
Author

/area kubelet
/sig node

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jan 10, 2019
@derekwaynecarr
Member

@y-koseki The reservation appears to have been applied correctly, judging by the allocatable capacity reported back to the scheduler. Which --cgroup-driver did you specify? Could you report the cgroupfs values you see under kubepods.slice for cpu and memory?
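
One way to read those values, as a sketch assuming cgroup v1 with the systemd cgroup driver on CentOS 7 (with the cgroupfs driver the top-level pod cgroup is /kubepods rather than kubepods.slice):

cat /sys/fs/cgroup/cpu,cpuacct/kubepods.slice/cpu.shares
cat /sys/fs/cgroup/memory/kubepods.slice/memory.limit_in_bytes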

@derekwaynecarr
Member

FYI @dashpole, have you seen this? I am not aware of anyone enforcing node allocatable in production for anything other than pods, but it's possible we have a bug that needs more investigation.

@dashpole
Contributor

Problem 1 looks like it is measuring the disk usage of the entire node, not just the usage counted against allocatable. It is also worth pointing out that the kubelet only enforces allocatable for ephemeral storage through monitoring + response (eviction), so usage by pods can temporarily exceed allocatable.
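
For comparison against Allocatable, the per-pod usage from the same summary endpoint is the relevant number rather than node.fs. A sketch, assuming each pod entry exposes an ephemeral-storage block with usedBytes:

curl https://${master_name}/api/v1/nodes/${node_name}/proxy/stats/summary | jq '[.pods[]."ephemeral-storage".usedBytes // 0] | add'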

@dashpole
Contributor

dashpole commented Feb 12, 2019

Problem 2 looks like it might be a real bug. The caveat with metrics from kubectl top is that they are 10-second averages, so I am not 100% sure. I think I might have seen something like this before but didn't dig into it. It very well could be a bug.

My first thoughts are:
Do we still set cpu.shares for the kube-reserved and system-reserved cgroups even when they are not enforced? I would think you either need to "enforce" cpu allocatable on everything (pods, kube, and system) or none of them.

Do we enforce that the kube-reserved cgroup and system-reserved cgroup have the same parent cgroup as kubepods? If not, I don't think cpu shares are correctly calculated, as cpu time is split proportionally to shares among cgroups with the same parent.
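
To illustrate that proportional split with this node's numbers (assumed values for illustration, not measured; 1024 is the systemd default cpu.shares for system.slice):

kubepods cpu.shares:     13500 * 1024 / 1000 = 13824 (from the 13500m Allocatable)
system.slice cpu.shares: 1024
Under contention, kubepods would receive 13824 / (13824 + 1024) * 16 cores ≈ 14.9 cores,
above the 13.5-core Allocatable and roughly in line with the ~14.5 cores observed.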

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 13, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 12, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
