Application crash due to k8s 1.9.x enabling kernel memory accounting by default #61937

Closed
wzhx78 opened this issue Mar 30, 2018 · 120 comments
Labels
kind/support Categorizes issue or PR as a support question. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@wzhx78

wzhx78 commented Mar 30, 2018

When we upgraded k8s from 1.6.4 to 1.9.0, after a few days our production environment reported machines hanging and JVMs crashing randomly inside containers. We found that the cgroup memory css IDs were not being released; once the css ID count grows larger than 65535, the machine hangs and we have to restart it.

We found that runc's libcontainer/cgroups/fs/memory.go vendored into k8s 1.9.0 had deleted the if condition, which enables kernel memory accounting by default. But we are using kernel 3.10.0-514.16.1.el7.x86_64, and on this version kernel memory limits are not stable, which leaks memory cgroups and crashes applications randomly.

when we run "docker run -d --name test001 --kernel-memory 100M " , docker report
WARNING: You specified a kernel memory limit on a kernel older than 4.0. Kernel memory limits are experimental on older kernels, it won't work as expected and can cause your system to be unstable.

k8s.io/kubernetes/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/memory.go

-		if d.config.KernelMemory != 0 {
+			// Only enable kernel memory accouting when this cgroup
+			// is created by libcontainer, otherwise we might get
+			// error when people use `cgroupsPath` to join an existed
+			// cgroup whose kernel memory is not initialized.
 			if err := EnableKernelMemoryAccounting(path); err != nil {
 				return err
 			}

I want to know why kernel memory accounting is now enabled by default. Can k8s take the different kernel versions into account?
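
For anyone checking whether this applies to their node, the kernel-memory accounting state of a container's cgroup can be inspected under the cgroup v1 memory hierarchy. A rough sketch (the POD_UID and CONTAINER_ID variables are placeholders for values from your own node, not from this issue; the exact path depends on the cgroup driver and QoS class):

# assumes POD_UID and CONTAINER_ID hold a real pod UID and container ID from this node
CG=/sys/fs/cgroup/memory/kubepods/pod${POD_UID}/${CONTAINER_ID}
cat $CG/memory.kmem.limit_in_bytes   # typically 9223372036854771712 ("unlimited"), but accounting may still be switched on
cat $CG/memory.kmem.usage_in_bytes   # stays 0 while accounting is off; reports real usage once accounting has been enabled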

Is this a BUG REPORT or FEATURE REQUEST?: BUG REPORT

Uncomment only one, leave it on its own line:

/kind bug
/kind feature

What happened:
Applications crash and memory cgroups leak.

What you expected to happen:
Applications stay stable and memory cgroups do not leak.

How to reproduce it (as minimally and precisely as possible):
Install k8s 1.9.x on a machine with kernel 3.10.0-514.16.1.el7.x86_64, then create and delete pods repeatedly. After more than 65535/3 creations, the kubelet reports "cgroup no space left on device" errors, and after the cluster has been running for a few days the containers crash.

Anything else we need to know?:

Environment: kernel 3.10.0-514.16.1.el7.x86_64

  • Kubernetes version (use kubectl version): k8s 1.9.x
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
  • Kernel (e.g. uname -a): 3.10.0-514.16.1.el7.x86_64
  • Install tools: rpm
  • Others:
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 30, 2018
@qkboy

qkboy commented Mar 30, 2018

The test case below reproduces this error.
First, fill the memory cgroup hierarchy up to its limit:

# uname -r
3.10.0-514.10.2.el7.x86_64
# kubelet --version
Kubernetes 1.9.0
# mkdir /sys/fs/cgroup/memory/test
# for i in `seq 1 65535`;do mkdir /sys/fs/cgroup/memory/test/test-${i}; done
# cat /proc/cgroups |grep memory
memory  11      65535   1

Then release 99 memory cgroup slots that can be used for the next creations:

# for i in `seq 1 100`;do rmdir /sys/fs/cgroup/memory/test/test-${i} 2>/dev/null 1>&2; done 
# mkdir /sys/fs/cgroup/memory/stress/
# for i in `seq 1 100`;do mkdir /sys/fs/cgroup/memory/test/test-${i}; done 
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/test/test-100’: No space left on device <-- note that number 100 cannot be created
# for i in `seq 1 100`;do rmdir /sys/fs/cgroup/memory/test/test-${i}; done <-- delete the 100 memory cgroups
# cat /proc/cgroups |grep memory
memory  11      65436   1

Second, create a new pod on this node.
Each pod creates 3 memory cgroup directories, for example:

# ll /sys/fs/cgroup/memory/kubepods/pod0f6c3c27-3186-11e8-afd3-fa163ecf2dce/
total 0
drwxr-xr-x 2 root root 0 Mar 27 14:14 6d1af9898c7f8d58066d0edb52e4d548d5a27e3c0d138775e9a3ddfa2b16ac2b
drwxr-xr-x 2 root root 0 Mar 27 14:14 8a65cb234767a02e130c162e8d5f4a0a92e345bfef6b4b664b39e7d035c63d1

So when we recreate the 100 memory cgroup directories, 4 of them fail:

# for i in `seq 1 100`;do mkdir /sys/fs/cgroup/memory/test/test-${i}; done    
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/test/test-97’: No space left on device <-- 3 directories used by the pod
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/test/test-98’: No space left on device
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/test/test-99’: No space left on device
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/test/test-100’: No space left on device
# cat /proc/cgroups 
memory  11      65439   1

Third, delete the test pod. After confirming that all of the test pod's containers have been destroyed, recreate the 100 memory cgroup directories.
The correct result we would expect is that only directory number 100 cannot be created:

# cat /proc/cgroups 
memory  11      65436   1
# for i in `seq 1 100`;do mkdir /sys/fs/cgroup/memory/test/test-${i}; done 
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/test/test-100’: No space left on device

But the actual, incorrect result is that all the memory cgroup directories created by the pod are leaked:

# cat /proc/cgroups 
memory  11      65436   1 <-- current total number of memory cgroups
# for i in `seq 1 100`;do mkdir /sys/fs/cgroup/memory/test/test-${i}; done    
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/test/test-97’: No space left on device
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/test/test-98’: No space left on device
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/test/test-99’: No space left on device
mkdir: cannot create directory ‘/sys/fs/cgroup/memory/test/test-100’: No space left on device

Notice that the memory cgroup count has already been reduced by 3, but the slots those cgroups occupied have not been released.
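
Since /proc/cgroups does not show these leaked "dying" cgroups, one way to see how many usable memory cgroup slots a node still has is to probe with mkdir until ENOSPC, just as above. A sketch of such a probe, for a test node only since it temporarily creates a large number of cgroups (plain mkdir does not enable kmem accounting, so the probe cgroups themselves are not leaked):

probe=/sys/fs/cgroup/memory/probe
mkdir $probe
free=0
while mkdir $probe/slot-$free 2>/dev/null; do free=$((free + 1)); done
echo "creatable memory cgroups before ENOSPC: $free"
# clean up the probe cgroups again
for i in $(seq 0 $((free - 1))); do rmdir $probe/slot-$i; done
rmdir $probe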

@wzhx78
Author

wzhx78 commented Mar 30, 2018

/sig container
/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Mar 30, 2018
@wzhx78
Author

wzhx78 commented Mar 30, 2018

@kubernetes/sig-cluster-container-bugs

@feellifexp

This bug seems to be related: opencontainers/runc#1725

Which docker version are you using?

@qkboy

qkboy commented Mar 30, 2018

@feellifexp with docker 1.13.1

@frol

frol commented Mar 30, 2018

There is indeed a kernel memory leak in kernels up to the 4.0 release. You can follow this link for details: moby/moby#6479 (comment)

@wzhx78
Author

wzhx78 commented Mar 31, 2018

@feellifexp the kernel log also shows this message after the upgrade to k8s 1.9.x:

kernel: SLUB: Unable to allocate memory on node -1 (gfp=0x8020)

@wzhx78
Author

wzhx78 commented Mar 31, 2018

I want to know why k8s 1.9 deleted the line `if d.config.KernelMemory != 0 {` in k8s.io/kubernetes/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/memory.go.

@feellifexp

I am not an expert here, but this seems to be a change from runc, and it was introduced into k8s in v1.8.
After reading the code, it appears to affect the cgroupfs cgroup driver, while the systemd driver is unchanged. But I have not tested that theory yet.
Maybe experts on the kubelet and containers can chime in further.

@kevin-wangzefeng
Member

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 31, 2018
@kevin-wangzefeng
Member

I want to know why k8s 1.9 deleted the line `if d.config.KernelMemory != 0 {` in
k8s.io/kubernetes/vendor/github.com/opencontainers/runc/libcontainer/cgroups/fs/memory.go

I guess opencontainers/runc#1350 is the one you are looking for, which is actually an upstream change.

/cc @hqhq

@wzhx78
Author

wzhx78 commented Mar 31, 2018

Thanks @kevin-wangzefeng, runc upstream did change this, and now I understand why; the change is hqhq/runc@fe898e7. But with kernel memory accounting enabled on the root cgroup by default, the child cgroups enable it as well, and this causes memory cgroups to leak on kernel 3.10.0. @hqhq, is there any way to let us enable or disable kernel memory accounting ourselves, or at least to get a warning in the log when the kernel is < 4.0?
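
To illustrate the inheritance described here, below is a sketch (test machine only, cgroup v1) of what runc's EnableKernelMemoryAccounting effectively does, setting a kernel memory limit and then resetting it to unlimited, and of how a child cgroup created afterwards is accounted as well:

parent=/sys/fs/cgroup/memory/kmem-demo
mkdir $parent
# set a kmem limit, then reset it to "unlimited"; accounting stays enabled for this cgroup
echo 1 > $parent/memory.kmem.limit_in_bytes
echo -1 > $parent/memory.kmem.limit_in_bytes
mkdir $parent/child
echo $$ > $parent/child/cgroup.procs          # run the current shell inside the child
cat $parent/child/memory.kmem.usage_in_bytes  # typically non-zero now: kernel memory is being charged to the child
echo $$ > /sys/fs/cgroup/memory/cgroup.procs  # move the shell back to the root memory cgroup
rmdir $parent/child $parent                   # on an affected 3.10 kernel the child may now linger as a "dying" cgroup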

@hqhq

hqhq commented Apr 1, 2018

@wzhx78 The root cause is that there are kernel memory limit bugs in 3.10. If you don't want to use kernel memory limits because they are not stable on your kernel, the best solution would be to disable kernel memory limits in your kernel.

I can't think of a way to work around this on the runc side without causing issues like opencontainers/runc#1083 and opencontainers/runc#1347, unless we add some ugly logic that does different things for different kernel versions, and I'm afraid that won't be an option.

@wzhx78
Author

wzhx78 commented Apr 1, 2018

@hqhq It is indeed a kernel 3.10 bug, but it took us a lot of time to find, and it caused us serious trouble in production, since all we did was upgrade k8s from 1.6.x to 1.9.x. In k8s 1.6.x, kernel memory accounting is not enabled by default because runc still had the if condition; since 1.9.x, runc enables it by default. We don't want others who upgrade to k8s 1.9.x to run into the same trouble. And since runc is a popular container runtime, we think it needs to consider different kernel versions, or at least report an error in the kubelet log when the kernel is not suitable for enabling kernel memory accounting by default.

@wzhx78
Author

wzhx78 commented Apr 2, 2018

@hqhq any comments ?

@hqhq

hqhq commented Apr 2, 2018

Maybe you can add an option like --disable-kmem-limit for both k8s and runc to make runc disable kernel memory accounting.

@warmchang
Contributor

v1.8 and all later versions are affected by this.
e5a6a79#diff-17daa5db16c7d00be0fe1da12d1f9165L39


@wzhx78
Author

wzhx78 commented Apr 3, 2018

@warmchang yes.

Is it reasonable to add a --disable-kmem-limit flag to k8s? Can anyone discuss this with us?

@like-inspur
Contributor

I can't find a config named disable-kmem-limit in k8s. How would this flag be added? @wzhx78

@wzhx78
Author

wzhx78 commented Apr 14, 2018

k8s doesn't support it yet; we need to discuss with the community whether it is reasonable to add this flag to the kubelet startup options.

@gyliu513
Contributor

Not only 1.9, but also 1.10 and master have the same issue. This is a very serious issue for production; I think providing a parameter to disable the kmem limit would be good.

/cc @dchen1107 @thockin any comments for this? Thanks.

@wzhx78
Author

wzhx78 commented Apr 23, 2018

@thockin @dchen1107 any comments for this?

@gyliu513
Contributor

gyliu513 commented May 21, 2018

@dashpole is there any reason memory.go was updated as follows in e5a6a79#diff-17daa5db16c7d00be0fe1da12d1f9165L39? This is seriously impacting Kubernetes 1.8, 1.9, 1.10, 1.11, etc.

-		if d.config.KernelMemory != 0 {
+			// Only enable kernel memory accouting when this cgroup
+			// is created by libcontainer, otherwise we might get
+			// error when people use `cgroupsPath` to join an existed
+			// cgroup whose kernel memory is not initialized.
 			if err := EnableKernelMemoryAccounting(path); err != nil {
 				return err
 			}

@gjkim42
Member

gjkim42 commented Apr 23, 2021

This issue still reproduces in the following environment when the kernel parameter cgroup.memory=nokmem is not set:

$ uname -r
3.10.0-1160.24.1.el7.x86_64
$ cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)

xref: docker/for-linux#841

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 23, 2021
@gjkim42
Member

gjkim42 commented Apr 23, 2021

/reopen

@k8s-ci-robot
Contributor

@gjkim42: Reopened this issue.

In response to this:

/reopen


@k8s-ci-robot k8s-ci-robot reopened this Apr 23, 2021
@k8s-ci-robot
Contributor

@wzhx78: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 23, 2021
@TvdW

TvdW commented Apr 23, 2021

CentOS 7.6 went EOL in 2019. Since the main fix for this problem was in CentOS 7.8, please check with a more recent version.

@gjkim42
Member

gjkim42 commented Apr 23, 2021

@TvdW

Also reproduced in the following environment.
After we create a pod and delete it, 3 memory cgroup directories are leaked.

$ uname -r
3.10.0-1160.11.1.el7.x86_64
$ cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)

@gjkim42
Member

gjkim42 commented Apr 23, 2021

#61937 (comment)

It seems that the kernel bug which causes this error is finally fixed now, and will be released in kernel-3.10.0-1075.el7, which is due in RHEL 7.8, but goodness knows when that will be, as RHEL 7.7 only came out on August 6th, ~3 weeks ago.

https://bugzilla.redhat.com/show_bug.cgi?id=1507149#c101

There may be other bugs.

@JoshuaAndrew
Contributor

This problem is caused by both the OS and Docker:
(1) the kernel side was fixed in RHEL/CentOS 7.8;
(2) kmem accounting was disabled in runc on RHEL/CentOS (docker/escalation#614, docker/escalation#692) via docker/engine#121 in Docker CE 18.09.1.

So you should upgrade both CentOS and Docker.
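
A quick way to check whether a node already has both pieces in place (a sketch; the version thresholds are taken from the comments in this thread):

uname -r                                       # want kernel-3.10.0-1075.el7 or later (shipped with RHEL/CentOS 7.8)
docker version --format '{{.Server.Version}}'  # want Docker CE 18.09.1 or later (its runc build disables kmem accounting)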

@gjkim42
Member

gjkim42 commented Apr 23, 2021

@JoshuaAndrew
Do you mean that either of them alone addresses the issue, or do we need both of them?

@JoshuaAndrew
Contributor

@gjkim42
need both of them

@gjkim42
Member

gjkim42 commented Apr 23, 2021

Actually, I am using containerd directly (not via docker).

$ containerd --version
containerd containerd.io 1.3.9 ea765aba0d05254012b0b9e595e995c09186427f
$ runc --version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev

@gjkim42
Member

gjkim42 commented Apr 23, 2021

Also reproduced in the following environment.

# cat /etc/centos-release
CentOS Linux release 7.8.2003 (Core)
# uname -r
3.10.0-1160.24.1.el7.x86_64
# containerd --version
containerd containerd.io 1.3.9 ea765aba0d05254012b0b9e595e995c09186427f
# runc --version
runc version 1.0.0-rc10
commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
spec: 1.0.1-dev

@gjkim42
Member

gjkim42 commented Apr 23, 2021

#61937 (comment)

It seems that the kernel bug which causes this error is finally fixed now, and will be released in kernel-3.10.0-1075.el7, which is due in RHEL 7.8, but goodness knows when that will be, as RHEL 7.7 only came out on August 6th, ~3 weeks ago.

https://bugzilla.redhat.com/show_bug.cgi?id=1507149#c101

I am not sure what they fixed in CentOS 7.8, but it didn't solve the problem.

According to docker-archive/engine@8486ea1,
I think CentOS 7 gave up solving this problem at the kernel level.

@chilicat

We have not seen the issue anymore since CentOS 7.7. We set the kernel option cgroup.memory=nokmem.

Our current environment is as follows:

# cat /etc/centos-release
CentOS Linux release 7.9.2009 (Core)


# uname -r
3.10.0-1160.21.1.el7.x86_64

# docker version
Client:
 Version:           18.09.2
 API version:       1.39
 Go version:        go1.10.6
 Git commit:        18.09.2
 Built:             Wed Mar  6 12:37:27 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.2
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       6247962
  Built:            Wed Mar  6 12:32:48 2019
  OS/Arch:          linux/amd64
  Experimental:     false



# runc --version
runc version 1.0.0-rc6+dev

# containerd --version
containerd github.com/containerd/containerd 1.2.2 9754871865f7fe2f4e74d43e2fc7ccd237edcbce
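
For reference, on CentOS 7 the cgroup.memory=nokmem parameter is usually added via the kernel command line. A sketch for a BIOS system with GRUB2 (the grub.cfg path differs on EFI installs):

# 1. append cgroup.memory=nokmem to GRUB_CMDLINE_LINUX in /etc/default/grub
# 2. regenerate the GRUB configuration and reboot
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
# 3. after reboot, verify the parameter is active
grep -o 'cgroup.memory=nokmem' /proc/cmdline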

@gjkim42
Member

gjkim42 commented Apr 23, 2021

Thanks @chilicat
I also confirmed that setting the kernel parameter can resolve the issue.

However, I am wondering whether it is safe to set the kernel parameter cgroup.memory=nokmem, or whether there is any other way than setting the kernel parameter.

@wmealing

wmealing commented Apr 24, 2021

I misread your post @gjkim42, sorry, please ignore.

@gjkim42
Member

gjkim42 commented May 20, 2021

cc @ehashman @bobbypage @dims

Is sig-node aware of this issue?
I think every cluster hosted on CentOS 7 has hit this issue.

@ehashman
Member

CentOS 7 is a much older kernel than what we test CI on in SIG Node/upstream Kubernetes (currently the 5.4.x series). People are welcome to experiment with kernel parameters and share workarounds for their own distributions/deployments but any support will be best effort.

@kolyshkin
Contributor

I strongly suggest employing the workaround described at #61937 (comment).

Also, since v1.0.0-rc94 runc never sets kernel memory limits, so upgrading to runc >= v1.0.0-rc94 should solve the problem.

@ffromani
Contributor

Kubernetes does not use issues on this repo for support requests. If you have a question on how to use Kubernetes or to debug a specific issue, please visit our forums.

/remove-kind bug
/kind support
/close

Extra rationale: this issue affects CentOS 7, which is indeed much older than what we test in CI, and a workaround exists (see runc v1.0.0-rc94).

@k8s-ci-robot k8s-ci-robot added kind/support Categorizes issue or PR as a support question. and removed kind/bug Categorizes issue or PR as related to a bug. labels Jun 24, 2021
@k8s-ci-robot
Contributor

@fromanirh: Closing this issue.

In response to this:

Kubernetes does not use issues on this repo for support requests. If you have a question on how to use Kubernetes or to debug a specific issue, please visit our forums.

/remove-kind bug
/kind support
/close

Extra rationale: this issue affects CentOS 7, which is indeed much older than what we test in CI, and a workaround exists (see runc v1.0.0-rc94).


@andrewzrant

andrewzrant commented Mar 27, 2024

Thanks @chilicat I also confirmed that setting the kernel parameter can resolve the issue.

However, I am wondering whether it is safe to set the kernel parameter cgroup.memory=nokmem, or whether there is any other way than setting the kernel parameter.

Yes, I also want to know whether cgroup.memory=nokmem can cause bad results, and how cgroup kmem accounting is designed.
