Cannot allocate memory when starting a container #18295

Closed
sroze opened this Issue Nov 29, 2015 · 12 comments

sroze commented Nov 29, 2015

I'm experiencing a strange issue on Fedora 21 running Docker with overlayfs. After some time (a few days), I can no longer start any new container; every attempt fails with the following error:

# docker run -d --name=test-nginx nginx
Error response from daemon: chown /var/lib/docker/overlay/ef0ae8ce05c39e4a5831df86fe55a27eaa4bae44014d091953a088e9be0c4523-init/merged/dev: cannot allocate memory

When I check memory usage, there is still swap available and the available memory looks more than sufficient:

# free -m
              total        used        free      shared  buff/cache   available
Mem:          16038        2351         123          75       13563       12876
Swap:          4093          28        4065

I have 61 containers running, but judging by the used value reported by free, something looks wrong to me. What do you think?
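To make the numbers above easier to read, here is a minimal Python sketch that parses the free -m output pasted in this report and shows how the total splits between used memory and buff/cache. The helper name parse_free is made up for illustration; the figures are copied from the output above. Memory counted under buff/cache is mostly reclaimable, which is why the available column is much larger than the free column.

```python
# Output of `free -m` as pasted in the report above.
FREE_OUTPUT = """\
              total        used        free      shared  buff/cache   available
Mem:          16038        2351         123          75       13563       12876
Swap:          4093          28        4065
"""


def parse_free(text):
    """Parse `free -m` style output into {row_label: {column: MiB}}."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = {}
    for line in lines[1:]:
        label, *values = line.split()
        # The Swap row has fewer columns; zip() stops at the shorter list.
        rows[label.rstrip(":")] = dict(zip(header, map(int, values)))
    return rows


mem = parse_free(FREE_OUTPUT)["Mem"]
cache_share = mem["buff/cache"] / mem["total"]
print(f"used:       {mem['used']} MiB")
print(f"buff/cache: {mem['buff/cache']} MiB ({cache_share:.0%} of total)")
print(f"available:  {mem['available']} MiB")
```

Read this way, the host is not short of memory in aggregate, so the "cannot allocate memory" error has to come from something other than overall exhaustion.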

Thank you very much.


docker version:

Client:
 Version:      1.9.0
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   76d6bc9
 Built:        Tue Nov  3 18:04:38 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.0
 API version:  1.21
 Go version:   go1.4.2
 Git commit:   76d6bc9
 Built:        Tue Nov  3 18:04:38 UTC 2015
 OS/Arch:      linux/amd64

docker info:

Containers: 539
Images: 890
Server Version: 1.9.0
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.12-101.fc21.x86_64
Operating System: Fedora 21 (Twenty One)
CPUs: 8
Total Memory: 15.66 GiB
Name: ns368695.ip-94-23-36.eu
ID: SMYU:RAJG:DY2N:MTWV:2JEG:VJ7I:Y7Y5:5UFO:S7B2:M5ME:34RW:6CKF

uname -a:

Linux ns368695.ip-94-23-36.eu 4.1.12-101.fc21.x86_64 #1 SMP Wed Oct 28 15:18:44 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Author

sroze commented Nov 29, 2015

FYI, rebooting the server "fixes" the problem; just restarting Docker did not. As this server is part of a Kubernetes cluster, the containers were restarted successfully and there is now a lot of free memory. It looks like there is a memory leak somewhere. Happy to help with any information!

ndelitski commented Dec 4, 2015

I have the same issue:

# docker version
Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.5.1
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:39:26 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.5.1
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:39:26 UTC 2015
 OS/Arch:      linux/amd64
# docker info
Containers: 22
Images: 394
Server Version: 1.9.1
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.2.6-301.fc23.x86_64
Operating System: Fedora 23 (Cloud Edition)
CPUs: 2
Total Memory: 7.796 GiB
Name: ip-172-30-1-191.eu-west-1.compute.internal
ID: GBHO:6Y7O:DHMB:3UNW:XJ4E:6DS6:2IXR:SOXH:JVRJ:TTEN:6FAE:R7XZ
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
# free -m
              total        used        free      shared  buff/cache   available
Mem:           7983        2049         345           0        5588        5754
Swap:             0           0           0
# uname -a
Linux ip-172-30-1-191.eu-west-1.compute.internal 4.2.6-301.fc23.x86_64 #1 SMP Fri Nov 20 22:22:41 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Author

sroze commented Jan 4, 2016

Same again with Docker 1.9.1:
docker version:

Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.5.1
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:39:26 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.5.1
 Git commit:   a34a1d5
 Built:        Fri Nov 20 13:39:26 UTC 2015
 OS/Arch:      linux/amd64

Contributor

cpuguy83 commented Jan 4, 2016

If restarting Docker didn't fix the issue, then Docker itself is unlikely to be what has the memory problem.
Can you check top when you run into this issue?

chengchengmu commented Mar 17, 2016

Hello,

We got the following error when Kubernetes tried to start a Docker container:
Failed to start with docker id 843f7989c3e0 with error: API error (500): Cannot start container 843f7989c3e0f08c208998b90c1e8fe5e71b8690e6f38062ddc5d4a7858a40f8: [8] System error: fork/exec /usr/bin/docker: cannot allocate memory

It happened on a VM running OpenShift/Kubernetes with an uptime of 2 weeks.

Here are some details:

docker version
Client version: 1.7.1
Client API version: 1.19
Package Version (client): docker-1.7.1-115.el7.x86_64
Go version (client): go1.4.2
Git commit (client): 446ad9b/1.7.1
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Package Version (server): docker-1.7.1-115.el7.x86_64
Go version (server): go1.4.2
Git commit (server): 446ad9b/1.7.1
OS/Arch (server): linux/amd64
docker info
Containers: 51
Images: 193
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.10.0-229.el7.x86_64
Operating System: Red Hat Enterprise Linux Server 7.1 (Maipo)
CPUs: 4
Total Memory: 7.64 GiB
Name: ose3-int-node1.figaro.amadeus.net
ID: E2UB:X3D3:2Q5X:XMJ6:RF6P:TLQ3:V6P4:CVA2:GKU4:KRCS:NOYE:KN7M
uname -a
Linux ose3-int-node1.figaro.amadeus.net 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.1 (Maipo)

top shows 5 GB of available memory.

Unfortunately this stack has been deleted, so we will have to wait a while before we can reproduce the issue.

It seems this also happens with more recent versions of Docker @sroze @ndelitski
Similar issues are tracked here: #8539

How can we investigate the leak?
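One thing worth checking when allocations fail despite plenty of available memory is fragmentation: on Linux, /proc/buddyinfo lists, per memory zone, how many free blocks exist at each order (a block of order n is 2^n contiguous pages). The Python sketch below parses that format and counts free blocks at or above a given order. The function names are hypothetical and the sample text is invented for illustration; it is not taken from any host in this thread.

```python
# Invented sample in the /proc/buddyinfo layout: one row per zone,
# followed by free-block counts for order 0, 1, 2, ... (left to right).
SAMPLE = """\
Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3
Node 0, zone    DMA32   4210   1530    310     12      0      0      0      0      0      0      0
Node 0, zone   Normal  98312  20110    954     37      0      0      0      0      0      0      0
"""


def parse_buddyinfo(text):
    """Return {(node, zone): [free block count per order]}."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node":
            continue  # skip blank or unexpected lines
        node = parts[1].rstrip(",")
        zone = parts[3]
        zones[(node, zone)] = [int(count) for count in parts[4:]]
    return zones


def free_blocks_at_or_above(counts, order):
    """Number of free blocks large enough to satisfy an allocation of `order`."""
    return sum(counts[order:])


for (node, zone), counts in parse_buddyinfo(SAMPLE).items():
    usable = free_blocks_at_or_above(counts, 4)
    print(f"node {node} zone {zone}: {usable} free blocks of order >= 4")
```

To inspect a real host, pass the contents of /proc/buddyinfo instead of SAMPLE. Zones with many low-order blocks but few or no high-order blocks suggest fragmentation, although this is only an indication, since the kernel can also reclaim and compact memory before an allocation fails.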

ajmenteng commented May 20, 2016

I had the same issue and it was solved after removing all untagged images.

xidianwlc commented Jun 1, 2016

I have the same error in my production environment:
[root@cps-7-104 install]# docker version
Client:
Version: 1.10.3
API version: 1.22
Go version: go1.5.3
Git commit: 20f81dd
Built: Thu Mar 10 21:49:11 2016
OS/Arch: linux/amd64

Server:
Version: 1.10.3
API version: 1.22
Go version: go1.5.3
Git commit: 20f81dd
Built: Thu Mar 10 21:49:11 2016
OS/Arch: linux/amd64

[root@cps-7-104 install]# docker info
Containers: 18
Running: 18
Paused: 0
Stopped: 0
Images: 4
Server Version: 1.10.3
Storage Driver: overlay
Backing Filesystem: xfs
Execution Driver: native-0.2
Logging Driver: json-file
Plugins:
Volume: local
Network: null host bridge
Kernel Version: 3.10.0-229.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 32
Total Memory: 125.7 GiB
Name: cps-7-104
ID: OH62:4S6G:5YN5:3VFF:IANR:SS6B:RDV7:BKLV:I5AZ:GX2H:IRNH:ASIK
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Labels:
group=cps
env=online

[root@cps-7-104 install]# uname -r
3.10.0-229.el7.x86_64

[root@cps-7-104 install]# cat /etc/system-release
CentOS Linux release 7.1.1503 (Core)

Error info in /var/log/messages:
Jun 1 10:02:37 cps-7-104 kernel: docker: page allocation failure: order:4, mode:0x1040d0
Jun 1 10:02:37 cps-7-104 kernel: CPU: 9 PID: 17261 Comm: docker Tainted: G -------------- T 3.10.0-229.el7.x86_64 #1
Jun 1 10:02:37 cps-7-104 kernel: Hardware name: Dell Inc. PowerEdge R620/046R5M, BIOS 2.2.2 01/16/2014
Jun 1 10:02:37 cps-7-104 kernel: 00000000001040d0 000000007180c6a3 ffff881f4961b8e0 ffffffff81604b0a
Jun 1 10:02:37 cps-7-104 kernel: ffff881f4961b970 ffffffff8115c5d0 0000000000000000 ffff88203ffd9000
Jun 1 10:02:37 cps-7-104 kernel: 0000000000000004 00000000001040d0 ffff881f4961b970 000000007180c6a3
Jun 1 10:02:37 cps-7-104 kernel: Call Trace:
Jun 1 10:02:37 cps-7-104 kernel: [] dump_stack+0x19/0x1b
Jun 1 10:02:37 cps-7-104 kernel: [] warn_alloc_failed+0x110/0x180
Jun 1 10:02:37 cps-7-104 kernel: [] __alloc_pages_nodemask+0x9a8/0xb90
Jun 1 10:02:37 cps-7-104 kernel: [] alloc_pages_current+0xa9/0x170
Jun 1 10:02:37 cps-7-104 kernel: [] __get_free_pages+0xe/0x50
Jun 1 10:02:37 cps-7-104 kernel: [] kmalloc_order_trace+0x2e/0xa0
Jun 1 10:02:37 cps-7-104 kernel: [] ? ovl_copy_xattr+0x6a/0x170 [overlay]
Jun 1 10:02:37 cps-7-104 kernel: [] ovl_copy_xattr+0x8b/0x170 [overlay]
Jun 1 10:02:37 cps-7-104 kernel: [] ? ovl_create_real+0x10a/0x250 [overlay]
Jun 1 10:02:37 cps-7-104 kernel: [] ovl_copy_up_one+0x393/0x810 [overlay]
Jun 1 10:02:37 cps-7-104 kernel: [] ovl_copy_up+0xec/0x120 [overlay]
Jun 1 10:02:37 cps-7-104 kernel: [] ovl_create_or_link+0x76/0x510 [overlay]
Jun 1 10:02:37 cps-7-104 kernel: [] ovl_create_object+0x3d/0x60 [overlay]
Jun 1 10:02:37 cps-7-104 kernel: [] ovl_mkdir+0x23/0x30 [overlay]
Jun 1 10:02:37 cps-7-104 kernel: [] vfs_mkdir+0xb7/0x160
Jun 1 10:02:37 cps-7-104 kernel: [] SyS_mkdirat+0x6f/0xe0
Jun 1 10:02:37 cps-7-104 kernel: [] system_call_fastpath+0x16/0x1b
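The first line of this trace says the failed request was order:4. In the kernel's page allocator, an order-n request asks for 2^n physically contiguous pages, so the size of the failed allocation can be worked out directly. A small Python sketch, assuming the 4 KiB page size that is standard on x86_64 (the function name is made up for illustration):

```python
import re

# First line of the kernel log quoted above.
LOG_LINE = ("Jun 1 10:02:37 cps-7-104 kernel: docker: "
            "page allocation failure: order:4, mode:0x1040d0")

PAGE_SIZE = 4096  # bytes; assumption: standard x86_64 page size


def contiguous_request(line, page_size=PAGE_SIZE):
    """Return (order, pages, bytes) for a page allocation failure line, or None."""
    match = re.search(r"page allocation failure: order:(\d+)", line)
    if match is None:
        return None
    order = int(match.group(1))
    pages = 1 << order  # an order-n request needs 2**n contiguous pages
    return order, pages, pages * page_size


order, pages, size = contiguous_request(LOG_LINE)
print(f"order {order}: {pages} contiguous pages = {size // 1024} KiB")
# prints: order 4: 16 contiguous pages = 64 KiB
```

So ovl_copy_xattr was asking for 64 KiB of physically contiguous memory, a request that can fail on a long-running, fragmented host even when free reports gigabytes available.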

Ikelo commented Jun 16, 2017

Try the command below:
sudo sysctl -w vm.max_map_count=26214

Contributor

kolyshkin commented Dec 13, 2017

This is a kernel "bug" which was fixed by the following commit:

commit e4ad29fa0d224d05e08b2858e65f112fd8edd4fe
Author: Vito Caputo <vito.caputo@coreos.com>
Date:   Sat Oct 24 07:19:46 2015 -0500

    ovl: use a minimal buffer in ovl_copy_xattr

This fix made its way into the upstream 4.5 kernel; let me figure out when it was fixed in RHEL.

Contributor

kolyshkin commented Dec 13, 2017

The base kernel for RHEL 7.3 (kernel-3.10.0-514.el7) already includes the above-mentioned patch. It is not included in the base RHEL 7.2 kernel (kernel-3.10.0-327.el7), but it is in the latest 7.2 update (kernel-3.10.0-327.36.3.el7).

So, anyone who is experiencing this and sees ovl_copy_xattr in a kernel call trace caused by an allocation failure needs to update their kernel to upstream 4.5+, RHEL 7.3+, or the latest update of RHEL 7.2. I'm not sure about other distros.
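The version rule above can be turned into a quick check against the output of uname -r. The following Python sketch encodes only the thresholds stated in this comment (upstream 4.5, and RHEL 7 build 3.10.0-327.36.3.el7 or later); the function name is hypothetical, later RHEL 7 builds are assumed to keep the patch, and any other distro kernel older than 4.5 is reported as "not known to have the fix" because backports there are unknown.

```python
import re

UPSTREAM_FIRST_FIXED = (4, 5)           # upstream kernel version with the fix
RHEL7_FIRST_FIXED_BUILD = (327, 36, 3)  # from kernel-3.10.0-327.36.3.el7


def known_to_have_fix(release):
    """Return True if `release` (as printed by `uname -r`) is known,
    per the comment above, to include the ovl_copy_xattr fix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-([\d.]+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= UPSTREAM_FIRST_FIXED:
        return True
    if ".el7" in release and match.group(4):
        # "229." or "327.36.3." -> (229,) or (327, 36, 3)
        build = tuple(int(p) for p in match.group(4).split(".") if p)
        return build >= RHEL7_FIRST_FIXED_BUILD
    return False  # older kernel from another distro: backport status unknown


# Kernel releases reported earlier in this thread.
for release in ("4.1.12-101.fc21.x86_64",
                "4.2.6-301.fc23.x86_64",
                "3.10.0-229.el7.x86_64"):
    print(release, "->", known_to_have_fix(release))
```

All three kernels reported earlier in this thread fall below the thresholds, which is consistent with the failures described.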

I believe the bug can be closed.

Member

thaJeztah commented Dec 28, 2017

closing, per the above comment

thaJeztah closed this Dec 28, 2017

ysjjovo commented Sep 11, 2018

I still get this problem with kernel 3.10.0-693.11.6.el7.x86_64 and Docker version 17.03.2-ce.
