growing reported docker container virtual memory size with java processes #15020

Open
Andrei-Pozolotin opened this Issue Jul 27, 2015 · 45 comments

Andrei-Pozolotin commented Jul 27, 2015

  1. in multiple different containers running java, top/htop report VIRT (virtual) memory
    an order of magnitude greater than the RES (resident) process memory

  2. processes seem to run fine for several days (did not test longer), while VIRT keeps going up and RES stays fixed (as configured by the java startup parameters that set process memory limits)

  3. each GC (garbage collection) request to the JVM increases the reported VIRT value

  4. java 8 GC type is "G1"

  5. the same applications with the same o/s, same settings, etc. do not exhibit such behavior when run on the host

example: total physical ram 16G, java process resident ram 1.6G, reported virtual size 80G and growing

top/htop report: [screenshot]

docker stats report for the same processes: [screenshot]

uname -a
Linux wks002 3.13.0-58-generic #97-Ubuntu SMP Wed Jul 8 02:56:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
docker info
Containers: 3
Images: 248
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 254
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-58-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 8
Total Memory: 15.62 GiB
Name: wks002
ID: 65Y6:OAEC:KGES:X3NV:LMEH:CL5X:6XLV:WKZ3:52CB:I6AJ:J4Y4:HLM3
WARNING: No swap limit support
docker --version
Docker version 1.7.1, build 786b29d

series of images:
https://registry.hub.docker.com/u/carrotgarden/zinc/dockerfile/
https://registry.hub.docker.com/u/carrotgarden/java-service/dockerfile/
https://registry.hub.docker.com/u/carrotgarden/java-build/dockerfile/
https://registry.hub.docker.com/u/carrotgarden/java-8/dockerfile/
https://registry.hub.docker.com/u/carrotgarden/ubuntu/dockerfile/

typical container invocation command:

# container
NAME=$(basename $DIR)

# docker image
IMAGE="carrotgarden/$NAME"

docker run \
    --rm \
    --net=host \
    --pid=host \
    --env HOME \
    --env DISPLAY \
    --env TERM=xterm \
    --env NO_AT_BRIDGE=1 \
    --user $USER \
    --device=/dev/dri \
    --device=/dev/snd \
    --device=/dev/video0 \
    --volume=/etc/group:/etc/group:ro \
    --volume=/etc/passwd:/etc/passwd:ro \
    --volume=/home:/home \
    --volume=/tmp:/tmp \
    --name=$NAME \
    $IMAGE &

typical java execution parameters:
https://registry.hub.docker.com/u/carrotgarden/zinc/dockerfile/

ENV ZINC_COMMAND=" \
    java \
    -server \
    -XX:+UseG1GC \
    -XX:+DoEscapeAnalysis \
    -XX:+UseCompressedOops \
    -XX:+UseCompressedClassPointers \
    -XX:+HeapDumpOnOutOfMemoryError \
    -XX:InitialHeapSize=$JAVA_HEAP \
    -XX:MaxHeapSize=$JAVA_HEAP \
    -XX:ThreadStackSize=$JAVA_STACK \
    -XX:MetaspaceSize=$JAVA_META \
    -XX:MaxMetaspaceSize=$JAVA_META \
    -XX:InitialCodeCacheSize=$JAVA_CODE \
    -XX:ReservedCodeCacheSize=$JAVA_CODE \
    -Dzinc.home=$ZINC_FOLDER \
    -classpath $ZINC_FOLDER/lib/*:. \
    com.typesafe.zinc.Nailgun \
    $ZINC_PORT $ZINC_TIMEOUT \
"
joerg84 commented Aug 15, 2015

I could reproduce something similar by starting a lot of simple containers.
Virtual memory goes up to around 30GB and doesn't go down even after 'docker kill' and 'docker rm'.
Same behavior with aufs.

Script
seq 1 1000 | xargs -iX -n1 -P 1000 docker run --net=host -d thomasr/sleep sleep 10000

ubuntu@ip-172-31-14-77:~$ sudo docker info
Containers: 0
Images: 5
Storage Driver: overlay
Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.19.0-25-generic
Operating System: Ubuntu 15.04
CPUs: 1
Total Memory: 3.66 GiB
Name: ip-172-31-14-77
ID: MQJQ:VLAC:BMZ7:ULF4:3WSF:U2TL:ENEN:HHZ6:GN3E:JCO5:DIAR:XIR4
WARNING: No swap limit support
dudehook commented Sep 9, 2015

Seeing something similar. I'm running a simple Scala (JVM) application which loads a lot of data into and out of memory. I set the JVM to 8G heap (-Xmx8G). I have a machine with 132G memory, and it can't handle more than 7-8 containers because they grow well past the 8G limit I imposed on the JVM.

docker stats shows that each container is using much more memory than the JVM is supposed to be using. For instance:

CONTAINER CPU % MEM USAGE/LIMIT MEM % NET I/O
dave-1 3.55% 10.61 GB/135.3 GB 7.85% 7.132 MB/959.9 MB
perf-1 3.63% 16.51 GB/135.3 GB 12.21% 30.71 MB/5.115 GB

It almost seems that the JVM is asking the OS for memory, which is allocated within the container, and the JVM is freeing memory as its GC runs, but the container doesn't release the memory back to the main OS. So... memory leak.
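
A quick way to compare the container-level and JVM-level views; a rough sketch where dave-1 is the container above and <pid> is the JVM's PID inside it:

docker stats --no-stream dave-1               # cgroup view of the container
docker exec dave-1 jstat -gc <pid> 1000 3     # heap capacity and usage as the JVM sees it
# if docker stats keeps climbing while jstat shows a stable heap, the growth
# is in native allocations (malloc arenas, metaspace, thread stacks, direct
# buffers) rather than in the Java heap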

kixiro commented Oct 22, 2015

[htop screenshot]

# docker --version
Docker version 1.6.2, build 7c8fca2
# docker info
Containers: 13
Images: 164
Storage Driver: devicemapper
 Pool Name: docker-254:1-5554455-pool
 Pool Blocksize: 65.54 kB
 Backing Filesystem: extfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 4.44 GB
 Data Space Total: 107.4 GB
 Data Space Available: 17.7 GB
 Metadata Space Used: 9.343 MB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.138 GB
 Udev Sync Supported: true
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.90 (2014-09-01)
Execution Driver: native-0.2
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
CPUs: 8
Total Memory: 7.817 GiB
Name: stress
ID: QGL6:M4OM:F4GF:LPTC:X5Q2:GHUO:76TV:HDOS:IZZO:BFSW:47Z5:36DH
# htop --version
htop 1.0.3 - (C) 2004-2012 Hisham Muhammad
Released under the GNU GPL.
# free
             total       used       free     shared    buffers     cached
Mem:       8197204    7527808     669396       3588     363656    2642860
-/+ buffers/cache:    4521292    3675912
Swap:      4290556      18676    4271880
stevenschlansker commented Oct 23, 2015

Us too. What ran just fine in 1.7.1 will eventually trigger a container OOM situation in 1.8.3.

Client:
 Version:      1.8.3
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   f4bf5c7
 Built:        Mon Oct 12 05:37:18 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.3
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   f4bf5c7
 Built:        Mon Oct 12 05:37:18 UTC 2015
 OS/Arch:      linux/amd64

Containers: 20
Images: 253
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.0.9
Operating System: Ubuntu 14.04.2 LTS
CPUs: 8
Total Memory: 29.45 GiB
Name: mesos-slave5-prod-euwest1
ID: AFFX:JFYM:FH3D:FJYN:RQLH:KBZB:PPIC:BYYL:JC44:ENYJ:VPPS:73JK
lnowak commented Jan 29, 2016

I am seeing the same problem. I have moved from 1.6.2 to 1.9.1 and am observing. Hope this helps.

Containers: 2
Images: 151
Server Version: 1.9.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 155
Dirperm1 Supported: true
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.16.0-4-amd64
Operating System: Debian GNU/Linux 8 (jessie)
CPUs: 1
Total Memory: 999 MiB
Name: ip-10-0-1-10
ID: EOJA:5WDB:TANH:M5R2:WPSM:66SA:C7XP:2GOX:2LWZ:7GQE:QM6O:NX4L
WARNING: No memory limit support
WARNING: No swap limit support

mjaverto commented Jan 29, 2016

I can also confirm this on Docker version 1.9.1, build a34a1d5/1.9.1

Using JVMTop inside the container shows the reasonable / expected heap size for us, yet just plain top shows almost 3x as much:

[screenshot of JVMTop and top output]

Definitely appears to be a problem with longer running containers.

Containers: 7
Images: 45
Server Version: 1.9.1
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3 kB
Base Device Size: 107.4 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 2.423 GB
Data Space Total: 9.437 GB
Data Space Available: 7.014 GB
Metadata Space Used: 1.225 MB
Metadata Space Total: 25.17 MB
Metadata Space Available: 23.94 MB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.13-19.31.amzn1.x86_64
Operating System: Amazon Linux AMI 2015.09
CPUs: 2
Total Memory: 7.8 GiB

leafknode commented Feb 10, 2016

Would be nice to get an official response on this; I'm experiencing the same issues as all of you. Docker 1.10 just came out; I will give it a whirl to see if that addresses anything.

dgaviola commented Feb 10, 2016

We are having the same issue with Docker 1.6.2 and JDK 1.8.0_72.

thaJeztah added the kind/bug label Feb 13, 2016

khuongquixey commented Mar 9, 2016

+1 Would love to see this addressed.

We're seeing this as well. Our fix is resorting to daily reboots for our containers (Docker 1.10).

thaJeztah (Member) commented Mar 12, 2016

@unclejack any ideas?

rafamonteiro commented Mar 31, 2016

@thaJeztah and @unclejack
I can confirm having a similar problem. If the container isn't restarted after a certain period, it ends up consuming all of the machine's memory.
The application stack consists of:

  • Amazon EC2 instance c3.4xlarge
  • Oracle JVM 1.7.0_80 x64, with Xms and Xmx set to 20GB, using G1GC
  • CoreOS stable (899.13.0)
  • Docker 1.9.1
  • Solr 4.10.2 running inside the container

The following is the top output with the container running for approximately 24 hours:
[screenshots of top output]

And this is after using JMX to force a full GC in the JVM:
[screenshots of top output after the forced full GC]

And here is the JVM GC log output:
solr_gc.log.0.txt

leafknode commented Apr 19, 2016

Really annoying that this issue has been around for so long without any official reply or acknowledgement from the devs.

cpuguy83 (Contributor) commented Apr 19, 2016

It would be interesting to see the output of pmap.
Also, a quick search on Java virtual memory usage points to tweaking MALLOC_ARENA_MAX; this seems to be an issue with glibc >= 2.10.

This is not likely to be a Docker issue but rather a misconfiguration, perhaps even something set up in "normal" environments (like in an init script) that is missing in the container.
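
A rough sketch of that pmap inspection, with <pid> as a placeholder; glibc's per-thread arenas typically show up as repeated anonymous mappings of roughly 64MB:

pmap -x <pid> | awk '$2 > 60000 && $2 < 70000' | wc -l   # count the ~64MB regions
pmap -x <pid> | sort -n -k2 | tail -20                   # largest mappings last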

Jimexist commented May 6, 2016

We hit the same issue using java:jre-alpine while setting

# omitted
java \
    -server \
    -Xms128m \
    -Xmx4096m \
    -XX:+UnlockExperimentalVMOptions \
    -XX:+UseG1GC \
# omitted

and the gc details are:
gc.txt

edit:
issues are the same with java:latest

edit2:
export MALLOC_ARENA_MAX=4 seems to solve the issue for now

cfontes commented May 24, 2016

Experiencing the same issue, but in production.

Machines with 32GB of RAM are reporting 98% usage, but my 12 containers should be using only 16GB based on the JVM allocation.

Any updates on this?

allen13 commented Jun 9, 2016

I tried running Cassandra for Hawkular metrics collection on a huge 128GB bare-metal machine using Kubernetes. I spent quite a bit of time tweaking the JVM to no avail: Cassandra would quickly consume a lot of memory and then crash, sometimes within hours. I am also running Jenkins; it doesn't get any use, but the JVM still eats memory until it crashes. It really looks like Java has problems across the board when run in a container environment.

Docker 1.9.1
Java 1.8
Centos 7
Kubernetes 1.3

jmkgreen commented Jun 10, 2016

Curious: top shows this memory ballooning, csysdig does not...

gmcoringa commented Jun 10, 2016

As @Jimexist said, using MALLOC_ARENA_MAX=4 solves the problem.

More details about it: https://siddhesh.in/posts/malloc-per-thread-arenas-in-glibc.html

allen13 commented Jun 13, 2016

MALLOC_ARENA_MAX=4 seems to be working for me. Is there any way we can get this fixed at a higher level so future developers don't fall into the same rabbit hole?

jmkgreen commented Jun 14, 2016

I suggest this at least be documented. Hard-coding a workaround may confuse anyone not expecting that setting to be applied.

aba182 commented Jun 28, 2016

Why is 4 the suggested MALLOC_ARENA_MAX value? The default for 64-bit machines is 8 * number of cores. The machines I am working with would have a default MALLOC_ARENA_MAX of 384 (assuming that by cores it counts total hardware threads; even if not, it would be 192 arenas). 4 is a very drastic leap from either of these numbers, and from what I've read, more pools allow for improved memory allocation in thread-heavy processes.

Has anyone tried other values yet? I will be experimenting on some machines on my end, but I'm curious if others have already done so.
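
One rough way to experiment, assuming the java process is PID 1 in the container and my-java-image stands in for your own image:

for arenas in 1 2 4 8 16; do
    docker run -d -e MALLOC_ARENA_MAX=$arenas --name arena-test-$arenas my-java-image
    sleep 600   # let the workload warm up
    docker exec arena-test-$arenas grep -E 'VmSize|VmRSS' /proc/1/status
    docker rm -f arena-test-$arenas
done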

johanhaleby commented Jun 30, 2016

Do you specify the MALLOC_ARENA_MAX environment variable in the container or on the Docker host? If so, do you need to restart the containers/servers afterwards for the change to take effect?

I've tried setting it in the container without any luck.

sunnysinghgithub commented Jul 21, 2016

We are also facing a similar issue. I set the max heap size to 512 - 200 = 312M and the docker container memory limit to 512M, just to give a buffer. But even then, the container crashes when docker memory hits the limit after some time. I also looked at the JVM metrics and found that heap usage never grew beyond 120M. So in this case, looking at docker stats, docker memory started at 312M and then kept growing until it hit the 512M limit.
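
For comparison, a hedged sizing sketch (numbers illustrative; my-java-image and app.jar are placeholders): the container limit has to cover the heap plus metaspace, thread stacks, code cache and native allocations, so it should sit above -Xmx rather than below it:

docker run -d \
    --memory=512m \
    -e MALLOC_ARENA_MAX=4 \
    my-java-image \
    java -Xms312m -Xmx312m -XX:MaxMetaspaceSize=64m -Xss512k -jar app.jar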

andjarnic commented Sep 8, 2016

Could use a reply that explains how to use the MALLOC_ARENA_MAX=4 option. I just tried it using .bashrc on my docker host, exported it, confirmed it is exported, and it still did not work for me. I start a JVM in docker that just sits idle and it crashes. I have a 1GB DigitalOcean droplet and set the JVM to 768M max; I know it's running close to the memory limit, but when started it only uses 168MB of memory. I came back a couple of hours later and it had crashed due to OOM, with no requests made to my service.

So any help on exactly how/where to put the MALLOC_ARENA_MAX export would be appreciated. I figured the docker host is where it would need to be, but that doesn't seem to work for me. I am using Docker 1.10.3, Ubuntu 16.04 (and 14.04), Java 8.

The other issue is: why can't the memory be reclaimed? I have to reboot the docker host to get anything working again. Is it a memory leak in Docker?

jmkgreen commented Sep 8, 2016

@andjarnic you need to set the environment variable as the container launches, either as a -e parameter to docker run, or an environment entry in a compose file.
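
For example (my-java-image is a placeholder); note that glibc reads MALLOC_ARENA_MAX when the process starts, so it has to be present in the container's environment at launch:

# on the command line:
docker run -d -e MALLOC_ARENA_MAX=4 my-java-image

# or baked into the image's Dockerfile:
#   ENV MALLOC_ARENA_MAX=4

# or per service in docker-compose.yml:
#   environment:
#     - MALLOC_ARENA_MAX=4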

andjarnic commented Sep 8, 2016

So just to be clear, I set up a systemd configuration to start docker on boot, the start looks like:

ExecStart=/usr/bin/docker daemon -H fd://

Do I add the -e option here? Or is it something I add only to the docker run call to the Java docker container?

jmkgreen commented Sep 8, 2016

@andjarnic environment variables have nothing to do with the daemon. Perhaps following some of the many tutorials would be of benefit. After launching experimental containers, exec into them and see what exists using env.

andjarnic commented Sep 8, 2016

@jmkgreen What I was asking is whether the docker host needs the env variable exposed for docker-machine to use, or whether the individual docker container needs it when started. I thought it would just be the docker host env setting.

Needless to say, I tried both approaches and it doesn't work for me; I'm still getting OOM crashes. This is really too bad, as I had high hopes for using Docker in production. With my containers running out of memory and crashing within minutes even while sitting idle, I wonder how anyone has been able to use Java with Docker. I am now trying to see if I can roll back to an earlier version of docker... thinking 1.6.0 or something may not have this issue. Hopefully the devs know about it and fix it soon.

cpuguy83 (Contributor) commented Sep 9, 2016

@andjarnic This is almost certainly an environmental issue and not related to a particular version of Docker.

cpuguy83 (Contributor) commented Sep 9, 2016

That said, I've seen Java (in particular) do some seriously strange things when the backing FS is AUFS.

jmkgreen commented Sep 19, 2016

@andjarnic verify the environment variable is set within the running container. I've just run docker exec -it my_container bash and within that environment run env and can see:

...
MALLOC_ARENA_MAX=4
...

If I docker inspect my_container I can see "Memory": 524288000 in the output which matches my mem_limit: 500m instruction within my docker-compose.yaml file for this container.

Separately, and again documented elsewhere, my JVM launch includes memory management configuration:

root@9eead1132c4f:/# ps ax
  PID TTY      STAT   TIME COMMAND
    1 ?        Ssl   35:00 java -Xms128m -Xmx256m -jar my-program.jar
 3014 ?        Ss     0:00 bash
 3021 ?        R+     0:00 ps ax

If you get this far and concur that all are reporting as you expect, you may wish to attach a JVM monitor to watch what your application is actually doing, a procedure that far exceeds the remit of this GitHub issue. Hope this helps!

Guilhermeslucas commented Jan 3, 2017

I'm having the same issue running Jenkins in a Docker container. Does someone know how to fix it? Thanks!

cpuguy83 (Contributor) commented Jan 4, 2017

These articles may be interesting to some:

http://trustmeiamadeveloper.com/2016/03/18/where-is-my-memory-java/
http://matthewkwilliams.com/index.php/2016/03/17/docker-cgroups-memory-constraints-and-java-cautionary-tale/

I don't think they relate to the original issue, but they may help others reporting various other "Java is using more memory than I expect" type issues.
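
The short version of those articles, as a sketch (values illustrative; my-java-image and app.jar are placeholders): cap each JVM memory region explicitly so the total stays below the cgroup limit, since the JVM's defaults are derived from the host's memory rather than the container's:

docker run -d --memory=2g my-java-image \
    java -Xmx1g -XX:MaxMetaspaceSize=256m -XX:ReservedCodeCacheSize=128m -Xss512k -jar app.jar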

Guilhermeslucas commented Jan 6, 2017

I did what these articles mentioned, but the JVM is still breaking my container. What arguments are you setting for the JVM and the Docker container?

Best regards,

megastef commented Jan 13, 2017

Brismadsen commented Feb 20, 2017

Did anybody succeed in finding a solution for this? (My memory is gone so fast...)

vimal3271 commented Mar 10, 2017

Any solution so far?

osigida commented Mar 10, 2017

@vimal3271, it is a JVM issue, so it should be fixed there.
You can try Java 9 and test if it's working fine: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/5f1d1df0ea49

jeff303 commented Mar 29, 2017

@osigida, I believe that is a different issue. What you have linked to addresses the fact that the default heap size is calculated incorrectly because the JVM cannot handle cgroup limits. If one sets the max heap explicitly, that issue is no longer relevant. What's being discussed here is a slow increase in the resident set size of the Java process, which is happening outside of the heap.

osigida commented Mar 29, 2017

@jeff303 I'm well aware of what we are discussing here, and I observe the problem almost every day ;-).
However, it is not very clear from the PR whether you are right or not. I need to look at how phys_mem is used, but I still hope the JVM developers know what they are doing.

loretoparisi commented Apr 21, 2017

@Jimexist Did you set the MALLOC_ARENA_MAX=4 within the Dockerfile like:

ENV MALLOC_ARENA_MAX=4

I'm getting this issue on latest Docker and Ubuntu 16.04 LTS.

kopax commented Apr 28, 2017

I have the same issue with the official Jenkins docker image on latest Docker and Ubuntu 16.04 LTS.

Does someone know how I can fix this?

TASK_FAILED (Docker container run error: Container exited on error: exited with status 137
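
Exit status 137 is 128 + 9 (SIGKILL), which usually means the kernel OOM killer ended the process when the container hit its memory limit. A quick check, with jenkins as a placeholder container name:

docker inspect -f '{{.State.OOMKilled}} {{.State.ExitCode}}' jenkins
dmesg | grep -i 'killed process'   # the corresponding kernel log entry on the host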
turesheim commented Dec 27, 2017

I was troubleshooting an issue we have with too much virtual memory being claimed by a process on our 40-core-per-node HPC cluster, and used Docker to do various experiments. It appears that Docker somehow does not respect MALLOC_ARENA_MAX and regardless allows the amount of virtual memory to grow to a number correlating with the number of CPUs allocated to it, typically 1GB per two cores, which matches what I see in this article: "Linux glibc >= 2.10 (RHEL 6) malloc may show excessive virtual memory usage". The Docker container is based on CentOS 7.
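
One thing worth ruling out, since glibc only reads the variable at process start: check that it is actually present in the environment of the running java process (<pid> is a placeholder):

tr '\0' '\n' < /proc/<pid>/environ | grep MALLOC_ARENA_MAX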

thaJeztah (Member) commented Dec 28, 2017

@turesheim docker itself does nothing with that environment variable (or memory management of the processes inside the container); it sets up namespaces and cgroups for the process, which is all part of the kernel.

You mention the container uses CentOS 7; what is running on the host? (What distro and kernel?)

turesheim commented Dec 28, 2017

The host is running macOS 10.13.2. I did some more tests on both Docker and the cluster, and I now believe that neither Docker nor the HPC OS (CentOS 6 + some amendments) is to blame, but rather the JVM. I found some possible evidence that this has been remedied in Java 9, or possibly in a late version of Java 8, but I will have to investigate further to make sure. Thank you for the swift response.
