potential memory leak in docker 1.12.6 #257
We should have this fix in docker-1.12.6-32.git88a4867.el7, which is going to ship in RHEL 7.3.6 in a couple of weeks and will show up in CentOS soon after.
Would I be able to build now against that commit and test it?
Looks like the new version is out for CentOS... I have installed it and will see what it looks like over the next couple of weeks...
Well, my initial reaction was that the new release looked good... Unfortunately, I am now seeing the docker daemon randomly crashing in all environments with nothing logged. Sometimes docker will start right back up with a `systemctl restart docker`, and sometimes it takes several iterations for it to restart. This is happening in both OpenStack and non-OpenStack environments... Here is an example `systemctl status docker`:
and an example of trying to do a build on a non-OpenShift environment where dockerd died in the middle of the build:
I saw something similar with the previous version (1.12.6-28) and had to move back to 1.12.6-16... I'm not sure how to further debug this issue...
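A possible starting point for further debugging (a sketch; assumes the RHEL/CentOS packaging, which reads daemon flags from an OPTIONS line in /etc/sysconfig/docker):

```
# Enable daemon debug logging (assumption: RHEL/CentOS-style
# /etc/sysconfig/docker with an OPTIONS='...' line).
sed -i "s|^OPTIONS='|OPTIONS='--debug |" /etc/sysconfig/docker
systemctl restart docker

# After the next crash, capture the surrounding daemon logs:
journalctl -u docker --since "1 hour ago" > docker-crash.log
```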
I do see this regarding oci-register-machine:
I reverted to 1.12.6-16, which completes the build without issue.
Ping @rhatdan about oci-register-machine.
I'm not seeing that particular message on my other servers, though... May be a red herring...
So here are the log entries after a clean install and start of the latest CentOS docker package. Within three seconds the docker process terminates. This seems to occur right after NetworkManager does some stuff. Is it possible that NetworkManager is causing dockerd to fail?:
Very likely actually :/
Well, disabling NetworkManager didn't seem to help:
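(For the record, a sketch of one way the disable test above could be done; commands assume systemd:)

```
# Stop and disable NetworkManager, then restart docker and watch
# whether the daemon still dies shortly after startup.
systemctl stop NetworkManager
systemctl disable NetworkManager
systemctl restart docker
journalctl -u docker -f
```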
Mmm, seems likely a container being terminated is tearing down the whole Docker daemon?
This is a fresh install with /var/lib/docker removed. So there shouldn't be any containers defined or running.
This does not occur at all with 1.12.6-16...
I'm going to open a separate issue for this crash problem...
After testing with the new version of the CentOS RPMs for docker 1.12.6-32, I do not believe the memory leak issue has been resolved. dockerd is still growing without bound, requiring a restart of the docker service... For example,
what it looks like after a restart:
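(To quantify that growth, a minimal sampling loop; a sketch assuming standard procps tools:)

```
# Sample dockerd resident memory (RSS, in KB) every 5 minutes;
# unbounded growth between samples is the leak described above.
while true; do
  date
  ps -o rss=,etime= -p "$(pidof dockerd)"
  sleep 300
done
```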
Is there any additional info we can provide to help solve this problem? I'm seeing the exact same behavior as @fortinj66.
Looking at
This is the goroutine list:
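(For reference, a goroutine dump like this can typically be pulled from the daemon's pprof endpoint; a sketch assuming dockerd runs with --debug, the default socket path, and curl 7.40+ for --unix-socket support:)

```
# Full goroutine stack dump from a debug-enabled dockerd.
curl -s --unix-socket /var/run/docker.sock \
  'http://localhost/debug/pprof/goroutine?debug=2' > goroutines.txt
```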
Seems like an IO issue; I will look into this.
@runcom Thanks! I'm not sure if this is a red herring, but I noticed that the Nessus scans we're required to run cause a lot of
Debug log during exec failure:
It could very likely be an IO leak from your logs; I'm looking into this right now.
I am seeing the same: `[root@os-node-d04 log]# journalctl -u docker | grep -c 'attach: stdout: begin'`
After further testing, it looks like the streams are closed properly when running:
but the streams are not closed properly when running:
Are you seeing an uptick in the docker goroutines when you run without the -i (the bad method)?
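(A quick way to watch that count, under the same debug-enabled-daemon assumption as the dump above:)

```
# Print only the total from the pprof goroutine summary; run it before
# and after a batch of execs to spot an uptick.
curl -s --unix-socket /var/run/docker.sock \
  'http://localhost/debug/pprof/goroutine?debug=1' | head -n 1
```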
Ah yep, I didn't catch that before. 4 goroutines are being created with or without `-i`.
EDIT 2: I can't reproduce the issue with upstream 17.05. I'll test upstream 1.12.6 now. If the issue happens there, I'll do a git bisect to find out which commit fixed the problem.
@fortinj66 @runcom I found the commit that fixed the leak in upstream docker after doing a git bisect. It looks like this is the fix: moby#30311. Could we get that backported to 1.12.6?
Yeah, thanks for spotting that. I'll try to backport it here and comment once done.
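(A minimal sketch of such a backport; <fix-commit-sha> is a placeholder for the actual commit from moby#30311, which isn't quoted in this thread:)

```
# Hypothetical backport workflow against the projectatomic
# docker-1.12.6 branch; <fix-commit-sha> is a placeholder.
git checkout -b backport-30311 docker-1.12.6
git cherry-pick -x <fix-commit-sha>
```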
That is great news!
I just built new RPMs, pushed them out to one of our dev kubernetes clusters, and things are looking good! I will let it run overnight and see how it goes.
Ref: #257
Signed-off-by: Dmitry Shyshkin <dmitry@shyshkin.org.ua>
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
Cherry-picked the upstream fix to docker-1.12.6, docker-1.13.1, and docker-1.13.1-rhel - will keep this open until someone can confirm it really fixes this issue :) Thanks a lot guys /cc @jwhonce
I'll give this a build tomorrow morning and give it a try... Actually, I built them tonight and deployed to my dev OpenShift 1.5 cluster. So far it looks good, but I'll know better tomorrow.
Great work, everyone, on finding this.
I can happily report that docker memory usage so far looks very good. Where normally I would see dockerd at 4-5 GB, it is now steady at just under 200 MB. Huge difference! The goroutine count is under 200, where before it was over 150,000. I believe that this issue has been resolved!! I have pushed this to a busier set of servers and will monitor during the day. @agunnerson-ibm Awesome job tracking this down!!
Thanks! I can also confirm that it's fixed on our cluster! Only ~50 goroutines on each node in an idle kubernetes cluster after the Nessus scans run. It used to go above 200,000. Memory usage has been below 150 MB too.
I'll keep this open until the official CentOS 7 RPMs are available and test again with those. --John
Agreed. We pushed it out to all of our kubernetes clusters and haven't run into this issue again.
Awesome, let's keep this open until we build RPMs.
@runcom If I've checked the correct branch and commits, this fix is now available in docker-1.12.6-61.git85d7426.el7.centos.x86_64.rpm. Has anyone used this on 7.3 yet?
@gogeof this is a random issue and I believe it has nothing to do with the current issue. You should open a different issue.
@rhatdan Yeah, OK, I will open a different issue if the problem shows up again in a new docker version. Thanks!
Hi everyone, I am facing the issue "dockerd leaks ExecIds on failed exec -i". Whenever I run a wrong command with `docker exec -ti <wrong command>`, the ExecID leakage happens. I am using docker version 1.12.6 and I am able to reproduce this issue.
Steps to reproduce:
OS: CoreOS v1632.3.0
$ docker info
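(A minimal repro sketch of the above; the container name c1 is illustrative, and -i alone is used so the loop runs without a TTY:)

```
# Start a throwaway container, fail several execs, then count the exec
# IDs dockerd still tracks; on a leaky daemon the count keeps growing.
docker run -d --name c1 busybox sleep 3600
for i in $(seq 1 10); do
  docker exec -i c1 /no/such/command || true
done
docker inspect --format '{{len .ExecIDs}}' c1
```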
We are not updating docker-1.12 any longer; does this issue still exist in docker-1.13?
I understand that, but the previous comments suggest that this issue is fixed in docker 1.12.6.
Moving this issue from moby#33472, as they indicated that 1.12.6 is no longer supported and there are significant changes between vanilla moby and the projectatomic fork.
Let me step back and describe the initial issue...
Attached is the heap information from earlier today:
heap.txt
BUG REPORT INFORMATION
Output of `docker version`:

Output of `docker info`:

Additional environment details (AWS, VirtualBox, physical, etc.):
OS: CentOS 7
OpenShift Origin 3.4
Steps to reproduce the issue:
Describe the results you received:
dockerd memory usage increased over time, leading to 1-2 docker restarts per week. Shutting down containers does not release the memory.
Describe the results you expected:
dockerd memory usage should not continually increase over time...
Additional information you deem important (e.g. issue happens only occasionally):