cadvisor prevents docker from removing monitored containers? #771

Open
cornelius-keller opened this Issue Jun 12, 2015 · 97 comments

@cornelius-keller

Hi all, I have a problem using cadvisor on CentOS 7. When cadvisor is running, docker fails to remove other containers, saying that the container's filesystem is busy. After cadvisor is stopped, container removal works again.

I demonstrated that in this gist: https://gist.github.com/cornelius-keller/0fd2d23b68ccd88c9328

I also included the OS version and docker info in the gist.

rjnagal Jun 12, 2015

Collaborator

Thanks for reporting, @cornelius-keller

What cadvisor version are you running? Can you get host:port/validate for cadvisor?
Is this a temporary situation, or does the container fs stay busy till you delete cadvisor?

cornelius-keller Jun 12, 2015

@rjnagal
Cadvisor version is:

[root@583274-app35 ~]# docker images
REPOSITORY                                      TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
docker.io/google/cadvisor                       latest              399ae3c46a0e        47 hours ago        19.89 MB
[root@583274-app35 ~]# 

This is a permanent situation. The container fs stays busy until I delete cadvisor.

What do you mean by getting host:port/validate for cadvisor? Cadvisor was still running and responsive on the web UI, if that is what you mean. Unfortunately I can't give you any public host:port to validate, as cadvisor is only exposed via a VPN.

rjnagal Jun 12, 2015

Collaborator

Yeah, I just need the output from the /validate endpoint on the cadvisor UI. You can
scrub any data that's private in there. Thanks
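
For anyone else asked for the same information, a minimal sketch of fetching that output from the command line. It assumes cadvisor is reachable on localhost port 8080 (the port published in the README's docker run example; adjust both for your setup). `fetch_validate` is a hypothetical helper name, not part of cadvisor.

```shell
#!/bin/sh
# Hypothetical helper: print cadvisor's /validate report.
# $1: host (assumed default: localhost)
# $2: port (assumed default: 8080, as published in the README example)
fetch_validate() {
    host="${1:-localhost}"
    port="${2:-8080}"
    # -f: fail on HTTP errors, -sS: quiet, but still show errors
    curl -fsS "http://${host}:${port}/validate"
}
```

Calling `fetch_validate > validate.txt` would save the report to a file, so private data can be scrubbed before posting it.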

cornelius-keller Jun 12, 2015

Sorry, it was a long day; I did not get that this was an endpoint. I added the output to the gist.

gianlucaborello Jun 23, 2015

I am facing this same issue. Essentially, running cadvisor with --volume=/:/rootfs:ro causes other containers' devicemapper mounts to be mounted inside the cadvisor container, so they can't be properly destroyed when issuing docker rm on the target container as they will appear in use.

How can this be solved?

hoeghh Jul 10, 2015

When I run it on Fedora 21, it works fine. But when I run it on Ubuntu 14.04.2 LTS, I get the same error as described above.

Error response from daemon: Cannot destroy container xxx_jenkinsMaster_1230: Driver aufs failed to remove root filesystem 13b421d0458e740e42e5fa5ac1cb68f32638f0bc723d9ba16718955214d79b7d: rename /var/lib/docker/aufs/mnt/13b421d0458e740e42e5fa5ac1cb68f32638f0bc723d9ba16718955214d79b7d /var/lib/docker/aufs/mnt/13b421d0458e740e42e5fa5ac1cb68f32638f0bc723d9ba16718955214d79b7d-removing: device or resource busy

The main difference is that Ubuntu uses AUFS, whereas Fedora uses devicemapper. Maybe that's the problem.

shredder12 Aug 28, 2015

@rjnagal I can confirm that this issue happens on Ubuntu trusty x64 with Docker 1.8.1, cadvisor:latest and devicemapper.

'1cb6051b30a1' being the container ID.

# grep -l 1cb6051b30a1 /proc/*/mountinfo
/proc/1963/mountinfo
# ps aux | grep -i 1963
root      1963  1.9  0.8 588740 71688 ?        Ssl  Aug26  30:08 /usr/bin/cadvisor
root     14767  0.0  0.0  11744   952 pts/0    S+   00:56   0:00 grep --color=auto -i 1963

Please suggest a workaround for this.
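
The check above can be wrapped in a small function that prints only the PIDs. This is a sketch under assumptions: `mount_holders` is a hypothetical name, and the optional second argument (an alternative proc root) exists only to make the function easy to try against sample data.

```shell
#!/bin/sh
# Hypothetical helper: list PIDs whose mount table still references a container.
# $1: container ID, or a prefix of it
# $2: proc root to scan (assumed default: /proc)
mount_holders() {
    id="$1"
    proc_root="${2:-/proc}"
    for f in "$proc_root"/[0-9]*/mountinfo; do
        # Skip processes that exited meanwhile or that we may not read
        [ -r "$f" ] || continue
        if grep -q -- "$id" "$f" 2>/dev/null; then
            pid="${f%/mountinfo}"   # strip the trailing /mountinfo
            echo "${pid##*/}"       # keep only the PID path component
        fi
    done
}
```

Running `mount_holders 1cb6051b30a1` as root on the host above would be expected to print 1963, which `ps -p 1963` then resolves to the cadvisor process.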

difro Aug 28, 2015

Contributor

Same here with CentOS + Docker 1.8.1 (devicemapper).

Had to remove --volume=/:/rootfs:ro and --volume=/var/lib/docker:/var/lib/docker:ro
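
As a rough illustration, this is what the README's docker run command looks like with those two volumes dropped. It is a sketch, not a recommended configuration: the remaining flags are taken from the README example, and `run_cadvisor_reduced` is a hypothetical wrapper name.

```shell
#!/bin/sh
# Hypothetical wrapper: start cadvisor without the / and /var/lib/docker mounts.
# The remaining flags follow the README example; adjust the port and name as needed.
run_cadvisor_reduced() {
    docker run \
        --volume=/var/run:/var/run:rw \
        --volume=/sys:/sys:ro \
        --publish=8080:8080 \
        --detach=true \
        --name=cadvisor \
        google/cadvisor:latest
}
```

Other comments in this thread report that cadvisor loses Docker container stats when started this way, so treat it as a trade-off rather than a fix.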

vishh Aug 28, 2015

Contributor

@rjnagal: Except for disk usage calculation, cAdvisor does not poke at any
of these directories, right?

hourliert Oct 5, 2015

Same problem here with Ubuntu 14.04.3.

@difro's solution works, but cadvisor can't provide docker stats anymore.

Any workaround?

rmetzler Oct 5, 2015

The last time I ran into this problem, I dug a little bit into the cAdvisor source code. I'm not 100% sure - because it was a few weeks ago - but this is essentially the gist:

If you use cAdvisor as shown in README.md, you'll mount /var/lib/docker as a volume into the container. This will create dead containers.

The reason cAdvisor wants you to mount /var/lib/docker is - as far as I could see - only to display certain information that is interesting only to admins, who should know it beforehand anyway.

jimmidyson Oct 5, 2015

Collaborator

We should be able to get all info from a docker inspect rather than parsing the container config file. Seems like mounting /var/lib/docker is causing more trouble than it's worth.

svenmueller Oct 22, 2015

We also encounter the same problem (cadvisor:latest, Ubuntu 14.04).

svenmueller Jan 26, 2016

Any updates regarding this?

vishh Jan 26, 2016

Contributor

The best we can do for now is to let users optionally disable filesystem
usage metrics. We are waiting for some of the new upstream kernel features
to simplify disk accounting.

tuxknight Feb 1, 2016

Same situation.
My Docker Version is 1.9.1
Cadvisor version 0.18.0

Also, when docker rm fails for a container, the status of that container changes to "dead".
Is it possible to unmount that specific mountpoint when the container status changes to "exit" or "dead"?

arhea commented Feb 3, 2016

+1

vishh Feb 3, 2016

Contributor

cAdvisor doesn't mount anything. It runs du periodically to collect
filesystem stats. Other than that, it does not touch the container's
filesystem at all.
The easy fix for this would be to retry docker deletion or disable
filesystem aggregation in cadvisor.
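
A minimal sketch of the "retry docker deletion" option. `retry_rm` is a hypothetical helper, and the attempt count and delay are arbitrary assumptions. It can only help where the busy state is transient; some reporters in this thread describe it as permanent while cadvisor is running.

```shell
#!/bin/sh
# Hypothetical helper: retry `docker rm` a few times before giving up.
# $1: container name or ID
# $2: maximum attempts (assumed default: 5)
# $3: seconds to wait between attempts (assumed default: 2)
retry_rm() {
    container="$1"
    attempts="${2:-5}"
    delay="${3:-2}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        if docker rm "$container"; then
            return 0
        fi
        echo "docker rm $container failed (attempt $i/$attempts)" >&2
        i=$((i + 1))
        # Wait only if another attempt will follow
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, `retry_rm my_container 10 3` would try up to ten times with three seconds between attempts and return non-zero if every attempt fails.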

tonysickpony Feb 11, 2016

Running cAdvisor without --volume=/:/rootfs:ro seems to fix it,
as pointed out in https://github.com/google/cadvisor/blob/master/docs/running.md
I haven't fully tested it yet, but it has worked fine so far.

xbglowx Feb 11, 2016

I had to remove the following volume mounts:

  • /:/rootfs:ro
  • /var/lib/docker/:/var/lib/docker:ro

Setup:

  • Ubuntu 14.04.3 LTS
  • docker 1.9.1 with aufs
  • cAdvisor 0.20.5

xbglowx Apr 14, 2016

Upgraded docker to 1.10.3 and now cAdvisor can only see the docker images, but no containers, if I only use volume mounts:

  • /var/run:/var/run:rw
  • /sys:/sys:ro
  • /var/lib/docker/:/var/lib/docker:ro

If I add /:/rootfs:ro, cAdvisor can see the containers, but I get "device or resource busy" when trying to remove any container.

vishh Apr 14, 2016

Contributor

@xbglowx Are you using the latest cadvisor release?

xbglowx Apr 15, 2016

Using google/cadvisor:v0.22.0

jordic referenced this issue in moby/moby on Apr 16, 2016: docker restart fails #22042 (Closed)

jordic Apr 16, 2016

Any ideas or suggestions on how I can dig into the issue?

vishh Apr 27, 2016

Contributor

timstclair Apr 28, 2016

Contributor

I was able to reproduce this locally with docker v1.9.1 and cAdvisor 0.22.0, but only right after starting cAdvisor and only once (removing a second container works). I could not reproduce with docker v1.11.

Is this consistent with everyone else's experience?

jordic Apr 28, 2016

With docker 1.11.1 the issue is gone. With the latest fixes on the docker side, it seems to be working now.

ashkop May 4, 2016

I'm still able to reproduce this with docker 1.11.1 and cAdvisor 0.23.0. Ubuntu 14.04.

vishh May 4, 2016

Contributor

@ashkop Can you try running cAdvisor with --disable_metrics="tcp,disk" and see if that resolves the issue? Note that you will not get docker container filesystem metrics by adding this flag.

xbglowx May 4, 2016

If I try using --disable_metrics="tcp,disk" I get the following:

sudo docker run -ti -v /var/lib/docker/:/var/lib/docker:ro -v /var/run:/var/run:rw -v /sys:/sys:ro -v /:/rootfs:ro google/cadvisor --disable_metrics="tcp,disk"
panic: assignment to entry in nil map

goroutine 1 [running]:
panic(0xb0c8c0, 0xc8201c0440)
    /usr/local/go/src/runtime/panic.go:481 +0x3e6
main.(*metricSetValue).Set(0x15ac528, 0x7ffe3cea1f59, 0x8, 0x0, 0x0)
    /go/src/github.com/google/cadvisor/cadvisor.go:85 +0x1da
flag.(*FlagSet).parseOne(0xc82004e060, 0xc82005e901, 0x0, 0x0)
    /usr/local/go/src/flag/flag.go:881 +0xdd9
flag.(*FlagSet).Parse(0xc82004e060, 0xc82000a100, 0x2, 0x2, 0x0, 0x0)
    /usr/local/go/src/flag/flag.go:900 +0x6e
flag.Parse()
    /usr/local/go/src/flag/flag.go:928 +0x6f
main.main()
    /go/src/github.com/google/cadvisor/cadvisor.go:99 +0x68

This is with cAdvisor version 0.23.0 (750f18e). Works fine with 0.22.0.

I still need to see if using --disable_metrics="tcp,disk" fixes the problem.

timstclair May 4, 2016

Contributor

Yeah, that was fixed in #1259, but it's not integrated into any release.

ashkop May 5, 2016

@vishh Unfortunately the flag didn't help. As @xbglowx mentioned, this option causes 0.23.0 to crash, so I tried 0.22.0 and canary. Both still prevent me from removing containers. Here's the error message I get:

Error response from daemon: Unable to remove filesystem for 9e96817fba0a443f75d1426b6d7a586f4bc84217b06eb021f6d28bae4f341473: remove /var/lib/docker/containers/9e96817fba0a443f75d1426b6d7a586f4bc84217b06eb021f6d28bae4f341473/shm: device or resource busy

infiniteproject May 5, 2016

Same here on Debian 8, Docker 1.11.1 and latest cAdvisor.

vishh May 5, 2016

Contributor

@timstclair Can we make a v0.23.1 release with the fix for --disable_metrics flag?

moortimis May 10, 2016

I am experiencing the same issue with the following versions:

"cAdvisor version: 0.23.0-750f18e"
google/cadvisor latest 5cda8139955b 8 days ago 48.92 MB

CentOS Linux release 7.2.1511 (Core)
Docker version 1.11.1, build 5604cbe

The workaround was to remove /var/lib/docker from the shared volumes.

rjnagal May 10, 2016

Collaborator

@vishh Is this fixed if we just stop tracking disk metrics for these machines? Are there other dependencies?

soumyadipDe Nov 3, 2016

@rhuddleston Tried using dm.use_deferred_removal=true and dm.use_deferred_deletion=true. The "resource busy" error is still thrown, but containers that previously ended up in the Dead state are now getting removed. Is that the same with you?
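
The two settings named here are devicemapper storage options of the Docker daemon. A sketch of how they might be written in daemon.json, assuming the daemon reads /etc/docker/daemon.json and already uses the devicemapper driver (the dm.* options do not apply to other drivers). `print_deferred_opts` is a hypothetical helper that only prints the fragment; it does not touch any file.

```shell
#!/bin/sh
# Hypothetical helper: print a daemon.json fragment with both devicemapper options.
# Merge this by hand into an existing daemon.json rather than overwriting the file.
print_deferred_opts() {
    cat <<'EOF'
{
  "storage-opts": [
    "dm.use_deferred_removal=true",
    "dm.use_deferred_deletion=true"
  ]
}
EOF
}
```

The same options can be passed on the daemon command line as `--storage-opt dm.use_deferred_removal=true --storage-opt dm.use_deferred_deletion=true`. The daemon has to be restarted for either form to take effect.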

EvertMDC Nov 3, 2016

I noticed something different the other day. My lvm volume couldn't be removed until I stopped the cadvisor container. This must be related.

psychok7 Nov 14, 2016

I had the same problem using docker-compose. As a workaround (as has been mentioned before), I removed my cAdvisor service from the compose file and started the container manually using plain docker before anything else.
Also, I had to connect this container to the same network as my compose services using docker network connect default cadvisor; this way my services can now see the container. I can now restart my services without running into this nasty error.

theroys Nov 15, 2016

The workaround that always works, in our experience, is running cAdvisor on the host as a process rather than as a docker container. We have it running in production without any incidents.

toussa Nov 15, 2016

Same problem for us too... We had to stop using cAdvisor in production until this is fixed.
The workaround of installing cAdvisor directly on the host is not possible for us.

Nightbr Nov 30, 2016

Hey,
any news on this issue? I'm still stuck with this:

  • docker 1.12.3
  • cadvisor latest (v0.24.1)
  • Debian Jessie64

The "device is busy" bug appears only if cadvisor is started after the other containers we want to manage (restart, remove, ...).

I tried all the workarounds, but I'm not completely satisfied:

  1. Remove these two volumes
    - /:/rootfs:ro
    - /var/lib/docker/:/var/lib/docker:ro
    Problem: we lose most container metrics...

  2. Start cadvisor first
    This is the easiest workaround to put into practice, but it is not really convenient or scalable...

  3. Stop all containers, start cadvisor, restart all containers:

#!/bin/sh

# Stop all containers
docker stop $(docker ps -a -q)

# Start cadvisor
docker run \
    -d \
    --name=cadvisor \
    --volume=/:/rootfs:ro \
    --volume=/var/run:/var/run:rw \
    --volume=/sys:/sys:ro \
    --volume=/var/lib/docker/:/var/lib/docker:ro \
    --net logs_back-tier \
    google/cadvisor

# Restart all containers
docker start $(docker ps -a -q)

Very slow if you have lots of containers... And it does not work well if you use docker-compose and depends_on for the start order...

  4. Install cadvisor and start it on the host

Any documentation or advice on this? It is really painful with docker to link a container with a service on the host (you need to find the host IP for the container network, ...). If someone has anything on this, I would appreciate it.

That's it ;) The best would be to fix this issue, perhaps in a future version, since it seems to be a bit critical and reproducible.

Thanks in advance!

EvertMDC Nov 30, 2016

I haven't seen any solution to this yet, apart from installing cadvisor on the host itself without a container. But your machines have to support that.

It's a difficult one to solve in my opinion since cadvisor actually works. It's only in the runtime environment of docker that issues arise.

chadxz Dec 9, 2016

I noticed today that my systems were using devicemapper, so I switched them to aufs. So far I have not had any issue with being unable to remove containers when they are started before the cAdvisor container is started. Using all volumes also... docker 1.11.1 / aufs / cAdvisor 0.24.1

RRAlex Dec 13, 2016

Seems like (f)statfs are the problem, according to docker at least:
https://docs.docker.com/engine/admin/troubleshooting_volume_errors/
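
When it is unclear which process keeps a path busy, the usual tool is lsof. The same check can be sketched in Python by walking the per-process fd links, assuming a Linux /proc layout. The name `find_open_handles` and its parameters are made up for illustration; they are not part of Docker or cAdvisor.

```python
import os


def find_open_handles(needle, proc_root="/proc"):
    """Return {pid: [paths]} for processes with an open file whose path contains needle.

    Hypothetical helper, roughly what `lsof` reports. Run it as root,
    otherwise most /proc/<pid>/fd directories are unreadable and get skipped.
    """
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while scanning
            if needle in target:
                holders.setdefault(int(entry), []).append(target)
    return holders
```

Calling `find_open_handles("<container id>")` with the ID from the "filesystem is busy" error should list the PIDs that still hold files of that container. A substring match is used rather than a prefix match because a containerized process may see the same files under a different path than the host does.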

@xbglowx

xbglowx Feb 18, 2017

This is no longer an issue for me, since I switched to:

  • Kernel: 4.4.x
  • Storage driver: overlay2

@viossat

viossat Apr 12, 2017

None of the workarounds above worked for me. As with @xbglowx, the issue was solved by upgrading the kernel (from 3.16 to 4.9).

@RRAlex

RRAlex Apr 12, 2017

@viossat: which storage driver are you using?

@viossat

viossat Apr 12, 2017

It switched from aufs to overlay2 by itself after the kernel upgrade (Docker 17.04.0-ce).
(overlay is in the mainline from kernel 3.18 and overlay2 is supported from 4.0)
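
To check a host against those two thresholds, here is a minimal Python sketch. It only encodes the mainline versions quoted above (3.18 for overlay, 4.0 for overlay2); the function names are hypothetical, and distribution kernels with backports may not follow the mainline numbers.

```python
import re


def kernel_version(release):
    """Parse a release string such as '3.13.0-48-generic' into (3, 13)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def overlay_drivers(release):
    """List the overlay drivers a mainline kernel of this release should offer."""
    version = kernel_version(release)
    drivers = []
    if version >= (3, 18):  # overlay merged into mainline
        drivers.append("overlay")
    if version >= (4, 0):  # overlay2 supported
        drivers.append("overlay2")
    return drivers
```

On a live host the release string can come from `platform.release()`, and the driver Docker actually uses can be read with `docker info --format '{{.Driver}}'`.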

@keyolk

keyolk Jun 15, 2017

I hit a similar issue with the Prometheus node_exporter as well:
prometheus/node_exporter#602

It seems that bind mounting a path that includes /var/lib/docker
leaks the mount namespace.

Both are resolved by running them directly on the host.
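
One way to see which processes still carry such mounts is to read /proc/<pid>/mountinfo, where the fifth field of each line is the mount point. A rough Python sketch, assuming Linux; `mounts_under` and `pids_with_mounts` are illustrative names, not an existing tool.

```python
import os


def mounts_under(mountinfo_text, prefix="/var/lib/docker"):
    """Return the mount points in mountinfo content that lie at or below prefix."""
    base = prefix.rstrip("/")
    found = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # not a well-formed mountinfo line
        mount_point = fields[4]  # fifth field is the mount point
        if mount_point == base or mount_point.startswith(base + "/"):
            found.append(mount_point)
    return found


def pids_with_mounts(prefix="/var/lib/docker", proc_root="/proc"):
    """Map each readable PID to the mounts it sees at or below prefix."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "mountinfo")) as handle:
                mounts = mounts_under(handle.read(), prefix)
        except OSError:
            continue  # process exited or permission denied
        if mounts:
            result[int(entry)] = mounts
    return result
```

Passing the busy path from the error message as `prefix` narrows the result to that one container. Processes in the host mount namespace can show up as well, so the interesting PIDs are the ones that still list the mount after the container has been stopped.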

@zevarito

zevarito Jun 15, 2017

@jangaraj jangaraj referenced this issue in monitoringartist/dockbix-agent-xxl Jun 19, 2017

Closed

Open file handles prevent containers from being removed #28

@keyolk

keyolk Jun 21, 2017

@zevarito
I think it can be mitigated
if I can pass only the exact volumes the container needs.
What I mean is simply to stop /var/lib/docker/devicemapper from being mounted.

Could you tell me exactly which host data it uses?

@timstclair timstclair assigned tallclair and unassigned timstclair Jul 7, 2017

@jindov

jindov Jul 17, 2017

Same issue with my system:

OS: ubuntu 14.04 LTS
Kernel: 3.13.0-48-generic
Docker: 17.04.0-ce

I get this issue when running cadvisor v0.26 with docker (and also with cadvisor:latest). Everything seems OK with node_exporter.

@viossat

viossat Jul 17, 2017

@jindov Try upgrading your kernel; you need to switch to the overlay driver. See my previous comment.

@jindov

jindov Jul 18, 2017

Thanks @viossat, I will try the upgrade in the dev env, but we can't do this in the prod env, so I decided to run cadvisor directly on the host. It works well.

@garyden

garyden Aug 13, 2017

How do I run cadvisor directly on the host?

Gary

@jindov

jindov Aug 16, 2017

You can use supervisord to run it directly on the host. This is my configuration for cadvisor:

[program:cadvisor]
directory=/build/metric_exporter/cadvisor/src/github.com/google/cadvisor
command=/build/metric_exporter/cadvisor/src/github.com/google/cadvisor/cadvisor -port 9080
autostart=true
autorestart=unexpected
redirect_stderr=true
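; supervisord does not expand $VAR shell-style; %(ENV_PATH)s is the PATH supervisord was started with (needs a Supervisor version with ENV_ expansion)
environment=GOROOT="/usr/local/go",GOPATH="/build/metric_exporter/cadvisor",PATH="/build/metric_exporter/cadvisor/bin:/usr/local/go/bin:%(ENV_PATH)s"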

Jin
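
Once supervisord has started the process, a quick way to confirm it is answering is to request the /validate page mentioned at the top of this thread. A minimal Python sketch, assuming port 9080 as in the configuration above; the helper names are made up for illustration.

```python
import urllib.error
import urllib.request


def validate_url(host="localhost", port=9080):
    """Build the URL of the cAdvisor validate page (port 9080 is an assumption)."""
    return "http://%s:%d/validate" % (host, port)


def cadvisor_is_up(host="localhost", port=9080, timeout=5.0):
    """Return True if the validate page answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(validate_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False  # refused, timed out, or an HTTP error status
```

`cadvisor_is_up()` only says that something responds on that port; the content of the validate page still has to be read by hand.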

@stephan2012

stephan2012 Sep 15, 2017

Same problem with RHEL 7.4, Docker 17.06.2. Doesn't matter if I'm using ZFS or Overlay2.

Any solution for this by now? Or just run cAdvisor directly on the host?

@rootfs rootfs referenced this issue in kubernetes/kubernetes Oct 25, 2017

Closed

pod is deleted but rbd not unmap #54214

@amcrn

amcrn Dec 19, 2017

Hope this helps someone else:

Ubuntu 16.X (kernel 4.4.X) and Docker 1.11.2 w/ AUFS works fine.
Ubuntu 14.X (kernel 3.13.X) and Docker 1.11.2 w/ AUFS exhibits the problem.

So, it looks like overlay isn't necessary, a kernel upgrade is all that's required.
