docker ps command hangs #12606

Closed
daniruddha29 opened this Issue Apr 21, 2015 · 159 comments

daniruddha29 commented Apr 21, 2015

Hi all,

We have a large set of Postgres databases running in Docker containers. The host machine has 3 TB of RAM. When we try to execute the docker ps command, it hangs and returns nothing.
Any help on this is highly appreciated.

Regards,
Aniruddha

crosbymichael (Contributor) commented Apr 21, 2015

I've seen this also.

daniruddha29 commented Apr 21, 2015

Is there anything you follow to check what the problem is?

LK4D4 (Contributor) commented Apr 21, 2015

@daniruddha29 Do you use links or restart policies?

daniruddha29 commented Apr 21, 2015

No, we don't. Does that have an impact?
I hope this is not a serious issue, as the environment is production.

LK4D4 (Contributor) commented Apr 21, 2015

@daniruddha29 I just recall similar issues caused by restart policies and links. I'll try to reproduce; any clues about your containers?

daniruddha29 commented Apr 21, 2015

We are using RHEL 6.5.

crosbymichael (Contributor) commented Apr 21, 2015

@daniruddha29 thanks, we can reproduce and will look for a fix and keep you updated

LK4D4 (Contributor) commented Apr 21, 2015

@daniruddha29 Did you use some volumes for your containers?

daniruddha29 commented Apr 21, 2015

Yes, we are using volumes to host our database mount.

daniruddha29 commented Apr 21, 2015

Let me know if you need any specific info.

LK4D4 (Contributor) commented Apr 21, 2015

@daniruddha29 Any info will be helpful: docker version, docker info, the Dockerfile, and the command line used to run the containers.

daniruddha29 commented Apr 21, 2015

docker version
Client version: 1.2.0
Client API version: 1.14
Go version (client): go1.3.1
Git commit (client): fa7b24f
OS/Arch (client): linux/amd64
Server version: 1.2.0
Server API version: 1.14
Go version (server): go1.3.1
Git commit (server): fa7b24f

docker info
Containers: 18
Images: 96
Storage Driver: devicemapper
Pool Name: docker-252:0-656075-pool
Pool Blocksize: 64 Kb
Data file: /var/lib/docker/devicemapper/devicemapper/data
Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
Data Space Used: 18131.5 Mb
Data Space Total: 102400.0 Mb
Metadata Space Used: 17.6 Mb
Metadata Space Total: 2048.0 Mb
Execution Driver: native-0.2
Kernel Version: 3.8.13-44.1.1.el6uek.x86_64
Operating System:

I will update the Dockerfile shortly.

@LK4D4 LK4D4 removed the kind/regression label Apr 21, 2015

daniruddha29 commented Apr 21, 2015

It's too early to ask, but did you find anything on this?

LK4D4 (Contributor) commented Apr 21, 2015

@daniruddha29 Nope, sorry :/ Can you try updating your Docker?

daniruddha29 commented Apr 21, 2015

Need to check, as this is a production environment.
One question, maybe a very silly one: what will be the impact if we continue using it?

LK4D4 (Contributor) commented Apr 21, 2015

@daniruddha29 Even if it is an unfixed issue, a fix won't be backported to 1.2 :(

daniruddha29 commented Apr 21, 2015

Haha... definitely not.
One more: is it a bug? Is it already known?

LK4D4 (Contributor) commented Apr 21, 2015

@daniruddha29 Looks like a bug, but we can't reproduce it. Maybe it was fixed after version 1.2.

daniruddha29 commented Apr 21, 2015

That is quite useful information.
Thank you very much for all the help.

aidanhs (Contributor) commented Apr 24, 2015

Seeing this issue at the moment. Info:

ahobsons@docker03:~$ docker version
Client version: 1.5.0
Client API version: 1.17
Go version (client): go1.4.1
Git commit (client): a8a31ef
OS/Arch (client): linux/amd64
Server version: 1.5.0
Server API version: 1.17
Go version (server): go1.4.1
Git commit (server): a8a31ef
ahobsons@docker03:~$ docker info
Containers: 24
Images: 298
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 346
Execution Driver: native-0.2
Kernel Version: 3.13.0-46-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 4
Total Memory: 7.63 GiB
Name: docker03
ID: 2JDT:3FKJ:P4FW:C3YE:TQ54:USOD:6X7J:XDF5:SH3A:NTKY:X7GJ:CUUM
WARNING: No swap limit support

The machine has spare memory, low load and very little I/O - the daemon looks idle, even when the docker ps command is hung.

Bizarrely, this machine is part of a swarm cluster... and doing docker ps via that works fine.

aidanhs (Contributor) commented Apr 24, 2015

Ah, swarm appears to have cached information. New containers I start don't appear.
FWIW, docker images and docker run work fine when run directly on the machine.

LK4D4 (Contributor) commented Apr 24, 2015

@aidanhs Yeah, it means that some container has acquired a lock. So you can't do ps because of one locked container, and you also can't stop or kill it from the CLI. Such bugs are very hard to fix without a reproduction case :( It is quite possible that it was fixed in 1.6, because the code around running containers was rewritten to use the new libcontainer API.
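
For readers unfamiliar with the locking pattern described above, here is a minimal sketch of the idea. This is not Docker's actual code; the type and function names are invented for illustration. The point is that a listing operation briefly locks every container to read its state, so one operation that blocks while holding a single container's lock hangs the whole listing:

// Minimal sketch (hypothetical names, not Docker's code) of how one held
// per-container lock can hang a "ps"-style listing.
package main

import (
	"fmt"
	"sync"
	"time"
)

// container stands in for the daemon's per-container state.
type container struct {
	sync.Mutex
	name    string
	running bool
}

// list mimics a "ps" handler: lock each container briefly to snapshot its state.
func list(containers []*container) {
	for _, c := range containers {
		c.Lock() // blocks forever if another goroutine never releases c
		fmt.Printf("%s running=%v\n", c.name, c.running)
		c.Unlock()
	}
}

func main() {
	containers := []*container{{name: "db1"}, {name: "db2"}}

	// A wedged operation (e.g. a stop/kill waiting on an unresponsive
	// process) that holds one container's lock indefinitely.
	go func() {
		containers[1].Lock()
		select {} // never unlocks
	}()

	time.Sleep(100 * time.Millisecond)
	list(containers) // prints db1, then hangs on db2 -- the "docker ps" hang
}

Note that deferring Unlock does not help here; the problem is not a forgotten unlock but a critical section that never finishes.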

aidanhs (Contributor) commented Apr 24, 2015

I assume you're thinking a deadlock then? Or are there other things that can cause this in Go? I assume care is taken to always defer unlock after acquiring a lock.

Have you (Docker Inc. as a whole, not you specifically) not got any scripts to loop over the goroutines in a running daemon and dump out what they're blocked on? If it was a deadlock, that would presumably help.

I've just compiled the daemon with symbols and then loaded those symbols into gdb attached to the docker daemon. I can now write Python scripts to traverse all the goroutines looking for particular things... if I know what I'm looking for. Do you have any links to previous tickets like this?

estesp (Contributor) commented Apr 24, 2015

@aidanhs see PR #10786 for the goroutine dump capability added to the Docker daemon. If you can reproduce easily, add that patch (if possible) and you can then dump goroutines.
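
For context, the general shape of such a goroutine-dump facility in a Go daemon looks roughly like the sketch below. This is an illustration of the technique under the assumption of a SIGUSR1 trigger, not the code from that PR: register a signal handler that writes every goroutine's stack, so a hung daemon can be asked to report which locks its goroutines are waiting on.

// Sketch of dumping all goroutine stacks from a running Go daemon on demand.
package main

import (
	"os"
	"os/signal"
	"runtime/pprof"
	"syscall"
)

func dumpGoroutinesOnSignal() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGUSR1)
	go func() {
		for range c {
			// debug=2 prints a full stack trace for every goroutine,
			// including what each one is blocked on (e.g. sync.Mutex.Lock).
			pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
		}
	}()
}

func main() {
	dumpGoroutinesOnSignal()
	select {} // stand-in for the daemon's main loop
}

With something like this compiled in, sending the signal (kill -USR1 <daemon pid>) produces the same kind of blocked-goroutine listing that the gdb approach below recovers by hand.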

aidanhs (Contributor) commented Apr 24, 2015

I can dump goroutine traces with gdb. Unfortunately there are 618 goroutines :)
I've scripted their dumping, result at https://gist.github.com/aidanhs/960beaf2db1de622a2dd. This is pretty big.

But! If we grep out the callers to Lock then we find the callsites goroutines are blocked at:

root@docker03:~# grep -A1 Lock gdb.txt | grep -v '^--$' | grep -v 'Mutex' | sort | uniq -c
      1 #5  0x00000000004bc5ba in github.com/docker/docker/daemon.(*Daemon).ContainerInspect (daemon=0xc2080691e0, job=0xc208857980, ~r1=0) at /go/src/github.com/docker/docker/daemon/inspect.go:17
      1 #5  0x00000000004c861f in github.com/docker/docker/daemon.(*State).IsRunning (s=0xc2081c79d0, ~r0=54) at /go/src/github.com/docker/docker/daemon/state.go:116
      1 #5  0x00000000004c861f in github.com/docker/docker/daemon.(*State).IsRunning (s=0xc2091c2e70, ~r0=246) at /go/src/github.com/docker/docker/daemon/state.go:116
     41 #5  0x00000000004d4f20 in github.com/docker/docker/daemon.func·025 (container=0xc209060a80, ~r1=...) at /go/src/github.com/docker/docker/daemon/list.go:79
      3 #5  0x00000000004d4f20 in github.com/docker/docker/daemon.func·025 (container=0xc209060fc0, ~r1=...) at /go/src/github.com/docker/docker/daemon/list.go:79
      1 #68 0x00000000005eb5bf in sync.(*Pool).getSlow (p=0x12eac50 <net/http.textprotoReaderPool>, x=...) at /usr/local/go/src/sync/pool.go:129

Since I have gdb I can (in theory) print variables from goroutines and so on if you like.
Will have more of a look a bit later. I strongly suspect one of the first three lines is the culprit.

LK4D4 (Contributor) commented Apr 24, 2015

I personally fail to see how it can produce a deadlock :)

elephantfries commented Apr 25, 2015

@LK4D4 This problem is still present in 1.6.0. I hit this case just yesterday. We use docker in a build system. A build hung and the agent abandoned it (without trying to stop or kill the build container). The agent and the build were running in two different containers. After that, docker ps hung with no way to recover but a reboot. This was on CentOS Atomic 4a524a58cb with docker 1.6.0.

LK4D4 (Contributor) commented Apr 25, 2015

@elephantfries It would help if you described the flags you pass to the daemon and to docker run.

elephantfries commented Jun 8, 2015

@rhvgoyal I can re-post the logs if that would be useful.

With a direct thin pool, it seems that we're limited to 100G. I tried larger amounts, like 160G, but then I run into devmapper confusion. I forget the exact error, but somebody else posted it on the #atomic list as well. If you want, I can most likely recreate that problem. The other issues are minor, manageability-type ones. I can't really complain about them, but to give you an idea of our thinking: for example, it forces us onto LVM partitions. We'd much rather give 1T to the / partition as ext4 and be done with it. Those atomic hosts are commodities; the simpler the better. Nobody's expanding disks on them; we'd switch to another host if necessary. Another example: when we decide to rm -rf /var/lib/docker (because of unrelated issues), the pool is unusable and we have to re-create it from scratch. It's the cost of living on the cutting edge of technology; we understand that.

rhvgoyal (Contributor) commented Jun 8, 2015

@elephantfries

OK, you mentioned a lot of issues; let's handle the most important one first, this 100G limitation. It looks like I am not facing this issue.

I created an LVM thin pool (using docker-storage-setup) of size 800G and used around 500G for the docker thin pool.

I started docker with the option --storage-opt dm.basesize=160G and everything seems to be fine. I have a container root of size 160G.

[root@d2cc496561d6 /]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/docker-253:1-2763198-d2cc496561d6d520cbc0236b4ba88c362c446a7619992123f11c809cded25b47 160G 230M 160G 1% /

I am using upstream docker on F21 with a dynamically built binary.

So this might be something Atomic-specific, and it might even have been fixed in the latest version. If not, this should be fixable.

rhvgoyal (Contributor) commented Jun 8, 2015

@elephantfries

On the issue of manageability, there are many pieces to it.

  • W.r.t. planning how to divide space between root and the thin pool, I agree that using loop devices is simpler.
  • W.r.t. being able to grow the pool later, I think using a loop device is not best, as docker does not have the ability to grow the pool. You mention that you don't care because you would move to a different atomic host. So you don't care about the containers that are already in the previous thin pool either?
  • Sometimes the pool might be in bad shape. lvm checks the pool's health during activation; docker does none of that. That means your operations on the pool might fail later and you might not have an easy or automatic way to deal with it.
  • Loop devices are not considered very reliable. A thin pool is more reliable.

IIUC, for you, not having to think about how to partition existing storage is more important than the other downsides of loop devices. I guess it's your choice.

rhvgoyal (Contributor) commented Jun 8, 2015

@elephantfries

If you have to do "rm -rf /var/lib/docker", then you have to issue two more commands to delete and recreate the pool:
"lvremove docker/docker-pool"
"systemctl start docker-storage-setup"

And that should take care of destroying your existing pool and creating a new one for docker's use.

That doesn't sound like a lot of work, does it?

elephantfries commented Jun 8, 2015

@rhvgoyal Right, your points are valid, and I said that I cannot really complain about them. I only mention them to give you an idea of what we're thinking. So, again, nothing wrong with LVM, but life is easier without it. Too bad loop devices are unreliable; let's hope overlayfs stabilizes soon. However, devmapper has other uses beyond docker.

So let's focus on the hard problems. I'll try to re-create the devmapper confusion with a thin pool larger than 100G. This has been reported and possibly fixed, but I'll try anyway.

elephantfries commented Jun 26, 2015

@rhvgoyal It may be time to close this issue. I met vbatts at DockerCon; he was rather pessimistic about loop devices. So maybe that's just how it is: won't fix. I switched to a thin pool while setting my sights on overlayfs. Devmapper with a thin pool is not without issues, but they're less frequent. I could not reproduce 'no space' on thin pools larger than 100G, but I did run into "cannot start container" a couple of times. Restarting docker brings it back into operation. Not ideal, but I can live with it.

rhvgoyal (Contributor) commented Jun 26, 2015

@elephantfries

OK, let's close it. Reopen it once we have more concrete data we can go after to fix things.

@vbatts, can you please close this?

LK4D4 (Contributor) commented Jun 26, 2015

Closing

@LK4D4 LK4D4 closed this Jun 26, 2015

syzer commented Sep 2, 2015

On Mac I got:

An error occurred trying to connect: Get https://192.168.59.103:2376/v1.19/containers/json: dial tcp 192.168.59.103:2376: i/o timeout

oncletom commented Sep 22, 2015

I face a similar issue on an Ubuntu 14.04 EC2 machine – using Docker 1.8.2 with /var/lib/docker mounted on an EBS volume.

I am not sure if it happens during the exited-container cleanup (docker rm $(docker ps -a -q -f "status=exited" -f "status=dead") every 10 minutes, with hundreds of exited containers).

Let me know if you need any additional details.

Linux ip-10-0-0-52 3.13.0-63-generic #103-Ubuntu SMP Fri Aug 14 21:42:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Client:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:19:00 UTC 2015
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.2
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   0a8c2e3
 Built:        Thu Sep 10 19:19:00 UTC 2015
 OS/Arch:      linux/amd64
Containers: 16
Images: 129
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 161
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: syslog
Kernel Version: 3.13.0-63-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 4
Total Memory: 14.69 GiB
Name: ip-10-0-0-52
ID: 447J:75KW:SP3L:DK2P:YL7N:YLT2:SAZ4:UX3T:264B:V2KX:7FT7:MODF
WARNING: No swap limit support

khimaros commented Sep 30, 2015

Also seeing this with CoreOS 766.3.0 and default Docker configuration on Google Compute Engine.

docker version

# docker version
Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 2c2c52b-dirty
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 2c2c52b-dirty
OS/Arch (server): linux/amd64

docker info

# docker info
Containers: 8
Images: 55
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.6-coreos-r1
Operating System: CoreOS 766.3.0
CPUs: 1
Total Memory: 3.618 GiB

docker-compose -v

# docker-compose -v
docker-compose version: 1.4.0

I seem to be able to reproduce this by executing many commands of the following form:

# docker-compose run --rm --entrypoint=${command} ${container} /bin/sh -c '<complex command>'

Working on a consistent reproduction case with a simple container.

yifan-gu commented Oct 20, 2015

Also seeing this on CoreOS 766.4 on GCE.

$ docker info
Containers: 140
Images: 110
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.7-coreos
Operating System: CoreOS 766.4.0
CPUs: 2
Total Memory: 7.309 GiB
Name: e2e-test-yifan-minion-9wkj
ID: 4MAS:EPAP:ECO5:BZ76:PGPB:7IGD:4XCP:ANLE:GQBW:GGFR:3OCY:EZEN
$ docker version
Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): df2f73d-dirty
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): df2f73d-dirty
OS/Arch (server): linux/amd64

thaJeztah (Member) commented Oct 21, 2015

@sllawap @yifan-gu the versions you're running on CoreOS are maintained by CoreOS. First of all, those are older versions of Docker (which means the issue might already have been fixed here), but also, the CoreOS builds may contain patches that make them act differently from the "vanilla" Docker version (see the -dirty in the version).

I recommend opening an issue in the CoreOS issue tracker first.

sjwoodr commented Nov 2, 2015

I also encountered this (again) today with docker 1.8.2 on Ubuntu 14.04 in EC2. I had to reboot; restarting the docker daemon itself was of no help.

dazraf commented Feb 17, 2016

Why is this issue not reopened? Is it really fixed?

thaJeztah (Member) commented Feb 17, 2016

@dazraf I think this thread has become a "catch all" for multiple issues that may not be related, but have similar results; if you're encountering this on 1.10, it's better to open a new issue.

alexanderkjeldaas commented Feb 22, 2016

If you're encountering this on 1.10, it might simply be that it is migrating images.

kevinoriordan commented Feb 25, 2016

Also seeing this on Amazon Linux 2015.09

Docker version details:

Client:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5/1.9.1
Built:
OS/Arch: linux/amd64

Server:
Version: 1.9.1
API version: 1.21
Go version: go1.4.2
Git commit: a34a1d5/1.9.1
Built:
OS/Arch: linux/amd64

jhberges commented Apr 28, 2016

Also seeing this on:

docker@xxxx:~$ uname -a
Linux xxxx 3.16.0-45-generic #60~14.04.1-Ubuntu SMP Fri Jul 24 21:16:23 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

docker@xxxx:~$ docker version
Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:54:52 2016
 OS/Arch:      linux/amd64

brouberol commented May 3, 2016

Also seeing this on:

# uname -a
Linux 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.3-7~bpo8+1 (2016-01-19) x86_64 GNU/Linux

# docker version
Client:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:38:58 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.10.3
 API version:  1.22
 Go version:   go1.5.3
 Git commit:   20f81dd
 Built:        Thu Mar 10 15:38:58 2016
 OS/Arch:      linux/amd64

The funny thing is, I have NO running containers. I initially thought this hang could be related to a high number of running containers.

LouisKottmann commented May 13, 2016

Also seeing this on:

# docker version
Client:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Tue Apr 26 23:30:23 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Tue Apr 26 23:30:23 2016
 OS/Arch:      linux/amd64
# uname -a
Linux ip-10-0-1-66 3.13.0-86-generic #130-Ubuntu SMP Mon Apr 18 18:27:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.4 LTS
Release:        14.04
Codename:       trusty

thaJeztah (Member) commented May 13, 2016

@brouberol @LouisKottmann please open a new issue, and also provide the output of docker info.

I'm going to lock this issue; it has become a collection of many issues that are not related to each other, but can result in the same behavior ("docker not responding"). If you're having an issue like this, please open a new issue, and try to provide information that can be used to reproduce or debug it. Reporting only "I have this too" likely won't be sufficient to find the cause, or to see if there's an actual bug or if it's a configuration issue.

@moby moby locked and limited conversation to collaborators May 13, 2016
