`pull` gets stuck waiting for a layer #22598

Closed
kojiromike opened this Issue May 9, 2016 · 27 comments

kojiromike (Contributor) commented May 9, 2016

Output of docker version:

Client:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.6.2
 Git commit:   5604cbe
 Built:        Wed Apr 27 15:27:26 UTC 2016
 OS/Arch:      darwin/amd64

Server:
 Version:      1.11.1
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   5604cbe
 Built:        Wed Apr 27 00:34:20 2016
 OS/Arch:      linux/amd64

Output of docker info:

Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 1
Server Version: 1.11.1
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 2
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge null host
Kernel Version: 4.4.8-boot2docker
Operating System: Boot2Docker 1.11.1 (TCL 7.0); HEAD : 7954f54 - Wed Apr 27 16:36:45 UTC 2016
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.858 GiB
Name: default
ID: Y4GM:7ZGQ:XL7U:BGRR:R6ZR:7SDJ:P4WX:ENZD:6KCV:CYS7:TBM2:4FUZ
Docker Root Dir: /mnt/sda1/var/lib/docker
Debug mode (client): false
Debug mode (server): true
 File Descriptors: 30
 Goroutines: 152
 System Time: 2016-05-09T12:25:32.911333681Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
 provider=virtualbox

Additional environment details (AWS, VirtualBox, physical, etc.):

Steps to reproduce the issue:

  1. Have a lot of layers to pull. (For example, my docker-compose.yml file has six images, some with fairly many layers, such as rabbitmq:3.5.7-management.)
  2. Run `docker-compose pull`. Note that sometimes, but not always, the pull gets stuck waiting for a layer.
  3. Restart the Docker machine.
  4. Run `docker pull` manually for all the images you need. Note that sometimes, but not always, the pull gets stuck waiting for a layer (demonstrating that the problem isn't with Docker Compose).
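The manual reproduction above can be scripted so a stuck pull is detected rather than watched. A sketch, not from the original report, assuming GNU coreutils `timeout` (which exits 124 when the budget expires); the budget value is an arbitrary choice:

```shell
#!/bin/sh
# Run a command under GNU `timeout` so a hung pull is distinguishable
# (exit 124) from an ordinary failure.
run_with_budget() {
    budget=$1; shift
    timeout "$budget" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "STUCK: '$*' exceeded ${budget}s" >&2
    fi
    return "$status"
}

# Hypothetical usage; only rabbitmq:3.5.7-management is named above:
# run_with_budget 900 docker pull rabbitmq:3.5.7-management
```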

Describe the results you received:

3.5.7-management: Pulling from library/rabbitmq
d4bce7fd68df: Downloading 50.86 MB/51.35 MB
a3ed95caeb02: Download complete
c0330dd9a641: Download complete
55756666188f: Download complete
c14677369432: Download complete
26af78564017: Download complete
bd6129eb0c93: Download complete
5effe88cea3c: Download complete
c393a181c961: Download complete
76ed1a563002: Download complete
be99def8c8f1: Download complete
281661af86b4: Download complete
bc1e7dd06368: Download complete
4a321db4bb32: Download complete
c3ed9562f00a: Download complete

That first layer, d4bce7fd68df, has been at 50.86 MB for about 8 minutes now. I've attempted to wait this out before. The longest I waited was over an hour.

Describe the results you expected:

Ideally, the pull should simply not hang. Barring that, it should fail with an error so that I can understand the problem and build automation to work around it.

Additional information you deem important (e.g. issue happens only occasionally):

This issue does not happen on every pull, and it doesn't happen on the same layers every time. It happens more frequently on my home network than at work, but it has happened in both places. Restarting the entire Docker machine (`docker-machine restart default`) is more effective at kicking the can than just killing and restarting the pull itself.
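A workaround sketch based on that observation (the machine name "default", the time budget, and the attempt count are assumptions to adapt; requires GNU `timeout`): retry the pull a few times, then fall back to restarting the whole machine.

```shell
#!/bin/sh
# Kill a stalled pull after a budget, retry a few times, and fall back to
# restarting the machine, which kicks the can further than a plain retry.
pull_with_retries() {
    image=$1; budget=$2; tries=$3
    n=1
    while [ "$n" -le "$tries" ]; do
        if timeout "$budget" docker pull "$image"; then
            return 0
        fi
        echo "pull of $image stalled or failed (attempt $n/$tries)" >&2
        n=$((n + 1))
    done
    docker-machine restart default && timeout "$budget" docker pull "$image"
}

# pull_with_retries rabbitmq:3.5.7-management 600 3
```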

thaJeztah (Member) commented May 9, 2016

LukasBacigal commented May 13, 2016

Happens quite often for us at our company. Restarting the daemon multiple times helps for a while.

nathandunn commented May 26, 2016

Is this related to #21084? The machine restart worked.

schmunk42 (Contributor) commented Oct 17, 2016

I have a similar issue, AFAIR since switching to overlayFS (which I think became the default in docker-machine).

We constantly see hanging `docker pull` and `docker-compose pull` with the broken combination below...

Host

  • OS X - Virtualbox
  • docker-machine version 0.7.0, build a650a40

Working combination (client 1.12.2, server 1.11.2, API 1.23, aufs) - so far, still testing

tobias in ~ λ docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 1.11.2
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 26
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host
Swarm: 
 NodeID: 
 Is Manager: false
 Node Address: 
Security Options:
Kernel Version: 4.4.12-boot2docker
Operating System: Boot2Docker 1.11.2 (TCL 7.1); HEAD : a6645c3 - Wed Jun  1 22:59:51 UTC 2016
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 995.9 MiB
Name: m4
ID: J55D:SJGD:WDHZ:5IUL:SXNM:IHB5:EBFP:QDDA:GRQ4:JAOP:D4VI:NTSB
Docker Root Dir: /mnt/sda1/var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 30
 Goroutines: 136
 System Time: 2016-10-17T18:02:32.973184394Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
 provider=virtualbox
Insecure Registries:
 127.0.0.0/8
tobias in ~ λ docker version      
Client:
 Version:      1.12.2
 API version:  1.23
 Go version:   go1.6.3
 Git commit:   bb80604
 Built:        Tue Oct 11 05:27:08 2016
 OS/Arch:      darwin/amd64
 Experimental: true

Server:
 Version:      1.11.2
 API version:  1.23
 Go version:   go1.5.4
 Git commit:   b9f10c9
 Built:        Wed Jun  1 21:20:08 2016
 OS/Arch:      linux/amd64
tobias in ~ λ 

Broken combination (client 1.12.2, server 1.12.2, API 1.24 (and 1.23), overlayfs)

tobias in ~ λ docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.12.2
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 0
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: seccomp
Kernel Version: 4.4.24-boot2docker
Operating System: Boot2Docker 1.12.2 (TCL 7.2); HEAD : 9d8e41b - Tue Oct 11 23:40:08 UTC 2016
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.955 GiB
Name: m3
ID: TIPS:TXG6:KFQS:6NFS:5W3C:RVC3:PNG6:KV2S:7PHU:BSBA:H7EP:55EJ
Docker Root Dir: /mnt/sda1/var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 13
 Goroutines: 23
 System Time: 2016-10-17T18:04:59.751245129Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
 provider=virtualbox
Insecure Registries:
 127.0.0.0/8
Client:
 Version:      1.12.2
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   bb80604
 Built:        Tue Oct 11 05:27:08 2016
 OS/Arch:      darwin/amd64
 Experimental: true

Server:
 Version:      1.12.2
 API version:  1.24
 Go version:   go1.6.3
 Git commit:   bb80604
 Built:        Tue Oct 11 17:00:50 2016
 OS/Arch:      linux/amd64
cpuguy83 (Contributor) commented Nov 11, 2016

SIGUSR1 logs from the daemon while this is happening would be really insightful.

schmunk42 (Contributor) commented Nov 14, 2016

@cpuguy83 How do I get the SIGUSR1 logs? Could you post an example command?

cpuguy83 (Contributor) commented Nov 14, 2016

@schmunk42 When you send SIGUSR1 to the Docker daemon, it will write a stack trace into the daemon's logs.
How to get those logs depends on which distro/init system you are running.
If you're on systemd, it'll be in journald.
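A sketch of that procedure (not a confirmed recipe; the log locations below are assumptions, and signaling dockerd may need root):

```shell
#!/bin/sh
# Send SIGUSR1 to a process by name; dockerd responds to SIGUSR1 by
# writing a goroutine stack dump into its own log.
dump_stacks() {
    pid=$(pgrep -x "$1") || { echo "$1 not running" >&2; return 1; }
    kill -s USR1 $pid
}

# dump_stacks dockerd
#
# Then read the dump, e.g. (paths are assumptions):
#   Boot2Docker:  tail -n 200 /var/log/docker.log
#   systemd:      journalctl -u docker.service | grep -n 'goroutine stack dump'
```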

schmunk42 (Contributor) commented Nov 14, 2016

@cpuguy83 This would apply to Boot2Docker, so it's something like this?

docker-machine ssh default
$ sudo killall -SIGUSR1 dockerd
$ cat /var/lib/boot2docker/....

schmunk42 (Contributor) commented Nov 14, 2016

Here you go: https://gist.github.com/schmunk42/afc90c8c12efb04381d6df47f8049503

Note: long lines from the dump are not displayed correctly in Safari, but they look fine in Chrome.

The sections starting with `=== BEGIN goroutine stack dump ===` were triggered during hanging pulls with `sudo killall -SIGUSR1 dockerd`.

CC: @handcode

cpuguy83 (Contributor) commented Nov 15, 2016

@aaronlehmann Best I can tell, something is not sending a notification. The stack trace just has a bunch of goroutines sitting in a select (which makes sense, since it's hanging).
Admittedly, I'm not very familiar with this part of the codebase.

I posted a formatted stacktrace in a comment in that gist for easier reading.

aaronlehmann (Contributor) commented Nov 15, 2016

@cpuguy83: I don't think anything is wrong on Docker's side. It's just waiting for data from the server and not receiving any. This is the relevant part of the stack trace:

goroutine 36665 [runnable]:
syscall.Syscall(0x0, 0x39, 0xc8215f0c64, 0x539c, 0x539c, 0x539c, 0x0)
    /usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.read(0x39, 0xc8215f0c64, 0x539c, 0x539c, 0x72, 0x0, 0x0)
    /usr/local/go/src/syscall/zsyscall_linux_amd64.go:783 +0x5f
syscall.Read(0x39, 0xc8215f0c64, 0x539c, 0x539c, 0xffffffffffffff01, 0x0, 0x0)
    /usr/local/go/src/syscall/syscall_unix.go:161 +0x4d
net.(*netFD).Read(0xc8216800e0, 0xc8215f0c64, 0x539c, 0x539c, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/fd_unix.go:246 +0x18b
net.(*conn).Read(0xc820154b70, 0xc8215f0c64, 0x539c, 0x539c, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:172 +0xe4
crypto/tls.(*block).readFromUntil(0xc820b447b0, 0x7f8ea2f37f70, 0xc820154b70, 0x401d, 0x0, 0x0)
    /usr/local/go/src/crypto/tls/conn.go:460 +0xcc
crypto/tls.(*Conn).readRecord(0xc821bf2300, 0x2231817, 0x0, 0x0)
    /usr/local/go/src/crypto/tls/conn.go:607 +0x132f
crypto/tls.(*Conn).Read(0xc821bf2300, 0xc82224a000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/crypto/tls/conn.go:939 +0x167
net/http.noteEOFReader.Read(0x7f8ea2f38610, 0xc821bf2300, 0xc8221218c8, 0xc82224a000, 0x8000, 0x8000, 0x4000, 0x0, 0x0)
    /usr/local/go/src/net/http/transport.go:1690 +0x67
net/http.(*noteEOFReader).Read(0xc8223e3800, 0xc82224a000, 0x8000, 0x8000, 0xac00004000, 0x0, 0x0)
    <autogenerated>:284 +0xd0
bufio.(*Reader).Read(0xc82126c600, 0xc82224a000, 0x8000, 0x8000, 0xc821d03080, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:197 +0x126
io.(*LimitedReader).Read(0xc8223e3c80, 0xc82224a000, 0x8000, 0x8000, 0xc8229e1c68, 0x0, 0x0)
    /usr/local/go/src/io/io.go:426 +0xbd
net/http.(*body).readLocked(0xc822313a80, 0xc82224a000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/transfer.go:651 +0xa1
net/http.(*body).Read(0xc822313a80, 0xc82224a000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/transfer.go:643 +0x110
net/http.(*bodyEOFSignal).Read(0xc822313ac0, 0xc82224a000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/http/transport.go:1616 +0x26a
github.com/docker/distribution/registry/client/transport.(*onEOFReader).Read(0xc8223e3cc0, 0xc82224a000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
    /go/src/github.com/docker/docker/vendor/src/github.com/docker/distribution/registry/client/transport/transport.go:129 +0x70
github.com/docker/distribution/registry/client/transport.(*httpReadSeeker).Read(0xc821eb9440, 0xc82224a000, 0x8000, 0x8000, 0x4000, 0x0, 0x0)
    /go/src/github.com/docker/docker/vendor/src/github.com/docker/distribution/registry/client/transport/http_reader.go:86 +0xf7
io.copyBuffer(0x7f8ea2f459d0, 0xc8200264c0, 0x7f8ea04943a0, 0xc821eb9440, 0xc82224a000, 0x8000, 0x8000, 0x3ffbc53, 0x0, 0x0)
    /usr/local/go/src/io/io.go:380 +0x247
io.Copy(0x7f8ea2f459d0, 0xc8200264c0, 0x7f8ea04943a0, 0xc821eb9440, 0xc821eb9440, 0x0, 0x0)
    /usr/local/go/src/io/io.go:350 +0x64
github.com/docker/docker/pkg/ioutils.NewCancelReadCloser.func1(0xc8200264c0, 0x7f8ea0494588, 0xc821eb9440, 0x7f8ea0ea35c0, 0xc821af3500, 0xc8223e3d00)
    /go/src/github.com/docker/docker/pkg/ioutils/readers.go:111 +0x99
created by github.com/docker/docker/pkg/ioutils.NewCancelReadCloser
    /go/src/github.com/docker/docker/pkg/ioutils/readers.go:121 +0x2de

schmunk42 (Contributor) commented Nov 15, 2016

As mentioned above, there are combinations with fewer problems; AFAIR I haven't seen this at all on Docker for Mac.


schmunk42 (Contributor) commented Nov 30, 2016

I debugged this (one more time) today...

  • disabled all custom routing in our network -> no change ... several hosts can pull without hangs in our network 💥
  • tried to pull images while connected to a VPN -> no hangs (so far), but slow since the VPN has limited bandwidth 😕
  • created a custom Docker host based on Debian with Vagrant -> also no hangs at all from the same machine (also VirtualBox) 🎉

It looks to me like this is a Boot2Docker issue, despite the working VPN case, also because it doesn't happen on Docker for Mac, which runs on Alpine.

CC: @handcode

thaJeztah (Member) commented Dec 1, 2016

@schmunk42 I see you mention vagrant working, but perhaps worth checking #27734 (and https://www.virtualbox.org/ticket/16084)

schmunk42 (Contributor) commented Feb 9, 2017

Fixed for me with Vagrant 1.9.1 & VirtualBox 5.1

Thanks @thaJeztah

CC: @handcode, @motin, @Quexer69

thaJeztah (Member) commented Feb 14, 2017

Good to hear @schmunk42 !

@kojiromike is this still an issue for you?

kojiromike (Contributor) commented Feb 17, 2017

Nope, I haven't seen it in a while.

thaJeztah (Member) commented Feb 17, 2017

Let me go ahead and close this then, thanks!

thaJeztah closed this Feb 17, 2017

clifton commented Mar 27, 2017

FWIW, I'm running the latest Docker for Mac beta and this happens to me on a regular basis (usually when pulling from Amazon ECR).

jazoom commented Apr 18, 2017

Docker for Windows (beta). Happening a lot when pulling from GitLab.

samdark commented May 4, 2017

Having the same issue. Docker for Windows.

schmunk42 (Contributor) commented May 4, 2017

@samdark Which storage driver do you use? (`docker info`)
I've seen this mostly on aufs; maybe it's worth building a machine with docker-machine and overlay2.
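A provisioning sketch for that experiment (the machine name is arbitrary; assumes docker-machine's documented `--engine-storage-driver` option and a local VirtualBox install, so treat this as a fragment to adapt rather than run as-is):

```shell
# Create a fresh VirtualBox machine that uses overlay2 instead of aufs.
docker-machine create \
    --driver virtualbox \
    --engine-storage-driver overlay2 \
    overlay2-test

# Point the client at the new machine and verify the driver.
eval "$(docker-machine env overlay2-test)"
docker info          # check the "Storage Driver:" line
```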

samdark commented May 4, 2017

After about 10 tries with partial failures and Docker restarts, it worked out. It's not my image, so I can't really rebuild it.

Redundancy commented May 4, 2017

This was just happening to me in Docker for Windows. Stopping and starting again restarts all the layer downloads, including the previously completed layers. It took around 7 attempts to get node:6.

friism (Contributor) commented May 4, 2017

@samdark @Redundancy If you're using Docker for Windows and haven't already, can you please run the built-in diagnostics and open an issue with details that reproduce the problem (e.g. which image you are pulling) on the Docker for Windows issue tracker: https://github.com/docker/for-win/issues/new

samdark commented May 4, 2017

@friism already commented there. Thanks for the pointers.

thaJeztah (Member) commented May 4, 2017

Let me lock the conversation on this issue, because commenting on closed issues can easily lead to those comments being overlooked; also, important information may be missing from the comments (information that's requested in the issue template).

If you're running into this, open a new issue 👍

moby locked and limited conversation to collaborators May 4, 2017
