Concurrent pulls fail #9718

Closed
jhspaybar opened this Issue Dec 18, 2014 · 80 comments

@jhspaybar
Contributor

jhspaybar commented Dec 18, 2014

This issue seems to be triggered while pulling multiple images that share a common base image. A reproducible example is below. This has been reproduced on Ubuntu 14.04 with DeviceMapper and on the Amazon ECS-Optimized Amazon Linux AMI, on both Docker 1.3.3 and Docker 1.4.1.

    [ec2-user@ip-172-31-35-232 ~]$ docker version
    Client version: 1.4.1
    Client API version: 1.16
    Go version (client): go1.3.3
    Git commit (client): 5bc2ff8
    OS/Arch (client): linux/amd64
    Server version: 1.4.1
    Server API version: 1.16
    Go version (server): go1.3.3
    Git commit (server): 5bc2ff8
    [ec2-user@ip-172-31-35-232 ~]$ 
    [ec2-user@ip-172-31-35-232 ~]$ docker -D info
    Containers: 0
    Images: 0
    Storage Driver: devicemapper
    Pool Name: docker-202:1-1835239-pool
    Pool Blocksize: 65.54 kB
    Data file: /var/lib/docker/devicemapper/devicemapper/data
    Metadata file: /var/lib/docker/devicemapper/devicemapper/metadata
    Data Space Used: 307.2 MB
    Data Space Total: 107.4 GB
    Metadata Space Used: 725 kB
    Metadata Space Total: 2.147 GB
    Library Version: 1.02.82-git (2013-10-04)
    Execution Driver: native-0.2
    Kernel Version: 3.14.26-24.46.amzn1.x86_64
    Operating System: Amazon Linux AMI 2014.09
    CPUs: 1
    Total Memory: 1.957 GiB
    Name: ip-172-31-35-232
    ID: 2RCA:OSRS:4M6S:XHJG:MAUC:IYAA:KIGE:K2HM:VBRF:SCXN:LTRD:T3WP
    Debug mode (server): false
    Debug mode (client): true
    Fds: 57
    Goroutines: 58
    EventsListeners: 0
    Init Path: /usr/bin/docker
    Docker Root Dir: /var/lib/docker
    [ec2-user@ip-172-31-35-232 ~]$ uname -a
    Linux ip-172-31-35-232 3.14.26-24.46.amzn1.x86_64 #1 SMP Wed Dec 10 10:02:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
    [ec2-user@ip-172-31-35-232 ~]$ docker images
    REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
    [ec2-user@ip-172-31-35-232 ~]$ cat repro
    docker run nginx &
    docker run mysql &
    [ec2-user@ip-172-31-35-232 ~]$ sh repro
    [ec2-user@ip-172-31-35-232 ~]$ Unable to find image 'mysql:latest' locally
    Unable to find image 'nginx:latest' locally
    mysql:latest: The image you are pulling has been verified

    f10807909bc5: Pulling fs layer
    f6fab3b798be: Pulling fs layer
    d832c6f40cc3: Pulling fs layer
    cfab4f0d8972: Pulling fs layer
    cf8221608a63: Pulling fs layer
    4755e012c4cf: Pulling fs layer
    e012a865bac8: Pulling fs layer
    e2778dfe9ae9: Pulling fs layer
    b15206a696ec: Pulling fs layer
    9280b0f1af9a: Pulling fs layer
    9f1c487e78cd: Pulling fs layer
    10491040baf9: Pulling fs layer
    d8978d323757: Pulling fs layer
    f10807909bc5: Download complete
    f10807909bc5: Extracting [=================================================> ] 89.69 MB/90.17 MB 0
    d832c6f40cc3: Download complete
    f10807909bc5: Pull complete
    f10807909bc5: Download complete
    f6fab3b798be: Error downloading dependent layers
    d21beea329f5: Download complete
    04499cf33a0e: Download complete
    b15206a696ec: Downloading [====> ] 10.8 MB/118.7 MB 19s
    b15206a696ec: Download complete
    92360dceda1a: Download complete
    9495a555677d: Download complete
    911e2df03930: Download complete
    88daddf047c6: Download complete
    db8a64f62509: Download complete
    e46b3488b010: Download complete
    e46b3488b010: Error pulling image (latest) from nginx, Driver devicemapper failed to create image rootfs f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca: device f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca already exists
    FATA[0032] Error pulling image (latest) from nginx, Driver devicemapper failed to create image rootfs f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca: device f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca already exists
    a385835786ac: Error pulling image (latest) from mysql, Error mounting '/dev/mapper/docker-202:1-1835239-f6fab3b798be3174f45aa1eb731f8182705555f89c9026d8c1ef230cbf8301dd' on '/var/lib/docker/devicemapper/mnt/f6fab3b798be3174f45aa1eb731f8182705555f89c9026d8c1ef230cbf8301dd': invalid argument
    FATA[0034] Error pulling image (latest) from mysql, Error mounting '/dev/mapper/docker-202:1-1835239-f6fab3b798be3174f45aa1eb731f8182705555f89c9026d8c1ef230cbf8301dd' on '/var/lib/docker/devicemapper/mnt/f6fab3b798be3174f45aa1eb731f8182705555f89c9026d8c1ef230cbf8301dd': invalid argument
    ^C
    [ec2-user@ip-172-31-35-232 ~]$ docker ps -a
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    [ec2-user@ip-172-31-35-232 ~]$ docker images
    REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
    <none> <none> f10807909bc5 5 weeks ago 85.1 MB
@unclejack
Contributor

unclejack commented Dec 18, 2014

@alexlarsson @rhvgoyal @snitm @vbatts This seems to be caused by the devicemapper driver itself. Do you happen to have an idea what might be causing this?

@elgalu
Contributor

elgalu commented Dec 18, 2014

I can reproduce this (it seems somewhat random). If I retry the docker build, at some point it works OK.
Host:

  • Ubuntu 14.04.1 LTS
  • Docker version 1.4.1, build 5bc2ff8
@snitm
Contributor

snitm commented Dec 18, 2014

@unclejack I'll take a closer look tomorrow, thanks for letting me know!

@snitm
Contributor

snitm commented Dec 18, 2014

(my immediate thought is: shouldn't existing locking enable one of the pulls to win the race for creating the base image, etc? we'll see...)

@jhspaybar
Contributor

jhspaybar commented Dec 18, 2014

https://github.com/docker/docker/blob/master/daemon/graphdriver/devmapper/deviceset.go#L1101

This is where the error surfaces. The locks don't help because they are taken inside this function, but both pulls enter the method in the first place: a read much higher in the stack finds nothing for either pull. So the locking itself is correct, but the function can return a spurious error. I changed line 1101 to return nil and the pull got a bit further, then ran into issues around container creation. This might be the right path, but it requires more than the one-line change I tested.
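To make the race jhspaybar describes concrete: both pulls miss the image lookup higher in the stack, both descend into the driver, and the loser of the lock gets an "already exists" error for a device that is now in exactly the state it wanted. A minimal Go sketch of the idempotent-create alternative he is experimenting with (illustrative only, not the actual deviceset.go code; the Registry type and its fields are hypothetical):

package devices

import "sync"

// Registry is a simplified stand-in for the devicemapper device set.
// All names here are hypothetical.
type Registry struct {
	mu      sync.Mutex
	devices map[string]struct{}
}

func NewRegistry() *Registry {
	return &Registry{devices: make(map[string]struct{})}
}

// CreateDevice is idempotent: if a concurrent pull already created the
// device for this layer hash, the second caller treats that as success
// instead of failing with "device ... already exists".
func (r *Registry) CreateDevice(hash string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.devices[hash]; ok {
		// Lost the race, but the state we wanted already holds.
		return nil
	}
	// A real driver would issue create_thin/create_snap here.
	r.devices[hash] = struct{}{}
	return nil
}

Simply swallowing the error is not quite enough, though, which matches the observation that the pull then only "got a bit further": the losing pull would also have to wait until the winner has fully populated the device before building on top of it.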

@metalivedev
Contributor

metalivedev commented Dec 18, 2014

With some help, I've reproduced this as well.
Here is an example script, tested on an Ubuntu 14.04 droplet on Digital Ocean:

#!/bin/bash
wget https://get.docker.com/builds/Linux/x86_64/docker-latest -O /usr/bin/docker
chmod +x /usr/bin/docker 
apt-get update && apt-get install cgroup-lite
docker -dD -s devicemapper > /var/log/docker.log 2>&1 &
docker kill $(docker ps -q)
docker rm $(docker ps -aq)
docker rmi $(docker images -q)
# This runs both at the same time
docker run nginx & docker run mysql &

With results like those quoted above:

a385835786ac: Error pulling image (latest) from mysql, Image f6fab3b798be3174f45aa1eb731f8182705555f89c9026d8c1ef230cbf8301dd already exists
FATA[0010] Error pulling image (latest) from mysql, Image f6fab3b798be3174f45aa1eb731f8182705555f89c9026d8c1ef230cbf8301dd already exists
e46b3488b010: Error pulling image (latest) from nginx, endpoint: https://registry-1.docker.io/v1/, Driver devicemapper failed to create image rootfs 04499cf33a0ea15a4274f2fe7e66789ec031f08ae64623c68d3c38aeacd5ecb6: device 04499cf33a0ea15a4274f2fe7e66789ec031f08ae64623c68d3c38aeacd5ecb6 already exists
FATA[0055] Error pulling image (latest) from nginx, Driver devicemapper failed to create image rootfs 04499cf33a0ea15a4274f2fe7e66789ec031f08ae64623c68d3c38aeacd5ecb6: device 04499cf33a0ea15a4274f2fe7e66789ec031f08ae64623c68d3c38aeacd5ecb6 already exists

I seem to be able to reproduce this even if I've pre-downloaded the images.

@jessfraz jessfraz added this to the 1.4.2 milestone Dec 18, 2014

@unclejack
Contributor

unclejack commented Dec 19, 2014

@snitm Do you need some more information regarding this? Is there anything else we can do to help with this?

@snitm
Contributor

snitm commented Dec 19, 2014

Honestly, this may be a task for @vbatts, given there appears to be a fundamental concurrency problem with the devicemapper graph driver, but I'll take a closer look now (I was busy earlier today fixing a kernel issue).

@snitm
Contributor

snitm commented Dec 19, 2014

I've reproduced the issue. Looking at the code now.

@snitm
Contributor

snitm commented Dec 19, 2014

The base image isn't being properly established, so any attempts to create snapshots of it are failing:

docker run nginx & docker run nginx &

...

Dec 19 16:16:20 rhel-storage-02 kernel: device-mapper: thin: Creation of new snapshot 14 of device 1 failed.
Dec 19 16:16:20 rhel-storage-02 docker-1.4.1: time="2014-12-19T16:16:20-05:00" level="error" msg="Error from V2 registry: Driver devicemapper failed to create image rootfs f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca: Error running DeviceCreate (createSnapDevice) dm_task_run failed"

Dec 19 16:16:25 rhel-storage-02 kernel: device-mapper: thin: Creation of new snapshot 15 of device 1 failed.
Dec 19 16:16:25 rhel-storage-02 docker-1.4.1: No such image: nginx (tag: latest)

trying dmsetup manually:

dmsetup message thin-pool 0 create_snap 14 1

device-mapper: message ioctl on thin-pool failed: No data available
Command failed
Dec 19 16:37:42 rhel-storage-02 kernel: device-mapper: thin: Creation of new snapshot 14 of device 1 failed.

dmsetup message thin-pool 0 create_snap 15 1

device-mapper: message ioctl on thin-pool failed: No data available
Command failed
Dec 19 16:37:49 rhel-storage-02 kernel: device-mapper: thin: Creation of new snapshot 15 of device 1 failed.

But if I first create the device id 1 all is fine:

dmsetup message thin-pool 0 create_thin 1

echo $?

0

dmsetup message thin-pool 0 create_snap 15 1

echo $?

0

So, as I mentioned yesterday (above), I'd have thought one of these competing image pulls would've won the race to establish the base image (via deviceset.go:setupBaseImage).

Is docker even calling NewDeviceSet() with doInit=True in this test case?

@snitm
Contributor

snitm commented Dec 19, 2014

I'm not sure how I got into that state... but the first thing docker does on the very first start is the equivalent of: dmsetup message 0 create_thin 1

I feel like I'm in the twilight zone...

Anyway, here is /var/log/messages (with dm-thin-pool kernel code modified to log all "dmsetup message"s received) after a fresh removal of /var/lib/docker, a fresh recreate of /dev/mapper/thin-pool, and a new start of docker followed by:

docker run nginx & docker run mysql &

Dec 19 17:19:22 rhel-storage-02 systemd: Starting Docker Socket for the API.
Dec 19 17:19:22 rhel-storage-02 systemd: Listening on Docker Socket for the API.
Dec 19 17:19:22 rhel-storage-02 systemd: Starting Docker Application Container Engine...
Dec 19 17:19:23 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:19:23-05:00" level="info" msg="+job serveapi(fd://)"
Dec 19 17:19:23 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:19:23-05:00" level="info" msg="Listening for HTTP on fd ()"
Dec 19 17:19:23 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:19:23 rhel-storage-02 kernel: create_thin 1
Dec 19 17:19:23 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:19:23 rhel-storage-02 kernel: set_transaction_id 0 1
Dec 19 17:19:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:19:35-05:00" level="info" msg="+job init_networkdriver()"
Dec 19 17:19:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:19:35-05:00" level="info" msg="-job init_networkdriver() = OK (0)"
Dec 19 17:19:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:19:35-05:00" level="info" msg="Loading containers: start."
Dec 19 17:19:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:19:35-05:00" level="info" msg="Loading containers: done."
Dec 19 17:19:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:19:35-05:00" level="info" msg="docker daemon: 1.4.1 5bc2ff8-dirty; execdriver: native-0.2; graphdriver: devicemapper"
Dec 19 17:19:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:19:35-05:00" level="info" msg="+job acceptconnections()"
Dec 19 17:19:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:19:35-05:00" level="info" msg="-job acceptconnections() = OK (0)"
Dec 19 17:19:35 rhel-storage-02 systemd: Started Docker Application Container Engine.
Dec 19 17:21:34 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:34-05:00" level="info" msg="POST /v1.15/containers/create"
Dec 19 17:21:34 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:34-05:00" level="info" msg="+job create()"
Dec 19 17:21:34 rhel-storage-02 docker-1.4.1: No such image: nginx (tag: latest)
Dec 19 17:21:34 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:34-05:00" level="info" msg="-job create() = ERR (1)"
Dec 19 17:21:34 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:34-05:00" level="error" msg="Handler for POST /containers/create returned error: No such image: nginx (tag: latest)"
Dec 19 17:21:34 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:34-05:00" level="error" msg="HTTP Error: statusCode=404 No such image: nginx (tag: latest)"
Dec 19 17:21:34 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:34-05:00" level="info" msg="POST /v1.15/images/create?fromImage=nginx&tag=latest"
Dec 19 17:21:34 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:34-05:00" level="info" msg="+job pull(nginx, latest)"
Dec 19 17:21:34 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:34-05:00" level="info" msg="+job trust_update_base()"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="-job trust_update_base() = OK (0)"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="POST /v1.15/containers/create"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="+job create()"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: No such image: mysql (tag: latest)
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="-job create() = ERR (1)"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="error" msg="Handler for POST /containers/create returned error: No such image: mysql (tag: latest)"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="error" msg="HTTP Error: statusCode=404 No such image: mysql (tag: latest)"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="POST /v1.15/images/create?fromImage=mysql&tag=latest"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="+job pull(mysql, latest)"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="+job trust_update_base()"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="+job trust_key_check(/library/nginx)"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="-job trust_update_base() = OK (0)"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="-job trust_key_check(/library/nginx) = OK (0)"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="+job trust_key_check(/library/mysql)"
Dec 19 17:21:35 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:21:35-05:00" level="info" msg="-job trust_key_check(/library/mysql) = OK (0)"
Dec 19 17:21:37 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:21:37 rhel-storage-02 kernel: create_snap 2 1
Dec 19 17:21:37 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:21:37 rhel-storage-02 kernel: set_transaction_id 1 2
Dec 19 17:21:37 rhel-storage-02 kernel: XFS (dm-8): Mounting V4 Filesystem
Dec 19 17:21:37 rhel-storage-02 kernel: XFS (dm-8): Ending clean mount
Dec 19 17:22:43 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:22:43 rhel-storage-02 kernel: create_snap 3 2
Dec 19 17:22:43 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:22:43 rhel-storage-02 kernel: set_transaction_id 2 3
Dec 19 17:22:43 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:22:43-05:00" level="error" msg="Error from V2 registry: Driver devicemapper failed to create image rootfs d832c6f40cc364b7c6426a5dde8caab40650e443f6de6a7b1f40f653d5cc2a8f: Unknown device f6fab3b798be3174f45aa1eb731f8182705555f89c9026d8c1ef230cbf8301dd"
Dec 19 17:22:43 rhel-storage-02 kernel: XFS (dm-8): Mounting V4 Filesystem
Dec 19 17:22:43 rhel-storage-02 kernel: XFS (dm-8): Ending clean mount
Dec 19 17:22:44 rhel-storage-02 kernel: XFS (dm-9): Mounting V4 Filesystem
Dec 19 17:22:44 rhel-storage-02 kernel: XFS (dm-9): Ending clean mount
Dec 19 17:22:50 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:22:50 rhel-storage-02 kernel: delete 3
Dec 19 17:22:50 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:22:50 rhel-storage-02 kernel: set_transaction_id 3 4
Dec 19 17:22:50 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:22:50-05:00" level="error" msg="Error from V2 registry: Driver devicemapper failed to create image rootfs f6fab3b798be3174f45aa1eb731f8182705555f89c9026d8c1ef230cbf8301dd: Unknown device f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca"
Dec 19 17:22:50 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:22:50 rhel-storage-02 kernel: create_snap 4 2
Dec 19 17:22:50 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:22:50 rhel-storage-02 kernel: set_transaction_id 4 5
Dec 19 17:22:50 rhel-storage-02 kernel: XFS (dm-8): Mounting V4 Filesystem
Dec 19 17:22:50 rhel-storage-02 kernel: XFS (dm-8): Ending clean mount
Dec 19 17:23:02 rhel-storage-02 kernel: XFS (dm-9): Mounting V4 Filesystem
Dec 19 17:23:02 rhel-storage-02 kernel: XFS (dm-9): Ending clean mount
Dec 19 17:23:08 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:23:08 rhel-storage-02 kernel: delete 4
Dec 19 17:23:08 rhel-storage-02 kernel: device-mapper: thin: pool_message: received:
Dec 19 17:23:08 rhel-storage-02 kernel: set_transaction_id 5 6
Dec 19 17:23:08 rhel-storage-02 docker-1.4.1: Error pulling image (latest) from mysql, rename /var/lib/docker/graph/_tmp/7870829cf5d08d7efa51320c50bb053e9825d7757bab8accaec3be4ed33ac4c2 /var/lib/docker/graph/f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca: file exists
Dec 19 17:23:08 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:23:08-05:00" level="info" msg="-job pull(mysql, latest) = ERR (1)"
Dec 19 17:23:09 rhel-storage-02 docker-1.4.1: Error pulling image (latest) from nginx, Driver devicemapper failed to create image rootfs f6fab3b798be3174f45aa1eb731f8182705555f89c9026d8c1ef230cbf8301dd: Unknown device f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca
Dec 19 17:23:09 rhel-storage-02 docker-1.4.1: time="2014-12-19T17:23:09-05:00" level="info" msg="-job pull(nginx, latest) = ERR (1)"

The "Error pulling image (latest) from mysql, rename ..." at 17:23:08 was odd...

@snitm
Contributor

snitm commented Dec 19, 2014

And retrying 2 additional times yields the same result (again with the two pulls' output interleaved), e.g.:

Error pulling image (latest) from nginx, rename /var/lib/docker/graph/_tmp/d0f0ac4673f8a8c389385a336a9f2b48460e10abde44ab71c1cef5f631630e58 /var/lib/docker/graph/f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca: file exists
[2014-12-19T17:34:59.742552029-05:00] [fatal] log.go:64 Error pulling image (latest) from nginx, rename /var/lib/docker/graph/_tmp/d0f0ac4673f8a8c389385a336a9f2b48460e10abde44ab71c1cef5f631630e58 /var/lib/docker/graph/f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca: file exists
a385835786ac: Error pulling image (latest) from mysql, Driver devicemapper failed to create image rootfs f6fab3b798be3174f45aa1eb731f8182705555f89c9026d8c1ef230cbf8301dd: Unknown device f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca
[2014-12-19T17:35:00.909822345-05:00] [fatal] log.go:64 Error pulling image (latest) from mysql, Driver devicemapper failed to create image rootfs f6fab3b798be3174f45aa1eb731f8182705555f89c9026d8c1ef230cbf8301dd: Unknown device f10807909bc552de261ca7463effc467600c3dca68d8e7704425283a6911d2ca

I unfortunately need to transition to holiday break mode... these failures seem like docker core isn't driving the requests to the graphdriver properly. Unless I'm missing something that the devmapper graphdriver is doing wrong...

@snitm
Contributor

snitm commented Dec 19, 2014

Does docker core put synchronization of 2 competing "docker run"s completely on the graphdriver? If so, then maybe the devmapper graphdriver has always had inadequate locking? @vbatts any insight on this?

@rhvgoyal
Contributor

rhvgoyal commented Jan 5, 2015

I think this issue is not related to devicemapper. The crux of the issue is that there is a common fs layer to be downloaded for the two images (nginx and mysql). In my tests that layer is "1aeada447715". In the latest docker I can see that the first nginx instance downloads and extracts that layer, then docker mysql tries to download and extract the same layer, finds that the directory is not empty, and fails:

Error pulling image (latest) from mysql, rename /var/lib/docker/graph/_tmp/31889e84f5404c06a1b73c39bf57018597a749360999e03a7b17a57978c4237f /var/lib/docker/graph/1aeada4477158496dc31ee5c6e7174240140d83fddf94bc57fc02bee1b04e44f: directory not empty

If the two docker instances are run one after another, the second one recognizes that the image layer has already been downloaded and extracted, skips it, and things work.

So docker core needs to fix this so that two instances don't try to extract the same layer twice and run into various kinds of issues because that layer has already been extracted.
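What rhvgoyal proposes amounts to deduplicating in-flight layer work in docker core: the first pull to request a layer downloads and extracts it, and any concurrent pull of the same layer waits for that result instead of racing it. A minimal Go sketch of the pattern (illustrative only, not docker's code; the names are hypothetical, and a real version would also propagate the winner's error to the waiters):

package graph

import "sync"

// inflight tracks layer downloads currently in progress so that a second
// pull of the same layer waits for the first instead of racing it.
type inflight struct {
	mu     sync.Mutex
	active map[string]chan struct{} // layer ID -> closed when done
}

func newInflight() *inflight {
	return &inflight{active: make(map[string]chan struct{})}
}

// PullLayer runs fetch at most once per layer ID across concurrent
// callers; everyone else blocks until the winner finishes.
func (f *inflight) PullLayer(id string, fetch func() error) error {
	f.mu.Lock()
	if done, ok := f.active[id]; ok {
		f.mu.Unlock()
		<-done     // another pull is extracting this layer; wait for it
		return nil // a real version would return the winner's error
	}
	done := make(chan struct{})
	f.active[id] = done
	f.mu.Unlock()

	defer func() {
		f.mu.Lock()
		delete(f.active, id)
		f.mu.Unlock()
		close(done) // wake all waiters
	}()
	return fetch()
}

The "Layer already being pulled by another client. Waiting." message quoted later in this thread suggests docker did eventually adopt this kind of wait-for-the-winner behavior.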

@rhvgoyal
Contributor

rhvgoyal commented Jan 5, 2015

One can run the two commands in separate terminals to clearly see that a common fs layer is being downloaded and extracted twice.

@vbatts
Contributor

vbatts commented Jan 7, 2015

Okay. If this issue is not specific to a graphdriver but is a race in ./graph/, then let's get some tests written to expose it.
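A test exposing the race can follow the repro script directly: issue the two pulls concurrently and fail on any error. A rough sketch of such a test (assuming a host with a docker binary on PATH and an empty image cache; this is not a test from the docker tree):

package graph_test

import (
	"os/exec"
	"sync"
	"testing"
)

// TestConcurrentPull pulls two images that share a base layer at the same
// time; before the race is fixed, one of the pulls fails with errors like
// "device ... already exists" or "directory not empty".
func TestConcurrentPull(t *testing.T) {
	images := []string{"nginx:latest", "mysql:latest"}

	var wg sync.WaitGroup
	errs := make(chan error, len(images))
	for _, img := range images {
		wg.Add(1)
		go func(img string) {
			defer wg.Done()
			out, err := exec.Command("docker", "pull", img).CombinedOutput()
			if err != nil {
				t.Logf("pull %s: %v\n%s", img, err, out)
				errs <- err
			}
		}(img)
	}
	wg.Wait()
	close(errs)

	for err := range errs {
		t.Errorf("concurrent pull failed: %v", err)
	}
}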

@vbatts vbatts added the area/storage label Jan 7, 2015

evanphx added a commit to vektra/strive that referenced this issue Jan 9, 2015

Serialize any image pulling
There are known issues with concurrent docker pulls. Keep things simple
for now.

moby/moby#9718
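The strive commit above takes the bluntest workaround: never run two pulls at once. The essence is a process-wide mutex around the pull call, along these lines (a sketch of the approach, not the actual strive code):

package scheduler

import (
	"os/exec"
	"sync"
)

// pullMu serializes every image pull in this process, trading pull
// parallelism for safety while the daemon-side race is unfixed.
var pullMu sync.Mutex

func pullImage(image string) ([]byte, error) {
	pullMu.Lock()
	defer pullMu.Unlock()
	return exec.Command("docker", "pull", image).CombinedOutput()
}

This gives up all pull parallelism, but as the commit message says, it keeps things simple until the race is fixed.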

@crosbymichael crosbymichael modified the milestones: 1.4.2, 1.5.0 Jan 12, 2015

@crosbymichael crosbymichael removed this from the 1.5.0 milestone Jan 21, 2015

@termie
Contributor

termie commented Mar 10, 2015

-subscribe-

@JeanMertz

JeanMertz commented May 14, 2015

It has been pretty quiet in this issue for some time now. However, when booting up a fresh CoreOS instance, and pulling/starting several (5+) images at the same time, I am still seeing issues.

A snippet from our logs:

May 14 08:17:06 manager-01 dockerd[967]: time="2015-05-14T08:17:06Z" level=error msg="Error from V2 registry: Image 511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158 already exists"
May 14 08:17:06 manager-01 dockerd[967]: Error pulling image (latest) from gliderlabs/registrator, Driver overlay failed to create image rootfs 46e263e5de56e0bb41d61486776d70185b196d792cbbc1b48ab30effb04e882d: lstat /var/lib/docker/overlay/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158: no such file or directory
May 14 08:17:06 manager-01 dockerd[967]: time="2015-05-14T08:17:06Z" level=error msg="Error from V2 registry: Driver overlay failed to create image rootfs 46e263e5de56e0bb41d61486776d70185b196d792cbbc1b48ab30effb04e882d: lstat /var/lib/docker/overlay/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158: no such file or directory"
May 14 08:17:09 manager-01 dockerd[967]: Error pulling image (latest) from blendle/aws-cli, Driver overlay failed to create image rootfs a8be93256def01920787a50e83e749075ffe63596c36fdd01885f4826074e4dd: lstat /var/lib/docker/overlay/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158: no such file or directory
May 14 08:17:10 manager-01 dockerd[967]: Error pulling image (latest) from progrium/consul, Driver overlay failed to create image rootfs 46e263e5de56e0bb41d61486776d70185b196d792cbbc1b48ab30effb04e882d: lstat /var/lib/docker/overlay/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158: no such file or directory
May 14 08:17:11 manager-01 dockerd[967]: Error pulling image (latest) from gliderlabs/registrator, Driver overlay failed to create image rootfs 46e263e5de56e0bb41d61486776d70185b196d792cbbc1b48ab30effb04e882d: lstat /var/lib/docker/overlay/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158: no such file or directory
May 14 08:17:17 manager-01 dockerd[967]: Error pulling image (latest) from gliderlabs/registrator, Driver overlay failed to create image rootfs 46e263e5de56e0bb41d61486776d70185b196d792cbbc1b48ab30effb04e882d: lstat /var/lib/docker/overlay/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158: no such file or directory
May 14 09:48:18 manager-01 dockerd[967]: Error pulling image (latest) from gliderlabs/registrator, Driver overlay failed to create image rootfs 46e263e5de56e0bb41d61486776d70185b196d792cbbc1b48ab30effb04e882d: lstat /var/lib/docker/overlay/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158: no such file or directory
May 14 09:48:33 manager-01 dockerd[967]: Error pulling image (latest) from gliderlabs/registrator, Driver overlay failed to create image rootfs 46e263e5de56e0bb41d61486776d70185b196d792cbbc1b48ab30effb04e882d: lstat /var/lib/docker/overlay/511136ea3c5a64f264b78b5433614aec563103b4d4702f3ba7d4d2698e22c158: no such file or directory
% docker info
Containers: 2
Images: 11
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Kernel Version: 4.0.1
Operating System: CoreOS 668.2.0
CPUs: 2
Total Memory: 7.308 GiB
Name: manager-01
ID: 4T43:7O24:ZTRK:KT5A:OGFC:QLCZ:LHWF:4MRP:2GCA:SWRB:TF45:QDLX
% docker version
Client version: 1.6.0
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): 4749651-dirty
OS/Arch (client): linux/amd64
Server version: 1.6.0
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): 4749651-dirty
OS/Arch (server): linux/amd64
% cat /etc/lsb-release 
DISTRIB_ID=CoreOS
DISTRIB_RELEASE=668.2.0
DISTRIB_CODENAME="Red Dog"
DISTRIB_DESCRIPTION="CoreOS 668.2.0"
% docker images
REPOSITORY           TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
newrelic/nrsysmond   latest              ad8731e9cadf        12 days ago         193 MB
busybox              latest              8c2e06607696        3 weeks ago         2.43 MB
<none>               <none>              511136ea3c5a        23 months ago       0 B

@alicegklein

alicegklein commented May 18, 2015

On 2 machines, when I run
$ docker run -it ubuntu bash
it hangs pulling an fs layer.

The last part of the run output:
07f8e8c5e660: Download complete
07f8e8c5e660: Download complete
07f8e8c5e660: Verifying Checksum
07f8e8c5e660: Downloading 32 B/32 B
07f8e8c5e660: Layer already being pulled by another client. Waiting.
07f8e8c5e660: Pulling fs layer
latest: Pulling from ubuntu
07f8e8c5e660: Pulling fs layer

@gregarndt

gregarndt commented May 22, 2015

This seems to be somewhat related to the original issue described, although not specific to devicemapper. We have been using 1.6.1 in production for a couple of weeks and did not hit an issue until we moved to the new registry at registry.hub.docker.com (we were previously using quay.io on v1).

When a machine attempts to make multiple concurrent pulls for the image, we see this on some instances:

64dc4f9ad22a - Error pulling image (0.3.3) from taskcluster/tester, endpoint: https://registry-1.docker.io/v1/, Driver aufs failed to create image rootfs a5208e8002346d5588c27b5a0d6b49c0735e44ccb776bd746b23b67af3cf2440: open /mnt/var/lib/docker/aufs/layers/93c381d2c25533cc6fcf6ea8d208745601b3cd07d43209a3b7d1b09c58d634af: no such file or directory

This is from the docker logs:
ERRO[7781] Error from V2 registry: Driver aufs failed to create image rootfs a5208e8002346d5588c27b5a0d6b49c0735e44ccb776bd746b23b67af3cf2440: open /mnt/var/lib/docker/aufs/layers/93c381d2c25533cc6fcf6ea8d208745601b3cd07d43209a3b7d1b09c58d634af: no such file or directory
Error pulling image (0.3.3) from taskcluster/tester, Driver aufs failed to create image rootfs a5208e8002346d5588c27b5a0d6b49c0735e44ccb776bd746b23b67af3cf2440: open /mnt/var/lib/docker/aufs/layers/93c381d2c25533cc6fcf6ea8d208745601b3cd07d43209a3b7d1b09c58d634af: no such file or directory

This leaves an untagged image () listed, and the image can never be downloaded unless docker is restarted. I've also noticed that while this image is pulling, there is already a running container that must share a layer with it: when attempting to remove the untagged image, I get a warning that it can't be removed because a container is using one of its layers. The other container is up and running fine, though.

If this should be moved to a new issue, let me know; if there is other information I could provide, I would be happy to do so.

These are ec2 nodes where we're seeing the issue.

$ docker info
Containers: 3
Images: 16
Storage Driver: aufs
Root Dir: /mnt/var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 21
Dirperm1 Supported: false
Execution Driver: native-0.2
Kernel Version: 3.13.0-48-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 4
Total Memory: 14.69 GiB

$ docker version
Client version: 1.6.1
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): 97cd073
OS/Arch (client): linux/amd64
Server version: 1.6.1
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): 97cd073
OS/Arch (server): linux/amd64

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.2 LTS"

@thaJeztah

thaJeztah commented May 24, 2015

Member

@vbatts @stevvooe perhaps you can have a look here;

and did not hit an issue until we moved to the new registry at registry.hub.docker.com (we were previously using quay.io on v1).

I think this is because pulling from the new registry fetches up to 5 layers in parallel; an issue that was previously uncommon may have become much more likely because of that.
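For what it's worth, the layer-download parallelism later became tunable: daemons from Docker 1.12 onward (well after the versions discussed in this thread) accept a flag that forces serial layer downloads, which can help when testing a theory like this one:

dockerd --max-concurrent-downloads=1   # 1.12+ only; the default is 3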

@stevvooe

stevvooe commented May 26, 2015

Contributor

@thaJeztah Is it possible there is a bookkeeping mistake in the layer references for the downloaded layer that has encountered an error? If you kill the other container referencing that image, does it allow the pull?

@gregarndt

gregarndt commented May 26, 2015

Yup, if the other container is killed (which was created from a different image), the original image that had issues can be pulled.
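In script form, that workaround looks roughly like this (the container name and image ID are placeholders):

docker rm -f <blocking-container>       # the container holding the shared layer
docker rmi <untagged-image-id>          # clear the stuck untagged image, if any
docker pull taskcluster/tester:0.3.3    # the pull should now go through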

@alicegklein

alicegklein commented May 27, 2015

Thanks Greg, yes it now completes.

@termie

termie commented Jun 1, 2015

Contributor

This is becoming an increasingly problematic issue for my company. Is there a workaround that will let me specify my own files rather than going through the docker graph store?

@msiedlarek

msiedlarek commented Jun 2, 2015

+1

Docker: 1.6.2
Storage Driver: overlay
Backing Filesystem: extfs
Execution Driver: native-0.2
Kernel Version: 4.0.3

We also use the new registry v2.0.1.

This issue basically breaks our deployment and, I would think, any serious Docker deployment beyond a single-container to-do app.

@vbatts

vbatts commented Jun 2, 2015

Contributor

I'm struggling to reproduce this. It is not a devicemapper-specific issue, but likely a racy, lack-of-locking issue with storage drivers in general. I've tried concurrent pulls of the same image name and of differently named images that reference the same layer IDs, and I consistently see the message that the image or layer is already being fetched.

Could you provide concise reproduction steps?
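For reference, the two shapes of concurrency described there, sketched with illustrative image names:

# same image name, raced against itself:
docker pull busybox:latest & docker pull busybox:latest & wait
# differently named images assumed to share a base (and therefore layer IDs):
docker pull example/app-a & docker pull example/app-b & wait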

@gregarndt

gregarndt commented Jun 3, 2015

@vbatts I'll attempt to put together a more reproducible set of steps. So far we're only seeing this on our AWS ec2 nodes, and not on every pull; the only thing I've noticed that works is restarting docker.

@skippy

skippy commented Sep 9, 2015

Thanks, folks; fingers crossed on the patches. This is hitting me pretty hard on 1.8.1, as I use alpine:3.2 as a common base image. Deleting that image and then pulling the stuck images works (as long as they are done serially). Other info if needed:

$ docker info
Containers: 2
Images: 49
Storage Driver: overlay
 Backing Filesystem: extfs
Execution Driver: native-0.2
Kernel Version: 4.0.5
Operating System: CoreOS 723.3.0
CPUs: 2
Total Memory: 3.863 GiB
Name: ip-10-0-11-133.us-west-2.compute.internal
ID: GQOC:A7Q3:2NEU:GFK6:QILO:VGQE:PSON:DFG6:DWHZ:3FMG:ZK2F:PFXQ
$ docker version
Client version: 1.6.2
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): 7c8fca2-dirty
OS/Arch (client): linux/amd64
Server version: 1.6.2
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): 7c8fca2-dirty
OS/Arch (server): linux/amd64
$ cat /etc/lsb-release
DISTRIB_ID=CoreOS
DISTRIB_RELEASE=723.3.0
DISTRIB_CODENAME="Red Dog"
DISTRIB_DESCRIPTION="CoreOS 723.3.0"
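The recovery described above, as a sketch (the base image is from the comment; the service images are hypothetical):

docker rmi alpine:3.2           # drop the shared base image
docker pull example/service-a   # then re-pull the stuck images one at a time,
docker pull example/service-b   # serially rather than in parallel
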
@dchen1107

dchen1107 commented Sep 15, 2015

Contributor

Our customers are still hitting the same issue with docker 1.8.2: kubernetes/kubernetes#10623 (comment)

@mboersma

mboersma commented Sep 28, 2015

We are seeing the same issue consistently, and it's what has kept our product Deis on Docker 1.5.0, where apparently we will stay until this is fixed. (Nothing to do with devicemapper, BTW: this is overlayfs.)

Our product consists of several systemd units, each of which downloads its Docker image when started. Because we follow Docker best practices, the images share base layers in common. So these multiple docker pull operations immediately collide and corrupt the Docker graph. What the user sees is a systemd unit that constantly restarts.

This all works just fine with Docker 1.5 / Registry v0.9.1. We can't upgrade to the new Go registry or newer versions of Docker because it will trigger this bug simply by deploying multiple app units with a deis scale operation. See deis/deis#3814.
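One way to keep such units from colliding, anticipating the flock workaround suggested further down in this thread, is to serialize each unit's pull behind a shared lock. A sketch of a unit fragment, with a hypothetical image name:

[Service]
ExecStartPre=/usr/bin/flock /var/run/lock/docker /usr/bin/docker pull example/app:latest
ExecStart=/usr/bin/docker run --rm --name app example/app:latest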

@jmitchell

jmitchell commented Sep 28, 2015

Does this need to be assigned a new milestone? 1.7.1 is closed.

@LK4D4 LK4D4 modified the milestones: 1.7.1, 1.9.0 Sep 28, 2015

@LK4D4

LK4D4 commented Sep 28, 2015

Contributor

@jmitchell Yup, thank you. People are reporting that it works in master, though.

@d3ming

d3ming commented Oct 2, 2015

We are getting this issue fairly consistently with 1.8.2, whereas it used to work on an older version (I believe 1.6-ish).

@pditommaso

pditommaso commented Oct 2, 2015

Same here. I confirm that 1.6.2 is the last working version.

@stdexcept

stdexcept commented Oct 6, 2015

This issue seriously affects us when new machines join our CoreOS cluster. This is definitely blocking us, as it is, I imagine, for many others.

@bitglue

bitglue commented Oct 6, 2015

An expedient hack in many cases is to implement locking external to docker on any pulls or builds. flock(1) can accomplish this pretty easily, so wherever you have

docker pull ...
docker build ...

pick some file (it doesn't really matter which, as long as it's the same one everywhere) and change those commands to:

flock /var/run/lock/docker docker pull ...
flock /var/run/lock/docker docker build ...

This has the effect of serializing all pull and build operations, avoiding the buggy concurrency controls in docker.
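Taking that a step further, the lock can live inside a wrapper script so callers don't have to change at all. A minimal sketch, assuming the real binary is at /usr/bin/docker and the wrapper sits earlier on PATH:

#!/bin/sh
# docker shim: serialize pull/build through one lock file; pass everything else through
case "$1" in
  pull|build)
    exec flock /var/run/lock/docker /usr/bin/docker "$@"
    ;;
  *)
    exec /usr/bin/docker "$@"
    ;;
esac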

@pditommaso

pditommaso commented Oct 6, 2015

@bitglue The Docker runtime should do that.

@bitglue

bitglue commented Oct 6, 2015

@pditommaso you know what an expedient hack is, right?

@pditommaso

pditommaso commented Oct 6, 2015

@bitglue Um yes, but in my use case that hack is not an option.

@burke

burke commented Oct 9, 2015

Contributor

This was fixed by #15955 AFAICT.

@vbatts

vbatts commented Oct 9, 2015

Contributor

Burke, have you checked this on master?

@burke

burke commented Oct 9, 2015

Contributor

Yep, I tried my repro from #16893 and it's fixed.

@vbatts

vbatts commented Oct 9, 2015

Contributor

Good to hear. Then docker-1.9 it is.

@icecrime

icecrime commented Oct 9, 2015

Contributor

Thanks @burke! :-)

@icecrime icecrime closed this Oct 10, 2015

helgi added a commit to helgi/controller that referenced this issue Sep 27, 2016

ref(registry): remove simpleflock and depend on Docker 1.9 minimum
Concurrent pull issue was resolved in 1.9 (moby/moby#9718) and Kubernetes is validated against that as well

helgi added a commit to deis/controller that referenced this issue Sep 29, 2016

ref(registry): remove simpleflock and depend on Docker 1.9 minimum (#1090)

Concurrent pull issue was resolved in 1.9 (moby/moby#9718) and Kubernetes is validated against that as well