docker times out when waiting for mkfs to finish on devicemapper storage driver base device #16653

Closed
jaryn opened this Issue Sep 29, 2015 · 16 comments


jaryn commented Sep 29, 2015

Description

docker version: docker-1.7.1-115.el7.x86_64
docker info: N/A
uname -a: Linux test-node-compute-f429d.example.com 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
platform: libvirt (kvm)

By default on RHEL 7.1, docker creates a sparse file upon start (systemctl start docker). That file is used as a loopback device (seemingly via dmsetup syscall equivalents), and docker then issues

mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/docker-253:1-243367-base

This command may take several minutes to complete. Meanwhile, the avgqu-sz gauge reported by sar shoots up to >25000, and then systemctl start reports a timeout.
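For context, the deadline the daemon is racing against is the unit's systemd start timeout, which can be inspected with a standard systemctl query (the property name is systemd's; the 90-second value shown is the usual default, not taken from this host):

# systemctl show docker.service -p TimeoutStartUSec
TimeoutStartUSec=1min 30s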

Reproducer

Remove the backing files. I am not sure about the order of dmsetup lines here.
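A hedged way to resolve that order: dmsetup ls lists the device-mapper devices by name, and with docker's devicemapper driver the thin base device (docker-253:1-243367-base, as seen in the mkfs command above) should be removed before the pool it belongs to (docker-253:1-243367-pool). Illustrative output; the exact formatting varies by dmsetup version:

# dmsetup ls
docker-253:1-243367-base        (253:1)
docker-253:1-243367-pool        (253:0)

The /dev/dm-0 and /dev/dm-1 nodes used below presumably refer to these same devices.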

# systemctl stop docker
# dmsetup remove /dev/dm-0
# dmsetup remove /dev/dm-1
# rm /var/lib/docker/devicemapper/devicemapper/*
# systemctl start docker

... BLOCKING

# sar -d 1
...
15:28:24          DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
15:28:25     dev253-0      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00
15:28:25    dev253-16      0,00      0,00      0,00      0,00      0,00      0,00      0,00      0,00
15:28:25     dev252-0      0,00      0,00      0,00      0,00    112,86      0,00      0,00    100,40
15:28:25     dev252-1      0,00      0,00      0,00      0,00  34883,39      0,00      0,00    100,40

For some time, we can see the mkfs being performed:

# ps aux | grep mkfs
mkfs.ext4 -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/docker-253:1-243367-base

But the user-space process disappears before the kernel finishes flushing, while the queue is still quite full.

The result is:

[root@test-node-compute-f429d openshift]# systemctl status -l docker.service
docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled)
  Drop-In: /usr/lib/systemd/system/docker.service.d
           └─docker-sdn-ovs.conf
   Active: failed (Result: timeout) since Tue 2015-09-29 15:30:25 UTC; 1min 53s ago
     Docs: http://docs.docker.com
 Main PID: 25025

Sep 29 15:27:25 test-node-compute-f429d.example.com systemd[1]: Starting Docker Application Container Engine...
Sep 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.975880434Z" level=warning msg="--storage-opt dm.thinpooldev is preferred over --storage-opt dm.datadev or dm.metadatadev"
Sep 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.976662822Z" level=info msg="Listening for HTTP on unix (/var/run/docker.sock)"
Sep 29 15:27:25 test-node-compute-f429d.example.com docker[25025]: time="2015-09-29T15:27:25.978128795Z" level=error msg="Unable to delete device: Error running DeleteDevice dm_task_run failed"
Sep 29 15:28:55 test-node-compute-f429d.example.com systemd[1]: docker.service operation timed out. Terminating.
Sep 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: docker.service stopping timed out (2). Killing.
Sep 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: Failed to start Docker Application Container Engine.
Sep 29 15:30:25 test-node-compute-f429d.example.com systemd[1]: Unit docker.service entered failed state.

It seems mkfs wrote at least 150 MB of data:

# du -h /var/lib/docker/devicemapper/devicemapper/*
150M    /var/lib/docker/devicemapper/devicemapper/data
672K    /var/lib/docker/devicemapper/devicemapper/metadata

A subsequent mkfs executed on the device exits fast; I think this is because the device is thin-provisioned.

I can't see any meaningful error message in the logs that would help me debug this problem.

Note that when one tries to start docker again, docker will re-create the sparse file, so the start is likely to fail again, even though the FS is created fine every time, because this is done asynchronously.

Expected state

Docker should start on the system no matter how heavily it is loaded. This would probably require checking the progress of mkfs.

Maybe the -D mkfs option would help:

   -D     Use  direct I/O when writing to the disk.  This avoids mke2fs dirtying a lot of
          buffer cache memory, which may impact other  applications  running  on  a  busy
          server.   This  option  will  cause mke2fs to run much more slowly, however, so
          there is a tradeoff to using direct I/O.
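Should someone want to test that hypothesis, a hedged sketch of re-running the format step by hand with -D added (the device path and the remaining flags are copied verbatim from the daemon's invocation above):

# mkfs.ext4 -D -E nodiscard,lazy_itable_init=0,lazy_journal_init=0 /dev/mapper/docker-253:1-243367-base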
GordonTheTurtle commented Sep 29, 2015

Hi!

Please read this important information about creating issues.

If you are reporting a new issue, make sure that we do not have any duplicates already open. You can ensure this by searching the issue list for this repository. If there is a duplicate, please close your issue and add a comment to the existing issue instead.

If you suspect your issue is a bug, please edit your issue description to include the BUG REPORT INFORMATION shown below. If you fail to provide this information within 7 days, we cannot debug your issue and will close it. We will, however, reopen it if you later provide the information.

This is an automated, informational response.

Thank you.

For more information about reporting issues, see https://github.com/docker/docker/blob/master/CONTRIBUTING.md#reporting-other-issues


BUG REPORT INFORMATION

Use the commands below to provide key information from your environment:

docker version:
docker info:
uname -a:

Provide additional environment details (AWS, VirtualBox, physical, etc.):

List the steps to reproduce the issue:
1.
2.
3.

Describe the results you received:

Describe the results you expected:

Provide additional info you think is important:

----------END REPORT ---------

#ENEEDMOREINFO


cpuguy83 (Contributor) commented Sep 30, 2015

ping @rhvgoyal

rhvgoyal (Contributor) commented Sep 30, 2015

@jaryn

This primarily happens because your underlying storage is slow. We recently increased the default size of the base image to 100G (from 10G), so on slower storage the problem became worse and started to run into the systemd timeout.

Quick workarounds are the following (a sketch of the last two appears after this comment):

  • Don't use loop devices. Use an lvm thin pool on top of a reasonably fast block device.
  • Use a 10G base device size. Pass the daemon option --storage-opt dm.basesize=10G.
  • Increase the systemd timeout (man systemd.unit).

In the long run, I think we should revert the patch, reduce the base image size to, say, 20G, and add a docker create/run option that can specify the size of a container at creation time (say --storage-opt dm.size=100G). That way one can start with a small base size but specify a bigger size for specific containers. This might still run into build issues, though.

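For concreteness, a hedged sketch of the second and third workarounds (the /etc/sysconfig/docker path is how the RHEL packaging passes daemon options; the drop-in filename and the 10-minute value are illustrative):

# Smaller base device: pass the flag via the daemon options
# (the RHEL docker unit reads OPTIONS from /etc/sysconfig/docker).
echo 'OPTIONS="--storage-opt dm.basesize=10G"' >> /etc/sysconfig/docker

# Larger start timeout: a systemd drop-in for the docker unit.
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/timeout.conf <<'EOF'
[Service]
TimeoutStartSec=10min
EOF
systemctl daemon-reload
systemctl restart docker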

jaryn commented Oct 1, 2015

Well, my underlying storage is an SSD, so my guess is that it is not the bottleneck.

However, although I was able to reproduce this issue several times before reporting it, I am not able to reproduce it any more.

rhvgoyal (Contributor) commented Oct 1, 2015

@jaryn Try using an lvm thin pool instead of loop devices and you should have a better experience.

rhvgoyal (Contributor) commented Oct 1, 2015

@jaryn The docker-storage-setup utility can easily set up an lvm thin pool for you.

https://github.com/projectatomic/docker-storage-setup

We also ship it as part of the docker package in Fedora/RHEL/CentOS.
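For reference, a hedged sketch of a minimal configuration (the /etc/sysconfig/docker-storage-setup path and the DEVS/VG variables follow that project's README; the device and volume group names are illustrative):

# /etc/sysconfig/docker-storage-setup
DEVS=/dev/vdb      # block device to dedicate to the thin pool (illustrative)
VG=docker-vg       # volume group to create or extend (illustrative)

Running docker-storage-setup then creates an lvm thin pool and writes the matching --storage-opt dm.thinpooldev=... daemon option (to /etc/sysconfig/docker-storage, per the README), so docker no longer falls back to loop devices.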

thaJeztah (Member) commented Oct 30, 2015

@rhvgoyal perhaps you can have a look at #17486; it's a WIP for improvements around the docs for storage drivers. Perhaps that tool should be mentioned in the docs as well, wdyt?

rhvgoyal (Contributor) commented Oct 30, 2015

@thaJeztah I think mentioning docker-storage-setup in the docker docs is a good idea. It should help people set up an lvm thin pool relatively easily (better than reading man lvmthin and doing it yourself).

thaJeztah (Member) commented Oct 30, 2015

Thanks @rhvgoyal, I left a comment there to consider including it.

jfryman referenced this issue in StackStorm/st2workroom Nov 11, 2015

Merged

Change docker dm_basesize to 20g #251

jfryman added a commit to StackStorm/st2workroom that referenced this issue Nov 12, 2015

disable timeout locally for systemd + docker
This commit ensures that the docker service can take as long
as it so desires to startup. This bypasses the issues experienced
as a result of moby/moby#16653,
and should still ensure that RHEL 6 hosts are not affected by the
`dm.basesize` option change.

jfryman referenced this issue in StackStorm/st2workroom Nov 12, 2015

Merged

disable timeout locally for systemd + docker #256

jaryn commented Nov 28, 2015

Please note my complaint that there was no useful error message anywhere that would help me immediately see what the problem is. Can this be improved, or has it been already?

thaJeztah (Member) commented Nov 28, 2015

@jaryn I'm not sure that's possible, given that the timeout is produced by systemd; open to suggestions though.

rhvgoyal (Contributor) commented Nov 30, 2015

I think we can put a debug/info message in docker which says that docker is creating a filesystem on the base device. That way, when it times out, the user can look at the logs and figure out that creating the filesystem was taking a long time.

thaJeztah (Member) commented Nov 30, 2015

@rhvgoyal guess that could work, i.e. a message before and after the filesystem was created. Are you interested in creating a PR for that? 😇

rhvgoyal (Contributor) commented Nov 30, 2015

@thaJeztah Ok, I will create one.

jaryn commented Dec 2, 2015

Thank you all.

thaJeztah (Member) commented Dec 2, 2015

Thanks for reporting @jaryn, your feedback helps us improve the project!
