
User-docker sucks with docker-17.12.1+ #2300

Closed
kingsd041 opened this issue Mar 21, 2018 · 18 comments

@kingsd041
Member

commented Mar 21, 2018

RancherOS Version: (ros os version)
v.1.3.0-rc1
Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.)
AWS

After switching from docker-17.12.1-ce to docker-17.09.1-ce, the Docker storage driver uses vfs by default.

root@ip-172-31-35-98:/var/log# docker info | grep Storage
Storage Driver: vfs
root@ip-172-31-35-98:/var/log# docker -v
Docker version 17.09.1-ce, build 19e2cf6

I found the following errors in docker.log:

time="2018-03-21T03:35:05.234106479Z" level=error msg="'overlay2' is not supported over overlayfs"
time="2018-03-21T03:35:05.236377569Z" level=error msg="'overlay' is not supported over overlayfs"
time="2018-03-21T03:35:05.237043516Z" level=error msg="devmapper: Udev sync is not supported. This will lead to data loss and unexpected behavior. Install a dynamic binary to use devicemapper or select a different storage driver. For more information, see https://docs.docker.com/engine/reference/commandline/dockerd/#storage-driver-options"
@niusmallnan

Member

commented Mar 23, 2018

In my testing, the problem appears to be in docker-17.12.1-ce. If you switch through other Docker versions, there is no problem, for example:
docker-17.09.1-ce ---> docker-xxx-ce (not 17.12.1) ---> docker-17.09.1-ce

Once the problem has occurred, user-docker cannot work. The workaround is to rebuild the console; you can use ros console switch xxx to rebuild it.

@niusmallnan niusmallnan added this to the v1.3.0 milestone Mar 28, 2018

@niusmallnan niusmallnan modified the milestones: v1.3.0, v1.4.0 Apr 9, 2018

@Jason-ZW

Member

commented Apr 12, 2018

Another situation can cause the same problem:
docker-17.09.1-ce ---> docker-17.12.1-ce ---> system-docker restart docker

docker-17.12 behaves differently from docker-17.09:

  • when the Docker daemon stops, it unmounts the directory /var/lib/docker from the mount table

This unmount logic can cause an overlay problem in RancherOS: after the unmount, the filesystem type of /var/lib/docker changes from ext4 to overlay, because the console container's default filesystem type is overlay.

The following backing filesystems are supported by the overlay2 and overlay drivers:

  • ext4
  • xfs

So the errors occur as below:

time="2018-03-21T03:35:05.234106479Z" level=error msg="'overlay2' is not supported over overlayfs"
time="2018-03-21T03:35:05.236377569Z" level=error msg="'overlay' is not supported over overlayfs"
time="2018-03-21T03:35:05.237043516Z" level=error msg="devmapper: Udev sync is not supported. This will lead to data loss and unexpected behavior. Install a dynamic binary to use devicemapper or select a different storage driver. For more information, see https://docs.docker.com/engine/reference/commandline/dockerd/#storage-driver-options"

Maybe my understanding is wrong; please let me know if anyone has an idea or a solution.
A related comment: moby/moby#36833 (comment)
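The filesystem-type change described above can be observed directly. A minimal sketch using GNU coreutils stat (the helper name fs_type is mine, not from the thread; output names follow coreutils conventions):

```shell
# Report the filesystem type backing a path (GNU coreutils `stat`).
# An ext4-backed path reports as "ext2/ext3"; a path backed by an overlay
# mount reports "overlayfs" -- the condition that makes dockerd reject
# the overlay2 and overlay drivers.
fs_type() {
  stat -f -c %T "$1"
}

# e.g. on an affected host: fs_type /var/lib/docker
```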

@niusmallnan niusmallnan changed the title Switch docker-17.12.1-ce to docker-17.09.1-ce, the docker driver uses vfs by default User-docker sucks with docker-17.12.1+ Apr 16, 2018

@niusmallnan

Member

commented Apr 17, 2018

Docker unmounts the data root directory because of moby/moby#36107.
This PR has been merged into 17.12.1-ce.
We can see this in the daemon logs when the daemon is stopping:

.... mountpoint=/var/lib/docker, unmounting daemon root

Looking at this code, Docker unmounts the data root in these scenarios:
https://github.com/docker/docker-ce/blob/17.12/components/engine/daemon/daemon_linux.go#L116-L127

Check our mount info in RancherOS:

36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue
(1)(2)(3)   (4)   (5)      (6)      (7)   (8) (9)   (10)         (11)

(1) mount ID:  unique identifier of the mount (may be reused after umount)
(2) parent ID:  ID of parent (or of self for the top of the mount tree)
(3) major:minor:  value of st_dev for files on filesystem
(4) root:  root of the mount within the filesystem
(5) mount point:  mount point relative to the process's root
(6) mount options:  per mount options
(7) optional fields:  zero or more fields of the form "tag[:value]"
(8) separator:  marks the end of the optional fields
(9) filesystem type:  name of filesystem of the form "type[.subtype]"
(10) mount source:  filesystem specific information or "none"
(11) super options:  per super block options

$ cat /proc/self/mountinfo | grep /var/lib/docker
516 467 202:1 /var/lib/docker /var/lib/docker rw,relatime shared:57 - ext4 /dev/xvda1 rw,data=ordered
....

The mount root and the mount point are the same, so Docker can unmount /var/lib/docker.
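The comparison can be sketched as a small awk filter over /proc/self/mountinfo (a rough illustration of the logic in daemon_linux.go, not the actual implementation; the helper name unmount_check is mine):

```shell
# Field 4 of a mountinfo line is the mount root, field 5 the mount point.
# Docker only unmounts its data root when the two are identical.
unmount_check() {
  awk -v target="$1" '$5 == target {
    if ($4 == $5) print "daemon will unmount"
    else          print "daemon keeps mount"
  }'
}

# e.g. unmount_check /var/lib/docker < /proc/self/mountinfo
```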

In general, we have three ways to solve this problem:

  1. Reboot or rebuild the console so that /var/lib/docker is re-mounted. Users can do this by restarting the host or switching the console.

  2. Make the mount root and the mount point different. We can update the volumes of container-data-volumes in os-config.yml, perhaps /var/lib/user-docker:/var/lib/docker, which gives a different mountinfo:

516 467 202:1 /var/lib/user-docker /var/lib/docker rw,relatime shared:57 - ext4 /dev/xvda1 rw,data=ordered
# Docker will not unmount the data root (`/var/lib/docker`).

  3. Do something before the Docker daemon starts, such as re-mounting /var/lib/docker:

mkdir /tmp/tmpmount
mount /dev/xvda1 /tmp/tmpmount
mount -o bind /tmp/tmpmount/var/lib/docker /var/lib/docker
umount /tmp/tmpmount

# Then you can check the mount info
$ cat /proc/self/mountinfo | grep /var/lib/docker
... /var/lib/docker /var/lib/docker rw,relatime ...
@niusmallnan

Member

commented May 20, 2018

To fix this issue, I decided to change container-data-volumes in os-config.yml; the user-docker data will be saved to /var/lib/user-docker.
But this will cause user-docker data loss, especially when users upgrade to v1.4.0 using ros os upgrade.
To restore the user-docker data, you can do the following:

$ system-docker stop docker

$ system-docker run --rm -it -v /:/host alpine
/ # rm -rf /host/var/lib/user-docker/*
/ # cp -a /host/var/lib/docker/* /host/var/lib/user-docker/

$ system-docker start docker
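Before deleting anything, it may be worth verifying the copy succeeded. A small hypothetical helper (not from the thread) that compares the two trees recursively:

```shell
# Compare two directory trees; prints "trees match" when a copy such as
# `cp -a src/* dst/` preserved everything, "trees differ" otherwise.
verify_copy() {
  if diff -rq "$1" "$2" >/dev/null 2>&1; then
    echo "trees match"
  else
    echo "trees differ"
  fi
}

# e.g. verify_copy /host/var/lib/docker /host/var/lib/user-docker
```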
@thaJeztah


commented May 20, 2018

This situation should also be addressed by moby/moby#36879

@niusmallnan

Member

commented May 21, 2018

@thaJeztah Cool, we will test it in the next docker-ce stable release.
We expect this PR to be merged: docker/docker-ce#522

@kingsd041

Member Author

commented May 24, 2018

Fixed in rancheros v1.4.0-rc2

@kingsd041 kingsd041 closed this May 24, 2018

@niranjan94


commented May 31, 2018

@niusmallnan I was able to restore my user-docker data by following the steps mentioned by you. Thank you for that. :)

But now I realise that a lot of space is being used, since the same data is present in both locations, /host/var/lib/docker/ and /host/var/lib/user-docker/. Is there any straightforward way to prune/remove the user-docker data from /host/var/lib/docker/?

@jlelse


commented May 31, 2018

Is it safe to delete the old folder after copying it to user-docker?

@niusmallnan

Member

commented Jun 1, 2018

Yes, you can delete the old folder.
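A hedged sketch of that cleanup (the helper clear_old_copy is hypothetical, not from the thread; on RancherOS it would run inside a privileged system-docker container against /host/var/lib/docker, as in the restore steps above):

```shell
# Remove a directory's non-hidden contents once the new location is
# confirmed working.
clear_old_copy() {
  # refuse an empty argument so the glob cannot expand to /*
  [ -n "$1" ] || return 1
  rm -rf "$1"/*
}

# e.g. clear_old_copy /host/var/lib/docker
```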

@pioto


commented Jun 1, 2018

@niusmallnan, I think you should also use cp -a instead of cp -rf in the above comment ( #2300 (comment) ), so that permissions are preserved properly.

For example, without this, I found that I could not start rancher/server again.

@prologic


commented Jun 10, 2018

Is this still the case?

@stuckj


commented Aug 8, 2018

@prologic, yes, this just happened to me upgrading from 1.0.4 to 1.4.0. @niusmallnan's steps to restore containers / volumes worked for me as well.

@efrecon


commented Aug 15, 2018

I had to restart the machine with the following command to ensure (user-)docker actually finds the old containers, volumes, and networks again. I ran this instead of the final $ system-docker start docker step in the instructions above.

system-docker shutdown -r now

@spikespaz


commented Aug 18, 2018

Where is os-config.yml so I can change container-data-volumes as suggested in the fix above?

Also, I updated before reading this and I am applying the fix after the fact. Am I screwed, or can I recover the data this way?

@wywywywy


commented Feb 25, 2019

Would a symlink work as well? Just in case I need to roll back.

I'm on 1.3.0 wanting to upgrade to the latest.

@niusmallnan

Member

commented Feb 26, 2019

@wywywywy Please try this comment #2300 (comment)
