
docker exec fails after restarting the Docker daemon with live-restore=true and MountFlags=slave #35873

Open
qkboy opened this issue Dec 25, 2017 · 3 comments

qkboy commented Dec 25, 2017

Description
Something like #29342, but the same problem occurs in the latest docker-ce 17.09.1-ce release.

Steps to reproduce the issue:
1. Build an image that runs as a non-root user, e.g. USER apps
2. Start the Docker daemon with live-restore=true
3. Add MountFlags=slave to the systemd docker.service unit and restart the Docker daemon (a minimal example configuration is sketched after these steps)
4. Run a container from that image
5. Use docker exec to run a command in the container, e.g. docker exec -it test id. This works.
6. Restart the Docker daemon
7. Run the same command as in step 5 again: docker exec -it test id now fails.
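
For reference, a minimal sketch of the setup described above. The Dockerfile contents, the image name test-image, and the drop-in path are illustrative assumptions, not taken from the report; only the user name apps, UID 500, and the container name test come from the issue itself.

# cat Dockerfile
# illustrative only; the original report does not include a Dockerfile
FROM centos:7
RUN useradd -u 500 apps
USER apps

# cat /etc/docker/daemon.json
{
  "live-restore": true
}

# cat /etc/systemd/system/docker.service.d/mount-flags.conf
[Service]
MountFlags=slave

# systemctl daemon-reload && systemctl restart docker
# docker build -t test-image .
# docker run -d --name test test-image sleep infinity
# docker exec -it test id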

Describe the results you received:

# docker exec -it test id
unable to find user apps: no matching entries in passwd file

Describe the results you expected:

# docker exec -it test id
uid=500(apps) gid=500(apps) groups=500(apps)

Additional information you deem important (e.g. issue happens only occasionally):
According to pull requests #29365 and #29459, adding a daemon.Mount call during restore brings up the container.BaseFS. That resolves the docker exec -u <user> problem, but not everything: when stopping the container with docker stop, the daemon logs warnings:

level=warning msg="error locating sandbox id 78e584163857ff32f41b741e9cea89bcac15cbf9ac75843bfb54d64c1dfb7688: sandbox 78e584163857ff32f41b741e9cea89bcac15cbf9ac75843bfb54d64c1dfb7688 not found"
level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/a015cba4cb78c224d145df78d7bbd96fff70437f88df310d0708913f645a285f/shm: invalid argument"

Commenting out MountFlags=slave, or changing it to MountFlags=shared, also avoids this problem, but it produces another issue: on docker rm the container cannot be removed, because its overlay work directory reports device busy.
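
A sketch of the alternative described above, reusing the drop-in path assumed in the earlier sketch: either remove the drop-in entirely or switch it to shared propagation, then restart the daemon.

# cat /etc/systemd/system/docker.service.d/mount-flags.conf
[Service]
MountFlags=shared

# systemctl daemon-reload && systemctl restart docker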

Output of docker version:

# docker -v
Docker version 17.09.1-ce, build 19e2cf6
# dockerd -v
Docker version 17.09.1-ce, build 19e2cf6

Output of docker info:

# docker info
Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 2
Server Version: 17.09.1-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.11.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.51GiB
Name: k8s-node03
ID: X3JY:IF2R:KLBW:7ITB:OBAY:H7RR:FNN4:QJ2X:VX5J:EKC7:WQSA:CZKS
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Cluster Store: etcd://192.168.86.171:2379
Cluster Advertise: 192.168.86.177:2376
Insecure Registries:
 registry.vclound.com:5000
 127.0.0.0/8
Live Restore Enabled: true

Additional environment details (AWS, VirtualBox, physical, etc.):

# cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core) 
# uname -a
Linux k8s-node03 3.10.0-693.11.1.el7.x86_64 #1 SMP Mon Dec 4 23:52:40 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

ktibi commented May 3, 2018

Same issue on Docker version 17.12.1-ce, build 7390fc6, on CentOS 7.5.

My fix is to stop and then start the container (see the sketch below). I don't know why, but the "restart" action does not seem to do the same thing.
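
A sketch of that workaround, using the container name test from the original report:

# docker stop test
# docker start test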

cpuguy83 (Member) commented May 3, 2018

This is expected behavior. You must use shared propagation with live restore.

Note that issues with removal with shared propagation are fixed in 17.12.1 and up.

This is a known issue that can't really be fixed with live restore because the mounts exist in a different mount namespace when using systemd's mount propagation settings.

It is recommended to use shared mount propagation in all cases, but it is absolutely necessary for live-restore.

It is likely that this information is missing from the docs.
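
As an aside (not from the thread), one way to check which propagation setting the daemon actually runs with is to inspect the unit and the host's root mount; systemctl and findmnt are assumed to be available on the host:

# systemctl show docker --property MountFlags
# findmnt -o TARGET,PROPAGATION /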


ktibi commented May 4, 2018

Hi @cpuguy83, thanks for your answer.

I am using shared MountFlags:

systemctl show docker | grep Mount
MountFlags=1048576
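
(An aside, not from the original comment: 1048576 is 0x100000, i.e. the MS_SHARED mount flag, 1 << 20, so systemd is reporting shared propagation here.)

# python -c 'print(1 << 20)'
1048576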

I think Red Hat systems don't support the live-restore option: https://access.redhat.com/solutions/2991041
