
docker exec fails after restarting the Docker daemon with live-restore=true and MountFlags=slave #35873

Open
qkboy opened this issue Dec 25, 2017 · 3 comments

qkboy commented Dec 25, 2017

Description
Something like #29342, but the same problem occurs in the latest docker-ce 17.09.1-ce release.

Steps to reproduce the issue:
1. Build an image that runs as a non-root user, e.g. USER apps
2. Start the Docker daemon with live-restore=true
3. Add MountFlags=slave to the systemd docker.service unit and restart the Docker daemon (a minimal example configuration is sketched after these steps)
4. Run a container from that image
5. Use docker exec to run a command in the container, e.g. docker exec -it test id. This works.
6. Restart the Docker daemon
7. Run the same command as in step 5 again: docker exec -it test id now fails.
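
For reference, a minimal sketch of the setup described above. The Dockerfile contents, the image name test-image, and the drop-in path are illustrative assumptions, not taken from the report; only the user name apps, UID 500, and the container name test come from the issue itself.

# cat Dockerfile
# illustrative only; the original report does not include a Dockerfile
FROM centos:7
RUN useradd -u 500 apps
USER apps

# cat /etc/docker/daemon.json
{
  "live-restore": true
}

# cat /etc/systemd/system/docker.service.d/mount-flags.conf
[Service]
MountFlags=slave

# systemctl daemon-reload && systemctl restart docker
# docker build -t test-image .
# docker run -d --name test test-image sleep infinity
# docker exec -it test id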

Describe the results you received:

# docker exec -it test id
unable to find user apps: no matching entries in passwd file

Describe the results you expected:

# docker exec -it test id
uid=500(apps) gid=500(apps) groups=500(apps)

Additional information you deem important (e.g. issue happens only occasionally):
According to pull requests #29365 and #29459, adding a daemon.Mount call during restore brings up the container.BaseFS. That resolves the docker exec -u <user> problem, but not everything: when stopping the container with docker stop, the daemon logs warnings:

level=warning msg="error locating sandbox id 78e584163857ff32f41b741e9cea89bcac15cbf9ac75843bfb54d64c1dfb7688: sandbox 78e584163857ff32f41b741e9cea89bcac15cbf9ac75843bfb54d64c1dfb7688 not found"
level=warning msg="failed to cleanup ipc mounts:\nfailed to umount /var/lib/docker/containers/a015cba4cb78c224d145df78d7bbd96fff70437f88df310d0708913f645a285f/shm: invalid argument"

Commenting out MountFlags=slave, or changing it to MountFlags=shared, also avoids this problem, but it produces another issue: on docker rm the container cannot be removed, because its overlay work directory reports device busy.
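
A sketch of the alternative described above, reusing the drop-in path assumed in the earlier sketch: either remove the drop-in entirely or switch it to shared propagation, then restart the daemon.

# cat /etc/systemd/system/docker.service.d/mount-flags.conf
[Service]
MountFlags=shared

# systemctl daemon-reload && systemctl restart docker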

Output of docker version:

# docker -v
Docker version 17.09.1-ce, build 19e2cf6
# dockerd -v
Docker version 17.09.1-ce, build 19e2cf6

Output of docker info:

# docker info
Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 2
Server Version: 17.09.1-ce
Storage Driver: overlay
 Backing Filesystem: extfs
 Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 3.10.0-693.11.1.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.51GiB
Name: k8s-node03
ID: X3JY:IF2R:KLBW:7ITB:OBAY:H7RR:FNN4:QJ2X:VX5J:EKC7:WQSA:CZKS
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Experimental: false
Cluster Store: etcd://192.168.86.171:2379
Cluster Advertise: 192.168.86.177:2376
Insecure Registries:
 registry.vclound.com:5000
 127.0.0.0/8
Live Restore Enabled: true

Additional environment details (AWS, VirtualBox, physical, etc.):

# cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core) 
# uname -a
Linux k8s-node03 3.10.0-693.11.1.el7.x86_64 #1 SMP Mon Dec 4 23:52:40 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

ktibi commented May 3, 2018

Same issue on Docker version 17.12.1-ce, build 7390fc6, on CentOS 7.5.

My fix is to stop and then start the container (see the sketch below). I don't know why, but the "restart" action does not seem to do the same thing.
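
A sketch of that workaround, using the container name test from the original report:

# docker stop test
# docker start test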

cpuguy83 (Member) commented May 3, 2018

This is expected behavior. You must use shared propagation with live restore.

Note that issues with removal with shared propagation are fixed in 17.12.1 and up.

This is a known issue that can't really be fixed with live restore because the mounts exist in a different mount namespace when using systemd's mount propagation settings.

It is recommended to use shared mount propagation in all cases, but it is absolutely necessary for live-restore.

It is likely that this information is missing from the docs.
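
As an aside (not from the thread), one way to check which propagation setting the daemon actually runs with is to inspect the unit and the host's root mount; systemctl and findmnt are assumed to be available on the host:

# systemctl show docker --property MountFlags
# findmnt -o TARGET,PROPAGATION /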


ktibi commented May 4, 2018

Hi @cpuguy83, thanks for your answer.

I am using shared MountFlags:

systemctl show docker | grep Mount
MountFlags=1048576
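
(An aside, not from the original comment: 1048576 is 0x100000, i.e. the MS_SHARED mount flag, 1 << 20, so systemd is reporting shared propagation here.)

# python -c 'print(1 << 20)'
1048576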

I think Red Hat systems don't support the live-restore option: https://access.redhat.com/solutions/2991041
