Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libcontainer: fix a bug when setting shared rootfs propagation mode #1815

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dongsupark
Copy link

So far when the input mount flags contain MS_SHARED, the flag has not been applied to the container rootfs. That's because we call rootfsParentMountPrivate() after applying the original mount flags. Though we actually need to mount the container rootfs with MS_PRIVATE, to avoid failure from pivot_root() in the Linux kernel.

Thus if the mount flags contain MS_SHARED, we need a special case handling. First do pivotRoot() (or msMoveRoot, chroot) with the rootfs with a mount flag MS_PRIVATE. Then after pivotRoot(), again mount the rootfs with MS_SHARED.

With this fix, validation/linux_rootfs_propagation.t of runtime-tools works well with the shared mode finally.

Fixes #1755

/cc @alban

Signed-off-by: Dongsu Park dongsu@kinvolk.io

@mrunalp
Copy link
Contributor

mrunalp commented Jun 13, 2018

@rhvgoyal PTAL

@AkihiroSuda
Copy link
Member

Can we have integration test (in this repo)?

@lifubang
Copy link
Member

@dongsupark Are you still interested in this PR?

@dongsupark
Copy link
Author

@lifubang Thanks for the ping.
But I have completely lost the context. First I have to try to remember what I did. ;-)

@lifubang
Copy link
Member

Please see #3948

@sstosh
Copy link

sstosh commented Jul 28, 2023

The root directory mount propagation was set to shared when this commit applied, like crun.

[root@fedora38 runc]# podman info --format {{.Host.OCIRuntime.Version}}
runc version 1.1.0+dev
commit: v1.1.0-647-ga5777e87-dirty
spec: 1.1.0-rc.3
go: go1.20.5
libseccomp: 2.5.3

[root@fedora38 runc]# mkdir ~/hoge && mount -t tmpfs tmpfs ~/hoge
[root@fedora38 runc]# mount | grep "/hoge "
tmpfs on /root/hoge type tmpfs (rw,relatime,seclabel,inode64)

[root@fedora38 runc]# podman run -itd --privileged --name testcon --volume ~/hoge:/hoge:z,shared fedora-minimal:34 /bin/bash
1cb7167f2d47f18440c9f1b792a49bbc5c2f52def79dc9c746c9536a6e24f229

[root@fedora38 runc]# cat /var/lib/containers/storage/overlay-containers/1cb7167f2d47f18440c9f1b792a49bbc5c2f52def79dc9c746c9536a6e24f229/userdata/config.json | jq .linux.rootfsPropagation
"shared"

[root@fedora38 runc]# podman exec testcon mkdir /test
[root@fedora38 runc]# podman exec testcon mount -t tmpfs tmpfs /test
[root@fedora38 runc]# podman exec testcon findmnt -o "TARGET,PROPAGATION"
TARGET               PROPAGATION
/                    shared
|-/test              shared
|-/sys               private
| `-/sys/fs/cgroup   private
|-/proc              private
|-/dev               private
| |-/dev/console     private
| |-/dev/mqueue      private
| |-/dev/shm         private
| `-/dev/pts         private
|-/hoge              shared
|-/run/.containerenv private
|-/etc/hosts         private
|-/run/secrets       private
|-/etc/hostname      private
`-/etc/resolv.conf   private

In other cases, however, the mount propagation was set differently from crun.

For example, the following Steps to reproduce shows that the mount propagation for
the root directory was shared for crun, but shared,slave for runc.

Since the .linux.rootfsPropagation field in config.json is shared,
it seems correct to be shared like crun.

Steps to reproduce:

  • runc
[test@fedora38 ~]$ podman info --format {{.Host.OCIRuntime.Version}}
runc version 1.1.0+dev
commit: v1.1.0-647-ga5777e87-dirty
spec: 1.1.0-rc.3
go: go1.20.5
libseccomp: 2.5.3

[test@fedora38 ~]$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
[test@fedora38 ~]$ mkdir ~/{vol1,vol2}

[test@fedora38 ~]$ podman --storage-driver vfs --root /tmp/hoge run -itd --name testcon --volume ~/vol1:/myvol1:z,shared --volume ~/vol2:/myvol2:z fedora-minimal:34 /bin/bash
... snip ...
f4c1e806d6d475aec845ee0b5e29df5e36badeec00749b8d96be931ce43699a6

[test@fedora38 ~]$ cat /tmp/hoge/vfs-containers/f4c1e806d6d475aec845ee0b5e29df5e36badeec00749b8d96be931ce43699a6/userdata/config.json | jq .linux.rootfsPropagation
"shared"

[test@fedora38 ~]$ podman --root /tmp/hoge --storage-driver vfs exec testcon findmnt -o "TARGET,PROPAGATION" | grep -e "\/ " -e "\/myvol1 " -e "\/myvol2 "
/                       shared,slave
|-/myvol1               shared,slave
|-/myvol2               private
  • crun
[test@fedora38 ~]$ podman info --format {{.Host.OCIRuntime.Version}}
crun version 1.8.5
commit: b6f80f766c9a89eb7b1440c0a70ab287434b17ed
rundir: /run/user/1000/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL

[test@fedora38 ~]$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
[test@fedora38 ~]$ mkdir ~/{vol1,vol2}

[test@fedora38 ~]$ podman --storage-driver vfs --root /tmp/hoge run -itd --name testcon --volume ~/vol1:/myvol1:z,shared --volume ~/vol2:/myvol2:z fedora-minimal:34 /bin/bash
... snip ...
1d73a122ae6e4f7fe455b88677e91b7e20fd421ff141a0cb620b08f3d5cd2f48

[test@fedora38 ~]$ cat /tmp/hoge/vfs-containers/1d73a122ae6e4f7fe455b88677e91b7e20fd421ff141a0cb620b08f3d5cd2f48/userdata/config.json | jq .linux.rootfsPropagation
"shared"

[test@fedora38 ~]$ podman --root /tmp/hoge --storage-driver vfs exec testcon findmnt -o "TARGET,PROPAGATION" | grep -e "\/ " -e "\/myvol1 " -e "\/myvol2 "
/                       shared
|-/myvol1               shared,slave
|-/myvol2               private

@dongsupark
Copy link
Author

Rebased the PR, addressed review comments, and added an integration test.

In other cases, however, the mount propagation was set differently from crun.
For example, the following Steps to reproduce shows that the mount propagation for the root directory was shared for crun, but shared,slave for runc.

Yes, I am seeing that as well. I could not figure it out.
That issue however seems to happen already in main as well as v1.1.8 of runc.

So far when the input mount flags contain `MS_SHARED`, the flag has
not been applied to the container rootfs. That's because we call
`rootfsParentMountPrivate()` after applying the original mount flags.
As a result, the original flags are overwritten. Though it's also true
that we actually need to mount the container rootfs with `MS_PRIVATE`,
to avoid failure from `pivot_root()` in the Linux kernel.

Thus if the mount flags contain `MS_SHARED`, we need a special case
handling. First do `pivotRoot()` (or `msMoveRoot`, `chroot`) with the
rootfs with a mount flag `MS_PRIVATE`. Then after `pivotRoot()`, again
mount the rootfs with `MS_SHARED`.

With this fix, `validation/linux_rootfs_propagation.t` of runtime-tools
works well with the shared mode finally.

Fixes opencontainers#1755

Signed-off-by: Dongsu Park <dpark@linux.microsoft.com>
Signed-off-by: Dongsu Park <dpark@linux.microsoft.com>
@sstosh
Copy link

sstosh commented Jul 31, 2023

I confirmed that the mount propagation also contains slave in runc main branch.
Therefore, at least, it doesn't seem to be caused by rootfsParentMountShared().

[test@fedora38 ~]$ podman info --format {{.Host.OCIRuntime.Version}}
runc version 1.1.0+dev
commit: v1.1.0-656-g14c7ab7f
spec: 1.1.0
go: go1.20.5
libseccomp: 2.5.3

[test@fedora38 ~]$ mount | grep /tmp
tmpfs on /tmp type tmpfs (rw,nosuid,nodev,seclabel,nr_inodes=1048576,inode64)
[test@fedora38 ~]$ mkdir ~/{vol1,vol2}

[test@fedora38 ~]$ podman --storage-driver vfs --root /tmp/hoge run -itd --name testcon --volume ~/vol1:/myvol1:z,shared --volume ~/vol2:/myvol2:z fedora-minimal:34 /bin/bash
... snip ...
15fdd0d1fce19b8755254cf913eacb8d7355fe6a9739db219c063964a47a711d

[test@fedora38 ~]$ cat /tmp/hoge/vfs-containers/15fdd0d1fce19b8755254cf913eacb8d7355fe6a9739db219c063964a47a711d/userdata/config.json | jq .linux.rootfsPropagation
"shared"

[test@fedora38 ~]$ podman --root /tmp/hoge --storage-driver vfs exec testcon findmnt -o "TARGET,PROPAGATION" | grep -e "\/ " -e "\/myvol1 " -e "\/myvol2 "
/                       private,slave
|-/myvol1               shared,slave
|-/myvol2               private

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rootfsPropagation=shared does not work
5 participants