Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provides a new flag to make rootfs all sharable|slave|private propagation settable #77

Closed
wants to merge 3 commits into from

Conversation

rootfs
Copy link

@rootfs rootfs commented Jul 1, 2015

This is a port of docker-archive/libcontainer#632

Current a container has to nsenter the host's mount namespace to mount filesystem and
share with other containers. This approach doesn't work if the filesystem mount
calls helper utility (/sbin/mount.XXX). This limitation makes containerized kubelet unable to mount certain filesystems.

This commit provides a new flag to make rootfs sharable. Since moving a shared rootfs is semantically confusing for pivot_root(2) and MS_MOVE. A new function changeRoot() is provided to switch rootfs to new destination.

// "private": rootfs is mounted as MS_PRIVATE
// "shared": rootfs is mounted as MS_SHARED
// "slave": rootfs is mounted as MS_SLAVE
RootMount string `json:"root_mount"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could make this more strongly typed like Namespaces instead of strings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how about calling it RootfsMountMode?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on both.

@rhatdan
Copy link
Contributor

rhatdan commented Jul 13, 2015

@rootfs Could you change this like @mrunalp suggested

  • "Could make this more strongly typed like Namespaces instead of strings. ... how about calling it RootfsMountMode?"

@rootfs
Copy link
Author

rootfs commented Jul 13, 2015

@rhatdan rootfs@d30dec2

@@ -0,0 +1,4 @@
// mount_propagation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Package names are preferred without underscores. https://blog.golang.org/package-names

@kelseyhightower
Copy link

Is there a chance this will land before Docker 1.8?

@@ -0,0 +1,11 @@
// +build linux
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't build tag, it should be separated from package by empty line.
Better just remove it, you use postfix filename notation anyway.

@@ -28,6 +28,8 @@ type Linux struct {
Capabilities []string `json:"capabilities"`
// Devices are a list of device nodes that are created and enabled for the container.
Devices []string `json:"devices"`
// RootfsPropagation is the rootfs mount propagation mode for the container.
RootfsPropagation string `json:"rootfsPropagation"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns how this is appeared here without updating Godeps.json.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a separate PR against specs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged it. Would you mind to update properly?

@kelseyhightower
Copy link

@rootfs This is so close to the finish line, let me know how I can help. I'm happy to review and test.

@LK4D4
Copy link
Contributor

LK4D4 commented Jul 17, 2015

@kelseyhightower Help with testing always appreciated :)

@rootfs
Copy link
Author

rootfs commented Jul 17, 2015

@kelseyhightower many thanks for helping. Let me know if you are able to run my runc branch with this config.json

if p, exists := mountPropagationMapping[spec.Linux.RootfsPropagation]; exists {
config.RootfsMountMode = p
} else {
return nil, fmt.Errorf("invalid rootfs propagation mode:", spec.Linux.RootfsPropagation)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should default to something when it isn't specified.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, which default to use?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slave

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@LK4D4
Copy link
Contributor

LK4D4 commented Jul 18, 2015

I wonder where is jenkins :)

@mrunalp
Copy link
Contributor

mrunalp commented Jul 18, 2015

rprivate is causing issues with the chmod on the console.

@rootfs
Copy link
Author

rootfs commented Jul 19, 2015

@mrunalp what are the issues? I've tested chmod with rprivate, so far so good. Thanks.

@mrunalp
Copy link
Contributor

mrunalp commented Jul 20, 2015

@rootfs Actually, it can't find the console for me. I will look more.

FATA[0000] Container start failed: chmod /dev/pts/4: no such file or directory 

@ibuildthecloud
Copy link

My concern is if a container mounts a path that is not under bind mount. Won't that cause issues. Again I'm probably missing something here, so sorry if that's a stupid question. I'll try out this patch later today so I can understand it more.

@rootfs
Copy link
Author

rootfs commented Aug 3, 2015

@ibuildthecloud bind mount is probably the only place containers can create new mountpoints that also share with host.

I also have a docker patch based on this PR

@rootfs
Copy link
Author

rootfs commented Aug 3, 2015

Background information about propagation modes and their use cases can be found in [1] and [2]

  1. Propagation modes: Propogate container's mountpoint to the host docker-archive/libcontainer#632 (comment)
  2. Use Cases: Propogate container's mountpoint to the host docker-archive/libcontainer#632 (comment)

@mrunalp
Copy link
Contributor

mrunalp commented Aug 3, 2015

@rootfs I see container mounts leaked on host after container exit when testing "shared"

....
cgroup on /busybox/sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /busybox/sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /busybox/sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /busybox/sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /busybox/sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /busybox/sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /busybox/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /busybox/sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
proc on /busybox/proc/sys type proc (rw,relatime)
proc on /busybox/proc/sys type proc (rw,relatime)
proc on /busybox/proc/sysrq-trigger type proc (rw,relatime)
proc on /busybox/proc/sysrq-trigger type proc (rw,relatime)
proc on /busybox/proc/irq type proc (rw,relatime)
proc on /busybox/proc/irq type proc (rw,relatime)
proc on /busybox/proc/bus type proc (rw,relatime)
proc on /busybox/proc/bus type proc (rw,relatime)
tmpfs on /busybox/proc/kcore type tmpfs (rw,nosuid,seclabel,size=65536k,mode=755)
tmpfs on /busybox/proc/kcore type tmpfs (rw,nosuid,seclabel,size=65536k,mode=755)
....

@rhvgoyal
Copy link
Contributor

rhvgoyal commented Aug 3, 2015

Was just chatting with dan walsh and he said that we probably want to make sure that "shared, slave, private" becomes the property of volume being mounted inside the container. (And container root remains PRIVATE). That means only select volumes and any mount under those will become shared and user can control those instead of all of the mounts under container root being shared.

@rhatdan
Copy link
Contributor

rhatdan commented Aug 3, 2015

Right, I believe / should be shared by default (Or arguably slave.) But everything under $rootfs would be PRIVATE. I think the internal mounts should all be done privately. Volume mounts from the host should be controlled on a individual basis, but the default would be shared.

@rhvgoyal
Copy link
Contributor

rhvgoyal commented Aug 3, 2015

If / is SLAVE then I don't think volumes can be shared. May be we can scan the mount config and if any of the volumes are shared, then use SHARED for /, if any of the volumes are slave, then use
SLAVE for root otherwise fall back to PRIVATE.

@rootfs
Copy link
Author

rootfs commented Aug 3, 2015

@mrunalp somehow in docker there is no leak :)
@rhvgoyal i am not sure if we can use the same parent mount namespace for multiple containers. A new mount namespace doesn't have this problem.
@rhatdan I think if we can make rootfs shared, then directory that are direct under rootfs are propagated. We can make 1st level mountpoint slave/private so 2nd level mountpoint are not propagated.

@mrunalp
Copy link
Contributor

mrunalp commented Aug 3, 2015

@rootfs Docker uses its own mount namespace and so might not be visible in the default namespace.

@rhatdan
Copy link
Contributor

rhatdan commented Aug 3, 2015

@mrunalp Right we need to address that issue also. But I want to just get to the point that we could share mounts between containers.

@rhatdan
Copy link
Contributor

rhatdan commented Aug 3, 2015

@rootfs Right I want to rprivate on rootfs not on /. Bottom line we want to allow volume mounts differently then we do internal mounts.

@rootfs
Copy link
Author

rootfs commented Aug 3, 2015

@rhatdan got it, let me try it, thanks.

@rhvgoyal
Copy link
Contributor

rhvgoyal commented Aug 3, 2015

Right, once we have the mechanism to have some mounts as "shared" between container and daemon we can look into not running docker daemon in a separate namespace. (Or train users to
get into docker namespace).

@rootfs
Copy link
Author

rootfs commented Aug 3, 2015

@rhatdan new commit makes rootfs private if / is shared.
@mrunalp no more leak

…tion mode settable

Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
@mrunalp
Copy link
Contributor

mrunalp commented Aug 3, 2015

@rootfs You could take out the Godep update code as we just merged a PR to update the spec. See #173

@rootfs
Copy link
Author

rootfs commented Aug 3, 2015

@mrunalp cool, thanks. just rebased.

@rhvgoyal
Copy link
Contributor

rhvgoyal commented Aug 3, 2015

I have taken @rhatdan paches to allow shared/slave volume mounts in a PRIVATE rootfs. Modified it a bit to allow shared mounts. While I can make a volume mount sharable but it does not seem to propagating to host namespace. I am not sure why. I am trying to debug it.

In the mean time, here are the two commits. Hoping somebody can notice what am I doing wrong.

rhvgoyal@cd9a387
rhvgoyal@5cdc224

@rhvgoyal
Copy link
Contributor

rhvgoyal commented Aug 3, 2015

I think this notion that container rootfs can be private and it can still have mounts under it which are receiving events from host namespace (slave) or sending back events to host namespace (shared) is flawed. Once you make rootfs private, looks like any connection from root.

For example, I created a directory rootfs which contains container root fs. and I did following.

$ unshare -m
$ mount --bind rootfs rootfs
Above mount operation becomes visible in host as well.
$ mount --make-private rootfs
This breaks sharing. Now any mount operation under rootfs, does not get propagated to host namespace.

So I think after clone operation we will have to make sure rootfs is SHARED to make sure events propagate back to hostnamespace.

EDIT:

Looks like above is true only for one level (for certain type of mounts). We can still have mounts deeper in the hierarchy which are shared. For example, I tried following.

$ unshare -m
$mount --bind rootfs rootfs
(Above mount point becomes visible in host as well)
$ mount --make-private rootfs
$ mkdir rootfs/mnt1
$ mkdir mnt1-source
$ mount --bind mnt1-source rootfs/mnt
(This mount operation is not visible in host as parent mount is PRIVATE. Note mnt1 is SHARED though. Not sure where default propagation of a mount point comes from.)
$ mkdir rootfs/mnt1/mnt1child
$ mkdir mnt1-child-source
$ mount --bind mnt1-child-source rootfs/mnt1/mnt1child
(This mount operation does propagate to host namespace as parent mount mnt1 is SHARED).

In summary, rootfs was PRIVATE so any mounts directly under rootfs did not propagate to host. But children of rootfs themselves were SHARED so grand children mounts of rootfs did propagate back to host. So looks like PRIVATE (non-recursive) is effective only for immediate children.

Having said that, it was true only if mount source was outside above rootfs/. If I did convert some directories under rootfs to mount points by bind mounting these over themselves, then propagation did not happen to host and every child mount point was PRIVATE.

@rootfs
Copy link
Author

rootfs commented Aug 3, 2015

Did you run the tests in runc or docker?

@rhvgoyal
Copy link
Contributor

rhvgoyal commented Aug 3, 2015

@rootfs

I was playing with runc. In above example, I am not using runc or docker. Just trying to make use of "unshare -m" to get a separate mount namespace and playing with mount operations and see what is being propagated back to host mount namespace.

@rootfs
Copy link
Author

rootfs commented Aug 3, 2015

@rhatdan pointed out there were two rootfs: one that lives on the host and is where container's / binds to, the other is container's /. Making the first rootfs private is ok, because it blocks container's /dev, /proc, etc from showing up on the host. As long as container's / (i.e. second rootfs) is shared, mounts created under container's / is propagated to host.

rootfs referenced this pull request in rootfs/runc Aug 10, 2015
Signed-off-by: Huamin Chen <hchen@redhat.com>
@rootfs
Copy link
Author

rootfs commented Aug 17, 2015

closed. please comment on the new PR #208

@rootfs rootfs closed this Aug 17, 2015
stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants