New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

provides a new flag to make rootfs all sharable|slave|private propagation settable #77

Closed
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
9 participants
@rootfs
Copy link

rootfs commented Jul 1, 2015

This is a port of docker/libcontainer#632

Current a container has to nsenter the host's mount namespace to mount filesystem and
share with other containers. This approach doesn't work if the filesystem mount
calls helper utility (/sbin/mount.XXX). This limitation makes containerized kubelet unable to mount certain filesystems.

This commit provides a new flag to make rootfs sharable. Since moving a shared rootfs is semantically confusing for pivot_root(2) and MS_MOVE. A new function changeRoot() is provided to switch rootfs to new destination.

// "private": rootfs is mounted as MS_PRIVATE
// "shared": rootfs is mounted as MS_SHARED
// "slave": rootfs is mounted as MS_SLAVE
RootMount string `json:"root_mount"`

This comment has been minimized.

@mrunalp

mrunalp Jul 1, 2015

Contributor

Could make this more strongly typed like Namespaces instead of strings.

This comment has been minimized.

@mrunalp

mrunalp Jul 1, 2015

Contributor

Also, how about calling it RootfsMountMode?

This comment has been minimized.

@LK4D4

LK4D4 Jul 2, 2015

Contributor

Agree on both.

@rhatdan

This comment has been minimized.

Copy link
Contributor

rhatdan commented Jul 13, 2015

@rootfs Could you change this like @mrunalp suggested

  • "Could make this more strongly typed like Namespaces instead of strings. ... how about calling it RootfsMountMode?"
@rootfs

This comment has been minimized.

Copy link

rootfs commented Jul 13, 2015

@@ -0,0 +1,4 @@
// mount_propagation

This comment has been minimized.

@mrunalp

mrunalp Jul 13, 2015

Contributor

Package names are preferred without underscores. https://blog.golang.org/package-names

@@ -339,9 +343,12 @@ func mknodDevice(dest string, node *configs.Device) error {

func prepareRoot(config *configs.Config) error {
flag := syscall.MS_SLAVE | syscall.MS_REC
if config.Privatefs {
if config.RootfsMountMode == configs.PRIVATE {

This comment has been minimized.

@mrunalp

mrunalp Jul 13, 2015

Contributor

Maybe use a switch here and also handle configs.SLAVE?

This comment has been minimized.

@rhatdan

rhatdan Jul 14, 2015

Contributor

configs.SLAVE is the default. Does this properly handle configs.SHARED?

This comment has been minimized.

@rootfs

rootfs Jul 14, 2015

SHARED is handled below.

@kelseyhightower

This comment has been minimized.

Copy link

kelseyhightower commented Jul 17, 2015

Is there a chance this will land before Docker 1.8?

@@ -0,0 +1,11 @@
// +build linux

This comment has been minimized.

@LK4D4

LK4D4 Jul 17, 2015

Contributor

this isn't build tag, it should be separated from package by empty line.
Better just remove it, you use postfix filename notation anyway.

@@ -0,0 +1,4 @@
// mountpropagation

This comment has been minimized.

@LK4D4

LK4D4 Jul 17, 2015

Contributor

weird comment

@rootfs rootfs force-pushed the rootfs:dev branch from 7af6006 to 4feed9e Jul 17, 2015

@@ -28,6 +28,8 @@ type Linux struct {
Capabilities []string `json:"capabilities"`
// Devices are a list of device nodes that are created and enabled for the container.
Devices []string `json:"devices"`
// RootfsPropagation is the rootfs mount propagation mode for the container.
RootfsPropagation string `json:"rootfsPropagation"`

This comment has been minimized.

@LK4D4

LK4D4 Jul 17, 2015

Contributor

I have concerns how this is appeared here without updating Godeps.json.

This comment has been minimized.

@rootfs

rootfs Jul 17, 2015

there is a separate PR against specs

This comment has been minimized.

@LK4D4

LK4D4 Jul 17, 2015

Contributor

I merged it. Would you mind to update properly?

@kelseyhightower

This comment has been minimized.

Copy link

kelseyhightower commented Jul 17, 2015

@rootfs This is so close to the finish line, let me know how I can help. I'm happy to review and test.

@LK4D4

This comment has been minimized.

Copy link
Contributor

LK4D4 commented Jul 17, 2015

@kelseyhightower Help with testing always appreciated :)

@rootfs

This comment has been minimized.

Copy link

rootfs commented Jul 17, 2015

@kelseyhightower many thanks for helping. Let me know if you are able to run my runc branch with this config.json

spec.go Outdated
if p, exists := mountPropagationMapping[spec.Linux.RootfsPropagation]; exists {
config.RootfsMountMode = p
} else {
return nil, fmt.Errorf("invalid rootfs propagation mode:", spec.Linux.RootfsPropagation)

This comment has been minimized.

@mrunalp

mrunalp Jul 17, 2015

Contributor

I think we should default to something when it isn't specified.

This comment has been minimized.

@rootfs

rootfs Jul 17, 2015

sure, which default to use?

This comment has been minimized.

@rhatdan

rhatdan Jul 17, 2015

Contributor

slave

This comment has been minimized.

@rootfs
@LK4D4

This comment has been minimized.

Copy link
Contributor

LK4D4 commented Jul 18, 2015

I wonder where is jenkins :)

@mrunalp

This comment has been minimized.

Copy link
Contributor

mrunalp commented Jul 18, 2015

rprivate is causing issues with the chmod on the console.

@rootfs

This comment has been minimized.

Copy link

rootfs commented Jul 19, 2015

@mrunalp what are the issues? I've tested chmod with rprivate, so far so good. Thanks.

@mrunalp

This comment has been minimized.

Copy link
Contributor

mrunalp commented Jul 20, 2015

@rootfs Actually, it can't find the console for me. I will look more.

FATA[0000] Container start failed: chmod /dev/pts/4: no such file or directory 
@rhvgoyal

This comment has been minimized.

Copy link
Contributor

rhvgoyal commented Aug 3, 2015

Was just chatting with dan walsh and he said that we probably want to make sure that "shared, slave, private" becomes the property of volume being mounted inside the container. (And container root remains PRIVATE). That means only select volumes and any mount under those will become shared and user can control those instead of all of the mounts under container root being shared.

@rhatdan

This comment has been minimized.

Copy link
Contributor

rhatdan commented Aug 3, 2015

Right, I believe / should be shared by default (Or arguably slave.) But everything under $rootfs would be PRIVATE. I think the internal mounts should all be done privately. Volume mounts from the host should be controlled on a individual basis, but the default would be shared.

@rhvgoyal

This comment has been minimized.

Copy link
Contributor

rhvgoyal commented Aug 3, 2015

If / is SLAVE then I don't think volumes can be shared. May be we can scan the mount config and if any of the volumes are shared, then use SHARED for /, if any of the volumes are slave, then use
SLAVE for root otherwise fall back to PRIVATE.

@rootfs

This comment has been minimized.

Copy link

rootfs commented Aug 3, 2015

@mrunalp somehow in docker there is no leak :)
@rhvgoyal i am not sure if we can use the same parent mount namespace for multiple containers. A new mount namespace doesn't have this problem.
@rhatdan I think if we can make rootfs shared, then directory that are direct under rootfs are propagated. We can make 1st level mountpoint slave/private so 2nd level mountpoint are not propagated.

@mrunalp

This comment has been minimized.

Copy link
Contributor

mrunalp commented Aug 3, 2015

@rootfs Docker uses its own mount namespace and so might not be visible in the default namespace.

@rhatdan

This comment has been minimized.

Copy link
Contributor

rhatdan commented Aug 3, 2015

@mrunalp Right we need to address that issue also. But I want to just get to the point that we could share mounts between containers.

@rhatdan

This comment has been minimized.

Copy link
Contributor

rhatdan commented Aug 3, 2015

@rootfs Right I want to rprivate on rootfs not on /. Bottom line we want to allow volume mounts differently then we do internal mounts.

@rootfs

This comment has been minimized.

Copy link

rootfs commented Aug 3, 2015

@rhatdan got it, let me try it, thanks.

@rhvgoyal

This comment has been minimized.

Copy link
Contributor

rhvgoyal commented Aug 3, 2015

Right, once we have the mechanism to have some mounts as "shared" between container and daemon we can look into not running docker daemon in a separate namespace. (Or train users to
get into docker namespace).

@rootfs

This comment has been minimized.

Copy link

rootfs commented Aug 3, 2015

@rhatdan new commit makes rootfs private if / is shared.
@mrunalp no more leak

rootfs added some commits Jul 1, 2015

provides a new flag to make rootfs all sharable|slave|private propaga…
…tion mode settable

Signed-off-by: Huamin Chen <hchen@redhat.com>
review feedback: if / is shared, make rootfs private
Signed-off-by: Huamin Chen <hchen@redhat.com>

@rootfs rootfs force-pushed the rootfs:dev branch from fdddb0c to 98e63f8 Aug 3, 2015

fix rebase
Signed-off-by: Huamin Chen <hchen@redhat.com>
@mrunalp

This comment has been minimized.

Copy link
Contributor

mrunalp commented Aug 3, 2015

@rootfs You could take out the Godep update code as we just merged a PR to update the spec. See #173

@rootfs

This comment has been minimized.

Copy link

rootfs commented Aug 3, 2015

@mrunalp cool, thanks. just rebased.

@rhvgoyal

This comment has been minimized.

Copy link
Contributor

rhvgoyal commented Aug 3, 2015

I have taken @rhatdan paches to allow shared/slave volume mounts in a PRIVATE rootfs. Modified it a bit to allow shared mounts. While I can make a volume mount sharable but it does not seem to propagating to host namespace. I am not sure why. I am trying to debug it.

In the mean time, here are the two commits. Hoping somebody can notice what am I doing wrong.

rhvgoyal@cd9a387
rhvgoyal@5cdc224

@rhvgoyal

This comment has been minimized.

Copy link
Contributor

rhvgoyal commented Aug 3, 2015

I think this notion that container rootfs can be private and it can still have mounts under it which are receiving events from host namespace (slave) or sending back events to host namespace (shared) is flawed. Once you make rootfs private, looks like any connection from root.

For example, I created a directory rootfs which contains container root fs. and I did following.

$ unshare -m
$ mount --bind rootfs rootfs
Above mount operation becomes visible in host as well.
$ mount --make-private rootfs
This breaks sharing. Now any mount operation under rootfs, does not get propagated to host namespace.

So I think after clone operation we will have to make sure rootfs is SHARED to make sure events propagate back to hostnamespace.

EDIT:

Looks like above is true only for one level (for certain type of mounts). We can still have mounts deeper in the hierarchy which are shared. For example, I tried following.

$ unshare -m
$mount --bind rootfs rootfs
(Above mount point becomes visible in host as well)
$ mount --make-private rootfs
$ mkdir rootfs/mnt1
$ mkdir mnt1-source
$ mount --bind mnt1-source rootfs/mnt
(This mount operation is not visible in host as parent mount is PRIVATE. Note mnt1 is SHARED though. Not sure where default propagation of a mount point comes from.)
$ mkdir rootfs/mnt1/mnt1child
$ mkdir mnt1-child-source
$ mount --bind mnt1-child-source rootfs/mnt1/mnt1child
(This mount operation does propagate to host namespace as parent mount mnt1 is SHARED).

In summary, rootfs was PRIVATE so any mounts directly under rootfs did not propagate to host. But children of rootfs themselves were SHARED so grand children mounts of rootfs did propagate back to host. So looks like PRIVATE (non-recursive) is effective only for immediate children.

Having said that, it was true only if mount source was outside above rootfs/. If I did convert some directories under rootfs to mount points by bind mounting these over themselves, then propagation did not happen to host and every child mount point was PRIVATE.

@rootfs

This comment has been minimized.

Copy link

rootfs commented Aug 3, 2015

Did you run the tests in runc or docker?

@rhvgoyal

This comment has been minimized.

Copy link
Contributor

rhvgoyal commented Aug 3, 2015

@rootfs

I was playing with runc. In above example, I am not using runc or docker. Just trying to make use of "unshare -m" to get a separate mount namespace and playing with mount operations and see what is being propagated back to host mount namespace.

@rootfs

This comment has been minimized.

Copy link

rootfs commented Aug 3, 2015

@rhatdan pointed out there were two rootfs: one that lives on the host and is where container's / binds to, the other is container's /. Making the first rootfs private is ok, because it blocks container's /dev, /proc, etc from showing up on the host. As long as container's / (i.e. second rootfs) is shared, mounts created under container's / is propagated to host.

@rhvgoyal

This comment has been minimized.

Copy link

rhvgoyal commented on libcontainer/rootfs_linux.go in 165cfe0 Aug 4, 2015

I think decision to use changeRoot() should be based on a separate config option. Say "useChangeRoot". That way it is more generic and one can use chroot() even if one is not using shared mount propagation mode.

May be we can ask caller to specify useChangeRoot if SHARED mode is being used. That way, libcontainer does not decide to automatically use chroot().

This comment has been minimized.

Copy link

rhvgoyal replied Aug 4, 2015

If we are using chroot() and say we export a config knob for that, then can we bypass creating a mount namespace completely.

IOW, there does not seem to be a need for creating a separate namespace if we are using chroot().

If that's the case, then we don't have to get into this business of specifying propagation mode. We can just ask
user to remove "mount" from namespaces[] in spec file and specify useChangeRoot and that should be equivalent
of SHARED mode?

PRIVATE is default now and we can export Privatefs so that one can do mounting using SLAVE.

This comment has been minimized.

Copy link

rhvgoyal replied Aug 4, 2015

What I mean is that using SHARED root is not necessary for what we are trying to achieve. Same can be achieved by doing simple chroot() and not using namespace at all. So why not enable that shortest path atleast for the case of SHARED.

Exporting Privatefs so that user can set it to false should be good enough for the case of SLAVE root.

@rhvgoyal

This comment has been minimized.

Copy link

rhvgoyal commented on spec.go in 165cfe0 Aug 4, 2015

Are you changing the default. Current default seems to be PRIVATE and now you seem to be switching it to SLAVE?

This comment has been minimized.

Copy link
Owner

rootfs replied Aug 4, 2015

this is from Dan. I believe slave is less confusing, containers will receive host mount propagation.

@rhvgoyal

This comment has been minimized.

Copy link

rhvgoyal commented on libcontainer/container_linux.go in 165cfe0 Aug 4, 2015

All the mount points will be visible in the host as well as these will leak into another containers being run using --privileged. So Unmount() can fail with EBUSY (I think). If that's the case, can we do lazy unmount and continue processing. That way it will be unmounted when there are no users?

@rhvgoyal

This comment has been minimized.

Copy link

rhvgoyal commented on libcontainer/container_linux.go in 165cfe0 Aug 4, 2015

This whole logic of what will be unmounted should be in a separate function.

This comment has been minimized.

Copy link
Owner

rootfs replied Aug 4, 2015

this is gone now. new commit doesn't need to unmount.

rootfs referenced this pull request in rootfs/runc Aug 10, 2015

if mnt uses host namespace, just chroot()
Signed-off-by: Huamin Chen <hchen@redhat.com>

rootfs referenced this pull request in rootfs/docker Aug 13, 2015

Add rootfs mountpropagation flags
Signed-off-by: Huamin Chen <hchen@redhat.com>
@rootfs

This comment has been minimized.

Copy link

rootfs commented Aug 17, 2015

closed. please comment on the new PR #208

@rootfs rootfs closed this Aug 17, 2015

stefanberger pushed a commit to stefanberger/runc that referenced this pull request Sep 8, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment