provides a new flag to make rootfs all sharable|slave|private propagation settable #77
// "private": rootfs is mounted as MS_PRIVATE
// "shared": rootfs is mounted as MS_SHARED
// "slave": rootfs is mounted as MS_SLAVE
RootMount string `json:"root_mount"`
Could make this more strongly typed like Namespaces instead of strings.
Also, how about calling it RootfsMountMode?
Agree on both.
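A minimal sketch of what the two suggestions above could look like together, assuming a named string type is an acceptable form of "strongly typed". The type name follows the proposed `RootfsMountMode`; the constant names, the `Valid` method, the `Config` stand-in, and the JSON tag are all hypothetical and not taken from the PR.

```go
package main

import "fmt"

// RootfsMountMode is a hypothetical named type for the rootfs propagation
// mode, following the review suggestion. The PR's final type may differ.
type RootfsMountMode string

// The three modes named in the diff comment under review.
const (
	RootfsPrivate RootfsMountMode = "private"
	RootfsShared  RootfsMountMode = "shared"
	RootfsSlave   RootfsMountMode = "slave"
)

// Valid reports whether m is one of the three known modes.
func (m RootfsMountMode) Valid() bool {
	switch m {
	case RootfsPrivate, RootfsShared, RootfsSlave:
		return true
	}
	return false
}

// Config is a stand-in for the config struct under review; the JSON tag
// is illustrative only.
type Config struct {
	RootfsMountMode RootfsMountMode `json:"rootfs_mount_mode"`
}

func main() {
	for _, m := range []RootfsMountMode{RootfsShared, "bogus"} {
		fmt.Printf("%q valid: %v\n", m, m.Valid())
	}
}
```

With a named type, a typo in a mode can be rejected when the config is loaded rather than surfacing later as a failed mount call.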
@@ -0,0 +1,4 @@
// mount_propagation
Package names are preferred without underscores. https://blog.golang.org/package-names
Is there a chance this will land before Docker 1.8? |
@@ -0,0 +1,11 @@
// +build linux
As written, this isn't a build tag: it has to be separated from the package clause by an empty line.
Better to just remove it, since you use the filename suffix notation anyway.
@@ -28,6 +28,8 @@ type Linux struct {
Capabilities []string `json:"capabilities"`
// Devices are a list of device nodes that are created and enabled for the container.
Devices []string `json:"devices"`
// RootfsPropagation is the rootfs mount propagation mode for the container.
RootfsPropagation string `json:"rootfsPropagation"`
I have concerns about how this appeared here without updating Godeps.json.
there is a separate PR against specs
I merged it. Would you mind updating properly?
@rootfs This is so close to the finish line, let me know how I can help. I'm happy to review and test. |
@kelseyhightower Help with testing always appreciated :) |
@kelseyhightower many thanks for helping. Let me know if you are able to run my runc branch with this config.json |
if p, exists := mountPropagationMapping[spec.Linux.RootfsPropagation]; exists {
config.RootfsMountMode = p
} else {
return nil, fmt.Errorf("invalid rootfs propagation mode: %q", spec.Linux.RootfsPropagation)
I think we should default to something when it isn't specified.
sure, which default to use?
slave
done
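For readers following the exchange above, here is a rough sketch of a lookup that falls back to `slave` when the mode is left empty. It is not the code from the PR: the helper name `rootfsPropagationFlag` is hypothetical, the map only mirrors the `mountPropagationMapping` name from the diff, and the flag values are copied from the Linux `MS_*` definitions so the example builds on any platform.

```go
package main

import "fmt"

// Linux mount propagation flag values (MS_PRIVATE, MS_SLAVE, MS_SHARED),
// copied here as plain constants so the sketch does not depend on a
// Linux-only package.
const (
	msPrivate = 1 << 18
	msSlave   = 1 << 19
	msShared  = 1 << 20
)

// mountPropagationMapping mirrors the name used in the diff; its exact
// contents in the PR are an assumption.
var mountPropagationMapping = map[string]int{
	"private": msPrivate,
	"slave":   msSlave,
	"shared":  msShared,
}

// rootfsPropagationFlag is a hypothetical helper: an empty mode defaults
// to "slave", and an unknown mode is reported as an error.
func rootfsPropagationFlag(mode string) (int, error) {
	if mode == "" {
		mode = "slave"
	}
	p, ok := mountPropagationMapping[mode]
	if !ok {
		return 0, fmt.Errorf("invalid rootfs propagation mode: %q", mode)
	}
	return p, nil
}

func main() {
	for _, mode := range []string{"", "shared", "bogus"} {
		flag, err := rootfsPropagationFlag(mode)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("mode %q -> flag %#x\n", mode, flag)
	}
}
```

Handling the default inside the lookup keeps "not specified" distinct from "specified but wrong", which is the distinction the review comment asks for.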
I wonder where is jenkins :) |
rprivate is causing issues with the chmod on the console. |
@mrunalp what are the issues? I've tested chmod with rprivate, so far so good. Thanks. |
@rootfs Actually, it can't find the console for me. I will look more.
My concern is what happens if a container mounts a path that is not under a bind mount. Won't that cause issues? Again, I'm probably missing something here, so sorry if that's a stupid question. I'll try out this patch later today so I can understand it more.
@ibuildthecloud bind mount is probably the only place containers can create new mountpoints that also share with host. I also have a docker patch based on this PR |
Background information about propagation modes and their use cases can be found in [1] and [2] |
@rootfs I see container mounts leaked on host after container exit when testing "shared"
Was just chatting with Dan Walsh and he said that we probably want to make sure that "shared, slave, private" becomes a property of the volume being mounted inside the container (and the container root remains PRIVATE). That means only select volumes and any mounts under them will become shared, and the user can control those instead of all of the mounts under the container root being shared.
Right, I believe / should be shared by default (or arguably slave). But everything under $rootfs would be PRIVATE. I think the internal mounts should all be done privately. Volume mounts from the host should be controlled on an individual basis, but the default would be shared.
If / is SLAVE then I don't think volumes can be shared. Maybe we can scan the mount config and if any of the volumes are shared, then use SHARED for /, if any of the volumes are slave, then use
@mrunalp somehow in docker there is no leak :) |
@rootfs Docker uses its own mount namespace and so might not be visible in the default namespace. |
@mrunalp Right we need to address that issue also. But I want to just get to the point that we could share mounts between containers. |
@rootfs Right, I want rprivate on rootfs, not on /. Bottom line: we want to allow volume mounts differently than we do internal mounts.
@rhatdan got it, let me try it, thanks. |
Right, once we have the mechanism to have some mounts as "shared" between container and daemon we can look into not running docker daemon in a separate namespace. (Or train users to |
…tion mode settable Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
@mrunalp cool, thanks. just rebased. |
I have taken @rhatdan's patches to allow shared/slave volume mounts in a PRIVATE rootfs and modified them a bit to allow shared mounts. While I can make a volume mount sharable, it does not seem to be propagating to the host namespace. I am not sure why. I am trying to debug it. In the meantime, here are the two commits. Hoping somebody can notice what I am doing wrong.
I think this notion that the container rootfs can be private and still have mounts under it which are receiving events from the host namespace (slave) or sending events back to the host namespace (shared) is flawed. Once you make rootfs private, looks like any connection from root. For example, I created a directory rootfs which contains the container root fs, and I did the following.

$ unshare -m

So I think after the clone operation we will have to make sure rootfs is SHARED to make sure events propagate back to the host namespace.

EDIT: Looks like the above is true only for one level (for certain types of mounts). We can still have mounts deeper in the hierarchy which are shared. For example, I tried the following.

$ unshare -m

In summary, rootfs was PRIVATE, so any mounts directly under rootfs did not propagate to the host. But the children of rootfs were themselves SHARED, so grandchild mounts of rootfs did propagate back to the host. So it looks like PRIVATE (non-recursive) is effective only for immediate children.

Having said that, this was true only if the mount source was outside the above rootfs/. If I converted some directories under rootfs to mount points by bind mounting them over themselves, then propagation to the host did not happen and every child mount point was PRIVATE.
Did you run the tests in runc or docker? |
I was playing with runc. In above example, I am not using runc or docker. Just trying to make use of "unshare -m" to get a separate mount namespace and playing with mount operations and see what is being propagated back to host mount namespace. |
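When running experiments like the ones described above, it can help to see how the kernel classifies each mount. The sketch below is not from the thread; it assumes the documented `/proc/[pid]/mountinfo` layout, where optional fields such as `shared:N` or `master:N` appear between the mount options and a `-` separator, and a mount with no such fields is private. The function name `propagation` and the sample lines are illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// propagation parses one line in /proc/[pid]/mountinfo format and returns
// the mount point together with its optional fields (for example
// "shared:1" or "master:2"). A mount with no optional fields is reported
// as "private".
func propagation(line string) (mountPoint, prop string, err error) {
	fields := strings.Fields(line)
	// Layout: ID, parent ID, major:minor, root, mount point, options,
	// zero or more optional fields, then a "-" separator.
	if len(fields) < 7 {
		return "", "", fmt.Errorf("short mountinfo line: %q", line)
	}
	var tags []string
	sepFound := false
	for _, f := range fields[6:] {
		if f == "-" {
			sepFound = true
			break
		}
		tags = append(tags, f)
	}
	if !sepFound {
		return "", "", fmt.Errorf("no separator in mountinfo line: %q", line)
	}
	if len(tags) == 0 {
		return fields[4], "private", nil
	}
	return fields[4], strings.Join(tags, ","), nil
}

func main() {
	// Sample lines in mountinfo format; on a real system these would be
	// read from /proc/self/mountinfo.
	samples := []string{
		"36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue",
		"40 36 8:1 / /data rw shared:7 - ext4 /dev/sda1 rw",
		"41 36 8:2 / /scratch rw - ext4 /dev/sda2 rw",
	}
	for _, s := range samples {
		mp, prop, err := propagation(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %s\n", mp, prop)
	}
}
```

Comparing this output inside and outside the `unshare -m` namespace shows which mounts share a peer group (`shared:N`) with the host and which only receive events (`master:N`).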
@rhatdan pointed out there were two rootfs: one that lives on the host and is where container's |
Signed-off-by: Huamin Chen <hchen@redhat.com>
closed. please comment on the new PR #208 |
runtime.md: convert oc to runc
This is a port of docker-archive/libcontainer#632
Currently, a container has to nsenter the host's mount namespace to mount a filesystem and
share it with other containers. This approach doesn't work if the filesystem mount
calls a helper utility (/sbin/mount.XXX). This limitation makes a containerized kubelet unable to mount certain filesystems.
This commit provides a new flag to make rootfs sharable. Since moving a shared rootfs is semantically confusing for pivot_root(2) and MS_MOVE, a new function changeRoot() is provided to switch rootfs to the new destination.