-
Notifications
You must be signed in to change notification settings - Fork 18.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: recursive read-only bind-mount emulation using FUSE #37838
Comments
This is something that could be implemented via a volume driver today, right? |
I would almost prefer to be able to disable recursive binds which would allow users to manually RO-bind each mountpoint. |
Yes: https://github.com/AkihiroSuda/securebind-docker
+1 to have |
Userspace tools often need to atomically remount a whole mount tree read-only. This is currently not possible. Currently MS_REC | MS_REMOUNT | MS_RDONLY requests will only mount the topmost mount read-only but leave all other mounts in the tree unchanged. This behavior is very unexpected and has caused several security sensitive issues. Just recently it was discovered that all read-only rootfs mounts in runC [1] are actually not made read-only because it relies on the combination of exactly these three flags. This patches changes the current behavior and performs remounts and applies the changes to all mounts in the tree if MS_REC is passed. As far as I can tell the regression potential is extremely small. Any tool that requests an operation with MS_REC will expect that the change applies to the whole mount tree otherwise there is no point in passing this flag at all. So any tool that sends MS_REC will get what they asked for with this change. Any tool that passes MS_REC but knows about the current unexpected behavior will need to have additional racy code to remount each mount in the tree already in place. For them the change will mean that the first MS_REC request will now get them a complete read-only tree right away and any additional individual remounts will be superfluous but harmless. [1]: moby/moby#37838 Signed-off-by: Christian Brauner <christian@brauner.io>
Userspace tools often need to atomically remount a whole mount tree read-only. This is currently not possible. Currently MS_REC | MS_REMOUNT | MS_RDONLY requests will only mount the topmost mount read-only but leave all other mounts in the tree unchanged. This behavior is very unexpected and has caused several security sensitive issues. Just recently it was discovered that all read-only rootfs mounts in runC [1] are actually not made read-only because it relies on the combination of exactly these three flags. This patches changes the current behavior and performs remounts and applies the changes to all mounts in the tree if MS_REC is passed. As far as I can tell the regression potential is extremely small. Any tool that requests an operation with MS_REC will expect that the change applies to the whole mount tree otherwise there is no point in passing this flag at all. So any tool that sends MS_REC will get what they asked for with this change. Any tool that passes MS_REC but knows about the current unexpected behavior will need to have additional racy code to remount each mount in the tree already in place. For them the change will mean that the first MS_REC request will now get them a complete read-only tree right away and any additional individual remounts will be superfluous but harmless. [1]: moby/moby#37838 Signed-off-by: Christian Brauner <christian@brauner.io>
@brauner is working on kernel-side fix. My FUSE proposal can be withdrawn if his patch is going to be merged. |
This work will be slow since this needs to be changed:
I expect pushback/discussion on both points. We'll see. |
Userspace tools often need to atomically remount a whole mount tree read-only. This is a feature that init systems, containers runtimes and other tools have sorely missed in the old mount API. For example, assume the whole /sys mount tree is to be remounted read-only. With the old mount API this is currently not possible. If you were to specify: mount("none", "/sys", NULL, MS_RDONLY|MS_REMOUNT|MS_BIND|MS_REC, NULL) = 0 the MS_REC | MS_REMOUNT | MS_RDONLY | MS_BIND request will only mount the topmost mount read-only but leave all other mounts in the tree unchanged. This behavior is very unexpected and has caused several security sensitive issues. Just recently it was discovered that all read-only rootfs mounts in runC [1] are actually not made read-only because it relies on the combination of exactly these three flags. This patches changes the new mount API to allow for recursive read-only remount of a whole mount tree. [1]: moby/moby#37838 Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Serge Hallyn <serge@hallyn.com>
PR for |
Userspace tools often need to atomically remount a whole mount tree read-only. This is currently not possible. Currently MS_REC | MS_REMOUNT | MS_RDONLY requests will only mount the topmost mount read-only but leave all other mounts in the tree unchanged. This behavior is very unexpected and has caused several security sensitive issues. Just recently it was discovered that all read-only rootfs mounts in runC [1] are actually not made read-only because it relies on the combination of exactly these three flags. This patch enables recursive read-only remounts via the newly introduced MS_REC_RDONLY flag. [1]: moby/moby#37838 Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian@brauner.io>
Userspace tools often need to atomically remount a whole mount tree read-only. This is currently not possible. Currently MS_REC | MS_REMOUNT | MS_RDONLY requests will only mount the topmost mount read-only but leave all other mounts in the tree unchanged. This behavior is very unexpected and has caused several security sensitive issues. Just recently it was discovered that all read-only rootfs mounts in runC [1] are actually not made read-only because it relies on the combination of exactly these three flags. This patch enables recursive read-only remounts via the newly introduced MS_REC_RDONLY flag. [1]: moby/moby#37838 Acked-by: Serge Hallyn <serge@hallyn.com> Signed-off-by: Christian Brauner <christian@brauner.io>
Closing this for now, as |
Kernel 5.12 gained support for fully recursively-readonly mounts. Integration to OCI is discussed in opencontainers/runtime-spec#1090 |
docker run -v /:/host:ro
(docker run --mount type=bind,src=/,dst=/host,ro
) is not really read-only, because there is no way in the kernel to bind-mount a filesystem tree recursively as read-only.So I suggest emulating "recursive read-only bind-mount" using FUSE.
CLI would be like
docker run --mount type=bind,src=/,dst=/host,ro,bind-driver=fuse
.If we don't want to use FUSE, an alternative way is to bind the tree as
rprivate
and then bind each of submounts explicitly.However,
rprivate
does not work for/
,/var/lib
, and/var/lib/docker/*
: 589a0afAlso, it doesn't work for those who wants to use the shared propagation.
@justincormack @cpuguy83 @dmcgowan
The text was updated successfully, but these errors were encountered: