Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

root.readOnly: true does not work on filesystems mounted with nodev or nosuid #1247

Open
smaeul opened this issue Jan 1, 2017 · 13 comments
Open

Comments

@smaeul
Copy link

smaeul commented Jan 1, 2017

I'm using runc 1.0.0_rc2 on Linux 4.7.10 with grsecurity. Say I have a container rootfs mounted like so:
rpool/srv/test/rootfs-1 on /srv/test/rootfs type zfs (rw,nosuid,nodev,noatime,xattr,noacl)
I have in my /srv/test/config.json: "root": { "path": "rootfs", "readonly": true }

When I run runc run -b /srv/test test, I get container_linux.go:247: starting container process caused "process_linux.go:359: container init caused \"rootfs_linux.go:110: setting rootfs as readonly caused \\\"operation not permitted\\\"\""

strace tells me the failing mount call is
mount("/", "/", 0xc4200da900, MS_RDONLY|MS_REMOUNT|MS_BIND|MS_REC, NULL) = -1 EPERM

This looks to be from setReadonly() manually specifying the mount flags and not respecting existing ones.

If I remount the rootfs like so:
host0/srv/test/rootfs-1 on /srv/test/rootfs type zfs (rw,noatime,xattr,noacl)
starting the container succeeds.

This can be fixed by changing setReadonly() to also pass MS_NODEV/MS_NOEXEC/MS_NOSUID to mount() if they are already present in the mount options.

@cyphar
Copy link
Member

cyphar commented Jan 2, 2017

@justincormack just changed the code to be the way it is today (#1222) , because MS_REMOUNT doesn't retain some flags properly. WDYT @justincormack?

@mrunalp
Copy link
Contributor

mrunalp commented Jan 7, 2017

We saw similar issue when trying to bind mount a directory which is a mount. We can read the existing mount options and pass them on remount. I am working on a fix for it.

@justincormack
Copy link
Contributor

justincormack commented Jan 21, 2017

Hmm, I don't think you need to read the existing mount options, it should be possibly to do a bind again if this fails and then remount read only. Will take a look at this case. Hmm, maybe we do need to read them...

@rhatdan
Copy link
Contributor

rhatdan commented Nov 27, 2018

We are now seeing this issue in testing and running a locked down crio instance.

@vikaschoudhary16
Copy link
Contributor

Few findings related to this while working with crio:

  1. crictl able to create a pod if crio is running without uid-mappings. Even if storage.conf have nodev. In config.json rootfs is also "readonly: true"
  2. Keeping everything same as in 1, except running crio with uid-mappings, results in the runc error reported in this issue.
  3. Moving further from step 2, if now i remove nodev from storage.conf, error seen in step 2 is gone.

@cyphar
Copy link
Member

cyphar commented Nov 28, 2018

It should be noted that currently runc actually doesn't do read-only mounts entirely correctly. MS_RDONLY|MS_REMOUNT|MS_BIND|MS_REC doesn't do what you'd think (MS_REC is ignored) and thus readonly: true only makes / read-only and no other filesystem.

LXC sets all of the filesystems to be readonly (except /proc I think -- maybe @brauner can remind me) and is what we should be doing, but also I think that the ignoring of MS_REC is a kernel bug (though given the new mount API discussions I think asking for another mount-related change is going to be "pushing it" with the maintainers).

@AkihiroSuda
Copy link
Member

@brauner is working on the kernel-side fix for recursive read-only mount moby/moby#37838

@cyphar
Copy link
Member

cyphar commented Nov 28, 2018

Right, I know. I talked to him about it a while ago (he's currently busy with a few other projects though).

@rhatdan
Copy link
Contributor

rhatdan commented Nov 28, 2018

My understanding of MS_REC is ONLY for setting the way "sharing" is handled. No other mount options are handled with it. SHARED,RSHARED, PRIVATE,RPRIVATE, SLAVE, RSLAVE.

@cyphar
Copy link
Member

cyphar commented Nov 29, 2018

MS_REC is used for bind-mounts (MS_REC|MS_BIND).

@rhatdan
Copy link
Contributor

rhatdan commented Nov 29, 2018

Right but it only sets the propogation on bind-mounts not the read-only option.

@cyphar
Copy link
Member

cyphar commented Nov 29, 2018

I think we're talking past each other -- MS_REC|MS_BIND allows for recursive bind-mounts which is quite different to just setting propagation. So MS_REC is used for things other than sharing.

You're quite right that MS_REC|MS_RDONLY doesn't do anything at the moment (and that in this case it only would affect the sharing) -- my point is that it probably should do what most users would expect it to do (recursively make things read-only). In the meantime we'll have to do the same looping over /proc/mountinfo ourselves...

@rhatdan
Copy link
Contributor

rhatdan commented Nov 29, 2018

Yes, I guess we are in agreement. I have failed at least twice making this assumption, and have had to jump through hoops to get the ro bindmount to work in code. Podman has a hacky version to bind mount the cgroup hierarchy read/only into a container

giuseppe added a commit to giuseppe/libpod that referenced this issue Feb 25, 2019
This is a workaround for the runc issue:

opencontainers/runc#1247

If the source of a bind mount has any of nosuid, noexec or nodev, be
sure to propagate them to the bind mount so that when runc tries to
remount using MS_RDONLY, these options are also used.

Closes: containers#2312

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe added a commit to giuseppe/libpod that referenced this issue Feb 25, 2019
This is a workaround for the runc issue:

opencontainers/runc#1247

If the source of a bind mount has any of nosuid, noexec or nodev, be
sure to propagate them to the bind mount so that when runc tries to
remount using MS_RDONLY, these options are also used.

Closes: containers#2312

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe added a commit to giuseppe/libpod that referenced this issue Feb 25, 2019
This is a workaround for the runc issue:

opencontainers/runc#1247

If the source of a bind mount has any of nosuid, noexec or nodev, be
sure to propagate them to the bind mount so that when runc tries to
remount using MS_RDONLY, these options are also used.

Closes: containers#2312

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe added a commit to giuseppe/libpod that referenced this issue Feb 25, 2019
This is a workaround for the runc issue:

opencontainers/runc#1247

If the source of a bind mount has any of nosuid, noexec or nodev, be
sure to propagate them to the bind mount so that when runc tries to
remount using MS_RDONLY, these options are also used.

Closes: containers#2312

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants