rootless: make /sys/fs/cgroup/* read-only #1869

AkihiroSuda · 2018-08-16T10:46:21Z

The default rootless spec bind-mounts /sys with "rbind,ro" (because mounting sysfs
requires netns to be unshared). However, this does not make the filesystems mounted
under /sys read-only.

As a result, files under /sys/fs/cgroup are unexpectedly writable once they have been
chmod/chowned by privileged helpers such as pam_cgfs.

This patch fixes the issue by explicitly mounting /sys/fs/cgroup/* read-only.

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
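
As a rough illustration of the idea in the description (not the actual diff, which is not reproduced in this thread), the mount list could be built as in the sketch below. The `Mount` type is a local stand-in that mirrors the fields of `specs.Mount`, and `rootlessSysMounts` and its `controllers` argument are hypothetical names introduced only for this example. Using "rbind" for the per-hierarchy mounts is also an assumption.

```go
package main

import (
	"fmt"
	"path"
)

// Mount is a local stand-in mirroring the fields of specs.Mount, declared here
// so that the sketch has no external dependency.
type Mount struct {
	Destination string
	Type        string
	Source      string
	Options     []string
}

// rootlessSysMounts (hypothetical helper) returns the recursive read-only bind
// mount of /sys, followed by one explicit read-only bind mount for each cgroup
// hierarchy name passed in controllers.
func rootlessSysMounts(controllers []string) []Mount {
	mounts := []Mount{{
		Destination: "/sys",
		Type:        "none",
		Source:      "/sys",
		Options:     []string{"rbind", "ro"},
	}}
	for _, c := range controllers {
		dir := path.Join("/sys/fs/cgroup", c)
		mounts = append(mounts, Mount{
			Destination: dir,
			Type:        "none",
			Source:      dir,
			// Assumption: the submount is bound onto itself, read-only.
			Options: []string{"rbind", "ro"},
		})
	}
	return mounts
}

func main() {
	for _, m := range rootlessSysMounts([]string{"cpu", "memory"}) {
		fmt.Printf("%s -> %s %v\n", m.Source, m.Destination, m.Options)
	}
}
```

In practice the list of hierarchies would have to be discovered on the host; the sketch takes it as input to stay self-contained.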

giuseppe

Just a small comment. I tested this locally and it works for me.

giuseppe · 2018-08-16T11:42:56Z

libcontainer/specconv/example.go

+		{
+			Source:      "/sys",
+			Destination: "/sys",
+			Type:        "none",


Should this be "bind"?

This is the only place where I see "none" used. In the runtime-spec example, "bind" is used as the type for a bind mount (https://github.com/opencontainers/runtime-spec/blob/master/config.md#example-linux), and both Docker and Podman generate "bind".

Anyway, since it is only in the example and not in the spec, and you are just moving this code snippet, it is probably fine to leave it as it is.

type: "none" is entirely valid. There was previously a bug in runc where we didn't handle this properly, but the type field (when it comes to mount(2)) doesn't affect bind mounts -- what matters is that the bind option is set. Docker set "bind" presumably by accident, and Podman probably copied the mistake (to be clear -- it doesn't matter either way, but it does lead to confusion).

But this code is used for more than just an example -- it's used by at least a few projects to convert a configuration to the rootless version of said configuration.
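
To make the point about the `type` field concrete, here is a minimal sketch of deciding whether a mount is a bind mount by looking only at its options. The helper name `isBindMount` is hypothetical and this is not a copy of runc's own logic.

```go
package main

import "fmt"

// isBindMount (hypothetical helper) reports whether the option list requests a
// bind mount. The mount type ("none", "bind", ...) is deliberately not
// consulted, because only the bind/rbind option decides this.
func isBindMount(options []string) bool {
	for _, o := range options {
		if o == "bind" || o == "rbind" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isBindMount([]string{"rbind", "ro"}))
	fmt.Println(isBindMount([]string{"nosuid", "ro"}))
}
```

With a check like this, a mount declared with type "none" and one declared with type "bind" are treated identically as long as both carry "bind" or "rbind" in their options.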

crosbymichael · 2018-08-16T14:47:55Z

libcontainer/specconv/example.go

@@ -162,6 +162,10 @@ func ToRootless(spec *specs.Spec) {
 	var namespaces []specs.LinuxNamespace

 	// Remove networkns from the spec.
+	//
+	// TODO: removal of networkns should be optional,


FYI. When someone puts TODOs in PRs, it makes reviewers nervous.

todo when?
by who?
what is the issue now?
what breaks while this isn't done?

AkihiroSuda · 2018-08-16T19:21:35Z

addressed comments

giuseppe · 2018-08-16T20:01:42Z

Thanks, LGTM

cyphar · 2018-08-17T04:35:21Z

libcontainer/specconv/example.go

+	var mounts []specs.Mount
+	for _, mount := range spec.Mounts {
+		// Ignore all mounts that are under /sys.
+		if strings.HasPrefix(mount.Destination, "/sys") {


I'm not sure if this is always fine -- if someone wants to mount a fake filesystem over parts of /sys then I don't see why we would block it (let alone ignore it).

Ultimately this is used during config generation so it's not the end of the world, but it is something to keep in mind.
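
A related detail about the quoted hunk: a plain `strings.HasPrefix(mount.Destination, "/sys")` also matches destinations such as `/sysroot`, which are not under /sys at all. A minimal sketch of a check that compares whole path components follows. The helper `isUnder` is a hypothetical name, and this is a suggestion rather than what the patch does.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isUnder (hypothetical helper) reports whether p is dir itself or a path
// below it, comparing whole path components rather than raw string prefixes.
func isUnder(p, dir string) bool {
	p, dir = path.Clean(p), path.Clean(dir)
	if dir == "/" {
		// Every absolute path is under the root directory.
		return strings.HasPrefix(p, "/")
	}
	return p == dir || strings.HasPrefix(p, dir+"/")
}

func main() {
	for _, dest := range []string{"/sys", "/sys/fs/cgroup", "/sysroot"} {
		fmt.Printf("%-16s under /sys: %v\n", dest, isUnder(dest, "/sys"))
	}
}
```

This does not address the question of whether such mounts should be dropped in the first place; it only narrows which destinations the filter would touch.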

cyphar · 2018-08-17T04:37:17Z

libcontainer/specconv/example.go

+		// Without this hack, `df` fails with "df: /sys/firmware/efi/efivars"
+		// because the efivars entry exists in mtab (because /sys is rbind-mounted)
+		// but /sys/firmware is masked.
+		if strings.HasPrefix(masked, "/sys") {


I'm not sure this is fine for rootless mode -- if you unmask paths you could be leaking information to containers (so while they might not be able to write to the masked paths, they are able to read them). I mean, you probably can't read most of /sys that is masked anyway, but I'm not sure it's a great idea.


AkihiroSuda · 2018-08-17T08:13:06Z

addressed comments, although it became more complicated

AkihiroSuda · 2018-09-07T05:39:11Z

@giuseppe latest revision still LGTY?

giuseppe · 2018-09-07T05:53:34Z

@AkihiroSuda yes, latest version LGTM

AkihiroSuda · 2018-09-12T15:08:11Z

@cyphar PTAL

AkihiroSuda · 2018-10-03T10:43:53Z

ping @cyphar

AkihiroSuda · 2018-10-16T15:17:06Z

On second thought, maybe we should instead suggest that users not mount /sys when running rootless with netns not unshared.

Rootless BuildKit already removed the /sys mount in moby/buildkit#689

For Rootless Moby, the sysfs mount (not a bind mount) is preserved, because netns is unshared via RootlessKit before starting dockerd: moby/moby#38050

I'm going to close this PR but I may change my mind later.
RFC. @giuseppe @cyphar @jessfraz

AkihiroSuda force-pushed the bind-sys-2 branch from e4d76aa to 6de6a84 on August 16, 2018 10:47

giuseppe reviewed Aug 16, 2018


crosbymichael reviewed Aug 16, 2018


AkihiroSuda force-pushed the bind-sys-2 branch from 6de6a84 to 6450010 on August 16, 2018 19:20

cyphar reviewed Aug 17, 2018


AkihiroSuda force-pushed the bind-sys-2 branch from 6450010 to 28dc2e6 on August 17, 2018 08:12

AkihiroSuda mentioned this pull request Oct 16, 2018

Disable rootless mode except RootlessCgMgr when executed as the root in userns (fix Docker-in-LXD regression) #1862

Merged

AkihiroSuda closed this Oct 16, 2018
