bwrap: capabilities support by giuseppe · Pull Request #101 · containers/bubblewrap

giuseppe · 2016-09-23T16:30:17Z

a few patches that modify how bubblewrap works in privileged mode to allow execution of systemd:

https://asciinema.org/a/7gex3vdcumd1n72i3knstra2b

alexlarsson · 2016-10-03T12:47:24Z

Allowing a fresh mount of sysfs is a potential security leak. Sysadmins and containment systems generally cover parts of /sys so that unprivileged users cannot access them. Mounting a new instance will undo that.

Can't we use a bind mount here instead?

alexlarsson · 2016-10-03T13:11:30Z

bubblewrap.c

-      data[1].effective = REQUIRED_CAPS_1;
-      data[1].permitted = REQUIRED_CAPS_1;
-      data[1].inheritable = 0;
+      data[0].effective = caps0;


Don't you need to set permitted?

alexlarsson · 2016-10-03T13:13:01Z

bubblewrap.c

      /* Drop root uid, but retain the required permitted caps */
      if (setuid (getuid ()) < 0)
        die_with_error ("unable to drop privs");
+      did_setuid = TRUE;


This approach means the file-caps based version of bwrap doesn't work. Although honestly, that's kind of useless vs setuid, as it needs cap_sys_admin anyway.

alexlarsson · 2016-10-03T13:21:31Z

bubblewrap.c

+static uint32_t requested_caps[2] = {BASE_CAPS0, BASE_CAPS1};
+
+/* low 32bit caps needed */
+#define REQUIRED_SETUP_CAPS_0 (BASE_CAPS0 | CAP_TO_MASK (CAP_NET_ADMIN) | CAP_TO_MASK (CAP_SYS_ADMIN))


You dropped CHROOT, SETUID and SETGID from REQUIRED_CAPS. These are needed for chroot() and user namespaces to work in the setup.

they are coming from BASE_CAPS0
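
As a rough illustration of the reply above, the sketch below ORs a base mask into a setup mask and checks that the three capabilities are still present. The `EX_`-prefixed names and the contents of `EX_BASE_CAPS0` are assumptions made for this example (the `BASE_CAPS0` in the patch may contain more bits); the capability numbers are copied from `<linux/capability.h>`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>, copied here so
 * the sketch compiles without kernel headers. */
enum
{
  EX_CAP_SETGID = 6,
  EX_CAP_SETUID = 7,
  EX_CAP_NET_ADMIN = 12,
  EX_CAP_SYS_CHROOT = 18,
  EX_CAP_SYS_ADMIN = 21
};

#define EX_CAP_TO_MASK(x) ((uint32_t) 1 << ((x) & 31))

/* Assumption: a reduced stand-in for BASE_CAPS0 with only the three
 * capabilities named in the review comment. */
#define EX_BASE_CAPS0 (EX_CAP_TO_MASK (EX_CAP_SETGID) \
                       | EX_CAP_TO_MASK (EX_CAP_SETUID) \
                       | EX_CAP_TO_MASK (EX_CAP_SYS_CHROOT))

/* Same shape as REQUIRED_SETUP_CAPS_0 in the diff: base caps plus the
 * two setup-only caps. */
#define EX_SETUP_CAPS0 (EX_BASE_CAPS0 \
                        | EX_CAP_TO_MASK (EX_CAP_NET_ADMIN) \
                        | EX_CAP_TO_MASK (EX_CAP_SYS_ADMIN))

/* Returns true if the low 32-bit mask contains the given capability. */
static bool
mask_has_cap (uint32_t mask, int cap)
{
  assert (cap >= 0 && cap < 32);
  return (mask & EX_CAP_TO_MASK (cap)) != 0;
}
```

With these definitions, `mask_has_cap (EX_SETUP_CAPS0, EX_CAP_SYS_CHROOT)` is true because the bit is inherited from the base mask.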

alexlarsson · 2016-10-03T13:23:54Z

bubblewrap.c

+                    | CAP_TO_MASK (CAP_DAC_OVERRIDE) | CAP_TO_MASK(CAP_SETFCAP) | CAP_TO_MASK(CAP_SETPCAP) \
+                    | CAP_TO_MASK (CAP_SETGID) | CAP_TO_MASK (CAP_SETUID) | CAP_TO_MASK (CAP_MKNOD) | CAP_TO_MASK (CAP_CHOWN) \
+                    | CAP_TO_MASK (CAP_FOWNER) | CAP_TO_MASK (CAP_FSETID) | CAP_TO_MASK (CAP_KILL) \
+                    | CAP_TO_MASK (CAP_SYS_CHROOT))


Adding CAP_DAC_OVERRIDE here vastly increases the risk of having bubblewrap setuid. The whole reason we're doing this dance, where we save only the caps we need for setting up the sandbox but run as the final user, is that we can then be reasonably sure that the things that end up in the final sandbox are what the original user can access.

If the setup is run with DAC_OVERRIDE, then any source path you specify will be read with root privileges, which will let regular users access root-only files.

alexlarsson · 2016-10-03T13:25:03Z

bubblewrap.c

+
+  if (prctl (PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
+    die ("could not set no new privs");
 }


Why this? We already do this in main()
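
For context, `PR_SET_NO_NEW_PRIVS` is a one-way flag: once set it cannot be cleared, and setting it again succeeds without changing anything, so a second call is redundant rather than harmful. A minimal Linux-only sketch of a guard that sets the flag only when needed (the helper name `ensure_no_new_privs` is hypothetical and not part of bubblewrap):

```c
#include <assert.h>
#include <sys/prctl.h>

/* Hypothetical helper: set no_new_privs unless it is already set.
 * Returns 0 on success, -1 if prctl() fails (errno is left as set). */
static int
ensure_no_new_privs (void)
{
  int current = prctl (PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
  if (current < 0)
    return -1;
  if (current == 1)
    return 0; /* already set earlier, for example in main() */

  if (prctl (PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
    return -1;

  /* Post-condition: the flag is now reported as set. */
  assert (prctl (PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1);
  return 0;
}
```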

alexlarsson · 2016-10-03T13:30:33Z

bubblewrap.c

+  if (opt_unshare_user)
+    {
+      caps0 |= CAP_TO_MASK (CAP_CHOWN) | CAP_TO_MASK(CAP_SETUID) | CAP_TO_MASK (CAP_SETGID) | CAP_TO_MASK (CAP_FOWNER) | CAP_TO_MASK (CAP_DAC_OVERRIDE);
+    }


I don't understand why you want the sandbox to have these capabilities by default just because some namespace is enabled. For instance, DAC_OVERRIDE means that a file that is read-only to you is still readable, no? That seems very unexpected.
Overall, this seems quite risky.

Thanks for the review. Yes, this patch series is risky (and some parts are too hacky), so I wanted to share my work in progress to get comments before I spend more time on it.

IIUC, even with DAC_OVERRIDE, the container should not be able to access files owned by users that have no mapping in the container user namespace. The only users that get a mapping in the container namespace are the uid of the user using bwrap and the range 10000-10999.

Are you sure about that? Anyway, you still automatically grant the process properties that it wouldn't have if it were run without those capabilities. For instance, if the user is "1000", it can now write to a file owned by user 1000 that is not marked with "u+w" permissions.

alexlarsson · 2016-10-03T13:32:52Z

bubblewrap.c

+    }
+  line3 = xasprintf ("%d %d 1\n", sandbox_id, parent_id);
+  return strconcat4 (line1, line2, line3, line4);
+}


This breaks any use of unprivileged user namespaces, because those allow only a single mapping.
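
To make the constraint concrete: a process without privileges in the parent namespace may write only one line to `uid_map`, mapping its own uid. A minimal sketch of building that single line, mirroring the `"%d %d 1\n"` format in the diff (the function name `format_single_id_map` is hypothetical):

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Writes "<sandbox_id> <parent_id> 1\n" into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small or formatting fails. */
static int
format_single_id_map (char *buf, size_t len, uid_t sandbox_id, uid_t parent_id)
{
  assert (buf != NULL && len > 0);

  int n = snprintf (buf, len, "%lu %lu 1\n",
                    (unsigned long) sandbox_id,
                    (unsigned long) parent_id);
  if (n < 0 || (size_t) n >= len)
    return -1;
  return n;
}
```

A caller would then write the resulting buffer to `/proc/<pid>/uid_map` in a single `write()`; adding further lines is what requires privileges (or a helper such as newuidmap).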

alexlarsson · 2016-10-03T13:34:09Z

Overall, this patch series seems very risky to me. It would need very thorough review to make sure it doesn't make it possible to gain increased privileges for non-root users.

cgwalters · 2016-10-03T13:40:32Z

bubblewrap.c


  if (is_privileged)
    {
+      uint32_t caps0 = REQUIRED_SETUP_CAPS_0 | requested_caps[0];


Isn't this a huge security hole if bubblewrap is installed setuid? I think we should be verifying whether getuid() == 0 or so.

is_privileged means "we're not using user namespaces" in the code, not that we're initially launched by root.
The reason it's doing this check instead of comparing the effective uid is to also support filesystem caps instead of setuid.

cgwalters · 2016-10-13T15:25:50Z

You could probably break out the "--no-reaper" as a separate PR, it looks safe to merge now, separately from higher risk changes.

giuseppe · 2016-10-18T09:49:59Z

I have done some simplifications:

- the root user is not treated differently from other users.
- --cap-add supports only a subset of caps, which are safe to use in the sandbox.
- by default, do not leave any cap in the process.
- fix the FILECAPS mode.
- now --sys uses a bind mount to /sys.
- the additional uids/gids to use in the userns are read from /etc/subuid and /etc/subgid, the same configuration files used by newuidmap and newgidmap.
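
For reference, each line of `/etc/subuid` and `/etc/subgid` has the form `name:start:count`. The sketch below parses one such line for a given user; it assumes well-formed decimal fields and does not check for overflow. `parse_subid_line` is a hypothetical helper written for this illustration, not the code in this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Parses "name:start:count" and fills *start and *count when the name
 * field equals user. Returns false on a different user or a malformed
 * line. The outputs are left untouched on failure. */
static bool
parse_subid_line (const char *line, const char *user,
                  unsigned long *start, unsigned long *count)
{
  assert (line != NULL && user != NULL && start != NULL && count != NULL);

  size_t user_len = strlen (user);
  if (strncmp (line, user, user_len) != 0 || line[user_len] != ':')
    return false;

  char *end = NULL;
  const char *p = line + user_len + 1;
  unsigned long s = strtoul (p, &end, 10);
  if (end == p || *end != ':')
    return false;

  p = end + 1;
  unsigned long c = strtoul (p, &end, 10);
  if (end == p || (*end != '\0' && *end != '\n'))
    return false;

  *start = s;
  *count = c;
  return true;
}
```

A line such as `alice:100000:65536` would yield a start of 100000 and a count of 65536 for user `alice`.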

I had a discussion with Eric Biederman about enabling setgroups. He said that it is safe to enable setgroups when there is a mapping for the user in /etc/subuid, as there shouldn't be any mapping on a system where negative group permissions are used (w.r.t. CVE-2014-8989). I have not done this in the patch set yet; I preferred to leave it as it is for now.
My suggestion to address this issue was to add a third mode "shadow" to /proc/self/setgroups so that the groups that were present at userns creation could not be dropped. This is more invasive as it requires changes in the kernel: https://github.com/giuseppe/linux/tree/proc-setgroups-shadow.
Not a big issue anyway, as the systemd changes got merged and now it treats setgroups (0, NULL) as a warning.

I've also dropped CAP_AUDIT_*, as systemd will detect when these caps are not available and not error out.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

alexlarsson · 2017-06-29T20:45:28Z

This looks good to go to me. However, it's waiting on some test, but I don't really understand which one. @cgwalters do you understand that?

giuseppe · 2017-06-29T21:55:13Z

That is weird, it seems both tests passed. Anyway, I think that once you r+, Homu will repeat the tests before merging.

The previous failure was related to the fact I was using --unshare-user instead of --unshare-user-try

alexlarsson · 2017-06-29T23:02:29Z

@rh-atomic-bot r+ 2c03392

rh-atomic-bot · 2017-06-29T23:02:34Z

⌛ Testing commit 2c03392 with merge 215aa3e...

It allows to configure the user namespace from outside. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson

When --unshare-user is used in the unprivileged case, all caps are left to the sandboxed application. Change it to leave only the specified ones. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson

rh-atomic-bot · 2017-06-29T23:07:52Z

☀️ Test successful - status-redhatci
Approved by: alexlarsson
Pushing 215aa3e to master...

cgwalters · 2017-06-30T15:10:43Z

Possible regression: coreos/rpm-ostree#855

jlebon · 2017-06-30T18:01:58Z

bubblewrap.c

 }

+#define CAP_TO_MASK_0(x) (1L << ((x) & 31))
+#define CAP_TO_MASK_1(x) CAP_TO_MASK_0(x - 32)


Nit: this looks like a dupe.

jlebon · 2017-06-30T18:31:59Z

bubblewrap.c

-drop_privs (void)
+drop_privs (bool keep_requested_caps)
 {
-  if (!is_privileged)


I haven't tested it, though I suspect that this will cause https://pagure.io/releng/issue/6550 to regress; systemd-nspawn blocks capset(), but with this check removed, we'll always be calling it. (Note that #122 originally worked around this by only setting caps in the setuid case).

This is also what's causing coreos/rpm-ostree#855; we're now dropping CAP_DAC_OVERRIDE, so when rpm-ostree runs %pre scripts that do e.g. groupadd, they get EPERM on /etc/gshadow.

jlebon · 2017-06-30T18:42:53Z

Given that this is a sensitive area, I'd rather let someone else address these issues. I'm not comfortable enough in the security semantics involved to confidently suggest a patch.

cgwalters · 2017-06-30T18:49:41Z

This PR discussion is huge; let's track this issue as #197

…lter

In <#101>, specifically commit cde7fab, we started dropping all capabilities, even if the caller was privileged. This broke rpm-ostree, which runs RPM scripts using bwrap, and some of those scripts depend on capabilities (mostly `CAP_DAC_OVERRIDE`).

Fix this by retaining capabilities by default if the caller's uid is zero.

I considered having the logic be to simply retain any capabilities the invoking process has (imagine filecaps binaries like `ping` or `/usr/bin/gnome-keyring-daemon` using bwrap), but we currently explicitly abort in that scenario to catch broken packages which used file capabilities for bwrap itself (we switched to suid). For now this works, and if down the line there's a real-world use case for capability-bearing non-zero-uid processes to invoke bwrap *and* retain those privileges, we can revisit.

Another twist here is that we need to do some gymnastics to first avoid calling `capset()` if we don't need to, as that can fail due to systemd installing a seccomp filter that denies it (for dubious reasons). Then we also need to ignore `EPERM` when dropping caps in the init process. (I considered unilaterally handling `EPERM`, but it seems nicer to avoid calling `capset()` unless we need to.)

Closes: #197
Closes: #205
Approved by: alexlarsson
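
The two decisions described in this commit message, skipping `capset()` when nothing would change and tolerating `EPERM` in one specific place, can be sketched as pure helpers. All names here (`struct ex_caps`, `caps_differ`, `capset_failure_is_fatal`) are hypothetical and do not reflect how bubblewrap actually structures this logic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical snapshot of a process's capability sets (two 32-bit
 * words per set). The struct has no padding, so memcmp is safe. */
struct ex_caps
{
  uint32_t effective[2];
  uint32_t permitted[2];
  uint32_t inheritable[2];
};

/* True only when the wanted sets differ from the current ones, so a
 * caller can skip capset() entirely when nothing would change. */
static bool
caps_differ (const struct ex_caps *current, const struct ex_caps *wanted)
{
  assert (current != NULL && wanted != NULL);
  return memcmp (current, wanted, sizeof *current) != 0;
}

/* Classifies a failed capset(): EPERM is tolerated only where the
 * caller opted in (the commit message names the init process). */
static bool
capset_failure_is_fatal (int saved_errno, bool ignore_eperm)
{
  return !(ignore_eperm && saved_errno == EPERM);
}
```

Checking before calling keeps the seccomp-filtered case from ever reaching the syscall, which matches the preference stated in the message for avoiding `capset()` over handling `EPERM` everywhere.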

In containers/bubblewrap#205, the issue that we were seeing with `bubblewrap` (containers/bubblewrap#101) was fixed. As such, we can thaw out `bubblewrap` and continue to track git master.

giuseppe force-pushed the privileged-systemd branch 2 times, most recently from ad41b72 to fbac2a4 Compare September 29, 2016 13:37

alexlarsson reviewed Oct 3, 2016

cgwalters reviewed Oct 3, 2016

giuseppe mentioned this pull request Oct 4, 2016

[RFC] run systemd in an unprivileged container systemd/systemd#4280

Merged

giuseppe force-pushed the privileged-systemd branch 4 times, most recently from 4509cbe to 6c61aa5 Compare October 7, 2016 11:38

giuseppe force-pushed the privileged-systemd branch 9 times, most recently from 640344a to 6ff4c83 Compare October 13, 2016 12:50

cgwalters mentioned this pull request Oct 13, 2016

privilege escalation via ptrace (CVE-2016-8659) #107

Closed

giuseppe force-pushed the privileged-systemd branch 4 times, most recently from 7b801f2 to 24f7b11 Compare October 18, 2016 08:57

giuseppe changed the title ~~[RFC] Privileged systemd~~ bwrap: capabilities support Oct 18, 2016

giuseppe force-pushed the privileged-systemd branch from fa03fe2 to 5b314da Compare June 29, 2017 17:29

tests: add tests for --cap-add

2c03392

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

giuseppe force-pushed the privileged-systemd branch from 5b314da to 2c03392 Compare June 29, 2017 17:48

rh-atomic-bot pushed a commit that referenced this pull request Jun 29, 2017

bubblewrap: add option --userns-block-fd

6724b41

It allows to configure the user namespace from outside. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson

rh-atomic-bot pushed a commit that referenced this pull request Jun 29, 2017

demos: add demo userns-block-fd.py

0bffcf1

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson

rh-atomic-bot pushed a commit that referenced this pull request Jun 29, 2017

bubblewrap.c: fix typo

e4cd0e2

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson

rh-atomic-bot pushed a commit that referenced this pull request Jun 29, 2017

tests: add tests for --cap-add

215aa3e

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson

rh-atomic-bot closed this in 71660f4 Jun 29, 2017

giuseppe mentioned this pull request Jun 30, 2017

Support new bwrap features projectatomic/bwrap-oci#4

Closed

cgwalters mentioned this pull request Jun 30, 2017

rpm-ostree install failing to apply override for httpd coreos/rpm-ostree#855

Closed

jlebon reviewed Jun 30, 2017

cgwalters mentioned this pull request Jun 30, 2017

regressions from https://github.com/projectatomic/bubblewrap/pull/101 #197

Closed

cgwalters mentioned this pull request Aug 8, 2017

Retain all caps when invoked by uid 0 #205

Closed

miabbott mentioned this pull request Aug 14, 2017

overlay: Thaw bubblewrap 🔥 CentOS/sig-atomic-buildscripts#289

Merged

cgwalters mentioned this pull request Apr 23, 2018

Question: Keeping specific capabilities #249

Closed
