Skip to content

bwrap: capabilities support#101

Closed
giuseppe wants to merge 6 commits intocontainers:masterfrom
giuseppe:privileged-systemd
Closed

bwrap: capabilities support#101
giuseppe wants to merge 6 commits intocontainers:masterfrom
giuseppe:privileged-systemd

Conversation

@giuseppe
Copy link
Member

a few patches that modify how bubblewrap works in privileged mode to allow execution of systemd:

https://asciinema.org/a/7gex3vdcumd1n72i3knstra2b

@giuseppe giuseppe force-pushed the privileged-systemd branch 2 times, most recently from ad41b72 to fbac2a4 Compare September 29, 2016 13:37
@alexlarsson
Copy link
Collaborator

Allowing a fresh mount of sysfs is a potential security leak. Sysadmins and containment system generally cover parts of /sys in order to not allow the unprivileged user access to them. Mounting a new instance will override this.

Can't we use a bind mount here instead?

bubblewrap.c Outdated
data[1].effective = REQUIRED_CAPS_1;
data[1].permitted = REQUIRED_CAPS_1;
data[1].inheritable = 0;
data[0].effective = caps0;
Copy link
Collaborator

@alexlarsson alexlarsson Oct 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to set permitted?

bubblewrap.c Outdated
/* Drop root uid, but retain the required permitted caps */
if (setuid (getuid ()) < 0)
die_with_error ("unable to drop privs");
did_setuid = TRUE;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach means the file-caps based version of bwrap doesn't work. Although honestly, thats kinda usless vs setuid as it needs cap_sysadmin anyway.

bubblewrap.c Outdated
static uint32_t requested_caps[2] = {BASE_CAPS0, BASE_CAPS1};

/* low 32bit caps needed */
#define REQUIRED_SETUP_CAPS_0 (BASE_CAPS0 | CAP_TO_MASK (CAP_NET_ADMIN) | CAP_TO_MASK (CAP_SYS_ADMIN))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You dropped CHROOT, SETUID and SETGID from REQUIRED_CAPS. These are needed for chroot() and user namespaces to work in the setup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are coming from BASE_CAPS0

bubblewrap.c Outdated
| CAP_TO_MASK (CAP_DAC_OVERRIDE) | CAP_TO_MASK(CAP_SETFCAP) | CAP_TO_MASK(CAP_SETPCAP) \
| CAP_TO_MASK (CAP_SETGID) | CAP_TO_MASK (CAP_SETUID) | CAP_TO_MASK (CAP_MKNOD) | CAP_TO_MASK (CAP_CHOWN) \
| CAP_TO_MASK (CAP_FOWNER) | CAP_TO_MASK (CAP_FSETID) | CAP_TO_MASK (CAP_KILL) \
| CAP_TO_MASK (CAP_SYS_CHROOT))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding CAP_DAC_OVERRIDE here is vastly increasing the risk of having bubblewrap setuid. The whole reason we're doing this dance where we're only saving the caps we need for setting up the sandbox but running as the final user is that we can then be reasonably sure that the things that end up in the final sandbox is what the original user can access.

If the setup is run with DAC_OVERRIDE, then any source path you specifiy will be read with root privileges, which will let regular users access root-only files.


if (prctl (PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
die ("could not set no new privs");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this? We already do this in main()

bubblewrap.c Outdated
if (opt_unshare_user)
{
caps0 |= CAP_TO_MASK (CAP_CHOWN) | CAP_TO_MASK(CAP_SETUID) | CAP_TO_MASK (CAP_SETGID) | CAP_TO_MASK (CAP_FOWNER) | CAP_TO_MASK (CAP_DAC_OVERRIDE);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why you want the sandbox to have these capabilities by default just because some namespace is enabled. For instance, DAC_OVERRIDE means that a file that is read-only to you is still readable, no? That seems very unexpected.
Overall, this seems quite risky.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the rewiew. Yes, this patch series is risky (and some parts too hacky), so I wanted to share my work in progress so to get comments before I spend more time on it.

IIUIC, even with DAC_OVERRIDE, the container should not be able to access files owned by users that have no mapping in the container user namespace. The only users that get a mapping in the container namespace are the uid of the user using bwrap and the range 10000-10999.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about that? Anyway, you still automatically grant the process properties that it wouldn't have if it were run with that capabilities. For instance if the user is "1000" it can now write to a file owned by user 1000 which is not marked with "u+w" permissions.

bubblewrap.c Outdated
}
line3 = xasprintf ("%d %d 1\n", sandbox_id, parent_id);
return strconcat4 (line1, line2, line3, line4);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks any use of non-privileged user namespaces use, because that only allows a single mapping.

@alexlarsson
Copy link
Collaborator

Overall, this patch series seems very risky to me. It would need very thorough review to make sure it doesn't make it possible to gain increased privileges for non-root users.

bubblewrap.c Outdated

if (is_privileged)
{
uint32_t caps0 = REQUIRED_SETUP_CAPS_0 | requested_caps[0];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this a huge security hole if bubblewrap is installed setuid? I think we should be verifying whether getuid() == 0 or so.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_privileged means "we're not using user namespaces" in the code, not that we're initially launched by root.
The reason its doing this check instead of comparing effective uid is to also support filesystem caps instead of setuid.

@giuseppe giuseppe force-pushed the privileged-systemd branch 4 times, most recently from 4509cbe to 6c61aa5 Compare October 7, 2016 11:38
@giuseppe giuseppe force-pushed the privileged-systemd branch 9 times, most recently from 640344a to 6ff4c83 Compare October 13, 2016 12:50
@cgwalters
Copy link
Collaborator

You could probably break out the "--no-reaper" as a separate PR, it looks safe to merge now, separately from higher risk changes.

@giuseppe giuseppe force-pushed the privileged-systemd branch 4 times, most recently from 7b801f2 to 24f7b11 Compare October 18, 2016 08:57
@giuseppe giuseppe changed the title [RFC] Privileged systemd bwrap: capabilities support Oct 18, 2016
@giuseppe
Copy link
Member Author

I have done some simplifications:

  1. the root user is not treated differently than other users.
  2. --cap-add supports only a subset of caps, which are safe to use in the sandbox.
  3. by default, do not leave any cap in the process.
  4. fix the FILECAPS mode.
  5. now --sys uses a bind mount to /sys.
  6. The additional uids/gids to use in the userns are read from /etc/subuid and /etc/subgid, the same conf files are used by newgidmap and newuidmap.

I had a discussion with Eric Biederman about enabling setgroups, he said that it is safe to enable setgroups when there is a mapping for the user in /etc/subuid, as there shouldn't be any mapping on a system where negative groups privileges are used (w.r.t CVE-2014-8989). I have not done this in the patch set yet, and I preferred to leave it as it is for now.
My suggestion to address this issue was to add a third mode "shadow" to /proc/self/setgroups so that the groups that were present on the userns creation could not be dropped. This is more invasive as it requires changes in the kernel: https://github.com/giuseppe/linux/tree/proc-setgroups-shadow.
Not a big issue anyway, as the systemd changes got merged and now it treats setgroups (0, NULL) as a warning.

I've also dropped CAP_AUDIT_*, as systemd will detect when these caps are not available and not error out.

@giuseppe giuseppe force-pushed the privileged-systemd branch from fa03fe2 to 5b314da Compare June 29, 2017 17:29
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe giuseppe force-pushed the privileged-systemd branch from 5b314da to 2c03392 Compare June 29, 2017 17:48
@alexlarsson
Copy link
Collaborator

This looks good to go to me. However, its waiting on some test, but I don't really understand which one. @cgwalters do you understand that?

@giuseppe
Copy link
Member Author

that is weird, it seems both tests passed. Anyway, I think that once you will r+ Homu will repeat the tests before merging.

The previous failure was related to the fact I was using --unshare-user instead of --unshare-user-try

@alexlarsson
Copy link
Collaborator

@rh-atomic-bot r+ 2c03392

@rh-atomic-bot
Copy link

⌛ Testing commit 2c03392 with merge 215aa3e...

rh-atomic-bot pushed a commit that referenced this pull request Jun 29, 2017
It allows to configure the user namespace from outside.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #101
Approved by: alexlarsson
rh-atomic-bot pushed a commit that referenced this pull request Jun 29, 2017
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #101
Approved by: alexlarsson
rh-atomic-bot pushed a commit that referenced this pull request Jun 29, 2017
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #101
Approved by: alexlarsson
rh-atomic-bot pushed a commit that referenced this pull request Jun 29, 2017
When --unshare-user is used in the unprivileged case, all caps are
left to the sandboxed application.  Change it to leave only the
specified ones.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #101
Approved by: alexlarsson
rh-atomic-bot pushed a commit that referenced this pull request Jun 29, 2017
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #101
Approved by: alexlarsson
@rh-atomic-bot
Copy link

☀️ Test successful - status-redhatci
Approved by: alexlarsson
Pushing 215aa3e to master...

@cgwalters
Copy link
Collaborator

Possible regression: coreos/rpm-ostree#855

}

#define CAP_TO_MASK_0(x) (1L << ((x) & 31))
#define CAP_TO_MASK_1(x) CAP_TO_MASK_0(x - 32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIt: this looks like a dupe.

drop_privs (void)
drop_privs (bool keep_requested_caps)
{
if (!is_privileged)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't tested it, though I suspect that this will cause https://pagure.io/releng/issue/6550 to regress; systemd-nspawn blocks capset(), but with this check removed, we'll always be calling it. (Note that #122 originally worked around this by only setting caps in the setuid case).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also what's causing coreos/rpm-ostree#855; we're now dropping CAP_DAC_OVERRIDE, so when rpm-ostree runs %pre scripts that do e.g. groupadd, they get EPERM on /etc/gshadow.

@jlebon
Copy link
Contributor

jlebon commented Jun 30, 2017

Given that this is a sensitive area, I'd rather let someone else address these issues. I'm not comfortable enough in the security semantics involved to confidently suggest a patch.

@cgwalters
Copy link
Collaborator

This PR discussion is huge; let's track this issue as #197

cgwalters added a commit to cgwalters/bubblewrap that referenced this pull request Aug 8, 2017
In <containers#101>, specifically
commit cde7fab we started dropping
all capabilities, even if the caller was privileged.

This broke rpm-ostree, which runs RPM scripts using bwrap, and some
of those scripts depend on capabilities (mostly `CAP_DAC_OVERRIDE`).

Fix this by retaining capabilities by default if the caller's uid is zero.

I considered having the logic be to simply retain any capabilities the invoking
process has (imagine filecaps binaries like `ping` or
`/usr/bin/gnome-keyring-daemon` using bwrap) but we currently explicitly abort
in that scenario to catch broken packages which used file capabilites for bwrap
itself (we switched to suid). For now this works, and if down the line there's a
real-world use case for capability-bearing non-zero-uid processes to invoke
bwrap *and* retain those privileges, we can revisit.

Closes: containers#197
cgwalters added a commit to cgwalters/bubblewrap that referenced this pull request Aug 9, 2017
…lter

In <containers#101>, specifically
commit cde7fab we started dropping
all capabilities, even if the caller was privileged.

This broke rpm-ostree, which runs RPM scripts using bwrap, and some
of those scripts depend on capabilities (mostly `CAP_DAC_OVERRIDE`).

Fix this by retaining capabilities by default if the caller's uid is zero.

I considered having the logic be to simply retain any capabilities the invoking
process has (imagine filecaps binaries like `ping` or
`/usr/bin/gnome-keyring-daemon` using bwrap) but we currently explicitly abort
in that scenario to catch broken packages which used file capabilites for bwrap
itself (we switched to suid). For now this works, and if down the line there's a
real-world use case for capability-bearing non-zero-uid processes to invoke
bwrap *and* retain those privileges, we can revisit.

Another twist here is that we need to do some gymnastics to first avoid calling
`capset()` if we don't need to, as that can fail due to systemd installing a
seccomp filter that denies it (for dubious reasons).  Then we also need to ignore
`EPERM` when dropping caps in the init process.  (I considered unilaterally
handling `EPERM`, but it seems nicer to avoid calling `capset()` unless we need to)

Closes: containers#197
rh-atomic-bot pushed a commit that referenced this pull request Aug 14, 2017
…lter

In <#101>, specifically
commit cde7fab we started dropping
all capabilities, even if the caller was privileged.

This broke rpm-ostree, which runs RPM scripts using bwrap, and some
of those scripts depend on capabilities (mostly `CAP_DAC_OVERRIDE`).

Fix this by retaining capabilities by default if the caller's uid is zero.

I considered having the logic be to simply retain any capabilities the invoking
process has (imagine filecaps binaries like `ping` or
`/usr/bin/gnome-keyring-daemon` using bwrap) but we currently explicitly abort
in that scenario to catch broken packages which used file capabilites for bwrap
itself (we switched to suid). For now this works, and if down the line there's a
real-world use case for capability-bearing non-zero-uid processes to invoke
bwrap *and* retain those privileges, we can revisit.

Another twist here is that we need to do some gymnastics to first avoid calling
`capset()` if we don't need to, as that can fail due to systemd installing a
seccomp filter that denies it (for dubious reasons).  Then we also need to ignore
`EPERM` when dropping caps in the init process.  (I considered unilaterally
handling `EPERM`, but it seems nicer to avoid calling `capset()` unless we need to)

Closes: #197

Closes: #205
Approved by: alexlarsson
miabbott added a commit to miabbott/sig-atomic-buildscripts that referenced this pull request Aug 14, 2017
In containers/bubblewrap#205, the issue that we were seeing with
`bubblewrap` (containers/bubblewrap#101) was fixed.  As such, we
can thaw out `bubblewrap` and continue to track git master.
cgwalters pushed a commit to CentOS/sig-atomic-buildscripts that referenced this pull request Aug 14, 2017
In containers/bubblewrap#205, the issue that we were seeing with
`bubblewrap` (containers/bubblewrap#101) was fixed.  As such, we
can thaw out `bubblewrap` and continue to track git master.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants