New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bwrap: capabilities support #101

Closed
wants to merge 6 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@giuseppe
Member

giuseppe commented Sep 23, 2016

a few patches that modify how bubblewrap works in privileged mode to allow execution of systemd:

https://asciinema.org/a/7gex3vdcumd1n72i3knstra2b

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 3, 2016

Member

Allowing a fresh mount of sysfs is a potential security leak. Sysadmins and containment system generally cover parts of /sys in order to not allow the unprivileged user access to them. Mounting a new instance will override this.

Can't we use a bind mount here instead?

Member

alexlarsson commented Oct 3, 2016

Allowing a fresh mount of sysfs is a potential security leak. Sysadmins and containment system generally cover parts of /sys in order to not allow the unprivileged user access to them. Mounting a new instance will override this.

Can't we use a bind mount here instead?

Show outdated Hide outdated bubblewrap.c
data[1].effective = REQUIRED_CAPS_1;
data[1].permitted = REQUIRED_CAPS_1;
data[1].inheritable = 0;
data[0].effective = caps0;

This comment has been minimized.

@alexlarsson

alexlarsson Oct 3, 2016

Member

Don't you need to set permitted?

@alexlarsson

alexlarsson Oct 3, 2016

Member

Don't you need to set permitted?

Show outdated Hide outdated bubblewrap.c
@@ -408,17 +421,16 @@ acquire_caps (void)
/* Drop root uid, but retain the required permitted caps */
if (setuid (getuid ()) < 0)
die_with_error ("unable to drop privs");
did_setuid = TRUE;

This comment has been minimized.

@alexlarsson

alexlarsson Oct 3, 2016

Member

This approach means the file-caps based version of bwrap doesn't work. Although honestly, thats kinda usless vs setuid as it needs cap_sysadmin anyway.

@alexlarsson

alexlarsson Oct 3, 2016

Member

This approach means the file-caps based version of bwrap doesn't work. Although honestly, thats kinda usless vs setuid as it needs cap_sysadmin anyway.

Show outdated Hide outdated bubblewrap.c
static uint32_t requested_caps[2] = {BASE_CAPS0, BASE_CAPS1};
/* low 32bit caps needed */
#define REQUIRED_SETUP_CAPS_0 (BASE_CAPS0 | CAP_TO_MASK (CAP_NET_ADMIN) | CAP_TO_MASK (CAP_SYS_ADMIN))

This comment has been minimized.

@alexlarsson

alexlarsson Oct 3, 2016

Member

You dropped CHROOT, SETUID and SETGID from REQUIRED_CAPS. These are needed for chroot() and user namespaces to work in the setup.

@alexlarsson

alexlarsson Oct 3, 2016

Member

You dropped CHROOT, SETUID and SETGID from REQUIRED_CAPS. These are needed for chroot() and user namespaces to work in the setup.

This comment has been minimized.

@giuseppe

giuseppe Oct 3, 2016

Member

they are coming from BASE_CAPS0

@giuseppe

giuseppe Oct 3, 2016

Member

they are coming from BASE_CAPS0

Show outdated Hide outdated bubblewrap.c
| CAP_TO_MASK (CAP_DAC_OVERRIDE) | CAP_TO_MASK(CAP_SETFCAP) | CAP_TO_MASK(CAP_SETPCAP) \
| CAP_TO_MASK (CAP_SETGID) | CAP_TO_MASK (CAP_SETUID) | CAP_TO_MASK (CAP_MKNOD) | CAP_TO_MASK (CAP_CHOWN) \
| CAP_TO_MASK (CAP_FOWNER) | CAP_TO_MASK (CAP_FSETID) | CAP_TO_MASK (CAP_KILL) \
| CAP_TO_MASK (CAP_SYS_CHROOT))

This comment has been minimized.

@alexlarsson

alexlarsson Oct 3, 2016

Member

Adding CAP_DAC_OVERRIDE here is vastly increasing the risk of having bubblewrap setuid. The whole reason we're doing this dance where we're only saving the caps we need for setting up the sandbox but running as the final user is that we can then be reasonably sure that the things that end up in the final sandbox is what the original user can access.

If the setup is run with DAC_OVERRIDE, then any source path you specifiy will be read with root privileges, which will let regular users access root-only files.

@alexlarsson

alexlarsson Oct 3, 2016

Member

Adding CAP_DAC_OVERRIDE here is vastly increasing the risk of having bubblewrap setuid. The whole reason we're doing this dance where we're only saving the caps we need for setting up the sandbox but running as the final user is that we can then be reasonably sure that the things that end up in the final sandbox is what the original user can access.

If the setup is run with DAC_OVERRIDE, then any source path you specifiy will be read with root privileges, which will let regular users access root-only files.

Show outdated Hide outdated bubblewrap.c
if (capset (&hdr, data) < 0)
die_with_error ("capset failed");
if (prctl (PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0)
die ("could not set no new privs");
}

This comment has been minimized.

@alexlarsson

alexlarsson Oct 3, 2016

Member

Why this? We already do this in main()

@alexlarsson

alexlarsson Oct 3, 2016

Member

Why this? We already do this in main()

Show outdated Hide outdated bubblewrap.c
if (opt_unshare_user)
{
caps0 |= CAP_TO_MASK (CAP_CHOWN) | CAP_TO_MASK(CAP_SETUID) | CAP_TO_MASK (CAP_SETGID) | CAP_TO_MASK (CAP_FOWNER) | CAP_TO_MASK (CAP_DAC_OVERRIDE);
}

This comment has been minimized.

@alexlarsson

alexlarsson Oct 3, 2016

Member

I don't understand why you want the sandbox to have these capabilities by default just because some namespace is enabled. For instance, DAC_OVERRIDE means that a file that is read-only to you is still readable, no? That seems very unexpected.
Overall, this seems quite risky.

@alexlarsson

alexlarsson Oct 3, 2016

Member

I don't understand why you want the sandbox to have these capabilities by default just because some namespace is enabled. For instance, DAC_OVERRIDE means that a file that is read-only to you is still readable, no? That seems very unexpected.
Overall, this seems quite risky.

This comment has been minimized.

@giuseppe

giuseppe Oct 3, 2016

Member

thanks for the rewiew. Yes, this patch series is risky (and some parts too hacky), so I wanted to share my work in progress so to get comments before I spend more time on it.

IIUIC, even with DAC_OVERRIDE, the container should not be able to access files owned by users that have no mapping in the container user namespace. The only users that get a mapping in the container namespace are the uid of the user using bwrap and the range 10000-10999.

@giuseppe

giuseppe Oct 3, 2016

Member

thanks for the rewiew. Yes, this patch series is risky (and some parts too hacky), so I wanted to share my work in progress so to get comments before I spend more time on it.

IIUIC, even with DAC_OVERRIDE, the container should not be able to access files owned by users that have no mapping in the container user namespace. The only users that get a mapping in the container namespace are the uid of the user using bwrap and the range 10000-10999.

This comment has been minimized.

@alexlarsson

alexlarsson Oct 3, 2016

Member

Are you sure about that? Anyway, you still automatically grant the process properties that it wouldn't have if it were run with that capabilities. For instance if the user is "1000" it can now write to a file owned by user 1000 which is not marked with "u+w" permissions.

@alexlarsson

alexlarsson Oct 3, 2016

Member

Are you sure about that? Anyway, you still automatically grant the process properties that it wouldn't have if it were run with that capabilities. For instance if the user is "1000" it can now write to a file owned by user 1000 which is not marked with "u+w" permissions.

Show outdated Hide outdated bubblewrap.c
}
line3 = xasprintf ("%d %d 1\n", sandbox_id, parent_id);
return strconcat4 (line1, line2, line3, line4);
}

This comment has been minimized.

@alexlarsson

alexlarsson Oct 3, 2016

Member

This breaks any use of non-privileged user namespaces use, because that only allows a single mapping.

@alexlarsson

alexlarsson Oct 3, 2016

Member

This breaks any use of non-privileged user namespaces use, because that only allows a single mapping.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 3, 2016

Member

Overall, this patch series seems very risky to me. It would need very thorough review to make sure it doesn't make it possible to gain increased privileges for non-root users.

Member

alexlarsson commented Oct 3, 2016

Overall, this patch series seems very risky to me. It would need very thorough review to make sure it doesn't make it possible to gain increased privileges for non-root users.

Show outdated Hide outdated bubblewrap.c
}
if (is_privileged)
{
uint32_t caps0 = REQUIRED_SETUP_CAPS_0 | requested_caps[0];

This comment has been minimized.

@cgwalters

cgwalters Oct 3, 2016

Member

Isn't this a huge security hole if bubblewrap is installed setuid? I think we should be verifying whether getuid() == 0 or so.

@cgwalters

cgwalters Oct 3, 2016

Member

Isn't this a huge security hole if bubblewrap is installed setuid? I think we should be verifying whether getuid() == 0 or so.

This comment has been minimized.

@alexlarsson

alexlarsson Oct 3, 2016

Member

is_privileged means "we're not using user namespaces" in the code, not that we're initially launched by root.
The reason its doing this check instead of comparing effective uid is to also support filesystem caps instead of setuid.

@alexlarsson

alexlarsson Oct 3, 2016

Member

is_privileged means "we're not using user namespaces" in the code, not that we're initially launched by root.
The reason its doing this check instead of comparing effective uid is to also support filesystem caps instead of setuid.

@cgwalters

This comment has been minimized.

Show comment
Hide comment
@cgwalters

cgwalters Oct 13, 2016

Member

You could probably break out the "--no-reaper" as a separate PR, it looks safe to merge now, separately from higher risk changes.

Member

cgwalters commented Oct 13, 2016

You could probably break out the "--no-reaper" as a separate PR, it looks safe to merge now, separately from higher risk changes.

@giuseppe giuseppe changed the title from [RFC] Privileged systemd to bwrap: capabilities support Oct 18, 2016

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 18, 2016

Member

I have done some simplifications:

  1. the root user is not treated differently than other users.
  2. --cap-add supports only a subset of caps, which are safe to use in the sandbox.
  3. by default, do not leave any cap in the process.
  4. fix the FILECAPS mode.
  5. now --sys uses a bind mount to /sys.
  6. The additional uids/gids to use in the userns are read from /etc/subuid and /etc/subgid, the same conf files are used by newgidmap and newuidmap.

I had a discussion with Eric Biederman about enabling setgroups, he said that it is safe to enable setgroups when there is a mapping for the user in /etc/subuid, as there shouldn't be any mapping on a system where negative groups privileges are used (w.r.t CVE-2014-8989). I have not done this in the patch set yet, and I preferred to leave it as it is for now.
My suggestion to address this issue was to add a third mode "shadow" to /proc/self/setgroups so that the groups that were present on the userns creation could not be dropped. This is more invasive as it requires changes in the kernel: https://github.com/giuseppe/linux/tree/proc-setgroups-shadow.
Not a big issue anyway, as the systemd changes got merged and now it treats setgroups (0, NULL) as a warning.

I've also dropped CAP_AUDIT_*, as systemd will detect when these caps are not available and not error out.

Member

giuseppe commented Oct 18, 2016

I have done some simplifications:

  1. the root user is not treated differently than other users.
  2. --cap-add supports only a subset of caps, which are safe to use in the sandbox.
  3. by default, do not leave any cap in the process.
  4. fix the FILECAPS mode.
  5. now --sys uses a bind mount to /sys.
  6. The additional uids/gids to use in the userns are read from /etc/subuid and /etc/subgid, the same conf files are used by newgidmap and newuidmap.

I had a discussion with Eric Biederman about enabling setgroups, he said that it is safe to enable setgroups when there is a mapping for the user in /etc/subuid, as there shouldn't be any mapping on a system where negative groups privileges are used (w.r.t CVE-2014-8989). I have not done this in the patch set yet, and I preferred to leave it as it is for now.
My suggestion to address this issue was to add a third mode "shadow" to /proc/self/setgroups so that the groups that were present on the userns creation could not be dropped. This is more invasive as it requires changes in the kernel: https://github.com/giuseppe/linux/tree/proc-setgroups-shadow.
Not a big issue anyway, as the systemd changes got merged and now it treats setgroups (0, NULL) as a warning.

I've also dropped CAP_AUDIT_*, as systemd will detect when these caps are not available and not error out.

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 25, 2016

Member

Is there anything left before we can start reviewing this PR?

Member

giuseppe commented Oct 25, 2016

Is there anything left before we can start reviewing this PR?

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 25, 2016

Member

Can you give a highlevel overview of the security aspects of this? I mean, what are you exposing and how is it safe to give that to unprivileged users?

For instance it seems to me that this relies on user namespaces to contain things, and given the discussion in #107 it seems like if the goal is to give out "a safe subset of user-namespace-like functionallity" then we can't actually use user namespaces at all in the setuid version of bwrap (due to the ptrace issue).

Member

alexlarsson commented Oct 25, 2016

Can you give a highlevel overview of the security aspects of this? I mean, what are you exposing and how is it safe to give that to unprivileged users?

For instance it seems to me that this relies on user namespaces to contain things, and given the discussion in #107 it seems like if the goal is to give out "a safe subset of user-namespace-like functionallity" then we can't actually use user namespaces at all in the setuid version of bwrap (due to the ptrace issue).

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 26, 2016

Member

there are basically two big changes introduced by this series:

  1. we don't map only a single user in the new user namespace, but a range of them. This is needed so that systemd will be able to use users with uid < 1000 in the new user namespace. Those are mapped to some high values in the parent namespace. If the file /etc/subuids (the same file used by newuidmap) is present, then the range specified there is used.

  2. we leave some caps in the sandbox, user namespaces restricts what users are available so that caps like CAP_SETGID, CAP_SETUID (and all of those which are in the ALLOWED_CAPS0_UNSHARE_USER mask) allows to use only uids/gids that have a mapping in the new namespace. In this namespace only the parent uid/gid + a range of high uids/gids have a mapping, so making not possible to switch to some other user in the parent namespace.

With this new patchset, bubblewrap needs to keep CAP_DAC_OVERRIDE as that can be required in the user namespace, this makes possible to drop completely the need of PR_SET_DUMPABLE, I have added a patch that does it.

Member

giuseppe commented Oct 26, 2016

there are basically two big changes introduced by this series:

  1. we don't map only a single user in the new user namespace, but a range of them. This is needed so that systemd will be able to use users with uid < 1000 in the new user namespace. Those are mapped to some high values in the parent namespace. If the file /etc/subuids (the same file used by newuidmap) is present, then the range specified there is used.

  2. we leave some caps in the sandbox, user namespaces restricts what users are available so that caps like CAP_SETGID, CAP_SETUID (and all of those which are in the ALLOWED_CAPS0_UNSHARE_USER mask) allows to use only uids/gids that have a mapping in the new namespace. In this namespace only the parent uid/gid + a range of high uids/gids have a mapping, so making not possible to switch to some other user in the parent namespace.

With this new patchset, bubblewrap needs to keep CAP_DAC_OVERRIDE as that can be required in the user namespace, this makes possible to drop completely the need of PR_SET_DUMPABLE, I have added a patch that does it.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

If you have CAP_DAC_OVERRIDE you can use bubblewrap to overwrite /etc/passwd with something like:

--bind / / --file some-fd /etc/passwd

Doesn't seem very safe...

Member

alexlarsson commented Oct 26, 2016

If you have CAP_DAC_OVERRIDE you can use bubblewrap to overwrite /etc/passwd with something like:

--bind / / --file some-fd /etc/passwd

Doesn't seem very safe...

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 26, 2016

Member

we drop all the caps in the unprivileged setup process, so that should not happen no?

I get this:

$ echo hi | bwrap --unshare-user --unshare-pid --unshare-net --cap-add ALL --bind / / --file 0 /etc/passwd echo done
Can't create file /etc/passwd: Permission denied
Invalid size 0 from unprivileged helper

Member

giuseppe commented Oct 26, 2016

we drop all the caps in the unprivileged setup process, so that should not happen no?

I get this:

$ echo hi | bwrap --unshare-user --unshare-pid --unshare-net --cap-add ALL --bind / / --file 0 /etc/passwd echo done
Can't create file /etc/passwd: Permission denied
Invalid size 0 from unprivileged helper

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

Ah, that is true, I think its still unsafe though due to the ptrace hole discussed in the above link.
As soon as you clone NEW_USER the child is ptraceable by users in the parent namespace. If this process has DAC_OVERRIDE, can't you then make it do whatever it wants?

Honestly, i'm not sure what the sematics of DAC_OVERRIDE is in a user namespace...

Member

alexlarsson commented Oct 26, 2016

Ah, that is true, I think its still unsafe though due to the ptrace hole discussed in the above link.
As soon as you clone NEW_USER the child is ptraceable by users in the parent namespace. If this process has DAC_OVERRIDE, can't you then make it do whatever it wants?

Honestly, i'm not sure what the sematics of DAC_OVERRIDE is in a user namespace...

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 26, 2016

Member

the only risk I see happens if ptrace could be exploited before the uid_map/gid_map are in place.
I didn't expect that it would be possible to ptrace a process in a new user namespace even before the uid_map is set.

To avoid the problem altogether, I've added another patch that changes the uid of the process back to 0 until the uid_map file is written, basically locking ptrace from non root users until the user mapping is in place.

Once that is done, it is not possible to write a new map file and all the caps are restricted to the users defined in the user namespace, so that won't be an issue with DAC_OVERRIDE as well. DAC_OVERRIDE is restricted to the users in the namespace. If an user has no mapping in the new user namespace, then it will fail with EPERM.

Member

giuseppe commented Oct 26, 2016

the only risk I see happens if ptrace could be exploited before the uid_map/gid_map are in place.
I didn't expect that it would be possible to ptrace a process in a new user namespace even before the uid_map is set.

To avoid the problem altogether, I've added another patch that changes the uid of the process back to 0 until the uid_map file is written, basically locking ptrace from non root users until the user mapping is in place.

Once that is done, it is not possible to write a new map file and all the caps are restricted to the users defined in the user namespace, so that won't be an issue with DAC_OVERRIDE as well. DAC_OVERRIDE is restricted to the users in the namespace. If an user has no mapping in the new user namespace, then it will fail with EPERM.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

There is still a fundamental question about what bubblewrap provides here that we need to figure out. Lots of distros (including rhel7) disable user namespaces or make it only accessible by root, because the attack surface for user namespaces is so large (see e.g. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-3135).

The story so far is that we expose only a subset of the user namespace operations, which we deem safe (mainly bind mounts, etc), so it is safe to allow this setuid binary to run for everyone, even though everyone is not allowed use of user namespace.

However, due to the ptrace issue we're not really any more secure than a general user namespace, because bwrap can be attacked via ptrace to allow arbitrary code execution in the privileged-in-user-ns part of the setup, and bwrap could thus be used to attack e.g. the CVE above. This means that a setuid bwrap is essentially neutering the distro decision to limit user namespaces to root, which I think is a bad idea.

So, the question here is if we can use user namespaces at all...

Member

alexlarsson commented Oct 26, 2016

There is still a fundamental question about what bubblewrap provides here that we need to figure out. Lots of distros (including rhel7) disable user namespaces or make it only accessible by root, because the attack surface for user namespaces is so large (see e.g. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-3135).

The story so far is that we expose only a subset of the user namespace operations, which we deem safe (mainly bind mounts, etc), so it is safe to allow this setuid binary to run for everyone, even though everyone is not allowed use of user namespace.

However, due to the ptrace issue we're not really any more secure than a general user namespace, because bwrap can be attacked via ptrace to allow arbitrary code execution in the privileged-in-user-ns part of the setup, and bwrap could thus be used to attack e.g. the CVE above. This means that a setuid bwrap is essentially neutering the distro decision to limit user namespaces to root, which I think is a bad idea.

So, the question here is if we can use user namespaces at all...

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

(And obviously, the present patch fully depends on user namespaces)

Member

alexlarsson commented Oct 26, 2016

(And obviously, the present patch fully depends on user namespaces)

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

@giuseppe Also, i'm not sure why having a uid of 0 disallows ptrace? The parent-user-namespace check seems to override everything to always allow ptrace.

Member

alexlarsson commented Oct 26, 2016

@giuseppe Also, i'm not sure why having a uid of 0 disallows ptrace? The parent-user-namespace check seems to override everything to always allow ptrace.

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 26, 2016

Member

@alexlarsson I had a look in the kernel to understand how the access is granted, the relevant code seems to be in __ptrace_may_access (ptrace.c):

    tcred = __task_cred(task);
    if (uid_eq(caller_uid, tcred->euid) &&
        uid_eq(caller_uid, tcred->suid) &&
        uid_eq(caller_uid, tcred->uid)  &&
        gid_eq(caller_gid, tcred->egid) &&
        gid_eq(caller_gid, tcred->sgid) &&
        gid_eq(caller_gid, tcred->gid))
        goto ok;
    if (ptrace_has_cap(tcred->user_ns, mode))
        goto ok;
    rcu_read_unlock();
    return -EPERM;

if I understand it correctly, the access is granted either if the process has CAP_SYS_PTRACE, or it has the same credentials as the target process.

I've tried adding a sleep between the raw_clone and in facts I could ptrace the container without the setuid trick. With the latest patch, I get an error when I try to ptrace the process.

Member

giuseppe commented Oct 26, 2016

@alexlarsson I had a look in the kernel to understand how the access is granted, the relevant code seems to be in __ptrace_may_access (ptrace.c):

    tcred = __task_cred(task);
    if (uid_eq(caller_uid, tcred->euid) &&
        uid_eq(caller_uid, tcred->suid) &&
        uid_eq(caller_uid, tcred->uid)  &&
        gid_eq(caller_gid, tcred->egid) &&
        gid_eq(caller_gid, tcred->sgid) &&
        gid_eq(caller_gid, tcred->gid))
        goto ok;
    if (ptrace_has_cap(tcred->user_ns, mode))
        goto ok;
    rcu_read_unlock();
    return -EPERM;

if I understand it correctly, the access is granted either if the process has CAP_SYS_PTRACE, or it has the same credentials as the target process.

I've tried adding a sleep between the raw_clone and in facts I could ptrace the container without the setuid trick. With the latest patch, I get an error when I try to ptrace the process.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

Oh, i see, the override with user namespaces is below, after the ok::

    if (dumpable != SUID_DUMP_USER &&
        !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
        rcu_read_unlock();
        return -EPERM;
    }

So, by running the privileged part as the real uid 0 we can avoid the ptrace issue, but then we risk running into filesystem access problems, because if we can somehow trick the privileged part into doing file i/o we will be doing the file lookups as root, not the user. The main reason I chose to run as the user was to guarantee that all file i/o we did (say --bind /etc/shadow /my/file) would have the permissions of the user.

However, we can probably avoid this issue by using fsuid == real user. Then we'll get file dac checks as per the actual user.

Member

alexlarsson commented Oct 26, 2016

Oh, i see, the override with user namespaces is below, after the ok::

    if (dumpable != SUID_DUMP_USER &&
        !ptrace_has_cap(__task_cred(task)->user_ns, mode)) {
        rcu_read_unlock();
        return -EPERM;
    }

So, by running the privileged part as the real uid 0 we can avoid the ptrace issue, but then we risk running into filesystem access problems, because if we can somehow trick the privileged part into doing file i/o we will be doing the file lookups as root, not the user. The main reason I chose to run as the user was to guarantee that all file i/o we did (say --bind /etc/shadow /my/file) would have the permissions of the user.

However, we can probably avoid this issue by using fsuid == real user. Then we'll get file dac checks as per the actual user.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

i.e. I want the above --bind /etc/shadow /my/file to fail with EPERM on sandbox startup, not later from accessing /my/file.

Member

alexlarsson commented Oct 26, 2016

i.e. I want the above --bind /etc/shadow /my/file to fail with EPERM on sandbox startup, not later from accessing /my/file.

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 26, 2016

Member

I get the same behaviour with the current bubblewrap:

$ bwrap --bind / / --tmpfs /tmp --bind /etc/shadow /tmp/shadow cat /tmp/shadow
cat: /tmp/shadow: Permission denied
Member

giuseppe commented Oct 26, 2016

I get the same behaviour with the current bubblewrap:

$ bwrap --bind / / --tmpfs /tmp --bind /etc/shadow /tmp/shadow cat /tmp/shadow
cat: /tmp/shadow: Permission denied
@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

I guess bind mounts don't need Open rights. But at least you can't e.g. mount a readable-by-the-user file from inside a readable-by-the-user directory, right?

Member

alexlarsson commented Oct 26, 2016

I guess bind mounts don't need Open rights. But at least you can't e.g. mount a readable-by-the-user file from inside a readable-by-the-user directory, right?

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

Eh, readable-only-by-root dir I mean

Member

alexlarsson commented Oct 26, 2016

Eh, readable-only-by-root dir I mean

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 26, 2016

Member

hmm. I just checked that and it will work in the same way. In the patch, only the write_uid_gid_map part is done with setuid (0), once the map is written, then the original uid is restored. All the mounts are done after that, where the uid is already the one of the user using bubblewrap.

Member

giuseppe commented Oct 26, 2016

hmm. I just checked that and it will work in the same way. In the patch, only the write_uid_gid_map part is done with setuid (0), once the map is written, then the original uid is restored. All the mounts are done after that, where the uid is already the one of the user using bubblewrap.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

@giuseppe Ah, but in that model as soon as you drop back from uid0 you're ptracable again, and yes, the uid map may neuter DAC_OVERRIDE, but you still have all sort of other permissions in the user namespace that you can exploit (such as CVE-2016-3135, which just needed CAP_NET_ADMIN).

So, basically, in order to support even a basic (ignoring this patch-set) version of bwrap safely that has user namespaces support I think the only realistic approach is to always run the privileged part of the sandbox setup as (real) uid 0 to avoid ptraceability, but with fsuid=user uid to avoid being able to access files with as-uid-0 permissions. That would let us honestly say what the readme does:

Bubblewrap could be viewed as setuid implementation of a subset of user namespaces. Emphasis on subset - specifically relevant to the above CVE, bubblewrap does not allow control over iptables.

But, once we go beyond just fixing what bubblewrap does today, if we let the sandbox retain capabilities in the user namespace, even if there is some uid/gid remapping, then we can't really claim the above anymore. Basically we'd be overriding the decision of the distro to disable user namespaces in general for non-root. I'm not sure that is a good idea?

Member

alexlarsson commented Oct 26, 2016

@giuseppe Ah, but in that model as soon as you drop back from uid0 you're ptracable again, and yes, the uid map may neuter DAC_OVERRIDE, but you still have all sort of other permissions in the user namespace that you can exploit (such as CVE-2016-3135, which just needed CAP_NET_ADMIN).

So, basically, in order to support even a basic (ignoring this patch-set) version of bwrap safely that has user namespaces support I think the only realistic approach is to always run the privileged part of the sandbox setup as (real) uid 0 to avoid ptraceability, but with fsuid=user uid to avoid being able to access files with as-uid-0 permissions. That would let us honestly say what the readme does:

Bubblewrap could be viewed as setuid implementation of a subset of user namespaces. Emphasis on subset - specifically relevant to the above CVE, bubblewrap does not allow control over iptables.

But, once we go beyond just fixing what bubblewrap does today, if we let the sandbox retain capabilities in the user namespace, even if there is some uid/gid remapping, then we can't really claim the above anymore. Basically we'd be overriding the decision of the distro to disable user namespaces in general for non-root. I'm not sure that is a good idea?

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 26, 2016

Member

is there a centralized option to enable/disable unprivileged user namespaces? We could be conservative and use them only when explicitly enabled, and it won't be the default. I agree that this patch set might expose more security issues, but on the other hand, the alternative is to run a container as root, which won't be safer than this.

Member

giuseppe commented Oct 26, 2016

is there a centralized option to enable/disable unprivileged user namespaces? We could be conservative and use them only when explicitly enabled, and it won't be the default. I agree that this patch set might expose more security issues, but on the other hand, the alternative is to run a container as root, which won't be safer than this.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

It varies from distro to distro. There is no way upstream (but its been proposed), but some distros have sysctls that let you enable it. We could hook of something like that.

Member

alexlarsson commented Oct 26, 2016

It varies from distro to distro. There is no way upstream (but its been proposed), but some distros have sysctls that let you enable it. We could hook of something like that.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

@cgwalters Do you have an opinion here?

Member

alexlarsson commented Oct 26, 2016

@cgwalters Do you have an opinion here?

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 26, 2016

Member

@alexlarsson @cgwalters honoring /proc/sys/kernel/unprivileged_userns_clone would be fine?

Member

giuseppe commented Oct 26, 2016

@alexlarsson @cgwalters honoring /proc/sys/kernel/unprivileged_userns_clone would be fine?

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 26, 2016

Member

@giuseppe That is not available on my F24 kernel. I know it is in debian jessie. Is it in rhel?

But anyway, if that is available and set to true, then clearly its fine to use it because anyone can use user-ns, no setuid required at all.

Member

alexlarsson commented Oct 26, 2016

@giuseppe That is not available on my F24 kernel. I know it is in debian jessie. Is it in rhel?

But anyway, if that is available and set to true, then clearly its fine to use it because anyone can use user-ns, no setuid required at all.

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 26, 2016

Member

I've not found anything present for Fedora, so I thought we could use that, as anyway if it is set, it was explicitly requested. I'll investigate better if something exists for Fedora.

Member

giuseppe commented Oct 26, 2016

I've not found anything present for Fedora, so I thought we could use that, as anyway if it is set, it was explicitly requested. I'll investigate better if something exists for Fedora.

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 27, 2016

Member

After looking into it, I am not sure proc/sys/kernel/unprivileged_userns_clone would be the right thing to do, we can add this check but for now I've changed the patch set to allow user namespaces only when the user is explicitly allowed in the /etc/subuid and /etc/subgid files. These are the same configuration files used by newuidmap and Docker 1.10 for its user namespaces. While Docker is used by root, so it is not exactly the same thing, newuidmap is a setuid program, which makes it somehow similar to this use case.

Member

giuseppe commented Oct 27, 2016

After looking into it, I am not sure proc/sys/kernel/unprivileged_userns_clone would be the right thing to do, we can add this check but for now I've changed the patch set to allow user namespaces only when the user is explicitly allowed in the /etc/subuid and /etc/subgid files. These are the same configuration files used by newuidmap and Docker 1.10 for its user namespaces. While Docker is used by root, so it is not exactly the same thing, newuidmap is a setuid program, which makes it somehow similar to this use case.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 27, 2016

Member

That makes a lot of sense to me. I mean, the entire thing hinges on being allowed to use some subset of uids anyway, right? We can't just grab a range and use it, because it may conflict with other use on the machine.

Member

alexlarsson commented Oct 27, 2016

That makes a lot of sense to me. I mean, the entire thing hinges on being allowed to use some subset of uids anyway, right? We can't just grab a range and use it, because it may conflict with other use on the machine.

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 27, 2016

Member

yes exactly. I dropped completely the patch that would set "to some random high uids/gids" and rely completely on what is defined in the /etc/sub{uid,gid} files.

Member

giuseppe commented Oct 27, 2016

yes exactly. I dropped completely the patch that would set "to some random high uids/gids" and rely completely on what is defined in the /etc/sub{uid,gid} files.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Oct 27, 2016

Member

So, newuidmap is setuid, allowing it to write to the uid_map and extend the user namespace uid-mapping in ways which would otherwise not be allowed by the kernel support for unprivileged user namespaces. That makes sense, because these are root-owned files that defines the access. Using these files in a setuid app that actually creates user namespaces is a bit different though, because it additionally allows you to create such a user namespace and retain capabilities in it.

So, it does let the sysadmin control who can do this, but it also increases the security risk of adding a user to /etc/subuid. Is this ok? Maybe, but at the very least it needs to be clearly messages. @cgwalters opinions?

Member

alexlarsson commented Oct 27, 2016

So, newuidmap is setuid, allowing it to write to the uid_map and extend the user namespace uid-mapping in ways which would otherwise not be allowed by the kernel support for unprivileged user namespaces. That makes sense, because these are root-owned files that defines the access. Using these files in a setuid app that actually creates user namespaces is a bit different though, because it additionally allows you to create such a user namespace and retain capabilities in it.

So, it does let the sysadmin control who can do this, but it also increases the security risk of adding a user to /etc/subuid. Is this ok? Maybe, but at the very least it needs to be clearly messages. @cgwalters opinions?

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Oct 27, 2016

Member

I had a private conversation with Eric Biederman about this and he suggested me to use these two files for the configuration of the user namespaces and also to check whether setgroups is permitted, since a system that uses negative file permissions should not allow an user in /etc/subgid.

We could make it even stricter, I didn't find any place where this setting could be stored, /proc/sys/kernel/unprivileged_userns_clone seems to have a different meaning and would be better to not expect it to be set in order to use the bubblewrap features.

Member

giuseppe commented Oct 27, 2016

I had a private conversation with Eric Biederman about this and he suggested me to use these two files for the configuration of the user namespaces and also to check whether setgroups is permitted, since a system that uses negative file permissions should not allow an user in /etc/subgid.

We could make it even stricter, I didn't find any place where this setting could be stored, /proc/sys/kernel/unprivileged_userns_clone seems to have a different meaning and would be better to not expect it to be set in order to use the bubblewrap features.

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Jun 28, 2017

Member

I've addressed your comments and cleaned up some of the mess I've introduced with some fixup commits. Now the patches are even smaller (dropped one altogether) than in the last iteration.

Member

giuseppe commented Jun 28, 2017

I've addressed your comments and cleaned up some of the mess I've introduced with some fixup commits. Now the patches are even smaller (dropped one altogether) than in the last iteration.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jun 29, 2017

Member

That is better, but still needs work.
Only pid 1 has the caps:

./bwrap --unshare-pid --unshare-user  --cap-add ALL  --bind / / --proc /proc sh
sh-4.4$ grep Cap /proc/*/status
/proc/1/status:CapInh:	0000000000000000
/proc/1/status:CapPrm:	0000003fffffffff
/proc/1/status:CapEff:	0000003fffffffff
/proc/1/status:CapBnd:	0000003fffffffff
/proc/1/status:CapAmb:	0000000000000000
/proc/2/status:CapInh:	0000000000000000
/proc/2/status:CapPrm:	0000000000000000
/proc/2/status:CapEff:	0000000000000000
/proc/2/status:CapBnd:	0000003fffffffff
/proc/2/status:CapAmb:	0000000000000000

This is because you set inherited to 0, which means caps will not follow across exec.
You need to set inherited to the requested set.

Additionally, in the non --as-pid-1 case you need to drop all the caps again after fork:in off pid1, because we don't want that to retain the caps.

Member

alexlarsson commented Jun 29, 2017

That is better, but still needs work.
Only pid 1 has the caps:

./bwrap --unshare-pid --unshare-user  --cap-add ALL  --bind / / --proc /proc sh
sh-4.4$ grep Cap /proc/*/status
/proc/1/status:CapInh:	0000000000000000
/proc/1/status:CapPrm:	0000003fffffffff
/proc/1/status:CapEff:	0000003fffffffff
/proc/1/status:CapBnd:	0000003fffffffff
/proc/1/status:CapAmb:	0000000000000000
/proc/2/status:CapInh:	0000000000000000
/proc/2/status:CapPrm:	0000000000000000
/proc/2/status:CapEff:	0000000000000000
/proc/2/status:CapBnd:	0000003fffffffff
/proc/2/status:CapAmb:	0000000000000000

This is because you set inherited to 0, which means caps will not follow across exec.
You need to set inherited to the requested set.

Additionally, in the non --as-pid-1 case you need to drop all the caps again after fork:in off pid1, because we don't want that to retain the caps.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jun 29, 2017

Member

Now the capabilities are only in the inherited set:

$ ./bwrap --unshare-pid --unshare-user  --cap-add ALL  --bind / / --proc /proc sh
sh-4.4$ grep Cap /proc/*/status
/proc/1/status:CapInh:	0000000000000000
/proc/1/status:CapPrm:	0000000000000000
/proc/1/status:CapEff:	0000000000000000
/proc/1/status:CapBnd:	0000003fffffffff
/proc/1/status:CapAmb:	0000000000000000
/proc/2/status:CapInh:	0000003fffffffff
/proc/2/status:CapPrm:	0000000000000000
/proc/2/status:CapEff:	0000000000000000
/proc/2/status:CapBnd:	0000003fffffffff
/proc/2/status:CapAmb:	0000000000000000

Additionally, it seems like the capability bounding set is set for all caps, rather than just the requested ones.

Member

alexlarsson commented Jun 29, 2017

Now the capabilities are only in the inherited set:

$ ./bwrap --unshare-pid --unshare-user  --cap-add ALL  --bind / / --proc /proc sh
sh-4.4$ grep Cap /proc/*/status
/proc/1/status:CapInh:	0000000000000000
/proc/1/status:CapPrm:	0000000000000000
/proc/1/status:CapEff:	0000000000000000
/proc/1/status:CapBnd:	0000003fffffffff
/proc/1/status:CapAmb:	0000000000000000
/proc/2/status:CapInh:	0000003fffffffff
/proc/2/status:CapPrm:	0000000000000000
/proc/2/status:CapEff:	0000000000000000
/proc/2/status:CapBnd:	0000003fffffffff
/proc/2/status:CapAmb:	0000000000000000

Additionally, it seems like the capability bounding set is set for all caps, rather than just the requested ones.

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jun 29, 2017

Member

It seems like we have to mess around with the "ambient" capabilities set in order to be able to pass capabilities across execve() when not being root.

Member

alexlarsson commented Jun 29, 2017

It seems like we have to mess around with the "ambient" capabilities set in order to be able to pass capabilities across execve() when not being root.

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Jun 29, 2017

Member

@alexlarsson I've not checked the uid != case as I was looking for a way for running systemd. Anyway, I agree it makes sense to cover it in this PR as well. Going to work on it

Member

giuseppe commented Jun 29, 2017

@alexlarsson I've not checked the uid != case as I was looking for a way for running systemd. Anyway, I agree it makes sense to cover it in this PR as well. Going to work on it

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jun 29, 2017

Member

Maybe if this is is too complicated we should make it only work in the as-pid-1 case?

Member

alexlarsson commented Jun 29, 2017

Maybe if this is is too complicated we should make it only work in the as-pid-1 case?

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Jun 29, 2017

Member

@alexlarsson added a fixup patch to take care of the ambient capabilities. This is what I have now:

$ ./bwrap  --unshare-pid --unshare-user --cap-add cap_sys_admin --bind / / --proc /proc sh -c "grep Cap /proc/*/status"
/proc/1/status:CapInh:	0000000000000000
/proc/1/status:CapPrm:	0000000000000000
/proc/1/status:CapEff:	0000000000000000
/proc/1/status:CapBnd:	0000000000200000
/proc/1/status:CapAmb:	0000000000000000
/proc/2/status:CapInh:	0000000000200000
/proc/2/status:CapPrm:	0000000000200000
/proc/2/status:CapEff:	0000000000200000
/proc/2/status:CapBnd:	0000000000200000
/proc/2/status:CapAmb:	0000000000200000
/proc/self/status:CapInh:	0000000000200000
/proc/self/status:CapPrm:	0000000000200000
/proc/self/status:CapEff:	0000000000200000
/proc/self/status:CapBnd:	0000000000200000
/proc/self/status:CapAmb:	0000000000200000
/proc/thread-self/status:CapInh:	0000000000200000
/proc/thread-self/status:CapPrm:	0000000000200000
/proc/thread-self/status:CapEff:	0000000000200000
/proc/thread-self/status:CapBnd:	0000000000200000
/proc/thread-self/status:CapAmb:	0000000000200000


$ ./bwrap  --unshare-pid --unshare-user  --bind / / --proc /proc sh -c "grep Cap /proc/*/status"
/proc/1/status:CapInh:	0000000000000000
/proc/1/status:CapPrm:	0000000000000000
/proc/1/status:CapEff:	0000000000000000
/proc/1/status:CapBnd:	0000000000000000
/proc/1/status:CapAmb:	0000000000000000
/proc/2/status:CapInh:	0000000000000000
/proc/2/status:CapPrm:	0000000000000000
/proc/2/status:CapEff:	0000000000000000
/proc/2/status:CapBnd:	0000000000000000
/proc/2/status:CapAmb:	0000000000000000
/proc/self/status:CapInh:	0000000000000000
/proc/self/status:CapPrm:	0000000000000000
/proc/self/status:CapEff:	0000000000000000
/proc/self/status:CapBnd:	0000000000000000
/proc/self/status:CapAmb:	0000000000000000
/proc/thread-self/status:CapInh:	0000000000000000
/proc/thread-self/status:CapPrm:	0000000000000000
/proc/thread-self/status:CapEff:	0000000000000000
/proc/thread-self/status:CapBnd:	0000000000000000
/proc/thread-self/status:CapAmb:	0000000000000000

and the systemd demo still works :-)

Member

giuseppe commented Jun 29, 2017

@alexlarsson added a fixup patch to take care of the ambient capabilities. This is what I have now:

$ ./bwrap  --unshare-pid --unshare-user --cap-add cap_sys_admin --bind / / --proc /proc sh -c "grep Cap /proc/*/status"
/proc/1/status:CapInh:	0000000000000000
/proc/1/status:CapPrm:	0000000000000000
/proc/1/status:CapEff:	0000000000000000
/proc/1/status:CapBnd:	0000000000200000
/proc/1/status:CapAmb:	0000000000000000
/proc/2/status:CapInh:	0000000000200000
/proc/2/status:CapPrm:	0000000000200000
/proc/2/status:CapEff:	0000000000200000
/proc/2/status:CapBnd:	0000000000200000
/proc/2/status:CapAmb:	0000000000200000
/proc/self/status:CapInh:	0000000000200000
/proc/self/status:CapPrm:	0000000000200000
/proc/self/status:CapEff:	0000000000200000
/proc/self/status:CapBnd:	0000000000200000
/proc/self/status:CapAmb:	0000000000200000
/proc/thread-self/status:CapInh:	0000000000200000
/proc/thread-self/status:CapPrm:	0000000000200000
/proc/thread-self/status:CapEff:	0000000000200000
/proc/thread-self/status:CapBnd:	0000000000200000
/proc/thread-self/status:CapAmb:	0000000000200000


$ ./bwrap  --unshare-pid --unshare-user  --bind / / --proc /proc sh -c "grep Cap /proc/*/status"
/proc/1/status:CapInh:	0000000000000000
/proc/1/status:CapPrm:	0000000000000000
/proc/1/status:CapEff:	0000000000000000
/proc/1/status:CapBnd:	0000000000000000
/proc/1/status:CapAmb:	0000000000000000
/proc/2/status:CapInh:	0000000000000000
/proc/2/status:CapPrm:	0000000000000000
/proc/2/status:CapEff:	0000000000000000
/proc/2/status:CapBnd:	0000000000000000
/proc/2/status:CapAmb:	0000000000000000
/proc/self/status:CapInh:	0000000000000000
/proc/self/status:CapPrm:	0000000000000000
/proc/self/status:CapEff:	0000000000000000
/proc/self/status:CapBnd:	0000000000000000
/proc/self/status:CapAmb:	0000000000000000
/proc/thread-self/status:CapInh:	0000000000000000
/proc/thread-self/status:CapPrm:	0000000000000000
/proc/thread-self/status:CapEff:	0000000000000000
/proc/thread-self/status:CapBnd:	0000000000000000
/proc/thread-self/status:CapAmb:	0000000000000000

and the systemd demo still works :-)

giuseppe added some commits Sep 23, 2016

bubblewrap: add --cap-add and --cap-drop
When using namespaces, permit to leave some capabilities in the
sandbox.  This can be helpful to run a system instance of systemd.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
bubblewrap: add option --userns-block-fd
It allows to configure the user namespace from outside.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
demos: add demo userns-block-fd.py
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
bubblewrap: do not always leave caps in the unprivileged case
When --unshare-user is used in the unprivileged case, all caps are
left to the sandboxed application.  Change it to leave only the
specified ones.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
bubblewrap.c: fix typo
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
tests: add tests for --cap-add
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@alexlarsson

This comment has been minimized.

Show comment
Hide comment
@alexlarsson

alexlarsson Jun 29, 2017

Member

This looks good to go to me. However, its waiting on some test, but I don't really understand which one. @cgwalters do you understand that?

Member

alexlarsson commented Jun 29, 2017

This looks good to go to me. However, its waiting on some test, but I don't really understand which one. @cgwalters do you understand that?

@giuseppe

This comment has been minimized.

Show comment
Hide comment
@giuseppe

giuseppe Jun 29, 2017

Member

that is weird, it seems both tests passed. Anyway, I think that once you will r+ Homu will repeat the tests before merging.

The previous failure was related to the fact I was using --unshare-user instead of --unshare-user-try

Member

giuseppe commented Jun 29, 2017

that is weird, it seems both tests passed. Anyway, I think that once you will r+ Homu will repeat the tests before merging.

The previous failure was related to the fact I was using --unshare-user instead of --unshare-user-try

@alexlarsson

This comment has been minimized.

Show comment
Hide comment
Member

alexlarsson commented Jun 29, 2017

@rh-atomic-bot

This comment has been minimized.

Show comment
Hide comment
@rh-atomic-bot

rh-atomic-bot Jun 29, 2017

⌛️ Testing commit 2c03392 with merge 215aa3e...

rh-atomic-bot commented Jun 29, 2017

⌛️ Testing commit 2c03392 with merge 215aa3e...

rh-atomic-bot added a commit that referenced this pull request Jun 29, 2017

bubblewrap: add option --userns-block-fd
It allows to configure the user namespace from outside.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #101
Approved by: alexlarsson

rh-atomic-bot added a commit that referenced this pull request Jun 29, 2017

demos: add demo userns-block-fd.py
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #101
Approved by: alexlarsson

rh-atomic-bot added a commit that referenced this pull request Jun 29, 2017

bubblewrap.c: fix typo
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #101
Approved by: alexlarsson

rh-atomic-bot added a commit that referenced this pull request Jun 29, 2017

bubblewrap: do not always leave caps in the unprivileged case
When --unshare-user is used in the unprivileged case, all caps are
left to the sandboxed application.  Change it to leave only the
specified ones.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #101
Approved by: alexlarsson

rh-atomic-bot added a commit that referenced this pull request Jun 29, 2017

tests: add tests for --cap-add
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

Closes: #101
Approved by: alexlarsson
@rh-atomic-bot

This comment has been minimized.

Show comment
Hide comment
@rh-atomic-bot

rh-atomic-bot Jun 29, 2017

☀️ Test successful - status-redhatci
Approved by: alexlarsson
Pushing 215aa3e to master...

rh-atomic-bot commented Jun 29, 2017

☀️ Test successful - status-redhatci
Approved by: alexlarsson
Pushing 215aa3e to master...

@cgwalters

This comment has been minimized.

Show comment
Hide comment
@cgwalters
Member

cgwalters commented Jun 30, 2017

Possible regression: projectatomic/rpm-ostree#855

@@ -450,8 +457,13 @@ do_init (int event_fd, pid_t initial_pid, struct sock_fprog *seccomp_prog)
return initial_exit_status;
}
#define CAP_TO_MASK_0(x) (1L << ((x) & 31))
#define CAP_TO_MASK_1(x) CAP_TO_MASK_0(x - 32)

This comment has been minimized.

@jlebon

jlebon Jun 30, 2017

Member

NIt: this looks like a dupe.

@jlebon

jlebon Jun 30, 2017

Member

NIt: this looks like a dupe.

{
if (!is_privileged)

This comment has been minimized.

@jlebon

jlebon Jun 30, 2017

Member

I haven't tested it, though I suspect that this will cause https://pagure.io/releng/issue/6550 to regress; systemd-nspawn blocks capset(), but with this check removed, we'll always be calling it. (Note that #122 originally worked around this by only setting caps in the setuid case).

@jlebon

jlebon Jun 30, 2017

Member

I haven't tested it, though I suspect that this will cause https://pagure.io/releng/issue/6550 to regress; systemd-nspawn blocks capset(), but with this check removed, we'll always be calling it. (Note that #122 originally worked around this by only setting caps in the setuid case).

This comment has been minimized.

@jlebon

jlebon Jun 30, 2017

Member

This is also what's causing projectatomic/rpm-ostree#855; we're now dropping CAP_DAC_OVERRIDE, so when rpm-ostree runs %pre scripts that do e.g. groupadd, they get EPERM on /etc/gshadow.

@jlebon

jlebon Jun 30, 2017

Member

This is also what's causing projectatomic/rpm-ostree#855; we're now dropping CAP_DAC_OVERRIDE, so when rpm-ostree runs %pre scripts that do e.g. groupadd, they get EPERM on /etc/gshadow.

@jlebon

This comment has been minimized.

Show comment
Hide comment
@jlebon

jlebon Jun 30, 2017

Member

Given that this is a sensitive area, I'd rather let someone else address these issues. I'm not comfortable enough in the security semantics involved to confidently suggest a patch.

Member

jlebon commented Jun 30, 2017

Given that this is a sensitive area, I'd rather let someone else address these issues. I'm not comfortable enough in the security semantics involved to confidently suggest a patch.

@cgwalters

This comment has been minimized.

Show comment
Hide comment
@cgwalters

cgwalters Jun 30, 2017

Member

This PR discussion is huge; let's track this issue as #197

Member

cgwalters commented Jun 30, 2017

This PR discussion is huge; let's track this issue as #197

cgwalters added a commit to cgwalters/bubblewrap that referenced this pull request Aug 8, 2017

Retain all caps when invoked by uid 0
In <projectatomic#101>, specifically
commit cde7fab we started dropping
all capabilities, even if the caller was privileged.

This broke rpm-ostree, which runs RPM scripts using bwrap, and some
of those scripts depend on capabilities (mostly `CAP_DAC_OVERRIDE`).

Fix this by retaining capabilities by default if the caller's uid is zero.

I considered having the logic be to simply retain any capabilities the invoking
process has (imagine filecaps binaries like `ping` or
`/usr/bin/gnome-keyring-daemon` using bwrap) but we currently explicitly abort
in that scenario to catch broken packages which used file capabilites for bwrap
itself (we switched to suid). For now this works, and if down the line there's a
real-world use case for capability-bearing non-zero-uid processes to invoke
bwrap *and* retain those privileges, we can revisit.

Closes: projectatomic#197

cgwalters added a commit to cgwalters/bubblewrap that referenced this pull request Aug 9, 2017

Retain all caps when invoked by uid 0, work around systemd seccomp fi…
…lter

In <projectatomic#101>, specifically
commit cde7fab we started dropping
all capabilities, even if the caller was privileged.

This broke rpm-ostree, which runs RPM scripts using bwrap, and some
of those scripts depend on capabilities (mostly `CAP_DAC_OVERRIDE`).

Fix this by retaining capabilities by default if the caller's uid is zero.

I considered having the logic be to simply retain any capabilities the invoking
process has (imagine filecaps binaries like `ping` or
`/usr/bin/gnome-keyring-daemon` using bwrap) but we currently explicitly abort
in that scenario to catch broken packages which used file capabilites for bwrap
itself (we switched to suid). For now this works, and if down the line there's a
real-world use case for capability-bearing non-zero-uid processes to invoke
bwrap *and* retain those privileges, we can revisit.

Another twist here is that we need to do some gymnastics to first avoid calling
`capset()` if we don't need to, as that can fail due to systemd installing a
seccomp filter that denies it (for dubious reasons).  Then we also need to ignore
`EPERM` when dropping caps in the init process.  (I considered unilaterally
handling `EPERM`, but it seems nicer to avoid calling `capset()` unless we need to)

Closes: projectatomic#197

rh-atomic-bot added a commit that referenced this pull request Aug 14, 2017

Retain all caps when invoked by uid 0, work around systemd seccomp fi…
…lter

In <#101>, specifically
commit cde7fab we started dropping
all capabilities, even if the caller was privileged.

This broke rpm-ostree, which runs RPM scripts using bwrap, and some
of those scripts depend on capabilities (mostly `CAP_DAC_OVERRIDE`).

Fix this by retaining capabilities by default if the caller's uid is zero.

I considered having the logic be to simply retain any capabilities the invoking
process has (imagine filecaps binaries like `ping` or
`/usr/bin/gnome-keyring-daemon` using bwrap) but we currently explicitly abort
in that scenario to catch broken packages which used file capabilites for bwrap
itself (we switched to suid). For now this works, and if down the line there's a
real-world use case for capability-bearing non-zero-uid processes to invoke
bwrap *and* retain those privileges, we can revisit.

Another twist here is that we need to do some gymnastics to first avoid calling
`capset()` if we don't need to, as that can fail due to systemd installing a
seccomp filter that denies it (for dubious reasons).  Then we also need to ignore
`EPERM` when dropping caps in the init process.  (I considered unilaterally
handling `EPERM`, but it seems nicer to avoid calling `capset()` unless we need to)

Closes: #197

Closes: #205
Approved by: alexlarsson

miabbott added a commit to miabbott/sig-atomic-buildscripts that referenced this pull request Aug 14, 2017

overlay: Thaw bubblewrap
In projectatomic/bubblewrap#205, the issue that we were seeing with
`bubblewrap` (projectatomic/bubblewrap#101) was fixed.  As such, we
can thaw out `bubblewrap` and continue to track git master.

cgwalters added a commit to CentOS/sig-atomic-buildscripts that referenced this pull request Aug 14, 2017

overlay: Thaw bubblewrap (#289)
In projectatomic/bubblewrap#205, the issue that we were seeing with
`bubblewrap` (projectatomic/bubblewrap#101) was fixed.  As such, we
can thaw out `bubblewrap` and continue to track git master.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment