Conversation
ad41b72 to
fbac2a4
Compare
|
Allowing a fresh mount of sysfs is a potential security leak. Sysadmins and containment system generally cover parts of /sys in order to not allow the unprivileged user access to them. Mounting a new instance will override this. Can't we use a bind mount here instead? |
bubblewrap.c
Outdated
| data[1].effective = REQUIRED_CAPS_1; | ||
| data[1].permitted = REQUIRED_CAPS_1; | ||
| data[1].inheritable = 0; | ||
| data[0].effective = caps0; |
There was a problem hiding this comment.
Don't you need to set permitted?
bubblewrap.c
Outdated
| /* Drop root uid, but retain the required permitted caps */ | ||
| if (setuid (getuid ()) < 0) | ||
| die_with_error ("unable to drop privs"); | ||
| did_setuid = TRUE; |
There was a problem hiding this comment.
This approach means the file-caps based version of bwrap doesn't work. Although honestly, thats kinda usless vs setuid as it needs cap_sysadmin anyway.
bubblewrap.c
Outdated
| static uint32_t requested_caps[2] = {BASE_CAPS0, BASE_CAPS1}; | ||
|
|
||
| /* low 32bit caps needed */ | ||
| #define REQUIRED_SETUP_CAPS_0 (BASE_CAPS0 | CAP_TO_MASK (CAP_NET_ADMIN) | CAP_TO_MASK (CAP_SYS_ADMIN)) |
There was a problem hiding this comment.
You dropped CHROOT, SETUID and SETGID from REQUIRED_CAPS. These are needed for chroot() and user namespaces to work in the setup.
There was a problem hiding this comment.
they are coming from BASE_CAPS0
bubblewrap.c
Outdated
| | CAP_TO_MASK (CAP_DAC_OVERRIDE) | CAP_TO_MASK(CAP_SETFCAP) | CAP_TO_MASK(CAP_SETPCAP) \ | ||
| | CAP_TO_MASK (CAP_SETGID) | CAP_TO_MASK (CAP_SETUID) | CAP_TO_MASK (CAP_MKNOD) | CAP_TO_MASK (CAP_CHOWN) \ | ||
| | CAP_TO_MASK (CAP_FOWNER) | CAP_TO_MASK (CAP_FSETID) | CAP_TO_MASK (CAP_KILL) \ | ||
| | CAP_TO_MASK (CAP_SYS_CHROOT)) |
There was a problem hiding this comment.
Adding CAP_DAC_OVERRIDE here is vastly increasing the risk of having bubblewrap setuid. The whole reason we're doing this dance where we're only saving the caps we need for setting up the sandbox but running as the final user is that we can then be reasonably sure that the things that end up in the final sandbox is what the original user can access.
If the setup is run with DAC_OVERRIDE, then any source path you specifiy will be read with root privileges, which will let regular users access root-only files.
|
|
||
| if (prctl (PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) < 0) | ||
| die ("could not set no new privs"); | ||
| } |
There was a problem hiding this comment.
Why this? We already do this in main()
bubblewrap.c
Outdated
| if (opt_unshare_user) | ||
| { | ||
| caps0 |= CAP_TO_MASK (CAP_CHOWN) | CAP_TO_MASK(CAP_SETUID) | CAP_TO_MASK (CAP_SETGID) | CAP_TO_MASK (CAP_FOWNER) | CAP_TO_MASK (CAP_DAC_OVERRIDE); | ||
| } |
There was a problem hiding this comment.
I don't understand why you want the sandbox to have these capabilities by default just because some namespace is enabled. For instance, DAC_OVERRIDE means that a file that is read-only to you is still readable, no? That seems very unexpected.
Overall, this seems quite risky.
There was a problem hiding this comment.
thanks for the rewiew. Yes, this patch series is risky (and some parts too hacky), so I wanted to share my work in progress so to get comments before I spend more time on it.
IIUIC, even with DAC_OVERRIDE, the container should not be able to access files owned by users that have no mapping in the container user namespace. The only users that get a mapping in the container namespace are the uid of the user using bwrap and the range 10000-10999.
There was a problem hiding this comment.
Are you sure about that? Anyway, you still automatically grant the process properties that it wouldn't have if it were run with that capabilities. For instance if the user is "1000" it can now write to a file owned by user 1000 which is not marked with "u+w" permissions.
bubblewrap.c
Outdated
| } | ||
| line3 = xasprintf ("%d %d 1\n", sandbox_id, parent_id); | ||
| return strconcat4 (line1, line2, line3, line4); | ||
| } |
There was a problem hiding this comment.
This breaks any use of non-privileged user namespaces use, because that only allows a single mapping.
|
Overall, this patch series seems very risky to me. It would need very thorough review to make sure it doesn't make it possible to gain increased privileges for non-root users. |
bubblewrap.c
Outdated
|
|
||
| if (is_privileged) | ||
| { | ||
| uint32_t caps0 = REQUIRED_SETUP_CAPS_0 | requested_caps[0]; |
There was a problem hiding this comment.
Isn't this a huge security hole if bubblewrap is installed setuid? I think we should be verifying whether getuid() == 0 or so.
There was a problem hiding this comment.
is_privileged means "we're not using user namespaces" in the code, not that we're initially launched by root.
The reason its doing this check instead of comparing effective uid is to also support filesystem caps instead of setuid.
4509cbe to
6c61aa5
Compare
640344a to
6ff4c83
Compare
|
You could probably break out the "--no-reaper" as a separate PR, it looks safe to merge now, separately from higher risk changes. |
7b801f2 to
24f7b11
Compare
|
I have done some simplifications:
I had a discussion with Eric Biederman about enabling setgroups, he said that it is safe to enable setgroups when there is a mapping for the user in I've also dropped CAP_AUDIT_*, as systemd will detect when these caps are not available and not error out. |
fa03fe2 to
5b314da
Compare
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
5b314da to
2c03392
Compare
|
This looks good to go to me. However, its waiting on some test, but I don't really understand which one. @cgwalters do you understand that? |
|
that is weird, it seems both tests passed. Anyway, I think that once you will The previous failure was related to the fact I was using |
It allows to configure the user namespace from outside. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson
When --unshare-user is used in the unprivileged case, all caps are left to the sandboxed application. Change it to leave only the specified ones. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Closes: #101 Approved by: alexlarsson
|
☀️ Test successful - status-redhatci |
|
Possible regression: coreos/rpm-ostree#855 |
| } | ||
|
|
||
| #define CAP_TO_MASK_0(x) (1L << ((x) & 31)) | ||
| #define CAP_TO_MASK_1(x) CAP_TO_MASK_0(x - 32) |
There was a problem hiding this comment.
NIt: this looks like a dupe.
| drop_privs (void) | ||
| drop_privs (bool keep_requested_caps) | ||
| { | ||
| if (!is_privileged) |
There was a problem hiding this comment.
I haven't tested it, though I suspect that this will cause https://pagure.io/releng/issue/6550 to regress; systemd-nspawn blocks capset(), but with this check removed, we'll always be calling it. (Note that #122 originally worked around this by only setting caps in the setuid case).
There was a problem hiding this comment.
This is also what's causing coreos/rpm-ostree#855; we're now dropping CAP_DAC_OVERRIDE, so when rpm-ostree runs %pre scripts that do e.g. groupadd, they get EPERM on /etc/gshadow.
|
Given that this is a sensitive area, I'd rather let someone else address these issues. I'm not comfortable enough in the security semantics involved to confidently suggest a patch. |
|
This PR discussion is huge; let's track this issue as #197 |
In <containers#101>, specifically commit cde7fab we started dropping all capabilities, even if the caller was privileged. This broke rpm-ostree, which runs RPM scripts using bwrap, and some of those scripts depend on capabilities (mostly `CAP_DAC_OVERRIDE`). Fix this by retaining capabilities by default if the caller's uid is zero. I considered having the logic be to simply retain any capabilities the invoking process has (imagine filecaps binaries like `ping` or `/usr/bin/gnome-keyring-daemon` using bwrap) but we currently explicitly abort in that scenario to catch broken packages which used file capabilites for bwrap itself (we switched to suid). For now this works, and if down the line there's a real-world use case for capability-bearing non-zero-uid processes to invoke bwrap *and* retain those privileges, we can revisit. Closes: containers#197
…lter In <containers#101>, specifically commit cde7fab we started dropping all capabilities, even if the caller was privileged. This broke rpm-ostree, which runs RPM scripts using bwrap, and some of those scripts depend on capabilities (mostly `CAP_DAC_OVERRIDE`). Fix this by retaining capabilities by default if the caller's uid is zero. I considered having the logic be to simply retain any capabilities the invoking process has (imagine filecaps binaries like `ping` or `/usr/bin/gnome-keyring-daemon` using bwrap) but we currently explicitly abort in that scenario to catch broken packages which used file capabilites for bwrap itself (we switched to suid). For now this works, and if down the line there's a real-world use case for capability-bearing non-zero-uid processes to invoke bwrap *and* retain those privileges, we can revisit. Another twist here is that we need to do some gymnastics to first avoid calling `capset()` if we don't need to, as that can fail due to systemd installing a seccomp filter that denies it (for dubious reasons). Then we also need to ignore `EPERM` when dropping caps in the init process. (I considered unilaterally handling `EPERM`, but it seems nicer to avoid calling `capset()` unless we need to) Closes: containers#197
…lter In <#101>, specifically commit cde7fab we started dropping all capabilities, even if the caller was privileged. This broke rpm-ostree, which runs RPM scripts using bwrap, and some of those scripts depend on capabilities (mostly `CAP_DAC_OVERRIDE`). Fix this by retaining capabilities by default if the caller's uid is zero. I considered having the logic be to simply retain any capabilities the invoking process has (imagine filecaps binaries like `ping` or `/usr/bin/gnome-keyring-daemon` using bwrap) but we currently explicitly abort in that scenario to catch broken packages which used file capabilites for bwrap itself (we switched to suid). For now this works, and if down the line there's a real-world use case for capability-bearing non-zero-uid processes to invoke bwrap *and* retain those privileges, we can revisit. Another twist here is that we need to do some gymnastics to first avoid calling `capset()` if we don't need to, as that can fail due to systemd installing a seccomp filter that denies it (for dubious reasons). Then we also need to ignore `EPERM` when dropping caps in the init process. (I considered unilaterally handling `EPERM`, but it seems nicer to avoid calling `capset()` unless we need to) Closes: #197 Closes: #205 Approved by: alexlarsson
In containers/bubblewrap#205, the issue that we were seeing with `bubblewrap` (containers/bubblewrap#101) was fixed. As such, we can thaw out `bubblewrap` and continue to track git master.
In containers/bubblewrap#205, the issue that we were seeing with `bubblewrap` (containers/bubblewrap#101) was fixed. As such, we can thaw out `bubblewrap` and continue to track git master.
a few patches that modify how bubblewrap works in privileged mode to allow execution of systemd:
https://asciinema.org/a/7gex3vdcumd1n72i3knstra2b