Don't map ourselves to root in the user namespace. #3609
CI looks ok; marking this as ready for review.
src/sandstorm/run-bundle.c++
Outdated
@@ -1432,6 +1426,25 @@ private:
tmpfsUidOpts = ",uid=0,gid=0";
} else {
unshareUidNamespaceOnce();
I think this normally won't create a user namespace because we already unshared the user namespace in the parent process -- hence the "once".
What we probably need to do here is change unshareUidNamespaceOnce() to unshare every time if getuid() returns non-zero.
There's a further complexity with regard to mounting /proc. In the logic below, we either mount proc if we're in a PID namespace, or we bind-mount the original proc if not. However, if we're in a PID namespace, it is owned by the parent userns, on which we've already lost capabilities. So this doesn't work anymore. I guess we may have to nest the PID namespaces as well, so we can create a new one here so that it's mountable? Ugh.
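A minimal sketch of the "unshare every time if getuid() returns non-zero" idea, for illustration only. The class name UserNsUnsharer, the method unshareIfNeeded(), and the injectable unshareFn are hypothetical and are not Sandstorm's actual code; the only real call used is unshare(2) with CLONE_NEWUSER.

```cpp
#include <cassert>
#include <functional>
#include <sched.h>
#include <unistd.h>
#include <utility>

// Hypothetical stand-in for unshareUidNamespaceOnce(); illustrative only.
class UserNsUnsharer {
public:
  // unshareFn is injectable so the policy can be exercised without privileges.
  // By default it forwards to the real unshare(2).
  explicit UserNsUnsharer(
      std::function<int(int)> unshareFn = [](int flags) { return ::unshare(flags); })
      : unshareFn_(std::move(unshareFn)) {}

  // Returns true if this call created a new user namespace.
  bool unshareIfNeeded(uid_t uid) {
    // uid 0: keep the original "once" behaviour.
    // Non-zero uid: always create a fresh user namespace.
    if (uid == 0 && alreadyUnshared_) return false;
    if (unshareFn_(CLONE_NEWUSER) != 0) return false;  // errno describes the failure
    alreadyUnshared_ = true;
    return true;
  }

private:
  std::function<int(int)> unshareFn_;
  bool alreadyUnshared_ = false;
};
```

A caller would pass getuid() as the argument. Note that this sketch does nothing about the /proc ownership problem described above.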
I think what ends up happening is:
- sandstorm start does not create a userns here, as you suggest.
- sandstorm continue does create a userns, since there is no prior call to this in that codepath. This is where it counts, after the exec.
I don't know why the fork seems to be necessary.
This raises another concern: If we're spawning a new userns on each exec, is it possible this is going to grow without bound, adding another level of nesting after each update? This probably wouldn't grow that quickly, but I could foresee some future point in time where we hit a limit and everybody's server crashes...
The unsharedUidNamespace flag is actually passed through sandstorm continue via the --userns flag, so in fact a user NS is never created here during either sandstorm start or sandstorm continue. (I think the reason this line exists is for the call site from sandstorm mongo.)
My guess is that this change works for the initial startup because in that case there's no exec() where the capabilities are dropped, but it probably won't work after the first update.
I think we probably need to create both a nested userns and a nested pidns. And I think to do that without creating yet another process in the hierarchy, we might want to use clone() instead of exec() to create the server monitor, so that we can pass CLONE_NEWUSER and CLONE_NEWPID at that time, without affecting the state of the parent process.
src/sandstorm/run-bundle.c++
Outdated
@@ -1432,6 +1426,25 @@ private:
tmpfsUidOpts = ",uid=0,gid=0";
} else {
unshareUidNamespaceOnce();

// In order to actually enter the new user namespace, we
I don't think you need to fork to enter a userns. You do need to fork to enter a PID namespace.
This doesn't look to me like it should work... as noted, I don't think we are creating a userns there, and I don't see why the added fork() would change anything in any case. Are you sure you tested it on an affected kernel?
To make sure we are on the same page about the behavior I am observing:
You have talked me into being confused by these observations, but empirically the fork() does seem to fix the problem. I don't know why, and we should obviously not merge this until we do.
Hm, I noticed:
If, using the apparently-working patched version, I do a sandstorm update to the version on master, it appears to still work after the update, but if I then stop & restart sandstorm as above, then it stops working. So I think what's happening is that the fork is somehow preventing sandstorm from entirely restarting at all.
91b8b3f to 342f266
I took a stab at doing this with clone() and a pid namespace. Per the commit message, it's still not working; for some reason waitpid() fails during shutdown:
I'm out of steam for the night though, so marking this as a draft and I'll debug later.
342f266 to 4c03091
I spent more time trying to debug this, and am stumped. Things I have observed:
I'm not sure what to try next.
I think you're hitting this, from the man page:
To fix this you need to OR
Meanwhile, though, you probably shouldn't use the glibc wrapper here. The glibc wrapper is really intended for thread use cases, and it doesn't allow you to use clone() as a drop-in replacement for fork(). If you invoke the raw syscall, you can do this (so you won't need to allocate a stack).
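The quoted man page passage and the snippet from this comment did not survive on this page, so the following is only a sketch of what a fork-like clone() through the raw syscall might look like. It assumes the waitpid() failure comes from the termination signal in the low byte of the clone() flags: a child created without SIGCHLD there is not visible to a plain waitpid(). The helper names forkLikeClone and runChildAndWait are hypothetical; in the scenario discussed, extraFlags would be CLONE_NEWUSER | CLONE_NEWPID.

```cpp
#include <cassert>
#include <csignal>
#include <sched.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, not Sandstorm's code: behaves like fork(), plus extraFlags.
pid_t forkLikeClone(unsigned long extraFlags) {
  // SIGCHLD in the low byte makes the child waitable with a plain waitpid().
  // A null stack means the child continues on a copy-on-write copy of the
  // parent's stack, as with fork(). The order of the trailing arguments differs
  // between architectures (and s390 swaps the first two); they are all null here.
  return static_cast<pid_t>(
      syscall(SYS_clone, extraFlags | SIGCHLD, nullptr, nullptr, nullptr, nullptr));
}

// Spawns a child that exits with childExitCode and returns the exit status
// observed by the parent, or -1 on any failure.
int runChildAndWait(unsigned long extraFlags, int childExitCode) {
  pid_t pid = forkLikeClone(extraFlags);
  if (pid < 0) return -1;
  if (pid == 0) _exit(childExitCode);  // child
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With extraFlags set to 0 this runs unprivileged and behaves like a fork() followed by waitpid(); whether the namespace flags succeed depends on the kernel and its configuration.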
Doing this is the source of sandstorm-io#3584. Instead, make sure we've fully entered a new user namespace before we have to do anything that would require the capabilities that are dropped on exec(). We also need to be in a new pid namespace, since we try to mount /proc so it needs to be a procfs that we own. We use clone() instead of unshare() for this so we don't have to disturb the process hierarchy.
4c03091 to 2bb450f
Aha! Ok, the latest version fixes the problem by adding
Hm, it looks like CI is hanging -- it's passing locally...
Looks like that worked!
Take 2 for fixing #3584; this follows @kentonv's suggestion (2). As it turns out, we were already creating a userns in the right spot, but it doesn't take effect until after a fork(), so we have to do some footwork to make sure the rest of the logic happens in a child process.
Caveat: a few tests still fail on my machine, for reasons that don't seem to be related, such as:
Despite the fact that, when I run this with SHOW_BROWSER=true, the thing it's querying for is there. I'm also seeing these failures on my other branch, which I wasn't before. Before I fuss with this more I want to see if it passes in CI; marking this as a draft for now, but if I don't see those failures in CI I'll mark it as ready for review again.
Also, we do not appear to have tests that actually check that updates work correctly (we should add those), but I did check this manually. With only the first hunk (not mapping to uid 0), updates are indeed broken, but with the full patch they work again, as does launching apps afterwards.
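For readers unfamiliar with what "mapping to uid 0" means: a user namespace's mapping is set by writing lines of the form "inside-id outside-id count" to /proc/<pid>/uid_map. The sketch below only formats such a line; the helper uidMapLine is hypothetical, and the identity mapping shown is one assumed alternative to a root mapping, not necessarily what this patch writes.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>

// Hypothetical helper: formats a single-uid entry for /proc/<pid>/uid_map.
// Fields are: uid inside the namespace, uid outside it, length of the range.
std::string uidMapLine(uid_t insideUid, uid_t outsideUid) {
  return std::to_string(insideUid) + " " + std::to_string(outsideUid) + " 1\n";
}
```

Under these assumptions, uidMapLine(0, 1000) describes mapping the caller to root inside the namespace, while uidMapLine(1000, 1000) keeps the same uid on both sides.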