Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't map ourselves to root in the user namespace. #3609

Merged
merged 3 commits into from
Apr 10, 2022

Conversation

zenhack
Copy link
Collaborator

@zenhack zenhack commented Feb 13, 2022

Take 2 for fixing #3584; this follows @kentonv's suggestion (2). As it turns out, we were already creating a userns in the right spot, but it doesn't take effect until after a fork(), so we have to do some footwork to make sure the rest of the logic happens in a child process.

Caveat: a few tests still fail on my machine, but for reasons that don't seem to be related: things like:

 ✖ tests/system-api

   - Test system api (18.338s)
   Timed out while waiting for element <.grainlog-contents > pre> to be present for 5000 milliseconds.  - expected "visible" but got: "not found"
       at Object.module.exports.Test system api (/home/isd/src/foreign/sandstorm/tests/tests/system-api.js:38:6)
       at processTicksAndRejections (internal/process/task_queues.js:77:11)

Despite the fact that, when I run this with SHOW_BROWSER=true, the thing it's querying for is there. I'm also seeing these failures on my other branch, which I wasn't before. Before I fuss with this more I want to see if it passes in CI; marking this as a draft for now, but if I don't see those failures in CI I'll mark it as ready for review again.

Also, we do not appear to have tests that actually check that updates work correctly (we should add those), but I did check this manually. With only the first hunk (not mapping to uid 0), updates are indeed broken, but with the full patch they work again, as does launching apps afterwards.

@zenhack zenhack marked this pull request as draft February 13, 2022 06:42
@zenhack
Copy link
Collaborator Author

zenhack commented Feb 13, 2022

CI looks ok; marking this as ready for review.

@zenhack zenhack marked this pull request as ready for review February 13, 2022 07:20
@ocdtrekkie ocdtrekkie linked an issue Feb 13, 2022 that may be closed by this pull request
@ocdtrekkie ocdtrekkie added bug ready-for-review We think this is ready for review sandstorm-dev Issues hacking on Sandstorm labels Feb 13, 2022
@@ -1432,6 +1426,25 @@ private:
tmpfsUidOpts = ",uid=0,gid=0";
} else {
unshareUidNamespaceOnce();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this normally won't create a user namespace because we already unshared the user namespace in the parent process -- hence the "once".

What we probably need to do here is change unshareUidNamespaceOnce() to unshare every time if getuid() returns non-zero.

There's a further complexity with regard to mounting /proc. In the logic below, we either mount proc if we're in a PID namespace, or we bind-mount the original proc if not. However, if we're in a PID namespace, it is owned by the parent userns, on which we've already lost capabilities. So this doesn't work anymore. I guess we may have to nest the PID namespaces as well, so we can create a new one here so that it's mountable? Ugh.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what ends up happening is:

  • sandstorm start does not create a userns here, as you suggest.
  • sandstorm continue does create a userns, since there is no prior call to this in that codepath. This is where it counts, after the exec.

I don't know why the fork seems to be necessary.

This raises another concern: If we're spawning a new userns on each exec, is it possible this is going to grow without bound, adding another level of nesting after each update? This probably wouldn't grow that quickly, but I could forsee some future point in time where we hit a limit and everybody's server crashes...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The unsharedUidNamespace flag is actually passed through sandstorm continue via the --userns flag, so in fact a user NS is never created here during either sandstorm start or sandstorm continue. (I think the reason this line exists is for the call site from sandstorm mongo.)

My guess is that this change works for the initial startup because in that case there's no exec() where the capabilities are dropped, but it probably won't work after the first update.

I think we probably need to create both a nested userns and a nested pidns. And I think to do that without creating yet another process in the hierarchy, we might want to use clone() instead of exec() to create the server monitor, so that we can pass CLONE_NEWUSER and CLONE_NEWPID at that time, without affecting the state of the parent process.

@@ -1432,6 +1426,25 @@ private:
tmpfsUidOpts = ",uid=0,gid=0";
} else {
unshareUidNamespaceOnce();

// In order to actually enter the new user namespace, we
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need to fork to enter a userns. You do need to fork to enter a PID namespace.

@kentonv
Copy link
Member

kentonv commented Feb 13, 2022

This doesn't look to me like it should work... as noted, I don't think we are creating a userns there, and I don't see why the added fork() would change anything in any case. Are you sure you tested it on an affected kernel?

@zenhack
Copy link
Collaborator Author

zenhack commented Feb 14, 2022

To make sure we are on the same page about the behavior I am observing:

  • With the full patch:
    • if I do ./tests/tmp-sandstorm/sandstorm start, poke around, a bit, everything appears to work.
    • After then doing ./tests/tmp-sandstorm/sandstorm update sandstorm-0-fast.tar.xz, it continues to work as expected.
  • With only the first hunk, i.e. no fork(), everything works on start, but then update breaks it; sandstorm fails to come up after an update, with this in the log:
** Shutting down due to signal
Not killing Dev daemon because it is not running.
** Restarting
WARNING: Auto-updates are disabled by config.
*** Uncaught exception ***
sandstorm/run-bundle.c++:1453: failed: unshare(CLONE_NEWNS): Operation not permitted
stack: 623b78 61df67 61b957 63a3f6 63a2ac 495ea7 495a84
** Server monitor died. Aborting.
  • Without any change, grains do not start, even before an update, as in the tests on master.

You have talked me into being confused by these observations, but empirically the fork() does seem to fix the problem. I don't know why, and we should obviously not merge this until we do.

@zenhack
Copy link
Collaborator Author

zenhack commented Feb 14, 2022

Hm, I noticed:

> ./tests/tmp-sandstorm/sandstorm stop
Waiting for PID 277346 to terminate...
Did not terminate after 10 seconds; killing...
Sandstorm server stopped.

If, using the apparently-working patched version I do a sandstorm update to the version on master, it appears to still work after the update, but if I then stop & restart sandstorm as above, then it stops working. So I think what's happening is that the fork is somehow preventing sandstorm from entirely restarting at all.

@zenhack
Copy link
Collaborator Author

zenhack commented Feb 15, 2022

I took a stab at doing this with clone() and a pid namespace. Per the commit message, it's still not working; for some reason waitpid() fails during shutdown:

** Shutting down due to signal
*** Uncaught exception ***
sandstorm/run-bundle.c++:2042: failed: waitpid(sandstormPid, &status, 0): No child processes
stack: 61be6b 619fc5 61977c 495ea7 495a84

I'm out of steam for the night though, so marking this as a draft and I'll debug later.

@zenhack zenhack marked this pull request as draft February 15, 2022 01:18
@zenhack
Copy link
Collaborator Author

zenhack commented Feb 20, 2022

I spent more time trying to debug this, and am stumped. Things I have observed:

  • If we just wait() instead of waitpid(), we still get ECHILD, so it's not that we're specifying the wrong pid or something. According to wait(2), this should only happen if the calling process really has no children that haven't been waited for.
  • However I have confirmed (by inspecting the process hierarchy manually) that the server monitor definitely is a child of the main process.
  • If we leave out the wait/waitpid entirely, we do get a zombie process.

I'm not sure what to try next.

@ocdtrekkie ocdtrekkie removed the ready-for-review We think this is ready for review label Feb 23, 2022
@zenhack zenhack mentioned this pull request Mar 13, 2022
@kentonv
Copy link
Member

kentonv commented Mar 13, 2022

I think you're hitting this, from the man page:

   The child termination signal
       When  the child process terminates, a signal may be sent to the parent.  The termina‐
       tion signal is specified in the low byte of flags (clone()) or in cl_args.exit_signal
       (clone3()).   If  this  signal  is specified as anything other than SIGCHLD, then the
       parent process must specify the __WALL or __WCLONE options when waiting for the child
       with wait(2).  If no signal (i.e., zero) is specified, then the parent process is not
       signaled when the child terminates.

To fix this you need to OR SIGCHLD into the flags.

Meanwhile, though, you probably shouldn't use the glibc wrapper here. The glibc wrapper is really intended for thread use cases, and it doesn't allow you to use clone() as a drop-in replacement for fork(). If you invoke the raw syscall, you can do this (so you won't need to allocate a stack).

syscall(SYS_clone, CLONE_NEWUSER | CLONE_NEWPID | SIGCHLD, nullptr, nullptr, nullptr, nullptr);

Doing this is the source of sandstorm-io#3584.

Instead, make sure we've fully entered a new user namespace before
we have to do anything that would require the capabilities that are
dropped on exec(). We also need to be in a new pid namespace, since
we try to mount /proc so it needs to be a procfs that we own. We use
clone() instead of unshare() for this so we don't have to disturb
the process hierarchy.
@zenhack
Copy link
Collaborator Author

zenhack commented Mar 13, 2022

Aha! Ok, the latest version fixes the problem by adding SIGCHLD, and also invokes the raw system call as you suggest; marking as ready for review.

@zenhack zenhack marked this pull request as ready for review March 13, 2022 22:50
@zenhack zenhack requested a review from kentonv March 13, 2022 22:50
@ocdtrekkie ocdtrekkie added the ready-for-review We think this is ready for review label Mar 13, 2022
@zenhack
Copy link
Collaborator Author

zenhack commented Apr 3, 2022

Hm, it looks like CI is hanging -- it's passing locally...

@kentonv
Copy link
Member

kentonv commented Apr 10, 2022

Looks like that worked!

@kentonv kentonv merged commit 7968a62 into sandstorm-io:master Apr 10, 2022
@zenhack zenhack deleted the userns-dont-map-root branch April 10, 2022 21:09
@ocdtrekkie ocdtrekkie removed the ready-for-review We think this is ready for review label Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug sandstorm-dev Issues hacking on Sandstorm
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Permissions error causing test failures
3 participants