Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Permissions error causing test failures #3584

Closed
zenhack opened this issue Nov 15, 2021 · 14 comments · Fixed by #3609
Closed

Permissions error causing test failures #3584

zenhack opened this issue Nov 15, 2021 · 14 comments · Fixed by #3609
Labels
bug sandstorm-dev Issues hacking on Sandstorm

Comments

@zenhack
Copy link
Collaborator

zenhack commented Nov 15, 2021

I finally spent a bit of time debugging the batch of new (~19) test failures we started seeing a month or two ago. For reference, I've attached a full log from make test TESTCASE=tests/backup-restore.js, which is one of the relevant tests; the many of the other tests fail with the same error.

The interesting bit is this, error, which comes when setting up the sandbox for grains during test:

kj/io.c++:365: failed: miniposix::write(fd, pos, size): Operation not permitted; fd = 3

The stack trace pointed to this call:

writeUserNSMap("uid", kj::str(fakeUid, " ", uid, " 1\n"));

The error results in grains failing to start, and therefore causing the overall test to fail. I haven't ever seen this happen on an actual install (even a dev one), only when running the test suite.

I suspect something changed in the kernel that is affecting this (nothing changed on our end), but I don't really understand why this fails, or exactly what the relevant difference is between test & prod.

I would be interested to hear from folks wrt:

  • Whether you can reproduce the error on your system.
  • What distro & kernel version you're running.

Probably the next step is to see if we can an find a system that still works, and see if we can get it to fail on a newer kernel, and if so bisect.

t1.log

@zenhack
Copy link
Collaborator Author

zenhack commented Nov 15, 2021

Wrt my own data point:

$ uname -r
5.14.16-arch1-1

(on current Archlinux)

@zenhack
Copy link
Collaborator Author

zenhack commented Nov 16, 2021

I tried this with the LTS kernel, and it still fails:

$ uname -r
5.10.79-1-lts

5.10 was tagged in December 2020, and I only started having problems a couple months ago, so it seems unlikely that it's as simple as a kernel regression. Drat.

@ocdtrekkie
Copy link
Collaborator

You know what comes up when I search for this? #249

zenhack added a commit to zenhack/sandstorm that referenced this issue Nov 17, 2021
...between the supervisor and the backup code. We should ideally pull
most of what's common out into a shared module, but I was looking at
this while debugging sandstorm-io#3584, and it seemed like a good start.
@zenhack
Copy link
Collaborator Author

zenhack commented Jan 17, 2022

Did some debugging, it appears that this fails when the value of realUid is zero, i.e. the root account in the parent namespace. I think this is happening because, when not started as root, we create a separate user namespace when entering the sandstorm chroot (enterChroot in run-bundle.c++), where we set ourselves to uid zero in that namespace. So I think this doesn't ever fail in prod because the user namespaces code path for enterChroot never gets taken, at least on "normal" configurations.

This raises a couple questions:

  • Is this code path only for the test suite's benefit, or are there scenarios where you would use this to run a "real" sandstorm box entirely as non-root?
  • If there are hypothetical such configurations, does anyone actually use those? If so, and they're not broken, then I have misunderstood what's going on here, but I suspect that there probably are not users of such configurations.
  • If that is the case, is it actually worth supporting this codepath at all? I'm not super keen on needing root to run the test suite, but I also really don't like that we currently just aren't doing any automated testing of any of the codepaths that require root, and it seems generally better to keep prod & test as close as possible. It's particularly annoying to be debugging test failures that don't happen in prod, just because I want to use the tests to for other things.

Thoughts?

cc: @kentonv

@kentonv
Copy link
Member

kentonv commented Jan 17, 2022

We shouldn't ever be setting UID 0 in a userns... I think we set UID 1000.

We do actually support installing Sandstorm without root privileges, in which case the userns sandbox will be used. It's not just for tests. I don't know how many people actually do this though.

@xet7
Copy link
Contributor

xet7 commented Jan 17, 2022

@kentonv

Would Sandstorm work at:

  • Chroot
  • Docker
  • Snap
  • Flatpak
  • AppImage ?

@zenhack
Copy link
Collaborator Author

zenhack commented Jan 17, 2022

@kentonv, the sequence of events I'm seeing is:

  1. On startup, sandstorm enters the chroot and maps itself to uid zero:
    // Set up the UID namespace. We map ourselves as UID zero because this allows capabilities
    // to be inherited through exec(), which we need to support update and restart. With any
    // other UID, capabilities can only be inherited through exec() if the target exec'd file
    // has its inheritable capabilities set filled. By default, the inheritable capability set
    // for all files is empty, and only the filesystem's superuser (i.e. not us) can change them.
    // But if our UID is zero, then the file's attributes are ignored and all capabilities are
    // inherited.
    writeSetgroupsIfPresent("deny\n");
    writeUserNSMap("uid", kj::str("0 ", uid, " 1\n"));
    writeUserNSMap("gid", kj::str("0 ", gid, " 1\n"));
  2. Then, on grain startup, we try to hide our real uid by mapping it to 1000 inside the grain's user namespace. But getuid() returns zero, so we end up writing "0 1000 1\n" to the file, which is failing the kernel's permission checks.

@kentonv
Copy link
Member

kentonv commented Jan 18, 2022

Ah, hmm. I forgot that that's how that worked.

But it has worked for a long time and it still appears to work on my Debian Bullseye machine which is kernel 5.10.

@zenhack
Copy link
Collaborator Author

zenhack commented Jan 23, 2022

Hm, I am actually seeing the same error in a bullseye VM.

You mentioned on IRC that you were seeing (seemingly unrelated, spurious) test failures though when putting together the release -- have you in fact seen those tests pass recently?

I've been able to reproduce this on arch on older kernels, so per above I agree it's unlikely this is a kernel regression -- there must be some other system component at issue, but so far I am still scratching my head as to what that could be.

@zenhack
Copy link
Collaborator Author

zenhack commented Jan 24, 2022

from capabilities(7):

       CAP_SETFCAP (since Linux 2.6.24)
              Set arbitrary capabilities on a file.

              Since  Linux  5.12,  this  capability is also needed to map user ID 0 in a new user name‐
              space; see user_namespaces(7) for details.

And tracing through the code I can confirm the check it's failing isn't there in 5.10.
So it looks like it is a kernel change? I don't know why it's failing on my bullseye VM then though, but it seems like this will hit debian too sooner or later if we don't deal with it.

user_namespaces(7) elaborates on the reasoning for this:

          This  rule has been in place since Linux 5.12.  It eliminates an earlier security bug whereby
          a UID 0 process that lacks the CAP_SETFCAP capability, which is needed  to  create  a  binary
          with  namespaced file capabilities (as described in capabilities(7)), could nevertheless cre‐
          ate such a binary, by the following steps:

          *  Create a new user namespace with the identity mapping (i.e., UID 0 in the new  user  name‐
             space  maps to UID 0 in the parent namespace), so that UID 0 in both namespaces is equiva‐
             lent to the same root user ID.

          *  Since the child process has the CAP_SETFCAP capability, it  could  create  a  binary  with
             namespaced  file  capabilities  that  would then be effective in the parent user namespace
             (because the root user IDs are the same in the two namespaces).

So we can't just ask the kernel devs to revert the change; we will have to work around it somehow. I'll have to think about how to do this.

@zenhack
Copy link
Collaborator Author

zenhack commented Jan 24, 2022

One possibility, which is more work than I'd like but I think would work:

Rather than re-execing ourselves, we could instead hand the fdBundle and pid file back to a parent process outside the user namespace and exit, after which the parent process would restart us. Then, we would not need to do the first remapping.

Will continue to mull this over, but would appreciate @kentonv's thoughts.

@zenhack
Copy link
Collaborator Author

zenhack commented Jan 24, 2022

There's actually no need to pass the bundle back, since the parent can just hold on to the bundle. I've got a version of the above fix locally that seems to get some tests to pass; I haven't pinned down if all of the other failures are unrelated. I'll open a PR soonish probably.

@ocdtrekkie ocdtrekkie added bug sandstorm-dev Issues hacking on Sandstorm labels Jan 24, 2022
zenhack added a commit to zenhack/sandstorm that referenced this issue Jan 25, 2022
Rather than having the update monitor itself exec the newly downloaded
version of sandstorm, it simply exits (with status 0), and its parent
process takes this as an indictation that it should re-spawn it (with
the new version). This avoids the need to map ourselves to UID 0 in the
user namespace, which avoids the EPERM error we were getting when
later hiding the real UID from a grain.

Per discussion on the issue, the old design no longer works as of Linux
5.12, after which spawning the grain we would need CAP_SETFCAP, which we
do not want to have.
@zenhack
Copy link
Collaborator Author

zenhack commented Jan 25, 2022

Opened a draft pr (#3603).

@ocdtrekkie ocdtrekkie added this to the Fix tests (2021) milestone Jan 27, 2022
@ocdtrekkie ocdtrekkie linked a pull request Jan 31, 2022 that will close this issue
3 tasks
@kentonv
Copy link
Member

kentonv commented Feb 12, 2022

What the hell? They just broke mapping UID 0 in a user namespace in kernel 5.12? And Linus didn't throw a tantrum?

It feels like there are several ways they could have fixed the bug without outright breaking the ability for a non-privileged user to create a userns and be UID 0 inside it...

zenhack added a commit to zenhack/sandstorm that referenced this issue Feb 13, 2022
Doing this is the source of sandstorm-io#3584, and contrary to the now-deleted
comment, it doesn't seem to be necessary -- both launching the
supervisor and taking backups seems to work as expected.
zenhack added a commit to zenhack/sandstorm that referenced this issue Feb 13, 2022
Doing this is the source of sandstorm-io#3584.

Instead, make sure we've fully entered a new user namespace before
we have to do anything that would require the capabilities that are
dropped on exec(). The only marginally tricky bit here is we have to
fork() for the unshare() to fully take effect.
@ocdtrekkie ocdtrekkie linked a pull request Feb 13, 2022 that will close this issue
zenhack added a commit to zenhack/sandstorm that referenced this issue Feb 15, 2022
Doing this is the source of sandstorm-io#3584.

Instead, make sure we've fully entered a new user namespace before
we have to do anything that would require the capabilities that are
dropped on exec(). We also need to be in a new pid namespace, since
we try to mount /proc so it needs to be a procfs that we own. We use
clone() instead of unshare() for this so we don't have to disturb
the process hierarchy.

Restart/update doesn't quite work yet; when these are invoked, the
server just dies with this in sandstorm.log:

** Shutting down due to signal
*** Uncaught exception ***
sandstorm/run-bundle.c++:2042: failed: waitpid(sandstormPid, &status, 0): No child processes
stack: 61be6b 619fc5 61977c 495ea7 495a84

I still need to debug why it can't seem to find the server monitor.
zenhack added a commit to zenhack/sandstorm that referenced this issue Feb 20, 2022
Doing this is the source of sandstorm-io#3584.

Instead, make sure we've fully entered a new user namespace before
we have to do anything that would require the capabilities that are
dropped on exec(). We also need to be in a new pid namespace, since
we try to mount /proc so it needs to be a procfs that we own. We use
clone() instead of unshare() for this so we don't have to disturb
the process hierarchy.

Restart/update doesn't quite work yet; when these are invoked, the
server just dies with this in sandstorm.log:

** Shutting down due to signal
*** Uncaught exception ***
sandstorm/run-bundle.c++:2042: failed: waitpid(sandstormPid, &status, 0): No child processes
stack: 61be6b 619fc5 61977c 495ea7 495a84

I still need to debug why it can't seem to find the server monitor.
zenhack added a commit to zenhack/sandstorm that referenced this issue Mar 13, 2022
Doing this is the source of sandstorm-io#3584.

Instead, make sure we've fully entered a new user namespace before
we have to do anything that would require the capabilities that are
dropped on exec(). We also need to be in a new pid namespace, since
we try to mount /proc so it needs to be a procfs that we own. We use
clone() instead of unshare() for this so we don't have to disturb
the process hierarchy.
kentonv pushed a commit that referenced this issue Apr 10, 2022
Doing this is the source of #3584.

Instead, make sure we've fully entered a new user namespace before
we have to do anything that would require the capabilities that are
dropped on exec(). We also need to be in a new pid namespace, since
we try to mount /proc so it needs to be a procfs that we own. We use
clone() instead of unshare() for this so we don't have to disturb
the process hierarchy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug sandstorm-dev Issues hacking on Sandstorm
Projects
None yet
4 participants