Permissions error causing test failures #3584

zenhack · 2021-11-15T00:25:42Z

I finally spent a bit of time debugging the batch of new (~19) test failures we started seeing a month or two ago. For reference, I've attached a full log from make test TESTCASE=tests/backup-restore.js, which is one of the relevant tests; the many of the other tests fail with the same error.

The interesting bit is this, error, which comes when setting up the sandbox for grains during test:

kj/io.c++:365: failed: miniposix::write(fd, pos, size): Operation not permitted; fd = 3

The stack trace pointed to this call:

sandstorm/src/sandstorm/supervisor.c++

Line 936 in 0d53a66

writeUserNSMap("uid", kj::str(fakeUid, " ", uid, " 1\n"));

The error results in grains failing to start, and therefore causing the overall test to fail. I haven't ever seen this happen on an actual install (even a dev one), only when running the test suite.

I suspect something changed in the kernel that is affecting this (nothing changed on our end), but I don't really understand why this fails, or exactly what the relevant difference is between test & prod.

I would be interested to hear from folks wrt:

Whether you can reproduce the error on your system.
What distro & kernel version you're running.

Probably the next step is to see if we can an find a system that still works, and see if we can get it to fail on a newer kernel, and if so bisect.

t1.log

The text was updated successfully, but these errors were encountered:

zenhack · 2021-11-15T00:28:02Z

Wrt my own data point:

$ uname -r
5.14.16-arch1-1

(on current Archlinux)

zenhack · 2021-11-16T02:18:54Z

I tried this with the LTS kernel, and it still fails:

$ uname -r
5.10.79-1-lts

5.10 was tagged in December 2020, and I only started having problems a couple months ago, so it seems unlikely that it's as simple as a kernel regression. Drat.

ocdtrekkie · 2021-11-17T22:38:08Z

You know what comes up when I search for this? #249

...between the supervisor and the backup code. We should ideally pull most of what's common out into a shared module, but I was looking at this while debugging sandstorm-io#3584, and it seemed like a good start.

zenhack · 2022-01-17T06:54:19Z

Did some debugging, it appears that this fails when the value of realUid is zero, i.e. the root account in the parent namespace. I think this is happening because, when not started as root, we create a separate user namespace when entering the sandstorm chroot (enterChroot in run-bundle.c++), where we set ourselves to uid zero in that namespace. So I think this doesn't ever fail in prod because the user namespaces code path for enterChroot never gets taken, at least on "normal" configurations.

This raises a couple questions:

Is this code path only for the test suite's benefit, or are there scenarios where you would use this to run a "real" sandstorm box entirely as non-root?
If there are hypothetical such configurations, does anyone actually use those? If so, and they're not broken, then I have misunderstood what's going on here, but I suspect that there probably are not users of such configurations.
If that is the case, is it actually worth supporting this codepath at all? I'm not super keen on needing root to run the test suite, but I also really don't like that we currently just aren't doing any automated testing of any of the codepaths that require root, and it seems generally better to keep prod & test as close as possible. It's particularly annoying to be debugging test failures that don't happen in prod, just because I want to use the tests to for other things.

Thoughts?

cc: @kentonv

kentonv · 2022-01-17T15:56:49Z

We shouldn't ever be setting UID 0 in a userns... I think we set UID 1000.

We do actually support installing Sandstorm without root privileges, in which case the userns sandbox will be used. It's not just for tests. I don't know how many people actually do this though.

xet7 · 2022-01-17T16:24:56Z

@kentonv

Would Sandstorm work at:

Chroot
Docker
Snap
Flatpak
AppImage ?

zenhack · 2022-01-17T19:05:22Z

@kentonv, the sequence of events I'm seeing is:

On startup, sandstorm enters the chroot and maps itself to uid zero:

sandstorm/src/sandstorm/run-bundle.c++

Lines 1406 to 1415 in 4849adc

    
           // Set up the UID namespace. We map ourselves as UID zero because this allows capabilities 
        
           // to be inherited through exec(), which we need to support update and restart. With any 
        
           // other UID, capabilities can only be inherited through exec() if the target exec'd file 
        
           // has its inheritable capabilities set filled. By default, the inheritable capability set 
        
           // for all files is empty, and only the filesystem's superuser (i.e. not us) can change them. 
        
           // But if our UID is zero, then the file's attributes are ignored and all capabilities are 
        
           // inherited. 
        
           writeSetgroupsIfPresent("deny\n"); 
        
           writeUserNSMap("uid", kj::str("0 ", uid, " 1\n")); 
        
           writeUserNSMap("gid", kj::str("0 ", gid, " 1\n"));

Then, on grain startup, we try to hide our real uid by mapping it to 1000 inside the grain's user namespace. But getuid() returns zero, so we end up writing "0 1000 1\n" to the file, which is failing the kernel's permission checks.

kentonv · 2022-01-18T02:59:47Z

Ah, hmm. I forgot that that's how that worked.

But it has worked for a long time and it still appears to work on my Debian Bullseye machine which is kernel 5.10.

zenhack · 2022-01-23T05:12:49Z

Hm, I am actually seeing the same error in a bullseye VM.

You mentioned on IRC that you were seeing (seemingly unrelated, spurious) test failures though when putting together the release -- have you in fact seen those tests pass recently?

I've been able to reproduce this on arch on older kernels, so per above I agree it's unlikely this is a kernel regression -- there must be some other system component at issue, but so far I am still scratching my head as to what that could be.

zenhack · 2022-01-24T03:16:42Z

from capabilities(7):

       CAP_SETFCAP (since Linux 2.6.24)
              Set arbitrary capabilities on a file.

              Since  Linux  5.12,  this  capability is also needed to map user ID 0 in a new user name‐
              space; see user_namespaces(7) for details.

And tracing through the code I can confirm the check it's failing isn't there in 5.10.
So it looks like it is a kernel change? I don't know why it's failing on my bullseye VM then though, but it seems like this will hit debian too sooner or later if we don't deal with it.

user_namespaces(7) elaborates on the reasoning for this:

          This  rule has been in place since Linux 5.12.  It eliminates an earlier security bug whereby
          a UID 0 process that lacks the CAP_SETFCAP capability, which is needed  to  create  a  binary
          with  namespaced file capabilities (as described in capabilities(7)), could nevertheless cre‐
          ate such a binary, by the following steps:

          *  Create a new user namespace with the identity mapping (i.e., UID 0 in the new  user  name‐
             space  maps to UID 0 in the parent namespace), so that UID 0 in both namespaces is equiva‐
             lent to the same root user ID.

          *  Since the child process has the CAP_SETFCAP capability, it  could  create  a  binary  with
             namespaced  file  capabilities  that  would then be effective in the parent user namespace
             (because the root user IDs are the same in the two namespaces).

So we can't just ask the kernel devs to revert the change; we will have to work around it somehow. I'll have to think about how to do this.

zenhack · 2022-01-24T05:42:57Z

One possibility, which is more work than I'd like but I think would work:

Rather than re-execing ourselves, we could instead hand the fdBundle and pid file back to a parent process outside the user namespace and exit, after which the parent process would restart us. Then, we would not need to do the first remapping.

Will continue to mull this over, but would appreciate @kentonv's thoughts.

zenhack · 2022-01-24T21:37:30Z

There's actually no need to pass the bundle back, since the parent can just hold on to the bundle. I've got a version of the above fix locally that seems to get some tests to pass; I haven't pinned down if all of the other failures are unrelated. I'll open a PR soonish probably.

Rather than having the update monitor itself exec the newly downloaded version of sandstorm, it simply exits (with status 0), and its parent process takes this as an indictation that it should re-spawn it (with the new version). This avoids the need to map ourselves to UID 0 in the user namespace, which avoids the EPERM error we were getting when later hiding the real UID from a grain. Per discussion on the issue, the old design no longer works as of Linux 5.12, after which spawning the grain we would need CAP_SETFCAP, which we do not want to have.

zenhack · 2022-01-25T06:53:05Z

Opened a draft pr (#3603).

kentonv · 2022-02-12T18:28:27Z

What the hell? They just broke mapping UID 0 in a user namespace in kernel 5.12? And Linus didn't throw a tantrum?

It feels like there are several ways they could have fixed the bug without outright breaking the ability for a non-privileged user to create a userns and be UID 0 inside it...

Doing this is the source of sandstorm-io#3584, and contrary to the now-deleted comment, it doesn't seem to be necessary -- both launching the supervisor and taking backups seems to work as expected.

Doing this is the source of sandstorm-io#3584. Instead, make sure we've fully entered a new user namespace before we have to do anything that would require the capabilities that are dropped on exec(). The only marginally tricky bit here is we have to fork() for the unshare() to fully take effect.

Doing this is the source of sandstorm-io#3584. Instead, make sure we've fully entered a new user namespace before we have to do anything that would require the capabilities that are dropped on exec(). We also need to be in a new pid namespace, since we try to mount /proc so it needs to be a procfs that we own. We use clone() instead of unshare() for this so we don't have to disturb the process hierarchy. Restart/update doesn't quite work yet; when these are invoked, the server just dies with this in sandstorm.log: ** Shutting down due to signal *** Uncaught exception *** sandstorm/run-bundle.c++:2042: failed: waitpid(sandstormPid, &status, 0): No child processes stack: 61be6b 619fc5 61977c 495ea7 495a84 I still need to debug why it can't seem to find the server monitor.

Doing this is the source of sandstorm-io#3584. Instead, make sure we've fully entered a new user namespace before we have to do anything that would require the capabilities that are dropped on exec(). We also need to be in a new pid namespace, since we try to mount /proc so it needs to be a procfs that we own. We use clone() instead of unshare() for this so we don't have to disturb the process hierarchy.

Doing this is the source of #3584. Instead, make sure we've fully entered a new user namespace before we have to do anything that would require the capabilities that are dropped on exec(). We also need to be in a new pid namespace, since we try to mount /proc so it needs to be a procfs that we own. We use clone() instead of unshare() for this so we don't have to disturb the process hierarchy.

zenhack mentioned this issue Nov 17, 2021

Grain backups block the backend's event loop. #3586

Open

zenhack mentioned this issue Nov 17, 2021

Factor out some common sandbox setup code #3587

Merged

ocdtrekkie added bug sandstorm-dev Issues hacking on Sandstorm labels Jan 24, 2022

zenhack mentioned this issue Jan 26, 2022

Rework the way updates work to fix #3584 #3603

Closed

3 tasks

ocdtrekkie added this to the Fix tests (2021) milestone Jan 27, 2022

ocdtrekkie linked a pull request Jan 31, 2022 that will close this issue

Rework the way updates work to fix #3584 #3603

Closed

3 tasks

zenhack mentioned this issue Feb 13, 2022

Don't map ourselves to root in the user namespace. #3609

Merged

ocdtrekkie linked a pull request Feb 13, 2022 that will close this issue

Don't map ourselves to root in the user namespace. #3609

Merged

kentonv closed this as completed in #3609 Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Permissions error causing test failures #3584

Permissions error causing test failures #3584

zenhack commented Nov 15, 2021

zenhack commented Nov 15, 2021

zenhack commented Nov 16, 2021

ocdtrekkie commented Nov 17, 2021

zenhack commented Jan 17, 2022

kentonv commented Jan 17, 2022

xet7 commented Jan 17, 2022

zenhack commented Jan 17, 2022

kentonv commented Jan 18, 2022

zenhack commented Jan 23, 2022

zenhack commented Jan 24, 2022

zenhack commented Jan 24, 2022

zenhack commented Jan 24, 2022

zenhack commented Jan 25, 2022

kentonv commented Feb 12, 2022

Permissions error causing test failures #3584

Permissions error causing test failures #3584

Comments

zenhack commented Nov 15, 2021

zenhack commented Nov 15, 2021

zenhack commented Nov 16, 2021

ocdtrekkie commented Nov 17, 2021

zenhack commented Jan 17, 2022

kentonv commented Jan 17, 2022

xet7 commented Jan 17, 2022

zenhack commented Jan 17, 2022

kentonv commented Jan 18, 2022

zenhack commented Jan 23, 2022

zenhack commented Jan 24, 2022

zenhack commented Jan 24, 2022

zenhack commented Jan 24, 2022

zenhack commented Jan 25, 2022

kentonv commented Feb 12, 2022