Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing (and invisible in strace) clone syscall related to privileged namespaces and chrome's setuid sandbox #2242

Closed
fpqc opened this issue Jun 21, 2017 · 38 comments

Comments

@fpqc
Copy link

fpqc commented Jun 21, 2017

Build 16215

We already knew that chromium couldn't be launched with its usual sandbox due to the fact that user namespaces aren't implemented. I assumed privileged namespaces were not completely implemented either, though I had heard some of the devs say that privileged namespaces as a concept already exist in the underlying native NT context.

Chromium also includes a setuid sandbox implementation for kernels that don't implement user namespaces (notably Arch, though I used a clean Ubuntu Xenial image to do this test), and you can launch chromium in this way with the flag --disable-namespace-sandbox, which disables the user namespace sandbox but enables the classic setuid sandbox.

In the user namespace sandbox case, we also get a line in between that actually shows the failing syscall:

clone(child_stack=0x7ffff8384930, flags=CLONE_NEWUSER|SIGCHLD) = -1 EINVAL (Invalid argument)

Now the weird part is this:
Let's try to run that without an strace first:

$ google-chrome --headless --disable-gpu --disable-namespace-sandbox
clone: Invalid argument

we get (among other errors, I know) that there is a clone syscall receiving an invalid parameter or option.

But when you look into the strace, there's no EINVAL on a Clone syscall, so somewhere some clone syscall is failing on unimplemented surface but is also failing to be picked up in the strace, unlike in the namespace sandbox case.

Straces:
for setuid sandbox:
suidsandbox.txt

for user namespaces sandbox:
unamespaces.txt

@therealkenc
Copy link
Collaborator

so somewhere some clone syscall is failing on unimplemented surface

This is the fail:

[pid 22732] execve("/opt/google/chrome/chrome-sandbox", 
    ["/opt/google/chrome/chrome-sandbo"..., "/opt/google/chrome/chrome", 
        "--type=zygote", "--headless", "--headless"], [/* 20 vars */]) = -1 
    EPERM (Operation not permitted)

I didn't track it very far. Takes on Real Linux natch. Running suid binaries on WSL works as far as it goes.

In any case, the suid sandbox is nearly dead. Still be valuable (on its own merits) looking into why the execve goes south.

@therealkenc
Copy link
Collaborator

Oh yeah, repro:

sudo apt-get install ubuntu-desktop libxss1 libappindicator1 libindicator7
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt-get install -f  # probably
google-chrome --headless --disable-gpu --disable-namespace-sandbox

@fpqc
Copy link
Author

fpqc commented Jun 22, 2017

Yeah, I was mainly interested to see if privileged namespace functionality was working at all.

So is this a clone problem, or is it just reportinga clone problem incorrectly?

@therealkenc
Copy link
Collaborator

therealkenc commented Jun 22, 2017

or is it just reporting a clone problem incorrectly?

Maybe. But probably not. That error message to stderr didn't come from divine intervention. But either way that particular execve succeeds on Real Linux and fails on WSL.

Note chrome goes out of it's way to thwart strace when running sandboxed (either kind), so what you could be seeing is a successful execve when not under strace on WSL, and then a subsequent clone fail, for reasons. As I mentioned, bog standard suid proper works on WSL, as far as it goes. The arguments passed to clone calls preceding the execve don't look that noteworthy from an EINVAL standpoint, AFAICT.

All of which would be noteworthy if I thought Stephen actually put chrome in play.

@fpqc
Copy link
Author

fpqc commented Jul 14, 2017

Oh, by the way, an update: I contacted the chromium security team, and they confirmed that the setuid sandbox will continue to be included for the foreseeable future, but they have no plans to patch it. It will be removed probably when it accumulates enough bugs or breakages or security flaws as to become counterproductive (unless someone wants to take up the responsibility of maintaining patches for it).

I haven't yet forwarded the email I received to the Arch kernel and chromium maintainers (real life intervened and I forgot), but I can probably get an answer back in the next week or so to find out what they're going to do. If they plan to stick with the setuid sandbox also for the foreseeable future, WSL supporting it should keep WSL compatible with chromium (by reporting from the kernel that user namespaces are disabled and therefore letting chromium fall back gracefully to the setuid sandbox).

The security-focused guys (like the maintainer of the linux-hardened package for Arch) have told me that they feel that user namespaces are insecure by design (and indeed they have been exploited a few times in the past two years).

I'll ping the WSL team with the conclusion of my investigation once I get an answer from the Arch people.

@therealkenc
Copy link
Collaborator

therealkenc commented Jul 27, 2017

Amusingly enough @benhillis' work on #962 moved this forward. 16251 now gets over the execve(). Actually if I was a little quicker on the uptake I would have thought of #962 when I saw the EPERM. Same mode of failure.

Having missed that, I did get the "subsequent clone()" failure hypothesis right. Child stack of zero Actually it was CLONE_FS. Don't think I have seen that before.

[pid   519] clone(child_stack=0, flags=CLONE_FS|SIGCHLD) = -1 EINVAL (Invalid argument)

So @fpqc now ya know.

Since I am here, I have to underline this ticket has nada to do with "namespaces", even though the word is used 18 times in this thread so far including in the issue title (so says ctrl-f). The whole point of --disable-namespace-sandbox is you are on a kernel without them. Eye on the ball.

@fpqc
Copy link
Author

fpqc commented Jul 27, 2017

@therealkenc I'm not sure how the suid sandbox works exactly on-the-nose, but my (possibly incorrect) understanding is that it makes explicit use of the privileged namespace functionality to construct a secure chroot in which the renderer process runs.

In particular, from the suid sandbox page

CLONE_NEWPID method
The sandbox does three things to restrict the authority of a sandboxed process. The SUID helper is responsible for the first two:
The SUID helper chroots the process. This takes away access to the filesystem namespace.
The SUID helper puts the process in a PID namespace using the CLONE_NEWPID option to clone(). This stops the sandboxed process from being able to ptrace() or kill() unsandboxed processes.
In addition:
The Linux Zygote startup code sets the process to be undumpable using prctl(). This stops sandboxed processes from being able to ptrace() each other. More specifically, it stops the sandboxed process from being ptrace()'d by any other process. This can be switched off with the --allow-sandbox-debugging option.

So in particular, what I was asking was if privileged namespaces were the problem (in this case, I guess, the PID namespace).

@therealkenc
Copy link
Collaborator

Oh I see what you mean. Yeah that counts, and "nada" is clearly overstated. It would be more correct to say "the problem right now is not related to namespaces". So apologies for that.

[n.b. I just didn't want to see this ticket go down the same path as #1962, which is a mount -t drvfs bug/limitation. An otherwise useful ticket with an actionable problem was lost because an unrelated four letter acronym was repeated two dozen times.]

@fpqc
Copy link
Author

fpqc commented Jul 28, 2017

All I know about the User Namespace functionality is that it could conceptually make sandboxes/containers easier to construct and work with, but also that they have already been escaped/exploited numerous times in the past 3-4 years and have been one of the largest sources of Linux Kernel CVEs. I've spoken with some of the Linux-Hardened patchset (the successor to grsecurity) maintainers, and they have argued that User Namespaces will continue to be insecure because they are conceptually flawed, rather than flawed in implementation. I don't know myself, but that seems to be the opinion of security-minded people.

Anyway, quick question: Does WSL support PID namespaces? I could have sworn you commented recently to Ben that it's not and should probably be as part of the requirements to support systemd.

@therealkenc
Copy link
Collaborator

therealkenc commented Jul 28, 2017

I didn't think CLONE_NEWPID was supported. Now I'm not so sure. If you open a SUSE distro and a Ubuntu distro (installed from the Store) at the same time, you get two isolated PID 1 processes. Because of course you do. For all intents, they're containers. Looking into it.

@therealkenc
Copy link
Collaborator

Anyway, quick question: Does WSL support PID namespaces?

Yep. Funny thing, I think maybe it's been there for a while. And to answer your next question, I'm looking into it.

wsl-namespace

@fpqc
Copy link
Author

fpqc commented Jul 29, 2017

Haha that's super cool. I also noticed on that page there's a launch flag to start up the suid sandbox in a way that allows it to be ptraced, so if I try to trace it again in the new build, I'll turn that on too.

@fpqc
Copy link
Author

fpqc commented Jul 29, 2017

@therealkenc

So if they wanted to add nspawn/rkt support, I looked into it:

They need the multiple devpts mount support, network namespace support as part of the base systemd requirements, and the following additional namespaces:

UTS
PID
mount
IPC

From your research here it looks like PID and mount namespaces are implemented. How about UTS, IPC, and Network namespaces?

@therealkenc
Copy link
Collaborator

therealkenc commented Jul 29, 2017

CLONE_NEWUTS, CLONE_NEWIPC, CLONE_NEWPID, CLONE_NEWNS all work. I am actually a little embarrassed I didn't notice. I have not tried CLONE_NEWNET because it is academic in the context of WSL.

Need some more bits for systemd though. I did a short run at it last night but my heart wasn't in it. But take that over to #994.

From your first post, the issue here was and remains:

[pid   519] clone(child_stack=0, flags=CLONE_FS|SIGCHLD) = -1 EINVAL (Invalid argument)

...full stop...

@fpqc
Copy link
Author

fpqc commented Jul 30, 2017

Haha! That's pretty cool! So maybe I should close this thread, and you could just open a new issue for just that CLONE_FS|SIGCHLD pattern, which is much more specific.

PS does unshare()ing those other kernel namespaces work as expected as well?

@therealkenc
Copy link
Collaborator

and you could just open a new issue

Heh, don't look at me. I don't even run Chrome on WSL. 😉

PS does unshare()ing those other kernel namespaces work as expected as well?

I haven't done an unshare() test case or anything like that, but yes. It has to work, or it would be a resource leak.

@fpqc
Copy link
Author

fpqc commented Jul 30, 2017

I ask because in the release notes for when they added mount namespaces, they did mention that at least initially, unsharing was not working properly.

image

@benhillis
Copy link
Member

benhillis commented Aug 1, 2017

@therealkenc and @fpqc - Keep digging, you might find some surprises... :)

@therealkenc
Copy link
Collaborator

Finding the Easter Eggs is indeed fun; which I suppose is the point. But on balance I would still rather see a Roadmap (like chakra, .NET Core, typescript, and vscode), better release notes, and elimination of the internal shadow issue tracker.

That clone() fail above still deserves a look-see. I can work up a test repro if chrome is still a verboten use case.

@benhillis
Copy link
Member

benhillis commented Aug 1, 2017

@therealkenc - I 100% agree with you. I'd like to be more open with the community and it's something that I'm pushing towards. Unfortunately there are a lot of politics to content with.

The issue with unsupported clone flag combinations is well-understood so no test repro is needed. Thank you for the offer though.

@benhillis
Copy link
Member

@therealkenc - Dug into this a bit this afternoon. Once I implemented the CLONE_FS flag chromium makes it a bit further, but ultimately fails when trying to talk to udev over netlink sockets (I think).

[0801/165544.343755:WARNING:audio_manager.cc(293)] Multiple instances of AudioManager detected
[0801/165544.344512:WARNING:audio_manager.cc(254)] Multiple instances of AudioManager detected
[0801/165548.842971:FATAL:udev_linux.cc(20)] Check failed: monitor_.
#0 0x7fabeeec2df7 base::debug::StackTrace::StackTrace()
#1 0x7fabeeee435c logging::LogMessage::~LogMessage()
#2 0x7fabe7289001 <unknown>
#3 0x7fabe722447f media::DeviceMonitorLinux::Initialize()
#4 0x7fabeeec38c3 base::debug::TaskAnnotator::RunTask()
#5 0x7fabeeeec31d base::MessageLoop::RunTask()
#6 0x7fabeeeecdc8 base::MessageLoop::DoWork()
#7 0x7fabeeeef0e9 base::MessagePumpLibevent::Run()
#8 0x7fabeeeec049 base::MessageLoop::RunHandler()
#9 0x7fabeef177b0 base::RunLoop::Run()
#10 0x7fabe8f2beb6 content::BrowserThreadImpl::IOThreadRun()
#11 0x7fabe8f2c063 content::BrowserThreadImpl::Run()
#12 0x7fabeef4ce77 base::Thread::ThreadMain()
#13 0x7fabeef46fa3 <unknown>
#14 0x7fabef1e76da start_thread
#15 0x7fabd8ce8d7f clone

Received signal 6
#0 0x7fabeeec2df7 base::debug::StackTrace::StackTrace()
#1 0x7fabeeec296f <unknown>
#2 0x7fabef1f1670 <unknown>
#3 0x7fabd8c1577f gsignal
#4 0x7fabd8c1737a abort
#5 0x7fabeeec1022 base::debug::BreakDebugger()
#6 0x7fabeeee465a logging::LogMessage::~LogMessage()
#7 0x7fabe7289001 <unknown>
#8 0x7fabe722447f media::DeviceMonitorLinux::Initialize()
#9 0x7fabeeec38c3 base::debug::TaskAnnotator::RunTask()
#10 0x7fabeeeec31d base::MessageLoop::RunTask()
#11 0x7fabeeeecdc8 base::MessageLoop::DoWork()
#12 0x7fabeeeef0e9 base::MessagePumpLibevent::Run()
#13 0x7fabeeeec049 base::MessageLoop::RunHandler()
#14 0x7fabeef177b0 base::RunLoop::Run()
#15 0x7fabe8f2beb6 content::BrowserThreadImpl::IOThreadRun()
#16 0x7fabe8f2c063 content::BrowserThreadImpl::Run()
#17 0x7fabeef4ce77 base::Thread::ThreadMain()
#18 0x7fabeef46fa3 <unknown>
#19 0x7fabef1e76da start_thread
#20 0x7fabd8ce8d7f clone
  r8: 0000000000000000  r9: 00007fab8a7be8e0 r10: 0000000000000008 r11: 0000000000000008
 r12: 00007fab8a7befa8 r13: 00007fab8a7bf220 r14: 00007fab8a7befb8 r15: 00007fabd9646060
  di: 0000000000000002  si: 00007fab8a7be8e0  bp: 00007fab8a7bf3a0  bx: 0000000000000000
  dx: 0000000000000000  ax: 0000000000000000  cx: 0000000000000008  sp: 00007fab8a7be958
  ip: 00007fabd8c1577f efl: 0000000000000246 cgf: 00000053002b0033 erf: 0000000000000000
 trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]
Calling _exit(1). Core file will not be generated.

@therealkenc
Copy link
Collaborator

therealkenc commented Aug 1, 2017

Hmmm. I assumed CLONE_FS was implemented (it shows up in quick issue search). You get over the udev netlink socket problem with libudev-stub, natch 😉.

I can't underline how easy it is to stub the netlink socket part in the WSL kernel if you guys feel like it. My userspace stub just opens a AF_UNIX dgram socket here. Which is totally the wrong family, but it isn't like userspace knows any different. The next thing userspace does with the socket descriptor is select(), which never fires. You don't need to actually implement the protocol.

@benhillis
Copy link
Member

@therealkenc - the CLONE_FS flag was implemented for the unshare system call, but not for the clone system call.

@benhillis
Copy link
Member

benhillis commented Aug 1, 2017

@therealkenc - Well would you look at that...

chrome

The change to add CLONE_FS support is pretty small so I think I can squeeze it in for Fall Creator's update. As always, thanks @therealkenc for all the help!

@therealkenc
Copy link
Collaborator

Awesome. Fun, no? Now all you need to do is track why VSCode doesn't launch (toy electron apps are fine). Give an inch, they'll ask for a mile 😜.

@fpqc
Copy link
Author

fpqc commented Aug 2, 2017

Did you also change the kernel to report user namespaces were disabled? I noticed you launched without the--disable-namespace-sandbox flag. That would make a lot of sense =]

@benhillis
Copy link
Member

@fpqc - no that clone call still fails but chrome appears to recover from the error.

@fpqc
Copy link
Author

fpqc commented Aug 2, 2017

Ah, neat!

@benhillis
Copy link
Member

@therealkenc - I liked your idea of stubbing netlink support so I decided to see what would happen if I allowed creation of NETLINK_KOBJECT_UEVENT netlink sockets that will never receive any messages. With that change and the CLONE_FS change I above I'm able to launch chromium without any workarounds.

Unfortunately I can't do a similar think for all netlink families. For example, allowing creation of NETLINK_AUDIT sockets but don't actually support the messages opens worms (sudo, adduser, passwd all start failing).

I'm submitting a CR for this change now.

@therealkenc
Copy link
Collaborator

...above I'm able to launch chromium without any workarounds.

Most excellent. But you forgot the PulseAudio .deb you installed. Which to fix, you need to stub priority futexes per #1006 (message) to behave like the normal kind. I am pretty sure user space won't tell the difference. I know Pulse doesn't anyway, because that's what the .deb you installed does.

That's the end of the hard blockers though. The rest of the messages you see in the console is one soft fail, plus configuration problems that would manifest on real Linux (including OpenGL).

@benhillis
Copy link
Member

benhillis commented Aug 3, 2017

@therealkenc - I did not have to apply the pulseaudio .deb to get chrome to launch. Chrome launches but per your assertion I'm assuming remote audio will not work without said futex change?

@therealkenc
Copy link
Collaborator

therealkenc commented Aug 3, 2017

@benhillis
It is in your screencap. The postinstall script failed but I I am pretty sure (but can't prove) that unpacking libpulse0:amd64 (1:8.0-0ubuntu3.2ppa1) over (1:10.0-1ubunut2) took and swapped out libpulse. The dependency problems are because your Ubuntu distro has been upgraded since I put up that .deb in May. I quietly (without announcement) updated the PPA on July 7. Instructions to install it 'properly' here. Unless you moved on to Ubuntu 17.04. You see the problem.

It isn't 'remote audio' per se. The problem (IMO Google bug) is that chrome faceplants on #486 whether you want to hear noises from your webpages or not. Contrast Firefox which recovers. You don't need the server side up and running. You just need to get past the futex() call that libpulse (the client) makes.

@benhillis
Copy link
Member

@therealkenc - You're correct my screencap was from a machine that I was testing some things on 17.04 in parallel. However my assertion that google-chrome launches without workarounds was based on trying this on a 16.04 install with none of your workarounds:
chrome2

I am seeing the FUTEX_CMP_REQUEUE errors in the strace but chrome seems to be handling it OK.

@therealkenc
Copy link
Collaborator

therealkenc commented Aug 3, 2017

Okay understood. It is possible (probable, even) that Google fixed it along the way and I never noticed because I've had the libpulse PPA sitting there (not making actual sounds) for all these months. The only reason I did the PPA was to get over the futex() hard fail. I say declare victory and go home.

[edit] Or Stephen fixed futex() and didn't tell anyone (laugh).

@benhillis
Copy link
Member

I'm taking a stab at implementing the futex requeue options, doesn't look like it should be too bad (famous last words).

@therealkenc
Copy link
Collaborator

therealkenc commented Aug 3, 2017

How hard can it be. I have been thinking at least faking it should be possible since Sept 2016 (message).

I just put up some theoretic install instructions for chrome (theoretic, because I have never run them in sequence). But if you pick out the OpenGL and dbus session setup good bits, you should be at console warning parity with Real Linux. "Should be", with the singular exception of #1353 (zombies). But oddly I am not seeing that (soft) fail in your screencap either. Typically it looks like:

[2644:2677:0803/125126.647021:ERROR:process_posix.cc(194)] Not implemented reached in bool (anonymous namespace)::WaitForExitWithTimeoutImpl(base::ProcessHandle, int *, base::TimeDelta)
[2644:2673:0803/125126.766243:ERROR:process_posix.cc(194)] Not implemented reached in bool (anonymous namespace)::WaitForExitWithTimeoutImpl(base::ProcessHandle, int *, base::TimeDelta)

So you are ahead of the game on a few fronts.

@fpqc
Copy link
Author

fpqc commented Aug 24, 2017

@benhillis should be good to close up shop on this thread as well with a fixedininsiders flag

@benhillis
Copy link
Member

@fpqc - will do, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants