Will trigger glibc assertion if started with pid 2 #171

Open
cgwalters opened this issue Jan 30, 2017 · 3 comments
@cgwalters
Collaborator

https://pagure.io/releng/issue/6602#comment-71459
https://sourceware.org/bugzilla/show_bug.cgi?id=17214
http://repo.or.cz/glibc.git/commit/c579f48edba88380635ab98cb612030e3ed8691e

The workarounds in the middle link (the sourceware bug) sound awful...but I didn't look in depth.

@cgwalters added the bug label on Jan 30, 2017
@cgwalters
Collaborator Author

cgwalters commented Feb 3, 2017

Here is the code ChromiumOS uses (or at least did at one time). So...if we have to, I could imagine adapting this, but...it's pretty evil, particularly given that this really long-standing issue is already fixed in glibc git/rawhide.

What makes this annoying is I can't seem to trigger the assertion when testing locally. I'm trying:

for x in $(seq 1000); do bwrap --unshare-all --symlink usr/lib /lib --symlink usr/lib64 /lib64 --symlink usr/bin /bin --ro-bind /usr /usr --proc /proc --dev /dev /bin/sh -c 'for x in $(seq 5); do sleep 0.1& done' & done

This runs 1000 bwrap instances, each of which runs 5 processes in the container, to exercise any possible races both in the bwrap core and between core and container processes, but nope: the assertion never fires.
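
The loop above backgrounds every instance without looking at how it exits. A minimal sketch of a harness that does check, assuming an abort shows up as exit status 134 (128 + SIGABRT, the code nspawn reports later in this thread). The `stress` function name is hypothetical and not part of bwrap:

```shell
#!/bin/sh
# stress N CMD [ARGS...]: run N background copies of CMD and print how many
# of them exited with status 134 (i.e. were killed by SIGABRT).
# "stress" is an illustrative helper, not something bwrap ships.
stress() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    aborted=0
    for pid in $pids; do
        wait "$pid"
        if [ "$?" -eq 134 ]; then
            aborted=$((aborted + 1))
        fi
    done
    echo "$aborted"
}

# Example with the command from this comment (needs bwrap installed):
# stress 1000 bwrap --unshare-all --symlink usr/lib /lib --symlink usr/lib64 /lib64 \
#     --symlink usr/bin /bin --ro-bind /usr /usr --proc /proc --dev /dev \
#     /bin/sh -c 'for x in $(seq 5); do sleep 0.1& done'
```

A count of 0 only says the assertion was not hit in that run; it does not prove the race is absent.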

@cgwalters
Collaborator Author

Oh man. This is so spectacularly evil.

Works ✅ reliably:

# systemd-nspawn -D /srv/container-rootfs --bind-ro /srv --bind-ro /usr --bind-ro /etc bash -c 'bwrap --ro-bind / / --unshare-all true'

Fails 💣 every time:

# systemd-nspawn -D /srv/container-rootfs --bind-ro /srv --bind-ro /usr --bind-ro /etc bash -c 'bwrap --ro-bind / / --unshare-all true && sleep 0.1'
bwrap: ../sysdeps/nptl/fork.c:156: __libc_fork: Assertion `THREAD_GETMEM (self, tid) != ppid' failed.
Container container-rootfs failed with error code 134.

I believe what's going on here is: we'll only hit this assertion if the outer pid happens to match the inner pid. Given that the inner pid is always 2, we'll trip this assertion if the outer pid is 2.

In our scenario, we have an outer container managed by nspawn, which creates a PID namespace, and then bwrap goes on to create a PID namespace of its own.

In the first case, one thing to note is that bash will automatically execve() the command if the script consists of only one command, so bwrap takes over bash's pid. In the second case, the bash process is forced to stay alive to run bwrap as a child, which gives bwrap a different pid.
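
The execve() behavior described above can be checked without bwrap or nspawn. A rough sketch, assuming `bash` and `sh` are on the PATH: the outer bash passes its own pid as `$0` to an inner `sh`, which compares it with its own `$$`. The `probe`, `single`, and `compound` names are just for illustration:

```shell
#!/bin/sh
# The inner sh prints "exec" if its pid equals the pid the outer bash
# passed in as $0 (bash replaced itself), otherwise "fork".
probe='[ "$$" = "$0" ] && echo exec || echo fork'

# One simple command: bash is expected to execve() it directly.
single=$(PROBE="$probe" bash -c 'sh -c "$PROBE" $$')

# Same command followed by "&& true": bash must stay alive and fork a child.
compound=$(PROBE="$probe" bash -c 'sh -c "$PROBE" $$ && true')

echo "single command:   $single"
echo "compound command: $compound"
```

If bash applies the single-command optimization, the first line reports exec and the second reports fork, which is the same difference that moves bwrap to a new pid in the failing case.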

So...I think we can work around this reliably by simply changing the Fedora infra to prepend an additional shell around the whole process. For example, this works again:

systemd-nspawn -D /srv/container-rootfs --bind-ro /srv --bind-ro /usr --bind-ro /etc bash -c 'bash -c "bwrap --ro-bind / / --unshare-all true && sleep 0.1" && sleep 0.1'

@cgwalters
Collaborator Author

Alternatively, use systemd-nspawn --as-pid2 which is probably correct here; the outer bwrap isn't necessarily expecting to be pid 1. And rpm-ostree definitely isn't.

@cgwalters changed the title from "may assert with older glibc versions" to "Will trigger glibc assertion if started with pid 2" on Feb 6, 2017