cmd/snap-confine: re-associate with pid-1 mount namespace if required #2624

Merged
merged 37 commits into from Mar 16, 2017

Conversation

Projects
None yet
7 participants
Contributor

zyga commented Jan 12, 2017

This patch changes the initialization process of mount namespace
handling code. If the snap-confine process is itself in a namespace
other than that of pid-1 (which happens when snaps with confinement
other than classic are trying to invoke other snaps) then snap-confine
will move itself to the mount namespace of pid-1 before attempting any
other actions.

This allows snap-confine to get access to /run/snapd/ns directory that
has private sharing and thus is not shared with any mount peer groups.

Fixes: https://bugs.launchpad.net/snap-confine/+bug/1644439
Signed-off-by: Zygmunt Krynicki zygmunt.krynicki@canonical.com

zyga added some commits Dec 2, 2016

Re-associate with pid-1 mount namespace if required
This patch changes the initialization process of mount namespace
handling code. If the snap-confine process is itself in a namespace
other than that of pid-1 (which happens when snaps with confinement
other than classic are trying to invoke other snaps) then snap-confine
will move itself to the mount namespace of pid-1 before attempting any
other actions.

This allows snap-confine to get access to /run/snapd/ns directory that
has private sharing and thus is not shared with any mount peer groups.

Fixes: https://bugs.launchpad.net/snap-confine/+bug/1644439
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
cmd/snap-confine: tweak code layout
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
cmd/snap-confine: don't use O_PATH fd with setns (doh)
Also don't use O_NOFOLLOW when opening the mount namespace file which is
always a symbolic link.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: overlay new snap-confine and snap-discard-ns into core snap
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: show a tail of kernel log when reassociate test fails
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: add regression test for LP: #1644439
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Contributor

zyga commented Jan 12, 2017

NOTE: I managed to avoid the kernel OOPS but the tests will still fail on recursive confinement. I asked @jdstrand to have a look after sharing some ideas on IRC.

The essence of the remaining problem:

qemu:ubuntu-16.04-64 .../tests/regression/lp-1644439# test-snapd-tools.cmd sh -c "/var/lib/snapd/hostfs/usr/bin/strace -o strace.log /var/lib/snapd/hostfs/usr/bin/nsenter -m/proc/1/ns/mnt"
nsenter: cannot open /proc/1/ns/mnt: Permission denied

Then the log file contains:

open("/proc/1/ns/mnt", O_RDONLY)        = -1 EACCES (Permission denied)

the kernel ring buffer contains:

[ 1680.352515] audit: type=1400 audit(1484252277.064:162): apparmor="ALLOWED" operation="exec" profile="snap.test-snapd-tools.cmd" name="/var/lib/snapd/hostfs/usr/bin/strace" pid=25401 comm="sh" requested_mask="x" denied_mask="x" fsuid=0 ouid=0 target="snap.test-snapd-tools.cmd//null-/var/lib/snapd/hostfs/usr/bin/strace"
[ 1680.352783] audit: type=1400 audit(1484252277.064:163): apparmor="ALLOWED" operation="open" profile="snap.test-snapd-tools.cmd//null-/var/lib/snapd/hostfs/usr/bin/strace" name="/etc/ld.so.cache" pid=25401 comm="strace" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 1680.352902] audit: type=1400 audit(1484252277.064:164): apparmor="ALLOWED" operation="open" profile="snap.test-snapd-tools.cmd//null-/var/lib/snapd/hostfs/usr/bin/strace" name="/lib/x86_64-linux-gnu/libc-2.23.so" pid=25401 comm="strace" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 1680.353237] audit: type=1400 audit(1484252277.064:165): apparmor="ALLOWED" operation="file_mprotect" profile="snap.test-snapd-tools.cmd//null-/var/lib/snapd/hostfs/usr/bin/strace" name="/var/lib/snapd/hostfs/usr/bin/strace" pid=25401 comm="strace" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 1680.353322] audit: type=1400 audit(1484252277.064:166): apparmor="ALLOWED" operation="file_mprotect" profile="snap.test-snapd-tools.cmd//null-/var/lib/snapd/hostfs/usr/bin/strace" name="/lib/x86_64-linux-gnu/ld-2.23.so" pid=25401 comm="strace" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 1680.353672] audit: type=1400 audit(1484252277.064:167): apparmor="ALLOWED" operation="open" profile="snap.test-snapd-tools.cmd//null-/var/lib/snapd/hostfs/usr/bin/strace" name="/home/gopath/src/github.com/snapcore/snapd/tests/regression/lp-1644439/strace.log" pid=25401 comm="strace" requested_mask="wc" denied_mask="wc" fsuid=0 ouid=0
[ 1680.353681] audit: type=1400 audit(1484252277.064:168): apparmor="ALLOWED" operation="truncate" profile="snap.test-snapd-tools.cmd//null-/var/lib/snapd/hostfs/usr/bin/strace" name="/home/gopath/src/github.com/snapcore/snapd/tests/regression/lp-1644439/strace.log" pid=25401 comm="strace" requested_mask="w" denied_mask="w" fsuid=0 ouid=0
[ 1680.354057] audit: type=1400 audit(1484252277.064:169): apparmor="ALLOWED" operation="exec" profile="snap.test-snapd-tools.cmd//null-/var/lib/snapd/hostfs/usr/bin/strace" name="/var/lib/snapd/hostfs/usr/bin/nsenter" pid=25403 comm="strace" requested_mask="x" denied_mask="x" fsuid=0 ouid=0 target="snap.test-snapd-tools.cmd//null-/var/lib/snapd/hostfs/usr/bin/strace//null-/var/lib/snapd/hostfs/usr/bin/nsenter"
[ 1680.354435] audit: type=1400 audit(1484252277.064:170): apparmor="ALLOWED" operation="open" profile="snap.test-snapd-tools.cmd//null-/var/lib/snapd/hostfs/usr/bin/strace//null-/var/lib/snapd/hostfs/usr/bin/nsenter" name="/etc/ld.so.cache" pid=25403 comm="nsenter" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
[ 1680.354657] audit: type=1400 audit(1484252277.064:171): apparmor="ALLOWED" operation="open" profile="snap.test-snapd-tools.cmd//null-/var/lib/snapd/hostfs/usr/bin/strace//null-/var/lib/snapd/hostfs/usr/bin/nsenter" name="/lib/x86_64-linux-gnu/libselinux.so.1" pid=25403 comm="nsenter" requested_mask="r" denied_mask="r" fsuid=0 ouid=0

(I clear the buffer just before running this command)

cmd/snap-confine: fix formatting
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Contributor

jdstrand commented Jan 12, 2017

There is nothing in the logs that indicates a problem from AppArmor. Those ALLOWED messages are because you are running strace from under complain mode and the policy doesn't allow using strace, so the violation against policy is logged (but allowed).

Contributor

zyga commented Jan 12, 2017

@jdstrand correct BUT the permission denied only happens if you do this through apparmor. Try the same idea (unshare -m and then nsenter pid 1 mount ns). In addition the error code suggests apparmor (EACCES rather then EPERM) (or did I get this wrong?)

Contributor

jdstrand commented Jan 12, 2017

Did you turn of kernel rate limiting? sudo sysctl -w kernel.printk_ratelimit=0

tests: add extra debugging
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Contributor

zyga commented Jan 12, 2017

The kernel issue is now reported as https://bugs.launchpad.net/apparmor/+bug/1656121

zyga added some commits Jan 16, 2017

tests: tweak reassociate test to use debug kernel
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: collect more data about the apparmor/kernel bug
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: fix typo
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: collect more debug logs
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: wrap long line
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Contributor

zyga commented Jan 16, 2017

This is blocked on the kernel/apparmor bug and critical for CE for their product development.

Contributor

tyhicks commented Jan 17, 2017

@zyga there's a kernel waiting for you to test in the launchpad bug. Give it a shot when you get some time.

Contributor

zyga commented Jan 17, 2017

I tested the test kernel and the results are inconclusive. I could use some hands-on time with jj to debug this further.

zyga added some commits Jan 18, 2017

tests: switch to v2 test kernel
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: fix log file redirection
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

@pedronis pedronis changed the title from Re-associate with pid-1 mount namespace if required to cmd/snap-confine: re-associate with pid-1 mount namespace if required Jan 20, 2017

zyga added some commits Jan 31, 2017

cmd: add missing space
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: use more recent kernel from jj
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: add FIXME note
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Contributor

zyga commented Feb 2, 2017

JJ has fixed all the issues in apparmor that affected this test. The next kernel release (~2.5 weeks away from now) should make this test green without using the test kernel.

tests: don't install custom kernel for reassociate fix
The kernel issue that was blocking this regression test from passing was
recently fixed and in anticipatin of new kernel (deb and snap) rollout
the test is simplified to the core essence and no verbose debugging.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests/regression/lp-1644439/task.yaml
+details: |
+ snap-confine uses privately-shared /run/snapd/ns to store bind-mounted
+ mount namespaces of each snap. In the case that snap-confine is invoked
+ from the mount namespace it typically constructs that directory is not
@niemeyer

niemeyer Feb 13, 2017

Contributor

Not reading properly: "it typically constructs that directory is not available"

@zyga

zyga Feb 14, 2017

Contributor

Fixed

tests/regression/lp-1644439/task.yaml
+ operate in such an environment snap-confine must first re-associate its own
+ process with another namespace in which the /run/snapd/ns directory is
+ visible. The most obvious candidate is pid one, which definitely doesn't
+ run in a snap-specific namespace, has a predictable PID and is long lived.
@niemeyer

niemeyer Feb 13, 2017

Contributor

This is such a clear explanation of the rationale. Thanks! Can we please have this copied over above the function sc_reassociate_with_pid1_mount_ns in ns-support.c as well?

@zyga

zyga Feb 13, 2017

Contributor

Yes, sure

@zyga

zyga Feb 14, 2017

Contributor

And done now

zyga added some commits Feb 14, 2017

tests: fix wording
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
cmd/snap-confine: document why we reassociate with pid1 ns
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Contributor

zyga commented Feb 23, 2017

Tests failed but this may have been caused by stale kernels used on test images. I'll do a local test in qemu

Contributor

niemeyer commented Mar 10, 2017

We need to get this out of the pipeline..

@jdstrand @tyhicks Are you happy with this change?

@zyga Can we get the tests fixed?

Contributor

zyga commented Mar 10, 2017

I just tested this on the -67 kernel from xenial-proposed and it works :-) Looking forward to merging this.

@zyga The -67 is now in xenial-updates. Can we land this?

Contributor

tyhicks commented Mar 15, 2017

FYI, I'm going to review this PR later today.

Contributor

zyga commented Mar 15, 2017

@arapulido oh, indeed. I'll check if we can get a clean run and if tyler agrees we can land this

@zyga zyga added this to the 2.24 milestone Mar 15, 2017

Contributor

zyga commented Mar 15, 2017

This is now tagged to 2.24 since CE requested this to be fixed with high priority and we have a high chance of landing it as the fixed kernel is now out.

mvo5 and others added some commits Mar 15, 2017

I don't love this change because it opens up a way for a confined application to escape to possibly enter the init namespace. However, I think the code is correct around the mount namespace handling in snap-confine and it should be safe to land.

I had one suggestion to lock down the AppArmor profile a bit and I'd like for you to quickly investigate if it is possible before you merge.

Ack from me.

@@ -275,6 +275,10 @@
/var/lib/snapd/hostfs/var/lib/lxd r,
# support for the mount namespace sharing
+ capability sys_ptrace,
+ # allow snap-confine to read /proc/1/ns/mnt
+ ptrace peer=unconfined,
@tyhicks

tyhicks Mar 15, 2017

Contributor

From my quick testing using readlink /proc/1/ns/mnt and nsenter --mount=/proc/1/ns/mnt, you can get away with specifying a single ptrace permission rather than granting all ptrace permissions:

ptrace trace peer=unconfined,

Also, I don't seem to need capability sys_ptrace, but trust that you added it after seeing a denial for that capability.

@zyga

zyga Mar 15, 2017

Contributor

Thank you for the review. I'll tighten the ptrace rule and see if I need sys_ptrace with the fixed kernel.

@zyga

zyga Mar 15, 2017

Contributor

I ran a quick test without `capability sys_ptrace that showed green on the regression test. I chose to remove this extra permission. I think it was added when we tested various kernel fixes but I don't think it is required in practice now.

cmd/snap-confine/snap-confine.c
+ * lived.
+ */
+ sc_reassociate_with_pid1_mount_ns();
+ const char *snap_name = getenv("SNAP_NAME");
@pedronis

pedronis Mar 15, 2017

Contributor

not quite clear why we need this getenv again

@zyga

zyga Mar 15, 2017

Contributor

We don't need this. I think it was a part of a very very old fix that can now be simplified

@zyga

zyga Mar 15, 2017

Contributor

Redundant now, removed

zyga added some commits Mar 15, 2017

cmd/snap-confine: remove redundant getenv
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
cmd/snap-confine: tighten ptrace rule (thanks to Tyler Hicks)
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
cmd/snap-confine: remove capability sys_ptrace
I recall this was needed earlier but perhaps only due to bugs in the
kernel. Tyler Hicks suggested it is no longer needed and a quick spread
test seems to confirm this.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Contributor

zyga commented Mar 15, 2017

The failure in docker is unrelated to the test and looks network related.

Contributor

zyga commented Mar 16, 2017

So I got cannot perform readlinkat() on the mount namespace file descriptor of the init process: Permission denied. This seems to imply that in certain cases ptrace capability was required.

Contributor

zyga commented Mar 16, 2017

Running the .../tests/main/interfaces-locale-control test I got:

[34345.780067] audit: type=1400 audit(1489647627.003:150): apparmor="DENIED" operation="capable" profile="/usr/lib/snapd/snap-confine" pid=31109 comm="snap-confine" capability=19  capname="sys_ptrace"

EDIT:

This occurs during this command in the test:

Then the snap is able to read the locale configuration
++ su -l -c 'locale-control-consumer.get LANG' test
cannot perform readlinkat() on the mount namespace file descriptor of the init process: Permission denied

zyga added some commits Mar 16, 2017

Revert "cmd/snap-confine: remove capability sys_ptrace"
This reverts commit 3efbedd.

It seems that after all it is needed. Specifically if snap-confine is
started by a non-root user everything consistently fails with:

    cannot perform readlinkat() on the mount namespace file descriptor of
    the init process: Permission denied

And an apparmor denial is logged:

    [34345.780067] audit: type=1400 audit(1489647627.003:150):
    apparmor="DENIED" operation="capable"
    profile="/usr/lib/snapd/snap-confine" pid=31109 comm="snap-confine"
    capability=19  capname="sys_ptrace"

This corresponds to the following call:

if (readlinkat(init_mnt_fd, "", init_buf, sizeof init_buf) < 0) {

init_mnt_fd is coming from and earlier successful call:

    init_mnt_fd = open("/proc/1/ns/mnt",
	   O_RDONLY | O_CLOEXEC | O_NOFOLLOW | O_PATH);

Perhaps the fact that euid == 0 but uid != 0 is relevant here but I
don't know how exactly.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: run 1644439 regression test as user as well
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: display kernel version if 1644439 regression test fails
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: disable regression test for 1644439 on core systems
Apparently our x86_64 core systems still have a rather old kernel (Linux
localhost.localdomain 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47
UTC 2017 x86_64 x86_64 x86_64 GNU/Linux), for contrast at the same time
we have -67 kernel on classic systems.

Until this is resolved the test needs to exclude such systems.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

@zyga zyga removed the Blocked label Mar 16, 2017

Contributor

tyhicks commented Mar 16, 2017

Thanks for trying to remove the capability sys_ptrace, rule. I'm unable to reproduce the denial here by simply using readlink(1) but the denial that you're seeing speaks for itself. This PR still gets my ack with the inclusion of that rule.

Contributor

zyga commented Mar 16, 2017

Failures in adt are caused by the -66 kernel (the fix is in -67)

@niemeyer niemeyer merged commit a0f74cd into snapcore:master Mar 16, 2017

2 of 6 checks passed

xenial-amd64 autopkgtest finished (failure)
Details
xenial-i386 autopkgtest finished (failure)
Details
xenial-ppc64el autopkgtest finished (failure)
Details
yakkety-amd64 autopkgtest finished (failure)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
zesty-amd64 autopkgtest finished (success)
Details

@zyga zyga deleted the zyga:reassociate-fix branch Mar 16, 2017

Contributor

jdstrand commented Mar 23, 2017

@zyga - this PR needs to be reverted because the required kernel patches have been reverted. If 2.24 is released with this commit, we will see the OOPSes from https://bugs.launchpad.net/apparmor/+bug/1656121 in production.

Contributor

zyga commented Mar 24, 2017

@jdstrand a full revert will be messy (lots of patches) but we can disable this easily via: https://github.com/snapcore/snapd/pull/3076/files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment