interfaces: disable "mknod |N" in the default seccomp template again #3397

Merged
merged 1 commit into from May 31, 2017

Conversation

Projects
None yet
3 participants
Collaborator

mvo5 commented May 24, 2017

This partially reverts a previous commit to allow mknod/mknodat
for regular files, pipes and sockets (and not block or char
devices).

The reason for the revert is described in:
https://forum.snapcraft.io/t/snapd-2-25-blocked-because-of-possible-revert-race-condition

In a nutshell there is a race-condition when doing the following:
$ snap refresh core; snap install daemon; snap revert core; reboot

when the system reboots the seccomp profile will contain symbols
(the |N syntax) that snap-confine from the old core does not
understands. Snapd will rewrite the seccomp profile on startup,
however there is no gurantee currently that this rewrite will
happen before the installed software "daemon" starts which makes
this racy and we see real bugs because of this.

So we revert the new syntax for now to unblock 2.25+ and work
on a plan to properly fix this.

Disable "mknod |N" in the default seccomp template again
This partially reverts a previous commit to allow mknod/mknodat
for regular files, pipes and sockets (and not block or char
devices).

The reason for the revert is described in:
https://forum.snapcraft.io/t/snapd-2-25-blocked-because-of-possible-revert-race-condition

In a nutshell there is a race-condition when doing the following:
$ snap refresh core; snap install daemon; snap revert core; reboot

when the system reboots the seccomp profile will contain symbols
(the |N syntax) that snap-confine from the old core does not
understands. Snapd will rewrite the seccomp profile on startup,
however there is no gurantee currently that this rewrite will
happen before the installed software "daemon" starts which makes
this racy and we see real bugs because of this.

So we revert the new syntax for now to unblock 2.25+ and work
on a plan to properly fix this.

@mvo5 mvo5 requested a review from jdstrand May 24, 2017

Contributor

jdstrand commented May 24, 2017

While I understand the motivation for this PR (indeed, I extensively commented in the forum), I'm not thrilled with it. The bug is not in the seccomp policy, it is the fact that at some point snappy stopped requiring that services start after snapd has started (this of course also affects non-daemon commands, but we tend to not see problems with those cause snapd tends to have enough time to start before the user logs in and starts a command). This bug has been present for all of series 16 AIUI. I worry that we are going to revert this patch and the policy won't be added back in a timely fashion, like the last time before we had re-exec for snap-confine.

To put another way, the CE team only now found a possible issue that was always there. Even if we apply this PR and revert the mknod policy changes, if someone reverts to an earlier core where snap-confine doesn't understand something that policy, then the same potential issue still exists.

If this PR is addressing only theoretical issues, -1. If this is a real issue that user's are seeing in the field +0. I really want to see proper fixes for policy generation, but it is hard to see the path forward since we can't fix old core's that exhibit the broken behavior.

Contributor

niemeyer commented May 25, 2017

Perhaps I misunderstand, but isn't the consequence that software on the device will be completely broken if the core snap rolls back? If so, how's that theoretical? If not, then indeed I misunderstand.

Contributor

jdstrand commented May 25, 2017

@niemeyer - AIUI, the software is only broken until snapd starts after reboot and regenerates the security profiles. Since systemd will try to restart daemons and since login sessions have a strong tendency to start after snapd, this shouldn't normally be something people will see.

Importantly, my comments are based on the fact that the forum topic said 'possible issue' instead of 'regression' or 'observed behavior' since this issue is not new to 2.25. If there are real people affected by this and it's easily reproducible, that's a different thing (though I'd still rather see the root cause addressed rather than the symptom, hence the +0 in this case).

Collaborator

mvo5 commented May 29, 2017

@jdstrand It is unfortunately not theoretical, our CE team has hit a bug in real-world product testing that is based on the race condition described in the forum post. UPDATE: A reproducer is available in #3348.

Contrary to my previous comment in this PR, after discussing on forum, IRC and hangouts, I'm convinced the best immediate term solution is to revert the mknod policy.

+1 on this PR

@mvo5 mvo5 merged commit 5f43bf1 into snapcore:master May 31, 2017

6 of 7 checks passed

zesty-amd64 autopkgtest finished (failure)
Details
artful-amd64 autopkgtest finished (success)
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
xenial-amd64 autopkgtest finished (success)
Details
xenial-i386 autopkgtest finished (success)
Details
xenial-ppc64el autopkgtest finished (success)
Details
yakkety-amd64 autopkgtest finished (success)
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment