Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interfaces: disable "mknod |N" in the default seccomp template again #3397

Merged
merged 1 commit into from May 31, 2017

Conversation

mvo5
Copy link
Collaborator

@mvo5 mvo5 commented May 24, 2017

This partially reverts a previous commit to allow mknod/mknodat
for regular files, pipes and sockets (and not block or char
devices).

The reason for the revert is described in:
https://forum.snapcraft.io/t/snapd-2-25-blocked-because-of-possible-revert-race-condition

In a nutshell there is a race-condition when doing the following:
$ snap refresh core; snap install daemon; snap revert core; reboot

when the system reboots the seccomp profile will contain symbols
(the |N syntax) that snap-confine from the old core does not
understands. Snapd will rewrite the seccomp profile on startup,
however there is no gurantee currently that this rewrite will
happen before the installed software "daemon" starts which makes
this racy and we see real bugs because of this.

So we revert the new syntax for now to unblock 2.25+ and work
on a plan to properly fix this.

This partially reverts a previous commit to allow mknod/mknodat
for regular files, pipes and sockets (and not block or char
devices).

The reason for the revert is described in:
https://forum.snapcraft.io/t/snapd-2-25-blocked-because-of-possible-revert-race-condition

In a nutshell there is a race-condition when doing the following:
$ snap refresh core; snap install daemon; snap revert core; reboot

when the system reboots the seccomp profile will contain symbols
(the |N syntax) that snap-confine from the old core does not
understands. Snapd will rewrite the seccomp profile on startup,
however there is no gurantee currently that this rewrite will
happen before the installed software "daemon" starts which makes
this racy and we see real bugs because of this.

So we revert the new syntax for now to unblock 2.25+ and work
on a plan to properly fix this.
@mvo5 mvo5 requested a review from jdstrand May 24, 2017 14:50
@jdstrand
Copy link

While I understand the motivation for this PR (indeed, I extensively commented in the forum), I'm not thrilled with it. The bug is not in the seccomp policy, it is the fact that at some point snappy stopped requiring that services start after snapd has started (this of course also affects non-daemon commands, but we tend to not see problems with those cause snapd tends to have enough time to start before the user logs in and starts a command). This bug has been present for all of series 16 AIUI. I worry that we are going to revert this patch and the policy won't be added back in a timely fashion, like the last time before we had re-exec for snap-confine.

To put another way, the CE team only now found a possible issue that was always there. Even if we apply this PR and revert the mknod policy changes, if someone reverts to an earlier core where snap-confine doesn't understand something that policy, then the same potential issue still exists.

If this PR is addressing only theoretical issues, -1. If this is a real issue that user's are seeing in the field +0. I really want to see proper fixes for policy generation, but it is hard to see the path forward since we can't fix old core's that exhibit the broken behavior.

@niemeyer
Copy link
Contributor

Perhaps I misunderstand, but isn't the consequence that software on the device will be completely broken if the core snap rolls back? If so, how's that theoretical? If not, then indeed I misunderstand.

@jdstrand
Copy link

jdstrand commented May 25, 2017

@niemeyer - AIUI, the software is only broken until snapd starts after reboot and regenerates the security profiles. Since systemd will try to restart daemons and since login sessions have a strong tendency to start after snapd, this shouldn't normally be something people will see.

Importantly, my comments are based on the fact that the forum topic said 'possible issue' instead of 'regression' or 'observed behavior' since this issue is not new to 2.25. If there are real people affected by this and it's easily reproducible, that's a different thing (though I'd still rather see the root cause addressed rather than the symptom, hence the +0 in this case).

@mvo5
Copy link
Collaborator Author

mvo5 commented May 29, 2017

@jdstrand It is unfortunately not theoretical, our CE team has hit a bug in real-world product testing that is based on the race condition described in the forum post. UPDATE: A reproducer is available in #3348.

Copy link

@jdstrand jdstrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrary to my previous comment in this PR, after discussing on forum, IRC and hangouts, I'm convinced the best immediate term solution is to revert the mknod policy.

+1 on this PR

@mvo5 mvo5 merged commit 5f43bf1 into snapcore:master May 31, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants