cmd/snap-confine: put processes into freezer hierarchy #3973

Merged
zyga merged 7 commits into snapcore:master from zyga:feature/use-freezer-cgroup on Oct 3, 2017

Contributor

zyga commented Sep 27, 2017

This patch makes snap-confine move each started snap process into a
freezer cgroup hierarchy called "snap.$SNAP_NAME". This allows for
reliable enumeration of all processes belonging to a given snap.

Reliable enumeration will be required by the upcoming base snap
invalidation feature, where preserved mount namespaces that are not
using the current revision of the base snap and have no processes can
be, and are, discarded.

We cannot rely on the per-snap flock(2) lock file since existing
processes do not need to acquire that lock to fork. While a simple
flock-based approach can reliably block new app and hook processes from
starting, it cannot stop processes from freely forking or exiting without
a race condition. A malicious (or just unlucky) process could repeatedly
fork and exit and thereby isolate itself from changes to the snap mount
namespace.

Subsequent patches will build upon this feature to detect when a mount
namespace is stale and vacant and can be discarded and re-built.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
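
For readers following along, here is a minimal sketch of what "moving a process into the snap.$SNAP_NAME freezer cgroup" amounts to. It is not the code from this PR: the function name `join_freezer_sketch` and the `freezer_dir` parameter are hypothetical, and the sketch returns an error code where snap-confine itself calls die(). It assumes a cgroup v1 freezer hierarchy, where the kernel provides a `tasks` file inside each group directory.

```c
/* Request POSIX.1-2008 declarations (mkdirat, openat, O_CLOEXEC). */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the snap-confine function: adds `pid` to the
 * "snap.<snap_name>" group below `freezer_dir`.
 * Returns 0 on success and -1 on any failure. */
static int join_freezer_sketch(const char *freezer_dir, const char *snap_name,
                               pid_t pid)
{
    assert(freezer_dir != NULL && snap_name != NULL);

    char buf[256];
    int n = snprintf(buf, sizeof buf, "snap.%s", snap_name);
    if (n < 0 || (size_t)n >= sizeof buf) {
        return -1; /* group name does not fit in the buffer */
    }

    int root_fd = open(freezer_dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (root_fd < 0) {
        return -1;
    }
    /* An existing group is fine: another process of the snap created it. */
    if (mkdirat(root_fd, buf, 0755) < 0 && errno != EEXIST) {
        close(root_fd);
        return -1;
    }
    int group_fd = openat(root_fd, buf, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    close(root_fd);
    if (group_fd < 0) {
        return -1;
    }
    /* The tasks file is not created here; the sketch only opens it. */
    int tasks_fd = openat(group_fd, "tasks", O_WRONLY | O_CLOEXEC);
    close(group_fd);
    if (tasks_fd < 0) {
        return -1;
    }
    /* Write the PID as decimal text; a short write counts as a failure. */
    n = snprintf(buf, sizeof buf, "%ld", (long)pid);
    ssize_t written = write(tasks_fd, buf, (size_t)n);
    close(tasks_fd);
    return written == (ssize_t)n ? 0 : -1;
}
```

On a real system `freezer_dir` would presumably be /sys/fs/cgroup/freezer and the caller would need enough privilege to create the group; taking the directory as a parameter only makes the sketch easy to try against a scratch directory.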

codecov-io commented Sep 27, 2017

Codecov Report

Merging #3973 into master will decrease coverage by 0.16%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           master   #3973      +/-   ##
=========================================
- Coverage   75.97%   75.8%   -0.17%     
=========================================
  Files         423     424       +1     
  Lines       36505   36580      +75     
=========================================
- Hits        27734   27729       -5     
- Misses       6833    6905      +72     
- Partials     1938    1946       +8
Impacted Files Coverage Δ
cmd/snap-repair/cmd_run.go 53.7% <0%> (-28.12%) ⬇️
cmd/snap-repair/main.go 44.44% <0%> (-5.56%) ⬇️
cmd/snap-seccomp/main.go 54.4% <0%> (-2.48%) ⬇️
wrappers/desktop.go 72.16% <0%> (-2.31%) ⬇️
snap/info.go 88.94% <0%> (-0.11%) ⬇️
interfaces/builtin/opengl.go 100% <0%> (ø) ⬆️
osutil/group.go 0% <0%> (ø)
dirs/dirs.go 98.11% <0%> (+0.01%) ⬆️
cmd/snap-repair/runner.go 82.67% <0%> (+0.07%) ⬆️
overlord/snapstate/snapstate.go 80.29% <0%> (+0.19%) ⬆️
... and 3 more

Looks good. I've left a few comments inline.

So am I understanding things correctly in that you're using the freezer subsystem because it is unlikely that these processes will otherwise be added to a cgroup in this subsystem, so it probably won't interfere?

It also looks like we already recommend (require?) people configure snappy kernels with CONFIG_CGROUP_FREEZER, so this shouldn't break existing systems.

+
+#include "../libsnap-confine-private/cleanup-funcs.h"
+#include "../libsnap-confine-private/string-utils.h"
+#include "../libsnap-confine-private/utils.h"
@jhenstridge

jhenstridge Sep 27, 2017

Contributor

Can't you just include these directly with e.g. #include "cleanup-funcs.h", etc?

I assume this code lived in cmd/snap-confine in an older version of the branch.

@zyga

zyga Sep 27, 2017

Contributor

Oh, sure. Yes, this is quite an old branch, resurrected!

@zyga

zyga Sep 28, 2017

Contributor

Done

tests/main/cgroup-freezer/task.yaml
+ install_local test-snapd-sh
+execute: |
+ test-snapd-sh -c 'sleep 1000' &
+ pid=$!
@jhenstridge

jhenstridge Sep 27, 2017

Contributor

Maybe it would be worth starting two instances of the command in the test? That would give coverage for adding a process to an existing cgroup.

@zyga

zyga Sep 27, 2017

Contributor

Sure, I'll do that.

@zyga

zyga Sep 28, 2017

Contributor

Done, it took me forever to get right (due to wait and lack of exec sleep) but I figured it out and now it works :)

zyga added some commits Jun 24, 2017

cmd/snap-confine: put processes into freezer hierarchy
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
cmd/libsnap: simplify include statements
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests: expand the freezer cgroup test
This patch ensures that multiple processes can inhabit a freezer cgroup
as well as that a process can join an existing cgroup. The test code is
also more correct as "sleep 1h &" was changed to "exec sleep 1h &" which
ensures that only one background process is active. The terminated
processes are now also correctly waited for.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

Looks good from my perspective. But it really needs someone with more cgroups and/or security experience to look over it. @jdstrand maybe?

@zyga zyga requested a review from jdstrand Sep 28, 2017

Contributor

zyga commented Sep 28, 2017

Thanks, I requested @jdstrand to review as well.

@zyga zyga requested a review from mvo5 Sep 28, 2017

Looks good, thanks a lot for working on this. AIUI this is the first step towards fixing the stale namespace issue we are having. Just one tiny suggestion about avoiding the "sleep" in the test.

tests/main/cgroup-freezer/task.yaml
+ # Start a "sleep" process in the background
+ test-snapd-sh -c 'exec sleep 1h' &
+ pid1=$!
+ # Give snap-confine a moment to start and perform its task.
@mvo5

mvo5 Sep 28, 2017

Collaborator

Let's write a stamp file in the sh -c; this way we can avoid the sleep and just busy-loop and wait for the file to appear.

@zyga

zyga Sep 28, 2017

Contributor

Aye, I'll change this in the morning, thank you for the suggestion.

Something like this?

touch task-1.stamp
exec sleep 1h
@zyga

zyga Sep 29, 2017

Contributor

Done

tests/main/cgroup-freezer/task.yaml
+ # control group.
+ test-snapd-sh -c 'exec sleep 1h' &
+ pid2=$!
+ sleep 3
@mvo5

mvo5 Sep 28, 2017

Collaborator

Same as above.

tests: use capped active waiting and stamp files
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
tests/main/cgroup-freezer/task.yaml
- sleep 3
+ # Ensure that snap-confine has finished its task and that the snap process
+ # is active. Note that we don't want to wait forever either.
+ for i in $(seq 30); do
@mvo5

mvo5 Sep 29, 2017

Collaborator

(nitpick^2) You could make this a sh function and reuse it in both places. But fine, thanks for this update.

@zyga

zyga Sep 29, 2017

Contributor

I think it's fine as is but I'll consider it for the next pass :)

mvo5 approved these changes Sep 29, 2017

The overall concept is sane: adding the pids of snap commands to a cgroup freezer common to the snap causes the pids to be added to the /sys/fs/cgroup/freezer/snap.$snap_name/tasks file and the kernel manages adding any tasks that those commands spawn and removing them from the file as they end. This is confirmed by https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt and blackbox testing (referencing that link somewhere would be nice).
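
To make the enumeration side concrete, here is a minimal sketch of reading the members of a group back from its `tasks` file. The helper name `read_tasks_sketch` is hypothetical and this is not code from the PR; it only assumes the cgroup v1 format of one decimal ID per line (in cgroup v1 the `tasks` file lists thread IDs).

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: reads task IDs from a cgroup v1 "tasks" file, which
 * holds one decimal number per line. At most `max` IDs are stored in `pids`.
 * Returns the number of IDs found in the file (which may exceed `max`),
 * or -1 if the file cannot be opened. */
static long read_tasks_sketch(const char *tasks_path, pid_t *pids, size_t max)
{
    assert(tasks_path != NULL);
    assert(pids != NULL || max == 0);

    FILE *f = fopen(tasks_path, "r");
    if (f == NULL) {
        return -1;
    }
    long count = 0;
    long value;
    while (fscanf(f, "%ld", &value) == 1) {
        if ((size_t)count < max) {
            pids[count] = (pid_t)value;
        }
        count++;
    }
    fclose(f);
    return count;
}
```

A return value of 0 for /sys/fs/cgroup/freezer/snap.$snap_name/tasks would correspond to a snap with no running tasks at the moment of the read. The read is only a snapshot, so a caller deciding whether a namespace is vacant would still need to keep new processes from starting in the meantime.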

My inline comments all ask for additional code comments, plus one extra check for write().

+ if (mkdirat(cgroup_fd, buf, 0755) < 0 && errno != EEXIST) {
+ die("cannot create freezer cgroup hierarchy for snap %s",
+ snap_name);
+ }
@jdstrand

jdstrand Oct 2, 2017

Contributor

Not that it is a problem, but in udev-support.c we use mkdir() instead of mkdirat(). I would prefer we are consistent when working with cgroups (this is not a blocker; you could either adjust the PR or use a followup PR for udev-support.c).

@zyga

zyga Oct 3, 2017

Contributor

I'll follow up with a small PR for udev-support.

+ die("cannot open tasks file for freezer cgroup hierarchy for snap %s", snap_name);
+ }
+ // Write the process (task) number to the tasks file.
+ int n = sc_must_snprintf(buf, sizeof buf, "%ld", (long)pid);
@jdstrand

jdstrand Oct 2, 2017

Contributor

Per include/linux/threads.h from kernel source:

/*
 * A maximum of 4 million PIDs should be enough for a while.
 * [NOTE: PID/TIDs are limited to 2^29 ~= 500+ million, see futex.h.]
 */

As such, the cast to long is fine since long is guaranteed by the C standard to be at least 32 bits wide. Can you add a comment referencing linux/threads.h?

@zyga

zyga Oct 3, 2017

Contributor

Done, though I didn't reference linux/threads.h as I could not find the right file on my system. EDIT: Found it :-)

+ }
+ // Write the process (task) number to the tasks file.
+ int n = sc_must_snprintf(buf, sizeof buf, "%ld", (long)pid);
+ if (write(tasks_fd, buf, n) < 0) {
@jdstrand

jdstrand Oct 2, 2017

Contributor

write() can return < n, which would also be an error. In the context of the sysfs, this is likely a kernel error, but we should still check for it.

@zyga

zyga Oct 3, 2017

Contributor

Done

+#include "error.h"
+
+/**
+ * Join the freezer cgroup of the given snap.
@jdstrand

jdstrand Oct 2, 2017

Contributor

s/of/for

@zyga

zyga Oct 3, 2017

Contributor

Done

+ * Join the freezer cgroup of the given snap.
+ *
+ * This function adds the specified task to the freezer cgroup specific to the
+ * given snap. The name of the cgroup is "snap.$snap_name".
@jdstrand

jdstrand Oct 2, 2017

Contributor

Like the suggested comment in the apparmor profile, it probably makes sense to say why you are creating this somewhere since you won't be using freeze/thaw. Not sure the best place... perhaps here or perhaps in snap-confine.c where you call this function. Feel free to put it wherever it makes the most sense to you.

@zyga

zyga Oct 3, 2017

Contributor

I placed it here. I've also placed a smaller note next to the call site.

+ # cgroup: freezer
+ /sys/fs/cgroup/freezer/ r,
+ /sys/fs/cgroup/freezer/snap.*/ w,
+ /sys/fs/cgroup/freezer/snap.*/tasks w,
@jdstrand

jdstrand Oct 2, 2017

Contributor

Since you aren't going to be using the freezer to actually freeze/thaw tasks, I think you should add a comment here stating that. Eg:

# Allow creating per-snap cgroup freezers and adding snap command (task)
# invocations to the freezer. This allows for reliably enumerating all
# running tasks for the snap.
@zyga

zyga Oct 3, 2017

Contributor

Done

Contributor

zyga commented Oct 2, 2017

Thank you for the review Jamie. I'll address all the points first thing tomorrow!

zyga added some commits Oct 3, 2017

cmd/snap-confine: add extra comments
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
cmd/libsnap: more robust check for failing write
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
cmd/libsnap: link to freeezer cgroup docs
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

Thanks for making the changes! +1

+ // limited to 2^29 so a long int is enough to represent it.
+ // See include/linux/threads.h in the kernel source tree for details.
+ int n = sc_must_snprintf(buf, sizeof buf, "%ld", (long)pid);
+ if (write(tasks_fd, buf, n) < n) {
@jdstrand

jdstrand Oct 3, 2017

Contributor

Not a blocker: if you wanted, you could tease out the difference for error messages if you check for <0 && errno vs <n.
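
One way the suggestion above could look, as a sketch only: the names `write_or_explain_sketch` and `enum write_result_sketch` are hypothetical, and the message is returned in a buffer where snap-confine would call die().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

enum write_result_sketch { WRITE_OK, WRITE_FAILED, WRITE_SHORT };

/* Hypothetical helper: performs one write() and reports a failed write
 * (return value < 0, errno set) separately from a short write (fewer bytes
 * written than requested). `msg` receives a description, or "" on success. */
static enum write_result_sketch write_or_explain_sketch(int fd, const char *buf,
                                                        size_t len, char *msg,
                                                        size_t msg_size)
{
    assert(buf != NULL && msg != NULL && msg_size > 0);

    ssize_t written = write(fd, buf, len);
    if (written < 0) {
        /* errno is meaningful only in this branch. */
        snprintf(msg, msg_size, "cannot write: %s", strerror(errno));
        return WRITE_FAILED;
    }
    if ((size_t)written < len) {
        /* No errno here: the call succeeded but wrote too little. */
        snprintf(msg, msg_size, "short write: %zd of %zu bytes", written, len);
        return WRITE_SHORT;
    }
    msg[0] = '\0';
    return WRITE_OK;
}
```

Keeping the two branches apart means the error message can include strerror(errno) only when errno was actually set by the failing call.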

@zyga

zyga Oct 3, 2017

Contributor

I'll pull this into my cleanup branch. Thanks!

@zyga zyga merged commit 5e9ea55 into snapcore:master Oct 3, 2017

7 checks passed

artful-amd64 autopkgtest finished (success)
artful-i386 autopkgtest finished (success)
continuous-integration/travis-ci/pr The Travis CI build passed
xenial-amd64 autopkgtest finished (success)
xenial-i386 autopkgtest finished (success)
xenial-ppc64el autopkgtest finished (success)
zesty-amd64 autopkgtest finished (success)

@zyga zyga deleted the zyga:feature/use-freezer-cgroup branch Oct 3, 2017
