Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
cmd/snap-confine: put processes into freezer hierarchy #3973
Conversation
codecov-io
commented
Sep 27, 2017
•
Codecov Report
@@ Coverage Diff @@
## master #3973 +/- ##
=========================================
- Coverage 75.97% 75.8% -0.17%
=========================================
Files 423 424 +1
Lines 36505 36580 +75
=========================================
- Hits 27734 27729 -5
- Misses 6833 6905 +72
- Partials 1938 1946 +8
Continue to review full report at Codecov.
|
jhenstridge
reviewed
Sep 27, 2017
Looks good. I've left a few comments inline.
So am I understanding things correctly in that you're using the freezer subsystem because it is unlikely that these processes will otherwise be added to a cgroup in this subsystem, so probably won't interferer?
It also looks like we already recommend (require?) people configure snappy kernels with CONFIG_CGROUP_FREEZER, so this shouldn't break existing systems.
| + | ||
| +#include "../libsnap-confine-private/cleanup-funcs.h" | ||
| +#include "../libsnap-confine-private/string-utils.h" | ||
| +#include "../libsnap-confine-private/utils.h" |
jhenstridge
Sep 27, 2017
Contributor
Can't you just include these directly with e.g. #include "cleanup-funcs.h", etc?
I assume this code lived in cmd/snap-confine in an older version of the branch.
| + install_local test-snapd-sh | ||
| +execute: | | ||
| + test-snapd-sh -c 'sleep 1000' & | ||
| + pid=$! |
jhenstridge
Sep 27, 2017
Contributor
Maybe it would be worth starting two instances of the command in the test? That would give coverage for adding a process to an existing cgroup.
zyga
Sep 28, 2017
Contributor
Done, it took me forever to get right (due to wait and lack of exec sleep) but I figured it out and now it works :)
zyga
added some commits
Jun 24, 2017
jhenstridge
approved these changes
Sep 28, 2017
Looks good from my perspective. But it really needs someone with more cgroups and/or security experience to look over it. @jdstrand maybe?
zyga
requested a review
from
jdstrand
Sep 28, 2017
|
Thanks, I requested @jdstrand to review as well. |
zyga
requested a review
from
mvo5
Sep 28, 2017
mvo5
reviewed
Sep 28, 2017
Looks good, thanks a lot for working on this. AIUI tihs is the first step towards fixing the stale namespace issue we are having. Just one tiny suggestion about avoiding the "sleep" in the test.
| + # Start a "sleep" process in the background | ||
| + test-snapd-sh -c 'exec sleep 1h' & | ||
| + pid1=$! | ||
| + # Give snap-confine a moment to start and perform its task. |
mvo5
Sep 28, 2017
Collaborator
Lets write a stamp file in the sh -c, this way we can avoid the sleep and just busy-loop and wait for the file to appear.
zyga
Sep 28, 2017
Contributor
Aye, I'll change this in the morning, thank you for the suggestion.
Something like this?
touch task-1.stamp
exec sleep 1h
| + # control group. | ||
| + test-snapd-sh -c 'exec sleep 1h' & | ||
| + pid2=$! | ||
| + sleep 3 |
| - sleep 3 | ||
| + # Ensure that snap-confine has finished its task and that the snap process | ||
| + # is active. Note that we don't want to wait forever either. | ||
| + for i in $(seq 30); do |
mvo5
Sep 29, 2017
Collaborator
(nitpick^2) You could make this a sh function and reuse it in both places. But fine, thanks for this update.
jdstrand
requested changes
Oct 2, 2017
The overall concept is sane: adding the pids of snap commands to a cgroup freezer common to the snap causes the pids to be added to the /sys/fs/cgroup/freezer/snap.$snap_name/tasks file and the kernel manages adding any tasks that those commands spawn and removing them from the file as they end. This is confirmed by https://www.kernel.org/doc/Documentation/cgroup-v1/freezer-subsystem.txt and blackbox testing (referencing that link somewhere would be nice).
Comments are all for adding comments with one extra check for write().
| + if (mkdirat(cgroup_fd, buf, 0755) < 0 && errno != EEXIST) { | ||
| + die("cannot create freezer cgroup hierarchy for snap %s", | ||
| + snap_name); | ||
| + } |
jdstrand
Oct 2, 2017
Contributor
Not that it is a problem, but in udev-support.c we use mkdir() instead of mkdirat(). I would prefer we are consistent when working with cgroups (this is not a blocker; you could either adjust the PR or use a followup PR for udev-support.c).
| + die("cannot open tasks file for freezer cgroup hierarchy for snap %s", snap_name); | ||
| + } | ||
| + // Write the process (task) number to the tasks file. | ||
| + int n = sc_must_snprintf(buf, sizeof buf, "%ld", (long)pid); |
jdstrand
Oct 2, 2017
Contributor
Per include/linux/threads.h from kernel source:
/*
* A maximum of 4 million PIDs should be enough for a while.
* [NOTE: PID/TIDs are limited to 2^29 ~= 500+ million, see futex.h.]
*/
As such, the cast to long is fine since long is guaranteed by the C standard to be at least 4 bytes. Can you add a comment referencing linux/threads.h?
zyga
Oct 3, 2017
•
Contributor
Done, though I didn't reference linux/threads.h as I could not find the right file on my system. EDIT: Found it :-)
| + } | ||
| + // Write the process (task) number to the tasks file. | ||
| + int n = sc_must_snprintf(buf, sizeof buf, "%ld", (long)pid); | ||
| + if (write(tasks_fd, buf, n) < 0) { |
jdstrand
Oct 2, 2017
Contributor
write() can return < n which would also be an error. In the context of the sysfs, this is likely a kernel error, but should still check for it.
| +#include "error.h" | ||
| + | ||
| +/** | ||
| + * Join the freezer cgroup of the given snap. |
| + * Join the freezer cgroup of the given snap. | ||
| + * | ||
| + * This function adds the specified task to the freezer cgroup specific to the | ||
| + * given snap. The name of the cgroup is "snap.$snap_name". |
jdstrand
Oct 2, 2017
Contributor
Like the suggested comment in the apparmor profile, it probably makes sense to say why you are creating this somewhere since you won't be using freeze/thaw. Not sure the best place... perhaps here or perhaps in snap-confine.c where you call this function. Feel free to put it wherever it makes the most sense to you.
zyga
Oct 3, 2017
•
Contributor
I placed it here in. I've also placed a smaller note next to the call site.
| + # cgroup: freezer | ||
| + /sys/fs/cgroup/freezer/ r, | ||
| + /sys/fs/cgroup/freezer/snap.*/ w, | ||
| + /sys/fs/cgroup/freezer/snap.*/tasks w, |
jdstrand
Oct 2, 2017
Contributor
Since you aren't going to be using the freezer to actually freeze/thaw tasks, I think you should add a comment here stating that. Eg:
# Allow creating per-snap cgroup freezers and adding snap command (task)
# invocations to the freezer. This allows for reliably enumerating all
# running tasks for the snap.
|
Thank you for the review Jamie. I'll address all the points first thing tomorrow! |
zyga
added some commits
Oct 3, 2017
| + // limited to 2^29 so a long int is enough to represent it. | ||
| + // See include/linux/threads.h in the kernel source tree for details. | ||
| + int n = sc_must_snprintf(buf, sizeof buf, "%ld", (long)pid); | ||
| + if (write(tasks_fd, buf, n) < n) { |
jdstrand
Oct 3, 2017
Contributor
Not a blocker: if you wanted, you could tease out the difference for error messages if check for <0 && errno vs <n.
zyga commentedSep 27, 2017
•
Edited 1 time
-
zyga
Sep 27, 2017
This patch makes snap-confine move each started snap process into a
freezer cgroup hierarchy called "snap.$SNAP_NAME". This allows for
reliable enumeration of all processes belonging to a given snap.
Reliable enumeration will be required by the upcoming base snap
invalidation feature, where preserved mount namespaces that are not
using the current revision of the base snap and have no processes, can
be and are discarded.
We cannot rely on the per-snap flock(2)-lock file since existing
processes do not need to acquire that lock to fork. While a simple
flock-based approach can reliably block new apps/hook processes from
starting it cannot stop processes from freely forking or exiting without
a race condition. A malicious (or just unlucky) process could repeatedly
fork and exit and could isolate itself from changes to the snap mount
namespace.
Subsequent patches will build upon this feature to detect when a mount
namespace is stale and vacant and can be discarded and re-built.
Signed-off-by: Zygmunt Krynicki zygmunt.krynicki@canonical.com