Add support module for namespace sharing #126

Merged
merged 35 commits into from Sep 12, 2016

Conversation

Projects
None yet
3 participants
Collaborator

zyga commented Sep 6, 2016

This patch adds a module that implements mount namespace sharing. It is
not yet used by snap-confine. The module implements a set of library
functions that can perform the necessary magic.

As the topic is a bit unexpected a brief description follows.

Linux namespaces can be created with the unshare(2) call. They are
represented as files in the nsfs filesystem under /proc/$pid/ns/. The
mount namespace is represented as /proc/$pid/ns/mnt. An open file
descriptor to a file like that can be used along with setns(2) to move a
thread to a different namespace.

While namespaces are normally bound to the lifetime of a given process
they can be preserved by bind-mounting the appropriate namespace file to
another location. This feature serves as the basis for the namespace
sharing feature.

The mount namespace is a little bit special as it requires some
additional things to be in place before the bind mount can happen. All
violations of the rules listed below results in mount failing with
EINVAL.

In order to preserve the mount namespace the calling process must be in
a different mount namespace (snap-confine just uses the original namespace in
which it executes). This can be achieved by forking a child process that
can see its parent mount namespace file in /proc/$ppid/ns/mnt and having
the parent process move to a new namespace by calling unshare(). The
destination of the bind mount must be on a filesystem that is mounted
without any peers (in the sense of shared subtrees). This can be checked
by inspecting and parsing the /proc/self/mountinfo file. The new module
includes support function that bind mounts the target directory over
itself and the converts it to a private mount.

Actual mount namespaces are kept in /run/snapd/ns (this is also the
directory that gets converted to a private mount). The actual namespaces
are put in "groups" with names identical to the snap name. In practice
the preserved namespaces are in /run/snapd/ns/$SNAP_NAME.mnt

In order to make everything race-free the library uses locking based on
flock(2). There are two lock files. One global, required to make
/run/snapd/ns a private mount, and one local, required to manipulate a
particular mount namespace. The locks are /run/snapd/ns/.lock and
/run/snaps/ns/$SNAP_NAME.lock respectively.

Everything is coded defensively, terminating the process in case
something bad happens. The child process of snap-confine is using
prctl() to ensure it gets killed when the parent process dies for any
reason.

One notable thing that is not present in this patch is the adjusted
apparmor profile for snap-confine and for its child process (it switches
hats to a new profile for added paranoia and security). It will be
presented along with a, hopefully small, patch that enables namespace
sharing in practice.

Signed-off-by: Zygmunt Krynicki zygmunt.krynicki@canonical.com

Add support module for namespace sharing
This patch adds a module that implements mount namespace sharing. It is
not yet used by snap-confine. The module implements a set of library
functions that can perform the necessary magic.

As the topic is a bit unexpected a brief description follows.

Linux namespaces can be created with the unshare(2) call. They are
represented as files in the nsfs filesystem under /proc/$pid/ns/. The
mount namespace is represented as /proc/$pid/ns/mnt. An open file
descriptor to a file like that can be used along with setns(2) to move a
thread to a different namespace.

While namespaces are normally bound to the lifetime of a given process
they can be preserved by bind-mounting the appropriate namespace file to
another location. This feature serves as the basis for the namespace
sharing feature.

The mount namespace is a little bit special as it requires some
additional things to be in place before the bind mount can happen. All
violations of the rules listed below results in mount failing with
EINVAL.

In order to preserve the mount namespace the calling process must be in
a different namespace (snap-confine just uses the original namespace in
which it executes). This can be achieved by forking a child process that
can see its parent mount namespace file in /proc/$ppid/ns/mnt and having
the parent process moves to a new namespace by calling unshare(). The
destination of the bind mount must be on a filesystem that is mounted
without any peers (in the sense of shared subtrees). This can be checked
by inspecting and parsing the /proc/self/mountinfo file. The new module
includes support function that bind mounts the target directory over
itself and the converts it to a private mount.

Actual mount namespaces are kept in /run/snapd/ns (this is also the
directory that gets converted to a private mount). The actual namespaces
are put in "groups" with names identical to the snap name. In practice
the preserved namespaces are in /run/snapd/ns/$SNAP_NAME.mnt

In order to make everything race-free the library uses locking based on
flock(2). There are two lock files. One global, required to make
/run/snapd/ns a private mount, and one local, required to manipulate a
particular mount namespace. The locks are /run/snapd/ns/.lock and
/run/snaps/ns/$SNAP_NAME.lock respectively.

Everything is coded defensively, terminating the process in case
something bad happens. The child process of snap-confine is using
prctl() to ensure it gets killed when the parent process dies for any
reason.

One notable thing that is *not* present in this patch is the adjusted
apparmor profile for snap-confine and for its child process (it switches
hats to a new profile for added paranoia and security). It will be
presented along with a, hopefully small, patch that enables namespace
sharing in practice.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Contributor

jdstrand commented Sep 6, 2016

I spent some time looking at this today and will comment inline. I can say that I may have further comments once the PR for using these functions is in place. There is a lot of 'you should only do this if that' language in the header that I think makes this module somewhat error prone to use. The header does do a good job of explaining each function though. Perhaps it should at the top also give steps on the ordering of how the functions are intended to be used?

docs/snap-confine.rst
+
+As a part of application startup `snap-confine` will move the process to a new
+mount namespace. Since version 1.0.41 all the applications belonging to a given
+snap will share the mount namespace amongst them.
@jdstrand

jdstrand Sep 6, 2016

Contributor

Perhaps: "As of version 1.0.41 all the applications from the same snap will share the same mount namespace. Applications from different snaps continue to use separate mount namespaces."

@zyga

zyga Sep 8, 2016

Collaborator

Done

docs/snap-confine.rst
@@ -126,6 +133,18 @@ FILES
Description of the seccomp profile.
+`/run/snapd/ns/.lock`:
+
+ Global lock acquired to initialize namespace sharing feature
@jdstrand

jdstrand Sep 6, 2016

Contributor

FYI, I found the .lock file under-documented. I'm not sure where it would be best to document it, but it wasn't part of the google doc spec and I had to put it together myself why it was needed.

@zyga

zyga Sep 8, 2016

Collaborator

I've improved the documentation, have a look of that is clear now.

src/ns-support-test.c
+ g_test_queue_destroy((GDestroyNotify) sc_set_ns_dir, SC_NS_DIR);
+ g_test_queue_free(ns_dir);
+ sc_set_ns_dir(ns_dir);
+ // TODO: queue something that rm -rf's the directory tree
@jdstrand

jdstrand Sep 6, 2016

Contributor

Can you implement this?

@zyga

zyga Sep 6, 2016

Collaborator

Yes, I just found glib lacking in performing this thing in one, nice call.

@zyga

zyga Sep 8, 2016

Collaborator

Done

+ g_assert_cmpint(group->child, ==, 0);
+ g_assert_cmpint(group->should_populate, ==, false);
+ g_assert_null(group->name);
+}
@jdstrand

jdstrand Sep 6, 2016

Contributor

Nice test :)

src/ns-support-test.c
+ sc_unlock_ns_mutex(group);
+ // Re-attempt the locking operation. This time it shoud succeed.
+ err = flock(lock_fd, LOCK_EX | LOCK_NB);
+ g_assert_cmpint(err, ==, 0);
@jdstrand

jdstrand Sep 6, 2016

Contributor

For cleanliness, shouldn't we LOCK_UN after this assert?

@zyga

zyga Sep 6, 2016

Collaborator

Closing the file descriptor unlocks the lock. We could do it but at some point there's no need to do it.

src/ns-support.c
+#include <sys/prctl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/types.h>
@jdstrand

jdstrand Sep 6, 2016

Contributor

<sys/types.h> is specified twice.

@zyga

zyga Sep 8, 2016

Collaborator

Fixed

src/ns-support.c
+/**
+ * Effective value of SC_NS_DIR.
+ *
+ * This is only altered for testing.
@jdstrand

jdstrand Sep 6, 2016

Contributor

Perhaps change this to "We use 'const char *' so we can update sc_ns_dir in the testsuite"

@zyga

zyga Sep 8, 2016

Collaborator

Done

src/ns-support.c
+//
+// That is, it cannot be shared with any other peer as defined by kernel
+// documentation listed here:
+// https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt
@jdstrand

jdstrand Sep 6, 2016

Contributor

I think this could be worded a bit better. Eg:

// Read /proc/self/mountinfo and check if /run/snapd/ns is a private bind mount.
//
// We do this because /run/snapd/ns cannot be shared with any other peers as per:
// https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt

Also, why // here and /** up above?

@zyga

zyga Sep 6, 2016

Collaborator

I moved from vm to vm and I used misconfigure vim. I'll update the docs to match.

@zyga

zyga Sep 8, 2016

Collaborator

Done

src/ns-support.c
+ mkpath(sc_ns_dir);
+ debug("opening namespace group directory %s", sc_ns_dir);
+ dir_fd = open(sc_ns_dir, O_DIRECTORY | O_PATH | O_CLOEXEC | O_NOFOLLOW);
+ if (dir_fd == -1) {
@jdstrand

jdstrand Sep 6, 2016

Contributor

I tend to prefer < 0 checks instead of == -1. I won't block if this stays as is. Note, you use == -1 in quite a few places.

@zyga

zyga Sep 6, 2016

Collaborator

I realised I made this mistake just before pushing this. I was on edge between < 0 and -1 and I think I'll go back to < 0 as they cannot be mistyped to = -1 and make sense in any way.

@zyga

zyga Sep 8, 2016

Collaborator

Fixed all of them to be < 0

src/ns-support.c
+
+void sc_initialize_ns_groups()
+{
+ int dir_fd __attribute__ ((cleanup(sc_cleanup_close))) = -1;
@jdstrand

jdstrand Sep 6, 2016

Contributor

Moving this down above dir_fd = open(... would be slightly more clear.

@zyga

zyga Sep 8, 2016

Collaborator

Done

src/ns-support.c
+ }
+}
+
+static bool sc_is_ns_group_dir_private()
@jdstrand

jdstrand Sep 6, 2016

Contributor

I would probably prefer this to be declared above sc_initialize_ns_groups(), but I won't block on it.

@zyga

zyga Sep 8, 2016

Collaborator

Done

+ debug("unlocking the namespace group directory");
+ if (flock(lock_fd, LOCK_UN) < 0) {
+ die("cannot release lock for namespace control directory");
+ }
@jdstrand

jdstrand Sep 6, 2016

Contributor

Perhaps a comment stating that the caller must call sc_close_ns_group() to close dir_fd and lock_fd would be clearer.

@zyga

zyga Sep 8, 2016

Collaborator

Hmm? Wait, this is not the same lock file. This is the global lock file that only this function ever touches.

src/ns-support.c
+// https://www.kernel.org/doc/Documentation/filesystems/sharedsubtree.txt
+static bool sc_is_ns_group_dir_private();
+
+void sc_initialize_ns_groups()
@jdstrand

jdstrand Sep 6, 2016

Contributor

I'm not sure there is anything to be done about it, but is seems that sc_initialize_ns_groups() is going to be called on every app invocation, which means every invocation will try to create sc_ns_dir, create the lock file and parse the mountinfo table. Since /run is tmpfs we don't have to worry about media wear and performance should be ok, plus this only has to be done when the app is launched, not for every fork/exec the app does, but thought I'd at least point it out explicitly.

@zyga

zyga Sep 6, 2016

Collaborator

I'm aware of this and I was wondering if there's any way that we can avoid this without introducing some kind of raciness. I guess we could do it in a job that runs before we mount, say, the core snap, but at some point this becomes a bit distant and magic and I was worried that the essential property for being able to preserve the mount namespace would get lost.

src/ns-support.c
+ if (strcmp(mount_dir, sc_ns_dir) == 0
+ && strcmp(optional_fields, "") == 0) {
+ // If the /run/snapd/ns directory is mounted and has no optional
+ // fields (i.e. the shared:xxx is not there) then we are all set.
@jdstrand

jdstrand Sep 6, 2016

Contributor

Can have more than just 'shared:xxx'. Perhaps say: // If /run/snapd/ns has no optional fields, we know it is mounted private and there is nothing else to do.

@zyga

zyga Sep 8, 2016

Collaborator

Done

+ const char *optional_fields =
+ mountinfo_entry_optional_fields(entry);
+ if (strcmp(mount_dir, sc_ns_dir) == 0
+ && strcmp(optional_fields, "") == 0) {
@jdstrand

jdstrand Sep 6, 2016

Contributor

This strcmp() is perfectly correct for today's parse_mountinfo_entry(), but if parse_mountinfo_entry() ever changes, we'll have a bug here. Can you add a comment to parse_mountinfo_entry() in mountinfo.c circa line 238 that we 'strcpy(entry->optional_fields, "");' to guarantee consumers have a non-null string?

@zyga

zyga Sep 6, 2016

Collaborator

Yep, thanks for pointing this out

@zyga

zyga Sep 8, 2016

Collaborator

Done

+ int lock_fd;
+ // Descriptor to an eventfd that is used to notify the child that it can
+ // now complete its job and exit.
+ int event_fd;
@jdstrand

jdstrand Sep 6, 2016

Contributor

I was surprised to see event_fd based on the spec, but I don't think it is a problem.

@zyga

zyga Sep 6, 2016

Collaborator

It is easier than a pipe and more reliable (race free) than a signal handler.

+ int event_fd;
+ // Identifier of the child process that is used during the one-time (per
+ // group) initialization and capture process.
+ pid_t child;
@jdstrand

jdstrand Sep 6, 2016

Contributor

I know that you calloc'd this and so child is already zero, but an explicit assignment with a comment that you check for non-zero in other functions to verify the mount namespace capture process' status would make this a bit clearer.

@zyga

zyga Sep 8, 2016

Collaborator

Done

+ int lock_fd __attribute__ ((cleanup(sc_cleanup_close))) = -1;
+ lock_fd = openat(dir_fd,
+ SC_NS_LOCK_FILE,
+ O_CREAT | O_RDWR | O_CLOEXEC | O_NOFOLLOW, 0600);
@tyhicks

tyhicks Sep 6, 2016

Collaborator

On 09/06/2016 04:17 PM, jdstrand wrote:

In src/ns-support.c
#126 (comment):

+void sc_initialize_ns_groups()
+{

  • int dir_fd attribute ((cleanup(sc_cleanup_close))) = -1;
  • debug("creating namespace group directory %s", sc_ns_dir);
  • mkpath(sc_ns_dir);
  • debug("opening namespace group directory %s", sc_ns_dir);
  • dir_fd = open(sc_ns_dir, O_DIRECTORY | O_PATH | O_CLOEXEC | O_NOFOLLOW);
  • if (dir_fd == -1) {
  •    die("cannot open namespace group directory");
    
  • }
  • debug("opening lock file for group directory");
  • int lock_fd attribute ((cleanup(sc_cleanup_close))) = -1;
  • lock_fd = openat(dir_fd,
  •         SC_NS_LOCK_FILE,
    
  •         O_CREAT | O_RDWR | O_CLOEXEC | O_NOFOLLOW, 0600);
    

Please also add O_EXCL to this openat() call (at which point, you don't
need to specify O_NOFOLLOW).

I don't think that a lockfile should be opened with O_EXCL. Requiring
O_EXCL defeats the purpose of the lockfile since the exclusivity
provided by the lockfile is enforced by flock() once you've opened the file.

(Note that I have not reviewed the code in this PR at all but saw this
review comment fly by and thought I'd comment on it)

@jdstrand

jdstrand Sep 6, 2016

Contributor

Heh, of course you are right there. :) I've deleted those two comments.

src/ns-support.c
+ if (group->dir_fd == -1) {
+ die("cannot open directory for namespace group %s", group_name);
+ }
+ char lock_fname[PATH_MAX];
@jdstrand

jdstrand Sep 6, 2016

Contributor

I fear we may be in an 'undefined' area here. The actual full path is sc_ns_dir + '/' + lock_fname so I'm not sure what side-effects there might be if lock_name actually was PATH_MAX - 1. Really this need only be the maximum of SNAP_NAME + .lock + '\0', but I realize we don't want to be that tightly coupled with snapd's definition of how long SNAP_NAME can be. Perhaps define this length as PATH_MAX - length of sc_ns_dir + '/' + '\0'?

+ group->name);
+ close(group->dir_fd);
+ close(group->lock_fd);
+ close(group->event_fd);
@jdstrand

jdstrand Sep 6, 2016

Contributor

Might be nice to check if dir_fd, lock_fd and event_fd are all '>= 0' here. I like to also check the return code of close(), but the dir_fd and the lock_fd aren't being written to and event_fd is only written to trigger an event, so not checking should be safe.

@zyga

zyga Sep 8, 2016

Collaborator

They don't have to be >= though, e.g event_fd is only used if required.

+}
+
+void sc_lock_ns_mutex(struct sc_ns_group *group)
+{
@jdstrand

jdstrand Sep 6, 2016

Contributor

It would be nice if sc_lock_ns_mutex() and sc_unlock_ns_mutex() both checked if lock_fd was '>=0'.

@zyga

zyga Sep 8, 2016

Collaborator

Done, with unit tests as well

src/ns-support.c
+ int mnt_fd __attribute__ ((cleanup(sc_cleanup_close))) = -1;
+ mnt_fd =
+ openat(group->dir_fd, mnt_fname,
+ O_CREAT | O_RDONLY | O_CLOEXEC | O_NOFOLLOW, 0600);
@jdstrand

jdstrand Sep 6, 2016

Contributor

Can you add a comment here on why you are not using O_EXCL? AIUI, you want to either open the nsfs mounted file or open the newly created file. It would be good to document that for auditability.

@zyga

zyga Sep 6, 2016

Collaborator

Yes, you are right both in the deduction of the intent and the suggestion to document this better.

@zyga

zyga Sep 8, 2016

Collaborator

Documented

src/ns-support.c
+ // killed before it managed to complete the process.
+ if (errno != EINVAL) {
+ die("cannot re-associate the mount namespace with namespace group %s", group->name);
+ }
@jdstrand

jdstrand Sep 6, 2016

Contributor

Per the setns() man page, EINVAL might be returned for a bunch of reasons. The comment should be updated and the code reviewed to make sure the assumptions regarding EINVAL are still valid.

@zyga

zyga Sep 8, 2016

Collaborator

You are right. I plan to change this a little to call fstatfs() to determine if the opened file descriptor belongs to nsfs or not. This will let us predictably die() when anything goes wrong.

@jdstrand

jdstrand Sep 8, 2016

Contributor

This sounds like a good idea.

src/ns-support.c
+ if (pid == 0) {
+ // This is the child process which will capture the mount namespace.
+ //
+ // It will do so by bind-mounting it the SC_NS_MNT_FILE after
@jdstrand

jdstrand Sep 6, 2016

Contributor

s/it//

@zyga

zyga Sep 8, 2016

Collaborator

Fixed

+ ("changing apparmor hat of the support process for mount namespace capture");
+ if (aa_change_hat("mount-namespace-capture-helper", 0) < 0) {
+ die("cannot change apparmor hat of the support process for mount namespace capture");
+ }
@jdstrand

jdstrand Sep 6, 2016

Contributor

Why aa_change_hat() instead of aa_change_profile()? This seems like a one way operation for the child so aa_change_profile() may be better. Also, I think you can drop -helper since it implies a helper binary (at least to me anyway).

@zyga

zyga Sep 8, 2016

Collaborator

No particular reason, the profile name looked nice.

What should be the profile name here? snap-confine//mount-namespace-capture or just mount-namespace-capture?

@jdstrand

jdstrand Sep 8, 2016

Contributor

Depends on the policy. I suggest you use a child profile so it is clear this is for snap-confine. Eg:

@LIBEXECDIR@/snap-confine (attach_disconnected) {
    ...
    profile mount-namespace-capture (attach_disconnected) {
        ...
    }
}

With this you would use @LIBEXECDIR@/snap-confine//mount-namespace-capture.

@zyga

zyga Sep 12, 2016

Collaborator

I will follow up with a small branch that does this. Thanks

src/ns-support.c
+ }
+ debug("forking support process for mount namespace capture");
+ pid_t pid = fork();
+ debug("forked support process has pid %d", (int)pid);
@jdstrand

jdstrand Sep 6, 2016

Contributor

A comment on how casting pid to int is ok with glibc would be good here.

@zyga

zyga Sep 8, 2016

Collaborator

Done

src/ns-support.c
+ if (prctl(PR_SET_PDEATHSIG, SIGINT, 0, 0, 0) < 0) {
+ die("cannot set parent process death notification signal to SIGINT");
+ }
+ if (fchdir(group->dir_fd) < -1) {
@jdstrand

jdstrand Sep 6, 2016

Contributor

This should be '< 0'.

@zyga

zyga Sep 8, 2016

Collaborator

Fixed

src/ns-support.c
+ }
+ }
+ // Get back to the original directory
+ if (fchdir(old_dir_fd) < -1) {
@jdstrand

jdstrand Sep 6, 2016

Contributor

This should be '< 0'.

@zyga

zyga Sep 8, 2016

Collaborator

Fixed

src/ns-support.c
+ die("cannot open current directory");
+ }
+ // Move to the mount namespace directory (/run/snapd/ns)
+ if (fchdir(group->dir_fd) < -1) {
@jdstrand

jdstrand Sep 6, 2016

Contributor

This should be '< 0'.

@zyga

zyga Sep 8, 2016

Collaborator

Fixed

src/ns-support.h
+ * used for storing preserved namespaces as bind-mounted files from the nsfs
+ * filesystem (namespace filesystem).
+ *
+ * This function acquires a floc(2)-based lock to ensure that no other instance
@jdstrand

jdstrand Sep 6, 2016

Contributor

Typo: should be flock

@zyga

zyga Sep 8, 2016

Collaborator

Fixed

src/ns-support.h
+ *
+ * This will open and keep file descriptors for /run/snapd/ns/ as well as for
+ * /run/snapd/ns/${group_name}.lock. The lock file is created if necessary but
+ * is not locked in any way.
@jdstrand

jdstrand Sep 6, 2016

Contributor

'is not locked in any way' is confusing since this is a lock file. Perhaps say, 'is not locked until sc_lock_ns_mutex() is called'. Also, it seems a bit odd that the lock file is created here but not locked until another function. Perhaps document why that is?

@zyga

zyga Sep 8, 2016

Collaborator

Fixed

src/ns-support.h
+ *
+ * If the return value is true then at this stage the namespace is already
+ * unshared. The caller should perform any mount operations that are desired
+ * and then proceed to call sc_preserve_ns_group().
@jdstrand

jdstrand Sep 6, 2016

Contributor

sc_preserve_ns_group() is not a thing, I think you mean sc_preserve_populated_ns_group()

Contributor

jdstrand commented Sep 6, 2016

For completeness, please also add spread tests once there are consumers for sc_initialize_ns_groups(), sc_create_or_join_ns_group(), sc_should_populate_ns_group(), sc_preserve_populated_ns_group() and sc_discard_preserved_ns_group().

src/ns-support.h
+ * can be done by calling sc_preserve_populated_ns_group()). In the meantime
+ * the parent process unshares the mount namespace() and sets a flag so that
+ * sc_should_populate_ns_group() returns true.
+ *
@jdstrand

jdstrand Sep 6, 2016

Contributor

There is a space after this asterisk. Can you remove it?

Collaborator

zyga commented Sep 6, 2016

Thank you for the review. It's a bit too late to do this responsibly but I will go over all the feedback and ensure it is addressed. Depending on how you feel I can propose a sister branch that uses this module to implement fully-working mount namespace sharing, along with a few spread tests for the basic property.

+ // task or wait for anything.
+ if (prctl(PR_SET_PDEATHSIG, SIGINT, 0, 0, 0) < 0) {
+ die("cannot set parent process death notification signal to SIGINT");
+ }
@jdstrand

jdstrand Sep 6, 2016

Contributor

This may make sense to move above the aa_change_* call, but I'm curious as to the affect on policy it will have on the parent and child profiles. I'm not terribly worried about the timing of the call-- in the unlikely event the parent dies before this call (which is not attacker controlled), the child will become a zombie and at some point init should reap it so even if the parent is killed before prctl, we should be ok.

@zyga

zyga Sep 8, 2016

Collaborator

Let's postpone changing this until the next pull request where we actually start to use this (and we can see the apparmor profile).

zyga added some commits Sep 7, 2016

Editorial changes to the manual page
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Trim whitespace
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Consistently use < 0 instead of == -1
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Fix typo
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Reword documentation for sc_open_ns_group
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Fix function name in documentation for sc_create_or_join_ns_group
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Fix typo
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Add test for sc_init_ns_groups
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Add tests for sc_is_ns_group_dir_private
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Reformat tests
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Make sc_set_ns_dir() subprocess-aware
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Use g_test_subprocess to and g_set_nonfatal_assertions
This combination lets us run stuff in a subprocess and still clean up
any things that it might be doing (e.g. unmount something).

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Cleanup temporary data in tests
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Fix grammar
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Document why we're opening the .mnt file without O_EXCL
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Check that we have a lock file in namespace locking fns
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Add missing return in test
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Be explicit about setting child pid to 0
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Comment that optional_fields is never NULL
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Tweak comment
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Move function for better readability
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Move variable declaration closer to use
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Tweak comment and unify comment style
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Tweak comment
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Remove duplicate include
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Document why casting pid_t to int is safe
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
src/ns-support.c
+ // print pid's portably so this is the best we can do.
+ pid_t pid = fork();
+ debug("forked support process has pid %d", (int)pid);
+ if (pid == -1) {
@jdstrand

jdstrand Sep 8, 2016

Contributor

pid < 0?

@zyga

zyga Sep 8, 2016

Collaborator

Fixed

@zyga

zyga Sep 8, 2016

Collaborator

Fixed

src/ns-support.h
@@ -112,7 +112,7 @@ void sc_create_or_join_ns_group(struct sc_ns_group *group);
*
* If the return value is true then at this stage the namespace is already
* unshared. The caller should perform any mount operations that are desired
- * and then proceed to call sc_preserve_ns_group().
+ * and then proceed to call sc_preserve_populated_ns_group().
@jdstrand

jdstrand Sep 8, 2016

Contributor

Typo: sc_should_populate_ns_group()

@zyga

zyga Sep 8, 2016

Collaborator

Hmm?

@jdstrand

jdstrand Sep 8, 2016

Contributor

Sorry, please disregard this. You are correct.

@zyga

zyga Sep 8, 2016

Collaborator

Resolved on IRC

src/ns-support-test.c
+static void rm_rf(const char *dir)
+{
+ gint exit_status = 0;
+ gchar *cmd = g_strdup_printf("rm -rf %s", dir);
@jdstrand

jdstrand Sep 8, 2016

Contributor

Oh, scary. I hope there are no bugs in the test code otherwise the developer might have severe data loss....

@jdstrand

jdstrand Sep 8, 2016

Contributor

Can we do some checks on dir to make sure it is inside the test dir?

@jdstrand

jdstrand Sep 8, 2016

Contributor

Alternatively remove the files we created and them rmdir the directory. If we make rmdir() fail the test, then we can catch stray files added via code changes easily. I'd prefer this approach.

@zyga

zyga Sep 8, 2016

Collaborator

I tried to use the function that doesn't do shell expansion but glib is just annoying here. I'll try again.

@zyga

zyga Sep 12, 2016

Collaborator

I've implemented this a little bit better now. I think it good to merge now.

src/mountinfo.c
@@ -235,6 +235,7 @@ static struct mountinfo_entry *parse_mountinfo_entry(const char *line)
if ((entry->mount_opts = parse_next_string_field()) == NULL)
goto fail;
entry->optional_fields = &entry->line_buf[0] + total_used++;
+ // NOTE: This ensure s that optional_fields is never NULL.
@jdstrand

jdstrand Sep 8, 2016

Contributor

There is a typo here. Perhaps be a little more explicit though. Eg: // NOTE: This ensures that optional_fields is never NULL. If this changes, must adjust all callers of parse_mountinfo_entry() accordingly.

@zyga

zyga Sep 8, 2016

Collaborator

Fixed

Contributor

jdstrand commented Sep 8, 2016

Thanks for all the changes! I think this is ok to merge provided:

  • my feedback from today is addressed (ie, pid < 0, rm_rf, comment changes)
  • your proposed fstatfs() is implemented

Don't feel like you need to implement the apparmor policy changes just yet as you mentioned, but let's not cut a new version of snap-confine until the other bits are in place.

zyga added some commits Sep 8, 2016

Fix function name
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Fix typo
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Reword comment
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Fix typo
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Test for '< 0' instead of '== -1'
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Add sanity check for nsfs
Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
Use fstatfs() and NSFS_MAGIC to decide about setns()
This patch fixes a potentially iffy code that used to open the mount
namespace file /run/snapd/ns/$SNAP_NAME.mnt and then blindly call
sents() on the file descriptor, if the open call succeeded.

Instead, we now fstatfs() the file descriptor and check the resulting
f_type field. If it is NSFS_MAGIC then we know that we should use
setns(). If not we fall back to regular initialization.

All errors from setns() now result in snap-confine death. This should be
more reliable against unexpected bugs.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>
src/ns-support.c
- debug
- ("cannot re-associate the mount namespace with namespace group %s, falling back to initialization",
- group->name);
+ debug("initializing new namespace group %s", group->name);
@jdstrand

jdstrand Sep 8, 2016

Contributor

This is soo much nicer, thanks! :)

It's interesting that NSFS_MAGIC is not included in man fstatfs. Note that this was added only in http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/patch/include/uapi/linux/magic.h?id=e149ed2b805fefdccf7ccdfc19eca22fdd4514ac so this will likely need to be added to the backports list if it isn't already. It does seem present in the 14.04 Ubuntu kernel.

@zyga

zyga Sep 12, 2016

Collaborator

I've informed tvoss about this. With the sanity check we'll know if tests are all green. Thanks for making me aware of this :)

+ int err = statfs("/proc/self/ns/mnt", &buf);
+ g_assert_cmpint(err, ==, 0);
+ g_assert_cmpint(buf.f_type, ==, NSFS_MAGIC);
+}
@jdstrand

jdstrand Sep 8, 2016

Contributor

Good thinking! :)

@zyga zyga added the Reviewed label Sep 12, 2016

Add some safeguards to rm_rf_tmp
This patch make the cleanup function a little bit more paranoid by not
expanding shell arguments and by ensuring that what is removed is in
/tmp/.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@canonical.com>

@zyga zyga merged commit 7f07af2 into master Sep 12, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@zyga zyga deleted the ns-sharing branch Sep 12, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment