Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
cmd/snap-update-ns: create missing mount points automatically. #4008
Conversation
zyga
added some commits
Oct 5, 2017
codecov-io
commented
Oct 5, 2017
•
Codecov Report
@@ Coverage Diff @@
## master #4008 +/- ##
==========================================
+ Coverage 75.8% 75.81% +0.01%
==========================================
Files 434 436 +2
Lines 37342 37465 +123
==========================================
+ Hits 28306 28405 +99
- Misses 7062 7080 +18
- Partials 1974 1980 +6
Continue to review full report at Codecov.
|
| + if !os.IsNotExist(err) { | ||
| + return err | ||
| + } | ||
| + // TODO: use the right mode and ownership. |
| @@ -492,6 +492,9 @@ | ||
| mount options=(rw bind) /var/snap/*/** -> /var/snap/*/**, | ||
| mount options=(ro bind) /var/snap/*/** -> /var/snap/*/**, | ||
| + # Allow creating missing mount directories under $SNAP_DATA | ||
| + /var/snap/*/** rw, |
jdstrand
Oct 5, 2017
Contributor
To limit to directories, use /var/snap/*/**/ w, (do you actually need 'r'?).
zyga
Oct 6, 2017
Contributor
I'll come up with a few spread tests and if we don't need this I will remove this.
| + // may execute concurrently with snap application processes. | ||
| + // | ||
| + // To have better defense we may consider freezing the freezer cgroup | ||
| + // belonging to the snap during the execution of snap-update-ns. |
jdstrand
Oct 5, 2017
Contributor
I don't like the sound of this. If the snap can race snap-update-ns, then the snap is able to create arbitrary directories anywhere the snap-update-ns profile allows, which is already pretty wide and by your own admission, will only get wider.
We can do better than this. Recall openat/mkdirat in sc_nonfatal_mkpath() in cmd/libsnap-confine-private/utils.c
zyga
Oct 6, 2017
Contributor
We can do better wrt mkdir but we cannot ever do better wrt mount itself. I can make the better mkdir for sure but the ultimate race towards mount is unfixable without freezing the snap processes or using some new kernel interface for mounting.
jdstrand
Oct 6, 2017
Contributor
Re mount, yes, we would have to freeze the process since mount() follows symlinks.
| + } | ||
| + if err := osChown(c.Entry.Dir, 0, 0); err != nil { | ||
| + return err | ||
| + } |
jdstrand
Oct 5, 2017
Contributor
This chown is only doing the bottom directory-- if parents were also created, they would not be chowned. It is probably fine to chown all to root:root for now, but if we decide to allow arbitrary ownership, that will get tricky to limit it to only inside the snap and how to declare which directory is supposed to have which ownership (I'm not sure if that is planned).
jdstrand
Oct 5, 2017
•
Contributor
In thinking about this, I think that root:root 0755 is fine for all of the directories because the snap can always adjust the parent directories after creation. The mountpoint permissions will be covered up by whatever is mounted on it, and the snap could adjust that after the fact as well if desired (but, that will change the permissions of the original mountpoint-- we need to be really careful of the mount options for things in SNAP_DATA, SNAP_COMMON or anywhere else the snaps can write to).
zyga
Oct 6, 2017
Contributor
I'll update the code to ensure we don't leave non root groups (as snap-confine is not setgid root) now.
zyga
added some commits
Oct 6, 2017
|
I have to say, the timing of this PR is important. I've had two different people walk up to me today at the sprint complaining that they can't use |
|
I will fix this branch and add a useful spread test in a moment. Thank you for re-affirming the utility @jdstrand :-) |
mvo5
added this to the 2.29 milestone
Oct 11, 2017
zyga
added some commits
Oct 6, 2017
|
@jdstrand I think this ready for another look now. I replied to all the comments and the only remaining TODO is to chown all the path fragments, not just the last one we made. I'll adjust that today but I need to pick one of two ways of doing that (immediately after arch mkdir or after all of them). I'll do a pass on some other PRs that I have and I'll get back here. |
| @@ -21,6 +21,7 @@ package main | ||
| import ( | ||
| "fmt" | ||
| + "os" | ||
| "path" |
| @@ -56,10 +57,116 @@ func (c Change) String() string { | ||
| var ( | ||
| sysMount = syscall.Mount | ||
| sysUnmount = syscall.Unmount | ||
| + osLstat = os.Lstat | ||
| + // NOTE: we're not using os.MkdirAll as it is not careful about symlinks. | ||
| + osMkdirAll = SecureMkdirAll |
zyga
Oct 12, 2017
Contributor
I wanted to keep the same semantics as os.MkdirAll (to make the two interchangeable). I guess it's not a great reason, I can tweak this.
| +// | ||
| +// The only handled error is mkdirat(2) that fails with EEXIST. All other | ||
| +// errors are fatal but there is no attempt to undo anything that was created. | ||
| +func SecureMkdirAll(name string, perm os.FileMode) error { |
zyga
Oct 12, 2017
Contributor
Yes but @jdstrand requested that we limit imports from other modules. I can move it there if he agrees.
| + // We use lstat to ensure that we don't follow the symlink in case one | ||
| + // was set up by the snap. At the time Change.Perform runs all the | ||
| + // processes in the snap should be frozen. | ||
| + fi, err := osLstat(path) |
chipaca
Oct 12, 2017
Member
why stat it at all, and not dive straight into the mkdirall? as it stands there's a window between the stat and the mkdir that can probably be (ab)used
zyga
Oct 12, 2017
Contributor
We want to stat it because the error message is different. There's no more any window as the stat is advisory now (mkdirall is coded to defend against that) and all the process are frozen.
| + if err := ensureMountPoint(c.Entry.Name); err != nil { | ||
| + return err | ||
| + } | ||
| + } |
|
@jdstrand I've changed |
zyga
added some commits
Oct 16, 2017
jdstrand
requested changes
Oct 24, 2017
This looks good overall. Thanks for all the hard work on it. Please see comments inline.
| + /var/ r, | ||
| + /var/snap/ r, | ||
| + /var/snap/*/ r, | ||
| + /var/snap/*/** rw, |
jdstrand
Oct 24, 2017
Contributor
The final '**' allows creating files as well as directories. I suggest changing this to:
/ r,
/var/ r,
/var/snap/{,*/} r,
/var/snap/*/**/ rw,
| - flags, err := mount.OptsToFlags(c.Entry.Options) | ||
| - if err != nil { | ||
| + if c.Action == Mount { | ||
| + // TODO: use the right mode and ownership. |
jdstrand
Oct 24, 2017
Contributor
As discussed in my previous comment, I think that this is the right mode and ownership. Removing the TODO comment and adding a comment as to why this is ok would be good (eg, '// Create the directories with sane permissions. The snap can always change them later if it wants')
zyga
Oct 24, 2017
Contributor
Ack, I'll do that. I suspect this also means we don't need to (urgently) pull in the other PR that adds a way for this to be customized.
| + switch c.Action { | ||
| + case Mount: | ||
| + flags, unparsed := mount.OptsToCommonFlags(c.Entry.Options) | ||
| + return sysMount(c.Entry.Name, c.Entry.Dir, c.Entry.Type, uintptr(flags), strings.Join(unparsed, ",")) |
jdstrand
Oct 24, 2017
Contributor
It feels weird that we aren't doing any validation on unparsed. I realize that the fstab files are controlled by snapd, so not strictly a blocker, but this seems to be leaving us open to introducing a future bug if assumptions change. Can you explain the motivation for this? (Eg, the sysMount this is replacing used "", but now you are using strings.Join(unparsed, ",").
zyga
Oct 24, 2017
Contributor
Those are needed for things like overlayfs's lower/upper/workdir. I'm not sure if we should have strong validation here given that snap-update-ns reads what snapd wrote. If you feel strongly about it I can add a validation system that will let us be explicit about what is allowed (on top of the mount mediation in apparmor).
jdstrand
Oct 24, 2017
Contributor
Not necessarily for this PR, but I feel it is worthwhile to have input validation for maintenance' sake. A typo or something else is caught in unit tests as opposed to spread tests.
| + | ||
| +var freezerCgroupDir = "/sys/fs/cgroup/freezer" | ||
| + | ||
| +func freezeSnapProcesses(snapName string) error { |
jdstrand
Oct 24, 2017
Contributor
A comment mentioning that we are freezing all processes in the snap would be good here, which is also why the name of the cgroup is snap.$SNAP and not snap.$SNAP.$COMMAND.
zyga
Oct 24, 2017
Contributor
I'll add a comment.
The name of the group is snap.$SNAP because the attack we are trying to fend off is involving the processes naturally inhabiting the mount namespace and that is no longer per-command but instead per-snap, if you recall.
| + } | ||
| + // If the cgroup is still freezing then wait a moment and try again. | ||
| + if bytes.Equal(data, []byte("FREEZING")) { | ||
| + time.Sleep(100 * time.Millisecond) |
jdstrand
Oct 24, 2017
Contributor
Is 1 second really enough time to wait on a constrained system (eg, rpi under load)?
zyga
Oct 24, 2017
Contributor
I think this is a multi-socket thing, otherwise the write/freeze would be immediate. I honestly never saw this take any amount of time (kernel expect could perhaps infer this from the inner workings of the cgroup).
| + } | ||
| + break | ||
| + } | ||
| + return nil |
jdstrand
Oct 24, 2017
Contributor
We seem to be failing open here if the cgroup is still FREEZING. We should be failing closed, since we are freezing to make sure a snap can fiddle with the mountpoints, no?
zyga
Oct 24, 2017
Contributor
Aha, good point! I'll correct that, it was meant to fail here as we check for the failure in the call site.
zyga
Oct 24, 2017
Contributor
Corrected. I also made this thaw the processes so that we don't break the system when this happens.
| + _, err = os.Stat(f) | ||
| + c.Assert(os.IsNotExist(err), Equals, true) | ||
| + | ||
| + // When the cgroup exists we write FROZEN the freezer.state file. |
| + // any errors by mistake. | ||
| + var err error | ||
| + // Start at the current working directory by default. | ||
| + var fd int = AT_FDCWD |
jdstrand
Oct 24, 2017
Contributor
What is the use case for the relative directory? Doing it this way means we are going to need to be really careful about the current directory which I'd prefer we not have to worry about unless we have to.
As it is, snap-confine does a chdir('/') sometime before sc_setup_mount_profiles() (what calls snap-update-ns) in setup_private_mount(), so the relative handling does nothing when called from snap-confine. I don't see any callers of os.Chdir() or osutil.ChDir() that are relevant to snap-update-ns, so the current working directory is whatever snapd seems to be running under. Considering all of this, I'd prefer we explicitly construct the absolute paths.
zyga
Oct 24, 2017
Contributor
We don't really need it. I was just trying to write some library-worthy code. I can make it always absolute if you prefer.
| + | ||
| + // If path is absolute then open the root directory and start there. | ||
| + if path.IsAbs(name) { | ||
| + fd, err = sysOpen("/", openFlags, 0) |
jdstrand
Oct 24, 2017
Contributor
It is a bit weird that you need to specify '0' here since you aren't using O_CREAT, but since it's a syscall, I guess it makes sense. Not a blocker.
zyga
Oct 24, 2017
Contributor
Yeah, that's not something we can avoid in the go binding to the syscall.
| + // the parent directory as reference. Each time we open the newly created | ||
| + // segment using the O_NOFOLLOW and O_DIRECTORY flag so that symlink | ||
| + // attacks are impossible to carry out. | ||
| + segments := strings.Split(path.Clean(name), "/") |
| + // | ||
| + // We use lstat to ensure that we don't follow the symlink in case one | ||
| + // was set up by the snap. At the time Change.Perform runs all the | ||
| + // processes in the snap should be frozen. |
jdstrand
Oct 24, 2017
Contributor
I suggest a stronger statement // Note that at the time this is run, all the snap's processes are frozen.
zyga
added some commits
Oct 24, 2017
|
@jdstrand all of the comments have been addressed! Thank you for the review :) |
jdstrand
approved these changes
Oct 24, 2017
Thanks for the updates! It would be nice if you could add the additional comment in line.
| + | ||
| + const openFlags = syscall.O_NOFOLLOW | syscall.O_CLOEXEC | syscall.O_DIRECTORY | ||
| + | ||
| + if !path.IsAbs(name) { |
jdstrand
Oct 24, 2017
Contributor
If you are going to make another commit, a comment here would be nice saying that we actively chose not to support relative directories. Eg:
// Only support absolute paths to avoid bugs in snap-confine when
// called from anywhere
| + return c.lowLevelPerform() | ||
| +} | ||
| + | ||
| +// lowLevelPerform is a version of Perform that doesn't contain any special behavior. |
mvo5
Oct 25, 2017
Collaborator
This comment feels a bit cryptic: for someone not familiar with the details that dosn't contain any special behaviour is hard to parse. Maybe: doPerform() or performOnly() or just fold it into perform and extract the ensureMountPoint thing into ensureMountPointForAction or somesuch? Looking at the code it seems like Perform() and lowLevelPerform could just be a single method maybe (neither is very long or complex)?
zyga
Oct 25, 2017
Contributor
They are separated because code further ahead needs to call lowLevelPerform. I'll adjust comments on this but I need them separate in one way or another.
| @@ -101,6 +101,14 @@ func run() error { | ||
| } | ||
| } | ||
| + // Freeze the mount namespace and unfreeze it later. This lets us perform | ||
| + // modifications without snap processes attempting to construct symlinks or | ||
| + // perform other malicious activity. |
mvo5
Oct 25, 2017
Collaborator
How can a snap process do "malicious activity"? Isn't the cgroup protected via apparmor from the snap anyway? Or does malicious mean something else in this context?
zyga
Oct 25, 2017
Contributor
It means attempting to mount a symlink attack against mount. I'll adjust the comment to be explicit about this.
| + fd, err = sysOpen("/", openFlags, 0) | ||
| + if err != nil { | ||
| + return fmt.Errorf("cannot open root directory, %v", err) | ||
| + } |
| + } | ||
| + previousFd := fd | ||
| + | ||
| + fd, err = sysOpenat(fd, segment, openFlags, 0) |
| + if err := sysClose(previousFd); err != nil { | ||
| + return fmt.Errorf("cannot close previous file descriptor, %v", err) | ||
| + } | ||
| + if err != nil { |
mvo5
Oct 25, 2017
Collaborator
Why do we have the sysClose() between this error check? The error is for the err in 101 so it feels harder to read this way than if the err check is followed immediately after the sysOpenat(). Maybe defer sysClose() would help to make this easier to read?
zyga
Oct 25, 2017
Contributor
The code is designed to ensure that we only keep two file descriptors around. The one for the parent directory and the one for the entry inside it. Everything is paired correctly.
As for defer, we could do that but it would mean we need to keep more file descriptors open for no good reason.
| + return nil | ||
| +} | ||
| + | ||
| +func EnsureMountPointImpl(path string, mode os.FileMode, uid int, gid int) error { |
mvo5
Oct 25, 2017
Collaborator
I think this can be renamed to "EnsureMountPoint()" (or possible even ensureMountPoint) and the var ensureMountPoint can be removed. AFAICT this is not reassigned anywhere in the code.
| @@ -0,0 +1,13 @@ | ||
| +name: test-snapd-content-advanced-plug | ||
| +summary: A more complex snap for testing content interface |
zyga
added some commits
Oct 25, 2017
| + var err error | ||
| + var fd int | ||
| + | ||
| + const openFlags = syscall.O_NOFOLLOW | syscall.O_CLOEXEC | syscall.O_DIRECTORY |
chipaca
Oct 25, 2017
Member
I'd suggest also O_PATH. But you need to use golang.org/x/sys/unix instead of syscall (which we should do as much as possible anyway)
| + // the parent directory as reference. Each time we open the newly created | ||
| + // segment using the O_NOFOLLOW and O_DIRECTORY flag so that symlink | ||
| + // attacks are impossible to carry out. | ||
| + segments := strings.Split(filepath.Clean(name), "/") |
chipaca
Oct 25, 2017
Member
you could also segments := strings.FieldsFunc(filepath.Clean(name), isSlash) (with func isSlash(c rune) bool { return c == '/' }) and avoid the if in the loop
| + if err != nil { | ||
| + return fmt.Errorf("cannot open path segment %q: %v", segment, err) | ||
| + } | ||
| + defer sysClose(fd) |
chipaca
Oct 25, 2017
Member
I'm a little bit unhappy about this approach. It's probably not worth worrying about, but I worry: this uses a lot more filehandles than necessary. On the other hand, doing the cleanup by hand, even with a helper, is not obviously right to the reader. On the other other hand, we're ignoring all the close errors. Sigh.
zyga
Oct 25, 2017
Contributor
Look at the history, one of the most recent patches introduces defer, if we pop that we're back to hand-done cleanup logic that only uses a pair of FDs
| + } | ||
| + | ||
| + } | ||
| + return nil |
zyga commentedOct 5, 2017
This branch allow snap-update-ns to create missing mount points (directories)
automatically.
The code is not perfect yet, we can only create directories as root and we can
only create them with mode 0755. There are ongoing branches that lift those
restrictions but they are not here yet.
The largest restriction is that we cannot yet create directories on top of read
only filesystems. This will be done with a separate feature branch as it is the
most complex.
The apparmor profile for snap-update-ns when invoked from snap-confine was
extended to allow writes to $SNAP_DATA but nowhere else yet. Eventually, to
support the full layout system we will need write permissions in various
places (essentially everywhere except for a specific blacklist).
Signed-off-by: Zygmunt Krynicki zygmunt.krynicki@canonical.com