interfaces/builtin: add initial docker interface #1619

Closed
wants to merge 17 commits into
from

Conversation

Projects
None yet
7 participants
Contributor

tianon commented Aug 2, 2016

This is a first pass on adding a docker interface to snapd. I've been discussing this off-and-on with @jdstrand on IRC, so hopefully it's at least an OK start, but if not, I'm happy to amend, rebase, refactor, etc as necessary or desired. 👍

One main question I've still got is around slot/plug overlap. With the way this is currently implemented, by docker snap includes both the docker and the network plugs on the Docker client command, and the docker slot plus the network-bind and firewall-control plugs on the Docker daemon. My question is whether that's the correct way to be thinking about this, or whether those overlapping privileges should be rolled into the docker interface directly?

To be more precise, here's the exact snapcraft.yml snippet:

apps:
  docker:
    command: docker-wrapper docker
    plugs:
      - docker
      - network

  dockerd:
    command: dockerd-wrapper
    daemon: simple
    plugs:
      - network-bind
      - firewall-control
    slots:
      - docker

And the interfaces I've got to have connected for it to function appropriately: (as one would expect)

$ snap interfaces | grep docker
docker:docker        docker
:firewall-control    docker
:network             docker
:network-bind        docker
Contributor

snappy-m-o commented Aug 2, 2016

Can one of the admins verify this patch?

Contributor

snappy-m-o commented Aug 2, 2016

Can one of the admins verify this patch?

Contributor

jdstrand commented Aug 2, 2016

Does dockerd provide a port and listen over TCP or UDP? If so, then network-bind is appropriate. If not, please add what you need to the docker slot side policy. I'll go through the rest of the interface now as a sort of pre-review. Thanks for working on this! :)

interfaces/builtin/docker.go
+# Usage: reserved
+
+# Allow sockets
+/{,var/}run/docker.sock rw,
@jdstrand

jdstrand Aug 2, 2016

Contributor

This is covered by the rule below:
/{,var/}run/docker/** mrwklix,

@tianon

tianon Aug 2, 2016

Contributor

Is it? It's /run/docker.sock vs /run/docker/**, so it's not really, right? (the socket is directly in /run, not in the /run/docker directory)

@tianon

tianon Aug 2, 2016

Contributor

(not as familiar with the AppArmor language/globbing as I need to be, but this was my understanding)

@jdstrand

jdstrand Aug 2, 2016

Contributor

Sorry, you are correct, I misread the rule. Please disregard that comment.

+/{,var/}run/docker/ rw,
+/{,var/}run/docker/** mrwklix,
+/{,var/}run/runc/ rw,
+/{,var/}run/runc/** mrwklix,
@jdstrand

jdstrand Aug 2, 2016

Contributor

Should runc be in /run or can it be in SNAP_DATA? I ask because in general if sockets/etc are exposed in /run then I think they should be well-known and I'm not familiar enough with runc to know if it needs to be there for just docker or for other things. Put another way, /run/docker makes sense since this matches the snap name, but if someone implemented a runc snap or interface, the two would collide. I'm not sure this is a blocker, but it is at least a consideration.

@tianon

tianon Aug 2, 2016

Contributor

It probably can be in SNAP_DATA, but would require patching Docker/runc to accomplish. The way runc is designed, it's supposed to be safe for separate instances to share this directory (especially since runc doesn't have a "daemon", so every container invoked with runc is technically a new instance sharing this directory).

@jdstrand

jdstrand Aug 16, 2016

Contributor

Regarding runc, ok, feel free to leave it as is.

interfaces/builtin/docker.go
+
+# Wide read access to /sys
+/sys/** r,
+# Limit cgroup writes a bit
@jdstrand

jdstrand Aug 2, 2016

Contributor

Thank you for this :)

@tianon

tianon Aug 2, 2016

Contributor

Hah, I can only take credit for small parts of this (and for limiting a bit what I copied over) -- most of the credit should go to whoever developed the 15.04 Docker snap before me. 😇

interfaces/builtin/docker.go
+/sys/fs/cgroup/*/docker/ rw,
+/sys/fs/cgroup/*/docker/** rw,
+/sys/fs/cgroup/*/system.slice/ rw,
+/sys/fs/cgroup/*/system.slice/** rw,
@jdstrand

jdstrand Aug 2, 2016

Contributor

Why does docker need access to the systemd slice cgroups?

@tianon

tianon Aug 2, 2016

Contributor

Good question -- I'll try taking this out and see what denials come of it so we can try narrowing it down.

@tianon

tianon Aug 2, 2016

Contributor

Sweet, looks like this is no longer needed with newer Docker. 👍 (will be removed in next update)

interfaces/builtin/docker.go
+/sys/fs/cgroup/*/system.slice/** rw,
+
+# Allow tracing other processes (especially ourselves)
+ptrace (read, trace),
@jdstrand

jdstrand Aug 2, 2016

Contributor

Is this needed for privileged containers? The old 15.04 policy had:

ptrace (trace) peer=@{profile_name},
ptrace (read, trace) peer=docker-default,
@tianon

tianon Aug 2, 2016

Contributor

I actually didn't test or target privileged containers at all with this policy -- wasn't sure whether that'd be a separate interface entirely or what. I'll play with what you've provided here and see if I can figure out why I ended up making this more permissive. 😞

@tianon

tianon Aug 2, 2016

Contributor

Aha:

Log: apparmor="DENIED" operation="ptrace" profile="snap.docker.dockerd" pid=18065 comm="docker-runc" requested_mask="trace" denied_mask="trace" peer="snap.docker.dockerd"

Adding just ptrace (trace) peer=@{profile_name}, is sufficient, so that'll be in the next update. 👍

interfaces/builtin/docker.go
+# Allow receiving from unconfined
+dbus (receive)
+ bus=system
+ peer=(label=unconfined),
@jdstrand

jdstrand Aug 2, 2016

Contributor

Can you adjust this to have interface and/or path?

@tianon

tianon Aug 2, 2016

Contributor

Yeah, I'll see what I can do to narrow down what Docker actually needs to discuss with systemd here. 👍

@tianon

tianon Aug 2, 2016

Contributor

I have no idea what part of Docker needed this, but after removing both dbus statements, Docker appears to be working fine, so I'm going to just axe them in my next update. 😅

interfaces/builtin/docker.go
+/.pivot_root*/ rw,
+
+# for console access
+/dev/ptmx rw,
@jdstrand

jdstrand Aug 2, 2016

Contributor

This is already in the default template.

@tianon

tianon Aug 2, 2016

Contributor

Woot! 🤘 🔪

interfaces/builtin/docker.go
+# for console access
+/dev/ptmx rw,
+# file descriptors (/proc/NNN/fd/X)
+/[0-9]* rw,
@jdstrand

jdstrand Aug 2, 2016

Contributor

This is curious, can you paste the denials that caused you to add this rule?

@tianon

tianon Aug 2, 2016

Contributor

Yeah, I can reproduce and paste them, although maybe this will help (from inside a container):

root@977b9f2d78fd:/# ls -l /proc/1/fd
total 0
lrwx------ 1 root root 64 Aug  2 20:23 0 -> /30
lrwx------ 1 root root 64 Aug  2 20:23 1 -> /30
lrwx------ 1 root root 64 Aug  2 20:23 2 -> /30
lrwx------ 1 root root 64 Aug  2 20:23 255 -> /30

(inside Docker containers, file descriptors end up showing as numbered files in / for some reason 😄)

@tianon

tianon Aug 2, 2016

Contributor
Log: apparmor="DENIED" operation="chown" profile="snap.docker.dockerd" name="/0" pid=19284 comm="exe" requested_mask="w" denied_mask="w" fsuid=0 ouid=0
File: /0 (write)
@jdstrand

jdstrand Aug 2, 2016

Contributor

Ok, that makes sense then. They are showing up as /N due to attach_disconnected in the policy. Since this rule is odd looking, can you add a comment: '// file descriptors in the container show up here due to attach_disconnected'

@tianon

tianon Aug 2, 2016

Contributor

Ok, comment will be in the next update 👍

interfaces/builtin/docker.go
+
+# Docker needs to be able to create and load the profile it applies to containers ("docker-default")
+# XXX We might be able to get rid of this if we generate and load docker-default ourselves and make docker not do it.
+/etc/apparmor.d/docker rw,
@jdstrand

jdstrand Aug 2, 2016

Contributor

This file will not be writable on an all-snaps system.

@tianon

tianon Aug 2, 2016

Contributor

Crap, so we need similar patches to what we do for 15.04 -- will do.

interfaces/builtin/docker.go
+# XXX We might be able to get rid of this if we generate and load docker-default ourselves and make docker not do it.
+/etc/apparmor.d/docker rw,
+/sbin/apparmor_parser ixr,
+/etc/apparmor*/** r,
@jdstrand

jdstrand Aug 2, 2016

Contributor

This is lenient-- why does docker need to read all the other profiles?

@tianon

tianon Aug 2, 2016

Contributor

Reading other config files from AppArmor happens when invoking apparmor_parser while Docker loads the docker profile it's generated -- if it'd be better to narrow this down to a more specific set, I'm happy to do so, but I think #include might end up making that a bit fragile in some cases, right? (especially if those files getting included ever change)

@jdstrand

jdstrand Aug 2, 2016

Contributor

If it is using abstractions, then perhaps change that to "/etc/apparmor.d/abstractions/* r,". You might need other rules-- let's see how fine-grained you can go. That said, if you change where the docker-default policy is (ie, out of /etc/apparmor.d to SNAP_DATA) then I suspect you may be able to get rid of these rules (or just have the abstractions rule).

interfaces/builtin/docker.go
+/[0-9]* rw,
+
+# Docker needs to be able to create and load the profile it applies to containers ("docker-default")
+# XXX We might be able to get rid of this if we generate and load docker-default ourselves and make docker not do it.
@jdstrand

jdstrand Aug 2, 2016

Contributor

Ideally, these profiles would be in $SNAP_DATA/etc/apparmor.d

@tianon

tianon Aug 2, 2016

Contributor

Ok, will do!

interfaces/builtin/docker.go
+/sbin/apparmor_parser ixr,
+/etc/apparmor*/** r,
+/etc/apparmor.d/cache/docker* w,
+/sys/kernel/security/apparmor/** rw,
@jdstrand

jdstrand Aug 2, 2016

Contributor

Can you fine-tune this and break out the writes and the reads?

@tianon

tianon Aug 2, 2016

Contributor

Yeah, I'll give it a go!

interfaces/builtin/docker.go
+/etc/apparmor.d/cache/docker* w,
+/sys/kernel/security/apparmor/** rw,
+
+# We'll want to adjust this to support --security-opts...
@jdstrand

jdstrand Aug 2, 2016

Contributor

Supporting --security-opts is very interesting. I imagine the best implementation would ensure that the profile name starts with 'docker.', similar to how we do with snapd profiles starting with 'snap.'. In this manner, docker could own that 'namespace' and not interfere with the policy of other snaps.

UPDATE: well, I say that, but currently we don't have mediation for "only load a profile that starts with 'docker.'" so this idea is more advisory (ie, when using --security-opts, someone could create $SNAP_DATA/etc/apparmor.d/docker.foo and docker.foo might have 'profile snap.bar.baz ...'. Docker would happily load that).

@tianon

tianon Aug 2, 2016

Contributor

Yeah, it's an interesting thought, but maybe best left relegated to the "privileged Docker" discussion given the implications?

@jdstrand

jdstrand Aug 2, 2016

Contributor

AIUI, this isn't necessarily related to privileged docker. Please correct me if I'm wrong, but I thought --security-opts was used by the admin to specify to use the profile name to use instead of 'docker-default' and it is up to the admin to make sure that profile is loaded. As such, if docker-default is in SNAP_DATA/etc/apparmor.d then one implementation would be for the admin to add profiles there and for the docker snap to make sure they were loaded on daemon startup (this could be simple shell code that runs before invoking docker-daemon).

I agree that --security-opts doesn't have to be implemented now, but hopefully the discussion will spark some ideas.

@tianon

tianon Aug 2, 2016

Contributor

--security-opt currently supports loading custom AppArmor profiles, custom SecComp profiles, SELinux labels, and for enabling "No New Privileges".

Loading custom AppArmor/SecComp profiles is IMO a "privileged" operation, given the potential danger implied, so that's why I lumped the two together.

+# We'll want to adjust this to support --security-opts...
+change_profile -> docker-default,
+signal (send) peer=docker-default,
+ptrace (read, trace) peer=docker-default,
@jdstrand

jdstrand Aug 2, 2016

Contributor

Your ptrace rule above is a super set of this rule. FWIW, I prefer this rule.

@tianon

tianon Aug 2, 2016

Contributor

Yeah, I've missed something for sure -- I'm going to test the rule above again and figure out what I've done wrong so it can be more specific.

@tianon

tianon Aug 2, 2016

Contributor

Latest push resolved this.

interfaces/builtin/docker.go
+/{,var/}run/docker.sock rw,
+
+@{PROC}/sys/net/core/somaxconn r,
+`)
@jdstrand

jdstrand Aug 2, 2016

Contributor

I'd like to point out just how privileged this plug is. A snap with this plug would be able to drive docker and essentially own the system. I realize that you are simply reimplementing the 'client' portion of the 15.04 framework policy, but I've always disliked this policy.

Do we really need to expose the docker socket to snaps and if so, for what purpose? If the purpose is limited, perhaps either docker-daemon could limit its API when the connecting process' label starts with 'snap.' (or similar) or perhaps a small proxy service could listen on /run/docker/snap-proxy and that service would only expose a limited docker API to snaps and snaps would only be able to connect to it. This limited API would need to be designed of course and if evaluating the process' security label, please consult with the security team on design.

@tianon

tianon Aug 2, 2016

Contributor

Yeah, I honestly wasn't sure at all what else to do here. Docker's API and limiting abilities are pretty limited at this point, and trying to limit what Docker is capable of isn't something upstream's been terribly interested in previously. As-is, it will also require that the user is either a member of a pre-existing docker group, or is root (so sudo will be necessary), but that's probably not really sufficient either.

Implementing a wrapper on the Docker API is an interesting solution, but it's going to be very difficult to maintain as Docker updates, since their API is still a strong moving target (especially 1.12, which now includes the functionality of Swarm and features similar to what Compose provides).

@jdstrand

jdstrand Aug 2, 2016

Contributor

Note that snaps won't be in the docker group and don't have sudo access but snaps implementing daemons do run as root, so the DAC check is insufficient on its own.

To be clear, protecting the user from herself is not the goal. What I'm concerned about is untrusted snaps having free rein over the system if they 'plugs: [ docker ]'. One could argue that with series 16's manual interface connections everything is 'ok' since the admin has to do it herself, or future assertions work could allow an auto-connection, and this is a fair point and may ultimately be what we decide we want to do; it's just that the socket offers so much privilege). I'm just trying to understand the requirements....

Do we expect snaps to actually drive docker or is docker simply an application for an admin? In the 15.04 implementation people were thinking that the store could house docker containers or scripts to drive docker to install app containers but docker already has a store and people are used to it.

@tianon

tianon Aug 2, 2016

Contributor

Right, I mean that the daemon will run as root, it will see that the "docker" group exists, and will apply it to the unix socket created. Then, from the client side, only clients that have both the "docker" group and the appropriate AppArmor permission will be able to talk to the Docker socket (and thus control the daemon). If the "docker" group does not exist, the daemon will instead use "root", and so from the perspective of the users (or even other snaps) only root will be able to even connect to the Docker daemon.

IMO, anything that isn't docker itself (and perhaps a manually-reviewed set of related utilities) should require manual connection of that docker plug, especially given the system-wide implications.

@jdstrand

jdstrand Aug 3, 2016

Contributor

"IMO, anything that isn't docker itself (and perhaps a manually-reviewed set of related utilities) should require manual connection of that docker plug, especially given the system-wide implications."

Yes I completely agree. My question is around the fact that the interface is so privileged, why offer the plugs side at all if we don't actually need it?

@tianon

tianon Aug 9, 2016

Contributor

I figured that adding AppArmor and forced manual connection (or at least an easy way to "disconnect" if auto-connect is added later) on top of docker.sock was a desirable feature, especially given the implications of access to docker.sock, so having a way to restrict access to it on a per-snap basis on top of the per-user basis the docker group provides seemed like a really handy feature. If I'm mistaken in the belief that such a thing is useful, I won't be put out about removing it. 😇

interfaces/builtin/docker.go
+
+var dockerPermanentSlotSecComp = []byte(`
+# The Docker daemon needs to be able to launch arbitrary processes within containers (whose syscall needs are unknown beforehand)
+@unrestricted
@jdstrand

jdstrand Aug 2, 2016

Contributor

I advised you to do this because of the way that seccomp works and to unblock you. However, is there set of syscalls that we know we never want to allow? Eg, module loading or access to the kernel keyring?

@tianon

tianon Aug 2, 2016

Contributor

It would probably make sense to look at Docker's own seccomp whitelist and mirror that at least. If we do end up supporting "privileged Docker" somehow, we could put this kind of thing there instead, and have this just be the set of what Docker itself needs + what Docker itself whitelists by default?

https://github.com/docker/docker/blob/0e9009bae3a493cfce8a8334949b239efcb88f43/profiles/seccomp/seccomp_default.go#L37-L1867

(still a pretty long list, but should be shorter than what @unrestricted implies)

@jdstrand

jdstrand Aug 2, 2016

Contributor

Well, yeah. This gets into the realm of privileged containers too cause the way seccomp works, we can never get a syscall back once we block it so we need to offer everything docker would ever want to give out (or we would want docker to give out). It seems fairly reasonable to take everything from snap-confine.git/debian/README.syscalls and put it in docker's seccomp, but leave out module loading and keyring syscalls. @tyhicks do you have an opinion?

@tianon

tianon Aug 2, 2016

Contributor

This might actually be a bit more complicated than that. Docker needs to orchestrate iptables (including setting up an appropriate NAT chain for containers) at startup, and it's looking like it'll need to be able to load the appropriate kernel modules in order to do that.

Additionally, before I'd put @unrestricted here I had just trial-and-errored the set of things Docker itself requires, and I believe there were some keyring-related syscalls there, but I'll have to try again to confirm that for sure.

@tianon

tianon Aug 2, 2016

Contributor

To be slightly more specific regarding loading modules, here's the precise Docker daemon logs where it's using modprobe, getting permission denied, ignoring the error and hitting iptables anyhow, then failing:

time="2016-08-02T14:46:55.740368931-07:00" level=warning msg="Running modprobe bridge br_netfilter failed with message: , error: fork/exec /sbin/modprobe: permission denied"
time="2016-08-02T14:46:55.741108535-07:00" level=warning msg="Running modprobe nf_nat failed with message: ``, error: fork/exec /sbin/modprobe: permission denied"
time="2016-08-02T14:46:55.741794046-07:00" level=warning msg="Running modprobe xt_conntrack failed with message: ``, error: fork/exec /sbin/modprobe: permission denied"
time="2016-08-02T14:46:55.742125959-07:00" level=debug msg="Fail to initialize firewalld: Failed to connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: connect: permission denied, using raw iptables instead"
time="2016-08-02T14:46:55.743626077-07:00" level=debug msg="/sbin/iptables, [--version]"
time="2016-08-02T14:46:55.745027672-07:00" level=debug msg="/sbin/iptables, [-t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]"
time="2016-08-02T14:46:55.746875503-07:00" level=debug msg="/sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER]"
time="2016-08-02T14:46:55.748674940-07:00" level=debug msg="/sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER]"
time="2016-08-02T14:46:55.750387276-07:00" level=debug msg="/sbin/iptables, [-t nat -D PREROUTING]"
time="2016-08-02T14:46:55.751859031-07:00" level=debug msg="/sbin/iptables, [-t nat -D OUTPUT]"
time="2016-08-02T14:46:55.753300771-07:00" level=debug msg="/sbin/iptables, [-t nat -F DOCKER]"
time="2016-08-02T14:46:55.754844062-07:00" level=debug msg="/sbin/iptables, [-t nat -X DOCKER]"
time="2016-08-02T14:46:55.756263058-07:00" level=debug msg="/sbin/iptables, [-t filter -F DOCKER]"
time="2016-08-02T14:46:55.757613661-07:00" level=debug msg="/sbin/iptables, [-t filter -X DOCKER]"
time="2016-08-02T14:46:55.759091740-07:00" level=debug msg="/sbin/iptables, [-t filter -F DOCKER-ISOLATION]"
time="2016-08-02T14:46:55.760441146-07:00" level=debug msg="/sbin/iptables, [-t filter -X DOCKER-ISOLATION]"
time="2016-08-02T14:46:55.761857607-07:00" level=debug msg="/sbin/iptables, [-t nat -n -L DOCKER]"
time="2016-08-02T14:46:55.763431457-07:00" level=debug msg="/sbin/iptables, [-t nat -N DOCKER]"
time="2016-08-02T14:46:55.765514246-07:00" level=debug msg="Cleaning up old mountid : start."
time="2016-08-02T14:46:55.765758158-07:00" level=fatal msg="Error starting daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain: iptables failed: iptables -t nat -N DOCKER: iptables v1.6.0: can't initialize iptables table `nat': Permission denied\nPerhaps iptables or your kernel needs to be upgraded.\n (exit status 3)"

(and this is with the firewall-control plug connected)

@tianon

tianon Aug 2, 2016

Contributor

And this is what I'm getting after adding the following:

/bin/kmod ixr,
/etc/modprobe.d/ r,
/etc/modprobe.d/** r,
/lib/modules/** r,
time="2016-08-02T15:31:58.555640406-07:00" level=warning msg="Running modprobe bridge br_netfilter failed with message: modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-31-generic/modules.dep.bin'\nmodprobe: WARNING: Module bridge not found in directory /lib/modules/4.4.0-31-generic\nmodprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-31-generic/modules.dep.bin'\nmodprobe: WARNING: Module br_netfilter not found in directory /lib/modules/4.4.0-31-generic\n, error: exit status 1"
time="2016-08-02T15:31:58.558195696-07:00" level=warning msg="Running modprobe nf_nat failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-31-generic/modules.dep.bin'\nmodprobe: WARNING: Module nf_nat not found in directory /lib/modules/4.4.0-31-generic`, error: exit status 1"
time="2016-08-02T15:31:58.560394284-07:00" level=warning msg="Running modprobe xt_conntrack failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-31-generic/modules.dep.bin'\nmodprobe: WARNING: Module xt_conntrack not found in directory /lib/modules/4.4.0-31-generic`, error: exit status 1"
@jdstrand

jdstrand Aug 3, 2016

Contributor

We don't want docker loading modules itself I don't think. It seems it is only going that route because it can't connect to /var/run/dbus/system_bus_socket. Let's see what happens when you adjust the policy for that and add dbus policy for firewalld.

Regarding firewall-control, it still needs to load ip_tables and ip6_tables via the unimplemented module loading backend (see https://bugs.launchpad.net/snappy/+bug/1583514). What this backend would do is simply take a list of modules that are hardcoded in the interface and add them to /etc/modules-load.d/snap..conf (eg snap.firewall-control.conf, snap.ppp.conf, etc).

I suspect if you allow talking to systemd, this will start to work. Else, would need to implement the SecurityKernelModule backend in the manner described and update firewall-control. The other modules should then auto-load, but if they don't, docker can list its own modules for snapd to load via the SecurityKernelModule backend.

@tianon

tianon Aug 9, 2016

Contributor

Ah, interesting -- I'll play around with dbus and chatting with firewalld directly and see what I can make work!

@tianon

tianon Aug 10, 2016

Contributor

It looks like we're going to have to go with the SecurityKernelModule route:

Ubuntu Snappy Classic:

$ systemctl status firewalld
● firewalld.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

Ubuntu Snappy "all-snaps":

$ systemctl status firewalld
● firewalld.service
   Loaded: not-found (Reason: No such file or directory)
   Active: inactive (dead)

(ie, firewalld doesn't seem to actually be present by default in either Classic or standard Snappy)

interfaces/builtin/docker.go
+
+func (iface *DockerInterface) ConnectedSlotSnippet(plug *interfaces.Plug, slot *interfaces.Slot, securitySystem interfaces.SecuritySystem) ([]byte, error) {
+ switch securitySystem {
+ case interfaces.SecurityAppArmor, interfaces.SecurityDBus, interfaces.SecurityMount, interfaces.SecuritySecComp, interfaces.SecurityUDev:
@jdstrand

jdstrand Aug 2, 2016

Contributor

Can you add a comment here: '// The docker socket is a named socket and therefore mediated by AppArmor file rules and we can't currently limit connecting clients by their security label'

@tianon

tianon Aug 2, 2016

Contributor

Yes! 👍

Contributor

jdstrand commented Aug 2, 2016

Lastly, the 15.04 docker snap had a concept of unprivileged and privileged. Have you given any thought as to how to handle this? If we don't differentiate then we'll want to support just privileged. One thought on differentiating would be to have an interface attribute for privileged and when specified, you get the privileged policy. I think this would be troublesome in practice because there would need to be two snaps: once that used the attribute and one that didn't since there isn't a way for the user to toggle the attribute at this time (and I'm not sure that is something we'd want to support anyway).

In general I think I am of the mind that the docker interface is simply a super-special, highly-privileged interface.

Contributor

tianon commented Aug 2, 2016

Does dockerd provide a port and listen over TCP or UDP? If so, then network-bind is appropriate.

Yeah, it does (both optionally via -H on the service for talking to Docker itself over TCP, but also via -p on docker run for port mapping depending on certain circumstances). 👍

Lastly, the 15.04 docker snap had a concept of unprivileged and privileged. Have you given any thought as to how to handle this? If we don't differentiate then we'll want to support just privileged.

The problem with just supporting privileged is that we'll need to essentially just remove 90% of all this and go almost entirely unconfined. If the goal is to 100% support privileged Docker, then I need to be able to do crazy things like directly enter the namespace of the host's init (which is currently pretty trivially possible with --privileged on docker run, and that's very much on purpose given that --privileged is considered to be "take off all the things stopping us from being evil").

The only real possibility I can think of is to have a separate docker-privileged interface/slot/plug (which makes it possible for users to connect/disconnect it) which allows a much broader set of functionality (giving users the rope they need if that's really what they desire).

Contributor

jdstrand commented Aug 3, 2016

Regarding docker vs docker privileged, you make a good point. I guess most docker users don't need privileged mode. In that light, yes I think it would need to be a separate interface, but I think we would want to have it not auto-connect and instead only have policy on the plugs side so might have:

apps:
  docker-daemon:
    slots: [ docker ]
    plugs: [ docker-privileged ]

Then on install the user needs to do this for privileged mode:

$ sudo snap connect docker:docker-privileged docker:docker-privileged

or something. Point is, regular install is non-privileged and a snap connect command is needed for privileged.

+ptrace (trace) peer=@{profile_name},
+
+# Docker needs a lot of caps, but limits them in the app container
+capability,
@morphis

morphis Aug 4, 2016

Contributor

This involves the different capabilities to spawn up namespaces, right? I am wondering if we shouldn't move such things in a container-control interface so that we can have different container approaches (LXC/LXD, docker, ...) use the same interface rather than splitting the same rules across multiple interfaces.

@jdstrand

jdstrand Aug 4, 2016

Contributor

I initially suggested a generic container interface but was asked to instead have people implement separate docker and lxd interfaces, for at the very least, the sockets involved. Older versions of docker only used system namespaces but lxd defaults to userns, so the policies are expected to be different. That said, there is nothing saying that we can't tuck the shared policy into snapd somewhere to that the docker and lxd interfaces can pull it in, but that is an implementation detail and until we see lxd and docker interfaces side by side, we won't know if that is worth it.

interfaces/builtin/docker.go
+# Docker does all kinds of mounts all over the filesystem
+/dev/mapper/control rw,
+/dev/mapper/docker* rw,
+/dev/loop* r,
@morphis

morphis Aug 4, 2016

Contributor

I am wondering if it wouldn't make sense to have a loop-mount interface which allows those things?

We should at least make sure that this point the snap can only mount into its own writable directories (see http://wiki.apparmor.net/index.php/AppArmor_Core_Policy_Reference#Mount_rules_.28AppArmor_2.8_and_later.29). @jdstrand You agree with that?

@jdstrand

jdstrand Aug 4, 2016

Contributor

I'm not sure if loop-mount interface is worth it. I mean, docker and lxd will need it and maybe one or two other container technologies, but this is quite privileged and I'd like to understand the exact requirements before designing that interface.

@morphis

morphis Aug 4, 2016

Contributor

Ok.

@jdstrand

jdstrand Aug 16, 2016

Contributor

Can we change this to: /dev/loop-control r, and the following rule to /dev/loop[0-9]* rw,. It makes for easier policy auditing.

interfaces/builtin/docker.go
+mount,
+umount,
+pivot_root,
+/.pivot_root*/ rw,
@morphis

morphis Aug 4, 2016

Contributor

For what is this needed?

@jdstrand

jdstrand Aug 4, 2016

Contributor

Your point about the mount rules is fair though-- @tianon, can the mount rules be fine-tuned to be limited to SNAP_DATA or similar?

@tianon

tianon Aug 10, 2016

Contributor

Come to mention it, I think this particular part happens on the other side of the daemon-to-container boundary, so we might be able to remove this rule entirely and let the docker-default profile Docker manages take over this bit and remove it from here completely. I'll do some testing and verify that!

Regardless, this path is used inside the mount namespace of a container for pivoting the container from the host's / to the real container's root, so it's technically isolated that way already, but I'll still do some testing to see if we can remove it from this profile completely (because I think we can).

@tianon

tianon Aug 10, 2016

Contributor

Ok, I did some testing, and this is still necessary, but I did make it slightly more specific (/.pivot_root[0-9]*/) and added a better comment to explain what's happening:

# After doing a pivot_root using <graph-dir>/<container-fs>/.pivot_rootNNNNNN,
# Docker removes the leftover /.pivot_rootNNNNNN directory (which is now
# relative to "/" instead of "<graph-dir>/<container-fs>" thanks to pivot_root)
pivot_root,
/.pivot_root[0-9]*/ rw,
Contributor

lool commented Aug 14, 2016

Hi all,

I'd like to summarize discussions and share my own opinion. Discussions seem to have dried down; it seems consensus is:

  1. we want to keep the docker vs docker-privileged interfaces; these are mostly ready
  2. we dont want to see module loading done by dockerd

Concerning 2), we can't hold the Docker snap unconfined for too long and it looks like work on https://bugs.launchpad.net/snappy/+bug/1583514 hasn't started. Can we allow module loading in the docker interfaces for now and remove these when a firewall or module loading interface is available?

I used the current Docker snap in edge extensively this week and I have these pieces of feedback:
i) workaround the module loading issue by adding "iptables -L >/dev/null" in /etc/rc.local

ii) dockerd seems to load another appamor profile for the actual containers; I believe this is "docker-default"; docker-default misses some useful syscalls preventing e.g. to kill processes (you can repo with this snippet in a container:

sleep 10 &
kill $!

(Interestingly, the process can't even be killed from a root shell outside of the container because of the AppArmor profile guarding this!)

iii) I don't think we should keep docker.sock in /var/run or /run; first, this conflicts with the .deb when running snapd on server installs, so you can't have both the docker snap and the docker .deb; second, in some cases it might be useful to embed dockerd in other snaps and run it on a different socket/port (e.g. kubernetes runs multiple docker daemons; this could also be used to control the version of dockerd)

TLDR; is it enough to workaround the iptables module loading issue right now and improve this in updates?

Thanks,

  • Loïc Minier
Contributor

morphis commented Aug 14, 2016

@lool Actually its not really much work to implement the kernel module loading part for interfaces as described in https://bugs.launchpad.net/snappy/+bug/1583514

My vote would be to get this implemented rather than taking another workaround until we wait for it like we did already when we've added the ppp/modemmanager interfaces.

Contributor

lool commented Aug 14, 2016

@morphis On rereading https://bugs.launchpad.net/snappy/+bug/1583514, it seems indeed relatively straightforward; I've also checked that builtin modules wouldn't cause an error (modprobing a builtin module returns success).

Contributor

jdstrand commented Aug 15, 2016

"ii) dockerd seems to load another appamor profile for the actual containers; I believe this is "docker-default"; docker-default misses some useful syscalls preventing e.g. to kill processes (you can repo with this snippet in a container:
sleep 10 &
kill $!
"
Are these syscalls that are missing or apparmor rules?

Assuming seccomp, @tianon and I discussed this and I thought we were using @unrestricted for now. However, it would probably be best to add everything from https://github.com/snapcore/snap-confine/blob/master/debian/README.syscalls and then comment out some as dangerous:

var dockerPermanentSlotSecComp = []byte(`
# The Docker daemon needs to be able to launch arbitrary processes
# within containers (whose syscall needs are unknown beforehand)

# Allow all syscalls
accept
accept4
access
acct
...

# Except some we don't ever want to allow
# kexec
#kexec_load

# kernel modules
#create_module
#init_module
#finit_module
#delete_module

# these have a history of vulnerabilities, are not widely used, and
# open_by_handle_at has been used to break out of docker containers by brute
# forcing the handle value: http://stealth.openwall.net/xSports/shocker.c
#name_to_handle_at
#open_by_handle_at

# Explicitly deny ptrace since it can be abused to break out of the seccomp
# sandbox
#ptrace

# Explicitly deny kernel keyring access
#add_key
#keyctl
#request_key

" iii) I don't think we should keep docker.sock in /var/run or /run; first, this conflicts with the .deb when running snapd on server installs, so you can't have both the docker snap and the docker .deb; second, in some cases it might be useful to embed dockerd in other snaps and run it on a different socket/port (e.g. kubernetes runs multiple docker daemons; this could also be used to control the version of dockerd)
"

I'm conflicted on this. People who have developed software around docker may want to use the docker snap instead of the deb and might expect the socket to be in the same location. As for kubernetes, it is free to put its sockets in SNAP_DATA if it wants, so I don't think there is a limitation there? This should be discussed with the snappy architects (conflicting with debs) and won't be solved in this PR.

"TLDR; is it enough to workaround the iptables module loading issue right now and improve this in updates?
"

I agree with Simon-- someone should implement the kernel module loading backend for interfaces. We are needing it for several things.

interfaces/builtin/docker.go
+ "github.com/snapcore/snapd/interfaces"
+)
+
+var dockerPermanentSlotAppArmor = []byte(`
@jdstrand

jdstrand Aug 15, 2016

Contributor

Can you use 'const' here instead of 'var'?

@tianon

tianon Aug 15, 2016

Contributor

Not for []byte, but I can switch these to string and use a cast in the functions instead -- would that be preferable?

interfaces/builtin/docker.go:96: const initializer ([]byte)("\n# Description: Allow operating as the Docker daemon. Reserved because this\n#  gives privileged access to the system.\n# Usage: reserved\n\n# Allow sockets\n/{,var/}run/docker.sock rw,\n/{,var/}run/docker/     rw,\n/{,var/}run/docker/**   mrwklix,\n/{,var/}run/runc/       rw,\n/{,var/}run/runc/**     mrwklix,\n\n# Wide read access to /proc, but somewhat limited writes for now\n@{PROC}/ r,\n@{PROC}/** r,\n@{PROC}/[0-9]*/attr/exec w,\n@{PROC}/sys/net/** w,\n@{PROC}/[0-9]*/cmdline r,\n@{PROC}/[0-9]*/oom_score_adj w,\n\n# Wide read access to /sys\n/sys/** r,\n# Limit cgroup writes a bit\n/sys/fs/cgroup/*/docker/   rw,\n/sys/fs/cgroup/*/docker/** rw,\n\n# Allow tracing ourself (especially the \"runc\" process we create)\nptrace (trace) peer=@{profile_name},\n\n# Docker needs a lot of caps, but limits them in the app container\ncapability,\n\n# Docker does all kinds of mounts all over the filesystem\n/dev/mapper/control rw,\n/dev/mapper/docker* rw,\n/dev/loop* r,\n/dev/loop[0-9]* w,\nmount,\numount,\n\n# After doing a pivot_root using <graph-dir>/<container-fs>/.pivot_rootNNNNNN,\n# Docker removes the leftover /.pivot_rootNNNNNN directory (which is now\n# relative to \"/\" instead of \"<graph-dir>/<container-fs>\" thanks to pivot_root)\npivot_root,\n/.pivot_root[0-9]*/ rw,\n\n# file descriptors (/proc/NNN/fd/X)\n/[0-9]* rw,\n# file descriptors in the container show up here due to attach_disconnected\n\n# Docker needs to be able to create and load the profile it applies to containers (\"docker-default\")\n# XXX We might be able to get rid of this if we generate and load docker-default ourselves and make docker not do it.\n/sbin/apparmor_parser ixr,\n/etc/apparmor.d/cache/ r,\n/etc/apparmor.d/cache/.features r,\n/etc/apparmor.d/cache/docker* rw,\n/etc/apparmor/parser.conf r,\n/etc/apparmor/subdomain.conf r,\n/sys/kernel/security/apparmor/.replace rw,\n\n# We'll want to adjust this to support --security-opts...\nchange_profile -> docker-default,\nsignal (send) peer=docker-default,\nptrace (read, trace) peer=docker-default,\n\n# Graph (storage) driver bits\n/dev/shm/aufs.xino rw,\n\n#cf bug 1502785\n/ r,\n") is not a constant
interfaces/builtin/docker.go:107: const initializer ([]byte)("\n# Description: Allow using Docker. Reserved because this gives\n#  privileged access to the service/host.\n# Usage: reserved\n\n# Obviously need to be able to talk to the daemon\n/{,var/}run/docker.sock rw,\n\n@{PROC}/sys/net/core/somaxconn r,\n") is not a constant
interfaces/builtin/docker.go:112: const initializer ([]byte)("\n# The Docker daemon needs to be able to launch arbitrary processes within containers (whose syscall needs are unknown beforehand)\n@unrestricted\n") is not a constant
interfaces/builtin/docker.go:117: const initializer ([]byte)("\nsetsockopt\nbind\n") is not a constant
@jdstrand

jdstrand Aug 15, 2016

Contributor

Yes, sorry, I meant to say that too. There are inconsistencies in the current interfaces where some are var []byte and other are const strings and we'll be moving all these policy variables to const strings based on feedback from Gustavo in the most recent pulseaudio PR.

@tianon

tianon Aug 15, 2016

Contributor

Cool, will do! (and this time with a new commit instead of a force push 😇)

interfaces/builtin/docker.go
+/ r,
+`)
+
+var dockerConnectedPlugAppArmor = []byte(`
@jdstrand

jdstrand Aug 15, 2016

Contributor

Can you use 'const' here instead of 'var'?

interfaces/builtin/docker.go
+@{PROC}/sys/net/core/somaxconn r,
+`)
+
+var dockerPermanentSlotSecComp = []byte(`
@jdstrand

jdstrand Aug 15, 2016

Contributor

Can you use 'const' here instead of 'var'?

interfaces/builtin/docker.go
+@unrestricted
+`)
+
+var dockerConnectedPlugSecComp = []byte(`
@jdstrand

jdstrand Aug 15, 2016

Contributor

Can you use 'const' here instead of 'var'?

Contributor

tianon commented Aug 15, 2016

Failing to kill processes inside the container is definitely something missing in the docker-default AppArmor profile -- I'll take a look at that today (assuming my guess is correct, that'll be a patch to Docker itself to fix, so won't have any bearing on this PR).

Regarding seccomp, I'm not going to be a good judge of what's considered "dangerous" and what isn't, but I would be happy to compile the two lists of "seccomp privileges Docker itself needs" and "Docker's default seccomp whitelist for containers" -- I think that'd be the smallest possible set we could reasonably apply by default, and then we could leave the larger set (or even @unrestricted) to the docker-privileged interface. Thoughts?

Contributor

jdstrand commented Aug 15, 2016

"Regarding seccomp, I'm not going to be a good judge of what's considered "dangerous" and what isn't, but I would be happy to compile the two lists of "seccomp privileges Docker itself needs" and "Docker's default seccomp whitelist for containers" -- I think that'd be the smallest possible set we could reasonably apply by default, and then we could leave the larger set (or even @unrestricted) to the docker-privileged interface. Thoughts?"

Yes, I like this approach. Let's use the larger set I outlined rather than @unrestricted with docker-privileged and see how it goes.

tianon added some commits Aug 15, 2016

Replace Docker's "@unrestricted" seccomp policy with the smaller set …
…of "Docker daemon requirements" + "default Docker container seccomp whitelist"
Contributor

tianon commented Aug 15, 2016

As a matter of organization, should I put the docker-privileged interface in this same file (docker.go), or a separate docker_privileged.go file?

interfaces/builtin/docker.go
+@{PROC}/** r,
+@{PROC}/[0-9]*/attr/exec w,
+@{PROC}/sys/net/** w,
+@{PROC}/[0-9]*/cmdline r,
@jdstrand

jdstrand Aug 16, 2016

Contributor

This is covered by the @{PROC}/** r, rule.

interfaces/builtin/docker.go
+@{PROC}/ r,
+@{PROC}/** r,
+@{PROC}/[0-9]*/attr/exec w,
+@{PROC}/sys/net/** w,
@jdstrand

jdstrand Aug 16, 2016

Contributor

What, if anything, of this is not covered by the 'network-control' interface?

interfaces/builtin/docker.go
+@{PROC}/[0-9]*/oom_score_adj w,
+
+# Wide read access to /sys
+/sys/** r,
@jdstrand

jdstrand Aug 16, 2016

Contributor

Curious why this needs to be so broad in the 'unprivileged' policy.

interfaces/builtin/docker.go
+/dev/loop* r,
+/dev/loop[0-9]* w,
+mount,
+umount,
@jdstrand

jdstrand Aug 16, 2016

Contributor

With the 'unprivileged' interface, are the mounts expected to be in a particular location? Eg, in SNAP_DATA, HOME, SNAP_USER_DATA, somewhere?

interfaces/builtin/docker.go
+
+# file descriptors (/proc/NNN/fd/X)
+/[0-9]* rw,
+# file descriptors in the container show up here due to attach_disconnected
@jdstrand

jdstrand Aug 16, 2016

Contributor

Can you move this comment up above the rule it is meant to refer to?

+/etc/apparmor.d/cache/docker* rw,
+/etc/apparmor/parser.conf r,
+/etc/apparmor/subdomain.conf r,
+/sys/kernel/security/apparmor/.replace rw,
@jdstrand

jdstrand Aug 16, 2016

Contributor

What happened with the idea of docker not loading the policy? If that is unacceptable, I might prefer to create a child profile for apparmor_parser (I can help with that). If we go that route, we'd want to add a audit deny capability mac_admin, rule to the parent profile.

In many ways, with the mount rules, mac_admin, etc this is security theater but instead of thinking of it that way, I'd prefer to think of the docker policy as 'advisory' where we guard against a misbehaving docker as opposed to guarding against a malicious or attacker-controlled docker. We should probably adjust the comments in this file and docs/interfaces.md to mention that....

@tianon

tianon Aug 16, 2016

Contributor

It's possible to stop Docker from loading the policy, but the patches are going to be a bit invasive (and upstream prefers that we keep our patches to a minimum, which is understandable). A child profile for apparmor_parser sounds interesting, but way over my head ATM, so any help or more specific direction you can provide would be really appreciated. 😄

@jdstrand

jdstrand Aug 17, 2016

Contributor

If you give me a series 16 snap I can play with the child profile tomorrow.

@tianon

tianon Aug 17, 2016

Contributor

https://github.com/infosiftr/snap-docker/tree/docker-interface has the current WIP docker-interface-using snap code -- I can give you a built .snap if you'd prefer, but I figure this'd give you more freedom to experiment 😄

(probably want to clone that with -b docker-interface --single-branch so that it doesn't clone the massive 15.04 branch... 😞)

@jdstrand

jdstrand Aug 29, 2016

Contributor

@tianon - I built the docker snap and built this branch. I did:

$ sudo /tmp/snap install /tmp/docker*snap
$ sudo /tmp/snap connect docker:firewall-control ubuntu-core:firewall-control
$ sudo /tmp/snap connect docker:docker docker:docker
$ sudo systemctl stop snap.docker.dockerd
$ sudo systemctl start snap.docker.dockerd

The workaround a bug in the template policy (PR already done) and add to /var/lib/snapd/apparmor/profiles/snap.docker.docker and dockerd: /run/systemd/journal/socket w, and reload the policy with sudo apparmor_parser -r /var/lib/snapd/apparmor/profiles/snap.docker.*.

I can download an image fine with:

$ sudo /snap/bin/docker pull ubuntu:xenial
$ sudo /snap/bin/docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
ubuntu              xenial              bd3d4369aebc        3 days ago          126.6 MB

But running a command fails with:

$ sudo /snap/bin/docker run -i -t ubuntu:xenial uptime
docker: Error response from daemon: Container command 'uptime' not found or does not exist..

Am I doing something wrong?

@lool

lool Aug 30, 2016

Contributor

@jdstrand I have the same issues since some days; I couldn't identify what caused the regression, but the docker snap in edge channel is broken too in the same way (can't run hello-world image). The first error is auplink not being in PATH, hence snapcore/snapcraft#763
the next error is cryptic and I can't figure it out.

Note however that this only affects classic; the snap works on Ubuntu Core!

@jdstrand

jdstrand Sep 12, 2016

Contributor

@tianon - I've decided that a child profile for apparmor_parser will not be meaningful so we shouldn't do it.

interfaces/builtin/docker.go
+
+const dockerPermanentSlotAppArmor = `
+# Description: Allow operating as the Docker daemon. Reserved because this
+# gives privileged access to the system.
@jdstrand

jdstrand Aug 16, 2016

Contributor

Can you update this to be:

# Description: allow operating as the Docker daemon. This policy is intentionally
# not restrictive and is here to help guard against programming errors and not
# for security confinement. The Docker daemon by design requires extensive
# access to the system and cannot be effectively confined against malicious
# activity.
interfaces/builtin/docker.go
+@{PROC}/sys/net/core/somaxconn r,
+`
+
+const dockerPermanentSlotSecComp = `
@jdstrand

jdstrand Aug 16, 2016

Contributor

Can you add this at the top:

# Description: allow operating as the Docker daemon. This policy is intentionally
# not restrictive and is here to help guard against programming errors and not
# for security confinement. The Docker daemon by design requires extensive
# access to the system and cannot be effectively confined against malicious
# activity.
interfaces/builtin/docker.go
+# /snap/docker/VERSION/bin/docker-runc
+# "do not inherit the parent's session keyring"
+# "make session keyring searcheable"
+keyctl
@jdstrand

jdstrand Aug 16, 2016

Contributor

Hrmm

@jdstrand

jdstrand Aug 29, 2016

Contributor

Reading that code it seems that docker is trying to make sure the container doesn't have access to the keyring. Can you clarify the comment on this and we'll allow it.

@tianon

tianon Sep 2, 2016

Contributor

Ok, updated comments in 748750e; was that what you had in mind?

interfaces/builtin/docker.go
+getuid
+getuid32
+getxattr
+init_module
@jdstrand

jdstrand Aug 16, 2016

Contributor

Can we exclude init_module, finit_module, create_module, query_module and delete_module (move to the top, add a note about why and list these syscalls but commented out) and leave that to the 'privileged' interface? The (to be implemented) KernelModules backend can then be used for what 'unprvileged' needs (most things should autoload, we only should need KerneModules for those that don't).

interfaces/builtin/docker.go
+process_vm_readv
+process_vm_writev
+pselect6
+ptrace
@jdstrand

jdstrand Aug 16, 2016

Contributor

This is curious since ptrace can be used to escape the seccomp sandbox until something along the lines of https://lkml.org/lkml/2016/5/26/354 is applied.

Can you add a comment for this:

# ptrace can be abused to break out of the seccomp sandbox
# but is required by the Docker daemon.
ptrace
+writev
+`
+
+const dockerConnectedPlugSecComp = `
@jdstrand

jdstrand Aug 16, 2016

Contributor

Please add this comment:

# Description: allow access to the Docker daemon socket. This gives
# privileged access to the system via Docker's socket API.
@tianon

tianon Aug 16, 2016

Contributor

GitHub doesn't reflect it since it changed the next line, not this one, but this is added. 👍

interfaces/builtin/docker.go
+
+const dockerConnectedPlugAppArmor = `
+# Description: Allow using Docker. Reserved because this gives
+# privileged access to the service/host.
@jdstrand

jdstrand Aug 16, 2016

Contributor

Can you change this to:

# Description: allow access to the Docker daemon socket. This gives
# privileged access to the system via Docker's socket API.
interfaces/builtin/docker.go
+# privileged access to the service/host.
+# Usage: reserved
+
+# Obviously need to be able to talk to the daemon
@jdstrand

jdstrand Aug 16, 2016

Contributor

Please change to: # Allow talking to the docker daemon

interfaces/builtin/docker.go
+# Obviously need to be able to talk to the daemon
+/{,var/}run/docker.sock rw,
+
+@{PROC}/sys/net/core/somaxconn r,
@jdstrand

jdstrand Aug 16, 2016

Contributor

This is actually part of both network and network-bind. Docker daemon allows fetching images so I think removing this and having the slot implementation also use plugs: [ network ] would be the way to go.

interfaces/builtin/docker.go
+munlock
+munlockall
+munmap
+name_to_handle_at
@jdstrand

jdstrand Aug 16, 2016

Contributor

Please move name_to_handle_at and open_by_handle_at up to the top of the policy and disable them with:

# These have a history of vulnerabilities, are not widely used, and
# open_by_handle_at has been used to break out of docker containers by brute
# forcing the handle value: http://stealth.openwall.net/xSports/shocker.c
#name_to_handle_at
#open_by_handle_at
+readlink
+readlinkat
+readv
+reboot
@jdstrand

jdstrand Aug 16, 2016

Contributor

Fun! Why is this needed? So containers can use this?

@tianon

tianon Aug 16, 2016

Contributor

I'm not actually sure -- it's included in Docker's default whitelist, which as I understand was built via trial and error, but I imagine we could probably get away with not including it?

@jdstrand

jdstrand Aug 29, 2016

Contributor

Looking at https://github.com/docker/docker/blob/master/docs/security/seccomp.md, reboot is not allowed by default. I then compared that list to this list and found that the following are allowed in this policy but not allowed in the docker default policy:

acct
adjtimex
bpf
clone
ioperm
iopl
kcmp
lookup_dcookie
mount
perf_event_open
personality
process_vm_readv
process_vm_writev
reboot
setns
settimeofday
umount
umount2
unshare

Since you say in this file "This list comes from Docker's default seccomp whitelist" I'd like to better understand these. First off, I think that the comment should say this instead: "This list comes from Docker's default seccomp whitelist plus syscalls required for docker to properly function." As such, it seems quite clear why clone, mount, setns, umount, umount2 and unshare are needed by docker to funtion. Can you comment on the others?

@tianon

tianon Sep 2, 2016

Contributor

The markdown file is definitely out of date -- I scraped these directly from Docker's profile code, which is in Go. I don't know why many of them are here, and it's probably safe to remove them, but then we'll be applying a smaller mask to Docker containers than Docker does (which may or may not be acceptable -- things like settimeofday are probably safe to deny by default, especially if our privileged solution adds them back). I'm happy to adjust it either way you'd like (and point to the markdown file instead of the Go code, since I think our comment currently points to the Go code I scraped this list from).

interfaces/builtin/docker.go
+# $ grep -C1 'ActAllow' -- profiles/seccomp/seccomp_default.go \
+# | grep 'Name:' \
+# | cut -d'"' -f2 \
+# | sort -u
@jdstrand

jdstrand Aug 16, 2016

Contributor

Please add:

# Because seccomp may only go more strict, we must allow all
# all syscalls to Docker that it expects to give to containers
# in addition to what it needs to run and trust that docker daemon
# only gives out reasonable syscalls to containers.
@jdstrand

jdstrand Aug 16, 2016

Contributor

Along those lines-- perhaps it would be nice to instead of giving a sorted list, use this outline:

  • all syscalls we aren't going to allow (ie commented out) with a comment why (eg, name_to_handle_at, kernel module calls, etc)
  • all syscalls docker daemon itself needs to run (eg, (presumably) mount, umount, umount2, etc)
  • the syscalls docker daemon expects to give to containers

This will allow for auditing the policy a bit better and gives clues as to where to look in docker deeper inspection.

interfaces/builtin/docker.go
+
+func (iface *DockerInterface) PermanentPlugSnippet(plug *interfaces.Plug, securitySystem interfaces.SecuritySystem) ([]byte, error) {
+ switch securitySystem {
+ case interfaces.SecurityAppArmor, interfaces.SecurityDBus, interfaces.SecurityMount, interfaces.SecuritySecComp, interfaces.SecurityUDev:
@jdstrand

jdstrand Aug 16, 2016

Contributor

Can you run go fmt on these files? I think that will line wrap these better.

@tianon

tianon Aug 16, 2016

Contributor

gofmt doesn't seem to mind this; I even used -s and it didn't change anything. 😞

I'll play a bit and see if I can get something with a bit more whitespace that gofmt is still happy with.

+ snippet, err = s.iface.ConnectedSlotSnippet(s.plug, s.slot, "foo")
+ c.Assert(err, Equals, interfaces.ErrUnknownSecurity)
+ c.Assert(snippet, IsNil)
+}
@jdstrand

jdstrand Aug 16, 2016

Contributor

Can you add an auto-connect test? Eg,

func (s *DockerInterfaceSuite) TestAutoConnect(c *C) {
    c.Check(s.iface.AutoConnect(), Equals, false)
}
@jdstrand

jdstrand Aug 16, 2016

Contributor

Please also add a test for TestConnectedPlugSnippet() and search for the docker socket rule for apparmor and bind for seccomp. You can see an example of this sort of thing in mpris_test.go (but there are others).

Please do the same for TestPermanentSlotSnippet() with an appropriate string search for both apparmor and seccomp.

@tianon

tianon Aug 17, 2016

Contributor

These were added in a9e1b29, although I'm not positive that I've done them the way you wanted, so comments are definitely appreciated. 😄 😇

@jdstrand

jdstrand Aug 29, 2016

Contributor

They look fine, thanks!

interfaces/builtin/docker_test.go
+ c.Assert(s.iface.Name(), Equals, "docker")
+}
+
+// TODO more tests regarding the contents of Docker's security profiles (especially AppArmor)?
@jdstrand

jdstrand Aug 16, 2016

Contributor

I think any tests for security profiles you'll want to put in the spread tests because the unit tests won't have the privileges needed to exercise this properly.

@tianon

tianon Aug 17, 2016

Contributor

Heh, I was actually referring more to the TestConnectedPlugSnippet and TestPermanentSlotSnippet you asked me to add (which the example I'd started this file from had but I wasn't sure exactly what I should do with). If there are also some kind of spread tests I should add, pointers would be appreciated. 😇

Contributor

jdstrand commented Aug 16, 2016

"As a matter of organization, should I put the docker-privileged interface in this same file (docker.go), or a separate docker_privileged.go file?"

That is a good question. It is easy enough to have it in one file (eg, take a look at the recent browser-support interface with the 'allow-sandbox' attribute), but the question then becomes, what does this look like to the user and the developer?

We could do something like this (option 1. Note there are a lot of ways to do the yaml due to snappy's flexibility, but I'm striving for clarity over efficiency in all of these):

name: docker
slots:
  docker-privileged:
    interface: docker
    privileged: true
  docker-unprivileged:
    interface: docker
    privileged: false

apps:
  docker-daemon:
    command: path/to/docker
    daemon: simple
    slots: [ docker-unprivileged ]
  docker-daemon-privileged:
    slots: [ docker-privileged ]
    command: path/to/docker --privileged
    daemon: simple
  docker:
    plugs: [ docker ]

On the one hand this is kinda nice because everything is in one snap and the interfaces are clean, but all this is on the slot side policy and today, permanent slot policy is autoconnected so the above doesn't work (docker-daemon and docker-daemon-privileged get started at the same time).

We could instead do something like (option 2):

name: docker
plugs:
  docker-privileged:
    interface: docker
    privileged-daemon: true
  docker-cli:
    interface: docker
    privileged-daemon: false

apps:
  docker-daemon:
    command: path/to/docker
    daemon: simple
    slots: [ docker ]
    plugs: [ docker-privileged ]
  docker:
    plugs: [ docker-cli ]

This would auto-connect the PermanentSlot policy (slots: [ docker ]) but not the ConnectedPlug policy (plugs: [ docker-privileged ]) for docker-daemon. In this manner if the user wanted to give docker daemon the privileged policy, she would do: snap connect docker:docker-privileged docker:docker-privileged, but looks a bit weird (and we'd have to test if it works).

Another alternative (option 3) is to have a docker-privileged interface for just ConnectedPlug policy and have it be an implicit slot. The yaml is almost the same:

name: docker
plugs:
  docker-privileged: null
  docker-cli:
    interface: docker

apps:
  docker-daemon:
    command: path/to/docker
    daemon: simple
    slots: [ docker ]
    plugs: [ docker-privileged ]
  docker:
    plugs: [ docker-cli ]

This would auto-connect the PermanentSlot policy (slots: [ docker ]) but not the ConnectedPlug policy (plugs: [ docker-privileged ]) for docker-daemon. In this manner if the user wanted to give docker daemon the privileged policy, she would do: snap connect docker:docker-privileged ubuntu-core:docker-privileged (or eventually, snap connect docker:docker-privileged :docker-privileged. This also looks weird to me and has the side-effect of having docker-privileged show up in snap interfaces when docker is not installed.

A variation on this (option 4) would be:

name: docker
plugs:
  docker-privileged: null
  docker-cli:
    interface: docker

apps:
  docker-daemon:
    command: path/to/docker
    daemon: simple
    slots: [ docker, docker-privileged ]
    plugs: [ docker-privileged ]
  docker:
    plugs: [ docker-cli ]

where the PermanentSlot policy for docker-privileged is empty and the interface connection is: snap connect docker:docker-privileged docker:docker-privileged.

Of these I prefer option 2. Option 4 has equivalent user experience with confusing yaml and implementation. Option 3 has an unwanted implicit slot. Option 1 doesn't work right with how slots are auto-connected. One thing about option 2 is that if we really hate the snap connect line, we can perhaps refine snap connect to know how to manually connect interface attributes in the future.

Contributor

jdstrand commented Aug 16, 2016

@zyga and @niemeyer (cc @ejratl) - can you weigh in on my (lengthy) discussion regarding docker-privileged as an interface attribute vs a separate interface?

Important context:

  • most docker users do not want docker to run with --privileged so we don't want the default security policy to support it
  • ideally we don't want two snaps to be required (ie, don't want docker snap for normal users and docker-privileged snap for those who need --privileged)
  • ideally a manual snap connect command could be used to add --privileged support to the security policy
  • we don't want to clutter snapd's interfaces with extraneous interfaces
Contributor

tianon commented Aug 17, 2016

Regarding privileged discussion, here are my thoughts:

Option 1 isn't really tenable from the perspective of Docker IMO since the two daemons would have entirely different state and different sockets for access (and not all folks even need privileged, so that second daemon would likely sit dormant anyhow), but even more importantly would store images separately, so the storage toll on the user is likely to balloon out of control quickly, not to mention usability suffering (have to explicitly point the client at one daemon or the other).

Option 2 sounds the most appealing from my naive perspective -- easy for users to enable/disable the privileged behavior, even if the snap connect line is a bit verbose (but not necessarily unexpectedly so). I think users needing this functionality will either figure it out or copy/paste the example from somewhere. 😇

Contributor

jdstrand commented Aug 18, 2016

Marking as 'Blocked' for now since it isn't ready to land. I know I have some things to circle back on with this PR and will in the coming days.

@jdstrand jdstrand added the Blocked label Aug 18, 2016

@niemeyer niemeyer changed the title from Add initial "docker" interface based on some of 15.04's privileges to interfaces/builtin: add initial docker interface Aug 26, 2016

@niemeyer niemeyer added the Decaying label Aug 26, 2016

@jdstrand jdstrand removed the Decaying label Aug 29, 2016

Contributor

jdstrand commented Aug 29, 2016

@tianon - where did we land on 'privileged' wrt to this PR? Will it be included here or in a future PR?

chuckbutler commented Aug 31, 2016

[EDIT]
disregard, I found what I was looking for in the lengthy discussion preceding the comment.

Contributor

jdstrand commented Aug 31, 2016

Also, for docker to start at all someone needs to implement: https://bugs.launchpad.net/snappy/+bug/1583514/comments/5

(notes on how to implement KernelModule backed).

Contributor

jdstrand commented Aug 31, 2016

Ok, I was able to get this running on all snaps (using overlayfs to replace snapd) if I added :$SNAP/usr/sbin:$SNAP/sbin to the wrappers, then after doing the interface connections, adding this to the snap.docker.dockerd apparmor policy:

/proc/fs/aufs/plink_maint w,
/sys/fs/aufs/** r,
/sys/kernel/security/apparmor/{,**} r,
#include <abstractions/dbus-strict>
Contributor

lool commented Sep 2, 2016

(FYI the /sbin in wrappers was merged in snapcraft snapcore/snapcraft#763)

Contributor

tianon commented Sep 2, 2016

@jdstrand regarding privileged -- I think there was still a decision to be made regarding which direction we took for implementing it (#1619 (comment), #1619 (comment); also whether it was going to be a more permissive set or simply unconfined more like lxd-support, since with docker run --privileged, users can technically do anything with the host system, and I've seen a fair number of what you might call "creative" uses of that to do interesting things)

Contributor

jdstrand commented Sep 12, 2016

Note a recent commit broke the way this interface is written and intended to be used: #1848 (comment)

Contributor

jdstrand commented Sep 12, 2016

https://bugs.launchpad.net/snappy/+bug/1583514 is still unassigned which makes this snap difficult to use. Who is going to work on that?

Contributor

jdstrand commented Sep 12, 2016

FYI, I've found that using this PR with this yaml:

plugs:
  docker-cli:
    interface: docker
slots:
  docker-daemon:
    interface: docker

apps:
  compose:
    command: command-compose.wrapper
    plugs:
    - docker-cli
    - network
    - home
  docker:
    command: command-docker.wrapper
    plugs:
    - docker-cli
    - network
    - home
  dockerd:
    command: command-dockerd.wrapper
    daemon: simple
    plugs:
    - network-bind
    - firewall-control
    slots:
    - docker-daemon
architectures:
- amd64
confinement: strict
description: Docker complements kernel namespacing with a high-level API which operates
  at the process level. It runs unix processes with strong guarantees of isolation
  and repeatability across servers.
name: docker
summary: Docker Linux container runtime
version: 1.11.2-5+jdstrand2

Then I can sideload the snap, then do:

$ sudo iptables -L
$ sudo ip6tables -L
$ sudo snap connect docker:docker-cli docker:docker-daemon
$ sudo snap connect docker:firewall-control ubuntu-core:firewall-control
$ sudo snap connect docker:home ubuntu-core:home
$ sudo service snap.docker.dockerd stop
$ sudo service snap.docker.dockerd start

Then I can use a trusty container like so:

$ sudo docker pull ubuntu:trusty
$ sudo docker run ubuntu:trusty uptime
 21:07:14 up  4:01,  0 users,  load average: 0.00, 0.04, 0.01

I verified that launching app containers correctly results in applications running under the 'docker-default' profile. At this point, what remains is:

  1. figuring out the deal with #1848 (comment)
  2. getting someone assigned to https://bugs.launchpad.net/snappy/+bug/1583514
  3. working out how to implement privileged mode. I think option #2 is the way to go
  4. adjusting the docker plugs interface to work like lxd-support to restrict access to this plug to snaps of a particular name from a particular publisher (since this interface is so privileged)

This last point is a new requirement and is here since snap declarations aren't yet. Once snap declarations are in place, this can be removed. Thankfully, this is easy to implement: https://github.com/snapcore/snapd/pull/1733/files#diff-cb4146608474e85ac3a939ff9c288b83R94. For now, I suggest using 'canonical' as the devName. By the time we are ready to hand this over to Docker we should have snap declarations in place (and it is easy to adjust if they aren't ready).

I'll follow-up on '1' and will provide a starting point for a diff for option #2 (I want to look at it a little bit and then will hand-off). Feel free to do '3' and '4' now.

Contributor

jdstrand commented Sep 12, 2016

"3. working out how to implement privileged mode. I think option #2 is the way to go"

Ok, here is what option #2 looks like:

$ snap install ./docker*snap --dangerous # dangerous for now is needed without an assertion
$ snap interfaces|grep docker
Slot                    Plug
docker:docker-daemon    -
:network                docker
:network-bind           docker
-                       docker:docker-cli
-                       docker:docker-privileged
-                       docker:firewall-control
-                       docker:home

# these would auto-connect via snap declaration, but for now, manually connect the regular interfaces
$ sudo snap connect docker:docker-cli docker:docker-daemon
$ sudo snap connect docker:firewall-control ubuntu-core:firewall-control
$ sudo snap connect docker:home ubuntu-core:home

$ snap interfaces|grep docker
docker:docker-daemon    docker:docker-cli
:firewall-control       docker
:home                   docker
:network                docker
:network-bind           docker
-                       docker:docker-privileged

# connecting the privileged interface is always manual by the user
$ sudo snap connect docker:docker-privileged docker:docker-daemon

Here is some stub code for that: jdstrand/snapd@598951f

Here is what the relevant parts of snap.yaml for the above:

plugs:
  docker-cli:
    interface: docker
  docker-privileged:
    interface: docker
    daemon-privileged: true
slots:
  docker-daemon:
    interface: docker

apps:
  compose:
    command: command-compose.wrapper
    plugs:
    - docker-cli
    - network
    - home
  docker:
    command: command-docker.wrapper
    plugs:
    - docker-cli
    - network
    - home
  dockerd:
    command: command-dockerd.wrapper
    daemon: simple
    plugs:
    - network-bind
    - firewall-control
    - docker-privileged
    slots:
    - docker-daemon
Contributor

jdstrand commented Sep 15, 2016

FYI, someone has started working on "2. getting someone assigned to https://bugs.launchpad.net/snappy/+bug/1583514"

Contributor

lool commented Sep 19, 2016

I hadn't fully grasped the design of privileged mode at first, I thought a single docker daemon would switch to granting runs of privileged containers for all clients as soon as the interface was connected, but Jamie clarified that each client would connect to the privileged-container interface and that would allow running privileged containers or not – that sounds perfect to me :-)

Contributor

jdstrand commented Sep 19, 2016

After discussing with Loic, we decided the path forward would be to merge this interface as is and I would implement the privileged mode attribute and the limiting to a particular snap and publisher. The kmod backend is assigned and being worked on. Point '1' doesn't block this PR so can be handled separately. No longer Blocked.

+1

@jdstrand jdstrand removed the Blocked label Sep 19, 2016

LGTM in its current form. Will handle remaining points in separate PRs.

@jdstrand We cannot merge this until we have logic limiting the use of this slot to the docker from store only, because this interface is giving wide system access at the slot end even before the connection is established. It's okay to be on a different PR, but these two PRs need to be merged at once please.

@niemeyer niemeyer added the Blocked label Sep 19, 2016

Contributor

tianon commented Sep 19, 2016

@jdstrand happy to implement any bits of that you don't think will be lots faster if you just do them 😇

Adding the check similar to LXD's seemed really straightforward for sure, and I can still take a look at the rest of your comments if you'd like (just been a bit buried the past few days).

Contributor

jdstrand commented Sep 19, 2016

Since this PR needs to be refreshed against trunk and there are a few other small things to address besides points 3 and 4, I'm going to close this and point people to #1947

@jdstrand jdstrand closed this Sep 19, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment