Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
interfaces/builtin: add initial docker interface #1619
Conversation
|
Can one of the admins verify this patch? |
|
Can one of the admins verify this patch? |
|
Does dockerd provide a port and listen over TCP or UDP? If so, then network-bind is appropriate. If not, please add what you need to the docker slot side policy. I'll go through the rest of the interface now as a sort of pre-review. Thanks for working on this! :) |
jdstrand
reviewed
Aug 2, 2016
| +# Usage: reserved | ||
| + | ||
| +# Allow sockets | ||
| +/{,var/}run/docker.sock rw, |
tianon
Aug 2, 2016
Contributor
Is it? It's /run/docker.sock vs /run/docker/**, so it's not really, right? (the socket is directly in /run, not in the /run/docker directory)
tianon
Aug 2, 2016
Contributor
(not as familiar with the AppArmor language/globbing as I need to be, but this was my understanding)
jdstrand
Aug 2, 2016
Contributor
Sorry, you are correct, I misread the rule. Please disregard that comment.
jdstrand
reviewed
Aug 2, 2016
| +/{,var/}run/docker/ rw, | ||
| +/{,var/}run/docker/** mrwklix, | ||
| +/{,var/}run/runc/ rw, | ||
| +/{,var/}run/runc/** mrwklix, |
jdstrand
Aug 2, 2016
Contributor
Should runc be in /run or can it be in SNAP_DATA? I ask because in general if sockets/etc are exposed in /run then I think they should be well-known and I'm not familiar enough with runc to know if it needs to be there for just docker or for other things. Put another way, /run/docker makes sense since this matches the snap name, but if someone implemented a runc snap or interface, the two would collide. I'm not sure this is a blocker, but it is at least a consideration.
tianon
Aug 2, 2016
Contributor
It probably can be in SNAP_DATA, but would require patching Docker/runc to accomplish. The way runc is designed, it's supposed to be safe for separate instances to share this directory (especially since runc doesn't have a "daemon", so every container invoked with runc is technically a new instance sharing this directory).
jdstrand
reviewed
Aug 2, 2016
| + | ||
| +# Wide read access to /sys | ||
| +/sys/** r, | ||
| +# Limit cgroup writes a bit |
tianon
Aug 2, 2016
Contributor
Hah, I can only take credit for small parts of this (and for limiting a bit what I copied over) -- most of the credit should go to whoever developed the 15.04 Docker snap before me.
jdstrand
reviewed
Aug 2, 2016
| +/sys/fs/cgroup/*/docker/ rw, | ||
| +/sys/fs/cgroup/*/docker/** rw, | ||
| +/sys/fs/cgroup/*/system.slice/ rw, | ||
| +/sys/fs/cgroup/*/system.slice/** rw, |
tianon
Aug 2, 2016
Contributor
Good question -- I'll try taking this out and see what denials come of it so we can try narrowing it down.
tianon
Aug 2, 2016
Contributor
Sweet, looks like this is no longer needed with newer Docker.
jdstrand
reviewed
Aug 2, 2016
| +/sys/fs/cgroup/*/system.slice/** rw, | ||
| + | ||
| +# Allow tracing other processes (especially ourselves) | ||
| +ptrace (read, trace), |
jdstrand
Aug 2, 2016
Contributor
Is this needed for privileged containers? The old 15.04 policy had:
ptrace (trace) peer=@{profile_name},
ptrace (read, trace) peer=docker-default,
tianon
Aug 2, 2016
Contributor
I actually didn't test or target privileged containers at all with this policy -- wasn't sure whether that'd be a separate interface entirely or what. I'll play with what you've provided here and see if I can figure out why I ended up making this more permissive.
tianon
Aug 2, 2016
Contributor
Aha:
Log: apparmor="DENIED" operation="ptrace" profile="snap.docker.dockerd" pid=18065 comm="docker-runc" requested_mask="trace" denied_mask="trace" peer="snap.docker.dockerd"
Adding just ptrace (trace) peer=@{profile_name}, is sufficient, so that'll be in the next update.
jdstrand
reviewed
Aug 2, 2016
| +# Allow receiving from unconfined | ||
| +dbus (receive) | ||
| + bus=system | ||
| + peer=(label=unconfined), |
tianon
Aug 2, 2016
Contributor
Yeah, I'll see what I can do to narrow down what Docker actually needs to discuss with systemd here.
tianon
Aug 2, 2016
Contributor
I have no idea what part of Docker needed this, but after removing both dbus statements, Docker appears to be working fine, so I'm going to just axe them in my next update.
jdstrand
reviewed
Aug 2, 2016
| +/.pivot_root*/ rw, | ||
| + | ||
| +# for console access | ||
| +/dev/ptmx rw, |
jdstrand
reviewed
Aug 2, 2016
| +# for console access | ||
| +/dev/ptmx rw, | ||
| +# file descriptors (/proc/NNN/fd/X) | ||
| +/[0-9]* rw, |
jdstrand
Aug 2, 2016
Contributor
This is curious, can you paste the denials that caused you to add this rule?
tianon
Aug 2, 2016
Contributor
Yeah, I can reproduce and paste them, although maybe this will help (from inside a container):
root@977b9f2d78fd:/# ls -l /proc/1/fd
total 0
lrwx------ 1 root root 64 Aug 2 20:23 0 -> /30
lrwx------ 1 root root 64 Aug 2 20:23 1 -> /30
lrwx------ 1 root root 64 Aug 2 20:23 2 -> /30
lrwx------ 1 root root 64 Aug 2 20:23 255 -> /30(inside Docker containers, file descriptors end up showing as numbered files in / for some reason
tianon
Aug 2, 2016
Contributor
Log: apparmor="DENIED" operation="chown" profile="snap.docker.dockerd" name="/0" pid=19284 comm="exe" requested_mask="w" denied_mask="w" fsuid=0 ouid=0
File: /0 (write)
jdstrand
Aug 2, 2016
Contributor
Ok, that makes sense then. They are showing up as /N due to attach_disconnected in the policy. Since this rule is odd looking, can you add a comment: '// file descriptors in the container show up here due to attach_disconnected'
jdstrand
reviewed
Aug 2, 2016
| + | ||
| +# Docker needs to be able to create and load the profile it applies to containers ("docker-default") | ||
| +# XXX We might be able to get rid of this if we generate and load docker-default ourselves and make docker not do it. | ||
| +/etc/apparmor.d/docker rw, |
jdstrand
reviewed
Aug 2, 2016
| +# XXX We might be able to get rid of this if we generate and load docker-default ourselves and make docker not do it. | ||
| +/etc/apparmor.d/docker rw, | ||
| +/sbin/apparmor_parser ixr, | ||
| +/etc/apparmor*/** r, |
jdstrand
Aug 2, 2016
Contributor
This is lenient-- why does docker need to read all the other profiles?
tianon
Aug 2, 2016
Contributor
Reading other config files from AppArmor happens when invoking apparmor_parser while Docker loads the docker profile it's generated -- if it'd be better to narrow this down to a more specific set, I'm happy to do so, but I think #include might end up making that a bit fragile in some cases, right? (especially if those files getting included ever change)
jdstrand
Aug 2, 2016
Contributor
If it is using abstractions, then perhaps change that to "/etc/apparmor.d/abstractions/* r,". You might need other rules-- let's see how fine-grained you can go. That said, if you change where the docker-default policy is (ie, out of /etc/apparmor.d to SNAP_DATA) then I suspect you may be able to get rid of these rules (or just have the abstractions rule).
jdstrand
reviewed
Aug 2, 2016
| +/[0-9]* rw, | ||
| + | ||
| +# Docker needs to be able to create and load the profile it applies to containers ("docker-default") | ||
| +# XXX We might be able to get rid of this if we generate and load docker-default ourselves and make docker not do it. |
jdstrand
reviewed
Aug 2, 2016
| +/sbin/apparmor_parser ixr, | ||
| +/etc/apparmor*/** r, | ||
| +/etc/apparmor.d/cache/docker* w, | ||
| +/sys/kernel/security/apparmor/** rw, |
jdstrand
reviewed
Aug 2, 2016
| +/etc/apparmor.d/cache/docker* w, | ||
| +/sys/kernel/security/apparmor/** rw, | ||
| + | ||
| +# We'll want to adjust this to support --security-opts... |
jdstrand
Aug 2, 2016
•
Contributor
Supporting --security-opts is very interesting. I imagine the best implementation would ensure that the profile name starts with 'docker.', similar to how we do with snapd profiles starting with 'snap.'. In this manner, docker could own that 'namespace' and not interfere with the policy of other snaps.
UPDATE: well, I say that, but currently we don't have mediation for "only load a profile that starts with 'docker.'" so this idea is more advisory (ie, when using --security-opts, someone could create $SNAP_DATA/etc/apparmor.d/docker.foo and docker.foo might have 'profile snap.bar.baz ...'. Docker would happily load that).
tianon
Aug 2, 2016
Contributor
Yeah, it's an interesting thought, but maybe best left relegated to the "privileged Docker" discussion given the implications?
jdstrand
Aug 2, 2016
Contributor
AIUI, this isn't necessarily related to privileged docker. Please correct me if I'm wrong, but I thought --security-opts was used by the admin to specify to use the profile name to use instead of 'docker-default' and it is up to the admin to make sure that profile is loaded. As such, if docker-default is in SNAP_DATA/etc/apparmor.d then one implementation would be for the admin to add profiles there and for the docker snap to make sure they were loaded on daemon startup (this could be simple shell code that runs before invoking docker-daemon).
I agree that --security-opts doesn't have to be implemented now, but hopefully the discussion will spark some ideas.
tianon
Aug 2, 2016
Contributor
--security-opt currently supports loading custom AppArmor profiles, custom SecComp profiles, SELinux labels, and for enabling "No New Privileges".
Loading custom AppArmor/SecComp profiles is IMO a "privileged" operation, given the potential danger implied, so that's why I lumped the two together.
jdstrand
reviewed
Aug 2, 2016
| +# We'll want to adjust this to support --security-opts... | ||
| +change_profile -> docker-default, | ||
| +signal (send) peer=docker-default, | ||
| +ptrace (read, trace) peer=docker-default, |
jdstrand
Aug 2, 2016
•
Contributor
Your ptrace rule above is a super set of this rule. FWIW, I prefer this rule.
tianon
Aug 2, 2016
Contributor
Yeah, I've missed something for sure -- I'm going to test the rule above again and figure out what I've done wrong so it can be more specific.
jdstrand
reviewed
Aug 2, 2016
| +/{,var/}run/docker.sock rw, | ||
| + | ||
| +@{PROC}/sys/net/core/somaxconn r, | ||
| +`) |
jdstrand
Aug 2, 2016
•
Contributor
I'd like to point out just how privileged this plug is. A snap with this plug would be able to drive docker and essentially own the system. I realize that you are simply reimplementing the 'client' portion of the 15.04 framework policy, but I've always disliked this policy.
Do we really need to expose the docker socket to snaps and if so, for what purpose? If the purpose is limited, perhaps either docker-daemon could limit its API when the connecting process' label starts with 'snap.' (or similar) or perhaps a small proxy service could listen on /run/docker/snap-proxy and that service would only expose a limited docker API to snaps and snaps would only be able to connect to it. This limited API would need to be designed of course and if evaluating the process' security label, please consult with the security team on design.
tianon
Aug 2, 2016
Contributor
Yeah, I honestly wasn't sure at all what else to do here. Docker's API and limiting abilities are pretty limited at this point, and trying to limit what Docker is capable of isn't something upstream's been terribly interested in previously. As-is, it will also require that the user is either a member of a pre-existing docker group, or is root (so sudo will be necessary), but that's probably not really sufficient either.
Implementing a wrapper on the Docker API is an interesting solution, but it's going to be very difficult to maintain as Docker updates, since their API is still a strong moving target (especially 1.12, which now includes the functionality of Swarm and features similar to what Compose provides).
jdstrand
Aug 2, 2016
Contributor
Note that snaps won't be in the docker group and don't have sudo access but snaps implementing daemons do run as root, so the DAC check is insufficient on its own.
To be clear, protecting the user from herself is not the goal. What I'm concerned about is untrusted snaps having free rein over the system if they 'plugs: [ docker ]'. One could argue that with series 16's manual interface connections everything is 'ok' since the admin has to do it herself, or future assertions work could allow an auto-connection, and this is a fair point and may ultimately be what we decide we want to do; it's just that the socket offers so much privilege). I'm just trying to understand the requirements....
Do we expect snaps to actually drive docker or is docker simply an application for an admin? In the 15.04 implementation people were thinking that the store could house docker containers or scripts to drive docker to install app containers but docker already has a store and people are used to it.
tianon
Aug 2, 2016
Contributor
Right, I mean that the daemon will run as root, it will see that the "docker" group exists, and will apply it to the unix socket created. Then, from the client side, only clients that have both the "docker" group and the appropriate AppArmor permission will be able to talk to the Docker socket (and thus control the daemon). If the "docker" group does not exist, the daemon will instead use "root", and so from the perspective of the users (or even other snaps) only root will be able to even connect to the Docker daemon.
IMO, anything that isn't docker itself (and perhaps a manually-reviewed set of related utilities) should require manual connection of that docker plug, especially given the system-wide implications.
jdstrand
Aug 3, 2016
Contributor
"IMO, anything that isn't docker itself (and perhaps a manually-reviewed set of related utilities) should require manual connection of that docker plug, especially given the system-wide implications."
Yes I completely agree. My question is around the fact that the interface is so privileged, why offer the plugs side at all if we don't actually need it?
tianon
Aug 9, 2016
Contributor
I figured that adding AppArmor and forced manual connection (or at least an easy way to "disconnect" if auto-connect is added later) on top of docker.sock was a desirable feature, especially given the implications of access to docker.sock, so having a way to restrict access to it on a per-snap basis on top of the per-user basis the docker group provides seemed like a really handy feature. If I'm mistaken in the belief that such a thing is useful, I won't be put out about removing it.
jdstrand
reviewed
Aug 2, 2016
| + | ||
| +var dockerPermanentSlotSecComp = []byte(` | ||
| +# The Docker daemon needs to be able to launch arbitrary processes within containers (whose syscall needs are unknown beforehand) | ||
| +@unrestricted |
jdstrand
Aug 2, 2016
Contributor
I advised you to do this because of the way that seccomp works and to unblock you. However, is there set of syscalls that we know we never want to allow? Eg, module loading or access to the kernel keyring?
tianon
Aug 2, 2016
Contributor
It would probably make sense to look at Docker's own seccomp whitelist and mirror that at least. If we do end up supporting "privileged Docker" somehow, we could put this kind of thing there instead, and have this just be the set of what Docker itself needs + what Docker itself whitelists by default?
(still a pretty long list, but should be shorter than what @unrestricted implies)
jdstrand
Aug 2, 2016
Contributor
Well, yeah. This gets into the realm of privileged containers too cause the way seccomp works, we can never get a syscall back once we block it so we need to offer everything docker would ever want to give out (or we would want docker to give out). It seems fairly reasonable to take everything from snap-confine.git/debian/README.syscalls and put it in docker's seccomp, but leave out module loading and keyring syscalls. @tyhicks do you have an opinion?
tianon
Aug 2, 2016
Contributor
This might actually be a bit more complicated than that. Docker needs to orchestrate iptables (including setting up an appropriate NAT chain for containers) at startup, and it's looking like it'll need to be able to load the appropriate kernel modules in order to do that.
Additionally, before I'd put @unrestricted here I had just trial-and-errored the set of things Docker itself requires, and I believe there were some keyring-related syscalls there, but I'll have to try again to confirm that for sure.
tianon
Aug 2, 2016
Contributor
To be slightly more specific regarding loading modules, here's the precise Docker daemon logs where it's using modprobe, getting permission denied, ignoring the error and hitting iptables anyhow, then failing:
time="2016-08-02T14:46:55.740368931-07:00" level=warning msg="Running modprobe bridge br_netfilter failed with message: , error: fork/exec /sbin/modprobe: permission denied"
time="2016-08-02T14:46:55.741108535-07:00" level=warning msg="Running modprobe nf_nat failed with message: ``, error: fork/exec /sbin/modprobe: permission denied"
time="2016-08-02T14:46:55.741794046-07:00" level=warning msg="Running modprobe xt_conntrack failed with message: ``, error: fork/exec /sbin/modprobe: permission denied"
time="2016-08-02T14:46:55.742125959-07:00" level=debug msg="Fail to initialize firewalld: Failed to connect to D-Bus system bus: dial unix /var/run/dbus/system_bus_socket: connect: permission denied, using raw iptables instead"
time="2016-08-02T14:46:55.743626077-07:00" level=debug msg="/sbin/iptables, [--version]"
time="2016-08-02T14:46:55.745027672-07:00" level=debug msg="/sbin/iptables, [-t nat -D PREROUTING -m addrtype --dst-type LOCAL -j DOCKER]"
time="2016-08-02T14:46:55.746875503-07:00" level=debug msg="/sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL ! --dst 127.0.0.0/8 -j DOCKER]"
time="2016-08-02T14:46:55.748674940-07:00" level=debug msg="/sbin/iptables, [-t nat -D OUTPUT -m addrtype --dst-type LOCAL -j DOCKER]"
time="2016-08-02T14:46:55.750387276-07:00" level=debug msg="/sbin/iptables, [-t nat -D PREROUTING]"
time="2016-08-02T14:46:55.751859031-07:00" level=debug msg="/sbin/iptables, [-t nat -D OUTPUT]"
time="2016-08-02T14:46:55.753300771-07:00" level=debug msg="/sbin/iptables, [-t nat -F DOCKER]"
time="2016-08-02T14:46:55.754844062-07:00" level=debug msg="/sbin/iptables, [-t nat -X DOCKER]"
time="2016-08-02T14:46:55.756263058-07:00" level=debug msg="/sbin/iptables, [-t filter -F DOCKER]"
time="2016-08-02T14:46:55.757613661-07:00" level=debug msg="/sbin/iptables, [-t filter -X DOCKER]"
time="2016-08-02T14:46:55.759091740-07:00" level=debug msg="/sbin/iptables, [-t filter -F DOCKER-ISOLATION]"
time="2016-08-02T14:46:55.760441146-07:00" level=debug msg="/sbin/iptables, [-t filter -X DOCKER-ISOLATION]"
time="2016-08-02T14:46:55.761857607-07:00" level=debug msg="/sbin/iptables, [-t nat -n -L DOCKER]"
time="2016-08-02T14:46:55.763431457-07:00" level=debug msg="/sbin/iptables, [-t nat -N DOCKER]"
time="2016-08-02T14:46:55.765514246-07:00" level=debug msg="Cleaning up old mountid : start."
time="2016-08-02T14:46:55.765758158-07:00" level=fatal msg="Error starting daemon: Error initializing network controller: error obtaining controller instance: failed to create NAT chain: iptables failed: iptables -t nat -N DOCKER: iptables v1.6.0: can't initialize iptables table `nat': Permission denied\nPerhaps iptables or your kernel needs to be upgraded.\n (exit status 3)"
(and this is with the firewall-control plug connected)
tianon
Aug 2, 2016
Contributor
And this is what I'm getting after adding the following:
/bin/kmod ixr,
/etc/modprobe.d/ r,
/etc/modprobe.d/** r,
/lib/modules/** r,
time="2016-08-02T15:31:58.555640406-07:00" level=warning msg="Running modprobe bridge br_netfilter failed with message: modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-31-generic/modules.dep.bin'\nmodprobe: WARNING: Module bridge not found in directory /lib/modules/4.4.0-31-generic\nmodprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-31-generic/modules.dep.bin'\nmodprobe: WARNING: Module br_netfilter not found in directory /lib/modules/4.4.0-31-generic\n, error: exit status 1"
time="2016-08-02T15:31:58.558195696-07:00" level=warning msg="Running modprobe nf_nat failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-31-generic/modules.dep.bin'\nmodprobe: WARNING: Module nf_nat not found in directory /lib/modules/4.4.0-31-generic`, error: exit status 1"
time="2016-08-02T15:31:58.560394284-07:00" level=warning msg="Running modprobe xt_conntrack failed with message: `modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/4.4.0-31-generic/modules.dep.bin'\nmodprobe: WARNING: Module xt_conntrack not found in directory /lib/modules/4.4.0-31-generic`, error: exit status 1"
jdstrand
Aug 3, 2016
•
Contributor
We don't want docker loading modules itself I don't think. It seems it is only going that route because it can't connect to /var/run/dbus/system_bus_socket. Let's see what happens when you adjust the policy for that and add dbus policy for firewalld.
Regarding firewall-control, it still needs to load ip_tables and ip6_tables via the unimplemented module loading backend (see https://bugs.launchpad.net/snappy/+bug/1583514). What this backend would do is simply take a list of modules that are hardcoded in the interface and add them to /etc/modules-load.d/snap..conf (eg snap.firewall-control.conf, snap.ppp.conf, etc).
I suspect if you allow talking to systemd, this will start to work. Else, would need to implement the SecurityKernelModule backend in the manner described and update firewall-control. The other modules should then auto-load, but if they don't, docker can list its own modules for snapd to load via the SecurityKernelModule backend.
tianon
Aug 9, 2016
Contributor
Ah, interesting -- I'll play around with dbus and chatting with firewalld directly and see what I can make work!
tianon
Aug 10, 2016
Contributor
It looks like we're going to have to go with the SecurityKernelModule route:
Ubuntu Snappy Classic:
$ systemctl status firewalld
● firewalld.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)Ubuntu Snappy "all-snaps":
$ systemctl status firewalld
● firewalld.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)(ie, firewalld doesn't seem to actually be present by default in either Classic or standard Snappy)
jdstrand
reviewed
Aug 2, 2016
| + | ||
| +func (iface *DockerInterface) ConnectedSlotSnippet(plug *interfaces.Plug, slot *interfaces.Slot, securitySystem interfaces.SecuritySystem) ([]byte, error) { | ||
| + switch securitySystem { | ||
| + case interfaces.SecurityAppArmor, interfaces.SecurityDBus, interfaces.SecurityMount, interfaces.SecuritySecComp, interfaces.SecurityUDev: |
jdstrand
Aug 2, 2016
Contributor
Can you add a comment here: '// The docker socket is a named socket and therefore mediated by AppArmor file rules and we can't currently limit connecting clients by their security label'
|
Lastly, the 15.04 docker snap had a concept of unprivileged and privileged. Have you given any thought as to how to handle this? If we don't differentiate then we'll want to support just privileged. One thought on differentiating would be to have an interface attribute for privileged and when specified, you get the privileged policy. I think this would be troublesome in practice because there would need to be two snaps: once that used the attribute and one that didn't since there isn't a way for the user to toggle the attribute at this time (and I'm not sure that is something we'd want to support anyway). In general I think I am of the mind that the docker interface is simply a super-special, highly-privileged interface. |
Yeah, it does (both optionally via
The problem with just supporting privileged is that we'll need to essentially just remove 90% of all this and go almost entirely unconfined. If the goal is to 100% support privileged Docker, then I need to be able to do crazy things like directly enter the namespace of the host's init (which is currently pretty trivially possible with The only real possibility I can think of is to have a separate |
|
Regarding docker vs docker privileged, you make a good point. I guess most docker users don't need privileged mode. In that light, yes I think it would need to be a separate interface, but I think we would want to have it not auto-connect and instead only have policy on the plugs side so might have:
Then on install the user needs to do this for privileged mode:
or something. Point is, regular install is non-privileged and a snap connect command is needed for privileged. |
morphis
reviewed
Aug 4, 2016
| +ptrace (trace) peer=@{profile_name}, | ||
| + | ||
| +# Docker needs a lot of caps, but limits them in the app container | ||
| +capability, |
morphis
Aug 4, 2016
Contributor
This involves the different capabilities to spawn up namespaces, right? I am wondering if we shouldn't move such things in a container-control interface so that we can have different container approaches (LXC/LXD, docker, ...) use the same interface rather than splitting the same rules across multiple interfaces.
jdstrand
Aug 4, 2016
Contributor
I initially suggested a generic container interface but was asked to instead have people implement separate docker and lxd interfaces, for at the very least, the sockets involved. Older versions of docker only used system namespaces but lxd defaults to userns, so the policies are expected to be different. That said, there is nothing saying that we can't tuck the shared policy into snapd somewhere to that the docker and lxd interfaces can pull it in, but that is an implementation detail and until we see lxd and docker interfaces side by side, we won't know if that is worth it.
morphis
reviewed
Aug 4, 2016
| +# Docker does all kinds of mounts all over the filesystem | ||
| +/dev/mapper/control rw, | ||
| +/dev/mapper/docker* rw, | ||
| +/dev/loop* r, |
morphis
Aug 4, 2016
Contributor
I am wondering if it wouldn't make sense to have a loop-mount interface which allows those things?
We should at least make sure that this point the snap can only mount into its own writable directories (see http://wiki.apparmor.net/index.php/AppArmor_Core_Policy_Reference#Mount_rules_.28AppArmor_2.8_and_later.29). @jdstrand You agree with that?
jdstrand
Aug 4, 2016
Contributor
I'm not sure if loop-mount interface is worth it. I mean, docker and lxd will need it and maybe one or two other container technologies, but this is quite privileged and I'd like to understand the exact requirements before designing that interface.
jdstrand
Aug 16, 2016
Contributor
Can we change this to: /dev/loop-control r, and the following rule to /dev/loop[0-9]* rw,. It makes for easier policy auditing.
morphis
reviewed
Aug 4, 2016
| +mount, | ||
| +umount, | ||
| +pivot_root, | ||
| +/.pivot_root*/ rw, |
jdstrand
Aug 4, 2016
Contributor
Your point about the mount rules is fair though-- @tianon, can the mount rules be fine-tuned to be limited to SNAP_DATA or similar?
tianon
Aug 10, 2016
Contributor
Come to mention it, I think this particular part happens on the other side of the daemon-to-container boundary, so we might be able to remove this rule entirely and let the docker-default profile Docker manages take over this bit and remove it from here completely. I'll do some testing and verify that!
Regardless, this path is used inside the mount namespace of a container for pivoting the container from the host's / to the real container's root, so it's technically isolated that way already, but I'll still do some testing to see if we can remove it from this profile completely (because I think we can).
tianon
Aug 10, 2016
Contributor
Ok, I did some testing, and this is still necessary, but I did make it slightly more specific (/.pivot_root[0-9]*/) and added a better comment to explain what's happening:
# After doing a pivot_root using <graph-dir>/<container-fs>/.pivot_rootNNNNNN,
# Docker removes the leftover /.pivot_rootNNNNNN directory (which is now
# relative to "/" instead of "<graph-dir>/<container-fs>" thanks to pivot_root)
pivot_root,
/.pivot_root[0-9]*/ rw,
|
Hi all, I'd like to summarize discussions and share my own opinion. Discussions seem to have dried down; it seems consensus is:
Concerning 2), we can't hold the Docker snap unconfined for too long and it looks like work on https://bugs.launchpad.net/snappy/+bug/1583514 hasn't started. Can we allow module loading in the docker interfaces for now and remove these when a firewall or module loading interface is available? I used the current Docker snap in edge extensively this week and I have these pieces of feedback: ii) dockerd seems to load another appamor profile for the actual containers; I believe this is "docker-default"; docker-default misses some useful syscalls preventing e.g. to kill processes (you can repo with this snippet in a container:
(Interestingly, the process can't even be killed from a root shell outside of the container because of the AppArmor profile guarding this!) iii) I don't think we should keep docker.sock in /var/run or /run; first, this conflicts with the .deb when running snapd on server installs, so you can't have both the docker snap and the docker .deb; second, in some cases it might be useful to embed dockerd in other snaps and run it on a different socket/port (e.g. kubernetes runs multiple docker daemons; this could also be used to control the version of dockerd) TLDR; is it enough to workaround the iptables module loading issue right now and improve this in updates? Thanks,
|
|
@lool Actually its not really much work to implement the kernel module loading part for interfaces as described in https://bugs.launchpad.net/snappy/+bug/1583514 My vote would be to get this implemented rather than taking another workaround until we wait for it like we did already when we've added the ppp/modemmanager interfaces. |
|
@morphis On rereading https://bugs.launchpad.net/snappy/+bug/1583514, it seems indeed relatively straightforward; I've also checked that builtin modules wouldn't cause an error (modprobing a builtin module returns success). |
|
"ii) dockerd seems to load another appamor profile for the actual containers; I believe this is "docker-default"; docker-default misses some useful syscalls preventing e.g. to kill processes (you can repo with this snippet in a container: Assuming seccomp, @tianon and I discussed this and I thought we were using @unrestricted for now. However, it would probably be best to add everything from https://github.com/snapcore/snap-confine/blob/master/debian/README.syscalls and then comment out some as dangerous:
" iii) I don't think we should keep docker.sock in /var/run or /run; first, this conflicts with the .deb when running snapd on server installs, so you can't have both the docker snap and the docker .deb; second, in some cases it might be useful to embed dockerd in other snaps and run it on a different socket/port (e.g. kubernetes runs multiple docker daemons; this could also be used to control the version of dockerd) I'm conflicted on this. People who have developed software around docker may want to use the docker snap instead of the deb and might expect the socket to be in the same location. As for kubernetes, it is free to put its sockets in SNAP_DATA if it wants, so I don't think there is a limitation there? This should be discussed with the snappy architects (conflicting with debs) and won't be solved in this PR. "TLDR; is it enough to workaround the iptables module loading issue right now and improve this in updates? I agree with Simon-- someone should implement the kernel module loading backend for interfaces. We are needing it for several things. |
jdstrand
reviewed
Aug 15, 2016
| + "github.com/snapcore/snapd/interfaces" | ||
| +) | ||
| + | ||
| +var dockerPermanentSlotAppArmor = []byte(` |
tianon
Aug 15, 2016
Contributor
Not for []byte, but I can switch these to string and use a cast in the functions instead -- would that be preferable?
interfaces/builtin/docker.go:96: const initializer ([]byte)("\n# Description: Allow operating as the Docker daemon. Reserved because this\n# gives privileged access to the system.\n# Usage: reserved\n\n# Allow sockets\n/{,var/}run/docker.sock rw,\n/{,var/}run/docker/ rw,\n/{,var/}run/docker/** mrwklix,\n/{,var/}run/runc/ rw,\n/{,var/}run/runc/** mrwklix,\n\n# Wide read access to /proc, but somewhat limited writes for now\n@{PROC}/ r,\n@{PROC}/** r,\n@{PROC}/[0-9]*/attr/exec w,\n@{PROC}/sys/net/** w,\n@{PROC}/[0-9]*/cmdline r,\n@{PROC}/[0-9]*/oom_score_adj w,\n\n# Wide read access to /sys\n/sys/** r,\n# Limit cgroup writes a bit\n/sys/fs/cgroup/*/docker/ rw,\n/sys/fs/cgroup/*/docker/** rw,\n\n# Allow tracing ourself (especially the \"runc\" process we create)\nptrace (trace) peer=@{profile_name},\n\n# Docker needs a lot of caps, but limits them in the app container\ncapability,\n\n# Docker does all kinds of mounts all over the filesystem\n/dev/mapper/control rw,\n/dev/mapper/docker* rw,\n/dev/loop* r,\n/dev/loop[0-9]* w,\nmount,\numount,\n\n# After doing a pivot_root using <graph-dir>/<container-fs>/.pivot_rootNNNNNN,\n# Docker removes the leftover /.pivot_rootNNNNNN directory (which is now\n# relative to \"/\" instead of \"<graph-dir>/<container-fs>\" thanks to pivot_root)\npivot_root,\n/.pivot_root[0-9]*/ rw,\n\n# file descriptors (/proc/NNN/fd/X)\n/[0-9]* rw,\n# file descriptors in the container show up here due to attach_disconnected\n\n# Docker needs to be able to create and load the profile it applies to containers (\"docker-default\")\n# XXX We might be able to get rid of this if we generate and load docker-default ourselves and make docker not do it.\n/sbin/apparmor_parser ixr,\n/etc/apparmor.d/cache/ r,\n/etc/apparmor.d/cache/.features r,\n/etc/apparmor.d/cache/docker* rw,\n/etc/apparmor/parser.conf r,\n/etc/apparmor/subdomain.conf r,\n/sys/kernel/security/apparmor/.replace rw,\n\n# We'll want to adjust this to support --security-opts...\nchange_profile -> docker-default,\nsignal (send) peer=docker-default,\nptrace (read, trace) peer=docker-default,\n\n# Graph (storage) driver bits\n/dev/shm/aufs.xino rw,\n\n#cf bug 1502785\n/ r,\n") is not a constant
interfaces/builtin/docker.go:107: const initializer ([]byte)("\n# Description: Allow using Docker. Reserved because this gives\n# privileged access to the service/host.\n# Usage: reserved\n\n# Obviously need to be able to talk to the daemon\n/{,var/}run/docker.sock rw,\n\n@{PROC}/sys/net/core/somaxconn r,\n") is not a constant
interfaces/builtin/docker.go:112: const initializer ([]byte)("\n# The Docker daemon needs to be able to launch arbitrary processes within containers (whose syscall needs are unknown beforehand)\n@unrestricted\n") is not a constant
interfaces/builtin/docker.go:117: const initializer ([]byte)("\nsetsockopt\nbind\n") is not a constant
jdstrand
Aug 15, 2016
Contributor
Yes, sorry, I meant to say that too. There are inconsistencies in the current interfaces where some are var []byte and other are const strings and we'll be moving all these policy variables to const strings based on feedback from Gustavo in the most recent pulseaudio PR.
tianon
Aug 15, 2016
Contributor
Cool, will do! (and this time with a new commit instead of a force push
jdstrand
reviewed
Aug 15, 2016
| +/ r, | ||
| +`) | ||
| + | ||
| +var dockerConnectedPlugAppArmor = []byte(` |
jdstrand
reviewed
Aug 15, 2016
| +@{PROC}/sys/net/core/somaxconn r, | ||
| +`) | ||
| + | ||
| +var dockerPermanentSlotSecComp = []byte(` |
jdstrand
reviewed
Aug 15, 2016
| +@unrestricted | ||
| +`) | ||
| + | ||
| +var dockerConnectedPlugSecComp = []byte(` |
|
Failing to kill processes inside the container is definitely something missing in the Regarding seccomp, I'm not going to be a good judge of what's considered "dangerous" and what isn't, but I would be happy to compile the two lists of "seccomp privileges Docker itself needs" and "Docker's default seccomp whitelist for containers" -- I think that'd be the smallest possible set we could reasonably apply by default, and then we could leave the larger set (or even |
|
"Regarding seccomp, I'm not going to be a good judge of what's considered "dangerous" and what isn't, but I would be happy to compile the two lists of "seccomp privileges Docker itself needs" and "Docker's default seccomp whitelist for containers" -- I think that'd be the smallest possible set we could reasonably apply by default, and then we could leave the larger set (or even @unrestricted) to the docker-privileged interface. Thoughts?" Yes, I like this approach. Let's use the larger set I outlined rather than |
tianon
added some commits
Aug 15, 2016
|
As a matter of organization, should I put the |
jdstrand
reviewed
Aug 16, 2016
| +@{PROC}/** r, | ||
| +@{PROC}/[0-9]*/attr/exec w, | ||
| +@{PROC}/sys/net/** w, | ||
| +@{PROC}/[0-9]*/cmdline r, |
jdstrand
reviewed
Aug 16, 2016
| +@{PROC}/ r, | ||
| +@{PROC}/** r, | ||
| +@{PROC}/[0-9]*/attr/exec w, | ||
| +@{PROC}/sys/net/** w, |
jdstrand
Aug 16, 2016
Contributor
What, if anything, of this is not covered by the 'network-control' interface?
jdstrand
reviewed
Aug 16, 2016
| +@{PROC}/[0-9]*/oom_score_adj w, | ||
| + | ||
| +# Wide read access to /sys | ||
| +/sys/** r, |
jdstrand
Aug 16, 2016
Contributor
Curious why this needs to be so broad in the 'unprivileged' policy.
jdstrand
reviewed
Aug 16, 2016
| +/dev/loop* r, | ||
| +/dev/loop[0-9]* w, | ||
| +mount, | ||
| +umount, |
jdstrand
Aug 16, 2016
Contributor
With the 'unprivileged' interface, are the mounts expected to be in a particular location? Eg, in SNAP_DATA, HOME, SNAP_USER_DATA, somewhere?
jdstrand
reviewed
Aug 16, 2016
| + | ||
| +# file descriptors (/proc/NNN/fd/X) | ||
| +/[0-9]* rw, | ||
| +# file descriptors in the container show up here due to attach_disconnected |
jdstrand
Aug 16, 2016
Contributor
Can you move this comment up above the rule it is meant to refer to?
jdstrand
reviewed
Aug 16, 2016
| +/etc/apparmor.d/cache/docker* rw, | ||
| +/etc/apparmor/parser.conf r, | ||
| +/etc/apparmor/subdomain.conf r, | ||
| +/sys/kernel/security/apparmor/.replace rw, |
jdstrand
Aug 16, 2016
Contributor
What happened with the idea of docker not loading the policy? If that is unacceptable, I might prefer to create a child profile for apparmor_parser (I can help with that). If we go that route, we'd want to add a audit deny capability mac_admin, rule to the parent profile.
In many ways, with the mount rules, mac_admin, etc this is security theater but instead of thinking of it that way, I'd prefer to think of the docker policy as 'advisory' where we guard against a misbehaving docker as opposed to guarding against a malicious or attacker-controlled docker. We should probably adjust the comments in this file and docs/interfaces.md to mention that....
tianon
Aug 16, 2016
Contributor
It's possible to stop Docker from loading the policy, but the patches are going to be a bit invasive (and upstream prefers that we keep our patches to a minimum, which is understandable). A child profile for apparmor_parser sounds interesting, but way over my head ATM, so any help or more specific direction you can provide would be really appreciated.
jdstrand
Aug 17, 2016
Contributor
If you give me a series 16 snap I can play with the child profile tomorrow.
tianon
Aug 17, 2016
Contributor
https://github.com/infosiftr/snap-docker/tree/docker-interface has the current WIP docker-interface-using snap code -- I can give you a built .snap if you'd prefer, but I figure this'd give you more freedom to experiment
(probably want to clone that with -b docker-interface --single-branch so that it doesn't clone the massive 15.04 branch...
jdstrand
Aug 29, 2016
•
Contributor
@tianon - I built the docker snap and built this branch. I did:
$ sudo /tmp/snap install /tmp/docker*snap
$ sudo /tmp/snap connect docker:firewall-control ubuntu-core:firewall-control
$ sudo /tmp/snap connect docker:docker docker:docker
$ sudo systemctl stop snap.docker.dockerd
$ sudo systemctl start snap.docker.dockerd
The workaround a bug in the template policy (PR already done) and add to /var/lib/snapd/apparmor/profiles/snap.docker.docker and dockerd: /run/systemd/journal/socket w, and reload the policy with sudo apparmor_parser -r /var/lib/snapd/apparmor/profiles/snap.docker.*.
I can download an image fine with:
$ sudo /snap/bin/docker pull ubuntu:xenial
$ sudo /snap/bin/docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu xenial bd3d4369aebc 3 days ago 126.6 MB
But running a command fails with:
$ sudo /snap/bin/docker run -i -t ubuntu:xenial uptime
docker: Error response from daemon: Container command 'uptime' not found or does not exist..
Am I doing something wrong?
lool
Aug 30, 2016
Contributor
@jdstrand I have the same issues since some days; I couldn't identify what caused the regression, but the docker snap in edge channel is broken too in the same way (can't run hello-world image). The first error is auplink not being in PATH, hence snapcore/snapcraft#763
the next error is cryptic and I can't figure it out.
Note however that this only affects classic; the snap works on Ubuntu Core!
jdstrand
Sep 12, 2016
Contributor
@tianon - I've decided that a child profile for apparmor_parser will not be meaningful so we shouldn't do it.
jdstrand
reviewed
Aug 16, 2016
| + | ||
| +const dockerPermanentSlotAppArmor = ` | ||
| +# Description: Allow operating as the Docker daemon. Reserved because this | ||
| +# gives privileged access to the system. |
jdstrand
Aug 16, 2016
Contributor
Can you update this to be:
# Description: allow operating as the Docker daemon. This policy is intentionally
# not restrictive and is here to help guard against programming errors and not
# for security confinement. The Docker daemon by design requires extensive
# access to the system and cannot be effectively confined against malicious
# activity.
jdstrand
reviewed
Aug 16, 2016
| +@{PROC}/sys/net/core/somaxconn r, | ||
| +` | ||
| + | ||
| +const dockerPermanentSlotSecComp = ` |
jdstrand
Aug 16, 2016
Contributor
Can you add this at the top:
# Description: allow operating as the Docker daemon. This policy is intentionally
# not restrictive and is here to help guard against programming errors and not
# for security confinement. The Docker daemon by design requires extensive
# access to the system and cannot be effectively confined against malicious
# activity.
jdstrand
reviewed
Aug 16, 2016
| +# /snap/docker/VERSION/bin/docker-runc | ||
| +# "do not inherit the parent's session keyring" | ||
| +# "make session keyring searcheable" | ||
| +keyctl |
tianon
Aug 16, 2016
Contributor
Yeah, I figured you weren't going to like it as soon as I saw that.
Here's the exact code in question:
jdstrand
Aug 29, 2016
Contributor
Reading that code it seems that docker is trying to make sure the container doesn't have access to the keyring. Can you clarify the comment on this and we'll allow it.
jdstrand
reviewed
Aug 16, 2016
| +getuid | ||
| +getuid32 | ||
| +getxattr | ||
| +init_module |
jdstrand
Aug 16, 2016
•
Contributor
Can we exclude init_module, finit_module, create_module, query_module and delete_module (move to the top, add a note about why and list these syscalls but commented out) and leave that to the 'privileged' interface? The (to be implemented) KernelModules backend can then be used for what 'unprvileged' needs (most things should autoload, we only should need KerneModules for those that don't).
jdstrand
reviewed
Aug 16, 2016
| +process_vm_readv | ||
| +process_vm_writev | ||
| +pselect6 | ||
| +ptrace |
jdstrand
Aug 16, 2016
•
Contributor
This is curious since ptrace can be used to escape the seccomp sandbox until something along the lines of https://lkml.org/lkml/2016/5/26/354 is applied.
Can you add a comment for this:
# ptrace can be abused to break out of the seccomp sandbox
# but is required by the Docker daemon.
ptrace
tianon
added some commits
Aug 16, 2016
jdstrand
reviewed
Aug 16, 2016
| +writev | ||
| +` | ||
| + | ||
| +const dockerConnectedPlugSecComp = ` |
jdstrand
Aug 16, 2016
Contributor
Please add this comment:
# Description: allow access to the Docker daemon socket. This gives
# privileged access to the system via Docker's socket API.
tianon
Aug 16, 2016
Contributor
GitHub doesn't reflect it since it changed the next line, not this one, but this is added.
jdstrand
reviewed
Aug 16, 2016
| + | ||
| +const dockerConnectedPlugAppArmor = ` | ||
| +# Description: Allow using Docker. Reserved because this gives | ||
| +# privileged access to the service/host. |
jdstrand
Aug 16, 2016
•
Contributor
Can you change this to:
# Description: allow access to the Docker daemon socket. This gives
# privileged access to the system via Docker's socket API.
jdstrand
reviewed
Aug 16, 2016
| +# privileged access to the service/host. | ||
| +# Usage: reserved | ||
| + | ||
| +# Obviously need to be able to talk to the daemon |
jdstrand
reviewed
Aug 16, 2016
| +# Obviously need to be able to talk to the daemon | ||
| +/{,var/}run/docker.sock rw, | ||
| + | ||
| +@{PROC}/sys/net/core/somaxconn r, |
jdstrand
Aug 16, 2016
•
Contributor
This is actually part of both network and network-bind. Docker daemon allows fetching images so I think removing this and having the slot implementation also use plugs: [ network ] would be the way to go.
jdstrand
reviewed
Aug 16, 2016
| +munlock | ||
| +munlockall | ||
| +munmap | ||
| +name_to_handle_at |
jdstrand
Aug 16, 2016
Contributor
Please move name_to_handle_at and open_by_handle_at up to the top of the policy and disable them with:
# These have a history of vulnerabilities, are not widely used, and
# open_by_handle_at has been used to break out of docker containers by brute
# forcing the handle value: http://stealth.openwall.net/xSports/shocker.c
#name_to_handle_at
#open_by_handle_at
jdstrand
reviewed
Aug 16, 2016
| +readlink | ||
| +readlinkat | ||
| +readv | ||
| +reboot |
tianon
Aug 16, 2016
Contributor
I'm not actually sure -- it's included in Docker's default whitelist, which as I understand was built via trial and error, but I imagine we could probably get away with not including it?
jdstrand
Aug 29, 2016
Contributor
Looking at https://github.com/docker/docker/blob/master/docs/security/seccomp.md, reboot is not allowed by default. I then compared that list to this list and found that the following are allowed in this policy but not allowed in the docker default policy:
acct
adjtimex
bpf
clone
ioperm
iopl
kcmp
lookup_dcookie
mount
perf_event_open
personality
process_vm_readv
process_vm_writev
reboot
setns
settimeofday
umount
umount2
unshare
Since you say in this file "This list comes from Docker's default seccomp whitelist" I'd like to better understand these. First off, I think that the comment should say this instead: "This list comes from Docker's default seccomp whitelist plus syscalls required for docker to properly function." As such, it seems quite clear why clone, mount, setns, umount, umount2 and unshare are needed by docker to funtion. Can you comment on the others?
tianon
Sep 2, 2016
Contributor
The markdown file is definitely out of date -- I scraped these directly from Docker's profile code, which is in Go. I don't know why many of them are here, and it's probably safe to remove them, but then we'll be applying a smaller mask to Docker containers than Docker does (which may or may not be acceptable -- things like settimeofday are probably safe to deny by default, especially if our privileged solution adds them back). I'm happy to adjust it either way you'd like (and point to the markdown file instead of the Go code, since I think our comment currently points to the Go code I scraped this list from).
jdstrand
reviewed
Aug 16, 2016
| +# $ grep -C1 'ActAllow' -- profiles/seccomp/seccomp_default.go \ | ||
| +# | grep 'Name:' \ | ||
| +# | cut -d'"' -f2 \ | ||
| +# | sort -u |
jdstrand
Aug 16, 2016
•
Contributor
Please add:
# Because seccomp may only go more strict, we must allow all
# all syscalls to Docker that it expects to give to containers
# in addition to what it needs to run and trust that docker daemon
# only gives out reasonable syscalls to containers.
jdstrand
Aug 16, 2016
•
Contributor
Along those lines-- perhaps it would be nice to instead of giving a sorted list, use this outline:
- all syscalls we aren't going to allow (ie commented out) with a comment why (eg, name_to_handle_at, kernel module calls, etc)
- all syscalls docker daemon itself needs to run (eg, (presumably) mount, umount, umount2, etc)
- the syscalls docker daemon expects to give to containers
This will allow for auditing the policy a bit better and gives clues as to where to look in docker deeper inspection.
jdstrand
reviewed
Aug 16, 2016
| + | ||
| +func (iface *DockerInterface) PermanentPlugSnippet(plug *interfaces.Plug, securitySystem interfaces.SecuritySystem) ([]byte, error) { | ||
| + switch securitySystem { | ||
| + case interfaces.SecurityAppArmor, interfaces.SecurityDBus, interfaces.SecurityMount, interfaces.SecuritySecComp, interfaces.SecurityUDev: |
jdstrand
Aug 16, 2016
Contributor
Can you run go fmt on these files? I think that will line wrap these better.
tianon
Aug 16, 2016
Contributor
gofmt doesn't seem to mind this; I even used -s and it didn't change anything.
I'll play a bit and see if I can get something with a bit more whitespace that gofmt is still happy with.
jdstrand
reviewed
Aug 16, 2016
| + snippet, err = s.iface.ConnectedSlotSnippet(s.plug, s.slot, "foo") | ||
| + c.Assert(err, Equals, interfaces.ErrUnknownSecurity) | ||
| + c.Assert(snippet, IsNil) | ||
| +} |
jdstrand
Aug 16, 2016
Contributor
Can you add an auto-connect test? Eg,
func (s *DockerInterfaceSuite) TestAutoConnect(c *C) {
c.Check(s.iface.AutoConnect(), Equals, false)
}
jdstrand
Aug 16, 2016
Contributor
Please also add a test for TestConnectedPlugSnippet() and search for the docker socket rule for apparmor and bind for seccomp. You can see an example of this sort of thing in mpris_test.go (but there are others).
Please do the same for TestPermanentSlotSnippet() with an appropriate string search for both apparmor and seccomp.
tianon
Aug 17, 2016
Contributor
These were added in a9e1b29, although I'm not positive that I've done them the way you wanted, so comments are definitely appreciated.
jdstrand
reviewed
Aug 16, 2016
| + c.Assert(s.iface.Name(), Equals, "docker") | ||
| +} | ||
| + | ||
| +// TODO more tests regarding the contents of Docker's security profiles (especially AppArmor)? |
jdstrand
Aug 16, 2016
Contributor
I think any tests for security profiles you'll want to put in the spread tests because the unit tests won't have the privileges needed to exercise this properly.
tianon
Aug 17, 2016
Contributor
Heh, I was actually referring more to the TestConnectedPlugSnippet and TestPermanentSlotSnippet you asked me to add (which the example I'd started this file from had but I wasn't sure exactly what I should do with). If there are also some kind of spread tests I should add, pointers would be appreciated.
|
"As a matter of organization, should I put the docker-privileged interface in this same file (docker.go), or a separate docker_privileged.go file?" That is a good question. It is easy enough to have it in one file (eg, take a look at the recent browser-support interface with the 'allow-sandbox' attribute), but the question then becomes, what does this look like to the user and the developer? We could do something like this (option 1. Note there are a lot of ways to do the yaml due to snappy's flexibility, but I'm striving for clarity over efficiency in all of these):
On the one hand this is kinda nice because everything is in one snap and the interfaces are clean, but all this is on the slot side policy and today, permanent slot policy is autoconnected so the above doesn't work (docker-daemon and docker-daemon-privileged get started at the same time). We could instead do something like (option 2):
This would auto-connect the PermanentSlot policy ( Another alternative (option 3) is to have a
This would auto-connect the PermanentSlot policy ( A variation on this (option 4) would be:
where the PermanentSlot policy for docker-privileged is empty and the interface connection is: Of these I prefer option 2. Option 4 has equivalent user experience with confusing yaml and implementation. Option 3 has an unwanted implicit slot. Option 1 doesn't work right with how slots are auto-connected. One thing about option 2 is that if we really hate the snap connect line, we can perhaps refine |
|
@zyga and @niemeyer (cc @ejratl) - can you weigh in on my (lengthy) discussion regarding docker-privileged as an interface attribute vs a separate interface? Important context:
|
tianon
added some commits
Aug 16, 2016
|
Regarding privileged discussion, here are my thoughts: Option 1 isn't really tenable from the perspective of Docker IMO since the two daemons would have entirely different state and different sockets for access (and not all folks even need privileged, so that second daemon would likely sit dormant anyhow), but even more importantly would store images separately, so the storage toll on the user is likely to balloon out of control quickly, not to mention usability suffering (have to explicitly point the client at one daemon or the other). Option 2 sounds the most appealing from my naive perspective -- easy for users to enable/disable the privileged behavior, even if the |
|
Marking as 'Blocked' for now since it isn't ready to land. I know I have some things to circle back on with this PR and will in the coming days. |
jdstrand
added
the
Blocked
label
Aug 18, 2016
niemeyer
changed the title from
Add initial "docker" interface based on some of 15.04's privileges
to
interfaces/builtin: add initial docker interface
Aug 26, 2016
niemeyer
added
the
Decaying
label
Aug 26, 2016
jdstrand
removed
the
Decaying
label
Aug 29, 2016
|
@tianon - where did we land on 'privileged' wrt to this PR? Will it be included here or in a future PR? |
chuckbutler
commented
Aug 31, 2016
•
|
[EDIT] |
|
Also, for docker to start at all someone needs to implement: https://bugs.launchpad.net/snappy/+bug/1583514/comments/5 (notes on how to implement KernelModule backed). |
|
Ok, I was able to get this running on all snaps (using overlayfs to replace snapd) if I added
|
|
(FYI the /sbin in wrappers was merged in snapcraft snapcore/snapcraft#763) |
tianon
added some commits
Sep 2, 2016
|
@jdstrand regarding privileged -- I think there was still a decision to be made regarding which direction we took for implementing it (#1619 (comment), #1619 (comment); also whether it was going to be a more permissive set or simply unconfined more like |
|
Note a recent commit broke the way this interface is written and intended to be used: #1848 (comment) |
|
https://bugs.launchpad.net/snappy/+bug/1583514 is still unassigned which makes this snap difficult to use. Who is going to work on that? |
|
FYI, I've found that using this PR with this yaml:
Then I can sideload the snap, then do:
Then I can use a trusty container like so:
I verified that launching app containers correctly results in applications running under the 'docker-default' profile. At this point, what remains is:
This last point is a new requirement and is here since snap declarations aren't yet. Once snap declarations are in place, this can be removed. Thankfully, this is easy to implement: https://github.com/snapcore/snapd/pull/1733/files#diff-cb4146608474e85ac3a939ff9c288b83R94. For now, I suggest using 'canonical' as the devName. By the time we are ready to hand this over to Docker we should have snap declarations in place (and it is easy to adjust if they aren't ready). I'll follow-up on '1' and will provide a starting point for a diff for option #2 (I want to look at it a little bit and then will hand-off). Feel free to do '3' and '4' now. |
|
"3. working out how to implement privileged mode. I think option #2 is the way to go" Ok, here is what option #2 looks like:
Here is some stub code for that: jdstrand/snapd@598951f Here is what the relevant parts of snap.yaml for the above:
|
|
FYI, someone has started working on "2. getting someone assigned to https://bugs.launchpad.net/snappy/+bug/1583514" |
|
I hadn't fully grasped the design of privileged mode at first, I thought a single docker daemon would switch to granting runs of privileged containers for all clients as soon as the interface was connected, but Jamie clarified that each client would connect to the privileged-container interface and that would allow running privileged containers or not – that sounds perfect to me :-) |
|
After discussing with Loic, we decided the path forward would be to merge this interface as is and I would implement the privileged mode attribute and the limiting to a particular snap and publisher. The kmod backend is assigned and being worked on. Point '1' doesn't block this PR so can be handled separately. No longer Blocked. +1 |
jdstrand
removed
the
Blocked
label
Sep 19, 2016
jdstrand
approved these changes
Sep 19, 2016
LGTM in its current form. Will handle remaining points in separate PRs.
niemeyer
requested changes
Sep 19, 2016
@jdstrand We cannot merge this until we have logic limiting the use of this slot to the docker from store only, because this interface is giving wide system access at the slot end even before the connection is established. It's okay to be on a different PR, but these two PRs need to be merged at once please.
niemeyer
added
the
Blocked
label
Sep 19, 2016
|
@jdstrand happy to implement any bits of that you don't think will be lots faster if you just do them Adding the check similar to LXD's seemed really straightforward for sure, and I can still take a look at the rest of your comments if you'd like (just been a bit buried the past few days). |
jdstrand
referenced this pull request
Sep 19, 2016
Merged
interfaces/builtin: add initial docker interface #1947
|
Since this PR needs to be refreshed against trunk and there are a few other small things to address besides points 3 and 4, I'm going to close this and point people to #1947 |
tianon commentedAug 2, 2016
This is a first pass on adding a👍
dockerinterface tosnapd. I've been discussing this off-and-on with @jdstrand on IRC, so hopefully it's at least an OK start, but if not, I'm happy to amend, rebase, refactor, etc as necessary or desired.One main question I've still got is around slot/plug overlap. With the way this is currently implemented, by
dockersnap includes both thedockerand thenetworkplugs on the Docker client command, and thedockerslot plus thenetwork-bindandfirewall-controlplugs on the Docker daemon. My question is whether that's the correct way to be thinking about this, or whether those overlapping privileges should be rolled into thedockerinterface directly?To be more precise, here's the exact
snapcraft.ymlsnippet:And the interfaces I've got to have connected for it to function appropriately: (as one would expect)