Docker Security Profiles (seccomp, apparmor, etc) #17142
[RFC] Docker Security Profiles
The profile would be passed on docker run; we can reuse the flag we already have. Something like
Obviously doesn't have to be toml since that's super hipster :p
Assumptions
Goals
Inspiration
Grouping into categories
High-level things you would want to configure should be generic and limited
Defining Permissions
The cool thing about We can have this same concept and define them with syscalls and capabilities. We would need to discuss what these were and find the most common use cases for
Behaviors
Super super super alpha example
Kinda like
[Networking]
Flags = [
# this will allow sendto(2), recvfrom(2), socket(2), connect(2)
"dns",
# adds CAP_NET_RAW
"ping"
]
[Filesystem]
Flags = [
# will allow lstat(2), chmod(2), chflags(2),
# chown(2), unlink(2), fstat(2) on /tmp
"tmp"
]
# filepaths where you would like to log on write
LogOnWrite = [
"/etc/**",
"/root/**"
]
# read-only filepaths
ReadOnly = [
"/sys/**"
]
[Runtime]
Flags = [
# allows getentropy(2), madvise(2), minherit(2),
# mmap(2), mprotect(2), mquery(2), munmap(2)
"malloc"
]
[User]
Flags = [
# allows getuid(2), getgid(2), setuid(2), setugid(2)
"create"
]
Backends
Will use whatever is installed on the system, so if the host has AppArmor but no seccomp, it will use AppArmor (which can technically handle all the syscall, capability, and filesystem privileges).
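The flag groups in the sample profile can be read as named bundles of syscalls and capabilities. Here is a minimal sketch of that expansion, assuming the groupings are exactly the ones listed in the sample's comments. The table names and `expand_flags` are hypothetical, and the RFC leaves the real groupings open for discussion.

```python
# Hypothetical flag tables, copied from the comments in the sample profile.
# The path scoping of "tmp" (only on /tmp) is not modeled here.
SYSCALL_FLAGS = {
    "dns": ["sendto", "recvfrom", "socket", "connect"],
    "tmp": ["lstat", "chmod", "chflags", "chown", "unlink", "fstat"],
    "malloc": ["getentropy", "madvise", "minherit", "mmap",
               "mprotect", "mquery", "munmap"],
    "create": ["getuid", "getgid", "setuid", "setugid"],
}
CAPABILITY_FLAGS = {
    "ping": ["CAP_NET_RAW"],
}


def expand_flags(flags):
    """Return (sorted syscalls, sorted capabilities) for a list of flag names."""
    syscalls, caps = set(), set()
    for flag in flags:
        if flag in SYSCALL_FLAGS:
            syscalls.update(SYSCALL_FLAGS[flag])
        elif flag in CAPABILITY_FLAGS:
            caps.update(CAPABILITY_FLAGS[flag])
        else:
            # Default deny: an unknown flag is an error, not a silent no-op.
            raise ValueError(f"unknown flag: {flag}")
    return sorted(syscalls), sorted(caps)


if __name__ == "__main__":
    # The [Networking] section of the sample profile.
    print(expand_flags(["dns", "ping"]))
```

A backend (seccomp or AppArmor) would then consume the expanded lists rather than the flag names, which keeps the user-facing profile short.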
File Globbing
Taken from AppArmor profiles' file globbing.
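As a rough illustration of how patterns such as `/etc/**` behave, here is a simplified sketch of AppArmor-style matching. In that style `*` stays within one path component while `**` crosses `/`. It only covers `*`, `**`, and `?`; character classes and `{a,b}` alternation are left out, and the helper names are hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Translate a simplified AppArmor-style glob into a compiled regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # any characters, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # any characters except '/'
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")     # exactly one character except '/'
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(pattern, path):
    """Return True if the whole path matches the glob."""
    return glob_to_regex(pattern).fullmatch(path) is not None


if __name__ == "__main__":
    print(matches("/etc/**", "/etc/ssh/sshd_config"))
    print(matches("/etc/*", "/etc/ssh/sshd_config"))
```

With this reading, the `LogOnWrite` entry `/etc/**` in the sample covers nested files such as `/etc/ssh/sshd_config`, whereas `/etc/*` would only cover direct children of `/etc`.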
More Goodness
|
This is great. Some thoughts on the config file: It would be cool to take the way apparmor does file permissions (e.g. from the chromium profile,
but maybe allowing something more user-friendly for permission names (like read, write, etc). Default deny would be preferable, but that might not be the best option (maybe something the user can set, like It'd also be cool to have the logging support a similar scheme, so that something like
|
ah yay @kisom I definitely like the idea of something like |
@jfrazelle having a shorthand is good for people who write a lot of these, but also having long names is easier for people to remember; both could probably be supported if a leading char is used to distinguish short form from long form. Something like |
yes for sure that makes sense |
How can it handle file permissions the AppArmor way? For example, how would it restrict writing to a file? |
it will generate an apparmor profile, this is not just seccomp config it On Tue, Oct 27, 2015 at 6:23 PM, keloyang notifications@github.com wrote:
|
👍 to whatever @jfrazelle says. @anusha-ragunathan you should check this out. |
for phase 1 see: #17989 |
Related: seccomp/libseccomp#11 |
I think going towards immutable containers makes a lot more sense as a first step. Basically the equivalent of what OSTree does, make The app's executables are all immutable, etc. AppArmor is designed for a world where multiple applications share a single rootfs, but Docker supports rootfs-per-app, so I don't see why it would be really valuable to specify what a container can do to its own files in Things, however, get a lot more interesting if we're talking about controls over host bind mounts. |
Note we're already bind mounting |
@cgwalters I think the idea of using seccomp and so on is to protect the host by having a deny all, allow some list. When you can protect the host you also indirectly protect the containers from each other (if the kernel isn't breached, the barriers between containers can't be breached). |
The profiles apply to containers, and yes re: /sys being read only already that's just a sample config, we can get nit picky later :P |
Also, a major step for basic security is to run containers as non-root, and also ensure they can't gain root via setuid binaries in the image, using In many cases a viable crutch if one needs root for setup (dpkg/rpm/etc as of today), is to have trusted base images that contain packages, then layer on later using non-root. This is what e.g. OpenShift S2I does. seccomp can make sense to try to contain root (and a good reference for this is the systemd-nspawn blacklist). But I'd look at stricter seccomp only for non-root containers. |
Well, userns should help with a lot of that. I think not combining a bunch of stuff into this from the get-go, and slowly adding as needed, would be the best route. |
Personally, I think userns is a hacky crutch for dpkg/rpm not operating as non-root; it does have the compelling advantage that you can just use e.g. RHEL6 yum or Debian wheezy apt-get as is. But the downside is exciting new attack surface. |
@cgwalters the problem isn't dpkg/rpm. Just look at, for example, fakechroot. It's that for some things you need to have root-like privileges. You could use capabilities, like with opening a port below 1024, but how do you trust an untrusted installer to set the capabilities on a package/program? Those kinds of privileges need to be delegated to it somehow. That is where userns comes in, right? Anyway, this is probably not the place to discuss this. Wasn't aware who I was talking to before, how about you make a blog post on your site and I'll comment there. :-) |
I was asked to move discussion over here: I agree with @mheon (no surprise there). If this is only opt-in, then most users will never opt in. Having a default blacklist like systemd-nspawn and libvirt/qemu should also be possible. This will give all containers greater security, not just those whose users are crazy enough to try out --security-opt type flags. My point is that if the only way to turn these on is by choice, no one (or only a very small percentage of users) will. Everyone is using SELinux by default. Everyone is using dropped capabilities by default. Everyone is using read-only mount points by default. How can we get seccomp for everyone by default? |
The easiest solution to this would be to optionally pass a default security profile to the daemon via a flag, which would be applied to all containers unless explicitly disabled or overridden. The default would have to be fairly permissive to ensure we don't inconvenience users terribly (if we do, they'll just disable it entirely), but it would certainly be better than nothing. We could also make some syscalls unconditionally blocked in non-privileged containers, similar to how we always drop some capabilities. Most of the particularly offensive ones will already be restricted because a typical container doesn't retain |
I definitely expect https://github.com/kubernetes/kubernetes/blob/master/docs/design/security_context.md to trigger some of these, so this isn't a case of each application author having to specify |
@lblackstone There is a default profile baked in, however in order to make sure the user can't change this profile, you'll have to use an authorization plugin. |
@cpuguy83 Hadn't read up on those much yet, but that looks spot on. Thanks! |
Is there any chance to use custom profile for the Docker build command? |
This is not the right place for this, you should try stracing linux32 and On Tuesday, February 23, 2016, injectives notifications@github.com wrote:
Jessie Frazelle |
But please open a new issue On Tuesday, February 23, 2016, Jessica Frazelle me@jessfraz.com wrote:
Jessie Frazelle |
Sorry. I just wasn't sure if it is an issue, but on the other hand my build has stopped doing the right thing. |
I believe if you turn on seccomp with a profile, this instantly blocks all access to non-native syscalls. Turning on seccomp on an x86_64 machine will block all 32-bit syscalls, unless the profile allows for 32-bit syscalls. |
The default profile allows 32 On Wednesday, February 24, 2016, Daniel J Walsh notifications@github.com
Jessie Frazelle |
@jfrazelle Great, thanks. |
This is excellent, and I am pleased to have a program that takes the default profile and adds my extra syscalls and whatnot. I don't know enough about the underlying seccomp eBPF system, but would it be possible to change profiles after a container has been started? If not, would it be possible to have a group of system calls, (in my case ptrace) that are only enabled when some eBPF map is modified to allow them? I can see this being too fiddly for default docker, but something to think about. |
You cannot change profiles after startup. You can currently add ptrace into a new profile and use that, but we do plan to add an easy grouping feature to make it simpler. |
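A minimal sketch of "take the default profile and add ptrace", assuming a profile shaped like a JSON object with a `defaultAction` and a list of `syscalls` rules, each holding `names` and an `action`. That shape is an assumption for illustration (the exact schema has varied between Docker versions), and `with_extra_syscalls` is a hypothetical helper.

```python
import copy

ALLOW = "SCMP_ACT_ALLOW"


def with_extra_syscalls(profile, extra):
    """Return a copy of `profile` that additionally allows `extra` syscalls.

    Assumed shape (not confirmed by this thread):
      {"defaultAction": "...", "syscalls": [{"names": [...], "action": "..."}]}
    """
    new_profile = copy.deepcopy(profile)  # never mutate the shipped default
    already_allowed = {
        name
        for rule in new_profile["syscalls"]
        if rule["action"] == ALLOW
        for name in rule["names"]
    }
    missing = [name for name in extra if name not in already_allowed]
    if missing:
        new_profile["syscalls"].append({"names": missing, "action": ALLOW})
    return new_profile


if __name__ == "__main__":
    default = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": ["read", "write"], "action": ALLOW}],
    }
    print(with_extra_syscalls(default, ["ptrace"]))
```

Because a profile cannot be changed after startup, the extended profile has to be written out and supplied when the container is started.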
I know you can't today, but is the Linux subsystem such that Docker could never support that? |
It is part of the security feature that once syscalls are blacklisted, they can never be allowed again in that process. It is intended to work like that in Linux, so it will not change. |
Ok, good to know; thanks. |
Would it make sense for a Docker image to get security metadata attached? |
I like the idea, but I also like the idea of allowing the image to request more access, which could at least allow the docker client to report to the user that this container will not run without the following capability, or requires this syscall, or requires SELinux to be disabled. |
Right, any assistance that lets the user know some features are required, so they can check what those are about and decide whether to enable them, would be nice. It would keep security newbies from just enabling everything by default. |
Yes, oddly I was having a discussion about this earlier today, and it was something that has come up before. I was thinking of perhaps prototyping it by defining a security metadata schema that could define the necessary things, and then having a tool to read that and construct the run command. I am not so sure about raising privileges, as a message saying "this container needs --cap-add SYS_ADMIN" to run might be abused to encourage people to run things with escalated privileges. Using it to drop privileges that are not needed seems ok though if the container has been labelled that way, eg the |
Yes I love the idea of having the image run with less privs, but also preventing:
Followed by
And the user goes off running the containers without any security forever. |
In both cases this would indeed encourage users to run with extra privileges without taking care. |
@rhatdan @ndeloof: How about a nagging flag such as
(Whether supplied to docker run or provided in the container image). docker could exit with an error and a message such as:
|
Funny indeed, I had that conversation with @justincormack. I think it'd make sense to allow the image-maintainer to specify what capabilities / profile is needed for the image to run, but it should not automatically apply those (the person running the image should be the one deciding if the container actually gets those permissions). Perhaps;
to run the image with the seccomp-profile that's embedded in the image. Possibly even think of disallowing |
There was talk in #22109 of allowing Seccomp profiles to be layered, permitting more than one to be used at a time. I think this would be an ideal way to add image-specific Seccomp profiles without requiring users to opt into using just the profile embedded in the image, or just the global profile baked into the daemon. In some cases, applying both profiles will have no benefit (the image profile could well block every syscall the global profile does). Still, loading both doesn't require fully trusting the security profile baked into an image, which might be more insecure than the default profile. This doesn't help in cases where the image requires a syscall blocked by the default filter, but the baked-in filter only restricts a few high-impact syscalls. I'd say this should be handled similarly to the suggestions above for handling images that want to add capabilities instead of remove them. Requiring a flag or similar seems like a good idea. Layering wouldn't really work with other things one might embed in an image's security profile, though. You can't exactly layer SELinux or Apparmor labels on the same file, for example. |
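The layering argument can be sketched with plain allow lists: when several filters are stacked, a syscall goes through only if every layer permits it, so adding an image profile on top of the global one can restrict but never widen. The function names below are hypothetical, and real profiles carry more than a flat allow list (argument filters, per-syscall actions).

```python
def layered_allows(syscall, *allow_lists):
    """A syscall passes only if every layered profile allows it."""
    return all(syscall in allowed for allowed in allow_lists)


def effective_allow_list(*allow_lists):
    """Return the sorted intersection of all layered allow lists."""
    sets = [set(allowed) for allowed in allow_lists]
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    global_profile = ["read", "write", "ptrace"]
    image_profile = ["read", "write", "mount"]
    print(effective_allow_list(global_profile, image_profile))
```

This also shows the limitation raised in the comment: an image that needs `mount` gets nothing from layering if the global profile blocks it.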
Well a good packager could have his PID1 do a lot of what we are talking about, drop caps for example. Problem is most container packagers don't control PID1 code, they just stick in something like httpd. I am not crazy about blocking options from the user like --cap-add or --security-opt, worried about unexpected consequences. |
It makes sense to embed a security profile in the container metadata, but it would be great to prioritize the docker daemon settings because, as a sec guy, you may want to enforce a minimum profile: allow someone to run a container with a "better" profile, but deny the use of profiles allowing things the default doesn't allow. This can work in three ways:
--seccomp:embedded -> try to run the container with the embedded profile, comparing it with the default profile. If the embedded profile is less secure, stop.
--seccomp:merge -> try to run the container merging (with the layering proposed in #22109) the embedded profile with the default profile, prioritizing the default opts to create a more secure profile.
--seccomp:force-embedded -> run the embedded profile, ignoring the default. This can only occur if the docker daemon is explicitly configured to allow this. |
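The three proposed modes could be sketched as follows, treating each profile as a flat set of allowed syscalls. "Less secure" is interpreted here as "allows something the default does not", which is one possible reading of the comment. `resolve_profile` and `daemon_allows_force` are hypothetical names, not existing Docker options.

```python
def resolve_profile(mode, default_allowed, embedded_allowed,
                    daemon_allows_force=False):
    """Pick the effective allow set for one of the three proposed modes."""
    default_allowed = set(default_allowed)
    embedded_allowed = set(embedded_allowed)

    if mode == "embedded":
        # Refuse to start if the image profile is looser than the default.
        if not embedded_allowed <= default_allowed:
            raise PermissionError(
                "embedded profile allows syscalls the default does not")
        return embedded_allowed

    if mode == "merge":
        # Layering: only syscalls both profiles allow survive.
        return default_allowed & embedded_allowed

    if mode == "force-embedded":
        # Only honored when the daemon is explicitly configured for it.
        if not daemon_allows_force:
            raise PermissionError("daemon does not permit force-embedded")
        return embedded_allowed

    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(sorted(resolve_profile("merge", ["read", "write"], ["read", "ptrace"])))
```

In this reading, `merge` can never exceed the default, `embedded` fails closed, and `force-embedded` is the only path that can widen access.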
Current proposal for that can be found here : #32801 |
I think I want to close this in favor of the (somewhat more concrete) #32801. |
As mentioned in our ROADMAP.md, we'd like to progress toward seccomp support in Docker 1.10.
As a phase 1, I propose allowing the Engine to accept a seccomp profile at container run time. In the future, we might want to ship builtin profiles, or bake profiles in the images: design work about that future would be a plus.
Ping @jfrazelle who's interested to look into that!