Docker Security Profiles (seccomp, apparmor, etc) #17142

icecrime · 2015-10-18T03:11:37Z

As mentioned in our ROADMAP.md, we'd like to progress toward seccomp support in Docker 1.10.

As a phase 1, I propose allowing the Engine to accept a seccomp profile at container run time. In the future, we might want to ship builtin profiles, or bake profiles in the images: design work about that future would be a plus.

Ping @jfrazelle, who's interested in looking into that!

jessfraz · 2015-10-18T03:26:50Z

[RFC] Docker Security Profiles

The profile would be passed on docker run, we can reuse the flag we already have --security-opt.

Something like docker run ... --security-opt native:/path/to/config.toml ...

Obviously doesn't have to be toml since that's super hipster :p

Assumptions

no one is going to sit and write out all the syscalls/capabilities their app needs
automatic profiling would be super cool but, like aa-genprof, it is never perfect; it leads to pain or removing the profile altogether, and to an unmaintainable config file (we can always attempt this later)

Goals

maintainable config
readable by humans and not a linux syscall/cap nerd
something an app developer would want to write
someone who did not write the config should be able to understand, at least at a high level, what is restricted

Inspiration

Grouping into categories

High level things you would want to configure should be generic and limited
to (for example):

Networking
Filesystem (Disk)
Runtime (CPU/Memory operations)
User Operations
Misc

Defining Permissions

The cool thing about tame that I think we should implement is what they refer to as "flags". A flag is a set of syscalls allowed for a common goal: TAME_RW allows all the syscalls for I/O operations, while TAME_RPATH only allows the syscalls that have read-only effects on the filesystem.

We can have this same concept and define the flags with syscalls and capabilities.

We would need to discuss what these were and find the most common use cases for
them.

Behaviors

If one permission denies a syscall and another allows it, the
deny should always override the allow.
Passing an empty config drops everything and nothing is allowed
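
A minimal sketch of the two behaviors above, assuming each flag expands to a set of allowed and a set of denied syscalls. The `FLAGS` table, the `no-raw-sockets` flag, and the `resolve` function are hypothetical and only illustrate the rule; they are not part of the proposal.

```python
# Hypothetical flag table: "dns" mirrors the sample config,
# "no-raw-sockets" is invented here to show a conflicting deny.
FLAGS = {
    "dns": {"allow": {"sendto", "recvfrom", "socket", "connect"}, "deny": set()},
    "no-raw-sockets": {"allow": set(), "deny": {"socket"}},
}


def resolve(flag_names):
    """Return the syscalls permitted by a list of flags.

    A deny from any flag overrides an allow from any other flag,
    and an empty list of flags permits nothing.
    """
    allowed, denied = set(), set()
    for name in flag_names:
        allowed |= FLAGS[name]["allow"]
        denied |= FLAGS[name]["deny"]
    return allowed - denied
```

With this shape, the order in which flags are listed does not matter, which keeps a config predictable for someone who did not write it.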

Super super super alpha example

Kinda like jfrazelle/bane but better.

[Networking]
Flags = [
    # this will allow sendto(2), recvfrom(2), socket(2), connect(2)
    "dns",
    # adds CAP_NET_RAW
    "ping"
]

[Filesystem]
Flags = [
    # will allow lstat(2), chmod(2), chflags(2),
    # chown(2), unlink(2), fstat(2) on /tmp
    "tmp"
]
# filepaths where you would like to log on write
LogOnWrite = [
    "/etc/**",
    "/root/**"
]
# read-only filepaths
ReadOnly = [
    "/sys/**"
]

[Runtime]
Flags = [
    # allows getentropy(2), madvise(2), minherit(2),
    # mmap(2), mprotect(2), mquery(2), munmap(2)
    "malloc"
]

[User]
Flags = [
    # allows getuid(2), getgid(2), setuid(2), setugid(2)
    "create"
]

Backends

Will use whatever is installed on the system so if they have apparmor but no seccomp, then it will use apparmor (which can technically do all the syscall, cap, and filesystem privileges).

AppArmor
Seccomp
Capabilities

File Globbing

Taken from apparmor profiles file globbing.

Glob Example	Description
`/dir/file`	match a specific file
`/dir/*`	match any files in a directory (including dot files)
`/dir/a*`	match any file in a directory starting with a
`/dir/*.png`	match any file in a directory ending with .png
`/dir/[^.]*`	match any file in a directory except dot files
`/dir/`	match a directory
`/dir/*/`	match any directory within /dir/
`/dir/a*/`	match any directory within /dir/ starting with a
`/dir/*a/`	match any directory within /dir/ ending with a
`/dir/**`	match any file or directory in or below /dir/
`/dir/**/`	match any directory in or below /dir/
`/dir/**[^/]`	match any file in or below /dir/
`/dir{,1,2}/**`	match any file or directory in or below /dir/, /dir1/, and /dir2/
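
To make the table concrete, here is a rough sketch that translates a subset of these globs into Python regular expressions. It is an approximation for illustration, not AppArmor's actual matcher: `glob_to_regex` is a hypothetical helper, and it only handles `*`, `**`, simple character classes, and `{a,b}` alternation.

```python
import re


def glob_to_regex(glob):
    """Translate a subset of AppArmor-style globs into a compiled regex."""
    out = []
    depth = 0  # nesting level of {...} alternations
    i = 0
    while i < len(glob):
        c = glob[i]
        if glob.startswith("**", i):
            out.append(".*")      # ** crosses directory boundaries
            i += 2
            continue
        if c == "*":
            out.append("[^/]*")   # * stays within one path component
        elif c == "[":
            end = glob.index("]", i + 1)
            out.append(glob[i:end + 1])  # copy a class such as [^.] as-is
            i = end
        elif c == "{":
            out.append("(?:")
            depth += 1
        elif c == "}" and depth:
            out.append(")")
            depth -= 1
        elif c == "," and depth:
            out.append("|")
        else:
            out.append(re.escape(c))
        i += 1
    return re.compile("".join(out))


def matches(glob, path):
    """True if the whole path matches the glob."""
    return glob_to_regex(glob).fullmatch(path) is not None
```

For example, `matches("/dir/*", "/dir/.hidden")` is true while `matches("/dir/[^.]*", "/dir/.hidden")` is false, in line with the dot-file rows of the table.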

More Goodness

I think we should allow people to define their own flags (or whatever we end up calling them). It could be cool to have a way to do it with a define in text/template. I believe this is possible if it is implemented the way I am thinking ;)

kisom · 2015-10-26T18:34:40Z

This is great. Some thoughts on the config file:

It would be cool to take the way apparmor does file permissions, e.g. from the chromium profile:

... {
    /lib/@{multiarch}/libgcc_s.so* mr,
    /lib{,32,64}/libm-*.so* mr,
    /lib/@{multiarch}/libm-*.so* mr,
...
}

but maybe allow something more user-friendly for permission names (like read, write, etc). Default deny would be preferable, but that might not be the best option (maybe something the user can set, like AccessPolicy: whitelist or AccessPolicy: blacklist?).

It'd also be cool to have the logging support a similar scheme, something like:

Log = [
    "/etc/something/config:read",
    "/var/run/something/**:write"
]
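
A small sketch of how entries in that `path:permission` form might be parsed. The function name `parse_log_rule` and the accepted permission names are assumptions made for illustration; the thread does not fix a vocabulary.

```python
# Assumed permission vocabulary; the thread only mentions read and write.
KNOWN_PERMISSIONS = {"read", "write"}


def parse_log_rule(entry):
    """Split 'path:permission' on the last colon and validate the permission."""
    path, sep, perm = entry.rpartition(":")
    if not sep or not path:
        raise ValueError("expected 'path:permission', got %r" % entry)
    if perm not in KNOWN_PERMISSIONS:
        raise ValueError("unknown permission %r in %r" % (perm, entry))
    return path, perm
```

Splitting on the last colon keeps the glob part intact even if a path ever contains a colon itself.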

jessfraz · 2015-10-26T21:24:03Z

ah yay @kisom I definitely like the idea of something like :ro or :rw much like how it works for volumes

kisom · 2015-10-26T21:35:09Z

@jfrazelle having a shorthand is good for people who write a lot of these, but long names are easier for people to remember; both could probably be supported if a leading char is used to distinguish short form from long form. Something like :lrw vs. "lock read write".

jessfraz · 2015-10-26T22:43:20Z

yes for sure that makes sense

keloyang · 2015-10-28T01:22:02Z

How can it do file permissions the apparmor way? For example, how would you restrict writing to a file?

jessfraz · 2015-10-28T01:25:24Z

it will generate an apparmor profile; this is not just a seccomp config, it will be a generic security profile with backends

calavera · 2015-11-04T23:06:41Z

👍 to whatever @jfrazelle says.

@anusha-ragunathan you should check this out.

jessfraz · 2015-11-15T05:46:52Z

for phase 1 see: #17989

cgwalters · 2015-11-19T15:48:03Z

Related: seccomp/libseccomp#11

cgwalters · 2015-11-19T19:24:54Z

I think going towards immutable containers makes a lot more sense as a first step. Basically the equivalent of what OSTree does, make / and /usr immutable, and only leave /tmp and /var writable.

The app's executables are all immutable, etc.

AppArmor is designed for a world where multiple applications share a single rootfs, but Docker supports rootfs-per-app, so I don't see why it would be really valuable to specify what a container can do to its own files in /etc.

Things however get a lot more interesting if we're talking about controls over host bind mounts.

cgwalters · 2015-11-19T19:45:44Z

ReadOnly = [
    "/sys/**"
]

Note we're already bind mounting /sys as ro by default.

Lennie · 2015-11-19T22:13:19Z

@cgwalters I think the idea of using seccomp and so on is to protect the host by having a deny all, allow some list. When you can protect the host you also indirectly protect the containers from each other (if the kernel isn't breached, the barriers between containers can't be breached).

jessfraz · 2015-11-19T23:42:42Z

The profiles apply to containers, and yes re: /sys being read only already that's just a sample config, we can get nit picky later :P

cgwalters · 2015-11-20T02:43:28Z

Also, a major step for basic security is to run containers as non-root, and also ensure they can't gain root via setuid binaries in the image, using PR_SET_NO_NEW_PRIVS. Seccomp requires the latter.

In many cases a viable crutch if one needs root for setup (dpkg/rpm/etc as of today), is to have trusted base images that contain packages, then layer on later using non-root. This is what e.g. OpenShift S2I does.

seccomp can make sense to try to contain root (and a good reference for this is the systemd-nspawn blacklist). But I'd look at stricter seccomp only for non-root containers.

jessfraz · 2015-11-20T02:51:45Z

Well, userns should help with a lot of that. I think not combining a bunch of stuff into this from the get-go, and slowly adding as needed, would be the best route.

cgwalters · 2015-11-20T13:33:42Z

Personally, I think userns is a hacky crutch for dpkg/rpm not operating as non-root; it does have the compelling advantage that you can just use e.g. RHEL6 yum or Debian wheezy apt-get as is. But the downside is exciting new attack surface.

Lennie · 2015-11-22T11:22:16Z

@cgwalters the problem isn't dpkg/rpm. Just look at, for example, fakechroot. It's that for some things you need to have root-like privileges. You could use capabilities, like with opening a port below 1024, but how do you trust an untrusted installer to set the capabilities on a package/program? Those kinds of privileges need to be delegated to it somehow. That is where userns comes in, right?

Anyway, this is probably not the place to discuss this. Wasn't aware who I was talking to before, how about you make a blog post on your site and I'll comment there. :-)

rhatdan · 2015-12-01T17:16:48Z

I was asked to move discussion over here:

I agree with @mheon (no surprise there). If this is only opt-in, then most users will never opt in. Having a default blacklist like systemd-nspawn and libvirt/qemu should also be possible. This will give all containers greater security, not just those of users crazy enough to try out --security-opt type flags.
We will need a way to set default security (seccomp) profiles at the daemon level.

My point is that if the only way to turn these on is by choice, no one (or only a very small percentage of users) will.

Everyone is using SELinux by default. Everyone is using dropped capabilities by default. Everyone is using read-only mount points by default.

How can we get seccomp for everyone by default?
People only opt in for security after they have had a security disaster, which is too late.

mheon · 2015-12-01T17:48:03Z

The easiest solution to this would be to optionally pass a default security profile to the daemon via a flag, which would be applied to all containers unless explicitly disabled or overridden. The default would have to be fairly permissive to ensure we don't inconvenience users terribly (if we do, they'll just disable it entirely), but it would certainly be better than nothing.

We could also make some syscalls unconditionally blocked in non-privileged containers, similar to how we always drop some capabilities. Most of the particularly offensive ones will already be restricted because a typical container doesn't retain CAP_SYS_ADMIN but there are others that no non-privileged container should ever need to make.
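
The resolution order described above could look roughly like this, modelling a profile as a set of denied syscalls (a blacklist). Everything here is a sketch under assumptions: `effective_denied` is a hypothetical function, and the single entry in `ALWAYS_BLOCKED` is a placeholder, not a proposed list.

```python
# Placeholder for syscalls blocked in every non-privileged container.
ALWAYS_BLOCKED = frozenset({"kexec_load"})


def effective_denied(daemon_default, override=None, disabled=False, privileged=False):
    """Return the set of syscalls denied to one container.

    daemon_default: blacklist configured on the daemon
    override:       per-container blacklist replacing the daemon default
    disabled:       the container explicitly opted out of the profile
    privileged:     privileged containers skip the unconditional block
    """
    if disabled:
        denied = set()
    elif override is not None:
        denied = set(override)
    else:
        denied = set(daemon_default)
    if not privileged:
        denied |= ALWAYS_BLOCKED
    return denied
```

Note that in this sketch opting out of the profile still leaves the unconditional block in place for non-privileged containers, which is the second point of the comment.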

cgwalters · 2015-12-18T02:02:04Z

I definitely expect https://github.com/kubernetes/kubernetes/blob/master/docs/design/security_context.md to trigger some of these, so this isn't a case of each application author having to specify --security-opt to docker run, etc.

cpuguy83 · 2016-02-12T14:46:51Z

@lblackstone There is a default profile baked in, however in order to make sure the user can't change this profile, you'll have to use an authorization plugin.

lblackstone · 2016-02-12T14:52:23Z

@cpuguy83 Hadn't read up on those much yet, but that looks spot on. Thanks!

injectives · 2016-02-23T22:31:59Z

Is there any chance to use custom profile for the Docker build command?
Previously (on Docker 1.9.1) I was able to prepend my commands with linux32 for emulation purposes, but now it doesn't work. linux32 arch returns x86_64 instead of i686, so for example the command "linux32 yum install x" won't install the packages I wanted.
There were also custom 32bit base images on Docker Hub that use linux32 as an entrypoint, I suppose they won't work on 1.10 as well.

jessfraz · 2016-02-23T22:35:50Z

This is not the right place for this. You should try stracing linux32 and see what syscall is missing; we included, I think, all of them in the default profile.

jessfraz · 2016-02-23T22:36:07Z

But please open a new issue

injectives · 2016-02-23T22:42:50Z

Sorry. I just wasn't sure if it is an issue, but on the other hand my build has stopped doing the right thing.
I have created an issue for this.

rhatdan · 2016-02-24T13:05:34Z

I believe that if you turn on seccomp with a profile, this instantly blocks all access to non-native syscalls. Turning on seccomp on an x86_64 machine will block all 32-bit syscalls, unless the profile allows 32-bit syscalls.

jessfraz · 2016-02-24T13:23:11Z

The default profile allows 32-bit syscalls.
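
One way to check the architecture point for a custom profile is to look at the architectures it declares. This is a sketch that assumes the profile is JSON with a top-level `architectures` list of libseccomp names such as `SCMP_ARCH_X86`; the sample profile and the helper `allows_arch` are illustrative only, so verify the layout against the profile format your Docker version uses.

```python
import json

# Illustrative profile fragment; field names are an assumption (see above).
PROFILE_JSON = """
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": []
}
"""


def allows_arch(profile, arch):
    """True if the profile explicitly lists the given architecture."""
    return arch in profile.get("architectures", [])


profile = json.loads(PROFILE_JSON)
has_32bit = allows_arch(profile, "SCMP_ARCH_X86")
```

A profile written for x86_64 only would fail this check for `SCMP_ARCH_X86`, which matches the behavior described for 32-bit syscalls.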

rhatdan · 2016-02-25T14:03:54Z

@jfrazelle Great, thanks.

frioux · 2016-03-18T16:18:22Z

This is excellent, and I am pleased to have a program that takes the default profile and adds my extra syscalls and whatnot.

I don't know enough about the underlying seccomp eBPF system, but would it be possible to change profiles after a container has been started? If not, would it be possible to have a group of system calls (in my case ptrace) that are only enabled when some eBPF map is modified to allow them? I can see this being too fiddly for default docker, but it's something to think about.

justincormack · 2016-03-18T16:22:52Z

You cannot change profiles after startup.

You can currently add ptrace into a new profile and use that, but we do plan to add an easy grouping feature to make it simpler.
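
Until a grouping feature exists, deriving a new profile from an existing one can be scripted. The sketch below assumes a profile is a dict whose `syscalls` list holds entries of the form `{"name": ..., "action": ..., "args": []}`; that shape and the helper `with_extra_syscalls` are assumptions for illustration, so check them against the profile you actually start from.

```python
import copy


def with_extra_syscalls(profile, names, action="SCMP_ACT_ALLOW"):
    """Return a copy of the profile with extra syscalls allowed.

    The input profile is left untouched, and syscalls that already
    have a rule are not duplicated.
    """
    result = copy.deepcopy(profile)
    existing = {rule["name"] for rule in result["syscalls"]}
    for name in names:
        if name not in existing:
            result["syscalls"].append({"name": name, "action": action, "args": []})
            existing.add(name)
    return result


# Tiny stand-in for a real default profile.
base = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"name": "read", "action": "SCMP_ACT_ALLOW", "args": []}],
}
custom = with_extra_syscalls(base, ["ptrace", "read"])
```

The resulting dict can be written out with `json.dump` and passed as the container's profile at run time.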

frioux · 2016-03-18T16:23:34Z

I know you can't today, but is the Linux subsystem such that Docker could never support that?

justincormack · 2016-03-18T16:27:10Z

It is part of the security feature that a syscall, once blacklisted, can never be allowed again in that process. Linux is intended to work like that, so it will not change.

frioux · 2016-03-18T16:30:28Z

Ok, good to know; thanks.

ndeloof · 2016-04-28T17:25:52Z

Would it make sense for a Docker image to have security metadata attached?
What I mean is: I like the idea that docker comes with a reasonable security profile, but I guess it will be hard for non-experts to define their own, so most containers end up running with unnecessary privileges. So could image metadata include a security profile, used by default as long as it is a subset of the user's configuration? Official and well-formed docker images could then be designed to request only the capabilities/syscalls they actually require, and so keep the attack surface minimal.
wdyt?

rhatdan · 2016-04-28T17:39:14Z

I like the idea, but I also like the idea of allowing the image to request more access, which could at least allow the docker client to report to the user that this container will not run without the following capability, or requires this syscall, or requires SELinux to be disabled.

ndeloof · 2016-04-28T17:46:52Z

right, any assistance that lets the user know some features are required, so they can check what those are about and decide to enable them, would be nice, to keep security newbies from just enabling everything by default

justincormack · 2016-04-28T17:47:43Z

Yes, oddly I was having a discussion about this earlier today, and it was something that has come up before.

I was thinking of perhaps prototyping it by defining a security metadata schema that could define the necessary things, and then having a tool to read that and construct the run command.

I am not so sure about raising privileges, as a message saying "this container needs --cap-add SYS_ADMIN" to run might be abused to encourage people to run things with escalated privileges. Using it to drop privileges that are not needed seems ok though if the container has been labelled that way, eg the nginx image may just need NET_BIND_SERVICE and can drop all the other default capabilities.
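
A prototype of such a tool might look like the sketch below. It only ever drops privileges: it refuses metadata that asks for a capability outside the default set. The function `build_run_command` is hypothetical, and the `DEFAULT_CAPS` list is my understanding of Docker's default capability set, so treat it as an assumption. The nginx example follows the comment above and has not been verified against the actual image.

```python
# Assumed default capability set; check against your Docker version.
DEFAULT_CAPS = {
    "CHOWN", "DAC_OVERRIDE", "FSETID", "FOWNER", "MKNOD", "NET_RAW",
    "SETGID", "SETUID", "SETFCAP", "SETPCAP", "NET_BIND_SERVICE",
    "SYS_CHROOT", "KILL", "AUDIT_WRITE",
}


def build_run_command(image, needed_caps):
    """Build a docker run argv that keeps only the capabilities an image needs."""
    extra = set(needed_caps) - DEFAULT_CAPS
    if extra:
        # Raising privileges is deliberately out of scope for this tool.
        raise ValueError("refusing to raise privileges: " + ", ".join(sorted(extra)))
    cmd = ["docker", "run", "--cap-drop", "ALL"]
    for cap in sorted(needed_caps):
        cmd += ["--cap-add", cap]
    cmd.append(image)
    return cmd


nginx_cmd = build_run_command("nginx", ["NET_BIND_SERVICE"])
```

Here `" ".join(nginx_cmd)` gives `docker run --cap-drop ALL --cap-add NET_BIND_SERVICE nginx`.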

rhatdan · 2016-04-28T17:54:06Z

Yes I love the idea of having the image run with less privs, but also preventing:

docker run ...
permission denied

Followed by

docker run --privileged ...
Success

And the user goes off running the containers without any security forever.

ndeloof · 2016-04-28T18:11:59Z

In both cases this would indeed encourage users to run with extra privileges without taking care.
So we need to make the risks clear:
"this container needs --cap-add SYS_ADMIN.
Your default configuration excludes this capability, please use with care blah blah blah"
Another option would be to offer a link which explains each capability / seccomp role (good luck docker documentation team) and makes the potential security risk clear.

djtm · 2016-04-28T19:34:14Z

@rhatdan @ndeloof: How about a nagging flag such as --allow-insecure for using dangerous options such as

privileged
cap_sys_admin
user=root

(Whether supplied to docker run or provided in the container image).

docker could exit with an error and a message such as:

You are attempting to run the container with (dangerous flags). Please add --allow-insecure to confirm you want to run the container without the default security. To read more about the secure use of docker, please visit http://... .
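
The check behind such a flag could be sketched as follows. Note that `--allow-insecure` is only the flag proposed in this comment, not an existing Docker option, and `check_options` with its parameters is a hypothetical helper.

```python
def check_options(privileged=False, cap_add=(), user=None, allow_insecure=False):
    """Return an error message if dangerous options lack confirmation, else None."""
    dangerous = []
    if privileged:
        dangerous.append("--privileged")
    if "SYS_ADMIN" in cap_add:
        dangerous.append("--cap-add SYS_ADMIN")
    if user == "root":
        dangerous.append("--user root")
    if dangerous and not allow_insecure:
        return (
            "You are attempting to run the container with "
            + ", ".join(dangerous)
            + ". Please add --allow-insecure to confirm."
        )
    return None
```

The same function could be fed options that come from image metadata instead of the command line, which covers the "provided in the container image" case.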

thaJeztah · 2016-04-29T00:10:48Z

Funny indeed, I had that conversation with @justincormack. I think it'd make sense to allow the image-maintainer to specify what capabilities / profile is needed for the image to run, but it should not automatically apply those (the person running the image should be the one deciding if the container actually gets those permissions).

Perhaps:

docker run --security-opt seccomp:embedded

to run the image with the seccomp-profile that's embedded in the image.

Possibly even think of disallowing --cap-add and --security-opt, and only allowing running images with the embedded profile? (Using a whitelist of images / trusted sources). Haven't given it much thought yet, so needs more thinking :D

mheon · 2016-04-29T00:42:58Z

There was talk in #22109 of allowing Seccomp profiles to be layered, permitting more than one to be used at a time. I think this would be an ideal way to add image-specific Seccomp profiles without requiring users to opt into using just the profile embedded in the image, or just the global profile baked into the daemon. In some cases, applying both profiles will have no benefit (the image profile could well block every syscall the global profile does). Still, loading both doesn't require fully trusting the security profile baked into an image, which might be more insecure than the default profile.

This doesn't help in cases where the image requires a syscall blocked by the default filter, but the baked-in filter only restricts a few high-impact syscalls. I'd say this should be handled similarly to the suggestions above for handling images that want to add capabilities instead of remove them. Requiring a flag or similar seems like a good idea.

Layering wouldn't really work with other things one might embed in an image's security profile, though. You can't exactly layer SELinux or Apparmor labels on the same file, for example.

rhatdan · 2016-04-29T14:17:54Z

Well a good packager could have his PID1 do a lot of what we are talking about, drop caps for example. Problem is most container packagers don't control PID1 code, they just stick in something like httpd.

I am not crazy about blocking options from the user like --cap-add or --security-opt, worried about unexpected consequences.

RobSkye · 2016-09-19T11:16:02Z

It makes sense to embed a security profile in the container metadata, but it would be great to prioritize the docker daemon settings because, as a sec guy, you may want to enforce a minimum profile and allow someone to run a container with a "better" profile, but deny the use of profiles allowing things the default doesn't allow.

This can work in three ways:

--seccomp:embedded -> try to run the container with the embedded profile, comparing it with the default profile. If the embedded profile is less secure, stop.

--seccomp:merge -> try to run the container merging (with the layering proposed in #22109) the embedded profile with the default profile, prioritizing the default opts to create a more secure profile.

--seccomp:force-embedded -> run the embedded profile, ignoring the default. This can only occur if the docker daemon is explicitly configured to allow this.
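
The three modes can be sketched by modelling each profile as the set of syscalls it allows, so that "less secure" means "allows something the default does not". This is a simplification (real profiles also carry argument filters and actions), and `select_profile` and its parameters are hypothetical names.

```python
def select_profile(mode, default_allowed, embedded_allowed, daemon_allows_force=False):
    """Pick the set of allowed syscalls for a container under one of three modes."""
    if mode == "embedded":
        # Stop if the embedded profile allows anything the default does not.
        if not embedded_allowed <= default_allowed:
            raise PermissionError("embedded profile is less secure than the default")
        return set(embedded_allowed)
    if mode == "merge":
        # Layering: a syscall is allowed only if both profiles allow it.
        return default_allowed & embedded_allowed
    if mode == "force-embedded":
        if not daemon_allows_force:
            raise PermissionError("daemon is not configured to allow force-embedded")
        return set(embedded_allowed)
    raise ValueError("unknown mode: %r" % mode)
```

Under this model, merge can never allow more than the default, so it is safe to offer without extra daemon configuration.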

vdemeester · 2018-02-14T08:35:25Z

The current proposal for that can be found here: #32801

neersighted · 2024-04-03T13:22:47Z

I think I want to close this in favor of the (somewhat more concrete) #32801.

icecrime added this to the 1.10 milestone Oct 18, 2015

icecrime added roadmap area/security labels Oct 18, 2015

jessfraz closed this as completed Oct 18, 2015

jessfraz reopened this Oct 18, 2015

jessfraz self-assigned this Oct 19, 2015

keloyang mentioned this issue Oct 26, 2015

Seccomp initial support #17359

Closed

jessfraz mentioned this issue Nov 15, 2015

Phase 1: Initial seccomp support #17989

Merged

jessfraz changed the title ~~Seccomp initial support~~ Docker Security Profiles (seccomp, apparmor, etc) Nov 18, 2015

ndeloof unassigned jessfraz Apr 28, 2016

vdemeester removed the roadmap label Feb 14, 2018

neersighted closed this as completed Apr 3, 2024
