Docker Security Profiles (seccomp, apparmor, etc) #17142

Closed
icecrime opened this issue Oct 18, 2015 · 60 comments

Comments

@icecrime
Contributor

As mentioned in our ROADMAP.md, we'd like to progress toward seccomp support in Docker 1.10.

As a phase 1, I propose allowing the Engine to accept a seccomp profile at container run time. In the future, we might want to ship builtin profiles, or bake profiles in the images: design work about that future would be a plus.

Ping @jfrazelle, who's interested in looking into that!

@icecrime icecrime added this to the 1.10 milestone Oct 18, 2015
@jessfraz
Contributor

jessfraz commented Oct 18, 2015

[RFC] Docker Security Profiles

The profile would be passed on docker run; we can reuse the --security-opt flag we already have.

Something like docker run ... --security-opt native:/path/to/config.toml ...

Obviously doesn't have to be toml since that's super hipster :p

Assumptions

  • no one is going to sit and write out all the syscalls/capabilities their app needs
  • automatic profiling would be super cool but like aa-genprof it is never
    perfect, leads to pain or removing the profile altogether, and an
    unmaintainable config file (we can always attempt this later)

Goals

  • maintainable config
  • readable by humans, not just Linux syscall/cap nerds
  • something an app developer would want to write
  • someone who did not write the config should be able to understand, at
    least at a high level, what is restricted

Inspiration

Grouping into categories

High level things you would want to configure should be generic and limited
to (for example):

  • Networking
  • Filesystem (Disk)
  • Runtime (CPU/Memory operations)
  • User Operations
  • Misc

Defining Permissions

The cool thing about OpenBSD's tame that I think we should implement is what
they refer to as "flags". A flag is a set of syscalls allowed for a common goal:
TAME_RW allows all the syscalls for I/O operations, while TAME_RPATH only allows
the syscalls that have read-only effects on the filesystem.

We can have this same concept and define the flags with syscalls and capabilities.

We would need to discuss what these should be and find the most common use cases for
them.

Behaviors

  • If one permission denies a syscall and another allows it, the
    deny should always override the allow.
  • Passing an empty config drops everything and nothing is allowed
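
The two behaviors above can be sketched as set arithmetic. This is only an illustration, not a proposed implementation: the `FLAGS` table, the `resolve` function, and the "no-sockets" flag are hypothetical names invented here, and the "dns" entry simply reuses the syscalls listed in the example config in this proposal.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical flag table: flag name -> (syscalls it allows, syscalls it denies).
# "dns" mirrors the syscalls listed in the example config in this proposal;
# "no-sockets" is invented here only to demonstrate a deny.
FLAGS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "dns": ({"sendto", "recvfrom", "socket", "connect"}, set()),
    "no-sockets": (set(), {"socket"}),
}


def resolve(flag_names: Iterable[str]) -> Set[str]:
    """Return the syscalls permitted by the given flags.

    Nothing is allowed unless some flag allows it, and a deny from any
    flag wins over an allow from any other flag.
    """
    allowed: Set[str] = set()
    denied: Set[str] = set()
    for name in flag_names:
        allow, deny = FLAGS[name]
        allowed |= allow
        denied |= deny
    return allowed - denied
```

Because denies are subtracted last, the result does not depend on the order in which flags are listed, and an empty flag list yields an empty allow set.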

Super super super alpha example

Kinda like jfrazelle/bane but better.

[Networking]
Flags = [
    # this will allow sendto(2), recvfrom(2), socket(2), connect(2)
    "dns",
    # adds CAP_NET_RAW
    "ping"
]

[Filesystem]
Flags = [
    # will allow lstat(2), chmod(2), chflags(2),
    # chown(2), unlink(2), fstat(2) on /tmp
    "tmp"
]
# filepaths where you would like to log on write
LogOnWrite = [
    "/etc/**",
    "/root/**"
]
# read-only filepaths
ReadOnly = [
    "/sys/**"
]

[Runtime]
Flags = [
    # allows getentropy(2), madvise(2), minherit(2),
    # mmap(2), mprotect(2), mquery(2), munmap(2)
    "malloc"
]

[User]
Flags = [
    # allows getuid(2), getgid(2), setuid(2), setugid(2)
    "create"
]

Backends

It will use whatever is installed on the system, so if they have apparmor but no seccomp, it will use apparmor (which can technically do all the syscall, cap, and filesystem privileges).

  • AppArmor
  • Seccomp
  • Capabilities

File Globbing

Taken from apparmor profiles file globbing.

Glob Example     Description
/dir/file        match a specific file
/dir/*           match any files in a directory (including dot files)
/dir/a*          match any file in a directory starting with a
/dir/*.png       match any file in a directory ending with .png
/dir/[^.]*       match any file in a directory except dot files
/dir/            match a directory
/dir/*/          match any directory within /dir/
/dir/a*/         match any directory within /dir/ starting with a
/dir/*a/         match any directory within /dir/ ending with a
/dir/**          match any file or directory in or below /dir/
/dir/**/         match any directory in or below /dir/
/dir/**[^/]      match any file in or below /dir/
/dir{,1,2}/**    match any file or directory in or below /dir/, /dir1/, and /dir2/
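
To make the globbing rules above concrete, here is a rough sketch that translates such a glob into a Python regular expression. It is a simplification and not AppArmor's actual matcher: the function name `glob_to_regex` is hypothetical, `*` is assumed to mean "any run of characters except /", `**` is assumed to mean "any run of characters including /", and edge cases (escaping, nested braces, whether `*` may match zero characters) are ignored.

```python
import re


def glob_to_regex(glob: str) -> "re.Pattern[str]":
    """Translate a simplified AppArmor-style glob into a compiled regex."""
    out = []
    i = 0
    while i < len(glob):
        ch = glob[i]
        if glob.startswith("**", i):
            out.append(".*")  # crosses directory boundaries
            i += 2
        elif ch == "*":
            out.append("[^/]*")  # stays within one path component
            i += 1
        elif ch == "[":
            end = glob.index("]", i)  # copy the character class as-is
            out.append(glob[i:end + 1])
            i = end + 1
        elif ch == "{":
            end = glob.index("}", i)  # {a,b,c} becomes a regex alternation
            alternatives = glob[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end + 1
        else:
            out.append(re.escape(ch))
            i += 1
    return re.compile("".join(out))


def matches(glob: str, path: str) -> bool:
    """Return True if the whole path matches the glob."""
    return glob_to_regex(glob).fullmatch(path) is not None
```

For example, `matches("/dir/*.png", "/dir/a.png")` is true, while the same glob does not match `/dir/sub/a.png` because `*` does not cross a `/`.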

More Goodness

  • I think we should allow people to define their own flags (or whatever we end up calling them). It could be cool to have a way to do it with a define in text/template. I believe this is possible if it is implemented the way I am thinking ;)

@jessfraz jessfraz reopened this Oct 18, 2015
@jessfraz jessfraz self-assigned this Oct 19, 2015
@kisom

kisom commented Oct 26, 2015

This is great. Some thoughts on the config file:

It would be cool to take the way apparmor does file permissions, e.g. from the chromium profile:

... {
    /lib/@{multiarch}/libgcc_s.so* mr,
    /lib{,32,64}/libm-*.so* mr,
    /lib/@{multiarch}/libm-*.so* mr,
...
}

but maybe allowing something more user-friendly for permission names (like read, write, etc). Default deny would be preferable, but that might not be the best option (maybe something the user can set, like AccessPolicy: whitelist or AccessPolicy: blacklist?).

It'd also be cool to have the logging support a similar scheme, with something like:

Log = [
    "/etc/something/config:read",
    "/var/run/something/**:write"
]

@jessfraz
Contributor

ah yay @kisom I definitely like the idea of something like :ro or :rw much like how it works for volumes

@kisom

kisom commented Oct 26, 2015

@jfrazelle having a shorthand is good for people who write a lot of these, but long names are easier for people to remember; both could probably be supported if a leading char is used to distinguish short form from long form. Something like :lrw vs. "lock read write".
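
A minimal sketch of how both forms could be accepted, assuming a leading ":" marks the short form. The `parse_perms` function and the letter-to-name table are hypothetical; only the ":lrw" and "lock read write" spellings come from the comment above.

```python
# Hypothetical mapping from short-form letters to long-form permission names.
SHORT_TO_LONG = {"l": "lock", "r": "read", "w": "write"}
LONG_NAMES = frozenset(SHORT_TO_LONG.values())


def parse_perms(spec: str) -> frozenset:
    """Parse ':lrw' (short form) or 'lock read write' (long form)."""
    if spec.startswith(":"):
        # Short form: every character after the colon is one permission.
        unknown = [ch for ch in spec[1:] if ch not in SHORT_TO_LONG]
        if unknown:
            raise ValueError(f"unknown short permission(s): {unknown}")
        return frozenset(SHORT_TO_LONG[ch] for ch in spec[1:])
    # Long form: whitespace-separated permission names.
    names = spec.split()
    unknown = [name for name in names if name not in LONG_NAMES]
    if unknown:
        raise ValueError(f"unknown permission name(s): {unknown}")
    return frozenset(names)
```

With this scheme `parse_perms(":lrw")` and `parse_perms("lock read write")` produce the same set, so a config could mix the two styles freely.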

@jessfraz
Contributor

yes for sure that makes sense

@keloyang
Contributor

How can it handle file permissions the apparmor way? For example, how would it restrict writing to a file?

@jessfraz
Contributor

It will generate an apparmor profile. This is not just a seccomp config; it
will be a generic security profile with backends.


@calavera
Contributor

calavera commented Nov 4, 2015

👍 to whatever @jfrazelle says.

@anusha-ragunathan you should check this out.

@jessfraz
Contributor

for phase 1 see: #17989

@jessfraz jessfraz changed the title Seccomp initial support Docker Security Profiles (seccomp, apparmor, etc) Nov 18, 2015
@cgwalters
Contributor

Related: seccomp/libseccomp#11

@cgwalters
Contributor

I think going towards immutable containers makes a lot more sense as a first step. Basically the equivalent of what OSTree does, make / and /usr immutable, and only leave /tmp and /var writable.

The app's executables are all immutable, etc.

AppArmor is designed for a world where multiple applications share a single rootfs, but Docker supports rootfs-per-app, so I don't see why it would be really valuable to specify what a container can do to its own files in /etc.

Things however get a lot more interesting if we're talking about controls over host bind mounts.

@cgwalters
Contributor

ReadOnly = [
    "/sys/**"
]

Note we're already bind mounting /sys as ro by default.

@Lennie
Contributor

Lennie commented Nov 19, 2015

@cgwalters I think the idea of using seccomp and so on is to protect the host by having a deny all, allow some list. When you can protect the host you also indirectly protect the containers from each other (if the kernel isn't breached, the barriers between containers can't be breached).

@jessfraz
Contributor

The profiles apply to containers, and yes, re: /sys being read-only already, that's just a sample config; we can get nitpicky later :P

@cgwalters
Contributor

Also, a major step for basic security is to run containers as non-root, and also ensure they can't gain root via setuid binaries in the image, using PR_SET_NO_NEW_PRIVS. Installing a seccomp filter requires the latter unless the process has CAP_SYS_ADMIN.

In many cases a viable crutch if one needs root for setup (dpkg/rpm/etc as of today), is to have trusted base images that contain packages, then layer on later using non-root. This is what e.g. OpenShift S2I does.

seccomp can make sense to try to contain root (and a good reference for this is the systemd-nspawn blacklist). But I'd look at stricter seccomp only for non-root containers.

@jessfraz
Contributor

Well, userns should help with a lot of that. I think not combining a bunch of stuff into this from the get-go, and slowly adding as needed, would be the best route.

@cgwalters
Contributor

Personally, I think userns is a hacky crutch for dpkg/rpm not operating as non-root; it does have the compelling advantage that you can just use e.g. RHEL6 yum or Debian wheezy apt-get as is. But the downside is exciting new attack surface.

@Lennie
Contributor

Lennie commented Nov 22, 2015

@cgwalters the problem isn't dpkg/rpm. Just look at for example fakechroot. It's that for some things you need to have root-like privileges. You could use capabilities, like with opening a port below 1024, but how do you trust an untrusted installer to set the capabilities on a package/program. Those kinds of privileges need to be delegated to it somehow. That is where userns comes in, right ?

Anyway, this is probably not the place to discuss this. Wasn't aware who I was talking to before, how about you make a blog post on your site and I'll comment there. :-)

@rhatdan
Contributor

rhatdan commented Dec 1, 2015

I was asked to move discussion over here:

I agree with @mheon (no surprise there). If this is only opt-in, then most users will never opt in. Having a default blacklist like systemd-nspawn and libvirt/qemu should also be possible. This will give all containers greater security, not just those of users crazy enough to try out --security-opt type flags.
We will need a way to set default security (seccomp) profiles at the daemon level.

My point is that if the only way to turn these on is by choice, no one (or only a very small percentage of users) will.

Everyone is using SELinux by default. Everyone is using dropped capabilities by default. Everyone is using read-only mount points by default.

How can we get seccomp for everyone by default?
People only opt in for security after they have had a security disaster, which is too late.

@mheon
Contributor

mheon commented Dec 1, 2015

The easiest solution to this would be to optionally pass a default security profile to the daemon via a flag, which would be applied to all containers unless explicitly disabled or overridden. The default would have to be fairly permissive to ensure we don't inconvenience users terribly (if we do, they'll just disable it entirely), but it would certainly be better than nothing.

We could also make some syscalls unconditionally blocked in non-privileged containers, similar to how we always drop some capabilities. Most of the particularly offensive ones will already be restricted because a typical container doesn't retain CAP_SYS_ADMIN but there are others that no non-privileged container should ever need to make.

@cgwalters
Contributor

I definitely expect https://github.com/kubernetes/kubernetes/blob/master/docs/design/security_context.md to trigger some of these, so this isn't a case of each application author having to specify --security-opt to docker run, etc.

@cpuguy83
Member

@lblackstone There is a default profile baked in, however in order to make sure the user can't change this profile, you'll have to use an authorization plugin.

@lblackstone

@cpuguy83 Hadn't read up on those much yet, but that looks spot on. Thanks!

@injectives

Is there any chance to use a custom profile for the Docker build command?
Previously (on Docker 1.9.1) I was able to prepend my commands with linux32 for emulation purposes, but now it doesn't work. linux32 arch returns x86_64 instead of i686, so for example the command "linux32 yum install x" won't install the packages I wanted.
There were also custom 32bit base images on Docker Hub that use linux32 as an entrypoint, I suppose they won't work on 1.10 as well.

@jessfraz
Contributor

This is not the right place for this. You should try stracing linux32 to
see which syscall is missing; I think we included all of them in the default
profile.


@jessfraz
Contributor

But please open a new issue


@injectives

Sorry. I just wasn't sure if it is an issue, but on the other hand my build has stopped doing the right thing.
I have created an issue for this.

@rhatdan
Contributor

rhatdan commented Feb 24, 2016

I believe that turning on seccomp with a profile immediately blocks all access to non-native syscalls. Turning on seccomp on an x86_64 machine will block all 32-bit syscalls unless the profile allows them.

@jessfraz
Contributor

The default profile allows 32-bit syscalls.


@rhatdan
Contributor

rhatdan commented Feb 25, 2016

@jfrazelle Great, thanks.

@frioux

frioux commented Mar 18, 2016

This is excellent, and I am pleased to have a program that takes the default profile and adds my extra syscalls and whatnot.

I don't know enough about the underlying seccomp eBPF system, but would it be possible to change profiles after a container has been started? If not, would it be possible to have a group of system calls, (in my case ptrace) that are only enabled when some eBPF map is modified to allow them? I can see this being too fiddly for default docker, but something to think about.

@justincormack
Contributor

You cannot change profiles after startup.

You can currently add ptrace into a new profile and use that, but we do plan to add an easy grouping feature to make it simpler.
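
As a rough sketch of "add ptrace into a new profile", the helper below copies a profile and appends allow rules for extra syscalls. It assumes a profile shaped like the JSON used around Docker 1.10 (a `defaultAction` plus a `syscalls` list of `{"name", "action", "args"}` rules); later releases changed the schema, so check the format for the version in use. The function name and the tiny `base_profile` are hypothetical stand-ins for the real default profile.

```python
import copy
from typing import Iterable


def with_extra_syscalls(profile: dict, extra: Iterable[str]) -> dict:
    """Return a copy of `profile` that also allows the syscalls in `extra`."""
    result = copy.deepcopy(profile)  # never mutate the caller's profile
    rules = result.setdefault("syscalls", [])
    # Syscalls that already have an unconditional allow rule.
    allowed = {
        rule["name"]
        for rule in rules
        if rule.get("action") == "SCMP_ACT_ALLOW"
        and "name" in rule
        and not rule.get("args")
    }
    for name in extra:
        if name not in allowed:
            rules.append({"name": name, "action": "SCMP_ACT_ALLOW", "args": []})
            allowed.add(name)
    return result


# Stand-in for the real default profile, which is far longer.
base_profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"name": "read", "action": "SCMP_ACT_ALLOW", "args": []}],
}
custom_profile = with_extra_syscalls(base_profile, ["ptrace"])
```

The resulting dictionary can be written out with `json.dump` and passed to the container at start time through --security-opt; since the profile cannot change after startup, the extra syscalls have to be decided up front.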

@frioux

frioux commented Mar 18, 2016

I know you can't today, but is the Linux subsystem such that Docker could never support that?

@justincormack
Contributor

It is part of the security design that once a syscall is blacklisted, it can never be allowed again in that process. Linux intends it to work that way, so it will not change.

@frioux

frioux commented Mar 18, 2016

Ok, good to know; thanks.

@ndeloof
Contributor

ndeloof commented Apr 28, 2016

Would it make sense for a Docker image to have security metadata attached?
What I mean is: I like the idea that docker comes with a reasonable security profile, but I guess it will be hard for non-experts to define their own, so most containers end up running with unnecessary privileges. Could image metadata include a security profile, used by default as long as it is a subset of the user's configuration? Official and well-formed docker images could then be designed to request only the capabilities/syscalls they actually require, keeping the attack surface minimal.
wdyt?

@rhatdan
Contributor

rhatdan commented Apr 28, 2016

I like the idea, but I also like the idea of allowing the image to request more access, which could at least allow the docker client to report to the user that this container will not run without the following capability, or requires this syscall, or requires SELinux to be disabled.

@ndeloof
Contributor

ndeloof commented Apr 28, 2016

Right, any assistance that lets users know which features are required, so they can check what those are about and decide whether to enable them, would be nice. It would keep security newbies from just enabling everything by default.

@justincormack
Contributor

Yes, oddly I was having a discussion about this earlier today, and it was something that has come up before.

I was thinking of perhaps prototyping it by defining a security metadata schema that could define the necessary things, and then having a tool to read that and construct the run command.

I am not so sure about raising privileges, as a message saying "this container needs --cap-add SYS_ADMIN" to run might be abused to encourage people to run things with escalated privileges. Using it to drop privileges that are not needed seems ok though if the container has been labelled that way, eg the nginx image may just need NET_BIND_SERVICE and can drop all the other default capabilities.
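
A minimal sketch of such a prototype tool, under several assumptions: the metadata is reduced to a plain list of required capability names, `build_run_args` is a hypothetical function, and `DEFAULT_CAPS` is my recollection of Docker's default capability set, which should be verified against the Docker version in use. The tool only ever drops privileges; it refuses anything beyond the default set rather than suggesting --cap-add for it.

```python
from typing import Iterable, List

# Assumed default capability set; verify against the Docker release in use.
DEFAULT_CAPS = frozenset({
    "CHOWN", "DAC_OVERRIDE", "FSETID", "FOWNER", "MKNOD", "NET_RAW",
    "SETGID", "SETUID", "SETFCAP", "SETPCAP", "NET_BIND_SERVICE",
    "SYS_CHROOT", "KILL", "AUDIT_WRITE",
})


def build_run_args(image: str, required_caps: Iterable[str]) -> List[str]:
    """Build a docker run command that keeps only the capabilities listed."""
    required = set(required_caps)
    beyond_default = required - DEFAULT_CAPS
    if beyond_default:
        # Never turn image metadata into a privilege escalation.
        raise ValueError(f"refusing to raise privileges: {sorted(beyond_default)}")
    args = ["docker", "run", "--cap-drop", "ALL"]
    for cap in sorted(required):
        args += ["--cap-add", cap]
    args.append(image)
    return args
```

For the nginx case mentioned above, `build_run_args("nginx", ["NET_BIND_SERVICE"])` drops every capability and adds back only NET_BIND_SERVICE, while a request for SYS_ADMIN raises an error instead of producing a command.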

@rhatdan
Contributor

rhatdan commented Apr 28, 2016

Yes I love the idea of having the image run with less privs, but also preventing:

docker run ...
permission denied

Followed by

docker run --privileged ...
Success

And the user goes off running the containers without any security forever.

@ndeloof
Contributor

ndeloof commented Apr 28, 2016

In both cases this would indeed encourage users to run with extra privileges without taking care.
So we need to make the risks clear:
"this container needs --cap-add SYS_ADMIN.
Your default configuration excludes this capability, please use with care blah blah blah"
Another option would be to offer a link that explains each capability / seccomp role (good luck, docker documentation team) and makes the potential security risk clear.

@djtm

djtm commented Apr 28, 2016

@rhatdan @ndeloof: How about a nagging flag such as --allow-insecure for using dangerous options such as

  • privileged
  • cap_sys_admin
  • user=root

(Whether supplied to docker run or provided in the container image).

docker could exit with an error and a message such as:

You are attempting to run the container with (dangerous flags). Please add --allow-insecure to confirm you want to run the container without the default security. To read more about the secure use of docker, please visit http://... .

@thaJeztah
Member

Funny indeed, I had that conversation with @justincormack. I think it'd make sense to allow the image-maintainer to specify what capabilities / profile is needed for the image to run, but it should not automatically apply those (the person running the image should be the one deciding if the container actually gets those permissions).

Perhaps;

docker run --security-opt seccomp:embedded

to run the image with the seccomp-profile that's embedded in the image.

Possibly even think of disallowing --cap-add and --security-opt, and only allowing running images with the embedded profile? (Using a whitelist of images / trusted sources). Haven't given it much thought yet, so needs more thinking :D

@mheon
Contributor

mheon commented Apr 29, 2016

There was talk in #22109 of allowing Seccomp profiles to be layered, permitting more than one to be used at a time. I think this would be an ideal way to add image-specific Seccomp profiles without requiring users to opt into using just the profile embedded in the image, or just the global profile baked into the daemon. In some cases, applying both profiles will have no benefit (the image profile could well block every syscall the global profile does). Still, loading both doesn't require fully trusting the security profile baked into an image, which might be more insecure than the default profile.

This doesn't help in cases where the image requires a syscall blocked by the default filter, but the baked-in filter only restricts a few high-impact syscalls. I'd say this should be handled similarly to the suggestions above for handling images that want to add capabilities instead of remove them. Requiring a flag or similar seems like a good idea.

Layering wouldn't really work with other things one might embed in an image's security profile, though. You can't exactly layer SELinux or Apparmor labels on the same file, for example.

@rhatdan
Contributor

rhatdan commented Apr 29, 2016

Well a good packager could have his PID1 do a lot of what we are talking about, drop caps for example. Problem is most container packagers don't control PID1 code, they just stick in something like httpd.

I am not crazy about blocking options from the user like --cap-add or --security-opt, worried about unexpected consequences.

@RobSkye
Contributor

RobSkye commented Sep 19, 2016

It makes sense to embed a security profile in the container metadata, but it would be great to prioritize the docker daemon settings: as a sec guy, you may want to enforce a minimum profile, allowing someone to run a container with a "better" profile but denying profiles that allow things the default doesn't.

This can work in three ways:

--seccomp:embedded -> try to run the container with the embedded profile comparing it with the default profile. if the embedded profile is less secure, stop.

--seccomp:merge -> try to run the container merging (with the layering proposed in #22109) the embedded profile with the default profile prioritizing the default opts creating a more secure profile.

--seccomp:force-embedded -> run the embedded profile ignoring the default. This can only occur if the docker daemon is explicitly configured to allow this.
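
The three modes can be sketched by modelling each profile as the set of syscalls it allows, so that "more secure" means "a subset". That model is a deliberate simplification (real profiles also carry argument filters and actions), and `select_profile` and its parameters are hypothetical names, not an existing Docker API.

```python
from typing import FrozenSet


def select_profile(
    mode: str,
    embedded: FrozenSet[str],
    default: FrozenSet[str],
    daemon_allows_force: bool = False,
) -> FrozenSet[str]:
    """Pick the effective allow set for one of the three proposed modes."""
    if mode == "embedded":
        # Run only if the image asks for nothing beyond the daemon default.
        if not embedded <= default:
            raise PermissionError("embedded profile is less strict than the default")
        return embedded
    if mode == "merge":
        # A syscall must be allowed by both profiles to stay allowed.
        return embedded & default
    if mode == "force-embedded":
        if not daemon_allows_force:
            raise PermissionError("daemon is not configured to allow force-embedded")
        return embedded
    raise ValueError(f"unknown mode: {mode!r}")
```

In this model "merge" can never end up looser than the daemon default, which is the property the layering idea relies on: the image profile does not have to be fully trusted.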

@vdemeester
Member

Current proposal for that can be found here : #32801

@neersighted
Member

I think I want to close this in favor of the (somewhat more concrete) #32801.
