New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for seccomp #20870

Closed
vishh opened this Issue Feb 9, 2016 · 51 comments

Comments

Projects
None yet
@vishh
Copy link
Member

vishh commented Feb 9, 2016

Starting from v1.10 docker daemon will run with a restrictive seccomp profile turned on by default.

Users will have to include a custom seccomp profile in addition to capabilities.

Until we figure out a sane way to add support for seccomp, I propose disabling (unconfined) seccomp profiles by default.
This will allow users to upgrade to docker v1.10 seamlessly.

cc @kubernetes/goog-node @bgrant0607

@vishh

This comment has been minimized.

Copy link
Member

vishh commented Feb 9, 2016

cc @erictune @kubernetes/rh-platform-management

@bgrant0607

This comment has been minimized.

Copy link
Member

bgrant0607 commented Feb 9, 2016

@bgrant0607 bgrant0607 added this to the next-candidate milestone Feb 9, 2016

@ncdc

This comment has been minimized.

Copy link
Member

ncdc commented Feb 9, 2016

cc @kubernetes/rh-cluster-infra

@rhatdan

This comment has been minimized.

Copy link

rhatdan commented Feb 24, 2016

Is the problem here that you are worried about seccomp breaking certain actions? I hate to turn off a key security feature of docker for the masses of users of k8s.

@vishh

This comment has been minimized.

Copy link
Member

vishh commented Feb 24, 2016

@erictune

This comment has been minimized.

Copy link
Member

erictune commented Mar 28, 2016

Do these make sense as requirements for kubernetes support for seccomp filters:

  1. we, the project, pre-defines names for certain "canned" seccomp filters with sensible defaults
    • probably the same as Docker's 1.10 default, and unrestricted.
  2. cluster admins can define their own custom filters too.
  3. it should be possible to define a seccomp filter and a list of allowed capabilities in the same policy object, so that both are set on a pod/container at once, to avoid user confusion about the need to set both.
  4. users can select the policy that a pod/container uses by giving the short name of a policy object, rather than inlining the entire seccomp filter text into the pod spec.
  5. admins can change the default policy for pods that don't say what to do as regards seccomp.
  6. the default default, on a new/upgraded cluster does not change, without some affirmative user action.
  7. if a user writes a pod spec that asks for a level of security that is not supported by the node (e.g. they ask for a restrictive seccomp profile, and the node is running docker 1.9) then pod creation fails, rather then the user silently getting no security.
@erictune

This comment has been minimized.

Copy link
Member

erictune commented Mar 28, 2016

@kubernetes/kube-iam

@erictune

This comment has been minimized.

Copy link
Member

erictune commented Mar 28, 2016

@eparis

This comment has been minimized.

Copy link
Member

eparis commented Mar 29, 2016

@erictune my thoughts are here (although disorganized as usual) https://trello.com/c/1pn9IN3m/320-better-seccomp-support But I think you nailed it.

@erictune

This comment has been minimized.

Copy link
Member

erictune commented Mar 29, 2016

I didn't realize that only increasingly more restrictive seccomp profiles can be loaded, if I am reading the comments on https://trello.com/c/1pn9IN3m/320-better-seccomp-support correctly. That makes it much less flexible that I had imagined.

@eparis

This comment has been minimized.

Copy link
Member

eparis commented Mar 29, 2016

docker only lets us load a single profile, so that part isn't relevant today (I think RH is doing OCI work to make it a list of profiles but who knows if that ever gets plumbed up into docker)

If you load 2 profiles they will be executed in order and all of the profiles must 'approve' the syscall. If profile 1 says
deny mount; allow all
and profile 2 says
deny module_init; allow mount; allow all

Then BOTH mount and module_init will get denied.

@eparis

This comment has been minimized.

Copy link
Member

eparis commented Mar 29, 2016

So the profiles themselves do not have to be of increasing restrictivity (restrictivity is a word, right?) but the combination will always be at least as restrictive as either parent.

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 5, 2016

What if instead of turning it off completely kubernetes loaded a custom default, w mount and whatever else you all need

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 5, 2016

and if the user loads one it would override your default, just like how it works in docker.

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 5, 2016

I'd be happy to contribute the patch depending on what syscalls you all need added

@vishh

This comment has been minimized.

Copy link
Member

vishh commented Apr 5, 2016

If users have a mechanism to override the default profile, I guess docker's default profile should itself work.

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 5, 2016

So is the problem exposing the custom profile in the kubernetes API in a
time constraint? Just want to understand so I can help

On Monday, April 4, 2016, Vish Kannan notifications@github.com wrote:

If users have a mechanism to override the default profile, I guess
docker's default profile should itself work.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#20870 (comment)

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3

@justincormack

This comment has been minimized.

Copy link

justincormack commented Apr 5, 2016

@erictune are you explicitly referring to Linux capabilities there? One option would be to tie the default Docker seccomp filter to the allowed capabilities so that adding say CAP_SYS_ADMIN allows some more syscalls, while --cap-drop-all removes a few more potentially. In particular adding CAP_SYS_ADMIN would allow unshare and mount, allowing systemd to run.

Getting admins to write seccomp filters is somewhat unpleasant work, the aim with the Docker seccomp support has been to avoid it as much as possible.

@thockin

This comment has been minimized.

Copy link
Member

thockin commented Apr 5, 2016

Is this manifesting as a real problem right now? I am wild-guessing that it shows up as containers that fail (syscall blocked) with no remediation (since it didn't make it into our API) ?

It seems like an overly large thing to cherry pick, and yet not having it seems bad. Can we maybe have an annotation that we pass through to docker, and add proper profiles and fields in 1.3?

@rhatdan

This comment has been minimized.

Copy link

rhatdan commented Apr 5, 2016

I tried to add this pull request to docker, but it was closed because docker wants the tighter security

projectatomic/docker#80

I believe that we should attempt to separate different security mechanisms from each other. For example. SELinux can block Capabilities, but I allow all Caps from an SELinux point of view and rely on the Capabilities Subsystem to manage caps, Seccomp did not follow this standard, at least to a few syscalls that are controlled by SYS_ADMIN. We saw these when trying to run systemd in a container.
We are working on patches to allow the addition of syscalls to the white list as well as changing the default way that seccomp is enforced, so that users would be allowed to specify KILL versus ERRNO. KILL Is the only seccomp action that outputs blocked syscalls to /var/log/audit/audit.log, which could give the user a clue on why he is getting permission denied.

Allowing tools to specify Whitelist or blacklist syscalls would help. It would also help if the syscall table that docker is using was provided externally to the tool, not compiled in. Then the admin or tools could just modify the default json and pass it to docker daemon on on the docker run command to specify the syscalls to be used by the container(s).

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 5, 2016

If the default profile is the problem it could easily be fixed by my
suggestion of kubes passing its own w whatever other syscalls they need
added to the whitelist, fwict the problem is more time adding it to the API
unless I am mistaken

On Tuesday, April 5, 2016, Daniel J Walsh notifications@github.com wrote:

I tried to add this pull request to docker, but it was closed because
docker wants the tighter security

projectatomic/docker#80 projectatomic/docker#80

I believe that we should attempt to separate different security mechanisms
from each other. For example. SELinux can block Capabilities, but I allow
all Caps from an SELinux point of view and rely on the Capabilities
Subsystem to manage caps, Seccomp did not follow this standard, at least to
a few syscalls that are controlled by SYS_ADMIN. We saw these when trying
to run systemd in a container.

We are working on patches to allow the addition of syscalls to the white
list as well as changing the default way that seccomp is enforced, so that
users would be allowed to specify KILL versus ERRNO. KILL Is the only
seccomp action that outputs blocked syscalls to /var/log/audit/audit.log,
which could give the user a clue on why he is getting permission denied.

Allowing tools to specify Whitelist or blacklist syscalls would help. It
would also help if the syscall table that docker is using was provided
externally to the tool, not compiled in. Then the admin or tools could just
modify the default json and pass it to docker daemon on on the docker run
command to specify the syscalls to be used by the container(s).


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#20870 (comment)

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3

@eparis

This comment has been minimized.

Copy link
Member

eparis commented Apr 5, 2016

In 1.2 we explicitly disable the seccomp profiles (aka we disable the docker default). So the 'manifestation' today is that kubernetes users cannot make use of this security feature provided by the underlying container backend.

The default docker seccomp profile would be great 99% of the time (the exact same % of the time that it would be great for docker). But until we plumb through some way to control the seccomp profile there wasn't a better alternative. This issue requires someone to plumb through the kube API a way to represent seccomp profiles and how to disable them.

I think there are a lot of ideas above how to plumb seccomp through the API, some form of which needs to be done before 1.3...

@bgrant0607

This comment has been minimized.

Copy link
Member

bgrant0607 commented Apr 6, 2016

We can't break existing applications running in existing clusters.

Users also expect to be able to delete and re-create their clusters across releases with identical behavior, and we have the official policy that we don't change default behavior of APIs without introducing a new API version.

@erictune

This comment has been minimized.

Copy link
Member

erictune commented Apr 6, 2016

@jfrazelle we don't know what calls our users might be doing, so we don't want to change the default and break them.

I see that before 1.10 you tested all the official docker images and didn't see any issues. Since then, have there been any user reports of images not working after upgrading to 1.10 due to seccomp?

@thockin

This comment has been minimized.

Copy link
Member

thockin commented Apr 20, 2016

Any better conclusions here? I hate disabling it, but without an escape
hatch it's bound to break someone. And it is probably too late to enable
an escape hatch as a point release under 1.2 :(

On Thu, Apr 7, 2016 at 5:26 PM, Jess Frazelle notifications@github.com
wrote:

What does the current pods security context look like vs. container
security context. You can just give me links to wherever it lives,
don't worry :) I'll try to find it too.

On Thu, Apr 7, 2016 at 12:14 PM, Jessica Frazelle me@jessfraz.com wrote:

Agreed with @justincormack and @rhatdan, definitely no. 3 needs some
tooling
for debugging, right now I will say it is not the most user friendly
experience. For no. 2 if it is really that bad we should remove it from
the
default, unless its something like open ;)

On Thu, Apr 7, 2016 at 11:08 AM, Daniel J Walsh <
notifications@github.com>
wrote:

I would say the two most common use cases are going to be.

My app does not work with the default seccomp filters, how do I add the
missing syscalls to whitelist?
I heard that this syscall is dangerous and my app does not use it, how
do
I remove it from whitelist?
How do I figure out which syscall is blocking my app from running?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#20870 (comment)

@ncdc

This comment has been minimized.

Copy link
Member

ncdc commented Apr 20, 2016

I think we're probably going to have to defer a real solution until 1.3. I know @pmorie is planning on writing up a proposal.

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 20, 2016

I think it would be a good idea to not include it in say a .1 or patch
release because it is kinda changing behavior and is better for a .0, if
feel bad breaking semver

On Wednesday, April 20, 2016, Andy Goldstein notifications@github.com
wrote:

I think we're probably going to have to defer a real solution until 1.3. I
know @pmorie https://github.com/pmorie is planning on writing up a
proposal.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#20870 (comment)

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3

@thockin

This comment has been minimized.

Copy link
Member

thockin commented Apr 20, 2016

You could argue it was an oversight, but it's a stretch. I was hoping
someone would convince me, but ...

On Wed, Apr 20, 2016 at 8:15 AM, Jess Frazelle notifications@github.com
wrote:

I think it would be a good idea to not include it in say a .1 or patch
release because it is kinda changing behavior and is better for a .0, if
feel bad breaking semver

On Wednesday, April 20, 2016, Andy Goldstein notifications@github.com
wrote:

I think we're probably going to have to defer a real solution until 1.3.
I
know @pmorie https://github.com/pmorie is planning on writing up a
proposal.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
<
#20870 (comment)

Jessie Frazelle
4096R / D4C4 DD60 0D66 F65A 8EFC 511E 18F3 685C 0022 BFF3
pgp.mit.edu <
http://pgp.mit.edu/pks/lookup?op=get&search=0x18F3685C0022BFF3>


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#20870 (comment)

@jessfraz

This comment has been minimized.

Copy link
Contributor

jessfraz commented Apr 21, 2016

don't worry I'll be the pusher for 1.3 😇

@thockin

This comment has been minimized.

Copy link
Member

thockin commented Apr 21, 2016

I'mma hold you to that.

On Thu, Apr 21, 2016 at 10:13 AM, Jess Frazelle notifications@github.com
wrote:

don't worry I'll be the pusher for 1.3 [image: 😇]


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#20870 (comment)

@pmorie pmorie referenced this issue Apr 21, 2016

Closed

Seccomp umbrella issue #24611

1 of 2 tasks complete
@pmorie

This comment has been minimized.

Copy link
Member

pmorie commented Apr 26, 2016

@jfrazelle

don't worry I'll be the pusher for 1.3

👍

k8s-merge-robot added a commit that referenced this issue May 23, 2016

Merge pull request #24602 from pmorie/seccomp-proposal
Automatic merge from submit-queue

Seccomp Proposal

WIP proposal to address #20870 

@kubernetes/kube-api 
@kubernetes/sig-node

<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/24602)
<!-- Reviewable:end -->
@dims

This comment has been minimized.

Copy link
Member

dims commented Jun 8, 2016

@thockin @vishh We can close this now (Since PR has merged aa8c72a)

@ncdc

This comment has been minimized.

Copy link
Member

ncdc commented Jun 8, 2016

Not yet - we need #26710 to merge first

@dchen1107 dchen1107 modified the milestones: v1.3, next-candidate Jun 8, 2016

@dchen1107

This comment has been minimized.

Copy link
Member

dchen1107 commented Jun 8, 2016

Moved this to v1.3 for the proper feature tracking.

@goltermann goltermann modified the milestones: v1.3, next-candidate Jun 13, 2016

xingzhou pushed a commit to xingzhou/kubernetes that referenced this issue Dec 15, 2016

Merge pull request kubernetes#24602 from pmorie/seccomp-proposal
Automatic merge from submit-queue

Seccomp Proposal

WIP proposal to address kubernetes#20870 

@kubernetes/kube-api 
@kubernetes/sig-node

<!-- Reviewable:start -->
---
This change is [<img src="http://reviewable.k8s.io/review_button.svg" height="35" align="absmiddle" alt="Reviewable"/>](http://reviewable.k8s.io/reviews/kubernetes/kubernetes/24602)
<!-- Reviewable:end -->
@bgrant0607

This comment has been minimized.

Copy link
Member

bgrant0607 commented May 30, 2017

@tallclair

This comment has been minimized.

Copy link
Member

tallclair commented Sep 23, 2017

Closing in favor of kubernetes/enhancements#135

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment