
Pod-level security context proposal #12823

Merged
merged 1 commit into from Sep 25, 2015

Conversation

@pmorie (Member) commented Aug 17, 2015

Proposal for differentiating pod-level and container-level security contexts.

@bgrant0607 @erictune @smarterclayton @pweil- @eparis

@googlebot googlebot added the cla: yes label Aug 17, 2015
@pmorie pmorie changed the title from "WIP: pod-level security context" to "WIP: pod-level security context proposal" Aug 17, 2015
@pmorie pmorie force-pushed the pmorie:pod-sc branch 2 times, most recently from 67b8fbe to 24be8ad Aug 17, 2015
@pmorie (Member, Author) commented Aug 17, 2015

How could I forget @thockin

@k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 769ea85.

@pmorie (Member, Author) commented Aug 17, 2015

That's what I thought @k8s-bot 👊

@k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 67b8fbe.

@k8s-bot commented Aug 17, 2015

GCE e2e build/test passed for commit 24be8ad.

```go
Privileged     *bool
SELinuxOptions *SELinuxOptions
RunAsUser      *int64
RunAsNonRoot   bool
```

@thockin (Member) commented Aug 18, 2015

Why are there two flags for non-root? Can we get away with just one?

@pmorie (Member, Author) commented Aug 18, 2015

Good question -- they are two different facets:

1. `RunAsUser` is the specific UID that is requested
2. `RunAsNonRoot` ensures that, when the UID is delegated to what's declared in the image metadata, the image doesn't want to run as root.

If I had included the comments on those fields it would probably be clearer.
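pmorie's distinction above can be sketched in Go. This is a hypothetical illustration only, not the actual kubelet code; `userPolicy` and `resolveUID` are invented names for the sake of the example:

```go
package main

import "fmt"

// Hypothetical sketch only; field semantics follow the discussion above.
type userPolicy struct {
	RunAsUser    *int64 // explicit UID requested in the API, if any
	RunAsNonRoot bool   // refuse to start if the effective user is root
}

// resolveUID picks the UID the container runs as: an explicit RunAsUser
// wins, otherwise the UID from the image metadata is used. RunAsNonRoot
// then acts as a check on whichever value was chosen.
func resolveUID(p userPolicy, imageUID int64) (int64, error) {
	uid := imageUID
	if p.RunAsUser != nil {
		uid = *p.RunAsUser
	}
	if p.RunAsNonRoot && uid == 0 {
		return 0, fmt.Errorf("container would run as root, but RunAsNonRoot is set")
	}
	return uid, nil
}

func main() {
	uid := int64(1001)
	got, _ := resolveUID(userPolicy{RunAsUser: &uid}, 0)
	fmt.Println("explicit uid:", got) // explicit uid: 1001

	if _, err := resolveUID(userPolicy{RunAsNonRoot: true}, 0); err != nil {
		fmt.Println("rejected:", err) // image wants root, RunAsNonRoot set
	}
}
```

So the two fields are complementary: one requests a value, the other validates the value however it was arrived at.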

@thockin (Member) commented Aug 19, 2015

Comments and names might make this clearer.

```go
Capabilities   *Capabilities
Privileged     *bool
SELinuxOptions *SELinuxOptions
RunAsUser      *int64
```

@thockin (Member) commented Aug 18, 2015

Can we get declarative rather than imperative names? UserID, GroupID, SupplementalGroupIDs, etc

@pmorie (Member, Author) commented Aug 18, 2015

I'm fine with that. @pweil-, any objection?

@pweil- (Member) commented Aug 18, 2015

Nope, change it 🔨

@bgrant0607 (Member) commented Aug 19, 2015

Good point, @thockin.

I'll try to add something to api-conventions.md about this.

@pmorie (Member, Author) commented Aug 18, 2015

@thockin @bgrant0607 What's the backward-compat story for this going to be?

@bgrant0607 (Member) commented Aug 19, 2015

We must support the current v1 API for the foreseeable future.


#### Container runtime changes

The docker and rkt `Runtime` implementations should be changed to set the group of each container

### Use Case: Override pod security context for container

Some use-cases require the containers in a pod to run with different security settings. As an

@bgrant0607 (Member) commented Aug 19, 2015

use cases
no hyphen

@bgrant0607 (Member) commented Aug 19, 2015

I like it.

As when we introduced SecurityContext, we need to continue to support the current fields (except anything added since 1.0, such as hostIPC). Any duplicated fields need to be set from each other in conversion (it can't be done in defaulting because that causes problems for updates).
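The conversion-time mirroring bgrant0607 describes could look roughly like the following. This is a simplified sketch, not the real API machinery: the types are stand-ins for the actual structs, using `HostNetwork` as the duplicated field:

```go
package main

import "fmt"

// Simplified stand-ins for the versioned (v1) and internal API structs.
type v1PodSpec struct {
	HostNetwork     bool // deprecated per-pod field, kept for v1 compatibility
	SecurityContext *v1PodSecurityContext
}

type v1PodSecurityContext struct {
	HostNetwork bool
}

type internalPodSpec struct {
	SecurityContext internalPodSecurityContext // single source of truth
}

type internalPodSecurityContext struct {
	HostNetwork bool
}

// toInternal treats the new field as authoritative when present and
// falls back to the deprecated field otherwise.
func toInternal(in v1PodSpec) internalPodSpec {
	hn := in.HostNetwork
	if in.SecurityContext != nil {
		hn = in.SecurityContext.HostNetwork
	}
	return internalPodSpec{SecurityContext: internalPodSecurityContext{HostNetwork: hn}}
}

// toV1 writes the internal value back into both places so that old
// clients reading the deprecated field still see the right answer.
func toV1(in internalPodSpec) v1PodSpec {
	hn := in.SecurityContext.HostNetwork
	return v1PodSpec{
		HostNetwork:     hn,
		SecurityContext: &v1PodSecurityContext{HostNetwork: hn},
	}
}

func main() {
	out := toV1(toInternal(v1PodSpec{HostNetwork: true}))
	fmt.Println(out.HostNetwork, out.SecurityContext.HostNetwork) // true true
}
```

Doing this in conversion rather than defaulting means both fields always agree on read, which is exactly the property updates need.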

@bgrant0607 bgrant0607 added the area/api label Aug 19, 2015
@bgrant0607 (Member) commented Aug 19, 2015

ContainerDefaults in the PodSecurityContext is problematic. If we copy it to ContainerSecurityContext of each container, that causes problems for updates -- we'd at least have to do something special. If we don't copy it, then we break v1 API clients expecting to be able to read the securityContext on each container.

@pmorie (Member, Author) commented Aug 19, 2015

@bgrant0607 Agree it is problematic. IMO backward compat is going to be the trickiest aspect of actually coding this. By 'something special', I imagine you mean that for updates, we will have to project the container security contexts into the container defaults specified at the pod level. Accurate?

@bgrant0607 (Member) commented Aug 19, 2015

@pmorie:

In addition to updates, we also need to think about reading versions of the resource from etcd that were written prior to the field being added.

It's general API policy to explicitly expand all default values so that clients and other code dealing with the API don't all need to reason about implicit default behaviors. However, if a user originally set just one field that was copied to another field and then tried to update the original field, the two fields would then not match. What we'd really like is spreadsheet-like behavior, where updating any equivalent field updates the others automagically.

This doesn't exactly match what we do for defaulting, since that doesn't touch fields that are already set. It is possible to do during conversion, by treating one field in the internal representation as the source and others as mirrors. If/when we eliminate the internal representation we'd need to find another solution.

The other problem is how to tell which field was changed in the case that the two fields are set inconsistently. If PATCH, it should be straightforward. If PUT, then we'd need to diff the resource with the previous version.

With ContainerDefaults, it's even harder. If we copied ContainerDefaults into the ContainerSecurityContext of each container that didn't specify SecurityContext, we probably should not allow ContainerDefaults to be updated, since doing so would have no effect, unless the user explicitly nulled the ContainerSecurityContext fields again.

In this case, maybe we need to assume that anyone writing PodSecurityContext knows what they're doing (e.g., don't write PodSecurityContext and then just update the deprecated HostNetwork field without updating PodSecurityContext.HostNetwork), and we should just try to make reading the existing container-level securityContext work in a backward-compatible manner.

The other alternative is that we just make this better in the v2 API, and comment which fields are relevant to pod-level security. PodSecurityContext doesn't address host ports nor host paths, either.

Is this really worth the effort now?

two additions:

1. The `RunAsGroup` field specifies the GID the container process runs as
2. The `RunWithSupplementalGroups` field specifies additional groups the container process should

@vishh (Member) commented Aug 19, 2015

What are the use cases for this? Is kubelet expected to verify that all the supplemental groups exist on the system?

@pmorie (Member, Author) commented Aug 19, 2015

@vishh The use case is to enable sharing volumes via supplemental groups. The supplemental groups don't need to exist in /etc/group inside or outside the container, since they're just IDs, not strings that have to be resolved by looking at the group file. I'm preparing another PR for a volumes proposal that will go in depth into their use.

@vishh (Member) commented Aug 19, 2015

Ok. Looking forward to your proposal. Kubelet might end up using gids for disk management internally. To avoid conflicts, we might have to reserve ranges of gids.

@pmorie (Member, Author) commented Aug 19, 2015

@vishh That's basically what I'm going to propose.

@pmorie (Member, Author) commented Aug 19, 2015

Opened #12944 for volume stuff

@ghost commented Aug 26, 2015

I don't know that supplemental groups need to be part of the API. If RunAsGroup is specified in the pod's security context then the volumes can be owned by that GID. If it is not specified then the kubelet should generate and use supplemental group IDs in the background to solve the volume permissions problem. This has the added benefit of being able to solve the volume permissions problem without a need for API changes :)

@smarterclayton (Contributor) commented Aug 26, 2015

Hrm - if there is a use case for common supplemental groups across
different pods on shared volumes we'll still need a field.

On Aug 26, 2015, Sami Wagiaalla wrote (quoting docs/proposals/pod-security-context.md, #12823 (comment)):

```go
type ContainerSecurityContext struct {
	Capabilities              *Capabilities
	Privileged                *bool
	SELinuxOptions            *SELinuxOptions
	RunAsUser                 *int64
	RunAsNonRoot              bool
	RunAsGroup                *int64
	RunWithSupplementalGroups []int64
}
```

The `ContainerSecurityContext` type is very similar to the existing `SecurityContext` type, with two additions:

1. The `RunAsGroup` field specifies the GID the container process runs as
2. The `RunWithSupplementalGroups` field specifies additional groups the container process should

@thockin (Member) commented Aug 19, 2015

I did not think of ContainerDefaults as at all the same as API defaults for non-specified values. I saw PodSecuritySpec and ContainerSecuritySpec as a late-binding base and overlay. In that sense, any pod that does not have a ContainerDefaults field (i.e. everything today) will continue to operate as it does today. Adding a ContainerDefaults would change the behavior ONLY of things that were not overridden by the container.

Updating the ContainerDefaults on a running pod would probably result in all of the containers being restarted (that's easy, we maybe could be smarter).
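thockin's base-and-overlay reading could be sketched as a merge helper. Field names follow the proposal, but the helper itself and the trimmed-down struct are hypothetical:

```go
package main

import "fmt"

// Trimmed-down stand-in for the proposal's security settings; the real
// types carry more fields, merged the same way.
type SecurityDefaults struct {
	RunAsUser  *int64
	Privileged *bool
}

// merge computes the effective settings: container values win, pod-level
// defaults fill the gaps, and anything set in neither place stays unset.
func merge(pod, container SecurityDefaults) SecurityDefaults {
	out := container
	if out.RunAsUser == nil {
		out.RunAsUser = pod.RunAsUser
	}
	if out.Privileged == nil {
		out.Privileged = pod.Privileged
	}
	return out
}

func main() {
	podUID, ctrUID := int64(1000), int64(2000)

	eff := merge(SecurityDefaults{RunAsUser: &podUID}, SecurityDefaults{})
	fmt.Println(*eff.RunAsUser) // 1000: pod-level default fills the gap

	eff = merge(SecurityDefaults{RunAsUser: &podUID}, SecurityDefaults{RunAsUser: &ctrUID})
	fmt.Println(*eff.RunAsUser) // 2000: container override wins
}
```

Because binding happens at render time, a pod with no ContainerDefaults merges to exactly its container-level settings, which is the compatibility property thockin describes.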

@pmorie (Member, Author) commented Aug 19, 2015

> I did not think of ContainerDefaults as at all the same as API defaults for non-specified values. I saw PodSecuritySpec and ContainerSecuritySpec as a late-binding base and overlay.

Me too.

@bgrant0607 (Member) commented Aug 19, 2015

The problem with late binding / overlay is that anything looking at container-level SecurityContext would be misled. Let's say, for example, someone developed an admission-control plugin to enforce security policy over SecurityContext. An overlay would be a good way to circumvent enforcement.

@thockin (Member) commented Aug 19, 2015

Admission control would have to look at the final rendering of the
structure with the overlays. Or maybe I missed the point...


@bgrant0607 (Member) commented Aug 19, 2015

The point is backwards compatibility with the existing v1 API -- code that's unaware of PodSecurityContext.

@thockin (Member) commented Aug 20, 2015

Ahh, right. Good catch. Yeah, that's harder, then.


@pmorie (Member, Author) commented Aug 20, 2015

@bgrant0607 @thockin At what point are we planning to introduce v2beta1?

@pmorie (Member, Author) commented Aug 20, 2015

@bgrant0607 @thockin @smarterclayton

In the context of a new API version, I feel like this gets easier to support. A v1 PodSpec would be equivalent to a pod without a pod-level security context in the internal API.

I do think we could accomplish our goals (which for me personally are around sharing volumes between multiple containers in a pod which may be running with different uid/gids) by omitting container defaults (maybe) and just introducing the pod-level security context for the pod-level options (host network ns, host ipc ns, pod-wide supplemental group). I think I could live with that for now (assuming it doesn't appear to be a horrible option after more thought) if the backwards compatibility issues within the v1 api are seriously problematic.

Let me think through fully what we would have to do on backwards compat and see what shakes out of that.

@smarterclayton (Contributor) commented Aug 20, 2015

I kind of like just saying we have this new thing that overrides the old thing as a start. Tim convinced me that most of the time the new thing is what people want. In the future, the old thing (per container) can mutate to take into account the new thing.

@pmorie (Member, Author) commented Sep 24, 2015

@bgrant0607 Think you will have a chance to review today?

@pmorie (Member, Author) commented Sep 24, 2015

@thockin Brian told me you were the one to bug about this :-D

1. Attributes that apply to the pod itself
2. Attributes that apply to the containers of the pod

In the internal API, fields of the `PodSpec` controlling the use of the host PID, IPC, and network

@thockin (Member) commented Sep 25, 2015

might emphasize this is internal API only and why (compat)

@thockin (Member) commented Sep 25, 2015

I see this below, so nm

```go
// Optional: SecurityContext defines the security options the pod should be run with.
// Settings specified in this field take precedence over the settings defined in
// pod.Spec.SecurityContext.
```

@thockin (Member) commented Sep 25, 2015

use JSON names in comments.

@thockin (Member) commented Sep 25, 2015

LGTM - most of my comments are for when the proposal turns into code. Just nits.

My only gripe with this doc is that it is very verbose in explaining what amounts to a relatively small change (after all the compat kerfuffle).

@bgrant0607 (Member) commented Sep 25, 2015

Please remove WIP from commit and PR titles.

@thockin thockin changed the title from "WIP: pod-level security context proposal" to "Pod-level security context proposal" Sep 25, 2015
@thockin thockin added the lgtm label Sep 25, 2015
@pmorie (Member, Author) commented Sep 25, 2015

> My only gripe with this doc is that it is very verbose in explaining what amounts to a relatively small change (after all the compat kerfuffle)

+1

@pmorie pmorie force-pushed the pmorie:pod-sc branch from ef4a23f to 95cd1ff Sep 25, 2015
@pmorie (Member, Author) commented Sep 25, 2015

Commit message changed. Thanks a lot @thockin @bgrant0607 @smarterclayton

@thockin thockin added lgtm and removed lgtm labels Sep 25, 2015
@k8s-bot commented Sep 25, 2015

GCE e2e build/test passed for commit 95cd1ff.

@pmorie (Member, Author) commented Sep 25, 2015

@bgrant0607 (Member) commented Sep 25, 2015

Please rebase to get shippable.yml changes.

@pmorie pmorie force-pushed the pmorie:pod-sc branch from 95cd1ff to 0c70062 Sep 25, 2015
@pmorie (Member, Author) commented Sep 25, 2015

@bgrant0607 Rebased

@k8s-bot commented Sep 25, 2015

Unit, integration and GCE e2e test build/test passed for commit 0c70062.

@pmorie (Member, Author) commented Sep 25, 2015

Shippable was a flake (#14577)

bgrant0607 added a commit that referenced this pull request Sep 25, 2015
Pod-level security context proposal
@bgrant0607 bgrant0607 merged commit cf75b0d into kubernetes:master Sep 25, 2015
3 of 4 checks passed:
- Shippable: builds in progress on Shippable
- Jenkins GCE e2e: 4738 tests run, 101 skipped, 0 failed
- cla/google: all necessary CLAs are signed
- continuous-integration/travis-ci/pr: the Travis CI build passed
@bgrant0607 bgrant0607 added the team/api label Oct 1, 2015