[Proposal] Security Contexts #3910

Merged 2 commits on Feb 17, 2015

csrwng
Contributor

@csrwng csrwng commented Jan 29, 2015

No description provided.

@alex-mohr
Contributor

Looks interesting Cesar. The doc mentions #398 and #2297 as related -- any other issues? And since you obviously have more context, I'll let you @ relevant reviewers?

@alex-mohr
Contributor

@erictune perhaps?

@csrwng
Contributor Author

csrwng commented Jan 29, 2015

@smarterclayton @derekwaynecarr @erictune @bgrant0607 - fyi

@alex-mohr #3585 is also relevant. Comments are welcome.


In order to improve container isolation from host and other containers running on the host, containers should only be
granted the access they need to perform their work. To this end it should be possible to take advantage of Docker
features such as the ability to [add or remove capabilities](https://docs.docker.com/reference/run/#runtime-privilege-linux-capabilities-and-lxc-configuration) and [assign MCS labels](https://docs.docker.com/reference/run/#security-configuration)
Member

What types of volumes would the MCS labels be used with? Presumably there aren't files that are sensitive for the container process in the emptydir. Is this for files in the hostDir, or some other type of volume?

Contributor

Everything - the container would be relabeled, the process would have those labels, and any volumes would either be labelled or potentially left as is (in a few cases maybe this is reasonable?). Common case though is "you get these labels". I believe we have all but the volume support upstream and we carry the relabeling support on RHEL docker.

On Jan 29, 2015, at 7:49 PM, Eric Tune notifications@github.com wrote:

In docs/design/security_context.md:

+A security context is a set of constraints that are applied to a container in order to achieve the following goals (from security design):
+
+1. Ensure a clear isolation between container and the underlying host it runs on
+2. Limit the ability of the container to negatively impact the infrastructure or other containers
+
+## Background
+
+The problem of securing containers in Kubernetes has come up before and the potential problems with container security are well known. Although it is not possible to completely isolate Docker containers from their hosts, new features like user namespaces make it possible to greatly reduce the attack surface.
+
+## Motivation
+
+### Container isolation
+
+In order to improve container isolation from host and other containers running on the host, containers should only be
+granted the access they need to perform their work. To this end it should be possible to take advantage of Docker
+features such as the ability to add or remove capabilities and assign MCS labels

Member

Okay, reading further I see you are talking about NFS, and stuff like that.

Contributor

Yeah - actually relabel is bad if arbitrary (I shouldn't be able to relabel existing content because I tricked the master). It would be better to relabel only new content.


Contributor

We should define a kubelet default security context as well, i.e., if nothing is specified, this is the context. The kubelet can just auto-assign uids locally for user namespaces and do similar for labels. At least some defense in depth.
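
As a rough illustration of the "auto-assign uids locally" idea, here is a minimal Go sketch of a node-local allocator that gives each pod its own block of host uids when no security context is specified. Everything in it (`uidAllocator`, `newUIDAllocator`, `Allocate`, the pool bounds and block size) is hypothetical and not part of the proposal or the kubelet; it also ignores releasing blocks and concurrent access.

```go
package main

import (
	"errors"
	"fmt"
)

// uidAllocator is a hypothetical node-local allocator. It hands each pod a
// distinct block of host uids, as a default when no context is specified.
type uidAllocator struct {
	next      uint32            // first host uid of the next free block
	limit     uint32            // exclusive upper bound of the uid pool
	blockSize uint32            // number of uids handed to each pod
	assigned  map[string]uint32 // pod UID -> first host uid of its block
}

func newUIDAllocator(start, limit, blockSize uint32) *uidAllocator {
	return &uidAllocator{
		next:      start,
		limit:     limit,
		blockSize: blockSize,
		assigned:  map[string]uint32{},
	}
}

// Allocate returns the first host uid of the block for podUID. A pod that
// already has a block gets the same one back, so repeated calls are stable.
func (a *uidAllocator) Allocate(podUID string) (uint32, error) {
	if base, ok := a.assigned[podUID]; ok {
		return base, nil
	}
	// Compare in uint64 so the sum cannot wrap around.
	if a.blockSize == 0 || uint64(a.next)+uint64(a.blockSize) > uint64(a.limit) {
		return 0, errors.New("no free uid block available")
	}
	base := a.next
	a.next += a.blockSize
	a.assigned[podUID] = base
	return base, nil
}

func main() {
	// Pool bounds and block size are arbitrary example values.
	alloc := newUIDAllocator(100000, 300000, 65536)
	for _, pod := range []string{"pod-a", "pod-b", "pod-a"} {
		base, err := alloc.Allocate(pod)
		fmt.Println(pod, base, err)
	}
}
```

Because the blocks never overlap, two pods on the same node would not share a host uid even if both run as uid 0 inside their own namespace, which is the defense-in-depth property the comment is after.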


@erictune
Member

@csrwng
this is a great doc. It explains an important problem and outlines enough of a solution that someone could start implementing.

The two areas where I'd like to see the ideas a little more developed are:

  • how to handle multiple uids in a container.
  • if the security context can be described as intent rather than implementation.

@csrwng
Contributor Author

csrwng commented Jan 30, 2015

@erictune Is the thinking that the intent would be part of the pod spec, and only the security provider would handle the implementation?

@erictune
Member

@csrwng yes.

@mrunalp
Contributor

mrunalp commented Jan 30, 2015

@smarterclayton Yes, it is possible to pass two non-overlapping uid ranges for user namespaces. The current somewhat arbitrary limit in the kernel is that one can specify 5 ranges.
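
To make the constraint in this comment concrete, the following Go sketch checks a set of uid mappings against the two conditions mentioned: at most five ranges, and no overlap between them. The `IDRange` type, its fields, and `ValidateRanges` are hypothetical names invented for illustration. Checking overlap only on the host side is also an assumption; a real check would likely cover the container side too.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// IDRange is a hypothetical description of one contiguous uid mapping.
type IDRange struct {
	ContainerID uint32 // first uid as seen inside the container
	HostID      uint32 // first uid as seen on the host
	Size        uint32 // number of consecutive uids in the range
}

// maxRanges mirrors the five-range limit mentioned in the comment above.
const maxRanges = 5

// ValidateRanges rejects mappings that exceed the range limit, contain an
// empty range, or overlap on the host side.
func ValidateRanges(ranges []IDRange) error {
	if len(ranges) > maxRanges {
		return fmt.Errorf("got %d ranges, at most %d are allowed", len(ranges), maxRanges)
	}
	// Sort a copy by host uid so overlaps show up between neighbours.
	sorted := append([]IDRange(nil), ranges...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].HostID < sorted[j].HostID })
	for i, r := range sorted {
		if r.Size == 0 {
			return errors.New("range size must be greater than zero")
		}
		if i > 0 {
			prev := sorted[i-1]
			// Use uint64 so the end of a range cannot wrap around.
			if uint64(prev.HostID)+uint64(prev.Size) > uint64(r.HostID) {
				return fmt.Errorf("host ranges starting at %d and %d overlap", prev.HostID, r.HostID)
			}
		}
	}
	return nil
}

func main() {
	ok := []IDRange{{0, 100000, 1000}, {1000, 200000, 1000}}
	fmt.Println(ValidateRanges(ok))

	bad := []IDRange{{0, 100000, 1000}, {1000, 100500, 1000}}
	fmt.Println(ValidateRanges(bad))
}
```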

@erictune
Member

I'm out next week. If you don't mind letting this sit till I get back I'd like to continue the discussion the following week.

@smarterclayton
Contributor

Sure - I'll get a covering proposal drafted.


@csrwng
Contributor Author

csrwng commented Feb 9, 2015

Updated the definition of a security context to specify intent instead of implementation for isolation. For user and group id mapping, we would use what @smarterclayton described above. A security context can include a set of ids that are unique across the cluster and another set that is only mapped to ids of the node where the container is running.

@erictune erictune mentioned this pull request Feb 11, 2015
@erictune
Member

Did I already ask whether security contexts are per pod or per container? The doc should state that. And, it should be reconciled with #3817

@csrwng
Contributor Author

csrwng commented Feb 11, 2015

@erictune We're saying in this proposal that a security context has a 1:1 relationship to a service account (and actually should be part of the service account). And so far the thinking is that a service account is associated with a pod and not a container. (https://github.com/GoogleCloudPlatform/kubernetes/pull/2297/files#diff-f94a5fcd4f2edf411fa3e2c8e08a56d0R37)

smarterclayton added a commit to smarterclayton/kubernetes that referenced this pull request Feb 12, 2015
This proposed update to docs/design/security.md includes proposals
on how to ensure containers have consistent Linux security behavior
across nodes, how containers authenticate and authorize to the master
and other components, and how secret data could be distributed to
pods to allow that authentication.

References concepts from kubernetes#3910, kubernetes#2030, and kubernetes#2297 as well as upstream issues
around the Docker vault and Docker secrets.
@erictune
Member

Do you want me to merge this and then we can pick at the finer points in follow up PRs?

@erictune
Member

previous comment is for @csrwng

@smarterclayton
Contributor

He's on PTO, but his answer would be yes :)


@pmorie
Member

pmorie commented Feb 16, 2015

@csrwng @erictune @smarterclayton As a finer point, it should be spelled out how the volume plugin API should change. The volume plugin will need access to the SecurityContext associated with a pod during the SetUp method, so the signature of the NewBuilder method should probably change to:

  type Plugin interface {
    // other methods omitted
    NewBuilder(spec *api.Volume, context *api.SecurityContext, podUID types.UID) (Builder, error)
  }
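
To show how a plugin might carry the context from NewBuilder through to SetUp, here is a self-contained Go sketch. All types are hypothetical stand-ins for `api.Volume`, `api.SecurityContext`, `types.UID`, and the volume `Builder`; the `SELinuxLabel` field and the `labelingBuilder` and `examplePlugin` types are invented for illustration, and SetUp only records the label rather than relabeling anything.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for api.Volume, api.SecurityContext and types.UID.
type Volume struct{ Name string }

type SecurityContext struct {
	SELinuxLabel string // assumed field: label to apply to volume content
}

type UID string

// Builder and Plugin follow the shape of the signature proposed above.
type Builder interface {
	SetUp() error
}

type Plugin interface {
	NewBuilder(spec *Volume, context *SecurityContext, podUID UID) (Builder, error)
}

// labelingBuilder keeps the security context so SetUp can use it later.
type labelingBuilder struct {
	volume  string
	podUID  UID
	context *SecurityContext
	applied string // label recorded by SetUp, standing in for a real relabel
}

func (b *labelingBuilder) SetUp() error {
	if b.context == nil {
		return nil // no context given: leave the volume as is
	}
	b.applied = b.context.SELinuxLabel
	return nil
}

type examplePlugin struct{}

func (examplePlugin) NewBuilder(spec *Volume, context *SecurityContext, podUID UID) (Builder, error) {
	if spec == nil {
		return nil, errors.New("volume spec is required")
	}
	return &labelingBuilder{volume: spec.Name, podUID: podUID, context: context}, nil
}

func main() {
	var p Plugin = examplePlugin{}
	b, err := p.NewBuilder(&Volume{Name: "data"}, &SecurityContext{SELinuxLabel: "s0:c1,c2"}, "pod-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(b.SetUp(), b.(*labelingBuilder).applied)
}
```

Passing the context at construction time, as proposed, means SetUp itself does not need a new parameter; whether that is preferable to passing it to SetUp directly is left open here.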

erictune added a commit that referenced this pull request Feb 17, 2015
@erictune erictune merged commit 97b7f7c into kubernetes:master Feb 17, 2015
### External integration with shared storage
In order to support external integration with shared storage, processes running in a Kubernetes cluster
should be able to be uniquely identified by their Unix UID, such that a chain of ownership can be established.
Processes in pods will need to have consistent UID/GID/SELinux category labels in order to access shared disks.
Member

Does this mean the in-namespace UID or the root-namespace UID?

Contributor

For disk, outside. However, the user to run as inside the namespace may be something a user wants to change. If namespaces are not present, a mismatch between the two should reject the pod, maybe.

On Feb 20, 2015, at 12:00 AM, Tim Hockin notifications@github.com wrote:

In docs/design/security_context.md:

+## Motivation
+
+### Container isolation
+
+In order to improve container isolation from host and other containers running on the host, containers should only be
+granted the access they need to perform their work. To this end it should be possible to take advantage of Docker
+features such as the ability to add or remove capabilities and assign MCS labels
+to the container process.
+
+Support for user namespaces has recently been merged into Docker's libcontainer project and should soon surface in Docker itself. It will make it possible to assign a range of unprivileged uids and gids from the host to each container, improving the isolation between host and container and between containers.
+
+### External integration with shared storage
+In order to support external integration with shared storage, processes running in a Kubernetes cluster
+should be able to be uniquely identified by their Unix UID, such that a chain of ownership can be established.
+Processes in pods will need to have consistent UID/GID/SELinux category labels in order to access shared disks.

@csrwng csrwng deleted the security_contexts branch March 18, 2015 14:51
xingzhou pushed a commit to xingzhou/kubernetes that referenced this pull request Dec 15, 2016
8 participants