[Proposal] Security Contexts #3910
@erictune perhaps?
@smarterclayton @derekwaynecarr @erictune @bgrant0607 - fyi @alex-mohr #3585 is also relevant. Comments are welcome.
In order to improve container isolation from host and other containers running on the host, containers should only be
granted the access they need to perform their work. To this end it should be possible to take advantage of Docker
features such as the ability to [add or remove capabilities](https://docs.docker.com/reference/run/#runtime-privilege-linux-capabilities-and-lxc-configuration) and [assign MCS labels](https://docs.docker.com/reference/run/#security-configuration)
What types of volumes would the MCS labels be used with? Presumably there aren't files that are sensitive for the container process in the emptydir. Is this for files in the hostDir, or some other type of volume?
Everything - the container would be relabeled, the process would have those labels, and any volumes would either be labelled or potentially left as is (in a few cases maybe this is reasonable?). Common case though is "you get these labels". I believe we have all but the volume support upstream and we carry the relabeling support on RHEL docker.
On Jan 29, 2015, at 7:49 PM, Eric Tune notifications@github.com wrote:
In docs/design/security_context.md:
+A security context is a set of constraints that are applied to a container in order to achieve the following goals (from security design):
+
+1. Ensure a clear isolation between container and the underlying host it runs on
+2. Limit the ability of the container to negatively impact the infrastructure or other containers
+
+## Background
+
+The problem of securing containers in Kubernetes has come up before and the potential problems with container security are well known. Although it is not possible to completely isolate Docker containers from their hosts, new features like user namespaces make it possible to greatly reduce the attack surface.
+
+## Motivation
+
+### Container isolation
+
+In order to improve container isolation from host and other containers running on the host, containers should only be
+granted the access they need to perform their work. To this end it should be possible to take advantage of Docker
+features such as the ability to add or remove capabilities and assign MCS labels
Okay, reading further I see you are talking about NFS, and stuff like that.
Yeah - actually relabel is bad if arbitrary (I shouldn't be able to relabel existing content because I tricked the master). It would be better to relabel only new content.
We should define a kubelet default security context as well, i.e. if nothing is specified, this is the context. The kubelet can just auto-assign uids locally for user namespaces and do the same for labels. At least some defense in depth.
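A minimal sketch of what "auto-assign uids locally" could look like, assuming the kubelet hands each container one fixed-size block of host ids. `IDRange`, `Allocator`, and the block size are hypothetical names and values chosen for illustration; none of them come from the proposal or the Kubernetes API.

```go
package main

import (
	"errors"
	"fmt"
)

// IDRange is a hypothetical block of host ids given to one container's
// user namespace. It is not a Kubernetes API type.
type IDRange struct {
	HostStart uint32 // first host uid/gid in the block
	Length    uint32 // number of consecutive ids in the block
}

// ErrExhausted is returned when every block on the node is already in use.
var ErrExhausted = errors.New("no free id blocks on this node")

// Allocator hands out non-overlapping, fixed-size id blocks on a single node.
type Allocator struct {
	base      uint32
	blockSize uint32
	owner     []string // owner[i] is the container holding block i, "" if free
}

// NewAllocator creates an allocator with the given number of blocks.
func NewAllocator(base, blockSize uint32, blocks int) *Allocator {
	return &Allocator{base: base, blockSize: blockSize, owner: make([]string, blocks)}
}

// Assign returns the block already held by containerID, or claims the
// first free block for it.
func (a *Allocator) Assign(containerID string) (IDRange, error) {
	if containerID == "" {
		return IDRange{}, errors.New("container id must not be empty")
	}
	free := -1
	for i, o := range a.owner {
		if o == containerID {
			return a.rangeAt(i), nil
		}
		if o == "" && free < 0 {
			free = i
		}
	}
	if free < 0 {
		return IDRange{}, ErrExhausted
	}
	a.owner[free] = containerID
	return a.rangeAt(free), nil
}

// Release frees the block held by containerID, if any.
func (a *Allocator) Release(containerID string) {
	for i, o := range a.owner {
		if o == containerID {
			a.owner[i] = ""
		}
	}
}

func (a *Allocator) rangeAt(i int) IDRange {
	return IDRange{HostStart: a.base + uint32(i)*a.blockSize, Length: a.blockSize}
}

func main() {
	// Assumed values: two blocks of 65536 ids starting at host id 100000.
	alloc := NewAllocator(100000, 65536, 2)
	for _, id := range []string{"c1", "c2", "c3"} {
		r, err := alloc.Assign(id)
		fmt.Println(id, r, err)
	}
}
```

Because the blocks never overlap, two containers on the same node would not share a host uid even when no security context is specified, which is the defense-in-depth point made above.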
@csrwng The two areas where I'd like to see the ideas a little more developed are:
@erictune Is the thinking that the intent would be part of the pod spec, and only the security provider would handle the implementation?
@csrwng yes.
@smarterclayton Yes, it is possible to pass two non-overlapping uid ranges for user namespaces. The current somewhat arbitrary limit in the kernel is that one can specify 5 ranges.
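To make the constraint concrete, here is a rough sketch of how a set of uid ranges could be checked before being handed to the container runtime. It assumes the five-range limit mentioned above and assumes ranges must not overlap on either the container side or the host side. `IDMap`, `Validate`, and `maxRanges` are hypothetical names, not part of Docker, libcontainer, or Kubernetes.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// maxRanges mirrors the five-range limit mentioned in the discussion. Treat
// it as an assumption: the actual limit depends on the kernel version.
const maxRanges = 5

// IDMap is a hypothetical mapping entry: Length ids starting at
// ContainerStart inside the namespace map to ids starting at HostStart.
type IDMap struct {
	ContainerStart, HostStart, Length uint64
}

// overlaps reports whether any two half-open intervals
// [starts[i], starts[i]+lengths[i]) intersect.
func overlaps(starts, lengths []uint64) bool {
	idx := make([]int, len(starts))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(a, b int) bool { return starts[idx[a]] < starts[idx[b]] })
	// After sorting by start, any overlap shows up between neighbours.
	for k := 1; k < len(idx); k++ {
		prev, cur := idx[k-1], idx[k]
		if starts[prev]+lengths[prev] > starts[cur] {
			return true
		}
	}
	return false
}

// Validate checks the range count and rejects overlapping ranges.
func Validate(maps []IDMap) error {
	if len(maps) == 0 {
		return errors.New("at least one range is required")
	}
	if len(maps) > maxRanges {
		return fmt.Errorf("%d ranges given, at most %d allowed", len(maps), maxRanges)
	}
	var cs, hs, ls []uint64
	for _, m := range maps {
		if m.Length == 0 {
			return errors.New("range length must be positive")
		}
		cs = append(cs, m.ContainerStart)
		hs = append(hs, m.HostStart)
		ls = append(ls, m.Length)
	}
	if overlaps(cs, ls) {
		return errors.New("container-side ranges overlap")
	}
	if overlaps(hs, ls) {
		return errors.New("host-side ranges overlap")
	}
	return nil
}

func main() {
	// Illustrative values only: one range for ids shared across the cluster,
	// one for ids that are meaningful only on this node.
	maps := []IDMap{
		{ContainerStart: 0, HostStart: 100000, Length: 1000},
		{ContainerStart: 1000, HostStart: 200000, Length: 64536},
	}
	fmt.Println(Validate(maps))
}
```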
I'm out next week. If you don't mind letting this sit till I get back I'd like to continue the discussion the following week.
Sure - I'll get a covering proposal drafted.
Updated the definition of a security context to specify intent instead of implementation for isolation. For user and group id mapping, we would use what @smarterclayton described above. A security context can include a set of ids that are unique across the cluster and another set that is only mapped to ids of the node where the container is running.
Did I already ask whether security contexts are per pod or per container? The doc should state that. And, it should be reconciled with #3817
@erictune We're saying in this proposal that a security context has a 1:1 relationship to a service account (and actually should be part of the service account). And so far the thinking is that a service account is associated with a pod and not a container. (https://github.com/GoogleCloudPlatform/kubernetes/pull/2297/files#diff-f94a5fcd4f2edf411fa3e2c8e08a56d0R37)
This proposed update to docs/design/security.md includes proposals on how to ensure containers have consistent Linux security behavior across nodes, how containers authenticate and authorize to the master and other components, and how secret data could be distributed to pods to allow that authentication. References concepts from kubernetes#3910, kubernetes#2030, and kubernetes#2297 as well as upstream issues around the Docker vault and Docker secrets.
Do you want me to merge this and then we can pick at the finer points in follow up PRs?
previous comment is for @csrwng
He's on PTO, but his answer would be yes :)
@csrwng @erictune @smarterclayton As a finer point, it should be spelled out how the volume plugin API should change. The volume plugin will need access to the security context:

    type Plugin interface {
        // other methods omitted
        NewBuilder(spec *api.Volume, context *api.SecurityContext, podUID types.UID) (Builder, error)
    }
### External integration with shared storage
In order to support external integration with shared storage, processes running in a Kubernetes cluster
should be able to be uniquely identified by their Unix UID, such that a chain of ownership can be established.
Processes in pods will need to have consistent UID/GID/SELinux category labels in order to access shared disks.
Does this mean the in-namespace UID or the root-namespace UID?
For disk, outside. However, the user to run as inside the namespace may be something a user wants to change. If namespaces are not present, a mismatch between the two should reject the pod, maybe.
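One possible reading of that rule, sketched as an admission check. This assumes the pod states both the uid that must own files on shared storage (the root-namespace uid) and the uid the process runs as inside the container. `RunAs` and `Admit` are hypothetical names used only for illustration, and the "maybe" above still applies: the thread does not settle on this behavior.

```go
package main

import "fmt"

// RunAs is a hypothetical pair of ids requested for a container.
type RunAs struct {
	HostUID      int64 // root-namespace uid, the one that matters for disk ownership
	ContainerUID int64 // uid the process sees inside the container
}

// Admit rejects a request whose two uids differ when the node cannot map
// between them, that is, when user namespaces are not available.
func Admit(r RunAs, userNamespaces bool) error {
	if userNamespaces {
		// With user namespaces the runtime can map ContainerUID to HostUID.
		return nil
	}
	if r.HostUID != r.ContainerUID {
		return fmt.Errorf("uid mismatch without user namespaces: host %d, container %d",
			r.HostUID, r.ContainerUID)
	}
	return nil
}

func main() {
	req := RunAs{HostUID: 100123, ContainerUID: 0}
	fmt.Println(Admit(req, true))
	fmt.Println(Admit(req, false))
}
```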
On Feb 20, 2015, at 12:00 AM, Tim Hockin notifications@github.com wrote:
In docs/design/security_context.md:
+## Motivation
+
+### Container isolation
+
+In order to improve container isolation from host and other containers running on the host, containers should only be
+granted the access they need to perform their work. To this end it should be possible to take advantage of Docker
+features such as the ability to add or remove capabilities and assign MCS labels
+to the container process.
+
+Support for user namespaces has recently been merged into Docker's libcontainer project and should soon surface in Docker itself. It will make it possible to assign a range of unprivileged uids and gids from the host to each container, improving the isolation between host and container and between containers.