Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-root support for volumes #12944

Merged

Conversation

pmorie
Copy link
Member

@pmorie pmorie commented Aug 19, 2015

This is a WIP proposal for handing concerns around volume ownership and pods running containers as non-root UIDs.

I'm still hacking around to see what kind of changes I think will be necessary but I think the use-case and analysis are probably ready to begin discussion around.

@thockin @vishh @smarterclayton @ncdc

@k8s-bot
Copy link

k8s-bot commented Aug 19, 2015

GCE e2e build/test failed for commit 7ffc2e6.

return nil
}

// Do images need to be pulled here?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, I think this will need to be a function of the container runtime, because the image pull happens behind a Runtime interface method call.

@k8s-bot
Copy link

k8s-bot commented Aug 20, 2015

GCE e2e build/test failed for commit eceffd5a364f09b3c8e73aa2ce6182fe461a66e2.

@ghost
Copy link

ghost commented Aug 25, 2015

@pmorie Re SELinux in this PR and #9844 what if we do the following:

  • Set all volumes to use the :z flag by default (for nfs the default label can be svirt_sandbox_file_t)
  • An optional improvement later can be to use the :Z flag where appropriate.
  • Add a SecurityContext feild to volumes so we can avoid having to infer explicit context from the first container in a pod a la WIP: Emptydir and security context #9844. If this field is specified the :z :Z flags are not used and files are labeled with the given context. Users who wish to avoid generic context can use this.

This would give good defaults and the options to change them for more strict security. WDYT ?

@smarterclayton
Copy link
Contributor

There's a discussion going on to do a pod security context - that has
mostly superseded the idea of a security context to volumes (volumes would
inherit from the pod security policy).

On Tue, Aug 25, 2015 at 4:11 PM, Sami Wagiaalla notifications@github.com
wrote:

@pmorie https://github.com/pmorie Re SELinux in this PR and #9844
#9844 what if we do the
following:

  • Set all volumes to use the :z flag by default (for nfs the default
    label can be svirt_sandbox_file_t)
  • An optional improvement later can be to use the :Z flag where
    appropriate.
  • Add a SecurityContext feild to volumes so we can avoid having to
    infer explicit context from the first container in a pod a la WIP: Emptydir and security context #9844
    WIP: Emptydir and security context #9844. If this field is
    specified the :z :Z flags are not used and files are labeled with the given
    context. Users who wish to avoid generic context can use this.

This would give good defaults and the options to change them for more
strict security. WDYT ?


Reply to this email directly or view it on GitHub
#12944 (comment)
.

Clayton Coleman | Lead Engineer, OpenShift

@ghost
Copy link

ghost commented Aug 26, 2015

There's a discussion going on to do a pod security context
Ah yes.. I missed that PR.. pod level security context makes sense

@pmorie
Copy link
Member Author

pmorie commented Aug 27, 2015

@smarterclayton @swagiaal I don't want to use :z at all anymore, because it relaxes the SELinux context of the volume so that it is usable from an SELinux standpoint by any container.

@k8s-bot
Copy link

k8s-bot commented Aug 27, 2015

GCE e2e build/test failed for commit d63eabd4b3668241f80057b3251c4e7b935ce97e.

3. The builders for distributed file systems should return the correct values based on the `Manage`
field of the volume source and the SELinux support of that volume type.

TODO: persistent volumes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markturansky, does this concern fall under the 'dynamic provisioning' work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will read this entire PR carefully to understand how it will work with PV and then provide feedback.

@pmorie
Copy link
Member Author

pmorie commented Aug 27, 2015

@thockin @vishh @smarterclayton @swagiaal PTAL, made some major revisions and additions to this tonight.

@k8s-bot
Copy link

k8s-bot commented Aug 27, 2015

GCE e2e build/test failed for commit 7450cbc.

@pmorie
Copy link
Member Author

pmorie commented Aug 27, 2015

@childsb @wattsteve PTAL

@childsb
Copy link
Contributor

childsb commented Aug 27, 2015

i'd like to see 2nd order multi-tenancy addressed. Or the use case of existing storage/LDAP system with UID/GIDs that may overlap with UID/GIDs on the host or other containers. If the decision is to specifically not allow it should be stated (i think it should be allowed)

@pmorie
Copy link
Member Author

pmorie commented Aug 27, 2015

@childsb I'm not exactly clear on what '2nd order' multitenancy is -- what would be first vs. second order?

@pmorie
Copy link
Member Author

pmorie commented Aug 27, 2015

@childsb I am planning to add a section on cluster-wide UID/GID provisioning, just want to make sure I know what you're looking for.

@childsb
Copy link
Contributor

childsb commented Aug 27, 2015

@pmorie i would define 1st/2nd order tenancy like:
1st order - user submitting the POD to kube
2nd order - user logging into the application running on top of kube that the first user submitted

@pmorie
Copy link
Member Author

pmorie commented Aug 27, 2015

@childsb Is this the same topic we've previously discussed? You brought up examples of hadoop, tomcat, etc.

@childsb
Copy link
Contributor

childsb commented Aug 27, 2015

@pmorie yes same issue

@pmorie
Copy link
Member Author

pmorie commented Aug 27, 2015

@childsb Okay, I will address that in the section I add about that. TL;DR: You should make sure your containers run as a single UID, handling multiple UIDs within containers is not going to be supported.

@ghost
Copy link

ghost commented Aug 27, 2015

I don't want to use :z at all anymore, because it relaxes the SELinux context of the volume so that it is usable from an SELinux standpoint by any container.

If you do not use :z you must generate pod level MCS labels and override the container ones


The Kubelet must analyze the pod spec to determine which UIDs need to use which volumes. If a
container's security context's `RunAsUser` field is not set, the Kubelet must inspect the image via
the container runtime to determine which UID the image will run as. Once the list of UIDs that need
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the kubelet should avoid inspecting container images. If RunAsUser is not specified then rely on the GID to grant volume permissions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and will change it. I don't think there should be any ownership determination. We already inspect the image though for the RunAsNonRoot feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this before and it is part of the policy because it matches
end user expectation. Images are late bound to nodes, so it has to be
enforced at the kubelet.

You can represent the policy you describe already.

On Aug 27, 2015, at 5:29 PM, Sami Wagiaalla notifications@github.com
wrote:

In docs/proposals/volumes.md
#12944 (comment):

+1. When does Kubernetes need to manage the ownership of a volume?
+2. How does Kubernetes determine the ownership of a volume?
+
+#### When to determine ownership
+
+Whether Kubernetes should manage the ownership of a volume for a distributed filesystem depends upon
+both the file system type and the cluster operator's policy. For example:
+
+1. Some organizations will manage ownership of volumes externally to the cluster
+2. It is not possible to securely chown or chmod paths within some distributed filesystems
+
+#### How to determine ownership
+
+The Kubelet must analyze the pod spec to determine which UIDs need to use which volumes. If a
+container's security context's RunAsUser field is not set, the Kubelet must inspect the image via
+the container runtime to determine which UID the image will run as. Once the list of UIDs that need

I think the kubelet should avoid inspecting container images. If RunAsUser
is not specified then rely on the GID to grant volume permissions


Reply to this email directly or view it on GitHub
https://github.com/kubernetes/kubernetes/pull/12944/files#r38142257.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant this in the context of RunAsNonRoot - late binding the image requires late binding the policy

@pmorie
Copy link
Member Author

pmorie commented Sep 28, 2015

@thockin @smarterclayton PTAL

@k8s-bot
Copy link

k8s-bot commented Sep 28, 2015

Unit, integration and GCE e2e build/test failed for commit d096d83b7eee0b20b06a09fa7f10b4bb5d82c399.

@pmorie
Copy link
Member Author

pmorie commented Sep 29, 2015

@thockin I updated the doc based on our discussion today. Since the SELinux one is going to have similar copy, would you care to review this one first? If we're over the hump on this one, I will update the SELinux proposal as well.

@k8s-bot
Copy link

k8s-bot commented Sep 29, 2015

Unit, integration and GCE e2e test build/test passed for commit 26d10817f78323d7d8a5c1a03bc33d1d8ea80542.

@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 30, 2015
@k8s-github-robot
Copy link

Labelling this PR as size/XL


If the list of UIDs that need to use a volume includes both root and non-root users, supplemental
groups can be applied to enable sharing volumes between containers. The ownership and permissions
`root:<supplemental group> 0770` will make a volume usable from both containers running as root and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need the mode to be 02770 on the dir - if the group sticky bit is not set users can write to the volume as themselves but the process's FSGID is what is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to say we want the setgid bit set for now. I'm still thinking through all the implications of sticky bit. I think we want it, trying to determine if we should set it on all volumes.

@thockin
Copy link
Member

thockin commented Sep 30, 2015

just a couple nits, then LGTM.

I don't want to re-review, so feel free to self-lgtm this :)

@pmorie
Copy link
Member Author

pmorie commented Sep 30, 2015

By the power vested in me by #12944 (comment), I declare this PR LGTM.

@pmorie
Copy link
Member Author

pmorie commented Sep 30, 2015

Thanks for the review @thockin

@pmorie pmorie added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 30, 2015
@pmorie pmorie changed the title WIP: Non-root support for volumes Non-root support for volumes Sep 30, 2015
@k8s-bot
Copy link

k8s-bot commented Sep 30, 2015

Unit, integration and GCE e2e build/test failed for commit 32b6646.

@pmorie
Copy link
Member Author

pmorie commented Sep 30, 2015

@k8s-bot test this

@k8s-bot
Copy link

k8s-bot commented Oct 1, 2015

Unit, integration and GCE e2e build/test failed for commit 32b6646.

@k8s-github-robot
Copy link

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Oct 1, 2015
@k8s-github-robot k8s-github-robot merged commit b840357 into kubernetes:master Oct 1, 2015
@pmorie pmorie mentioned this pull request Oct 9, 2015
xingzhou pushed a commit to xingzhou/kubernetes that referenced this pull request Dec 15, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet