
Volumes are created in container with root ownership and strict permissions #2630

Open · carlossg opened this issue Nov 26, 2014 · 205 comments

@carlossg (Contributor) commented Nov 26, 2014

The emptyDir volumeMount is owned by root:root with permissions set to 750; hostDir is the same but with 755 permissions.

Containers running with a non-root USER can't access the volumes.

Related discussion at https://groups.google.com/forum/#!topic/google-containers/D5NdjKFs6Cc and Docker issue moby/moby#9360.
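
To make the failure mode concrete, a minimal illustrative pod along these lines reproduces it (written with current field names; the UID and names are placeholders): on a cluster where the emptyDir is created root-owned with the restrictive mode described above, the write below fails with "Permission denied".

    apiVersion: v1
    kind: Pod
    metadata:
      name: emptydir-perms-repro        # placeholder name
    spec:
      containers:
      - name: app
        image: busybox:latest
        # run as an arbitrary non-root UID; /data is the root-owned emptyDir
        command: ['sh', '-c', 'id && touch /data/probe && sleep 3600']
        securityContext:
          runAsUser: 1000               # placeholder non-root UID
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        emptyDir: {}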

@thockin (Member) commented Nov 26, 2014

hostDir should get the same permissions as the existing host entry, though I am not sure we ensure a host directory exists before using hostDir.

Part of the problem here is that different containers can run as different users in the same pod - which user do we create the volume with? What we really need is a way to tell Docker to add supplemental group IDs when launching a container, so we can assign all containers in a pod to a common group.

I filed moby/moby#9360

@carlossg (Contributor, Author) commented Nov 26, 2014

Would it be reasonable to add a user and/or permissions option to volumeMounts or emptyDir to explicitly force it?

@thockin (Member) commented Nov 26, 2014

I don't think that we want that in the API long-term, so I'd rather apply a hidden heuristic like "chown to the USER of the first container that mounts the volume" or even "ensure that all VolumeMounts for an emptyDir Volume have the same USER, else error". Do you think such heuristics would hold?


@carlossg (Contributor, Author) commented Nov 26, 2014

That sounds good to me

@thockin (Member) commented Dec 1, 2014

This is a good starter project

@saad-ali saad-ali assigned saad-ali and unassigned thockin Dec 12, 2014
@saad-ali (Member) commented Dec 13, 2014

Background
Inside a Docker container, the primary process is launched as root by default. And, currently, Docker containers cannot be run without root privileges (once Docker supports user namespaces, a process inside a container will be able to run as root while the container's root user is actually mapped to a normal, non-privileged user outside the container). However, even today, a process inside a Docker container can be run under a non-privileged user: the Docker image can create new users and then force Docker to launch the entry point process as that user instead of root (as long as that user exists within the container image).

When an external volume is mounted, its ownership is set to root (UID 0), so unless the process inside the container is launched as root, it won't have permission to access the mounted directory.

Proposed Workarounds on the Kubernetes side

  1. While creating a pod that requires an EmptyDir volume, before starting containers, retrieve the USER from each container image (introspect the JSON for each container image); if any of the containers launch their main process as non-root, fail pod creation.
  2. While creating a pod that requires an EmptyDir volume, before creating the shared volume, chown it to the USER of the first container that mounts the volume.
    • Problems with this approach:
      1. With Kubernetes, a pod can contain multiple containers that share a volume, but each container could potentially run its processes as a different user, meaning that even if the owner of the volume was changed, the problem would still exist unless the owner was changed to a group that all containers knew about (and that all relevant users belonged to).
      2. Another interesting dimension to the problem is that running chown on a shared volume from outside the containers can fail if the host machine does not have the same user as inside the container (a container image can create a new user, unknown to the host, and have the entry process run as that user; since that user does not exist on the host, chown to that user from the host will fail).
        One workaround is to share the /etc/passwd file between the host and the container, but that is very limiting.
        Another potential workaround is for the host to somehow reach inside the container during initialization (before the shared volume is mounted), read the USER that the main process will start as, use the image's /etc/passwd file to map that USER to a UID, and chown the shared volume on the host to that UID (chown on the host only fails when given a user string it cannot find in /etc/passwd; it always succeeds with a numeric UID because it sets the uint value directly without any lookup).

Both approaches feel to me like they break a layer of abstraction by having Kubernetes reach into the container to figure out what user the main process will start as, and then doing something outside the container with that information. I feel like the right approach would be for the containers themselves to chown any mounted volumes during setup (after creating and setting the user).

Thoughts?

@saad-ali (Member) commented Dec 18, 2014

@thockin, after talking to some folks, I think @carlossg's approach of explicitly specifying the user in the API would be the cleanest workaround. I don't think we can apply "hidden heuristics" without an icky violation of abstractions (like reaching into a container to figure out what username to use and then mounting the container's /etc/passwd file to figure out the associated UID).

Proposal to modify the API:

  • Extend the API for EmptyDir, GitRepo, and GCEPersistentDisk volumes to optionally specify an unsigned integer UID.
    • If the UID is specified, the host will change the owner of the directory to that UID and set the permissions to 750 (User: rwx, Group: r-x, World: ---) when the volume directory is created.
    • If the UID is not specified, the host will not change the owner, but will set the permissions to 757 (User: rwx, Group: r-x, World: rwx), i.e. world writable, when the volume directory is created.
    • HostDir volumes would be left untouched, since those directories are not created by Kubernetes.
    • Require a UID instead of a username string so there are no problems if the user does not exist on the host machine (issue 2.ii above). (A hypothetical sketch of the proposed field follows below.)
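
Purely to make the proposal concrete, a manifest fragment under this scheme might have looked like the sketch below; the uid field shown here is hypothetical and was never added to the API.

    volumes:
    - name: scratch
      emptyDir:
        uid: 1000   # hypothetical field from this proposal; does not exist in the real API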

Thoughts?

CC: @bgrant0607, @dchen1107, @lavalamp

@thockin (Member) commented Dec 22, 2014

I think adding UID to volumes is a hack and redundant. I'd rather we do the right thing and get Docker to support supplemental group IDs.

moby/moby#9360


@LuqmanSahaf commented Jan 13, 2015

@saad-ali I think HostDir should not be left untouched. Consider this: on restart, Hadoop restores its blocks from the directory where it stores data. If we use emptyDir, a restarted container will get a different directory and the previous data will be lost. And Hadoop requires the permissions and ownership of the data directory to be set to the user that starts Hadoop (hdfs). If HostDir is not allowed to change permissions per user, then use cases like this cannot be supported. Please comment.

@thockin (Member) commented Jan 14, 2015

Define restart? Do you mean the container crashed and came back, or do you mean the machine rebooted and a new pod was scheduled and expects to be able to reclaim the disk space used by the previous pod? Or something else?


@LuqmanSahaf commented Jan 14, 2015

@thockin Restart could be anything: after a pod failure or a container failure, or the container could be restarted after changing some configuration (Hadoop needs to be restarted after config changes). Does that answer your question?

@LuqmanSahaf commented Jan 14, 2015

This document (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/volumes.md#emptydir) mentions that when a pod is unbound, the emptyDir is deleted. In the Hadoop use case, the data might be essential and might be required when another Hadoop pod comes back (or the container restarts). So HostDir must be used to persist data even after the pod is unbound. But Hadoop requires permissions on the data directory to be set for its user. Hope this explains.

@saad-ali (Member) commented Jan 15, 2015

With docker-archive/libcontainer/pull/322, docker containers now allow specifying AdditionalGroups (supplementary group GIDs). So an updated proposal to handle shared volumes amongst different containers in a pod:

  • When creating EmptyDir, GitRepo, or GCEPersistentDisk volumes for a new pod, Kubelet will:
    1. Create a new linux group for the pod on the host machine
      • Group is created with the next available Group ID number (GID)
    2. Change the group of the new directory (on the host machine) to the newly created group.
    3. Set the permissions of the new directory (on the host machine) to 770 (User: rwx, Group: rwx, World: ---).
    4. For each docker container, pass in the GID of the new group as AdditionalGroups via docker container configs.
      • Still requires docker to support passing AdditionalGroups through to libcontainer (moby/moby#9360).
      • May require updating fsouza/go-dockerclient to support AdditionalGroups
  • When creating HostDir volumes for a new pod, Kubelet will:
    • Leave the volume untouched, since those directories are not created by Kubernetes.
    • @LuqmanSahaf: this is up for debate, but my thinking is that since Kubernetes does not create the HostDir, and since it may contain existing data, Kubernetes should not get into the business of modifying it. We should leave it up to the creator and maintainer of the HostDir to modify its ownership or permissions to allow containers to access it.
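
For readers arriving later: the supplemental-group idea discussed here eventually surfaced in the pod-level securityContext as supplementalGroups, with fsGroup covering volume group ownership. A minimal sketch, with illustrative GID values:

    apiVersion: v1
    kind: Pod
    metadata:
      name: shared-volume-group         # illustrative name
    spec:
      securityContext:
        supplementalGroups: [2000]      # extra GIDs added to every container process
        fsGroup: 2000                   # supported volumes are made group-accessible to this GID
      containers:
      - name: app
        image: busybox:latest
        command: ['sh', '-c', 'id && ls -ld /data && sleep 3600']
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        emptyDir: {}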

@thockin (Member) commented Jan 15, 2015

There's an important distinction between a container restarting and a pod being removed. When a container restarts, the data in a normal emptyDir volume is safe. When a pod is removed, it should be GONE. Leaving host data around and expecting it to be there at some later point in time is awkward at best.

All of this is more complicated as soon as user namespaces land.


@bgrant0607 bgrant0607 modified the milestone: v1.0 Feb 6, 2015
@brendandburns brendandburns removed this from the v1.0 milestone Apr 28, 2015
@Cobra1978 commented Jul 10, 2019

Hi,

Is there any news about this issue?

@mikekuzak commented Jul 16, 2019

Why does pv.beta.kubernetes.io/gid not work for the local host path provisioner?
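
For reference, that annotation goes on the PersistentVolume object itself; the kubelet then adds the GID as a supplemental group for pods that use the PV. Whether a given provisioner sets or honors it is a separate question. A sketch with illustrative values:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: example-pv                          # illustrative name
      annotations:
        pv.beta.kubernetes.io/gid: "1234"       # GID granted to pods that use this PV
    spec:
      capacity:
        storage: 10Gi
      accessModes:
      - ReadWriteOnce
      hostPath:
        path: /mnt/data                         # illustrative path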

@Reinkaos commented Aug 11, 2019

Hey,

I am encountering this as well; I'd appreciate some news :).

@hughobrien commented Aug 12, 2019

this has been my workaround so far:

      - name: init
        image: busybox:latest
        command: ['/bin/chown', 'nobody:nogroup', '/<my dir>']
        volumeMounts:
        - name: data
          mountPath: /<my dir>

@maxneaga commented Aug 21, 2019

this has been my workaround so far:

      - name: init
        image: busybox:latest
        command: ['/bin/chown', 'nobody:nogroup', '/<my dir>']
        volumeMounts:
        - name: data
          mountPath: /<my dir>

Unfortunately, the chown workarounds do not work for read-only volumes, such as secret mounts.
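
For read-only projections like secrets, the file mode can at least be controlled at mount time with defaultMode (ownership still follows the pod's fsGroup, if one is set). A sketch with illustrative names:

    volumes:
    - name: certs
      secret:
        secretName: my-tls-secret   # illustrative secret name
        defaultMode: 0440           # mode applied to the projected files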

@jeffdesc commented Nov 15, 2019

I need this as well (pretty urgently): we have software that won't start because the permissions cannot be made different from 0600. If we could mount the volume under a specific UID, my problem (and others') would be solved.

@woodcockjosh commented Nov 15, 2019

As a workaround, you can run a job as part of your deployment to update the volume permissions and use a readiness check to verify write permission. Or you can use fsGroup to specify the group for the volume and add the application user to the group that owns the volume. Option 2 seems cleaner to me; I used to use option 1 but now I use option 2.

@wjam commented Nov 25, 2019

Note that if Kubernetes did support an fsUser option, you'd trip over #57923, where all files within the mounted secret are given 0440 permissions (or 0660 for writable mounts), ignoring any other configuration.

@theonewolf commented Dec 4, 2019

@woodcockjosh fsGroup doesn't cover the use case of security-sensitive software such as Vault, which runs as vault:vault and loads a private key file that must have permissions of 0600 or stricter. @wjam fsUser would be ideal if we could also get 0400 permissions set (for things like private key files).

We hit this trying to configure Vault to authenticate to a PostgreSQL DB with certificates. The underlying Go library hard-fails if the permission bits differ (https://github.com/lib/pq/blob/90697d60dd844d5ef6ff15135d0203f65d2f53b8/ssl_permissions.go#L17).
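
One pattern that works around this class of problem (a sketch only, not Vault-specific guidance; the names, image tag, and UID/GID below are illustrative) is to copy the secret into an emptyDir from an init container, fix the mode and ownership on the copy, and point the application at that path:

    spec:
      volumes:
      - name: tls-secret
        secret:
          secretName: pg-client-certs       # illustrative
      - name: tls-fixed
        emptyDir: {}
      initContainers:
      - name: fix-key-perms
        image: busybox:latest
        # the init container runs as root here, so chown/chmod succeed on the copy
        command: ['sh', '-c', 'cp /in/* /out/ && chown -R 100:1000 /out && chmod 0600 /out/*.key']
        volumeMounts:
        - name: tls-secret
          mountPath: /in
        - name: tls-fixed
          mountPath: /out
      containers:
      - name: vault
        image: vault:1.3.2                  # illustrative image/tag
        volumeMounts:
        - name: tls-fixed
          mountPath: /vault/tls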

@eichlerla2 commented Feb 20, 2020

@jingxu97: Is there any news on this? We still have the PV ownership problem in our clusters with strict security policies.

@kaleabgirma commented Feb 21, 2020

This article looks like it works. I didn't test it, but I'll test it on Monday; if anyone can do it before then, please let us know. The details are here:
Data persistence is configured using persistent volumes. Because Kubernetes mounts these volumes with the root user as the owner, non-root containers don't have permission to write to the persistent directory.

The following are some things we can do to solve these permission issues:

Use an init-container to change the permissions of the volume before mounting it in the non-root container. Example:

    spec:
      initContainers:
      - name: volume-permissions
        image: busybox
        command: ['sh', '-c', 'chmod -R g+rwX /bitnami']
        volumeMounts:
        - mountPath: /bitnami
          name: nginx-data
      containers:
      - image: bitnami/nginx:latest
        name: nginx
        volumeMounts:
        - mountPath: /bitnami
          name: nginx-data

Use the pod securityContext to specify the user ID and the fsGroup that will own the pod volumes. (Recommended)

    spec:
      securityContext:
        runAsUser: 1001
        fsGroup: 1001
      containers:
      - image: bitnami/nginx:latest
        name: nginx
        volumeMounts:
        - mountPath: /bitnami
          name: nginx-data

@tisc0 commented Apr 2, 2020

Hi,

I've seen the workaround with that weak initContainer running as root all around the Internet. I've also been struggling with fsGroup, which applies only at the pod scope, not per container in a pod, which is [also] a shame. I just built a custom image (nonroot-initContainer) based on Alpine, with sudo installed and a custom /etc/sudoers giving my non-root user the power to run the chmod actions. Unfortunately, I'm hitting another wall:

sudo: effective uid is not 0, is /usr/bin/sudo on a file system with the 'nosuid' \
option set or an NFS file system without root privileges?

Since I'm not willing to create a less secure PodSecurityPolicy for that deployment, any news on this issue would be very welcome for people who have to comply with security best practices.

Thanks in advance!

@thehappycoder commented Jun 8, 2020

Is there an fsGroup setting for Kubernetes Deployment files?
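
fsGroup is part of the pod securityContext, so in a Deployment it goes under the pod template. A sketch with illustrative names:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example                   # illustrative name
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: example
      template:
        metadata:
          labels:
            app: example
        spec:
          securityContext:
            fsGroup: 1001             # applied to the pods this Deployment creates
          containers:
          - name: app
            image: bitnami/nginx:latest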

@fejta-bot commented Sep 6, 2020

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@unixfox commented Sep 6, 2020

/remove-lifecycle stale
