Volumes are created in container with root ownership and strict permissions #2630

Open
carlossg opened this Issue Nov 26, 2014 · 153 comments

Comments

@carlossg
Contributor

carlossg commented Nov 26, 2014

An emptyDir volume mount is owned by root:root with permissions set to 750;
a hostDir mount is the same, but with 755 permissions.

Containers running with a non-root USER can't access the volumes.

Related discussion at https://groups.google.com/forum/#!topic/google-containers/D5NdjKFs6Cc
and Docker issue moby/moby#9360
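
For readers trying to reproduce this today, here is a minimal sketch of the hostDir (now hostPath) half of the report; the image name, host path, and UID are placeholders, and on current kubelets emptyDir directories are typically created world-writable, so hostPath is the easier half to reproduce:

apiVersion: v1
kind: Pod
metadata:
  name: hostdir-perms-repro
spec:
  containers:
  - name: app
    image: busybox                 # placeholder image
    command: ["sh", "-c", "touch /data/test && echo ok || echo 'write failed'; sleep 3600"]
    securityContext:
      runAsUser: 1000              # non-root USER, as in the report
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    hostPath:
      path: /var/lib/repro-data    # placeholder path; created root-owned 755 by the kubelet
      type: DirectoryOrCreate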

@thockin


Member

thockin commented Nov 26, 2014

hostDir should get the same permissions as the existing host entry, though I am not sure we ensure a host directory exists before using hostDir.

Part of the problem here is that different containers can run as different users in the same pod - which user do we create the volume with? What we really need is a way to tell Docker to add supplemental group IDs when launching a container, so we can assign all containers in a pod to a common group.

I filed moby/moby#9360

@carlossg


Contributor

carlossg commented Nov 26, 2014

Would it be reasonable to add a user and/or permissions option to volumeMounts or emptyDir to force it explicitly?

@thockin


Member

thockin commented Nov 26, 2014

I don't think that we want that in the API long-term, so I'd rather apply a hidden heuristic like "chown to the USER of the first container that mounts the volume" or even "ensure that all VolumeMounts for an emptyDir Volume have the same USER, else error". Do you think such heuristics would hold?


@carlossg


Contributor

carlossg commented Nov 26, 2014

That sounds good to me

@thockin


Member

thockin commented Dec 1, 2014

This is a good starter project

@saad-ali saad-ali assigned saad-ali and unassigned thockin Dec 12, 2014

@saad-ali


Member

saad-ali commented Dec 13, 2014

Background
Inside a Docker container, the primary process is launched as root by default. And, currently, Docker containers cannot be run without root privileges (once Docker supports the user namespace, a process inside a container can run as root, and the container root user could actually be mapped to a normal, non-privileged user outside the container). However, even today, a process inside a Docker container can be run under a non-privileged user: the Docker image can create new users and then force Docker to launch the entry-point process as that user instead of root (as long as that user exists within the container image).

When an external volume is mounted, its ownership is set to root (UID 0), so unless the process inside the container is launched as root, it won't have permission to access the mounted directory.

Proposed Workarounds on the Kubernetes side

  1. While creating a pod that requires an EmptyDir volume, before starting containers, retrieve the USER from each container image (introspect the JSON for each container image); if any of the containers launch their main process as non-root, fail pod creation.
  2. While creating a pod that requires an EmptyDir volume, before creating the shared volume, chown it to the USER of the first container that mounts the volume.
  • Problems with this approach:
    1. In Kubernetes a pod can contain multiple containers that share a volume, but each container could potentially run its processes as a different user inside, meaning that even if the owner of a volume was changed, the problem would still exist unless the owner was changed to a group that all containers were aware of (and all relevant users were part of).
    2. Another interesting dimension to the problem is that running chown on a shared volume from outside the containers could fail if the host machine does not have the same user as inside the container (container images can create a new user, which the host is unaware of, and have the entry process run as that user; since that user does not exist on the host, chown to that user from the host will fail).
      One workaround for this is to share the /etc/passwd file between the host and the container, but that is very limiting.
      Another potential workaround would be for the host to somehow reach inside the container during initialization (before the shared volume is mounted), read the USER that the main process will start with, use the image's /etc/passwd file to map the USER to a UID, and chown the shared volume on the host to that UID (chown on the host only fails if it can't find a user string, because it uses /etc/passwd for the mapping; it always succeeds with UIDs because it just sets the uint value directly without any lookup).

Both approaches feel to me like they are breaking a layer of abstraction by having Kubernetes reach into the container to figure out what user the main process would start as, and doing something outside the container with that information. I feel like the right approach would be for the containers themselves to CHOWN any "mounted volumes" during setup (after creating and setting user).

Thoughts?

@saad-ali


Member

saad-ali commented Dec 18, 2014

@thockin, after talking to some folks, I think @carlossg's approach of explicitly specifying the user in the API would be the cleanest workaround. I don't think we can apply "hidden heuristics" without an icky violation of abstractions (like reaching into a container to figure out what username to use and then mounting the container's /etc/passwd file to figure out the associated UID).

Proposal to modify the API:

  • Extend the API for EmptyDir, GitRepo, and GCEPersistentDisk volumes to optionally specify an unsigned integer UID.
    • If the UID is specified, the host will change the owner of the directory to that UID and set the permissions to 750 (User: rwx, Group: r-x, World: ---) when the volume directory is created.
    • If the UID is not specified, the host will not change the owner, but will set the permissions to 757 (User: rwx, Group: r-x, World: rwx), i.e. world-writable, when the volume directory is created.
    • HostDir volumes would be left untouched, since those directories are not created by Kubernetes.
    • Require a UID instead of a username string so there are no problems if the user does not exist on the host machine (issue 2.ii above).

Thoughts?

CC: @bgrant0607, @dchen1107, @lavalamp
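
For concreteness, the proposal above amounts to something like the following hypothetical manifest. The uid field shown here was never added to the API; it is only an illustration of the proposed shape:

apiVersion: v1
kind: Pod
metadata:
  name: uid-proposal-sketch
spec:
  containers:
  - name: app
    image: busybox          # placeholder image
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir:
      uid: 1000             # hypothetical field from the proposal: chown the directory to this UID, mode 750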

@thockin


Member

thockin commented Dec 22, 2014

I think adding UID to volumes is a hack and redundant. I'd rather we do the right thing and get Docker to support supplemental group IDs.

moby/moby#9360


@LuqmanSahaf


LuqmanSahaf commented Jan 13, 2015

@saad-ali I think HostDir should not be left untouched. Consider this: on restart, Hadoop restores its blocks from the directory it stores data in. If we use emptyDir, a restarted container will get another directory and the previous data will be lost. And Hadoop requires the ownership and permissions of the data directory to be set to the user starting Hadoop (hdfs). If HostDir is not allowed to change permissions per user, then similar use cases cannot be achieved. Please comment.

@thockin


Member

thockin commented Jan 14, 2015

Define restart? Do you mean the container crashed and came back, or do you mean the machine rebooted and a new pod was scheduled and expects to be able to reclaim the disk space used by the previous pod? Or something else?


@LuqmanSahaf


LuqmanSahaf commented Jan 14, 2015

@thockin Restart could be anything: it could be after a pod failure or a container failure, or the container could be restarted after changing some configuration (Hadoop needs to be restarted after config changes). Does that answer your question?

@LuqmanSahaf


LuqmanSahaf commented Jan 14, 2015

This document (https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/volumes.md#emptydir) mentions that when a pod is unbound, the emptyDir is deleted. In the Hadoop use case, the data might be essential and might be required when another Hadoop pod comes back (or the container restarts). So HostDir must be used to persist data even after the pod is unbound. But Hadoop requires permissions on the data directory to be set for its user. Hope this explains.

@saad-ali

Member

saad-ali commented Jan 15, 2015

With docker/libcontainer/pull/322, Docker containers now allow specifying AdditionalGroups (supplementary group GIDs). So here is an updated proposal to handle volumes shared among different containers in a pod:

  • When creating EmptyDir, GitRepo, or GCEPersistentDisk volumes for a new pod, Kubelet will:
    1. Create a new Linux group for the pod on the host machine.
      • The group is created with the next available group ID (GID).
    2. Change the group of the new directory (on the host machine) to the newly created group.
    3. Set the permissions of the new directory (on the host machine) to 770 (User: rwx, Group: rwx, World: ---).
    4. For each Docker container, pass in the GID of the new group as AdditionalGroups via the Docker container configs.
      • Still requires Docker to support passing AdditionalGroups through to libcontainer (moby/moby#9360).
      • May require updating fsouza/go-dockerclient to support AdditionalGroups.
  • When creating HostDir volumes for a new pod, Kubelet will:
    • Leave the volume untouched, since those directories are not created by Kubernetes.
    • @LuqmanSahaf: this is up for debate, but my thinking is that since Kubernetes does not create the HostDir, and since it may contain existing data, Kubernetes should not get into the business of modifying it. We should leave it up to the creator and maintainer of the HostDir to modify its ownership or permissions to allow containers to access it.

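For readers arriving here later: this supplemental-group idea is essentially what ended up in the pod-level securityContext. A minimal sketch of the modern equivalent (image and IDs are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-groups
spec:
  securityContext:
    fsGroup: 2000                # supported volumes are made group-owned by GID 2000 and group-writable
    supplementalGroups: [3000]   # extra GIDs added to every container in the pod
  containers:
  - name: app
    image: busybox               # placeholder image
    command: ["sh", "-c", "id && touch /scratch/ok && sleep 3600"]
    securityContext:
      runAsUser: 1000            # non-root, but can still write thanks to fsGroup
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    emptyDir: {}
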
@thockin


Member

thockin commented Jan 15, 2015

There's an important distinction between a container restarting and a pod being removed. When a container restarts, the data in a normal emptyDir volume is safe. When a pod is removed, it should be GONE. Leaving HostDir data around and expecting it to be there at some later point in time is awkward at best.

All of this is more complicated as soon as user namespaces land.


@bgrant0607 bgrant0607 modified the milestone: v1.0 Feb 6, 2015

@brendandburns brendandburns removed this from the v1.0 milestone Apr 28, 2015

@krmayankk


Contributor

krmayankk commented Apr 12, 2018

/sig auth

@gitnik


gitnik commented Apr 24, 2018

None of the solutions suggested are working for me.

YAML:

apiVersion: apps/v1beta1 # for versions before 1.8.0 use apps/v1beta1
kind: Deployment
metadata:
  labels:
    tier: frontend
spec:
  selector:
    matchLabels:
      tier: frontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      securityContext:
        fsGroup: 1000
        runAsUser: 0
      initContainers:
      - image: some-sftp-container
        name: sftp-mount-permission-fix
        command: ["sh", "-c", "chown -R <user> /mnt/permission-fix"]
        volumeMounts:
        - name: azure
          mountPath: /mnt/permission-fix
      containers:
      - image: some-sftp-container
        name: sftp-container
        ports:
        - containerPort: 22
          name: port-22   # renamed: container port names cannot contain underscores
        volumeMounts:
        - name: azure
          mountPath: /home/<user>/data
      volumes:
        - name: azure
          azureFile:
            secretName: azure-secret
            shareName: sftp-share
            readOnly: false

Once the Pod is ready and I exec into the container and check the dirs, nothing has happened:

root@container:/# cd /home/<user>                                                                        
root@container:/home/<user># ls -als
total 8
4 drwxr-xr-x 3 root root 4096 Apr 24 18:45 .
4 drwxr-xr-x 1 root root 4096 Apr 24 18:45 ..
0 drwxr-xr-x 2 root root    0 Apr 22 21:32 data
root@container:/home/<user># cd data
root@container:/home/<user>/data# ls -als
total 1
1 -rwxr-xr-x 1 root root 898 Apr 24 08:55 fix.sh
0 -rwxr-xr-x 1 root root   0 Apr 22 22:27 test.json
root@container:/home/<user>/data# 

At some point I also had runAsUser: 0 on the container itself, but that didn't work either. Any help would be much appreciated.
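
A pattern that often resolves cases like the one above is to give only the init container a root securityContext and keep the main container non-root, as in this sketch (names reuse the manifest above; the UID is a placeholder). Note that whether chown works at all depends on the volume type: on SMB-backed volumes such as azureFile, ownership is usually fixed at mount time via uid/gid mount options, so a chown from an init container may simply have no effect.

spec:
  securityContext:
    fsGroup: 1000
  initContainers:
  - image: some-sftp-container         # placeholder image from the report above
    name: sftp-mount-permission-fix
    command: ["sh", "-c", "chown -R 1000:1000 /mnt/permission-fix"]
    securityContext:
      runAsUser: 0                     # root only for this one step
    volumeMounts:
    - name: azure
      mountPath: /mnt/permission-fix
  containers:
  - image: some-sftp-container
    name: sftp-container
    securityContext:
      runAsUser: 1000                  # main container stays non-root
    volumeMounts:
    - name: azure
      mountPath: /home/user/data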

@gitnik


gitnik commented Apr 24, 2018

Also running a chown afterwards didn't work

@thockin


Member

thockin commented Apr 29, 2018

@eatnumber1 if a group is in your supplemental groups, shouldn't you assume that it was intended that you have access to that group's resources? Dropping supplemental groups is saying "I know you told me I need this, but I don't want it" and then later complaining that you don't have it.

Regardless, I am now thoroughly lost as to what this bug means - there are too many followups that don't seem to be quite the same.

Can someone summarize for me? Or better, post a full repro with non-pretend image names?

@qianzhangxa


qianzhangxa commented May 1, 2018

@thockin IIUC, Nginx is not just dropping the supplementary groups; it is actually resetting them to what is configured in nginx.conf by calling initgroups.

@qafro1


qafro1 commented May 16, 2018

This worked for me (part of the manifest):

spec:
  securityContext:      # fsGroup is a pod-level field, so this block sits on the pod spec
    fsGroup: 1000
    runAsUser: 0
  containers:
  - name: jenkins
    image: jenkins/jenkins
    ports:
    - containerPort: 50000
    - containerPort: 8080
    volumeMounts:
    - mountPath: /var/jenkins_home
      name: jenkins-home

@ekhaydarov


ekhaydarov commented Jun 22, 2018

The solutions aren't ideal: now your containers are running as root, which is against the security standards that k8s tries to get its users to adopt.

It would be great if persistent volumes could be created with a securityContext in mind, i.e.:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: redis-data-pv
  namespace: data
  labels:
    app: redis
spec:
  securityContext:        # proposed field; this does not exist in the PersistentVolume API today
    runAsUser: 65534
    fsGroup: 65534
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    namespace: data
    name: redis-data
  hostPath:
    path: "/data"

@robbyt


robbyt commented Jun 23, 2018

As a workaround, I use a postStart lifecycle hook to chown the volume data to the correct permissions. This may not work for all applications, because the postStart lifecycle hook may run too late, but it's more secure than running the container as root and then fixing permissions and dropping root (or using gosu) in the entrypoint script.
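
For reference, the shape of that postStart workaround looks roughly like this (placeholder image, path, and UID). The hook runs inside the container with that container's own user and capabilities, so it only helps if that user is actually allowed to chown the mounted path, which is exactly the limitation raised further down this thread.

spec:
  containers:
  - name: app
    image: some-app-image        # placeholder
    lifecycle:
      postStart:
        exec:
          command: ["sh", "-c", "chown -R 1000:1000 /data"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    emptyDir: {}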

@chicocvenancio


chicocvenancio commented Jun 23, 2018

@robbyt commented
As a workaround, I use a postStart lifecycle hook to chown the volume data to the correct permissions. This may not work for all applications, because the postStart lifecycle hook may run too late, but it's more secure than running the container as root and then fixing permissions and dropping root (or using gosu) in the entrypoint script.

We use an initContainer; can a lifecycle hook have a different securityContext than the container itself?

@mheese


mheese commented Jun 28, 2018

It's sad to see that, after having to research this again, @chicocvenancio's option (which I use as well) is still apparently the only way to achieve this.

I understand where the problem is coming from and why we are so reluctant to change this; however, especially for Secret volumes, changing the UID of a volume can be essential.

Here is an example from the PostgreSQL world: mount a TLS client cert for your application with a secret volume. As recommended everywhere, you don't run your container as root. However, the postgres connection library will immediately complain that the key is world-readable. "No problem," you think, and you change the mode / default mode to match the demanded 0600 (which is a very reasonable demand from a client library). However, now this won't work either, because root is now the only user that can read the file.

The point I'm trying to make with this example is: groups don't come to the rescue here.

Now PostgreSQL is definitely a standard database and a product that a lot of people use. And asking to mount client certs in Kubernetes in a way that does not require an initContainer as a workaround is not too much to ask, imho.

So please, let's find some middle ground on this issue, and not just close it. 🙏
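
To make the PostgreSQL example concrete, here is a sketch of the failing shape (secret name, image, mount path, and UID are placeholders): the key files are created owned by root, so with defaultMode 0600 only root can read them and the non-root client cannot open the key, while a more permissive mode is rejected by the client library, which demands an owner-only key file.

spec:
  securityContext:
    runAsUser: 999                 # non-root client user (placeholder)
  containers:
  - name: client
    image: some-postgres-client    # placeholder
    volumeMounts:
    - name: client-cert
      mountPath: /etc/pg-certs
      readOnly: true
  volumes:
  - name: client-cert
    secret:
      secretName: pg-client-cert   # placeholder
      defaultMode: 0600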

@rezroo


rezroo commented Jun 28, 2018

I'm trying to mount an ssh key into a user's .ssh directory with defaultMode 0400 so the application can ssh without a password. But that doesn't work if the secret is mounted as owned by root. Can you explain again how this can be solved using fsGroup or some other such mechanism?
I don't see a solution if PodSecurityPolicy is enabled so that applications cannot run as root. Please advise.

@thockin


Member

thockin commented Jul 2, 2018

I am still hopelessly confused about this bug. There seem to be about six things being reported that all fail the same way but are different for different reasons:

  • nginx drops supplemental groups
  • ssh/postgres demands a particular mode for keys (and does not accept group-read)
  • something about running as root?

Can someone explain, top to bottom, the issue (or issues) in a way that I can follow without having to re-read the whole thread?

Keep in mind that Volumes are defined as a Pod-scope construct, and 2 different containers may run as 2 different UIDs. Using group perms is ideal for this, but if it is really not meeting needs, then let's fix it. But I need to understand it first.

@saad-ali for your radar

@rezroo


rezroo commented Jul 6, 2018

@thockin My use-case is very simple. I'm injecting a secret (ssh key) into a container that is not running as root. The ssh key in /home/username/.ssh must have 400 permissions, which I can do, but it must also be owned by the UID, or it won't work. I don't want to give this pod any root privilege of any sort, so an init container that modifies the UID of the file does not work for me. How do I do it, other than including the ssh key in the image?

@thockin

Member

thockin commented Jul 6, 2018

@tallclair


Member

tallclair commented Jul 6, 2018

@vikaschoudhary16 @derekwaynecarr this has some overlap / implications for user-namespace mapping.

@pearj


pearj commented Jul 7, 2018

@rezroo a workaround could be to simply make a copy of the ssh key in an init container; that way you'll be able to control who owns the file, right? Provided the init container runs as the same user that needs to read the ssh key later. It's a little gross, but it "should" work, I think.
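
A sketch of that copy-based workaround (image, paths, and UID are placeholders): the secret is mounted read-only at a staging path, and an init container running as the same non-root user as the application copies the key into an emptyDir and tightens its mode, so the copy ends up owned by that user.

spec:
  securityContext:
    runAsUser: 1000                # same non-root user for the init and app containers
  initContainers:
  - name: copy-ssh-key
    image: busybox                 # placeholder image
    command: ["sh", "-c", "cp /staging/id_rsa /home/user/.ssh/id_rsa && chmod 0400 /home/user/.ssh/id_rsa"]
    volumeMounts:
    - name: ssh-key-secret
      mountPath: /staging
      readOnly: true
    - name: ssh-dir
      mountPath: /home/user/.ssh
  containers:
  - name: app
    image: some-app-image          # placeholder image
    volumeMounts:
    - name: ssh-dir
      mountPath: /home/user/.ssh
  volumes:
  - name: ssh-key-secret
    secret:
      secretName: ssh-key          # placeholder; its default mode must leave the key readable by the init container's user
  - name: ssh-dir
    emptyDir: {}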

@mattthelee


mattthelee commented Jul 11, 2018

@thockin another use-case: I'm trying to run an ELK StatefulSet. The pod has an Elasticsearch container running as non-root. I'm using a volumeClaimTemplate to hold the Elasticsearch data. The container is unable to write to the volume, though, as it is not running as root. K8s v1.9. The pod has multiple containers and I don't want to use the same fsGroup for all of them.
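
The usual shape of a fix for this Elasticsearch case is a pod-level fsGroup on the StatefulSet template, sketched below with placeholder names and the UID/GID 1000 that stock Elasticsearch images conventionally use; note that, as raised above, fsGroup applies to the whole pod rather than per container.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: elasticsearch
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      securityContext:
        fsGroup: 1000                     # the data volume becomes writable by GID 1000
      containers:
      - name: elasticsearch
        image: some-elasticsearch-image   # placeholder
        securityContext:
          runAsUser: 1000                 # assumed non-root UID of the image
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi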

@mheese


mheese commented Jul 19, 2018

@pearj that's exactly the workaround that everybody uses ... and, as the name says, it's a workaround and should get addressed :) ... However, there is also a problem with this workaround: updated secrets do eventually get updated in mounted volumes, which makes it possible to react to a file change in the running pod; you miss out on those updates when you copy the file from an init container.

@rezroo


rezroo commented Jul 21, 2018

@pearj @mheese This workaround wouldn't work for me anyway, because our PodSecurityPolicy doesn't allow containers to run as root - normal or init containers, it doesn't matter - and no one can access a secret owned by root as far as I can tell.

@RobertKrawitz


Contributor

RobertKrawitz commented Aug 7, 2018

Yet another use case for this: I'm working on using XFS quotas (obviously, only if XFS is in use) for ephemeral storage. The current enforcement mechanism for ephemeral storage is to run du periodically; in addition to being slow and rather coarse-grained, it can be faked out completely (create a file, keep a file descriptor open on it, and delete it). I intend to use quotas for two purposes:

  1. Hard cap usage across all containers of a pod.

  2. Retrieve the per-volume storage consumption without having to run du (which can bog down).

I can't use one quota for both purposes. The hard cap applies to all emptydir volumes, the writable layer, and logs, but a quota used for that purpose can't be used to retrieve storage used for each volume. So what I'd like to do is use project quotas in a non-enforcing way to retrieve per-volume storage consumption and either user or group quotas to implement the hard cap. To do that requires that each pod have a unique UID or single unique GID (probably a unique UID would be best, since there may be reasons why a pod needs to be in multiple groups).

(As regards group and project IDs being documented as mutually exclusive with XFS, that is in fact no longer the case, as I've verified. I've asked some XFS people about it, and they confirmed that the documentation is out of date and needs to be fixed; this restriction was lifted about 5 years ago.)

@ksemaev


ksemaev commented Aug 9, 2018

@robbyt please tell me how you managed to chown with postStart? My container runs as a non-root user, so the postStart hook still runs with non-root permissions and can't change ownership:

chown: /home/user/: Operation not permitted
, message: "chown: /home/user/: Permission denied\nchown: /home/user/: Operation not permitted

@Cobra1978


Cobra1978 commented Aug 30, 2018

Same problem here: we have some Dockerized Tomcat instances that run our web applications, and we use JMX to monitor them. We want to serve the jmxremote user and password files as secrets, but Tomcat, which obviously doesn't run as root, wants the JMX files to be readable only by the user that runs Tomcat.

Addendum: we have many Tomcats, and we want to run each of them as a different user.

@ludwikbukowski


ludwikbukowski commented Sep 5, 2018

the same problem!
