
[BUG] Longhorn 1.2.0 - wrong volume permissions inside container / broken fsGroup #2964

Closed
bkupidura opened this issue Sep 1, 2021 · 20 comments
Assignees: PhanLe1010
Labels: backport/1.2.1, kind/bug, kind/regression, priority/0, require/auto-e2e-test, require/doc
Milestone: v1.2.1

@bkupidura

bkupidura commented Sep 1, 2021

Describe the bug
After upgrading Longhorn to 1.2.0, some containers are unable to start correctly (e.g. Prometheus).

The root cause appears to be wrong permissions on the Longhorn volume inside the container when the container is not running as root.

Even with fsGroup specified, the permissions are not applied to the volume.

To Reproduce

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: broken-longhorn
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: broken-longhorn
  namespace: default
  labels:
    app.kubernetes.io/name: broken-longhorn
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app.kubernetes.io/name: broken-longhorn
  template:
    metadata:
      labels:
        app.kubernetes.io/name: broken-longhorn
    spec:
      containers:
        - name: broken-longhorn
          image: ubuntu:focal-20210723
          command:
          - "/bin/sh"
          - "-ec"
          - |
            tail -f /dev/null
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /data
              name: data
      securityContext:
        runAsUser: 65534
        runAsNonRoot: true
        runAsGroup: 65534
        fsGroup: 65534
        fsGroupChangePolicy: Always
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: broken-longhorn

% kubectl get pvc broken-longhorn -o yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"broken-longhorn","namespace":"default"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}},"storageClassName":"longhorn"}}
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: driver.longhorn.io
  creationTimestamp: "2021-09-01T07:48:46Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: broken-longhorn
  namespace: default
  resourceVersion: "25249959"
  uid: 9ab5ca66-0794-4ad2-8aa5-73e96fa603fc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: longhorn
  volumeMode: Filesystem
  volumeName: pvc-9ab5ca66-0794-4ad2-8aa5-73e96fa603fc
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  phase: Bound

% kubectl get pv pvc-9ab5ca66-0794-4ad2-8aa5-73e96fa603fc -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: driver.longhorn.io
  creationTimestamp: "2021-09-01T07:48:48Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/driver-longhorn-io
  name: pvc-9ab5ca66-0794-4ad2-8aa5-73e96fa603fc
  resourceVersion: "25250154"
  uid: d4db7690-4763-45bf-a5c7-9b7ae0b5d584
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: broken-longhorn
    namespace: default
    resourceVersion: "25249908"
    uid: 9ab5ca66-0794-4ad2-8aa5-73e96fa603fc
  csi:
    driver: driver.longhorn.io
    volumeAttributes:
      fromBackup: ""
      numberOfReplicas: "3"
      staleReplicaTimeout: "30"
      storage.kubernetes.io/csiProvisionerIdentity: 1630473362980-8081-driver.longhorn.io
    volumeHandle: pvc-9ab5ca66-0794-4ad2-8aa5-73e96fa603fc
  persistentVolumeReclaimPolicy: Delete
  storageClassName: longhorn
  volumeMode: Filesystem
status:
  phase: Bound

% kubectl get csidriver driver.longhorn.io -o yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  annotations:
    driver.longhorn.io/kubernetes-version: v1.20.7+k3s1
    driver.longhorn.io/version: v1.2.0
  creationTimestamp: "2021-08-31T17:47:21Z"
  name: driver.longhorn.io
  resourceVersion: "24953648"
  uid: 274cd12a-6aca-47a9-bfd8-32261eb5033a
spec:
  attachRequired: true
  fsGroupPolicy: ReadWriteOnceWithFSType
  podInfoOnMount: true
  volumeLifecycleModes:
  - Persistent

$ kubectl exec -t -i broken-longhorn-c4ccbbb6f-79djg -- bash
nobody@broken-longhorn-c4ccbbb6f-79djg:/$ ls -la /data/
total 24
drwxr-xr-x 3 root root  4096 Sep  1 07:49 .
drwxr-xr-x 1 root root  4096 Sep  1 07:49 ..
drwx------ 2 root root 16384 Sep  1 07:49 lost+found
nobody@broken-longhorn-c4ccbbb6f-79djg:/$ touch /data/test
touch: cannot touch '/data/test': Permission denied

Expected behavior
When fsGroup is provided, it should be used to chown the destination mount.
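
For reference, with fsGroup: 65534 applied one would roughly expect the mount point to be group-owned by gid 65534 (nogroup in the ubuntu:focal image) with the setgid bit set, so that the non-root user can write. A sketch of the expected output (not captured from an actual run, exact modes may differ):

nobody@broken-longhorn-c4ccbbb6f-79djg:/$ ls -la /data/
total 24
drwxrwsr-x 3 root nogroup  4096 Sep  1 07:49 .
drwxr-xr-x 1 root root     4096 Sep  1 07:49 ..
drwxrws--- 2 root nogroup 16384 Sep  1 07:49 lost+found
nobody@broken-longhorn-c4ccbbb6f-79djg:/$ touch /data/test
nobody@broken-longhorn-c4ccbbb6f-79djg:/$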

Environment:

  • Longhorn version: 1.2.0
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): helm
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: k3os/k3s
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal): baremetal

Additional context

% kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:52:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.7+k3s1", GitCommit:"aa768cbdabdb44c95c5c1d9562ea7f5ded073bc0", GitTreeState:"clean", BuildDate:"2021-05-20T01:07:13Z", GoVersion:"go1.15.12", Compiler:"gc", Platform:"linux/amd64"}
@HubbeKing

I can confirm this happens for automatically provisioned volumes as well (i.e. via StatefulSet volumeClaimTemplates).

[hubbe@ma3a ~]$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.4", GitCommit:"3cce4a82b44f032d0cd1a1790e6d2f5a55d20aae", GitTreeState:"clean", BuildDate:"2021-08-11T18:10:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}

statefulset example:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pvc-ownership-test
spec:
  selector:
    matchLabels:
      app: pvc-ownership-test
  serviceName: pvc-ownership-test
  template:
    metadata:
      labels:
        app: pvc-ownership-test
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
      containers:
        - image: docker.io/library/busybox:latest
          name: pvc-ownership-test
          command: ["/bin/sh"]
          args: ["-c", "sleep 6000000"]
          volumeMounts:
            - name: test
              mountPath: /test
  volumeClaimTemplates:
    - metadata:
        name: test
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
        storageClassName: longhorn
[hubbe@ma3a ~]$ kubectl exec -it pvc-ownership-test-0 -- sh

/ $ id
uid=1000 gid=1000 groups=1000
/ $ ls -ahl /test
total 24K    
drwxr-xr-x    3 root     root        4.0K Sep  1 09:38 .
drwxr-xr-x    1 root     root        4.0K Sep  1 09:38 ..
drwx------    2 root     root       16.0K Sep  1 09:38 lost+found

@mstrent

mstrent commented Sep 1, 2021

I seem to be hitting this as well.

Verified using the repro examples above. My workloads that were working fine yesterday on Longhorn 1.1 are now throwing permission denied on their Longhorn volumes with 1.2.

@bdobsonca

Has anyone found a workaround for this issue, or can we expect an emergency patch? This seems to be a showstopper for me and is currently blocking us. Every pod deployed now gives a message similar to:

Memory limits: min=256m, max=512m 
/opt/startZk.sh: line 71: /opt/zookeeper/data/myid: Permission denied 

@mstrent

mstrent commented Sep 1, 2021

Agreed, this is a showstopper. I had to roll back to 1.1.x. Fortunately this was only a dev environment and it's no biggie to blow away the storage.

@PhanLe1010
Contributor

Thanks, guys! We are investigating this issue

@PhanLe1010 PhanLe1010 self-assigned this Sep 1, 2021
@yasker yasker added this to the v1.2.1 milestone Sep 1, 2021
@yasker yasker added kind/regression Regression which has worked before priority/0 Must be implement or fixed in this release (managed by PO) labels Sep 1, 2021
@yasker yasker changed the title [BUG] Longhorn 1.20 - wrong volume permissions inside container / broken fsGroup [BUG] Longhorn 1.2.0 - wrong volume permissions inside container / broken fsGroup Sep 1, 2021
@PhanLe1010
Contributor

@bkupidura @HubbeKing @mstrent @bdobsonca

Were the problematic volumes newly created after upgrading to Longhorn v1.2.0, or did they already exist from Longhorn v1.1.2?

@PhanLe1010
Contributor

PhanLe1010 commented Sep 2, 2021

Our QA, @khushboo-rancher, confirmed that this problem only happens with newly created volumes.

@PhanLe1010
Contributor

Workaround: manually add the flag --default-fstype=ext4 to the csi-provisioner deployment in the longhorn-system namespace. It should look like this:

...
      containers:
      - args:
        - --v=2
        - --csi-address=$(ADDRESS)
        - --timeout=1m50s
        - --leader-election
        - --leader-election-namespace=$(POD_NAMESPACE)
        - --default-fstype=ext4
        env:
 ...
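
If you prefer to apply the workaround with a single command, one possible way is a JSON patch (a sketch, assuming csi-provisioner is the first container in the deployment, as in the manifest posted later in this thread):

kubectl -n longhorn-system patch deployment csi-provisioner --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--default-fstype=ext4"}]'

Since the deployment carries the longhorn.io/managed-by: longhorn-manager label, a manual edit may not survive a longhorn-manager redeploy, so re-check the args after upgrades.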

Root cause:

  • The field fsType is missing in the PVs created by Longhorn v1.2.0. Because the CSI driver uses fsGroupPolicy: ReadWriteOnceWithFSType (see the CSIDriver object above), Kubernetes only changes the volume ownership and permissions for fsGroup when the PV specifies a filesystem type, so the empty fsType prevents the chown (link). A sketch of the corrected PV fragment follows this list.
  • The missing fsType in PVs appears to come from the csi-provisioner. In the old csi-provisioner (v1.6.0), defaultFSType was hard-coded to ext4 (link). In the new csi-provisioner (v2.1.2), it is empty by default (link).
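
For comparison, a PV provisioned after the fix (or with the workaround above) should carry a non-empty fsType in its csi block. A sketch of the relevant fragment, with the remaining values taken from the PV dump earlier in this issue:

  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeAttributes:
      numberOfReplicas: "3"
      staleReplicaTimeout: "30"
    volumeHandle: pvc-9ab5ca66-0794-4ad2-8aa5-73e96fa603fc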

@PhanLe1010 PhanLe1010 added the require/auto-e2e-test Require adding/updating auto e2e test cases if they can be automated label Sep 2, 2021
@innobead innobead added backport/1.2.1 Require to backport to 1.2.1 release branch backport/1.1.3 Require to backport to 1.1.3 release branch and removed backport/1.1.3 Require to backport to 1.1.3 release branch labels Sep 2, 2021
@longhorn-io-github-bot

longhorn-io-github-bot commented Sep 2, 2021

Pre Ready-For-Testing Checklist

@PhanLe1010
Contributor

Test steps:

  1. Install/upgrade Longhorn using kubectl/helm.
  2. Deploy the StatefulSet from [BUG] Longhorn 1.2.0 - wrong volume permissions inside container / broken fsGroup #2964 (comment).
  3. Exec into the pod and verify that /test has the correct permissions and that new files can be read and written in /test (see the check sketched below).
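
A possible manual check for step 3 (a sketch, assuming the pod from the StatefulSet example above is named pvc-ownership-test-0):

kubectl exec -it pvc-ownership-test-0 -- sh -c 'id; stat -c "%U:%G %a" /test; touch /test/write-check && echo write OK'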

@khushboo-rancher
Contributor

Verified with Longhorn master and v1.2.x-head on 09/08/2021.

Validation - Pass

@mstrent

mstrent commented Sep 9, 2021

Is pointing people to a workaround enough if 1.2.1 is still weeks away? This is a fatal enough flaw that I'd think 1.2 should either be pulled or re-released with the fix.

@yasker
Member

yasker commented Sep 10, 2021

@mstrent While this issue indeed has a high impact:

  1. The workaround is non-intrusive and should solve the problem immediately, without side effects.
  2. We're accelerating the release of v1.2.1 to less than 2 weeks from now (09/24) to fix this issue and a few other issues we've found after the v1.2.0 release.

Retagging or re-releasing a version is generally a bad idea: there is no way to upgrade from and to the same version, and it wouldn't help any existing users who have already hit the bug. It would only mix things up if we chose to do that.

Sorry for the inconvenience. v1.2.1 will be there soon.

@xeor

xeor commented Sep 11, 2021

Retagging is usually a no-go, but you can still release 1.2.1 now and go with 1.2.2 for whatever is "planned" for 1.2.1.
This is also fixed in the Helm package; what about at least upgrading that? The version of the Helm chart and the app version are not the same.

My point is that you are a storage provider, and you should be able to ship quick bug fixes like this. 1.2.x is for patch releases; they shouldn't be planned for, just release!

@samip5

samip5 commented Sep 15, 2021

This bug still exists even with the workaround in #2964 (comment) or a similar one.

LAST SEEN   TYPE      REASON        OBJECT                     MESSAGE
15m         Normal    error         helmrelease/influxdb       reconciliation failed: upgrade retries exhausted
6m43s       Warning   FailedMount   pod/influxdb-influxdb2-0   Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data kube-api-access-hbpq2]: timed out waiting for the condition
2m25s       Warning   FailedMount   pod/influxdb-influxdb2-0   MountVolume.MountDevice failed for volume "pvc-fb7df8b3-0090-4fdf-b74f-309fd8056563" : rpc error: code = Internal desc = format of disk "/dev/longhorn/pvc-fb7df8b3-0090-4fdf-b74f-309fd8056563" failed: type:("ext4") target:("/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-fb7df8b3-0090-4fdf-b74f-309fd8056563/globalmount") options:("defaults") errcode:(exit status 1) output:(mke2fs 1.45.5 (07-Jan-2020)
/dev/longhorn/pvc-fb7df8b3-0090-4fdf-b74f-309fd8056563 is apparently in use by the system; will not make a filesystem here!
)

deployment for csi-provisioner:

$ k get deployment/csi-provisioner -n longhorn-system -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
    driver.longhorn.io/kubernetes-version: v1.21.4+k3s1
    driver.longhorn.io/version: v1.2.0
    longhorn.io/last-applied-tolerations: '[]'
  creationTimestamp: "2021-09-15T00:42:04Z"
  generation: 2
  labels:
    app: csi-provisioner
    longhorn.io/managed-by: longhorn-manager
  name: csi-provisioner
  namespace: longhorn-system
  resourceVersion: "3983839"
  uid: c12397e0-415d-4061-805b-ff49808c602c
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: csi-provisioner
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: csi-provisioner
    spec:
      containers:
      - args:
        - --v=2
        - --csi-address=$(ADDRESS)
        - --timeout=1m50s
        - --leader-election
        - --leader-election-namespace=$(POD_NAMESPACE)
        - --default-fstype=ext4
        env:
        - name: ADDRESS
          value: /csi/csi.sock
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: k8s.gcr.io/sig-storage/csi-provisioner:v2.1.2
        imagePullPolicy: IfNotPresent
        name: csi-provisioner
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /csi/
          name: socket-dir
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: longhorn-service-account
      serviceAccountName: longhorn-service-account
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /var/lib/kubelet/plugins/driver.longhorn.io
          type: DirectoryOrCreate
        name: socket-dir
status:
  availableReplicas: 3
  conditions:
  - lastTransitionTime: "2021-09-15T00:42:45Z"
    lastUpdateTime: "2021-09-15T00:42:45Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-09-15T00:42:04Z"
    lastUpdateTime: "2021-09-15T00:58:38Z"
    message: ReplicaSet "csi-provisioner-669c8cc698" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 2
  readyReplicas: 3
  replicas: 3
  updatedReplicas: 3

@PhanLe1010
Contributor

/dev/longhorn/pvc-fb7df8b3-0090-4fdf-b74f-309fd8056563 is apparently in use by the system

@samip5 I believe you have a different problem. Something is hijacking the Longhorn block device. There are some debugging steps here #2983

@samip5

samip5 commented Sep 15, 2021

/dev/longhorn/pvc-fb7df8b3-0090-4fdf-b74f-309fd8056563 is apparently in use by the system

@samip5 I believe you have a different problem. Something is hijacking the Longhorn block device. There are some debugging steps here #2983

#1210 (comment):
Those instructions are not applicable, as there is no major:minor number before the device name? Oh, my bad.

Culprit: multipathd
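
For anyone else hitting the same multipathd conflict, one possible mitigation (a sketch; verify it against your own device naming and any legitimate multipath devices before applying) is to blacklist the local sd* devices that back Longhorn replicas in /etc/multipath.conf and restart multipathd:

blacklist {
    devnode "^sd[a-z0-9]+"
}

systemctl restart multipathd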

@GeroL

GeroL commented Sep 17, 2021

I also ran into this bug. I tried to install HashiCorp Vault through a Helm chart; that chart does not allow custom fsGroup settings, and I can see that the main directory is not owned by the vault user.
I also tried setting the fsType through the StorageClass, but it did not help.

@PhanLe1010
Contributor

@GeroL
Did you try the workaround #2964 (comment)?

@mgcrea

mgcrea commented Sep 20, 2021

In my opinion, if you can't/won't ship a quick patch release, 1.2.0 should be pulled, as it's broken in a non-obvious way. In my case I thought there was some kind of issue with the GitLab Helm chart and literally wasted hours on this; I'm afraid I won't be alone. Looking forward to upgrading to 1.2.1.
