
idempotent unmount from NodeUnstageVolume / NodeUnpublishVolume #1605

Merged
merged 1 commit into kubernetes-sigs:master on May 18, 2023

Conversation

@dobsonj (Member) commented May 16, 2023

Is this a bug fix or adding new feature?

bug fix

What is this PR about? / Why do we need it?

We observed an issue where a workload mounting multiple EBS volumes gets stuck in Terminating because unmount fails on the node. It's racy and difficult to reproduce, but the kubelet logs show the unmount succeeding once, and about 0.1s later Unmounter.TearDownAt starts failing with exit status 32 (umount's generic failure status). The pod is then left in the Terminating state until kubelet is restarted.

There was some work to make mounts idempotent in #1019, but it did not address the unmount path. Other CSI drivers use mountutils.CleanupMountPoint, which checks whether the path exists, whether the mount is corrupted, and whether the mount still exists before calling Unmount. This PR simply changes the Unmount calls to use CleanupMountPoint instead.
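For illustration, here is a minimal sketch of the change, assuming a mounter that implements the k8s.io/mount-utils Interface (the helper name and error wrapping are placeholders, not the driver's exact code):

import (
	"fmt"

	mountutils "k8s.io/mount-utils"
)

// unmountTarget is a hypothetical helper showing the shape of the fix:
// CleanupMountPoint verifies the path exists and is really a mount point
// before unmounting, so a repeated call on an already-unmounted or
// already-removed target becomes a no-op instead of an "exit status 32"
// failure from a raw Unmount.
func unmountTarget(mounter mountutils.Interface, target string) error {
	// The final argument enables the more extensive mount-point check.
	if err := mountutils.CleanupMountPoint(target, mounter, true); err != nil {
		return fmt.Errorf("could not unmount %q: %w", target, err)
	}
	return nil
}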

What testing is done?

Since this is not consistently reproducible, I resorted to manually running umount and rmdir on the node before deleting the pod, which left the pod in the Terminating state until kubelet was restarted.

Before this change:

$ cat test_pod_pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim1
  namespace: testns
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
  namespace: testns
  labels:
    app: ubuntu
spec:
  volumes:
  - name: claim1-vol
    persistentVolumeClaim:
      claimName: claim1
  containers:
  - image: ubuntu
    command:
      - "sleep"
      - "604800"
    imagePullPolicy: IfNotPresent
    name: ubuntu
    volumeMounts:
      - mountPath: "/mnt/claim1"
        name: claim1-vol
    securityContext:
      allowPrivilegeEscalation: true
  restartPolicy: Always

$ oc apply -f test_pod_pvc.yaml
persistentvolumeclaim/claim1 created
pod/ubuntu created

$ oc get pods -o wide
NAME     READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
ubuntu   1/1     Running   0          115s   10.131.0.25   ip-10-0-216-169.us-east-2.compute.internal   <none>           <none>

$ oc debug node/ip-10-0-216-169.us-east-2.compute.internal
Starting pod/ip-10-0-216-169us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.216.169
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# mount | grep csi
/dev/nvme1n1 on /var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/64739da0c8f20ecdb049190c95e2587609871a2c69e89d5e2a0627bcb9b47cb6/globalmount type ext4 (rw,relatime,seclabel)
/dev/nvme1n1 on /var/lib/kubelet/pods/5ccf5697-f495-494f-9b5b-44eb3631266a/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount type ext4 (rw,relatime,seclabel)
sh-5.1# umount /var/lib/kubelet/pods/5ccf5697-f495-494f-9b5b-44eb3631266a/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount && rmdir /var/lib/kubelet/pods/5ccf5697-f495-494f-9b5b-44eb3631266a/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount

$ oc delete pod/ubuntu
pod "ubuntu" deleted
^C
$ oc get pods
NAME     READY   STATUS        RESTARTS   AGE
ubuntu   0/1     Terminating   0          46m

sh-5.1# journalctl -u kubelet | less
...
May 16 18:02:23 ip-10-0-216-169 kubenswrapper[1353]: E0516 18:02:23.165860    1353 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/ebs.csi.aws.com^vol-0fcba08aaa4ed58ee podName:5ccf5697-f495-494f-9b5b-44eb3631266a nodeName:}" failed. No retries permitted until 2023-05-16 18:02:24.165847203 +0000 UTC m=+5151.497100820 (durationBeforeRetry 1s). Error: UnmountVolume.TearDown failed for volume "claim1-vol" (UniqueName: "kubernetes.io/csi/ebs.csi.aws.com^vol-0fcba08aaa4ed58ee") pod "5ccf5697-f495-494f-9b5b-44eb3631266a" (UID: "5ccf5697-f495-494f-9b5b-44eb3631266a") : kubernetes.io/csi: Unmounter.TearDownAt failed: rpc error: code = Internal desc = Could not unmount "/var/lib/kubelet/pods/5ccf5697-f495-494f-9b5b-44eb3631266a/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount": unmount failed: exit status 32
...
sh-5.1# systemctl restart kubelet

$ oc get pods
No resources found in testns namespace.

After this change:

$ oc apply -f test_pod_pvc.yaml
persistentvolumeclaim/claim1 unchanged
pod/ubuntu created

$ oc get pods -o wide
NAME     READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
ubuntu   1/1     Running   0          99s   10.131.0.31   ip-10-0-216-169.us-east-2.compute.internal   <none>           <none>

$ oc debug node/ip-10-0-216-169.us-east-2.compute.internal
Starting pod/ip-10-0-216-169us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.216.169
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# mount | grep csi
/dev/nvme1n1 on /var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/64739da0c8f20ecdb049190c95e2587609871a2c69e89d5e2a0627bcb9b47cb6/globalmount type ext4 (rw,relatime,seclabel)
/dev/nvme1n2 on /var/lib/kubelet/plugins/kubernetes.io/csi/ebs.csi.aws.com/64739da0c8f20ecdb049190c95e2587609871a2c69e89d5e2a0627bcb9b47cb6/globalmount type ext4 (rw,relatime,seclabel)
/dev/nvme1n2 on /var/lib/kubelet/pods/e1308dfd-6d95-4b4e-8f90-09120dee0a0d/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount type ext4 (rw,relatime,seclabel)
sh-5.1# umount /var/lib/kubelet/pods/e1308dfd-6d95-4b4e-8f90-09120dee0a0d/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount && rmdir /var/lib/kubelet/pods/e1308dfd-6d95-4b4e-8f90-09120dee0a0d/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount

$ oc delete pod/ubuntu
pod "ubuntu" deleted
$ oc get pods
No resources found in testns namespace.

/cc @wongma7 @gnufied @jsafrane

Fix unmount idempotency

/kind bug
/sig storage
/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage. triage/accepted Indicates an issue or PR is ready to be actively worked on. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 16, 2023
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels May 16, 2023
@dobsonj (Member, Author) commented May 16, 2023

Ah, #1597 just merged yesterday. It helps when the returned error contains 'not mounted', but not when unmount fails with exit status 32 because the mount directory does not exist.
mountutils.CleanupMountPoint should address both cases. It's consistent with other drivers and, IMO, more reliable than checking for a specific error string.
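For context, a condensed sketch of the flow CleanupMountPoint implements, as I read k8s.io/mount-utils (simplified, not the library's verbatim source):

import (
	"os"

	mountutils "k8s.io/mount-utils"
)

// cleanupSketch condenses the CleanupMountPoint logic: a missing path or an
// already-unmounted target is treated as success, and a corrupted mount
// still gets an Unmount attempt, which covers both failure modes above.
func cleanupSketch(target string, mounter mountutils.Interface) error {
	if _, err := os.Stat(target); os.IsNotExist(err) {
		return nil // directory already removed: nothing to do
	}
	notMnt, err := mounter.IsLikelyNotMountPoint(target)
	if err != nil && !mountutils.IsCorruptedMnt(err) {
		return err
	}
	if err == nil && notMnt {
		return os.Remove(target) // already unmounted: just remove the dir
	}
	// Still mounted (or a corrupted mount): unmount, then remove the dir.
	if unmountErr := mounter.Unmount(target); unmountErr != nil {
		return unmountErr
	}
	return os.Remove(target)
}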
/cc @ConnorJC3 @torredil @hanyuel for reviews, since it touches the same code

@dobsonj (Member, Author) commented May 17, 2023

I had some discussion with @torredil and ran some extra tests with readOnlyRootFilesystem enabled to make sure that CleanupMountPoint handles that case correctly.

$ oc get daemonset -n openshift-cluster-csi-drivers aws-ebs-csi-driver-node -o yaml | grep readOnlyRootFilesystem
          readOnlyRootFilesystem: true
1. Simple create / delete:
$ oc apply -f test_pod_pvc.yaml
persistentvolumeclaim/claim1 unchanged
pod/ubuntu created

$ oc get pods
NAME     READY   STATUS    RESTARTS   AGE
ubuntu   1/1     Running   0          19s
$ oc delete pod/ubuntu
pod "ubuntu" deleted
$ oc get pods
No resources found in testns namespace.
2. Unmounting and removing the directory before deleting the pod:
$ oc apply -f test_pod_pvc.yaml
persistentvolumeclaim/claim1 unchanged
pod/ubuntu created

$ oc get pods -o wide
NAME     READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
ubuntu   1/1     Running   0          48s   10.131.0.52   ip-10-0-216-169.us-east-2.compute.internal   <none>           <none>

$ oc debug node/ip-10-0-216-169.us-east-2.compute.internal
Starting pod/ip-10-0-216-169us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.216.169
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# mount | grep csi | grep \/mount
/dev/nvme1n1 on /var/lib/kubelet/pods/5ec4cb82-5617-4e19-ad5e-faf2641da175/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount type ext4 (rw,relatime,seclabel)
sh-5.1# umount /var/lib/kubelet/pods/5ec4cb82-5617-4e19-ad5e-faf2641da175/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount && rmdir /var/lib/kubelet/pods/5ec4cb82-5617-4e19-ad5e-faf2641da175/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount

$ oc delete pod/ubuntu
pod "ubuntu" deleted
$ oc get pods
No resources found in testns namespace.
3. Just unmounting (but not calling rmdir) before deleting the pod:
$ oc apply -f test_pod_pvc.yaml
persistentvolumeclaim/claim1 unchanged
pod/ubuntu created

$ oc get pods -o wide
NAME     READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
ubuntu   1/1     Running   0          48s   10.131.0.53   ip-10-0-216-169.us-east-2.compute.internal   <none>           <none>

$ oc debug node/ip-10-0-216-169.us-east-2.compute.internal
Starting pod/ip-10-0-216-169us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.216.169
If you don't see a command prompt, try pressing enter.
sh-4.4# mount | grep csi | grep \/mount
/dev/nvme1n1 on /host/var/lib/kubelet/pods/0747dc0f-a384-48b4-8492-7ae8a0ca8c7d/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount type ext4 (rw,relatime,seclabel)
sh-4.4# umount /host/var/lib/kubelet/pods/0747dc0f-a384-48b4-8492-7ae8a0ca8c7d/volumes/kubernetes.io~csi/pvc-69ce02b2-62a2-40b6-82b4-fc4ea9bb81a7/mount

$ oc delete pod/ubuntu
pod "ubuntu" deleted
$ oc get pods
No resources found in testns namespace.

@torredil do you think I'm missing anything? If you have any concerns about removing the explicit 'not mounted' error check, then I could reduce this patch to a 1-liner to just replace the m.Unmount call with mountutils.CleanupMountPoint.

@torredil (Member) left a comment

Thanks, this lgtm as written; will leave it open for more feedback.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 17, 2023
@torredil (Member)

/test pull-aws-ebs-csi-driver-test-e2e-external-eks-windows

@ConnorJC3 (Contributor)

> If you have any concerns about removing the explicit 'not mounted' error check, then I could reduce this patch to a 1-liner to just replace the m.Unmount call with mountutils.CleanupMountPoint.

I think we should consider doing this. I'm not sure I'm 100% convinced that CleanupMountPoint always handles this correctly in our case (especially with weird racy TOCTOU problems), and performing the check is practically free and guarantees that particularly nasty bug never re-emerges.

Otherwise, this lgtm as written. For what it's worth, I think #1597 should fix 95+% of real-world cases of this type of failure (it was released in Monday's 1.19.0 release), so I would recommend upgrading to 1.19.0 in the meantime to fix the problem outlined in the issue description.

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 17, 2023
@dobsonj (Member, Author) commented May 17, 2023

> If you have any concerns about removing the explicit 'not mounted' error check, then I could reduce this patch to a 1-liner to just replace the m.Unmount call with mountutils.CleanupMountPoint.

> I think we should consider doing this. I'm not sure I'm 100% convinced that CleanupMountPoint always handles this correctly in our case (especially with weird racy TOCTOU problems), and performing the check is practically free and guarantees that particularly nasty bug never re-emerges.

Thanks a lot for the feedback. I made this change and re-tested the original scenario from my PR description; it works as expected.
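For reference, one plausible shape of the reduced change, assuming the #1597 guard is the 'not mounted' error-string check mentioned above (function and parameter names are placeholders, not the driver's exact code):

import (
	"strings"

	mountutils "k8s.io/mount-utils"
)

// unmountWithGuard keeps the #1597-style guard that treats a "not mounted"
// error as success, and swaps only the raw Unmount call for
// CleanupMountPoint, which also handles a missing mount directory.
func unmountWithGuard(mounter mountutils.Interface, target string) error {
	err := mountutils.CleanupMountPoint(target, mounter, true)
	if err != nil && strings.Contains(err.Error(), "not mounted") {
		return nil // already unmounted: idempotent success
	}
	return err
}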

@torredil (Member)

/test pull-aws-ebs-csi-driver-external-test

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 18, 2023
@ConnorJC3 (Contributor)

/approve

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ConnorJC3

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 18, 2023
@k8s-ci-robot k8s-ci-robot merged commit 9b7b951 into kubernetes-sigs:master May 18, 2023
14 checks passed