
[BUG] RWX volume remains attached after workload deleted if it's upgraded from v1.4.2 #6139

Closed
yangchiu opened this issue Jun 16, 2023 · 24 comments
Assignees
Labels
area/volume-attach-detach Volume attach & detach related kind/bug priority/0 Must be fixed in this release (managed by PO) reproduce/always 100% reproducible
Milestone

Comments

@yangchiu
Member

Describe the bug

The RWX volume remains attached and healthy after the workload is deleted if the volume was created in v1.4.2 and Longhorn was then upgraded to master-head or v1.5.x-head.

Directly creating/deleting a workload using an RWX volume in master-head or v1.5.x-head doesn't have this issue.

To Reproduce

Steps to reproduce the behavior:

  1. Install Longhorn v1.4.2
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.4.2/deploy/longhorn.yaml
  2. Create a statefulset using an RWX volume
# rwx_statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: test-statefulset-rwx
  namespace: default
spec:
  selector:
    matchLabels:
      app: test-statefulset-rwx
  serviceName: test-statefulset-rwx
  replicas: 1
  template:
    metadata:
      labels:
        app: test-statefulset-rwx
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - image: busybox
          imagePullPolicy: IfNotPresent
          name: sleep
          args: ['/bin/sh', '-c', 'while true;do date;sleep 5; done']
          volumeMounts:
            - name: pod-data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: pod-data
      spec:
        accessModes: ['ReadWriteMany']
        storageClassName: 'longhorn'
        resources:
          requests:
            storage: 1Gi

# kubectl apply -f rwx_statefulset.yaml
  3. Upgrade Longhorn to master-head or v1.5.x-head, and upgrade the engine image of the RWX volume
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml
  4. Delete the statefulset
kubectl delete -f rwx_statefulset.yaml
  5. The volume remains attached and healthy after the workload is deleted (the commands below show one way to verify)
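
One way to verify this on the test cluster (a sketch; these are the default resource names for a kubectl install, and the actual PVC/volume names will differ):

# The Longhorn volume backing the PVC stays attached even though the workload is gone
kubectl -n longhorn-system get volumes.longhorn.io
# The Kubernetes-side VolumeAttachment for the PV is removed once the pod is gone
kubectl get volumeattachments.storage.k8s.io
# The RWX share manager pod also keeps running while the volume stays attached
kubectl -n longhorn-system get pods | grep share-manager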

Expected behavior

The RWX volume should be detached after the workload is deleted, just like a volume created directly in master-head or v1.5.x-head.

Log or Support bundle

supportbundle_7c27605c-22df-493e-9e2a-b9135c68e20b_2023-06-16T02-29-21Z.zip

Environment

  • Longhorn version:
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl):
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version:
    • Number of management node in the cluster:
    • Number of worker node in the cluster:
  • Node config
    • OS type and version:
    • CPU per node:
    • Memory per node:
    • Disk type(e.g. SSD/NVMe):
    • Network bandwidth between the nodes:
  • Underlying Infrastructure (e.g. on AWS/GCE, EKS/GKE, VMWare/KVM, Baremetal):
  • Number of Longhorn volumes in the cluster:


@yangchiu yangchiu added kind/bug reproduce/always 100% reproducible labels Jun 16, 2023
@yangchiu yangchiu modified the milestones: v1.5.1, v1.5.0 Jun 16, 2023
@derekbit
Member

derekbit commented Jun 16, 2023

The detachment may need to wait for a while.

cc @PhanLe1010

@innobead innobead added the area/volume-attach-detach Volume attach & detach related label Jun 16, 2023
@innobead innobead added the priority/0 Must be fixed in this release (managed by PO) label Jun 16, 2023
@derekbit
Member

derekbit commented Jun 16, 2023

The statefulset was deleted at Jun 16 03:08:39.
attacherDetacher started detaching the volume at Jun 16 03:10:52.

Then, the upgrade ticket of the RWX volume was deleted, which triggered the volume detachment.

Jun 16 03:08:00 ubuntu2004 k3s[3432]: I0616 03:08:00.076821    3432 handler.go:232] Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager
Jun 16 03:08:39 ubuntu2004 k3s[3432]: I0616 03:08:39.618789    3432 stateful_set.go:458] "StatefulSet has been deleted" key="default/test-statefulset-rwx"
Jun 16 03:08:48 ubuntu2004 systemd[1]: Started Session 9 of user root.
Jun 16 03:09:00 ubuntu2004 k3s[3432]: I0616 03:09:00.076014    3432 handler.go:232] Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager
Jun 16 03:10:00 ubuntu2004 k3s[3432]: I0616 03:10:00.077053    3432 handler.go:232] Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager
Jun 16 03:10:01 ubuntu2004 CRON[9564]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Jun 16 03:10:02 ubuntu2004 systemd[1]: Starting Cleanup of Temporary Directories...
Jun 16 03:10:02 ubuntu2004 systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Jun 16 03:10:02 ubuntu2004 systemd[1]: Finished Cleanup of Temporary Directories.
Jun 16 03:10:52 ubuntu2004 k3s[3432]: I0616 03:10:52.017032    3432 reconciler.go:273] "attacherDetacher.DetachVolume started" volume={AttachedVolume:{VolumeName:kubernetes.io/csi/driver.longhorn.io^pvc-36481246-9464-41f9-b877-0b75353a34f2 VolumeSpec:0xc009b6e630 NodeName:rancher60-worker2 PluginIsAttachable:true DevicePath: DeviceMountPath: PluginName: SELinuxMountContext:} MountedByNode:false DetachRequestedTime:2023-06-16 03:08:49.788385216 +0000 UTC m=+771.166259892}
Jun 16 03:10:52 ubuntu2004 k3s[3432]: I0616 03:10:52.018700    3432 operation_generator.go:1626] Verified volume is safe to detach for volume "pvc-36481246-9464-41f9-b877-0b75353a34f2" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-36481246-9464-41f9-b877-0b75353a34f2") on node "rancher60-worker2"
Jun 16 03:10:54 ubuntu2004 k3s[3432]: I0616 03:10:54.301710    3432 operation_generator.go:516] DetachVolume.Detach succeeded for volume "pvc-36481246-9464-41f9-b877-0b75353a34f2" (UniqueName: "kubernetes.io/csi/driver.longhorn.io^pvc-36481246-9464-41f9-b877-0b75353a34f2") on node "rancher60-worker2"
Jun 16 03:10:58 ubuntu2004 k3s[3432]: time="2023-06-16T03:10:58Z" level=info msg="COMPACT compactRev=1587 targetCompactRev=2587 currentRev=3635"
Jun 16 03:10:58 ubuntu2004 k3s[3432]: time="2023-06-16T03:10:58Z" level=info msg="COMPACT deleted 860 rows from 1000 revisions in 40.747639ms - compacted to 2587/3635"

@innobead innobead assigned derekbit and PhanLe1010 and unassigned PhanLe1010 and derekbit Jun 16, 2023
@yangchiu
Member Author

Just for information: it still remains attached and healthy after more than 3 hours of waiting.

@PhanLe1010
Contributor

PhanLe1010 commented Jun 16, 2023

I see. This case can happen when there is no workload pod on the same node as the share manager pod. When the user upgrades to 1.5.x, we create an upgrade AD ticket for the share manager's node to keep the volume attached there. When the user scales down the workload, we don't clean up that upgrade AD ticket because it is on a different node than the workload pod's node. As a result, no one cleans up the upgrade AD ticket and the volume is stuck there forever.

Still figuring out how to fix this issue.
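
For reference, one way to see the leftover ticket on a live cluster (a sketch; it assumes the Longhorn VolumeAttachment CRD in the longhorn-system namespace, uses the volume name from the logs above, and the exact field names may vary by version):

kubectl -n longhorn-system get volumeattachments.longhorn.io pvc-36481246-9464-41f9-b877-0b75353a34f2 -o yaml
# Look at the attachment tickets (spec.attachmentTickets): the leftover one has the upgrade ticket type
# and points at the share manager pod's node instead of the (already deleted) workload pod's node.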

@PhanLe1010
Contributor

Just for information: it still remains attached and healthy after more than 3 hours of waiting.

@yangchiu Do you have the support bundle after 3 hours?

@derekbit
Member

I see. This case can happen when there is no workload pod on the same node as the share manager pod. When the user upgrades to 1.5.x, we create an upgrade AD ticket for the share manager's node to keep the volume attached there. When the user scales down the workload, we don't clean up that upgrade AD ticket because it is on a different node than the workload pod's node. As a result, no one cleans up the upgrade AD ticket and the volume is stuck there forever.

It sounds like we create a ticket for the node where the share manager pod is running.

Can we create multiple tickets for nodes where workloads are running instead?

@PhanLe1010
Contributor

Can we create multiple tickets for nodes where workloads are running instead?

This is a potential approach. We need to be careful not to fall into race conditions, though. For example, at the time we decide to create a ticket for a running workload, it is running, but by the time we finish creating the ticket the workload has already stopped. Now there will be a leftover ticket that nobody is going to clean up.

@PhanLe1010
Contributor

I am thinking about whether we should use the native Kubernetes VolumeAttachment object to create tickets instead. The logic is:

  • When the Longhorn manager starts the upgrade, it fetches all Kubernetes VolumeAttachment objects (see the command after this list).
  • For each VA that doesn't have a deletion timestamp, create a corresponding Longhorn VA ticket with the type csi-attacher.
  • If the Kubernetes VA gets a deletion timestamp immediately after we create the Longhorn VA, it is OK. All Longhorn managers are down at this moment, so Kubernetes will have to wait for Longhorn to come up first and then issue the ControllerUnpublish call. The Longhorn CSI plugin can then clean up the Longhorn VA created in the upgrade path.
  • There is no Longhorn VA leftover this way.
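
For context, a quick way to list the Kubernetes VolumeAttachment objects this upgrade path would read (sketch only; the custom columns are just for readability):

kubectl get volumeattachments.storage.k8s.io -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached,DELETING:.metadata.deletionTimestamp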

@PhanLe1010
Contributor

Hmm. But we still have a problem with both of the above approaches:

  • If we don't create a ticket for the share-manager's node, the Longhorn VA controller will (possibly) detach the volume from that node before the share-manager controller kicks in and creates the ticket.
  • On the flip side, if we create the ticket for the share-manager's node, no one is responsible for cleaning it up (a.k.a. the issue in this ticket).

@PhanLe1010
Contributor

PhanLe1010 commented Jun 17, 2023

How about this design?

In the upgrade path:

  • When the Longhorn manager starts the upgrade, it fetches all Kubernetes VolumeAttachment objects.
  • For each VA that doesn't have a deletion timestamp, create a corresponding Longhorn VA ticket with the type csi-attacher.
  • If the Kubernetes VA gets a deletion timestamp immediately after we create the Longhorn VA, it is OK. All Longhorn managers are down at this moment, so Kubernetes will have to wait for Longhorn to come up first and then issue the ControllerUnpublish call. The Longhorn CSI plugin can then clean up the Longhorn VA created in the upgrade path.
  • If the volume is RWX and has a csi-attacher ticket -> also create a share-manager-controller ticket. This way the volume is not detached and the share manager pod is not deleted during the upgrade (a sketch of how to check the resulting tickets follows this list).
  • If for some reason we create an unnecessary share-manager-controller ticket, it is OK; the share-manager controller will clean it up later on.
  • There is no Longhorn VA leftover this way, and RWX is safe during the upgrade.
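
If this design lands, a rough way to confirm the expected tickets on an RWX volume right after the upgrade (a sketch; <volume-name> is a placeholder and the ticket field names may differ by version):

kubectl -n longhorn-system get volumeattachments.longhorn.io <volume-name> -o yaml
# Expect one ticket of type csi-attacher (created by the upgrade path) and one of type
# share-manager-controller, and no longhorn-upgrade ticket.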

Limitation:

  1. Volumes that don't have a corresponding Kubernetes workload (i.e., manually attached volumes) will be detached after upgrading. I think this limitation is acceptable.

Another benefit of this design:

  1. We get rid of the longhorn-upgrade ticket type. The logic of the AD controller is cleaner.

WDYT @derekbit @shuo-wu @innobead ?

@PhanLe1010
Contributor

PhanLe1010 commented Jun 17, 2023

Btw, this issue is a side effect of this fix https://github.com/longhorn/longhorn-manager/pull/1993/files. We fixed the detaching issue but introduced the stuck-attachment issue as a side effect.

@derekbit
Member

The design looks good to me.

We get rid of the longhorn-upgrade ticket type. The logic of the AD controller is cleaner.

Agree with the removal. longhorn-upgrade is a bit confusing.

@derekbit
Member

derekbit commented Jun 17, 2023

Btw, this issue is a side effect of this fix https://github.com/longhorn/longhorn-manager/pull/1993/files. We fixed the detaching issue but introduced the stuck-attachment issue as a side effect.

To clarify a bit: it is not a side effect of the detaching fix. The hasActiveWorkload logic had already been removed from the AD controller implementation before. I mistakenly introduced it into longhorn-manager earlier; it is not related to the detaching issue, so I removed it in the end.

Hence, both the RWX volume attaching and detaching issues exist in the AD controller design and implementation.

@PhanLe1010
Contributor

PhanLe1010 commented Jun 17, 2023

You are right @derekbit, it already existed. The behavior is just a little bit different, but it is still the same issue. Thank you for the clarification.

@innobead
Member

innobead commented Jun 17, 2023

I see. This case can happen when there is no workload pod on the same node as the share manager pod. When the user upgrades to 1.5.x, we create an upgrade AD ticket for the share manager's node to keep the volume attached there. When the user scales down the workload, we don't clean up that upgrade AD ticket because it is on a different node than the workload pod's node. As a result, no one cleans up the upgrade AD ticket and the volume is stuck there forever.

Still figuring out how to fix this issue.

@PhanLe1010 Can't we just delete the (upgrader) ticket if the share manager is not required anymore, given that no workloads are using it (and of course it is not in maintenance mode), no matter which node the ticket belongs to? Or is there anything I missed? Any obstacles?

@innobead
Member

BTW, I agree with the proposal as well, especially having the correct attacher type.

For static volume attachments (not triggered by a Kubernetes VolumeAttachment), why not just use AttacherTypeLonghornAPI? That way they will not be detached after the upgrade, and users can still detach them via the UI or an unknown source, e.g. by emptying the nodeID of the volume resource.

@shuo-wu
Contributor

shuo-wu commented Jun 19, 2023

  1. I agree with using the correct/valid AD ticket type as well and removing the type longhorn-upgrade.
  2. To be honest, I prefer Derek's suggestion: just directly creating (multiple) csi-attacher tickets for volumes used by workload pods. As for the race condition you mentioned, I think the main blocker is the existing CSI plugin that may handle the detachment calls incorrectly (in the old way). If we can remove the old CSI plugin before jumping into the upgrade path, will everything work fine? The new Longhorn CSI driver deployer will be deployed when the new longhorn-manager is up, hence we don't need to worry about it.
  3. For the attached volumes that have no workload pod but contain a non-empty spec node ID, I think we can add AttacherTypeLonghornAPI (one way to find such volumes is sketched after this list). But for the auto-attached volumes, it's too complicated to generate correct AD tickets, hence we can ignore them.
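
A hypothetical spot-check for such volumes (it assumes the Longhorn Volume CRD exposes spec.nodeID and the default longhorn-system namespace):

kubectl -n longhorn-system get volumes.longhorn.io -o custom-columns=NAME:.metadata.name,SPEC_NODE:.spec.nodeID,STATE:.status.state
# Volumes with a non-empty SPEC_NODE but no workload pod are the manually attached ones discussed in point 3.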

@PhanLe1010
Contributor

PhanLe1010 commented Jun 19, 2023

Thanks @innobead and @shuo-wu for the great feedback!

I agree with points 1 and 2 that @shuo-wu mentioned above (which I understand are also the points that @innobead proposed; please correct me if I understand it wrong, @innobead).

For point 2 that @shuo-wu mentioned:

To be honest, I prefer Derek's suggestion: just directly creating (multiple) csi-attacher tickets for volumes used by workload pods.

Let me evaluate more to see which is the better option: using workload pod state vs. using Kubernetes VolumeAttachment state.

@innobead
Member

@PhanLe1010 This seems to be a blocker for 1.5.0. Let's make it the highest priority to tackle first. Thanks.

@shuo-wu
Contributor

shuo-wu commented Jun 19, 2023

Point 3 I mentioned was similar to what David suggested but with extra concerns about the auto-attached volumes.

@PhanLe1010
Contributor

PhanLe1010 commented Jun 19, 2023

Let me evaluate more to see which is the better option: using workload pod state vs. using Kubernetes VolumeAttachment state.

I think using the Kubernetes VolumeAttachment state is better because:

  1. A Kubernetes VA and a Longhorn CSI VA ticket map one-to-one, so it is cleaner.
  2. If we use the workload pod state to make the decision, we will have this problem:
    When we get the workload pod in the upgrade path, assume that the pod has deletionTimestamp set (the pod is being terminated). In this case, if we add the Longhorn VA, it might be wrong because no one will clean up that ticket later. If we don't add the Longhorn VA, the Longhorn manager will detach the volume immediately when it comes up. This might be bad because we might pull the volume out unexpectedly while NodeUnpublishVolume and NodeUnstageVolume have not been executed yet, which is bad for the filesystem. (A spot-check of the two signals involved is sketched after this list.)
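
For illustration, the two signals being compared can be spot-checked like this (a sketch; <workload-pod> and <va-name> are placeholders):

kubectl get pod <workload-pod> -o jsonpath='{.metadata.deletionTimestamp}'
kubectl get volumeattachments.storage.k8s.io <va-name> -o jsonpath='{.metadata.deletionTimestamp}'
# A terminating pod already has a deletionTimestamp while it still exists, which is exactly the
# window where a ticket created from pod state could be left behind.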

@longhorn-io-github-bot

longhorn-io-github-bot commented Jun 19, 2023

Pre Ready-For-Testing Checklist

  • Where is the reproduce steps/test steps documented?
    The reproduce steps/test steps are at: [BUG] RWX volume remains attached after workload deleted if it's upgraded from v1.4.2 #6139 (comment)

  • Is there a workaround for the issue? If so, where is it documented?
    The workaround is at:

  • Does the PR include the explanation for the fix or the feature?

  • Does the PR include deployment change (YAML/Chart)? If so, where are the PRs for both YAML file and Chart?
    The PR for the YAML change is at:
    The PR for the chart change is at:

  • Have the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PR is at Remove the longhorn-upgrade attacher type and refactor the upgrade path longhorn-manager#2006

  • Which areas/issues this PR might have potential impacts on?
    Area Upgrade
    Issues

  • If labeled: require/LEP Has the Longhorn Enhancement Proposal PR submitted?
    The LEP PR is at

  • If labeled: area/ui Has the UI issue filed or ready to be merged (including backport-needed/*)?
    The UI issue/PR is at

  • If labeled: require/doc Has the necessary document PR submitted or merged (including backport-needed/*)?
    The documentation issue/PR is at

  • If labeled: require/automation-e2e Has the end-to-end test plan been merged? Have QAs agreed on the automation test case? If only test case skeleton w/o implementation, have you created an implementation issue (including backport-needed/*)
    The automation skeleton PR is at
    The automation test case PR is at
    The issue of automation test case implementation is at (please create by the template)

  • If labeled: require/automation-engine Has the engine integration test been merged (including backport-needed/*)?
    The engine automation PR is at

  • If labeled: require/manual-test-plan Has the manual test plan been documented?
    The updated manual test plan is at

  • If the fix introduces the code for backward compatibility Has a separate issue been filed with the label release/obsolete-compatibility?
    The compatibility issue is filed at

@PhanLe1010
Contributor

PhanLe1010 commented Jun 20, 2023

Test Plan

Check the problem in the issue description

  • Perform the test in the issue description, but make sure the workload pod and the share manager pod are on different nodes (by setting nodeName: for the StatefulSet)

Test attach/detach after upgrade

  1. Install Longhorn v1.4.2
  2. Create a deployment of 1 pod that uses an RWO volume, vol1
  3. Create a deployment of 1 pod that uses an RWX volume, vol2
  4. Create a deployment of 2 pods that use an RWX volume, vol3
  5. Create a volume and attach it using the Longhorn UI, vol4
  6. Create a volume and do not attach it, vol5
  7. Upgrade Longhorn to master-head
  8. Verify that vol1, vol2, and vol3 are not detached during and after the upgrade. Exec into each workload pod and verify that you can still read/write to the volumes' mount points (one way to spot-check the volume states is sketched after this list)
  9. Verify that vol4 is not detached during and after the upgrade
  10. Verify that vol5 remains detached
  11. Scale down the workload deployments to 0, and verify that vol1, vol2, and vol3 are detached
  12. Scale up the workload deployments, and verify that vol1, vol2, and vol3 are attached
  13. Detach/attach vol4 and vol5 using the Longhorn UI (without the force option) and verify that they attach/detach OK
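
One way to spot-check the volume states in steps 8-12 (a sketch; the CRD and field names are assumed from a default kubectl install):

kubectl -n longhorn-system get volumes.longhorn.io -o custom-columns=NAME:.metadata.name,STATE:.status.state,NODE:.status.currentNodeID,ROBUSTNESS:.status.robustness
# vol1-vol4 should stay attached (and healthy) through the upgrade; vol5 should stay detached.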

Upgrade tests

@chriscchien chriscchien self-assigned this Jun 21, 2023
@chriscchien
Contributor

Verified pass in

  • longhorn master (longhorn-manager ec130d)
  • longhorn v1.5.x (longhorn-manager c8092b)

After upgrading from v1.4.2 to master and from v1.4.2 to v1.5.x, the test steps above passed; after deleting the workload, the RWX volume became detached as well.
