
[BUG] Auto remount of read-only volumes does not work on a single-node cluster #7843

Closed
chriscchien opened this issue Feb 5, 2024 · 9 comments
Labels: backport/1.5.4, backport/1.6.1, kind/bug, priority/0 (must be fixed in this release, managed by PO), reproduce/always (100% reproducible), require/backport (require backport; only used when the specific versions to backport have not been defined), require/qa-review-coverage (require QA to review coverage), severity/3 (function working but has a major issue with a workaround)

@chriscchien
Contributor

Describe the bug

After #6386 was implemented, Longhorn automatically remounts a read-only RWO volume as read-write. This works well on a multi-node cluster: the volume is remounted within a few seconds and data can be written again.

But on a single-node cluster it does not work properly: after the volume is changed to read-only, it still cannot be written to even 20 minutes later.

To Reproduce

  1. Create a single-node cluster: curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.28.5+k3s1 sh -s - --write-kubeconfig-mode 644
  2. Deploy Longhorn master-head
  3. Create a single-replica volume, attach it to a node with a deployment, and record the PVC name
  4. Find the PVC location with find /var/lib -name ${PVC name}
  5. Remount the PVC read-only with the command below (replace the path with the result of step 4; do not remove the trailing /mount):
mount -o remount,ro /var/lib/kubelet/pods/39dad47a-3724-4f3e-9385-c38c90e3cd3b/volumes/kubernetes.io~csi/pvc-617eee44-df06-4c37-9575-d5f2b92c0690/mount
  6. The deployment pod keeps getting recreated, but the volume stays attached; writing data in the pod still fails because the filesystem is read-only
  7. The longhorn-manager log extracted from the support bundle is 17 MB and is full of messages like the following:
2024-02-05T07:26:33.987587217Z time="2024-02-05T07:26:33Z" level=info msg="Current time is not 5 seconds after request remount, requeue the pod test-deployment-754dd9fc66-hhcls to handle it later" func="controller.(*KubernetesPodController).handlePodDeletionIfVolumeRequestRemount" file="kubernetes_pod_controller.go:343" controller=longhorn-kubernetes-pod node=ip-172-31-32-43
  8. After 20 minutes, the volume is still read-only and the deployment pod is still being recreated (a quick check of the mount state is shown below).
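
A hedged way to confirm the state after step 5 (PVC names and paths are illustrative; after the remount both the globalmount and the pod bind mount should list "ro" in their options):

# Illustrative check; substitute your own PVC name.
mount | grep /dev/longhorn/
findmnt -rno TARGET,OPTIONS | grep pvc- | grep -w ro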

Expected behavior

Auto remount of a read-only volume works on a single-node cluster.

Support bundle for troubleshooting

supportbundle_b3abd9ae-8fc9-44f4-ad95-2deceffe927b_2024-02-05T07-26-29Z.zip

Environment

  • Longhorn version: master-head
  • Impacted volume (PV): pvc-12b01b25-0f09-4f88-8a76-96a8bce90a41
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.28.5+k3s1
    • Number of control plane nodes in the cluster: 1
    • Number of worker nodes in the cluster: 1
  • Node config
    • OS type and version: Ubuntu 22.04, 100 Gi, t2.xlarge
  • Number of Longhorn volumes in the cluster: 1

Additional context

N/A

@ChanYiLin
Contributor

I can reproduce the issue.

If there is only one node, or the pod keeps being recreated on the same node,
Kubernetes won't detach and reattach the volume, because it stays on the same node.
It only remounts the global mount point into the newly created pod,
but the global mount point on the host is still read-only.

This is a day-1 issue: we trusted that when Kubernetes recreates the pod it will reattach the volume, but that is not true in this case.
cc @innobead @derekbit
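
For context, a sketch of the mount layout on the node (paths are illustrative): the device is staged once at the globalmount and each pod only receives a bind mount of it, so recreating the pod alone never re-stages the filesystem.

# Illustrative; both entries refer to the same ext4 superblock, so they share the read-only state.
findmnt -rno TARGET,SOURCE,OPTIONS | grep pvc-
# .../plugins/kubernetes.io/csi/driver.longhorn.io/<hash>/globalmount    /dev/longhorn/pvc-<uuid>   ro,relatime
# .../pods/<pod-uid>/volumes/kubernetes.io~csi/pvc-<uuid>/mount          /dev/longhorn/pvc-<uuid>   ro,relatime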

@ChanYiLin
Contributor

ChanYiLin commented Feb 5, 2024

Before the volume remount feature

  • the pod gets stuck in read-only

After the volume remount feature (only on a single-node cluster, or with a StatefulSet whose pod always runs on the same node)

  • the pod keeps restarting and stays read-only
  • the manager's log fills with msg="Current time is not 5 seconds after request remount, requeue the pod test-deployment-754dd9fc66-hhcls to handle it later"

Workaround (see the commands below)

  • Scale the deployment down to 0 so the volume is truly detached
  • Scale it back up to reattach
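
A hedged sketch of the workaround (the deployment name test-deployment comes from the logs above; the label selector is an assumption, adjust for your workload):

# Workaround sketch; names and selectors are illustrative.
kubectl scale deployment test-deployment --replicas=0
# Wait until the old pod is fully terminated so the volume is really detached.
kubectl wait --for=delete pod -l app=test-deployment --timeout=300s
kubectl scale deployment test-deployment --replicas=1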

@ChanYiLin
Contributor

After discussing with @derekbit:
We can check whether the current global mount path is read-only here, and if so request re-staging the volume.
cc @innobead
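
A minimal shell sketch of such a check, assuming an illustrative globalmount path (the CSI plugin would do the equivalent in code):

GLOBALMOUNT=/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/<hash>/globalmount   # illustrative path
# If the staged filesystem is mounted read-only, request NodeUnstageVolume/NodeStageVolume.
if findmnt -rno OPTIONS "$GLOBALMOUNT" | grep -qw ro; then
  echo "globalmount is read-only, re-stage required"
fi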

@ChanYiLin
Contributor

After investigation, this is not what we originally thought.
We need to reinvestigate.

@ChanYiLin
Contributor

Note

The root cause is:

  1. The mount point becomes read-only.
  2. We then delete the current pod and let the deployment recreate it.
  3. While the previous pod is still terminating, UnpublishVolume is not called, so that pod's mount is still there.
  4. Meanwhile the next pod starts and calls PublishVolume, which tries to remount the globalmount.
  5. However, since the previous pod's mount is still there and read-only, the remounted globalmount is still read-only.
  6. So even though the pod is recreated, the mount point is still read-only:
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on /host/var/lib/kubelet/pods/4e7d6e15-e754-4755-8948-079bc98a8509/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount type ext4 (ro,relatime)
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount type ext4 (ro,relatime)

Workaround:

Scale the deployment down to 0, wait for the pod to be terminated and cleaned up, then set the replicas back to 1 (see the kubectl example above).

@longhorn-io-github-bot

longhorn-io-github-bot commented Feb 7, 2024

Pre Ready-For-Testing Checklist

  • Where are the reproduce/test steps documented?
    The reproduce/test steps are:
  1. Create a single-node cluster: curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.28.5+k3s1 sh -s - --write-kubeconfig-mode 644
  2. Deploy Longhorn master-head
  3. Create a single-replica volume, attach it to a node with a deployment, and record the PVC name
  4. Find the PVC mount location: mount | grep "/dev/longhorn/${VOLUME_NAME}"
    /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/${VOLUME_NAME_HASH}/globalmount
  5. Remount the PVC read-only with the command below (note that the path ends with "globalmount"):
mount -o remount,ro /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/${VOLUME_NAME_HASH}/globalmount
  6. The engine condition should go back to FilesystemReadOnly: false and I/O should work in the workload (one way to check is shown below)
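
To watch the condition flip back (this is the same command used in the verification comment further down):

# Watch the engine's FilesystemReadOnly condition; it should change from True back to False after the auto remount.
kubectl get engine -A -o yaml -w | grep FilesystemReadOnly -B 1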

@ejweber
Contributor

ejweber commented Feb 7, 2024

I am a bit confused about a couple of points. Can you help clarify?

  1. While the previous pod is still terminating, UnpublishVolume is not called, so that pod's mount is still there.

I understand that UnpublishVolume hasn't been called yet, but it seems to me the real problem is that UnstageVolume hasn't been called yet. (Of course, it can't be until after UnpublishVolume.)

  1. However, since the previous pod's mount is still there and read-only, the remounted globalmount is still read-only.

I don't get the cause and effect here. It sounds like you are implying the following, but I don't see how the bind mount can affect the global mount.

  • CAUSE: Previous pod mount still exists and is read only.
  • EFFECT: Global mount is still read only.

Is the following flow accurate?

  • CAUSE: Previous pod mount still exists.
  • EFFECT1: UnstageVolume was not called.
  • EFFECT2: Global mount is still read only.
  • EFFECT3: StageVolume is not called (volume is already staged).
  • EFFECT4: Global mount is still read only.

@ChanYiLin
Contributor

ChanYiLin commented Feb 7, 2024

Hi @ejweber
The story is as follows:

  1. I intentionally remounted the globalmount read-only, which made the bind mount read-only as well. The workload then hit I/O errors because the filesystem was now read-only:
/dev/longhorn/${VOLUME} on /host/var/lib/kubelet/pods/${OLDPOD}/volumes/kubernetes.io~csi/${PVC}/mount type ext4 (ro,relatime)
/dev/longhorn/${VOLUME} on /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/${VOLUME}/globalmount type ext4 (ro,relatime)
  2. The original implementation was to delete the pod and let the deployment recreate it, so the volume would be remounted into the pod.
  3. However, while the previous pod (PodOld) was still terminating and its bind mount was still there, the new pod (PodNew) was already being created, because the deployment recreates the pod as soon as it sees the old pod terminating.
  4. PodNew became Running even before PodOld finished terminating; PodOld's container was still there and so was its bind mount. (Probably because of the termination grace period and the new pod starting very fast.)
  5. That is, PodNew triggered NodePublishVolume before PodOld triggered NodeUnpublishVolume.
    • From the log I didn't see NodeStageVolume being triggered; I think that was because the globalmount was still mounted.
  6. When PodNew triggered NodePublishVolume to bind mount the globalmount into the pod, we verified whether the globalmount was healthy here, by checking whether the mount can perform I/O (ensureMountPoint()).
  7. If it is not healthy, we unmount the mount point and call NodeUnstageVolume => NodeStageVolume to remount the globalmount. The globalmount was indeed unhealthy because it was read-only. (A shell analogue of the health check is sketched below.)
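
The health check is roughly equivalent to this shell probe (illustrative path; the .longhorn-volume-mount-point-test.tmp file in the logs below is the real probe file):

GLOBALMOUNT=/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/<hash>/globalmount   # illustrative path
# On a read-only filesystem this fails with "read-only file system",
# which is why ensureMountPoint() treats the mount point as corrupt.
touch "$GLOBALMOUNT/.probe.tmp" && rm -f "$GLOBALMOUNT/.probe.tmp"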

The logs match the steps above (there are two pods):

# PodNew cb3d3b82-58e2-4d93-be58-6394018da53a triggered NodePublishVolume
time="2024-02-06T10:03:57Z" level=info msg="NodePublishVolume: req: {\"staging_target_path\":\"/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount\",\"target_path\":\"/var/lib/kubelet/pods/cb3d3b82-58e2-4d93-be58-6394018da53a/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount\",\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":1}},\"volume_context\":{\"csi.storage.k8s.io/ephemeral\":\"false\",\"csi.storage.k8s.io/pod.name\":\"test-deployment-fd47b4f4-v8gtn\",\"csi.storage.k8s.io/pod.namespace\":\"default\",\"csi.storage.k8s.io/pod.uid\":\"cb3d3b82-58e2-4d93-be58-6394018da53a\",\"csi.storage.k8s.io/serviceAccount.name\":\"default\",\"dataLocality\":\"disabled\",\"fromBackup\":\"\",\"fsType\":\"ext4\",\"numberOfReplicas\":\"3\",\"staleReplicaTimeout\":\"30\",\"storage.kubernetes.io/csiProvisionerIdentity\":\"1707213517288-841-driver.longhorn.io\"},\"volume_id\":\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\"}" func=csi.logGRPC file="server.go:132"
time="2024-02-06T10:03:57Z" level=info msg="NodePublishVolume is called with req volume_id:\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\" staging_target_path:\"/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount\" target_path:\"/var/lib/kubelet/pods/cb3d3b82-58e2-4d93-be58-6394018da53a/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"csi.storage.k8s.io/ephemeral\" value:\"false\" > volume_context:<key:\"csi.storage.k8s.io/pod.name\" value:\"test-deployment-fd47b4f4-v8gtn\" > volume_context:<key:\"csi.storage.k8s.io/pod.namespace\" value:\"default\" > volume_context:<key:\"csi.storage.k8s.io/pod.uid\" value:\"cb3d3b82-58e2-4d93-be58-6394018da53a\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.name\" value:\"default\" > volume_context:<key:\"dataLocality\" value:\"disabled\" > volume_context:<key:\"fromBackup\" value:\"\" > volume_context:<key:\"fsType\" value:\"ext4\" > volume_context:<key:\"numberOfReplicas\" value:\"3\" > volume_context:<key:\"staleReplicaTimeout\" value:\"30\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1707213517288-841-driver.longhorn.io\" > " func="csi.(*NodeServer).NodePublishVolume" file="node_server.go:82" component=csi-node-server function=NodePublishVolume
time="2024-02-06T10:03:57Z" level=info msg="Volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 using user and longhorn provided ext4 fs creation params: -b4096" func="csi.(*NodeServer).getMounter" file="node_server.go:841"
# PodNew verified if the globalmount was healthy
time="2024-02-06T10:03:57Z" level=info msg="Trying to ensure mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func=csi.ensureMountPoint file="util.go:296"
time="2024-02-06T10:03:57Z" level=info msg="Mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount try opening and syncing dir to make sure it's healthy" func=csi.ensureMountPoint file="util.go:304"
time="2024-02-06T10:03:57Z" level=warning msg="Mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount was identified as corrupt by opening and syncing" func=csi.ensureMountPoint file="util.go:306" error="open /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/.longhorn-volume-mount-point-test.tmp: read-only file system"
# PodNew tried to umount the corrupt mount point and re-stage the volume
time="2024-02-06T10:03:57Z" level=info msg="Trying to unmount potential mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func=csi.unmount file="util.go:353"
time="2024-02-06T10:03:57Z" level=info msg="Unmounted existing corrupt mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func=csi.ensureMountPoint file="util.go:318"
time="2024-02-06T10:03:57Z" level=error msg="Staging target path /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount is no longer valid for volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func="csi.(*NodeServer).NodePublishVolume" file="node_server.go:183" component=csi-node-server error="<nil>" function=NodePublishVolume
time="2024-02-06T10:03:57Z" level=warning msg="Calling NodeUnstageVolume for volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func="csi.(*NodeServer).NodePublishVolume" file="node_server.go:189" component=csi-node-server function=NodePublishVolume
# PodNew manually triggered NodeUnstageVolume in our csi
time="2024-02-06T10:03:57Z" level=info msg="NodeUnstageVolume is called with req volume_id:\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\" staging_target_path:\"/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount\" " func="csi.(*NodeServer).NodeUnstageVolume" file="node_server.go:553" component=csi-node-server function=NodeUnstageVolume
time="2024-02-06T10:03:57Z" level=info msg="Trying to force unmount potential mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func=csi.unmount file="util.go:350"
time="2024-02-06T10:03:57Z" level=info msg="Trying to cleanup mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func=csi.cleanupMountPoint file="util.go:372"
time="2024-02-06T10:03:57Z" level=info msg="Trying to force unmount potential mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func=csi.unmount file="util.go:350"
time="2024-02-06T10:03:57Z" level=info msg="No need for unmount not a mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func=csi.unmount file="util.go:362"
time="2024-02-06T10:03:57Z" level=info msg="Cleaned up mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func=csi.cleanupMountPoint file="util.go:378"
W0206 10:03:57.162254 1062985 mount_helper_common.go:34] Warning: mount cleanup skipped because path does not exist: /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182
time="2024-02-06T10:03:57Z" level=warning msg="Device /dev/mapper/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 is not an active LUKS device" func=crypto.DeviceEncryptionStatus file="crypto.go:171" error="failed to execute: /usr/bin/nsenter [nsenter --mount=/proc/1062858/ns/mnt --ipc=/proc/1062858/ns/ipc cryptsetup status pvc-2d7f3352-f549-4565-a4b8-61ef7121a182], output /dev/mapper/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 is inactive.\n, stderr : exit status 4"
time="2024-02-06T10:03:57Z" level=info msg="Volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 unmounted from node path /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func="csi.(*NodeServer).NodeUnstageVolume" file="node_server.go:602" component=csi-node-server function=NodeUnstageVolume
# PodNew manually triggered NodeStageVolume in our csi
time="2024-02-06T10:03:57Z" level=warning msg="Calling NodeStageVolume for volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func="csi.(*NodeServer).NodePublishVolume" file="node_server.go:195" component=csi-node-server function=NodePublishVolume
time="2024-02-06T10:03:57Z" level=info msg="NodeStageVolume is called with req volume_id:\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\" staging_target_path:\"/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"csi.storage.k8s.io/ephemeral\" value:\"false\" > volume_context:<key:\"csi.storage.k8s.io/pod.name\" value:\"test-deployment-fd47b4f4-v8gtn\" > volume_context:<key:\"csi.storage.k8s.io/pod.namespace\" value:\"default\" > volume_context:<key:\"csi.storage.k8s.io/pod.uid\" value:\"cb3d3b82-58e2-4d93-be58-6394018da53a\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.name\" value:\"default\" > volume_context:<key:\"dataLocality\" value:\"disabled\" > volume_context:<key:\"fromBackup\" value:\"\" > volume_context:<key:\"fsType\" value:\"ext4\" > volume_context:<key:\"numberOfReplicas\" value:\"3\" > volume_context:<key:\"staleReplicaTimeout\" value:\"30\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1707213517288-841-driver.longhorn.io\" > " func="csi.(*NodeServer).NodeStageVolume" file="node_server.go:376" component=csi-node-server function=NodeStageVolume
time="2024-02-06T10:03:57Z" level=info msg="Volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 using user and longhorn provided ext4 fs creation params: -b4096" func="csi.(*NodeServer).getMounter" file="node_server.go:841"
time="2024-02-06T10:03:57Z" level=info msg="Volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 device /dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 contains filesystem of format ext4" func="csi.(*NodeServer).NodeStageVolume" file="node_server.go:462" component=csi-node-server function=NodeStageVolume
time="2024-02-06T10:03:57Z" level=info msg="Trying to ensure mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func=csi.ensureMountPoint file="util.go:296"
time="2024-02-06T10:03:57Z" level=info msg="Mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount try opening and syncing dir to make sure it's healthy" func=csi.ensureMountPoint file="util.go:304"
# PodNew manually triggered NodeStageVolume, but something went wrong.
time="2024-02-06T10:03:57Z" level=info msg="Formatting device /dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 with fsType ext4 and mounting at /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount with mount flags []" func="csi.(*NodeServer).nodeStageMountVolume" file="node_server.go:313" component=csi-node-server function=NodePublishVolume
I0206 10:03:57.187150 1062985 mount_linux.go:452] `fsck` error fsck from util-linux 2.37.4
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 is mounted.
e2fsck: Cannot continue, aborting.


time="2024-02-06T10:03:57Z" level=info msg="Mounted volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on node kworker1 does not require filesystem resize" func="csi.(*NodeServer).NodeStageVolume" file="node_server.go:543" component=csi-node-server function=NodeStageVolume
time="2024-02-06T10:03:57Z" level=info msg="Mounted volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on node kworker1 via device /dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func="csi.(*NodeServer).NodeStageVolume" file="node_server.go:546" component=csi-node-server function=NodeStageVolume
time="2024-02-06T10:03:57Z" level=info msg="Trying to ensure mount point /var/lib/kubelet/pods/cb3d3b82-58e2-4d93-be58-6394018da53a/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount" func=csi.ensureMountPoint file="util.go:296"
# PodNew cb3d3b82-58e2-4d93-be58-6394018da53a NodePublishVolume still succeeded in the end and the PodNew was now running
time="2024-02-06T10:03:57Z" level=info msg="NodePublishVolume: rsp: {}" func=csi.logGRPC file="server.go:141"
# PodOld cdf8fa36-df88-4796-b474-97a486024a8c triggered NodeUnpublishVolume
time="2024-02-06T10:04:20Z" level=info msg="NodeUnpublishVolume: req: {\"target_path\":\"/var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount\",\"volume_id\":\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\"}" func=csi.logGRPC file="server.go:132"
time="2024-02-06T10:04:20Z" level=info msg="NodeUnpublishVolume is called with req volume_id:\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\" target_path:\"/var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount\" " func="csi.(*NodeServer).NodeUnpublishVolume" file="node_server.go:353" component=csi-node-server function=NodeUnpublishVolume
time="2024-02-06T10:04:20Z" level=info msg="Trying to cleanup mount point /var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount" func=csi.cleanupMountPoint file="util.go:372"
time="2024-02-06T10:04:20Z" level=info msg="Trying to force unmount potential mount point /var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount" func=csi.unmount file="util.go:350"
time="2024-02-06T10:04:20Z" level=info msg="Cleaned up mount point /var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount" func=csi.cleanupMountPoint file="util.go:378"
time="2024-02-06T10:04:20Z" level=info msg="Volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 unmounted from path /var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount" func="csi.(*NodeServer).NodeUnpublishVolume" file="node_server.go:369" component=csi-node-server function=NodeUnpublishVolume
time="2024-02-06T10:04:20Z" level=info msg="NodeUnpublishVolume: rsp: {}" func=csi.logGRPC file="server.go:141"

To see exactly what went wrong (the fsck error fsck from util-linux 2.37.4 in the log), I did an experiment: I unmounted and remounted the global mount point while the bind mount was still there and read-only.

  1. Initially:
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on /host/var/lib/kubelet/pods/4e7d6e15-e754-4755-8948-079bc98a8509/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount type ext4 (ro,relatime)
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount type ext4 (ro,relatime)
  2. umount the globalmount:
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on /host/var/lib/kubelet/pods/4e7d6e15-e754-4755-8948-079bc98a8509/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount type ext4 (ro,relatime)
  3. Use fsck to check the device:
instance-manager-1d2defe020180358325daa37768b1023:/ # fsck -a /dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182
fsck from util-linux 2.37.4
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 is mounted.
e2fsck: Cannot continue, aborting.

Note: when doing k8s.io/mount-utils FormatAndMount(), it calls checkAndRepairFilesystem to check the device here. However, it bypasses this failure because it does not return an error for this particular error case here.

  4. Mount the globalmount back:
instance-manager-1d2defe020180358325daa37768b1023:/ # mount /dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount
mount: /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount: WARNING: source write-protected, mounted read-only.

Summary

So I believe the issue here is that the old pod's bind mount was not unmounted first.
In that case the remounted global mount is still read-only, and the bind mount of the new pod is also read-only.

While the new pod was starting, since the globalmount was still mounted, NodeUnstageVolume was never triggered by csi/kubelet.
While the old pod was terminating, since the new pod was already running, NodeUnstageVolume was never triggered by csi/kubelet either.
We did trigger NodeUnstageVolume -> NodeStageVolume manually because we check whether the globalmount is healthy. But the remount does not solve the problem, because the fsck error is bypassed by the check implemented in mount_linux.go. So even though NodeUnstageVolume -> NodeStageVolume succeeded, the globalmount was still read-only.
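
The underlying kernel behavior can be reproduced outside Longhorn with a loop device (a minimal sketch, assuming root on a Linux host; paths, sizes, and names are arbitrary). A device whose superblock is still pinned read-only by another mount (here, the stand-in for the old pod's bind mount) cannot be remounted read-write:

# Minimal reproduction of the shared-superblock behavior (assumes root on Linux with ext4 tools).
truncate -s 100M /tmp/demo.img
mkfs.ext4 -q /tmp/demo.img
LOOP=$(losetup -f --show /tmp/demo.img)
mkdir -p /mnt/globalmount /mnt/podmount
mount "$LOOP" /mnt/globalmount                  # stand-in for NodeStageVolume
mount --bind /mnt/globalmount /mnt/podmount     # stand-in for the pod bind mount (NodePublishVolume)
mount -o remount,ro /mnt/globalmount            # filesystem turns read-only; both mounts share the superblock
umount /mnt/globalmount                         # stand-in for NodeUnstageVolume; the bind mount remains
mount "$LOOP" /mnt/globalmount                  # stand-in for NodeStageVolume; it cannot come back read-write
                                                # (in the issue's environment this came back read-only with
                                                #  "WARNING: source write-protected, mounted read-only")
findmnt -no TARGET,OPTIONS "$LOOP"              # every remaining mount of the device still shows "ro"
# cleanup
umount -l /mnt/podmount /mnt/globalmount 2>/dev/null; losetup -d "$LOOP"

Only after the bind mount is also gone (i.e. the old pod is fully cleaned up) does a fresh mount come up read-write again, which is why the scale-to-zero workaround works.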

@chriscchien
Contributor Author

Verified passed on Longhorn master (longhorn-manager 8cdf68) with the test steps.

After remounting the PVC read-only, the engine's FilesystemReadOnly condition changed from True back to False and data could be written into the workload without problems.

> k get engine -A -o yaml -w | grep FilesystemReadOnly -B 1 
    status: "False"
    type: FilesystemReadOnly
--
    status: "True"
    type: FilesystemReadOnly
--
    status: "False"
    type: FilesystemReadOnly
