
[BUG] Auto remount of read-only volumes does not work on a single-node cluster #7843

Closed
chriscchien opened this issue Feb 5, 2024 · 9 comments
Labels: backport/1.5.4, backport/1.6.1, kind/bug, priority/0 (must be fixed in this release, managed by PO), reproduce/always (100% reproducible), require/backport (require backport; only used when the specific versions to backport have not been defined), require/qa-review-coverage (require QA to review coverage), severity/3 (function working but has a major issue with a workaround)

@chriscchien
Contributor

Describe the bug

After #6386 was implemented, Longhorn automatically remounts a read-only RWO volume as read-write. This works well on a multi-node cluster: the volume is remounted within a few seconds and data can be written again.

But on a single-node cluster it does not work properly: after the volume is changed to read-only, it still cannot be written to even 20 minutes later.

To Reproduce

  1. Create a single-node cluster: curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.28.5+k3s1 sh -s - --write-kubeconfig-mode 644
  2. Deploy Longhorn master-head
  3. Create a single-replica volume, attach it to a node with a deployment, and record the PVC name
  4. Find the PVC location with find /var/lib -name ${PVC name}
  5. Remount the PVC read-only with the command below (replace the path with the result of step 4; do not remove the trailing /mount):
mount -o remount,ro /var/lib/kubelet/pods/39dad47a-3724-4f3e-9385-c38c90e3cd3b/volumes/kubernetes.io~csi/pvc-617eee44-df06-4c37-9575-d5f2b92c0690/mount
  6. The deployment pod keeps getting recreated, but the volume stays attached; writing data in the pod still fails because the filesystem is read-only
  7. The longhorn-manager log extracted from the support bundle is 17 MB and is full of messages like the following:
2024-02-05T07:26:33.987587217Z time="2024-02-05T07:26:33Z" level=info msg="Current time is not 5 seconds after request remount, requeue the pod test-deployment-754dd9fc66-hhcls to handle it later" func="controller.(*KubernetesPodController).handlePodDeletionIfVolumeRequestRemount" file="kubernetes_pod_controller.go:343" controller=longhorn-kubernetes-pod node=ip-172-31-32-43
  8. After 20 minutes, the volume is still read-only and the deployment pod is still being recreated (a quick check of the mount state is shown below).
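
A hedged way to confirm the state after step 5 (PVC names and paths are illustrative; after the remount both the globalmount and the pod bind mount should list "ro" in their options):

# Illustrative check; substitute your own PVC name.
mount | grep /dev/longhorn/
findmnt -rno TARGET,OPTIONS | grep pvc- | grep -w ro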

Expected behavior

Auto remount of a read-only volume works on a single-node cluster.

Support bundle for troubleshooting

supportbundle_b3abd9ae-8fc9-44f4-ad95-2deceffe927b_2024-02-05T07-26-29Z.zip

Environment

  • Longhorn version: master-head
  • Impacted volume (PV): pvc-12b01b25-0f09-4f88-8a76-96a8bce90a41
  • Installation method (e.g. Rancher Catalog App/Helm/Kubectl): kubectl
  • Kubernetes distro (e.g. RKE/K3s/EKS/OpenShift) and version: v1.28.5+k3s1
    • Number of control plane nodes in the cluster: 1
    • Number of worker nodes in the cluster: 1
  • Node config
    • OS type and version: Ubuntu 22.04, 100 Gi, t2.xlarge
  • Number of Longhorn volumes in the cluster: 1

Additional context

N/A

@ChanYiLin
Contributor

I can reproduce the issue.

If there is only one node, or the pod keeps being recreated on the same node,
Kubernetes won't detach and reattach the volume, because it stays on the same node.
It only remounts the global mount point into the newly created pod,
but the global mount point on the host is still read-only.

This is a day-1 issue: we trusted that when Kubernetes recreates the pod it will reattach the volume, but that is not true in this case.
cc @innobead @derekbit
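
For context, a sketch of the mount layout on the node (paths are illustrative): the device is staged once at the globalmount and each pod only receives a bind mount of it, so recreating the pod alone never re-stages the filesystem.

# Illustrative; both entries refer to the same ext4 superblock, so they share the read-only state.
findmnt -rno TARGET,SOURCE,OPTIONS | grep pvc-
# .../plugins/kubernetes.io/csi/driver.longhorn.io/<hash>/globalmount    /dev/longhorn/pvc-<uuid>   ro,relatime
# .../pods/<pod-uid>/volumes/kubernetes.io~csi/pvc-<uuid>/mount          /dev/longhorn/pvc-<uuid>   ro,relatime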

@ChanYiLin
Contributor

ChanYiLin commented Feb 5, 2024

Before the volume remount feature

  • the pod gets stuck in read-only

After the volume remount feature (only on a single-node cluster, or with a StatefulSet whose pod always runs on the same node)

  • the pod keeps restarting and stays read-only
  • the manager's log fills with msg="Current time is not 5 seconds after request remount, requeue the pod test-deployment-754dd9fc66-hhcls to handle it later"

Workaround (see the commands below)

  • Scale the deployment down to 0 so the volume is truly detached
  • Scale it back up to reattach
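
A hedged sketch of the workaround (the deployment name test-deployment comes from the logs above; the label selector is an assumption, adjust for your workload):

# Workaround sketch; names and selectors are illustrative.
kubectl scale deployment test-deployment --replicas=0
# Wait until the old pod is fully terminated so the volume is really detached.
kubectl wait --for=delete pod -l app=test-deployment --timeout=300s
kubectl scale deployment test-deployment --replicas=1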

@ChanYiLin
Contributor

After discussing with @derekbit:
We can check whether the current global mount path is read-only here, and if so request re-staging the volume.
cc @innobead
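
A minimal shell sketch of such a check, assuming an illustrative globalmount path (the CSI plugin would do the equivalent in code):

GLOBALMOUNT=/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/<hash>/globalmount   # illustrative path
# If the staged filesystem is mounted read-only, request NodeUnstageVolume/NodeStageVolume.
if findmnt -rno OPTIONS "$GLOBALMOUNT" | grep -qw ro; then
  echo "globalmount is read-only, re-stage required"
fi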

@ChanYiLin
Contributor

After investigation, this is not what we originally thought.
We need to reinvestigate.

@ChanYiLin
Contributor

Note

The root cause is:

  1. The mount point becomes read-only.
  2. We then delete the current pod and let the deployment recreate it.
  3. While the previous pod is still terminating, UnpublishVolume is not called, so that pod's mount is still there.
  4. Meanwhile the next pod starts and calls PublishVolume, which tries to remount the globalmount.
  5. However, since the previous pod's mount is still there and read-only, the remounted globalmount is still read-only.
  6. So even though the pod is recreated, the mount point is still read-only:
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on /host/var/lib/kubelet/pods/4e7d6e15-e754-4755-8948-079bc98a8509/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount type ext4 (ro,relatime)
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount type ext4 (ro,relatime)

Workaround:

Scale the deployment down to 0, wait for the pod to be terminated and cleaned up, then set the replicas back to 1 (see the kubectl example above).

@longhorn-io-github-bot

longhorn-io-github-bot commented Feb 7, 2024

Pre Ready-For-Testing Checklist

  • Where are the reproduce/test steps documented?
    The reproduce/test steps are:
  1. Create a single-node cluster: curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.28.5+k3s1 sh -s - --write-kubeconfig-mode 644
  2. Deploy Longhorn master-head
  3. Create a single-replica volume, attach it to a node with a deployment, and record the PVC name
  4. Find the PVC mount location: mount | grep "/dev/longhorn/${VOLUME_NAME}"
    /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/${VOLUME_NAME_HASH}/globalmount
  5. Remount the PVC read-only with the command below (note that the path ends with "globalmount"):
mount -o remount,ro /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/${VOLUME_NAME_HASH}/globalmount
  6. The engine condition should go back to FilesystemReadOnly: false and I/O should work in the workload (one way to check is shown below)
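
To watch the condition flip back (this is the same command used in the verification comment further down):

# Watch the engine's FilesystemReadOnly condition; it should change from True back to False after the auto remount.
kubectl get engine -A -o yaml -w | grep FilesystemReadOnly -B 1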

@ejweber
Contributor

ejweber commented Feb 7, 2024

I am a bit confused about a couple of points. Can you help clarify?

  1. While the previous pod is still terminating, UnpublishVolume is not called, so that pod's mount is still there.

I understand that UnpublishVolume hasn't been called yet, but it seems to me the real problem is that UnstageVolume hasn't been called yet. (Of course, it can't be until after UnpublishVolume.)

  1. However, since the previous pod's mount is still there and read-only, the remounted globalmount is still read-only.

I don't get the cause and effect here. It sounds like you are implying the following, but I don't see how the bind mount can affect the global mount.

  • CAUSE: Previous pod mount still exists and is read only.
  • EFFECT: Global mount is still read only.

Is the following flow accurate?

  • CAUSE: Previous pod mount still exists.
  • EFFECT1: UnstageVolume was not called.
  • EFFECT2: Global mount is still read only.
  • EFFECT3: StageVolume is not called (volume is already staged).
  • EFFECT4: Global mount is still read only.

@ChanYiLin
Contributor

ChanYiLin commented Feb 7, 2024

Hi @ejweber
The story is as follows:

  1. I intentionally remounted the globalmount read-only, which made the bind mount read-only as well. The workload then hit I/O errors because the filesystem was now read-only:
/dev/longhorn/${VOLUME} on /host/var/lib/kubelet/pods/${OLDPOD}/volumes/kubernetes.io~csi/${PVC}/mount type ext4 (ro,relatime)
/dev/longhorn/${VOLUME} on /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/${VOLUME}/globalmount type ext4 (ro,relatime)
  2. The original implementation was to delete the pod and let the deployment recreate it, so the volume would be remounted into the pod.
  3. However, while the previous pod (PodOld) was still terminating and its bind mount was still there, the new pod (PodNew) was already being created, because the deployment recreates the pod as soon as it sees the old pod terminating.
  4. PodNew became Running even before PodOld finished terminating; PodOld's container was still there and so was its bind mount. (Probably because of the termination grace period and the new pod starting very fast.)
  5. That is, PodNew triggered NodePublishVolume before PodOld triggered NodeUnpublishVolume.
    • From the log I didn't see NodeStageVolume being triggered; I think that was because the globalmount was still mounted.
  6. When PodNew triggered NodePublishVolume to bind mount the globalmount into the pod, we verified whether the globalmount was healthy here, by checking whether the mount can perform I/O (ensureMountPoint()).
  7. If it is not healthy, we unmount the mount point and call NodeUnstageVolume => NodeStageVolume to remount the globalmount. The globalmount was indeed unhealthy because it was read-only. (A shell analogue of the health check is sketched below.)
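
The health check is roughly equivalent to this shell probe (illustrative path; the .longhorn-volume-mount-point-test.tmp file in the logs below is the real probe file):

GLOBALMOUNT=/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/<hash>/globalmount   # illustrative path
# On a read-only filesystem this fails with "read-only file system",
# which is why ensureMountPoint() treats the mount point as corrupt.
touch "$GLOBALMOUNT/.probe.tmp" && rm -f "$GLOBALMOUNT/.probe.tmp"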

The logs match the steps above (there are two pods):

# PodNew cb3d3b82-58e2-4d93-be58-6394018da53a triggered NodePublishVolume
time="2024-02-06T10:03:57Z" level=info msg="NodePublishVolume: req: {\"staging_target_path\":\"/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount\",\"target_path\":\"/var/lib/kubelet/pods/cb3d3b82-58e2-4d93-be58-6394018da53a/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount\",\"volume_capability\":{\"AccessType\":{\"Mount\":{\"fs_type\":\"ext4\"}},\"access_mode\":{\"mode\":1}},\"volume_context\":{\"csi.storage.k8s.io/ephemeral\":\"false\",\"csi.storage.k8s.io/pod.name\":\"test-deployment-fd47b4f4-v8gtn\",\"csi.storage.k8s.io/pod.namespace\":\"default\",\"csi.storage.k8s.io/pod.uid\":\"cb3d3b82-58e2-4d93-be58-6394018da53a\",\"csi.storage.k8s.io/serviceAccount.name\":\"default\",\"dataLocality\":\"disabled\",\"fromBackup\":\"\",\"fsType\":\"ext4\",\"numberOfReplicas\":\"3\",\"staleReplicaTimeout\":\"30\",\"storage.kubernetes.io/csiProvisionerIdentity\":\"1707213517288-841-driver.longhorn.io\"},\"volume_id\":\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\"}" func=csi.logGRPC file="server.go:132"
time="2024-02-06T10:03:57Z" level=info msg="NodePublishVolume is called with req volume_id:\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\" staging_target_path:\"/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount\" target_path:\"/var/lib/kubelet/pods/cb3d3b82-58e2-4d93-be58-6394018da53a/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"csi.storage.k8s.io/ephemeral\" value:\"false\" > volume_context:<key:\"csi.storage.k8s.io/pod.name\" value:\"test-deployment-fd47b4f4-v8gtn\" > volume_context:<key:\"csi.storage.k8s.io/pod.namespace\" value:\"default\" > volume_context:<key:\"csi.storage.k8s.io/pod.uid\" value:\"cb3d3b82-58e2-4d93-be58-6394018da53a\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.name\" value:\"default\" > volume_context:<key:\"dataLocality\" value:\"disabled\" > volume_context:<key:\"fromBackup\" value:\"\" > volume_context:<key:\"fsType\" value:\"ext4\" > volume_context:<key:\"numberOfReplicas\" value:\"3\" > volume_context:<key:\"staleReplicaTimeout\" value:\"30\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1707213517288-841-driver.longhorn.io\" > " func="csi.(*NodeServer).NodePublishVolume" file="node_server.go:82" component=csi-node-server function=NodePublishVolume
time="2024-02-06T10:03:57Z" level=info msg="Volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 using user and longhorn provided ext4 fs creation params: -b4096" func="csi.(*NodeServer).getMounter" file="node_server.go:841"
# PodNew verified if the globalmount was healthy
time="2024-02-06T10:03:57Z" level=info msg="Trying to ensure mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func=csi.ensureMountPoint file="util.go:296"
time="2024-02-06T10:03:57Z" level=info msg="Mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount try opening and syncing dir to make sure it's healthy" func=csi.ensureMountPoint file="util.go:304"
time="2024-02-06T10:03:57Z" level=warning msg="Mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount was identified as corrupt by opening and syncing" func=csi.ensureMountPoint file="util.go:306" error="open /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/.longhorn-volume-mount-point-test.tmp: read-only file system"
# PodNew tried to umount the corrupt mount point and re-stage the volume
time="2024-02-06T10:03:57Z" level=info msg="Trying to unmount potential mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func=csi.unmount file="util.go:353"
time="2024-02-06T10:03:57Z" level=info msg="Unmounted existing corrupt mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func=csi.ensureMountPoint file="util.go:318"
time="2024-02-06T10:03:57Z" level=error msg="Staging target path /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount is no longer valid for volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func="csi.(*NodeServer).NodePublishVolume" file="node_server.go:183" component=csi-node-server error="<nil>" function=NodePublishVolume
time="2024-02-06T10:03:57Z" level=warning msg="Calling NodeUnstageVolume for volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func="csi.(*NodeServer).NodePublishVolume" file="node_server.go:189" component=csi-node-server function=NodePublishVolume
# PodNew manually triggered NodeUnstageVolume in our csi
time="2024-02-06T10:03:57Z" level=info msg="NodeUnstageVolume is called with req volume_id:\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\" staging_target_path:\"/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount\" " func="csi.(*NodeServer).NodeUnstageVolume" file="node_server.go:553" component=csi-node-server function=NodeUnstageVolume
time="2024-02-06T10:03:57Z" level=info msg="Trying to force unmount potential mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func=csi.unmount file="util.go:350"
time="2024-02-06T10:03:57Z" level=info msg="Trying to cleanup mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func=csi.cleanupMountPoint file="util.go:372"
time="2024-02-06T10:03:57Z" level=info msg="Trying to force unmount potential mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func=csi.unmount file="util.go:350"
time="2024-02-06T10:03:57Z" level=info msg="No need for unmount not a mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func=csi.unmount file="util.go:362"
time="2024-02-06T10:03:57Z" level=info msg="Cleaned up mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func=csi.cleanupMountPoint file="util.go:378"
W0206 10:03:57.162254 1062985 mount_helper_common.go:34] Warning: mount cleanup skipped because path does not exist: /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182
time="2024-02-06T10:03:57Z" level=warning msg="Device /dev/mapper/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 is not an active LUKS device" func=crypto.DeviceEncryptionStatus file="crypto.go:171" error="failed to execute: /usr/bin/nsenter [nsenter --mount=/proc/1062858/ns/mnt --ipc=/proc/1062858/ns/ipc cryptsetup status pvc-2d7f3352-f549-4565-a4b8-61ef7121a182], output /dev/mapper/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 is inactive.\n, stderr : exit status 4"
time="2024-02-06T10:03:57Z" level=info msg="Volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 unmounted from node path /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func="csi.(*NodeServer).NodeUnstageVolume" file="node_server.go:602" component=csi-node-server function=NodeUnstageVolume
# PodNew manually triggered NodeStageVolume in our csi
time="2024-02-06T10:03:57Z" level=warning msg="Calling NodeStageVolume for volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func="csi.(*NodeServer).NodePublishVolume" file="node_server.go:195" component=csi-node-server function=NodePublishVolume
time="2024-02-06T10:03:57Z" level=info msg="NodeStageVolume is called with req volume_id:\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\" staging_target_path:\"/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"csi.storage.k8s.io/ephemeral\" value:\"false\" > volume_context:<key:\"csi.storage.k8s.io/pod.name\" value:\"test-deployment-fd47b4f4-v8gtn\" > volume_context:<key:\"csi.storage.k8s.io/pod.namespace\" value:\"default\" > volume_context:<key:\"csi.storage.k8s.io/pod.uid\" value:\"cb3d3b82-58e2-4d93-be58-6394018da53a\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.name\" value:\"default\" > volume_context:<key:\"dataLocality\" value:\"disabled\" > volume_context:<key:\"fromBackup\" value:\"\" > volume_context:<key:\"fsType\" value:\"ext4\" > volume_context:<key:\"numberOfReplicas\" value:\"3\" > volume_context:<key:\"staleReplicaTimeout\" value:\"30\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1707213517288-841-driver.longhorn.io\" > " func="csi.(*NodeServer).NodeStageVolume" file="node_server.go:376" component=csi-node-server function=NodeStageVolume
time="2024-02-06T10:03:57Z" level=info msg="Volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 using user and longhorn provided ext4 fs creation params: -b4096" func="csi.(*NodeServer).getMounter" file="node_server.go:841"
time="2024-02-06T10:03:57Z" level=info msg="Volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 device /dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 contains filesystem of format ext4" func="csi.(*NodeServer).NodeStageVolume" file="node_server.go:462" component=csi-node-server function=NodeStageVolume
time="2024-02-06T10:03:57Z" level=info msg="Trying to ensure mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount" func=csi.ensureMountPoint file="util.go:296"
time="2024-02-06T10:03:57Z" level=info msg="Mount point /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount try opening and syncing dir to make sure it's healthy" func=csi.ensureMountPoint file="util.go:304"
# PodNew manually triggered NodeStageVolume, but something went wrong.
time="2024-02-06T10:03:57Z" level=info msg="Formatting device /dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 with fsType ext4 and mounting at /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount with mount flags []" func="csi.(*NodeServer).nodeStageMountVolume" file="node_server.go:313" component=csi-node-server function=NodePublishVolume
I0206 10:03:57.187150 1062985 mount_linux.go:452] `fsck` error fsck from util-linux 2.37.4
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 is mounted.
e2fsck: Cannot continue, aborting.


time="2024-02-06T10:03:57Z" level=info msg="Mounted volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on node kworker1 does not require filesystem resize" func="csi.(*NodeServer).NodeStageVolume" file="node_server.go:543" component=csi-node-server function=NodeStageVolume
time="2024-02-06T10:03:57Z" level=info msg="Mounted volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on node kworker1 via device /dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182" func="csi.(*NodeServer).NodeStageVolume" file="node_server.go:546" component=csi-node-server function=NodeStageVolume
time="2024-02-06T10:03:57Z" level=info msg="Trying to ensure mount point /var/lib/kubelet/pods/cb3d3b82-58e2-4d93-be58-6394018da53a/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount" func=csi.ensureMountPoint file="util.go:296"
# PodNew cb3d3b82-58e2-4d93-be58-6394018da53a NodePublishVolume still succeeded in the end and the PodNew was now running
time="2024-02-06T10:03:57Z" level=info msg="NodePublishVolume: rsp: {}" func=csi.logGRPC file="server.go:141"
# PodOld cdf8fa36-df88-4796-b474-97a486024a8c triggered NodeUnpublishVolume
time="2024-02-06T10:04:20Z" level=info msg="NodeUnpublishVolume: req: {\"target_path\":\"/var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount\",\"volume_id\":\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\"}" func=csi.logGRPC file="server.go:132"
time="2024-02-06T10:04:20Z" level=info msg="NodeUnpublishVolume is called with req volume_id:\"pvc-2d7f3352-f549-4565-a4b8-61ef7121a182\" target_path:\"/var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount\" " func="csi.(*NodeServer).NodeUnpublishVolume" file="node_server.go:353" component=csi-node-server function=NodeUnpublishVolume
time="2024-02-06T10:04:20Z" level=info msg="Trying to cleanup mount point /var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount" func=csi.cleanupMountPoint file="util.go:372"
time="2024-02-06T10:04:20Z" level=info msg="Trying to force unmount potential mount point /var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount" func=csi.unmount file="util.go:350"
time="2024-02-06T10:04:20Z" level=info msg="Cleaned up mount point /var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount" func=csi.cleanupMountPoint file="util.go:378"
time="2024-02-06T10:04:20Z" level=info msg="Volume pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 unmounted from path /var/lib/kubelet/pods/cdf8fa36-df88-4796-b474-97a486024a8c/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount" func="csi.(*NodeServer).NodeUnpublishVolume" file="node_server.go:369" component=csi-node-server function=NodeUnpublishVolume
time="2024-02-06T10:04:20Z" level=info msg="NodeUnpublishVolume: rsp: {}" func=csi.logGRPC file="server.go:141"

To see exactly what went wrong (the fsck error fsck from util-linux 2.37.4 in the log), I did an experiment: I unmounted and remounted the global mount point while the bind mount was still there and read-only.

  1. Initially:
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on /host/var/lib/kubelet/pods/4e7d6e15-e754-4755-8948-079bc98a8509/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount type ext4 (ro,relatime)
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount type ext4 (ro,relatime)
  2. umount the globalmount:
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 on /host/var/lib/kubelet/pods/4e7d6e15-e754-4755-8948-079bc98a8509/volumes/kubernetes.io~csi/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182/mount type ext4 (ro,relatime)
  3. Use fsck to check the device:
instance-manager-1d2defe020180358325daa37768b1023:/ # fsck -a /dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182
fsck from util-linux 2.37.4
/dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 is mounted.
e2fsck: Cannot continue, aborting.

Note: when doing k8s.io/mount-utils FormatAndMount(), it calls checkAndRepairFilesystem to check the device here. However, it bypasses this failure because it does not return an error for this particular error case here.

  4. Mount the globalmount back:
instance-manager-1d2defe020180358325daa37768b1023:/ # mount /dev/longhorn/pvc-2d7f3352-f549-4565-a4b8-61ef7121a182 host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount
mount: /host/var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/536e9cde86b7c02e277285c8cc459b13625f724485127fae824f102463baae5c/globalmount: WARNING: source write-protected, mounted read-only.

Summary

So I believe the issue here is that the old pod's bind mount was not unmounted first.
In that case the remounted global mount is still read-only, and the bind mount of the new pod is also read-only.

While the new pod was starting, since the globalmount was still mounted, NodeUnstageVolume was never triggered by csi/kubelet.
While the old pod was terminating, since the new pod was already running, NodeUnstageVolume was never triggered by csi/kubelet either.
We did trigger NodeUnstageVolume -> NodeStageVolume manually because we check whether the globalmount is healthy. But the remount does not solve the problem, because the fsck error is bypassed by the check implemented in mount_linux.go. So even though NodeUnstageVolume -> NodeStageVolume succeeded, the globalmount was still read-only.
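
The underlying kernel behavior can be reproduced outside Longhorn with a loop device (a minimal sketch, assuming root on a Linux host; paths, sizes, and names are arbitrary). A device whose superblock is still pinned read-only by another mount (here, the stand-in for the old pod's bind mount) cannot be remounted read-write:

# Minimal reproduction of the shared-superblock behavior (assumes root on Linux with ext4 tools).
truncate -s 100M /tmp/demo.img
mkfs.ext4 -q /tmp/demo.img
LOOP=$(losetup -f --show /tmp/demo.img)
mkdir -p /mnt/globalmount /mnt/podmount
mount "$LOOP" /mnt/globalmount                  # stand-in for NodeStageVolume
mount --bind /mnt/globalmount /mnt/podmount     # stand-in for the pod bind mount (NodePublishVolume)
mount -o remount,ro /mnt/globalmount            # filesystem turns read-only; both mounts share the superblock
umount /mnt/globalmount                         # stand-in for NodeUnstageVolume; the bind mount remains
mount "$LOOP" /mnt/globalmount                  # stand-in for NodeStageVolume; it cannot come back read-write
                                                # (in the issue's environment this came back read-only with
                                                #  "WARNING: source write-protected, mounted read-only")
findmnt -no TARGET,OPTIONS "$LOOP"              # every remaining mount of the device still shows "ro"
# cleanup
umount -l /mnt/podmount /mnt/globalmount 2>/dev/null; losetup -d "$LOOP"

Only after the bind mount is also gone (i.e. the old pod is fully cleaned up) does a fresh mount come up read-write again, which is why the scale-to-zero workaround works.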

@chriscchien
Contributor Author

Verified passed on Longhorn master (longhorn-manager 8cdf68) with the test steps.

After remounting the PVC read-only, the engine's FilesystemReadOnly condition changed from True back to False and data could be written into the workload without problems.

> k get engine -A -o yaml -w | grep FilesystemReadOnly -B 1 
    status: "False"
    type: FilesystemReadOnly
--
    status: "True"
    type: FilesystemReadOnly
--
    status: "False"
    type: FilesystemReadOnly
