Question about reconciling (reconcileVA) based on RPC_LIST_VOLUMES_PUBLISHED_NODES #374

vitalif · 2022-09-15T14:20:28Z

Hi. People have suggested that I implement reconciling based on RPC_LIST_VOLUMES_PUBLISHED_NODES in my FUSE-based csi-s3 driver. However, there's a problem:
As I understand it, reconciling is done via the ControllerServer's ListVolumes, which must report the published nodes for each volume.
However, the ControllerServer doesn't have this information - it's only available on the nodes themselves, in the form of running FUSE processes and mount lists.
How are the nodes supposed to provide this information to the ControllerServer? What's the right way to do that?

jsafrane · 2022-09-19T08:57:04Z

I would perhaps ask why you implement ControllerPublish / ControllerUnpublish and what these calls do.

In a typical CSI driver, they contact a storage API (like AWS) and attach/detach the volume to/from the node + perform any initialization / cleanup on the storage backend side. It is then natural that such a storage backend has an API to list what is attached where.

In S3, I don't see such an API to attach / detach volumes.

vitalif · 2022-09-19T09:17:20Z

API calls happen in a typical cloud-based CSI driver, not just any CSI driver :-)
csi-s3 mounts S3 as a local FS using FUSE (geesefs / goofys / s3fs). So nodes start FUSE themselves and there's no API that can list mounted volumes.

vitalif · 2022-09-19T09:23:25Z

The original problem is here: yandex-cloud/k8s-csi-s3#29. The FUSE process dies when the csi-s3 pod is restarted, Kubernetes doesn't know anything about it, and the volume mounts become broken.

In fact, the same happens for CephFS-FUSE and other FUSE-based CSI drivers: datashim-io/datashim#153, ceph/ceph-csi#792.
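For illustration, a broken mount of this kind can be detected from the node: once the FUSE process is gone, accessing the mount point fails with ENOTCONN ("Transport endpoint is not connected", as in the datashim issue title). A minimal sketch, where the function name `is_dead_fuse_mount` is made up for this example and not part of any driver:

```python
import errno
import os


def is_dead_fuse_mount(path: str) -> bool:
    """Return True if `path` looks like a FUSE mount whose daemon has died.

    A mount point whose FUSE process is gone fails stat() with ENOTCONN.
    Any other outcome (success, ENOENT, EACCES, ...) is reported as False.
    """
    try:
        os.stat(path)
    except OSError as exc:
        return exc.errno == errno.ENOTCONN
    return False
```

This only detects the failure; it does not repair the mount or tell Kubernetes about it.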

Alibaba solves that by moving the FUSE process out of K8s into systemd: https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/docs/oss-upgrade.md. Azure also does something similar: https://github.com/kubernetes-sigs/blob-csi-driver/tree/master/deploy/blobfuse-proxy

In fact I'm already thinking about implementing a fix without ListVolumes, based on a state file - i.e. make node servers save mount lists into a local file on each node and remount all volumes listed there on restart. The only things I fear are that remounting may in theory fail when other pods still hold open file descriptors inside a failed volume, and that the unmounted and remounted volume may not propagate to other pods correctly... But I need to check it to be sure.
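A rough sketch of what such a state file could look like, to make the idea concrete. Everything here is an assumption for illustration: the file location, the JSON layout, and the helper names (`record_mount`, `forget_mount`, `remount_all`) are hypothetical and not taken from csi-s3. The actual mount is left to a caller-supplied function, so the sketch says nothing about whether remounting succeeds under pods that still hold open file descriptors.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List

# Hypothetical location; it would have to live on a host path so that it
# survives a restart of the node plugin pod.
STATE_FILE = "/var/lib/csi-s3/mounts.json"


def load_state(path: str = STATE_FILE) -> Dict[str, dict]:
    """Return {target_path: mount_info}; a missing file means no mounts."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_state(state: Dict[str, dict], path: str = STATE_FILE) -> None:
    """Write atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".mounts-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def record_mount(target: str, volume_id: str, options: List[str],
                 path: str = STATE_FILE) -> None:
    """Remember a mount after a successful NodePublish-style mount."""
    state = load_state(path)
    state[target] = {"volume_id": volume_id, "options": options}
    save_state(state, path)


def forget_mount(target: str, path: str = STATE_FILE) -> None:
    """Drop a mount after a successful unmount."""
    state = load_state(path)
    if state.pop(target, None) is not None:
        save_state(state, path)


def remount_all(mount_fn: Callable[[str, str, List[str]], None],
                path: str = STATE_FILE) -> Dict[str, str]:
    """On restart, call mount_fn for every recorded mount.

    Returns {target_path: error message} for the mounts that failed.
    """
    failures: Dict[str, str] = {}
    for target, info in load_state(path).items():
        try:
            mount_fn(target, info["volume_id"], info["options"])
        except Exception as exc:
            failures[target] = str(exc)
    return failures
```

The write goes through a temporary file and a rename so that a crash in the middle of an update leaves the previous state file in place rather than a truncated one.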

jsafrane · 2022-09-19T12:05:56Z

The FUSE issues that you describe do not seem to be related to ControllerPublish / Unpublish. So, asking again: what will the CSI driver do in ControllerPublish and Unpublish, and why do you need them implemented? With RPC_LIST_VOLUMES_PUBLISHED_NODES implemented in a CSI driver, external-attacher will only call ControllerPublish on volumes that look erroneously ControllerUnpublished. It will not call NodeStage / NodePublish and it will not remount the volumes!
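A toy model of that comparison may help. This is not external-attacher's code: the function name and data shapes are invented for illustration, and the real reconciler is more involved. The point is that the only output is a list of controller-level publish calls; nothing here touches node-level mounts.

```python
from typing import Dict, List, Set, Tuple


def volumes_to_republish(
    attached: Set[Tuple[str, str]],
    published: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Return (volume_id, node_id) pairs that need ControllerPublish again.

    attached:  pairs that Kubernetes believes are attached.
    published: volume_id -> node ids the driver reports via ListVolumes.
    """
    return sorted(
        (volume_id, node_id)
        for volume_id, node_id in attached
        if node_id not in published.get(volume_id, set())
    )


# vol-2 is recorded as attached to node-b, but the driver does not list it
# as published there, so it is the only pair selected.
print(volumes_to_republish(
    {("vol-1", "node-a"), ("vol-2", "node-b")},
    {"vol-1": {"node-a"}, "vol-2": set()},
))
```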

> The only things I fear are that remounting may in theory fail when other pods still hold open file descriptors inside a failed volume, and that the unmounted and remounted volume may not propagate to other pods correctly...

Your fear is IMO correct: application pods that started before the CSI driver remounts a volume will see the old mount. Slave or Bidirectional mount propagation in the application pods would help, but then you would need to convince app providers (= e.g. people who write helm charts) to use that mount propagation in their charts.
(but please test it, this is complicated and I could be wrong (-: )
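For reference, this is roughly what an application pod would have to declare. In the Kubernetes API the setting is the `mountPropagation` field of a volumeMount, with the values `None`, `HostToContainer` (the "slave" mode) and `Bidirectional`; `Bidirectional` is only allowed in privileged containers. The helper name and the example volume name and path are made up, and whether this actually makes a remounted FUSE volume visible in a running pod is untested here.

```python
import json

# Kubernetes mountPropagation values and the Linux propagation mode that
# the Kubernetes documentation equates each one with.
PROPAGATION_MODES = {
    "None": "rprivate",
    "HostToContainer": "rslave",
    "Bidirectional": "rshared",  # privileged containers only
}


def volume_mount(name: str, mount_path: str,
                 propagation: str = "HostToContainer") -> dict:
    """Build a volumeMounts entry with an explicit mountPropagation."""
    if propagation not in PROPAGATION_MODES:
        raise ValueError(f"unknown mountPropagation: {propagation!r}")
    return {
        "name": name,
        "mountPath": mount_path,
        "mountPropagation": propagation,
    }


# Example values only; "s3-data" and "/data" are placeholders.
print(json.dumps(volume_mount("s3-data", "/data"), indent=2))
```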

I am afraid I do not have any good / easy way to work with FUSE in containers.

vitalif · 2022-09-19T17:45:48Z

Ok, I see... I didn't ask anything about ControllerPublish/Unpublish, I thought it would call NodePublish for some reason :-)

vitalif · 2022-09-19T17:51:03Z

Thanks for your answer anyway :)

This was referenced Sep 19, 2022

csi-s3 daemonset pod restart causes mounted pvc not accessible yandex-cloud/k8s-csi-s3#29

Closed

Transport endpoint is not connected when csi-s3 pod is restarted datashim-io/datashim#153

Open

vitalif closed this as completed Sep 19, 2022
