Question about reconciling (reconcileVA) based on RPC_LIST_VOLUMES_PUBLISHED_NODES #374
I would perhaps ask: why do you implement ControllerPublish / ControllerUnpublish, and what do these calls do? In a typical CSI driver, they contact a storage API (like AWS) to attach/detach the volume to/from the node and to perform any initialization / cleanup on the storage backend side. It is then natural that such a storage backend has an API to list what is attached where. In S3, I don't see such an API to attach / detach volumes.
API calls happen in a typical cloud-based CSI driver, not just any CSI driver :-)
The original problem is here: yandex-cloud/k8s-csi-s3#29. The FUSE process dies when the CSI-S3 pod is restarted, Kubernetes doesn't know anything about it, and the volume mounts become broken. In fact the same happens for CephFS-FUSE and other FUSE-based CSI drivers: datashim-io/datashim#153, ceph/ceph-csi#792. Alibaba solves that by moving the FUSE process out of K8s into systemd: https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/docs/oss-upgrade.md. Azure also does something similar: https://github.com/kubernetes-sigs/blob-csi-driver/tree/master/deploy/blobfuse-proxy. In fact I'm already thinking about implementing a fix without ListVolumes, based on a state file, i.e. make the node servers save their mount lists into a local file on each node and remount all volumes listed there on restart. The only things I fear are that remounting may in theory fail when other pods still hold open file descriptors inside a failed volume, and that the unmounted and remounted volume may not propagate to other pods correctly... But I need to check it to be sure.
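To make the state-file idea concrete, here is a minimal sketch in Python (chosen only for brevity; a real CSI node server would more likely be written in Go). Everything in it is hypothetical: the class `MountStateFile`, the helper `remount_all`, the JSON layout, and the `mount_fn` callback that stands in for whatever actually starts the FUSE process. It does not address the two concerns raised above (open file descriptors in the dead mount, and mount propagation into already-running pods).

```python
import json
import os
import tempfile
from typing import Callable, Dict


class MountStateFile:
    """Hypothetical per-node record of published volumes, kept in a JSON file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Dict[str, dict]:
        """Return {target_path: {"volume_id": ..., "options": ...}}."""
        try:
            with open(self.path, "r", encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def _save(self, state: Dict[str, dict]) -> None:
        # Write to a temp file and rename it over the old one, so a crash
        # mid-write never leaves a truncated state file behind.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory, prefix=".mounts-")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(state, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self.path)
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise

    def record(self, target_path: str, volume_id: str, options: dict = None) -> None:
        """Call after a successful NodePublishVolume."""
        state = self.load()
        state[target_path] = {"volume_id": volume_id, "options": options or {}}
        self._save(state)

    def forget(self, target_path: str) -> None:
        """Call after a successful NodeUnpublishVolume."""
        state = self.load()
        if state.pop(target_path, None) is not None:
            self._save(state)


def remount_all(store: MountStateFile,
                mount_fn: Callable[[str, str, dict], None]) -> Dict[str, str]:
    """On node-server restart, re-run mount_fn for every recorded mount.

    Returns {target_path: error message} for the mounts that failed.
    """
    failures: Dict[str, str] = {}
    for target_path, entry in store.load().items():
        try:
            mount_fn(entry["volume_id"], target_path, entry["options"])
        except Exception as exc:  # keep going; one bad mount should not block the rest
            failures[target_path] = str(exc)
    return failures
```

The node server would call `record` / `forget` from its publish / unpublish handlers and `remount_all` once at startup. Collecting failures instead of aborting is a design choice for the sketch: a volume that cannot be remounted (for example, because it is still busy) should not prevent the others from recovering.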
The FUSE issues that you describe do not seem to be related to ControllerPublish / ControllerUnpublish. So, asking again: what will the CSI driver do in ControllerPublish and ControllerUnpublish, and why do you need them implemented? With RPC_LIST_VOLUMES_PUBLISHED_NODES implemented in a CSI driver, external-attacher will only call ControllerPublish on volumes that look erroneously ControllerUnpublished. It will not call NodeStage / NodePublish and it will not remount the volumes!
Your fear is IMO correct: application pods that started before the CSI driver remounts a volume will see the old mount. I am afraid I do not know of any good / easy way to work with FUSE in containers.
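As a toy model of the reconciliation described in this comment (not external-attacher's actual code), the decision can be pictured as a set comparison: the attachments Kubernetes believes exist versus the published nodes the driver reports from ListVolumes. The function name `volumes_to_republish` and the data shapes are assumptions made up for illustration.

```python
from typing import Dict, List, Set, Tuple


def volumes_to_republish(
    expected: Set[Tuple[str, str]],   # (volume_id, node_id) pairs Kubernetes considers attached
    published: Dict[str, Set[str]],   # volume_id -> node ids reported via ListVolumes
) -> List[Tuple[str, str]]:
    """Return the pairs that look erroneously unpublished.

    In this model, these are the only pairs for which ControllerPublish
    would be called again. Nothing here touches NodeStage / NodePublish,
    so no FUSE mount on any node is re-created.
    """
    return sorted(
        (volume_id, node_id)
        for volume_id, node_id in expected
        if node_id not in published.get(volume_id, set())
    )
```

For example, with `expected = {("vol-1", "node-a"), ("vol-2", "node-b")}` and `published = {"vol-1": {"node-a"}}`, only `("vol-2", "node-b")` is returned. A dead FUSE process on node-a is invisible at this level, which is why this mechanism does not help with the remount problem.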
Ok, I see... I didn't ask anything about ControllerPublish/Unpublish, I thought it would call NodePublish for some reason :-)
Thanks for your answer anyway :)
Hi. People have suggested that I implement reconciling based on RPC_LIST_VOLUMES_PUBLISHED_NODES in my FUSE-based csi-s3 driver; however, there's a problem:
As I understand it, reconciling is done via the ControllerServer's ListVolumes, which must report the published nodes for each volume.
However, the ControllerServer doesn't have this information; it's only available on the nodes themselves, in the form of running FUSE processes and mount lists.
How is this information supposed to reach the ControllerServer? What's the right way to do that?