Question about reconciling (reconcileVA) based on RPC_LIST_VOLUMES_PUBLISHED_NODES #374

Closed
vitalif opened this issue Sep 15, 2022 · 6 comments

Comments

@vitalif

vitalif commented Sep 15, 2022

Hi. People have suggested that I implement reconciliation based on RPC_LIST_VOLUMES_PUBLISHED_NODES in my FUSE-based csi-s3 driver, but there's a problem:
As I understand it, reconciliation is done via the ControllerServer's ListVolumes, which must report the published nodes for each volume.
However, the ControllerServer doesn't have this information: it's only available on the nodes themselves, in the form of running FUSE processes and mount lists.
How are the nodes supposed to provide this information to the ControllerServer? What's the right way to do that?
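
For illustration, the node-local information mentioned above can be read from the kernel's mount table. This is a minimal sketch, assuming a Linux node and assuming the FUSE mounts appear with a filesystem type of `fuse` or `fuse.<name>`; `parse_fuse_mounts` is a hypothetical helper and not part of csi-s3.

```python
def parse_fuse_mounts(mounts_text):
    """Return mount points of FUSE filesystems found in /proc/mounts-style text.

    Each line has the form: <source> <mountpoint> <fstype> <options> <dump> <pass>.
    Escaped characters in mount points (e.g. \\040 for a space) are left as-is.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mountpoint, fstype = fields[1], fields[2]
        # "fusectl" is the FUSE control filesystem, not a user mount,
        # so match "fuse" exactly or the "fuse." prefix only.
        if fstype == "fuse" or fstype.startswith("fuse."):
            result.append(mountpoint)
    return result
```

On a node, the input would typically be the contents of `/proc/self/mounts`. Note that this only sees mounts on the node where it runs, which is exactly the gap described above: the controller has no such view.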

@jsafrane
Contributor

I would perhaps ask why you implement ControllerPublish / ControllerUnpublish, and what these calls do.

In a typical CSI driver, they contact a storage API (like AWS) and attach/detach the volume to/from the node, plus perform any initialization / cleanup on the storage backend side. It is then natural that such a storage backend has an API to list what is attached where.

In S3, I don't see such an API to attach / detach volumes.

@vitalif
Author

vitalif commented Sep 19, 2022

Such API calls happen in a typical cloud-based CSI driver, but not in every CSI driver :-)
csi-s3 mounts S3 as a local FS using FUSE (geesefs / goofys / s3fs). So nodes start FUSE themselves and there's no API that can list mounted volumes.

@vitalif
Author

vitalif commented Sep 19, 2022

The original problem is described in yandex-cloud/k8s-csi-s3#29: the FUSE process dies when the CSI-S3 pod is restarted, Kubernetes doesn't know anything about it, and the volume mounts become broken.

In fact, the same happens with CephFS-FUSE and other FUSE-based CSI drivers: datashim-io/datashim#153, ceph/ceph-csi#792

Alibaba solves that by moving the FUSE process out of K8s into systemd: https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/docs/oss-upgrade.md. Azure does something similar: https://github.com/kubernetes-sigs/blob-csi-driver/tree/master/deploy/blobfuse-proxy

In fact I'm already thinking about implementing a fix without ListVolumes, based on a state file - i.e. make node servers save mount lists into a local file on each node and remount all volumes listed there on restart. The only things I fear are that remounting may in theory fail when other pods still hold open file descriptors inside a failed volume, and that the unmounted and remounted volume may not propagate to other pods correctly... But I need to check it to be sure.
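
The state-file idea above could look roughly like the following. This is only a sketch under assumptions: the JSON layout (volume ID mapped to target path), the function names, and the file location are all hypothetical, and the actual remount step is left out because it depends on the FUSE tool in use.

```python
import json
import os
import tempfile


def save_mounts(state_path, mounts):
    """Atomically write a {volume_id: target_path} mapping to state_path."""
    directory = os.path.dirname(state_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(mounts, f)
            f.flush()
            os.fsync(f.fileno())
        # Rename is atomic on the same filesystem, so a crash mid-write
        # never leaves a half-written state file behind.
        os.replace(tmp_path, state_path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_mounts(state_path):
    """Return the saved mapping, or {} if no state file exists yet."""
    try:
        with open(state_path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def mounts_to_restore(saved, currently_mounted):
    """Return the saved entries whose target path is not currently mounted."""
    return {vol: path for vol, path in saved.items() if path not in currently_mounted}
```

On restart, a node server would call `load_mounts`, compare the result against the paths that are actually mounted, and attempt to remount whatever `mounts_to_restore` returns. Whether such a remount is visible to already-running pods is the open question raised above.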

@jsafrane
Contributor

The FUSE issues that you describe do not seem to be related to ControllerPublish / Unpublish. So, asking again: what will the CSI driver do in ControllerPublish and Unpublish, and why do you need them implemented? With RPC_LIST_VOLUMES_PUBLISHED_NODES implemented in a CSI driver, external-attacher will only call ControllerPublish on volumes that look erroneously ControllerUnpublished. It will not call NodeStage / NodePublish and it will not remount the volumes!
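
To make that concrete, here is a simplified model of the reconciliation described above. It is not the actual external-attacher code. It assumes the driver's ListVolumes response has been reduced to a mapping from volume ID to the set of node IDs reported as published; `volumes_to_republish` is a hypothetical name.

```python
def volumes_to_republish(expected_attachments, published_nodes):
    """Return (volume_id, node_id) pairs that should be attached but are not reported.

    expected_attachments: iterable of (volume_id, node_id) pairs that the
        cluster believes are attached.
    published_nodes: dict mapping volume_id to the set of node IDs that the
        driver reported for that volume in ListVolumes.
    """
    missing = []
    for volume_id, node_id in expected_attachments:
        if node_id not in published_nodes.get(volume_id, set()):
            missing.append((volume_id, node_id))
    return missing
```

Only ControllerPublish would be retried for the returned pairs. Nothing in this comparison looks at mounts on the node, so a dead FUSE process is invisible to it.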

> The only things I fear are that remounting may in theory fail when other pods still hold open file descriptors inside a failed volume, and that the unmounted and remounted volume may not propagate to other pods correctly...

Your fear is IMO correct: application pods that started before the CSI driver remounts a volume will see the old mount. Slave or Bidirectional mount propagation in the application pods would help, but then you would need to convince app providers (e.g. people who write Helm charts) to use that mount propagation in their charts.
(But please test it; this is complicated and I could be wrong (-: )

I am afraid I do not know of any good / easy way to work with FUSE in containers.

@vitalif
Author

vitalif commented Sep 19, 2022

Ok, I see... I didn't ask anything about ControllerPublish/Unpublish, I thought it would call NodePublish for some reason :-)

@vitalif
Author

vitalif commented Sep 19, 2022

Thanks for your answer anyway :)

@vitalif vitalif closed this as completed Sep 19, 2022

2 participants