csi-azurefile-controller pod constantly restarts #1790

Closed
luber opened this issue Apr 1, 2024 · 4 comments · Fixed by #1795

Comments

luber commented Apr 1, 2024

What happened: csi-azurefile-controller constantly restarts

What you expected to happen: the csi-azurefile-controller pod runs stably

How to reproduce it:

  1. Install a k3s cluster on-prem
  2. Install the CSI driver (as described here: https://github.com/kubernetes-sigs/azurefile-csi-driver/blob/master/docs/install-csi-driver-v1.30.0.md)
  3. Create a StorageClass:
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-dev
provisioner: file.csi.azure.com
allowVolumeExpansion: true
parameters:
  skuName: Standard_LRS
  storageAccount: YOUR_STORAGE_ACCOUNT
  resourceGroup: YOUR_RG
  csi.storage.k8s.io/provisioner-secret-name: azure-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: azure-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default
  csi.storage.k8s.io/controller-expand-secret-name: azure-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: default
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - mfsymlinks
  - cache=strict  # https://linux.die.net/man/8/mount.cifs
  - nosharesock  # reduce probability of reconnect race
  - actimeo=30  # reduce latency for metadata-heavy workload
  4. Create a deployment with a PVC that uses that storage class and wait until it succeeds (ensure you provided credentials in the azure-secret secret; see the example below)
  5. Delete the deployment and check that the PVs are in the Released state
  6. Deploy again so new PVs are provisioned
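
For step 4, a minimal sketch of the secret and a PVC bound to this storage class could look like the following (names, namespace and size are placeholders; azurestorageaccountname/azurestorageaccountkey are the keys the azurefile driver expects in the secret):

kubectl create secret generic azure-secret -n default \
  --from-literal=azurestorageaccountname=YOUR_STORAGE_ACCOUNT \
  --from-literal=azurestorageaccountkey=YOUR_STORAGE_ACCOUNT_KEY

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azurefile-pvc-example
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile-csi-dev
  resources:
    requests:
      storage: 10Gi
EOF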

Environment: on-prem k3s cluster, azurefile-csi-driver v1.30.0

csi-azurefile-controller pod logs before the restart:
E0401 16:44:13.019159       1 controller.go:1007] error syncing volume "pvc-58c59d03-0885-42c7-a3a0-740bf720ed42": rpc error: code = Unavailable desc = error reading from server: EOF
E0401 16:44:13.019187       1 controller.go:1007] error syncing volume "pvc-725a26e4-8836-4f90-89cd-00cee569c8e4": rpc error: code = Unavailable desc = error reading from server: EOF
I0401 16:44:13.019263       1 event.go:364] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-58c59d03-0885-42c7-a3a0-740bf720ed42", UID:"641f0fd2-6575-4c20-b887-ccba07108bab", APIVersion:"v1", ResourceVersion:"23180", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' rpc error: code = Unavailable desc = error reading from server: EOF
I0401 16:44:13.019315       1 event.go:364] Event(v1.ObjectReference{Kind:"PersistentVolume", Namespace:"", Name:"pvc-725a26e4-8836-4f90-89cd-00cee569c8e4", UID:"d63ba523-7648-4aec-8416-ba1b1196748b", APIVersion:"v1", ResourceVersion:"23177", FieldPath:""}): type: 'Warning' reason: 'VolumeFailedDelete' rpc error: code = Unavailable desc = error reading from server: EOF
I0401 16:44:14.020065       1 controller.go:1509] delete "pvc-58c59d03-0885-42c7-a3a0-740bf720ed42": started
I0401 16:44:14.020124       1 controller.go:1509] delete "pvc-725a26e4-8836-4f90-89cd-00cee569c8e4": started
E0401 16:44:14.028769       1 connection.go:193] Lost connection to unix:///csi/csi.sock.
F0401 16:44:14.029450       1 connection.go:114] Lost connection to CSI driver, exiting
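
For reference, the previous logs of the restarting controller containers can be captured with something like this (pod and container names here are placeholders and may differ per deployment):

kubectl -n kube-system get pods -l app=csi-azurefile-controller
kubectl -n kube-system logs csi-azurefile-controller-xxxxxxxxxx-xxxxx -c csi-provisioner --previous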

luber commented Apr 1, 2024

The workaround is to manually go through the list of PVs in the "Released" state (which are supposed to be removed) and delete them by hand, ensuring there are no finalizers left on them (as sketched below).
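
A minimal sketch of that cleanup (the PV name is a placeholder; double-check that the released volumes are really safe to drop):

# list PVs stuck in the Released state
kubectl get pv --no-headers | awk '$5 == "Released" {print $1}'

# clear any leftover finalizers on a released PV, then delete it
kubectl patch pv pvc-xxxxxxxx --type=merge -p '{"metadata":{"finalizers":null}}'
kubectl delete pv pvc-xxxxxxxx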

@andyzhangx
Member

@luber what are the azurefile container logs?

kubectl logs csi-azurefile-node-cvgbs -c azurefile -n kube-system > csi-azurefile-node.log

@andyzhangx
Member

does this workaround work? #1707 (comment)


luber commented Apr 3, 2024

@luber what are the azurefile container logs?

kubectl logs csi-azurefile-node-cvgbs -c azurefile -n kube-system > csi-azurefile-node.log
I0403 06:51:48.199273       1 utils.go:100] GRPC call: /csi.v1.Identity/GetPluginInfo
I0403 06:51:48.199356       1 utils.go:101] GRPC request: {}
I0403 06:51:48.201424       1 utils.go:107] GRPC response: {"name":"file.csi.azure.com","vendor_version":"v1.30.0"}
I0403 06:51:48.203360       1 utils.go:100] GRPC call: /csi.v1.Identity/GetPluginCapabilities
I0403 06:51:48.203427       1 utils.go:101] GRPC request: {}
I0403 06:51:48.203529       1 utils.go:107] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}}]}
I0403 06:51:48.204166       1 utils.go:100] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
I0403 06:51:48.204252       1 utils.go:101] GRPC request: {}
I0403 06:51:48.204322       1 utils.go:107] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":5}}},{"Type":{"Rpc":{"type":9}}},{"Type":{"Rpc":{"type":13}}},{"Type":{"Rpc":{"type":7}}}]}
I0403 06:52:10.350419       1 utils.go:100] GRPC call: /csi.v1.Controller/DeleteVolume
I0403 06:52:10.350452       1 utils.go:101] GRPC request: {"secrets":"***stripped***","volume_id":"insight-dev#ktcinsightstoragedev#pvc-b8701f74-88e1-4303-bd8f-945d36b4d2cc###rabbitmq-system"}
I0403 06:52:10.350678       1 azurefile.go:607] got storage account(ktcinsightstoragedev) from secret
I0403 06:52:10.352306       1 utils.go:100] GRPC call: /csi.v1.Controller/DeleteVolume
I0403 06:52:10.352330       1 utils.go:101] GRPC request: {"secrets":"***stripped***","volume_id":"me-insight-test#metestinsightstorage#pvc-f7a123a1-f8bf-4a2f-bf77-a44669760a64###rabbitmq-system"}
I0403 06:52:10.352411       1 azurefile.go:607] got storage account(metestinsightstorage) from secret
W0403 06:52:10.625891       1 azurefile.go:950] DeleteFileShare(pvc-b8701f74-88e1-4303-bd8f-945d36b4d2cc) on account(ktcinsightstoragedev) failed with error(storage: service returned error: StatusCode=404, ErrorCode=ShareNotFound, ErrorMessage=The specified share does not exist.
RequestId:3fbf70e8-801a-0000-3793-85ba0e000000
Time:2024-04-03T06:52:10.6228107Z, RequestInitiated=Wed, 03 Apr 2024 06:52:10 GMT, RequestId=3fbf70e8-801a-0000-3793-85ba0e000000, API Version=2018-03-28, QueryParameterName=, QueryParameterValue=), return as success
I0403 06:52:10.625975       1 controllerserver.go:715] azure file(pvc-b8701f74-88e1-4303-bd8f-945d36b4d2cc) under subsID() rg(insight-dev) account(ktcinsightstoragedev) volume(insight-dev#ktcinsightstoragedev#pvc-b8701f74-88e1-4303-bd8f-945d36b4d2cc###rabbitmq-system) is deleted successfully
I0403 06:52:10.626008       1 azurefile.go:1094] remove tag(skip-matching) on account(ktcinsightstoragedev) subsID(), resourceGroup(insight-dev)
I0403 06:52:10.626682       1 azure_metrics.go:115] "Observed Request Latency" latency_seconds=0.275877016 request="azurefile_csi_driver_controller_delete_volume" resource_group="insight-dev" subscription_id="" source="file.csi.azure.com" volumeid="insight-dev#ktcinsightstoragedev#pvc-b8701f74-88e1-4303-bd8f-945d36b4d2cc###rabbitmq-system" result_code="failed_csi_driver_controller_delete_volume"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1bdac0b]

goroutine 13 [running]:
sigs.k8s.io/cloud-provider-azure/pkg/provider.(*LockMap).LockEntry(0x0, {0xc00025066c, 0x14})
	/mnt/vss/_work/1/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_utils.go:64 +0x2b
sigs.k8s.io/cloud-provider-azure/pkg/provider.(*Cloud).RemoveStorageAccountTag(0xc000106c00, {0x2934da8, 0xc0003e8d20}, {0x0, 0x0}, {0xc000250660, 0xb}, {0xc00025066c, 0x14}, {0x25c6fcd, ...})
	/mnt/vss/_work/1/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_storageaccount.go:744 +0x98
sigs.k8s.io/azurefile-csi-driver/pkg/azurefile.(*Driver).RemoveStorageAccountTag(0xc00016e000, {0x2934da8, 0xc0003e8d20}, {0x0, 0x0}, {0xc000250660, 0xb}, {0xc00025066c, 0x14}, {0x25c6fcd, ...})
	/mnt/vss/_work/1/go/src/sigs.k8s.io/azurefile-csi-driver/pkg/azurefile/azurefile.go:1096 +0x511
sigs.k8s.io/azurefile-csi-driver/pkg/azurefile.(*Driver).DeleteVolume(0xc00016e000, {0x2934da8, 0xc0003e8d20}, 0xc00043ca80)
	/mnt/vss/_work/1/go/src/sigs.k8s.io/azurefile-csi-driver/pkg/azurefile/controllerserver.go:716 +0xc10
github.com/container-storage-interface/spec/lib/go/csi._Controller_DeleteVolume_Handler.func1({0x2934da8, 0xc0003e8d20}, {0x2427760?, 0xc00043ca80})
	/mnt/vss/_work/1/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:6438 +0x72
sigs.k8s.io/azurefile-csi-driver/pkg/csi-common.LogGRPC({0x2934da8, 0xc0003e8d20}, {0x2427760?, 0xc00043ca80?}, 0xc000436a20, 0xc000010e40)
	/mnt/vss/_work/1/go/src/sigs.k8s.io/azurefile-csi-driver/pkg/csi-common/utils.go:103 +0x3a9
github.com/container-storage-interface/spec/lib/go/csi._Controller_DeleteVolume_Handler({0x2572de0?, 0xc00016e000}, {0x2934da8, 0xc0003e8d20}, 0xc00048ef80, 0x26f0f50)
	/mnt/vss/_work/1/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:6440 +0x135
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001a8e00, {0x2934da8, 0xc0003e8c90}, {0x293f080, 0xc0000076c0}, 0xc000499d40, 0xc00026b170, 0x3c51fb8, 0x0)
	/mnt/vss/_work/1/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/google.golang.org/grpc/server.go:1383 +0xe03
google.golang.org/grpc.(*Server).handleStream(0xc0001a8e00, {0x293f080, 0xc0000076c0}, 0xc000499d40)
	/mnt/vss/_work/1/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/google.golang.org/grpc/server.go:1794 +0x100c
google.golang.org/grpc.(*Server).serveStreams.func2.1()
	/mnt/vss/_work/1/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/google.golang.org/grpc/server.go:1027 +0x8b
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 53
	/mnt/vss/_work/1/go/src/sigs.k8s.io/azurefile-csi-driver/vendor/google.golang.org/grpc/server.go:1038 +0x135
