Pod creation problems due to slow mounting/attaching #79

Closed
spencergilbert opened this issue Apr 11, 2020 · 6 comments

@spencergilbert

We have 28 total nodes (3 masters, 25 workers)
Each node has 48 cores, 252 GB RAM

~228 zfs volumes (305 including old path volumes)
~212 volumeattachments

```
I0409 17:26:53.814862 1 operation_generator.go:359] AttachVolume.Attach succeeded for volume "pvc-f029b12f-92d6-44fa-aa28-33a64f420b34" (UniqueName: "kubernetes.io/csi/zfs.csi.openebs.io^pvc-f029b12f-92d6-44fa-aa28-33a64f420b34") from node "halfling-mexico"
```

Volumes provisioned by the zfs provider all seem to take more than an hour to finish mounting. We haven't seen any throttling logs from the controller manager.

Happy to provide any additional information.
@pawanpraka1
Contributor

Analysis so far: the problem seems to be on the Kubernetes side, ZFS-LocalPV itself looks fine. The Kubernetes services are taking too long to create the VolumeAttachment object, which attaches the volume to the node. The pod stays stuck in ContainerCreating until the VolumeAttachment object has been created; once it is created, the pod moves to the Running state.
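
For anyone debugging the same chain, commands along these lines show where a pod is stuck; the pod, namespace and object names below are placeholders, not values from this issue:

```sh
# Placeholders only; substitute your own pod/namespace/object names.
kubectl get pods -A | grep ContainerCreating   # pods still waiting on volumes
kubectl describe pod <pod> -n <namespace>      # events typically report "Unable to attach or mount volumes"
kubectl get volumeattachments                  # has an attachment object been created for the PV yet?
kubectl get volumeattachment <name> -o jsonpath='{.status.attached}'
```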

@spencergilbert
Author

Following kubernetes/kubernetes#84169 (comment), setting attach-detach-reconcile-sync-period to 300s improves the situation. Perhaps LIST_VOLUMES_PUBLISHED_NODES should be implemented?
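
For reference, a sketch of where that flag goes on a kubeadm-style cluster, where the controller manager runs as a static pod; the manifest path below is the kubeadm default and may differ on other setups:

```sh
# kubeadm default path; adjust for your distribution.
sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml
# Under spec.containers[0].command, add:
#   - --attach-detach-reconcile-sync-period=300s   # default is 1m0s
# kubelet restarts the static pod automatically once the manifest changes.
```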

@pawanpraka1
Contributor

I don't think LIST_VOLUMES_PUBLISHED_NODES helps reduce API load. One possible scaling problem that was pointed out earlier is that we do a Get() on every VolumeAttachment every minute. We should change that to use an informer instead. This should greatly reduce API load.

Discussed here: kubernetes/kubernetes#84169
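
As a rough kubectl analogy of the two access patterns (not the attacher's actual code): polling does one GET per VolumeAttachment per sync period, while an informer does a single LIST plus WATCH and serves reads from a local cache:

```sh
# Polling pattern: one GET per object, repeated every sync period.
for va in $(kubectl get volumeattachments -o name); do
  kubectl get "$va" > /dev/null
done

# Informer-like pattern: one LIST + WATCH stream covers all objects.
kubectl get volumeattachments --watch
```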

@pawanpraka1 pawanpraka1 added the bug Something isn't working label Apr 17, 2020
@pawanpraka1 pawanpraka1 added this to To do in ZFS Local PV via automation Apr 17, 2020
@pawanpraka1 pawanpraka1 added this to the v0.6.1 milestone Apr 17, 2020
@pawanpraka1 pawanpraka1 moved this from To do to In progress in ZFS Local PV Apr 17, 2020
@pawanpraka1 pawanpraka1 added this to Pre-commits and Designs - Due: Apr 30 2020 in 1.10 Release Tracker - Due May 15th. Apr 17, 2020
@w3aman
Contributor

w3aman commented Apr 22, 2020

I reproduced this issue with ~200 volumes. I saw that as this number increases, volume attachment gets very slow. I have some stats on the timings:

first 50 pods -> 43 seconds
next 50       -> 1 min 11 sec
101st         -> 68 seconds
102nd         -> 2 min 16 sec
next 48       -> 6 min 50 sec (the first of these came up in 4 min 12 sec)
next 50       -> 14 min 55 sec (the first again took 4 min 14 sec, the second 6 min 11 sec)
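
For reference, batches like this can be generated with a loop of roughly the following shape; the storage class name (openebs-zfspv), the volume size and the busybox pod are assumptions for illustration, not details taken from this test:

```sh
# Sketch only: creates N PVC+pod pairs against an assumed zfs-localpv storage class.
for i in $(seq 1 200); do
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: zfs-pvc-$i
spec:
  storageClassName: openebs-zfspv
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: zfs-pod-$i
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: zfs-pvc-$i
EOF
done
```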

To overcome this issue we are avoiding the creation of the VolumeAttachment object, as it is not required for ZFSPV as of now. Refer to PR #85.
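
For anyone checking how the attach phase gets skipped: the standard Kubernetes mechanism is the attachRequired field on the driver's CSIDriver object; when it is false, no VolumeAttachment objects are created for that driver's volumes. Whether PR #85 relies on exactly this field is an assumption here, but it can be inspected like this:

```sh
# If attachRequired is false, Kubernetes skips the attach step (and the
# VolumeAttachment object) for volumes backed by this driver.
kubectl get csidriver zfs.csi.openebs.io -o yaml | grep attachRequired
```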

To validate this PR I reproduced the scenario with 200 volumes on zfs-driver:v0.4, then upgraded the driver to 0.6.0 with the changes from the above-mentioned PR, which removes the csi-attacher container. As part of the cleanup, this upgrade also deletes the existing VolumeAttachments (kubectl delete volumeattachments --all).

After the upgrade I tried two scenarios: one where I cloned the 200 volumes for which I had already taken snapshots before upgrading, and a second where I provisioned 200 new volumes. I observed a very significant decrease in the time for pods to reach the Running state.

For the newly provisioned volumes, some timing stats:

next 50 -> 40 sec
next 50 -> 38 sec
101st   -> 8 sec
102nd   -> 4 sec
next 50 -> 45 sec
next 34 -> 33 seconds

@w3aman
Contributor

w3aman commented Apr 26, 2020

PR #85 has now been tested on three k8s versions (1.16, 1.17 and 1.18), and it resolves this issue (#79).
As an update on validating the PR: deleting the VolumeAttachments does not cause any regression with zfs-volumes, and avoiding the creation of the VolumeAttachment object makes volume mounting fast. As a result, even with a large number of volumes, pod creation no longer hits the slow-mounting issue.

@pawanpraka1
Contributor

Fixed the issue by avoiding the VolumeAttachment object (#85). Now we can see volumes getting attached very fast.

ZFS Local PV automation moved this from In Review to Done Apr 29, 2020
1.10 Release Tracker - Due May 15th. automation moved this from Pre-commits and Designs - Due: Apr 30 2020 to Done Apr 29, 2020
Labels: bug (Something isn't working), scalability