Flaky in-tree local volume plugin tests #74229
Which jobs are failing:
https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce/34994
Which test(s) are failing:
[sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: block] [Testpattern: Pre-provisioned PV (default fs)] volumes should be mountable
Since when has it been failing:
Testgrid link:
https://testgrid.k8s.io/sig-storage-kubernetes#gce
Reason for failure:
The volume of the previous pod is not unmounted before the second pod tries to mount its volume.
Anything else we need to know:
Comments
/sig storage
The problem is that when the client deletes the first pod, it is deleted immediately. This is because when the pod status phase is Succeeded or Failed, the API sets the deletion grace period to 0, so the pod object is removed without waiting for the kubelet to clean up.
If we want to wait for the pod to be deleted by the kubelet instead of the apiserver, we need to make it a long-running pod. Alternatively, we can make the two pods use the same fsGroup as you said (see the sketch below). cc @msau42 What do you think?
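For illustration, a minimal sketch of how both test pods could set the same fsGroup through the pod-level security context. This is not from the issue itself; the pod name, image, claim name, and group ID are hypothetical.

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podWithFSGroup builds a pod that mounts a PVC under the given fsGroup.
// If both test pods are built with the same group ID, the second mount
// does not depend on the first pod's volume ownership being reset.
func podWithFSGroup(name string, fsGroup int64) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			SecurityContext: &corev1.PodSecurityContext{
				FSGroup: &fsGroup, // pass the same value for both pods
			},
			Containers: []corev1.Container{{
				Name:    "volume-tester",
				Image:   "busybox",
				Command: []string{"sleep", "3600"},
				VolumeMounts: []corev1.VolumeMount{{
					Name:      "test-volume",
					MountPath: "/mnt/test",
				}},
			}},
			Volumes: []corev1.Volume{{
				Name: "test-volume",
				VolumeSource: corev1.VolumeSource{
					PersistentVolumeClaim: &corev1.PersistentVolumeClaimVolumeSource{
						ClaimName: "local-pvc",
					},
				},
			}},
		},
	}
}
```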
cc @dashpole is this expected? I thought the kubelet is supposed to delete the pod object only after its volumes are unmounted. This also impacts the StorageProtection feature, which assumes that if the Pod object is deleted, then the volume has been unmounted. For a storage backend like NFS, it may result in the unmount hanging. cc @kubernetes/sig-storage-bugs @cofyc To fix the flakiness, the simplest thing may be to make the two pods use the same fsGroup. I don't see a strong use case for two pods using different fsGroups.
That is only during graceful deletion. If a pod is Succeeded, do we begin unmounting volumes, or do we wait until the deletion timestamp is set? Would it be good enough to start unmounting when the pod is Failed/Succeeded? If we want to be 100% sure this isn't a problem, we would need to always do graceful deletion, even when the pod is Failed/Succeeded.
Yes, we do start unmounting when the pod is Succeeded or Failed. The API sets the grace period to 0 when the pod has succeeded or failed (sketched below); maybe that needs to be removed if we want to use Pod deletion as a signal that volumes are unmounted.
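To make the mechanism concrete, here is a rough, simplified sketch of the behavior described above (a hypothetical helper, not the actual apiserver code): whatever grace period the client requests or the pod spec carries, the effective period is forced to zero once the pod has terminated.

```go
package example

import corev1 "k8s.io/api/core/v1"

// effectiveGracePeriod mimics, in simplified form, the behavior described
// in the comment above: a requested or spec'd grace period is ignored once
// the pod has terminated, so the API object is removed immediately.
func effectiveGracePeriod(pod *corev1.Pod, requested *int64) int64 {
	period := int64(30) // assume some default termination grace period
	if pod.Spec.TerminationGracePeriodSeconds != nil {
		period = *pod.Spec.TerminationGracePeriodSeconds
	}
	if requested != nil {
		period = *requested
	}
	// Succeeded/Failed pods are deleted immediately. This is the rule the
	// comment suggests removing if pod deletion should be usable as a
	// "volumes are unmounted" signal.
	if pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed {
		period = 0
	}
	return period
}
```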
Always doing graceful deletion, even when the pod has failed/succeeded, is simple; then we could assume that whenever pod objects are deleted normally (i.e., not force-deleted), their volumes have already been unmounted by the kubelet. But the immediate-deletion behavior was added long ago, and it helps performance when deleting jobs in batch. For this flaky test alone, what I need is to make sure volumes are unmounted before the second pod tries to mount them, but local volumes are not reported in node.status.volumesInUse. Possible workarounds: make the first pod long-running so that it is deleted by the kubelet rather than immediately by the apiserver (then waiting for its object to disappear is meaningful; see the sketch below), or make the two pods use the same fsGroup.
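For the first workaround, the test would wait until the first pod's API object is gone before starting the second pod; if the kubelet removes the object only after cleanup, as discussed above, that wait is meaningful for a gracefully deleted long-running pod. A minimal sketch, assuming a client-go clientset; the namespace, pod name, and timeouts are hypothetical.

```go
package example

import (
	"context"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForPodGone polls until the pod object no longer exists. Note that for
// a Succeeded/Failed pod deleted with grace period 0, the object can vanish
// while its volumes are still mounted, which is exactly the race behind
// this flake; the wait is only trustworthy for graceful deletions.
func waitForPodGone(cs kubernetes.Interface, ns, name string) error {
	return wait.PollImmediate(2*time.Second, 5*time.Minute, func() (bool, error) {
		_, err := cs.CoreV1().Pods(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if apierrors.IsNotFound(err) {
			return true, nil // object deleted
		}
		return false, err // keep polling while the pod still exists
	})
}
```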
I'm going to use the same fsGroup for both pods until we have a reliable way to detect whether volumes are unmounted.