Race condition / timeout issues when mounting PV (AWS EBS via CSI driver) #76568
The mount of the volume, in this case a 1.5 TB EBS disk formatted with ext4, was nevertheless reported as failing.
At first the error looked genuine, but I could not find a cause: the volume was attached, reported as an NVMe device, and even mounted at the OS level, all while the failed mount was still being reported and the pod remained pending.
I attempted to reschedule the pod over and over again to other (new) nodes and even tried a simple reboot of one instance.
This is the event log of the pod:
Looking at the kubelet logs (after rebooting this node), it also seems unhappy:
Finally, after many attempts to "fix" this by rescheduling, we waited a few minutes and all of a sudden the volume was accepted as mounted and the pod started up just fine.
The only line regarding kernel activity that I could find during the time frame of the other logs was:
What you expected to happen:
I expected to either get a clear error message about a failed mount command or a successfully mounted volume to start a pod.
How to reproduce it (as minimally and precisely as possible):
This seems to be a timing issue or even a race condition of some sort. It boils down to a 1.5 TB, ext4-formatted EBS volume that is mounted into a directory of a pod via the CSI driver.
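For reference, a minimal manifest along these lines should reproduce the setup described above. The storage class name, claim name, pod name, image, and mount path are illustrative assumptions, not values taken from the affected cluster:

```yaml
# Hypothetical reproduction sketch: a large ext4 EBS volume mounted into a pod
# through the AWS EBS CSI driver. All names and paths are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: big-ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-csi     # assumes a StorageClass provisioned by ebs.csi.aws.com
  resources:
    requests:
      storage: 1500Gi           # matches the ~1.5 TB volume from the report
---
apiVersion: v1
kind: Pod
metadata:
  name: big-ebs-consumer
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data      # the directory the volume is mounted into
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: big-ebs-claim
```

Scheduling the pod (and rescheduling it across nodes) with a volume of this size is where the timing behavior described above was observed.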
Anything else we need to know?: