Volume mount/unmount errors saying "wait: no child processes" #103753
Comments
Another observation: when analyzing the distribution of these errors across nodes, we saw a clear skew. The top 10% of nodes accounted for more than 90% of the errors, and those are the nodes running hotter (with respect to pod churn) than the rest. Given this, I'm wondering whether the errors concentrate on those nodes simply as a matter of statistics (more churn means more mount/unmount attempts), or because of other causes such as more GC pauses happening on those nodes.
/triage accepted
^ I've opened this PR to fix the Mount/Unmount functions in the mount_linux.go file. In general, we might want to make a similar fix to other invocations of exec.Run() and exec.CombinedOutput() in our codebase, but that'll be a bigger change. Happy to open another issue for it.
Due to a race-condition in Golang, it's possible that a cmd.Start() may result in a "no child process" error even though the child process was created and in fact succeeded. See kubernetes/kubernetes#103753 for full context. This change causes the code to ignore "no child process" errors if the sub-command process succeeds. Signed-off-by: Cesar Talledo <ctalledo@nestybox.com>
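For illustration only, here is a minimal sketch (not the code from that commit, nor kubelet's) of how such an error can be recognized in Go. The exact error wrapping can vary by Go version, so a string match on the message seen in the logs is included as a fallback; the name isNoChildProcesses is made up for this sketch.

```go
// Sketch: recognize the ECHILD ("no child processes") failure returned by
// exec.Cmd.Run()/Wait(). Names here are illustrative, not from kubelet.
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
	"syscall"
)

func isNoChildProcesses(err error) bool {
	if err == nil {
		return false
	}
	// Recent Go versions wrap the underlying errno, so errors.Is can see ECHILD.
	if errors.Is(err, syscall.ECHILD) {
		return true
	}
	// Fallback: match the message observed in the kubelet logs in this issue.
	return strings.Contains(err.Error(), "no child processes")
}

func main() {
	_, err := exec.Command("true").CombinedOutput()
	// Normally prints false; true only if the race described in this issue hits.
	fmt.Println(isNoChildProcesses(err))
}
```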

While investigating some slowness in mounting/unmounting operations of PVs (in this case EFS volumes) for pods, we observed errors like the following:

```
Jun 25 06:57:02 ip-172-18-228-96.ec2.internal kubelet[4955]: E0625 06:57:02.532284 4955 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/efs.csi.aws.com^fs-81d53375:data/d8d2edcc58a94cd1baaa7ae7494ef6ab podName:709ca80b-6a3c-4885-88a3-4acbd087595a nodeName:}" failed. No retries permitted until 2021-06-25 06:57:03.032234742 +0000 UTC m=+138521.856402465 (durationBeforeRetry 500ms). Error: "UnmountVolume.TearDown failed for volume \"shared-files\" (UniqueName: \"kubernetes.io/csi/efs.csi.aws.com^fs-81d53375:data/d8d2edcc58a94cd1baaa7ae7494ef6ab\") pod \"709ca80b-6a3c-4885-88a3-4acbd087595a\" (UID: \"709ca80b-6a3c-4885-88a3-4acbd087595a\") : kubernetes.io/csi: mounter.TearDownAt failed: rpc error: code = Internal desc = Could not unmount \"/var/lib/kubelet/pods/709ca80b-6a3c-4885-88a3-4acbd087595a/volumes/kubernetes.io~csi/7490a2ae-ae45-4389-b66d-beef79b2723e-files/mount\": unmount failed: wait: no child processes\nUnmounting arguments: /var/lib/kubelet/pods/709ca80b-6a3c-4885-88a3-4acbd087595a/volumes/kubernetes.io~csi/7490a2ae-ae45-4389-b66d-beef79b2723e-files/mount\nOutput: "
```

Fortunately, it does recover quickly on the kubelet's next retry.
Looking through the kubelet code, I found that the issue is not in the umount calls themselves, but rather a race condition in this exec command where we invoke umount. Because of the way Go's exec.Run function (called inside exec.CombinedOutput) is implemented, it starts the child process and then waits for it to finish. Since there is no locking on the child process across those calls, it's possible for the child process to exit even before the Wait() call starts. In that case the wait call returns a "no child processes" error (ECHILD). This is more likely to happen when the gap between the Start() and Wait() calls is larger, which could occur, for example, if the goroutine loses its CPU cycles for some time between those calls. A similar issue also exists in the mount codepath.
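To make this concrete, here is a minimal sketch (not the actual mount_linux.go code) of the pattern in question: CombinedOutput() is effectively Start() followed by Wait(), and it is the Wait() step that surfaces the "wait: no child processes" error even though umount itself may have completed. The target path below is a made-up example.

```go
// Sketch of the exec pattern used for mount/unmount; not kubelet's code.
package main

import (
	"fmt"
	"os/exec"
)

func unmount(target string) error {
	// CombinedOutput() runs Start() and then Wait() on the command.
	// Per the race described above, if the child exits before Wait()
	// gets to run, Wait() can fail with ECHILD and the whole operation
	// is reported as failed even though umount already succeeded.
	out, err := exec.Command("umount", target).CombinedOutput()
	if err != nil {
		return fmt.Errorf("unmount failed: %v\nUnmounting arguments: %s\nOutput: %s",
			err, target, string(out))
	}
	return nil
}

func main() {
	// Expected to fail with "not mounted" on most systems; the point is
	// only to show where a "wait: no child processes" error would surface.
	fmt.Println(unmount("/tmp/example-mount"))
}
```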
While this is a self-recovering issue (thanks to kubelet retries), it does appear to add up to 1s to pod startup/cleanup latency.
Open to suggestions for a fix. The simplest one that occurs to me is to treat the "wait: no child processes" error as a special case for which we don't actually return an error from the mount/unmount calls; a rough sketch of this idea follows the triage commands below.

/sig storage
/sig scalability
/assign
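For illustration, a hedged sketch of that special-casing, assuming a hypothetical notMounted callback that confirms the unmount actually took effect (for example by consulting the mount table); this is not an actual kubelet change.

```go
// Sketch of the proposed workaround: swallow the "no child processes"
// error only when the unmount demonstrably succeeded. The notMounted
// callback is hypothetical and stands in for a real mount-table check.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func unmountToleratingNoChild(target string, notMounted func(string) bool) error {
	out, err := exec.Command("umount", target).CombinedOutput()
	if err != nil && strings.Contains(err.Error(), "no child processes") {
		// The child likely succeeded but could not be reaped by Wait().
		// Only ignore the error if the mount point is really gone.
		if notMounted(target) {
			return nil
		}
	}
	if err != nil {
		return fmt.Errorf("unmount failed: %v, output: %s", err, string(out))
	}
	return nil
}

func main() {
	// Toy usage with a stubbed check; real code would inspect /proc/mounts.
	stub := func(string) bool { return true }
	fmt.Println(unmountToleratingNoChild("/tmp/example-mount", stub))
}
```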