Volume mount/unmount errors saying "wait: no child processes" #103753

Closed
shyamjvs opened this issue Jul 17, 2021 · 4 comments · Fixed by #103780
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/storage Categorizes an issue or PR as relevant to SIG Storage. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@shyamjvs
Member

shyamjvs commented Jul 17, 2021

While investigating some slowness in mount/unmount operations for PVs (in this case EFS volumes) used by pods, we observed errors like the following:

Jun 25 06:57:02 ip-172-18-228-96.ec2.internal kubelet[4955]: E0625 06:57:02.532284 4955 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/efs.csi.aws.com^fs-81d53375:data/d8d2edcc58a94cd1baaa7ae7494ef6ab podName:709ca80b-6a3c-4885-88a3-4acbd087595a nodeName:}" failed. No retries permitted until 2021-06-25 06:57:03.032234742 +0000 UTC m=+138521.856402465 (durationBeforeRetry 500ms). Error: "UnmountVolume.TearDown failed for volume \"shared-files\" (UniqueName: \"kubernetes.io/csi/efs.csi.aws.com^fs-81d53375:data/d8d2edcc58a94cd1baaa7ae7494ef6ab\") pod \"709ca80b-6a3c-4885-88a3-4acbd087595a\" (UID: \"709ca80b-6a3c-4885-88a3-4acbd087595a\") : kubernetes.io/csi: mounter.TearDownAt failed: rpc error: code = Internal desc = Could not unmount \"/var/lib/kubelet/pods/709ca80b-6a3c-4885-88a3-4acbd087595a/volumes/kubernetes.io~csi/7490a2ae-ae45-4389-b66d-beef79b2723e-files/mount\": unmount failed: wait: no child processes\nUnmounting arguments: /var/lib/kubelet/pods/709ca80b-6a3c-4885-88a3-4acbd087595a/volumes/kubernetes.io~csi/7490a2ae-ae45-4389-b66d-beef79b2723e-files/mount\nOutput: "

Fortunately, the operation recovers quickly on Kubelet's next retry:

Jun 25 06:57:03 ip-172-18-228-96.ec2.internal kubelet[4955]: I0625 06:57:03.109085 4955 reconciler.go:183] operationExecutor.UnmountVolume started for volume "shared-files" (UniqueName: "kubernetes.io/csi/efs.csi.aws.com^fs-81d53375:data/d8d2edcc58a94cd1baaa7ae7494ef6ab") pod "709ca80b-6a3c-4885-88a3-4acbd087595a" (UID: "709ca80b-6a3c-4885-88a3-4acbd087595a")
Jun 25 06:57:03 ip-172-18-228-96.ec2.internal kubelet[4955]: I0625 06:57:03.120211 4955 operation_generator.go:713] UnmountVolume.TearDown succeeded for volume "kubernetes.io/csi/efs.csi.aws.com^fs-81d53375:data/d8d2edcc58a94cd1baaa7ae7494ef6ab" (OuterVolumeSpecName: "shared-files") pod "709ca80b-6a3c-4885-88a3-4acbd087595a" (UID: "709ca80b-6a3c-4885-88a3-4acbd087595a"). InnerVolumeSpecName "7490a2ae-ae45-4389-b66d-beef79b2723e-files". PluginName "kubernetes.io/csi", VolumeGidValue ""

Looking through the Kubelet code, I found that the issue is not in the umount calls themselves, but rather a race condition inside this exec command where we invoke umount. Because of the way Go's exec.Cmd.Run (called inside exec.Cmd.CombinedOutput) is implemented, it starts the child process and then waits for it to finish. Since there isn't any locking on the child process across those calls, it's possible that the child process exits even before the Wait() call starts. In that case the wait call returns a "no child processes" error (ECHILD).

This error is more likely to happen when the gap between the Start() and Wait() calls is larger, which could happen, for example, if the goroutine had its CPU cycles taken away for some time between those calls. A similar issue also exists in the mount codepath.
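To make the sequence concrete, here's a minimal sketch (not kubelet code; the target path is made up) of the Start()/Wait() steps that CombinedOutput performs internally, annotated with where the ECHILD error would surface:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
	"syscall"
)

func main() {
	// Hypothetical target path; umount is the binary kubelet shells out to.
	cmd := exec.Command("umount", "/tmp/example-mount-point")
	var out bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &out

	if err := cmd.Start(); err != nil {
		fmt.Println("start failed:", err)
		return
	}
	// CombinedOutput calls Wait() at this point. If the child has already
	// been reaped by the time Wait() runs, Wait() fails with ECHILD
	// ("wait: no child processes") even though the umount itself may have
	// succeeded.
	if err := cmd.Wait(); err != nil {
		var errno syscall.Errno
		if errors.As(err, &errno) && errno == syscall.ECHILD {
			fmt.Println("hit the race: wait: no child processes")
			return
		}
		fmt.Printf("command failed: %v\nOutput: %s\n", err, out.String())
		return
	}
	fmt.Println("command succeeded, output:", out.String())
}
```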

While this is a self-recovering issue (thanks to Kubelet's retries), it does seem to increase pod startup/cleanup latency by up to 1s.

Open to suggestions for a fix. The simplest one that occurs to me is to treat the wait: no child processes error as a special case in which we don't actually return an error from the mount/unmount calls.
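For illustration, a sketch of what that special-casing could look like in the unmount path (simplified and hypothetical; the package name and exact behavior are placeholders, not necessarily the shape of an eventual fix):

```go
// Package name is hypothetical; this is not the kubelet's mount_linux.go.
package mountutil

import (
	"fmt"
	"os/exec"
	"strings"
)

// Unmount runs "umount <target>" and treats the ECHILD race as success.
func Unmount(target string) error {
	cmd := exec.Command("umount", target)
	output, err := cmd.CombinedOutput()
	if err != nil {
		if strings.Contains(err.Error(), "wait: no child processes") {
			// The race described above: the child exited (and was reaped)
			// before Wait() could observe it. Assume the umount succeeded;
			// a real failure would be caught by kubelet's retry/verification.
			return nil
		}
		return fmt.Errorf("unmount failed: %v\nUnmounting arguments: %s\nOutput: %s",
			err, target, string(output))
	}
	return nil
}
```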

/sig storage
/sig scalability
/assign

@shyamjvs shyamjvs added the kind/bug Categorizes issue or PR as related to a bug. label Jul 17, 2021
@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 17, 2021
@shyamjvs
Member Author

Another observation: when analyzing the distribution of such errors across different nodes, we noticed a skew. The top 10% of nodes were seeing more than 90% of these errors, and those are the nodes running hotter (w.r.t. pod churn) than the others:

[Screenshot from 2021-06-28: per-node distribution of these errors, showing the skew described above]

Given this, I'm wondering whether the error counts are higher on those nodes simply as a statistical consequence of the higher churn, or because of other causes like more GC pauses happening on those nodes.

@shyamjvs
Member Author

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 17, 2021
@shyamjvs
Member Author

^ I've opened this PR to fix the Mount/Unmount functions in the mount_linux.go file.

In general, we might want to make a similar fix to other invocations of exec.Run() and exec.CombinedOutput() within our codebase, but that'll be a bigger change. Happy to open another issue for it.
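For illustration, a hypothetical shared helper (package and function names are made up, not from the kubelet codebase) that such call sites could reuse:

```go
// Package and function names are hypothetical.
package executil

import (
	"os/exec"
	"strings"
)

const errNoChildProcesses = "wait: no child processes"

// CombinedOutputTolerateECHILD behaves like cmd.CombinedOutput() but treats
// the "wait: no child processes" race as success, returning whatever output
// was captured before the child exited.
func CombinedOutputTolerateECHILD(cmd *exec.Cmd) ([]byte, error) {
	out, err := cmd.CombinedOutput()
	if err != nil && strings.Contains(err.Error(), errNoChildProcesses) {
		// The child was reaped before Wait() could observe it, so its real
		// exit status is unknown here. Callers that need a hard guarantee
		// should verify the side effect (e.g. that the mount point is gone).
		return out, nil
	}
	return out, err
}
```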

@shyamjvs
Member Author

@liggitt Sorry for opening the PR against the wrong repo; here's the right one: #103780

ctalledo added a commit to nestybox/fuse that referenced this issue Nov 23, 2021
Due to a race-condition in Golang, it's possible that a cmd.Start() may result
in a "no child process" error even though the child process was created and in
fact succeeded. See kubernetes/kubernetes#103753 for
full context.

This change causes the code to ignore "no child process" errors if the
sub-command process succeeds.

Signed-off-by: Cesar Talledo <ctalledo@nestybox.com>