"Error adding network: failed to Statfs" during pod startup #72044
Comments
Note: This is structurally very similar to #57465. However, that issue has been fixed, and it occurred during pod teardown. The similarity (from my perspective) is that both issues contain the same core "network: failed to Statfs" error, just at opposite ends of the pod lifecycle.
/sig network
What CNI driver are you using? @dcbw because
Calico CNI 3.1.3
We're seeing this frequently with CronJob pods as well.
We're seeing this on OpenShift Origin/OKD with the ovs-multitenant plugin. We've only seen it with CronJob pods, as far as I can tell from our logs.
Just had that issue with a
@caseydavenport @dcbw If you aren't able to handle this issue, consider unassigning yourself and/or adding the
🤖 I am a bot run by vllry. 👩🔬
This subset of logs is interesting - it seems to be doing the following:
It feels to me that this is a race between k8s deciding to tear down a pod (and thus deleting the netns) and the CNI ADD call initiated right before; the CNI plugin, of course, expects the netns to exist. /kind bug
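For anyone digging into where the message originates: the error shape is consistent with a plugin that Statfs-es the netns path before entering it and finds the path already gone. Below is a minimal Go sketch of that kind of check; it is illustrative only, not the actual plugin source, and the netns path used is hypothetical.

```go
package main

import (
	"fmt"
	"syscall"
)

// checkNetNS approximates the validation a CNI plugin performs on the
// network-namespace path it is handed before trying to enter it.
func checkNetNS(nspath string) error {
	var stat syscall.Statfs_t
	if err := syscall.Statfs(nspath, &stat); err != nil {
		// If the sandbox has already been torn down, the path is gone and
		// Statfs fails, which would surface as "failed to Statfs ..." in logs.
		return fmt.Errorf("failed to Statfs %q: %v", nspath, err)
	}
	return nil
}

func main() {
	// Hypothetical netns path; real paths look like /proc/<pid>/ns/net or
	// /var/run/netns/<id>, depending on the container runtime.
	if err := checkNetNS("/proc/12345/ns/net"); err != nil {
		fmt.Println(err)
	}
}
```

Under that reading, the plugin is just the messenger: the netns was removed between the kubelet setting up the sandbox and the CNI ADD actually running.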
The fact that this seems to be happening exclusively with CronJob pods might be a clue - I wonder what is special about them. For those who are seeing this - does it resolve itself? Or are the same pods hitting this scenario consistently? Probably worth pulling in @kubernetes/sig-node-bugs as well. /sig node
I upgraded to CNI 1.4.1 and EKS/Kubernetes 1.13 and haven't seen this error since.
@DreadPirateShawn @blakebarnett @clcollins if you get a chance to try this on 1.13+, that would help tease out whether this has already been fixed in a newer version of Kubernetes.
Has anyone found a fix for this? We're experiencing it on our OpenShift 3.11 on-prem cluster with cron jobs.
Getting this on cronjobs w/ Kubernetes 1.12.10. It seems to happen randomly to some nodes, after they've been running for a few weeks, and we can fix it by either restarting the node or terminating it. I've attached kubelet output from a typical example cronjob that failed.
We've encountered this issue with CronJob on 1.15.7-gke.23
I've encountered this error with Argo Cron Workflows using k8s v1.17.4 and calico v3.13.1
/unassign
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
We are hitting this issue with OpenShift as well. Any potential fix? k8s: 1.11
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
What happened:
During pod creation, we sometimes see the following error-level kubelet logs:
What you expected to happen:
Clean pod startup, no error-level logs.
How to reproduce it (as minimally and precisely as possible):
We've got about 5 kube crons that run every 5 minutes, across 13 regions in production -- so 18720 related pods daily -- and we see over a thousand related errors during pod creation each day.
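For scale, the arithmetic behind that figure (assuming each cron run produces exactly one pod):

$$5 \text{ crons} \times \frac{24 \times 60}{5} \text{ runs/day} \times 13 \text{ regions} = 5 \times 288 \times 13 = 18{,}720 \text{ pods/day}$$

so more than a thousand errors per day works out to something on the order of 5% of pod startups logging this error (assuming one error per affected pod).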
Anything else we need to know?:
Relevant kubelet log context of a cron pod that manifested this error:
Controller manager for the pod:
Cron yaml:
Environment:
- Kubernetes version (kubectl version):
- Kernel (uname -a): 4.4.0-116-generic #140-Ubuntu

/kind bug