Kubelet CNI nsenter failure #42735

Closed
DreadPirateShawn opened this issue Mar 8, 2017 · 27 comments · Fixed by #46823
Labels: kind/cleanup, sig/network, sig/node

Comments

@DreadPirateShawn

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: Bare metal
  • OS (e.g. from /etc/os-release): Ubuntu precise (12.04.4 LTS)
  • Kernel (e.g. uname -a): 3.13.0-55-generic
  • Install tools: N/A
  • Others: N/A

What happened:

During an otherwise-normal rollingupdate for a replication controller, we see this error in the logs:

E0307 21:43:51.235958    3650 docker_manager.go:373] NetworkPlugin cni failed on the status hook for pod 'foo-r3rg5' - Unexpected command output nsenter: cannot open /proc/5875/ns/net: No such file or directory\n with error: exit status 1
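
For context on where this message comes from: the status hook reads the pod IP by entering the sandbox's network namespace through its PID with nsenter, so if the sandbox process has just exited, /proc/<pid>/ns/net no longer exists and nsenter fails with exactly this output. Below is a minimal Go sketch of that failure mode; it is an illustration only, not the kubelet's actual code, and the helper name and exact ip invocation are assumptions.

// Illustration of the failure mode only; not the kubelet's real code path.
package main

import (
	"fmt"
	"os/exec"
)

// podIPFromNetNS (hypothetical helper) runs a command inside the sandbox's
// network namespace, identified by the sandbox PID, to read its IP address.
func podIPFromNetNS(sandboxPID int, iface string) (string, error) {
	netnsPath := fmt.Sprintf("/proc/%d/ns/net", sandboxPID)
	out, err := exec.Command(
		"nsenter", "--net="+netnsPath, "-F", "--",
		"ip", "-o", "-4", "addr", "show", "dev", iface,
	).CombinedOutput()
	if err != nil {
		// If the sandbox process exited between listing the pod and this call,
		// the /proc/<pid>/ns/net path is gone and nsenter prints
		// "cannot open /proc/<pid>/ns/net: No such file or directory".
		return "", fmt.Errorf("Unexpected command output %s with error: %v", out, err)
	}
	return string(out), nil // a real caller would parse the address out of this
}

func main() {
	if _, err := podIPFromNetNS(5875, "eth0"); err != nil {
		fmt.Println(err) // mirrors the kubelet log line above once PID 5875 is gone
	}
}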

What you expected to happen:

Expected that a successful rollingupdate wouldn't generate error-level logs without an error-level problem -- trying to determine what the error-level problem is.

How to reproduce it (as minimally and precisely as possible):

During our prod upgrade of various services, this occurred during 3 out of 70 rollingupdates.

Anything else we need to know:

Perhaps this is the same general issue as #25281? But I couldn't find any references to the "Unexpected command output nsenter" variation seen above, thus filing separately.

@calebamiles added the kind/bug, sig/network, and sig/node labels on Mar 8, 2017
@calebamiles
Contributor

Could someone from @kubernetes/sig-node-bugs and @kubernetes/sig-network-bugs please help with triage? Thanks!

@calebamiles modified the milestone: v1.6 on Mar 9, 2017
@dchen1107
Member

dchen1107 commented Mar 10, 2017

This is not a 1.6 blocker. I have seen the nsenter errors above before; they come from a small race window between pod status and CNI execution. That error logging is a red herring.

Removing this from the 1.6 milestone.

@dchen1107 removed this from the v1.6 milestone on Mar 10, 2017
@dchen1107 added the kind/cleanup label and removed the kind/bug label on Mar 10, 2017
@aespinosa

This is also happening in v1.6.1 using the kubenet network plugin:

docker_sandbox.go:263] NetworkPlugin kubenet failed on the status hook for pod "kubernetes-dashboard-v1.4.0-3sf40_kube-system": Unexpected command output nsenter: cannot open : No such file or directory
 with error: exit status 1

@freehan
Contributor

freehan commented Apr 19, 2017

Are there other symptoms besides the log, like loss of the pod IP or something similar?

IIUC, pod deletion and the status check run in separate goroutines. If a pod sandbox has just been stopped and the status check happens right after that, the status check might fail with this log.
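
To make that race concrete, here is a minimal, self-contained Go sketch of the two goroutines described above, one tearing the sandbox down while the other runs the status check; the types and timings are invented for illustration and are not kubelet code.

// Minimal sketch of the race: a teardown goroutine and a status-check
// goroutine act on the same sandbox record. Invented types, not kubelet code.
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type sandbox struct {
	mu        sync.Mutex
	netnsPath string // emptied once the sandbox has been stopped
}

func (s *sandbox) stop() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.netnsPath = "" // /proc/<pid>/ns/net vanishes with the sandbox process
}

func (s *sandbox) status() error {
	s.mu.Lock()
	path := s.netnsPath
	s.mu.Unlock()
	if path == "" {
		// This is the window the log line lands in: the pod is being torn
		// down, so the missing netns is expected rather than a real failure.
		return errors.New("cannot open netns: No such file or directory")
	}
	return nil
}

func main() {
	sb := &sandbox{netnsPath: "/proc/5875/ns/net"}
	go sb.stop()                        // pod deletion
	time.Sleep(time.Millisecond)        // let the teardown win the race
	if err := sb.status(); err != nil { // concurrent status hook
		fmt.Println("status hook:", err) // benign noise during teardown
	}
}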

@aespinosa

@freehan I see the pods' IPs are still available. I can run kubectl port-forward on them with no problem.

The problem I'm encountering is that I get 503s when accessing HTTP services through the apiserver's proxy endpoints.

@freehan
Contributor

freehan commented Apr 19, 2017

Do you see this log line a lot, or only during rolling updates?
Do you encounter 503s during rolling updates, or regularly?

@aespinosa

I see this line a lot, and regularly; I'm not even using rolling updates. I just have plain ReplicaSets / ReplicationControllers. And since there's this kubenet failure, pods are being restarted by the RS/RC.

@freehan
Contributor

freehan commented Apr 20, 2017

@aespinosa Sounds like your problem is different from the issue here. We need to know more about your setup. Please provide the output of the following commands:
ps aux | grep kubelet
ifconfig
What is your OS distro?
Did the kubelet restart?

Just to confirm: the pods are able to start, but later they get recreated and the pod IP changes, right?

@aespinosa

aespinosa commented Apr 20, 2017

kubelet:

/usr/bin/kubelet --api-servers=https://ip-xxxxxxxxx.ec2.internal --address=0.0.0.0 --port=10250 --healthz-bind-address=0.0.0.0 --allow-privileged=true --enable-debugging-handlers --pod-manifest-path=/etc/kubernetes/manifests --hostname-override=ip-172-xxx-xxx-xxxec2.internal --cloud-provider=aws --network-plugin=kubenet -v=2 --cluster-domain=cluster.local --cluster-dns=10.254.10.10

Distro:

NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
kernel 4.4.0-59-generic

No, the kubelet did not restart.

After a while, the pods are running now and able to start.

ifconfig:

cbr0      Link encap:Ethernet  HWaddr 0a:58:0a:c8:00:01
          inet addr:10.200.0.1  Bcast:0.0.0.0  Mask:255.255.255.0
          inet6 addr: fe80::d0b8:39ff:fe0a:a74b/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:2260610 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2064273 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:767108175 (767.1 MB)  TX bytes:1591836104 (1.5 GB)

docker0   Link encap:Ethernet  HWaddr 02:42:04:62:d6:52
          inet addr:172.17.0.1  Bcast:0.0.0.0  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

eth0      Link encap:Ethernet  HWaddr 0a:0e:57:f9:68:82
          inet addr:172.xxx.xxx.xxx  Bcast:172.xxx.xxx.255  Mask:255.255.255.0
          inet6 addr: fe80::80e:57ff:fef9:6882/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:2639600 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1891940 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2056653665 (2.0 GB)  TX bytes:208307886 (208.3 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:182 errors:0 dropped:0 overruns:0 frame:0
          TX packets:182 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:13472 (13.4 KB)  TX bytes:13472 (13.4 KB)

veth070781a7 Link encap:Ethernet  HWaddr 1a:9c:b5:b1:38:91
          inet6 addr: fe80::189c:b5ff:feb1:3891/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:49234 errors:0 dropped:0 overruns:0 frame:0
          TX packets:49559 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:4679403 (4.6 MB)  TX bytes:4571062 (4.5 MB)

veth5037dfc2 Link encap:Ethernet  HWaddr 7e:03:f3:4f:ce:21
          inet6 addr: fe80::7c03:f3ff:fe4f:ce21/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:360376 errors:0 dropped:0 overruns:0 frame:0
          TX packets:430115 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:35722738 (35.7 MB)  TX bytes:132507465 (132.5 MB)

veth6ef9d016 Link encap:Ethernet  HWaddr a2:5d:6a:8f:31:26
          inet6 addr: fe80::a05d:6aff:fe8f:3126/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:179035 errors:0 dropped:0 overruns:0 frame:0
          TX packets:186410 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:13641399 (13.6 MB)  TX bytes:646488737 (646.4 MB)

vethe31141aa Link encap:Ethernet  HWaddr ae:33:1f:4d:f1:c9
          inet6 addr: fe80::ac33:1fff:fe4d:f1c9/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:269398 errors:0 dropped:0 overruns:0 frame:0
          TX packets:280169 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:22880380 (22.8 MB)  TX bytes:92100483 (92.1 MB)

vethea64c677 Link encap:Ethernet  HWaddr b6:a4:3f:8a:5c:45
          inet6 addr: fe80::b4a4:3fff:fe8a:5c45/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:367614 errors:0 dropped:0 overruns:0 frame:0
          TX packets:424894 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:367594882 (367.5 MB)  TX bytes:361527394 (361.5 MB)

vethfc53db08 Link encap:Ethernet  HWaddr 42:22:6d:e0:ca:74
          inet6 addr: fe80::4022:6dff:fee0:ca74/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:52746 errors:0 dropped:0 overruns:0 frame:0
          TX packets:52859 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:10541869 (10.5 MB)  TX bytes:4704792 (4.7 MB)

@freehan
Contributor

freehan commented Apr 20, 2017

After a while, the pods are running now and able to start.

Could you elaborate? What is the sequence of symptoms and operations?

@aespinosa

I tried deleting the pods and having them resurrected by the ReplicaSet. The pods were created and allocated pod IPs successfully.

However, I still have the problem of the proxy endpoints at the master not being served. I'm using the AWS cloud provider, and I don't see my route tables being updated either.

@freehan
Contributor

freehan commented Apr 21, 2017

Are you still able to reproduce the problem related to pod IP?

I am not familiar with the AWS cloud provider. cc @justinsb

@aespinosa

Thanks for the help, @freehan. I figured out my problem from the controller-manager's logs: it was saying it can't update route tables because of duplicate matching route tables when querying AWS.

@DreadPirateShawn
Author

@freehan So -- ignoring the sidebar issue @aespinosa had, what's the next step here? Is it to fix the small race window between pod status and cni execution?

(I'm not in a rush, just clarifying what to expect.)

@freehan
Contributor

freehan commented May 2, 2017

@DreadPirateShawn
Besides the error log, is there any problem that stands out?

We need the kubelet log to confirm it as a race between the kubelet pod status sync and CNI network plugin execution.

cc: @yujuhong

@yujuhong
Contributor

yujuhong commented May 2, 2017

I am a bit lost in all these discussions. What exactly is the problem (besides the error message in the log) that we are dealing with?

@vishh
Contributor

vishh commented May 12, 2017

I hit (possibly) the same issue. With a cluster built from HEAD, kube-system pods that are not in the host network namespace fail to come up:

$ k get po -n kube-system | grep -v Running
NAME                                              READY     STATUS                                                                                                                                                                                                                                                                  RESTARTS   AGE
fluentd-gcp-v2.0-0flxx                            0/2       rpc error: code = 2 desc = failed to start container "15e07e324daa0f68793c454d81173c42b9c2e86d41b24236206756f2b55c7633": Error response from daemon: cannot join network of a non running container: 9f2593412ac541240a74ce445344e777b7fdb86c8ebc13520283a5e02f679f2e   528        12h
fluentd-gcp-v2.0-1h18l                            0/2       rpc error: code = 2 desc = failed to start container "a918404ab141350861cf653ed3b0b54924d0c8017316c45a45ab7f754065d521": Error response from daemon: cannot join network of a non running container: 3ac6abfa227e42665dcbe51aae62b97b924b5a9b0d9261529de806d6d627be7f   662        12h
fluentd-gcp-v2.0-5cdrh                            0/2       rpc error: code = 2 desc = failed to start container "68a7b104f66dbd2ad5874a8f89618b5d32af9f5b01e4114a8b750f513adced14": Error response from daemon: cannot join network of a non running container: 968a72050d2d9333837d1cd4dcaa645f5e21efd74e9745082b409803ce8da63e   384        12h
fluentd-gcp-v2.0-q73tm                            0/2       rpc error: code = 2 desc = failed to start container "3910156c8c6e712129bcc60bd8d9ff1c13bdb403cba18b0171a1d3da5ae8f434": Error response from daemon: cannot join network of a non running container: bf4d20fe360d4c1cf25987875e3b45642cd9317d39c51749b15b7d3b4fcbdf33   72         12h
heapster-v1.3.0-2598322454-1trbt                  0/4       rpc error: code = 2 desc = failed to start container "7d831a61aaf12752f790b86a6e004f3e40814670f8aa781eb24532c379cdd53c": Error response from daemon: cannot join network of a non running container: a5122aa07a1f027e1e03733c442d082162e6bd30534e326b1550ac7843e3ce64   424        12h
kube-dns-2473071502-08smf                         0/3       rpc error: code = 2 desc = failed to start container "b443d0ccfb753b32d304bf4ae281b85d8e146f6e9592eb253438e5a12b247f3f": Error response from daemon: cannot join network of a non running container: a4cf42c8f65e913e5cc9edea70ef7721f592adda5510341f508944d71435167d   84         12h
kube-dns-autoscaler-2042644734-6vz7d              0/1       rpc error: code = 2 desc = failed to start container "e0279bdab408ae39ba53abecd111001b06e051fa511d6b77df9a07a727776f31": Error response from daemon: cannot join network of a non running container: 6cb2f4de5dfde918d50e47c85fe15a3a59ea684a8bff307f8839997409738e41   115        12h
kubernetes-dashboard-638781211-v0q5l              0/1       rpc error: code = 2 desc = failed to start container "eea90aeccef3b30feab78d7dda50f93952ae10544032445eb7406555b2345f0f": Error response from daemon: cannot join network of a non running container: 09ab2bd4b25161151f0c26d83af1fb4d968b25e58870c640969bc8d8ee48e802   38         12h
l7-default-backend-1599978672-st6tj               0/1       rpc error: code = 2 desc = failed to start container "274ac2867f40a5275cee5f3959b6e05166d27fab9944f33abe829576f15c7b66": Error response from daemon: cannot join network of a non running container: 24d2dcf060bb7075efeeebf20188f43391e7fdefe2560156b6ea91047954fb28   79         12h
monitoring-influxdb-grafana-v4-mbtgv              0/2       rpc error: code = 2 desc = failed to start container "da19ad4133c194671b85874afd327879bf32ee869ab095160d29990d0377d518": Error response from daemon: cannot join network of a non running container: f613d4805b5d9a681461b203f7140a684baada3229f87668e2ed98494648ab13   74         12h

The kubelet logs have the following warning messages, with the infra container being restarted in a loop:

May 12 16:26:05 e2e-test-vishnuk-minion-group-tjr0 kubelet[1272]: W0512 16:26:05.488409    1272 pod_container_deletor.go:77] Container "2b9990b748a79c5eb711bedb61e7eb0c3767d81692f1d70933216ced7465542a" not found in pod's containers
May 12 16:26:05 e2e-test-vishnuk-minion-group-tjr0 kubelet[1272]: W0512 16:26:05.494757    1272 docker_sandbox.go:285] NetworkPlugin kubenet failed on the status hook for pod "kube-dns-autoscaler-2042644734-6vz7d_kube-system": Cannot find the network namespace, skipping pod network status for container {"docker" "faf66266a7e06c4e1e9f8ff99fb3ac133cd91f488b22747fe1d49e92a27c7108"}
May 12 16:26:05 e2e-test-vishnuk-minion-group-tjr0 kubelet[1272]: W0512 16:26:05.496795    1272 docker_sandbox.go:285] NetworkPlugin kubenet failed on the status hook for pod "kube-dns-autoscaler-2042644734-6vz7d_kube-system": Cannot find the network namespace, skipping pod network status for container {"docker" "1ebeef7ec432f322c949363a937c9647674825fd9a120e5768ee5eacf0c23f75"}
May 12 16:26:05 e2e-test-vishnuk-minion-group-tjr0 kubelet[1272]: W0512 16:26:05.498753    1272 docker_sandbox.go:285] NetworkPlugin kubenet failed on the status hook for pod "kube-dns-autoscaler-2042644734-6vz7d_kube-system": Cannot find the network namespace, skipping pod network status for container {"docker" "a43ca215854a316e46b62c0737539923638262059112336f706da251b86b1149"}
May 12 16:26:05 e2e-test-vishnuk-minion-group-tjr0 kubelet[1272]: W0512 16:26:05.500791    1272 docker_sandbox.go:285] NetworkPlugin kubenet failed on the status hook for pod "kube-dns-autoscaler-2042644734-6vz7d_kube-system": Cannot find the network namespace, skipping pod network status for container {"docker" "d439fb1bf3d99c0438023e3548d6ab5ead3c37f6d4bb9436f0d5ddf3928bdc46"}
May 12 16:26:05 e2e-test-vishnuk-minion-group-tjr0 kubelet[1272]: W0512 16:26:05.502725    1272 docker_sandbox.go:285] NetworkPlugin kubenet failed on the status hook for pod "kube-dns-autoscaler-2042644734-6vz7d_kube-system": Cannot find the network namespace, skipping pod network status for container {"docker" "7ef2d537193d765e9b9dbfb57fab28328e9029f2450ff55aa44015958f705a6c"}
May 12 16:26:05 e2e-test-vishnuk-minion-group-tjr0 kubelet[1272]: W0512 16:26:05.504724    1272 docker_sandbox.go:285] NetworkPlugin kubenet failed on the status hook for pod "kube-dns-autoscaler-2042644734-6vz7d_kube-system": Cannot find the network namespace, skipping pod network status for container {"docker" "b2acb2fae37ebe3a876039fcaeff4236055ea055214a8b17af85e6b472b85e20"}
May 12 16:26:05 e2e-test-vishnuk-minion-group-tjr0 kubelet[1272]: W0512 16:26:05.506781    1272 docker_sandbox.go:285] NetworkPlugin kubenet failed on the status hook for pod "kube-dns-autoscaler-2042644734-6vz7d_kube-system": Cannot find the network namespace, skipping pod network status for container {"docker" "ee915d5a6e2207432f5c379f310be765f6b1f1b40b26a2035ee025c912da9f0c"}
May 12 16:26:05 e2e-test-vishnuk-minion-group-tjr0 kubelet[1272]: W0512 16:26:05.509020    1272 docker_sandbox.go:285] NetworkPlugin kubenet failed on the status hook for pod "kube-dns-autoscaler-2042644734-6vz7d_kube-system": Cannot find the network namespace, skipping pod network status for container {"docker" "33ef9686a51d190adf2883553593f9513dcc88f904ee7b51629018a800408b3e"}
May 12 16:26:05 e2e-test-vishnuk-minion-group-tjr0 kubelet[1272]: I0512 16:26:05.513086    1272 generic.go:343] PLEG: Write status for kube-dns-autoscaler-2042644734-6vz7d/kube-system: &container.PodStatus{ID:"117537e6-36de-11e7-9733-42010af00002", Name:"kube-dns-autoscaler-2042644734-6vz7d", Namespace:"kube-system", IP:"", ContainerStatuses:[]*container.ContainerStatus{(*container.ContainerStatus)(0xc4217e89a0)}, SandboxStatuses:[]*runtime.PodSandboxStatus{(*runtime.PodSandboxStatus)(0xc42254bb80), (*runtime.PodSandboxStatus)(0xc42273c050), (*runtime.PodSandboxStatus)(0xc42273c460), (*runtime.PodSandboxStatus)(0xc42273c910), (*runtime.PodSandboxStatus)(0xc42273cd20), (*runtime.PodSandboxStatus)(0xc42273d180), (*runtime.PodSandboxStatus)(0xc420b86f50), (*runtime.PodSandboxStatus)(0xc42166e320)}} (err: <nil>)
May 12 16:26:05 e2e-test-vishnuk-minion-group-tjr0 kubelet[1272]: W0512 16:26:05.516551    1272 pod_container_deletor.go:77] Container "faf66266a7e06c4e1e9f8ff99fb3ac133cd91f488b22747fe1d49e92a27c7108" not found in pod's containers

@a-chernykh

I just bumped the image versions to 2.1.5 in addons/networking.projectcalico.org/k8s-1.6.yaml manually and it started to work correctly. I see that Calico has already been upgraded to 2.1.5 in master, so we just need to wait for a new release.

@thockin added this to the v1.7 milestone on May 27, 2017
@dcbw
Member

dcbw commented Jun 1, 2017

#43879 may help this, though it has been reverted and needs some fixups.

@thockin
Member

thockin commented Jun 9, 2017

Update?

@hfrog

hfrog commented Jun 9, 2017

I've seen the same errors in my cluster after turning on RBAC. The reason is that the default calico.yaml (http://docs.projectcalico.org/v2.2/getting-started/kubernetes/installation/hosted/calico.yaml) lacks cluster roles and bindings. After applying https://github.com/projectcalico/calico/blob/master/master/getting-started/kubernetes/installation/rbac.yaml, everything is OK.
Hope this helps

@saschagrunert
Member

I am not sure this issue is related to RBAC, since it also occurs when not using Calico but CNI networking via flannel. I found (via dmesg) that the CNI network interface got some sort of disconnect, then reconnected, and after that the nsenter failures occurred.

@bowei
Member

bowei commented Jun 13, 2017

@dchen1107 move to next milestone

Reading through the issue, there seem to be a couple of different intersecting problems going on here; it looks like this needs some more investigation.

@matchstick
Contributor

This feels like an umbrella issue at this point; it keeps redefining itself.
@DreadPirateShawn I am moving this out of 1.7; if that is inappropriate, please have us re-add the 1.7 milestone.

@matchstick modified the milestones: next-candidate, v1.7 on Jun 13, 2017
@murhum1

murhum1 commented Jun 13, 2017

I'm also getting this every now and then, and when it starts happening, the only workaround seems to be tearing down the cluster and bringing it back up. At least in my case it seems to be related to resource limits. I'm trying to run a pod both with and without a resource limit - every time I specify a memory limit, the pod is stuck at ContainerCreating, and every time I comment them out it starts Running without a hitch.

Here are some logs from the kubelet trying to run the resource-limited pods. I'm running with a kubeadm install + flannel.

EDIT: kubeadm, kubectl version 1.6.4

Jun 13 14:43:52 my.server.com kubelet[25154]: with error: exit status 1
Jun 13 14:43:52 my.server.com kubelet[25154]: W0613 14:43:52.344639   25154 docker_sandbox.go:263] NetworkPlugin cni failed on the status hook for pod "scala-test-bqr59_grader": Unexpected command output nsenter: cannot open : No such file or directory
Jun 13 14:43:52 my.server.com kubelet[25154]: with error: exit status 1
Jun 13 14:43:52 my.server.com kubelet[25154]: W0613 14:43:52.365529   25154 docker_sandbox.go:263] NetworkPlugin cni failed on the status hook for pod "scala-test-bqr59_grader": Unexpected command output nsenter: cannot open : No such file or directory
Jun 13 14:43:52 my.server.com kubelet[25154]: with error: exit status 1
Jun 13 14:43:52 my.server.com kubelet[25154]: W0613 14:43:52.387687   25154 docker_sandbox.go:263] NetworkPlugin cni failed on the status hook for pod "scala-test-bqr59_grader": Unexpected command output nsenter: cannot open : No such file or directory
Jun 13 14:43:52 my.server.com kubelet[25154]: with error: exit status 1
Jun 13 14:43:52 my.server.com kubelet[25154]: W0613 14:43:52.914538   25154 container.go:352] Failed to create summary reader for "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6d5940c2_502d_11e7_9b32_005056940720.slice/docker-875a65dd3e3c4e1e6919ce34e4d125699f5e64db8cdd804626a6eac38802b640.scope": none of the resources are being tracked.
Jun 13 14:43:52 my.server.com kubelet[25154]: E0613 14:43:52.915062   25154 remote_runtime.go:86] RunPodSandbox from runtime service failed: rpc error: code = 2 desc = failed to start sandbox container for pod "scala-test-bqr59": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:286: decoding sync type from init pipe caused \\\\\\\"read parent: connection reset by peer\\\\\\\"\\\"\\n\""}
Jun 13 14:43:52 my.server.com kubelet[25154]: E0613 14:43:52.915133   25154 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "scala-test-bqr59_grader(6d5940c2-502d-11e7-9b32-005056940720)" failed: rpc error: code = 2 desc = failed to start sandbox container for pod "scala-test-bqr59": Error response from daemon: {"message":"invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:286: decoding sync type from init pipe caused \\\\\\\"read parent: connection reset by peer\\\\\\\"\\\"\\n\""}

@yujuhong
Contributor

FWIW, we investigated @vishh's cluster in #42735 (comment), and found that it was not relevant to this thread, but forgot to come back and update.

k8s-github-robot pushed a commit that referenced this issue Jun 13, 2017
Automatic merge from submit-queue (batch tested with PRs 46441, 43987, 46921, 46823, 47276)

kubelet/network: report but tolerate errors returned from GetNetNS() v2

Runtimes should never return "" and nil errors, since network plugin
drivers need to treat netns differently in different cases. So return
errors when we can't get the netns, and fix up the plugins to do the
right thing.

Namely, we don't need a NetNS on pod network teardown. We do need
a netns for pod Status checks and for network setup.

V2: don't return errors from getIP(), since they will block pod status :(  Just log them.  But even so, this still fixes the original problem by ensuring we don't log errors when the network isn't ready.

@freehan @yujuhong 

Fixes: #42735
Fixes: #44307
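
The shape of that change, as described in the commit message above, is roughly the following sketch: GetNetNS surfaces an error instead of returning ("", nil), teardown tolerates a missing netns, and getIP only logs so pod status is not blocked. This is a paraphrase for illustration, not the merged diff; the interface and helper names are approximations.

// Paraphrased sketch of the pattern described above; not the actual change.
package main

import (
	"fmt"
	"log"
)

type runtimeHost interface {
	// GetNetNS returns the netns path for a sandbox, or an error instead of
	// the old ("", nil) result when the namespace cannot be found.
	GetNetNS(sandboxID string) (string, error)
}

// Teardown does not need the netns, so a missing one is tolerated.
func teardownPodNetwork(host runtimeHost, sandboxID string) error {
	if _, err := host.GetNetNS(sandboxID); err != nil {
		log.Printf("ignoring missing netns for %s during teardown: %v", sandboxID, err)
	}
	// ... continue releasing the pod's IP, bridge port, etc.
	return nil
}

// Status needs the netns, but an error here would block pod status, so it
// is logged and an empty IP is returned instead.
func getIP(host runtimeHost, sandboxID string) string {
	netns, err := host.GetNetNS(sandboxID)
	if err != nil {
		log.Printf("cannot read pod IP for %s: %v", sandboxID, err)
		return ""
	}
	return readIPFromNetNS(netns)
}

func readIPFromNetNS(netns string) string { return "" } // stub for the sketch

// fakeHost simulates a runtime whose sandbox netns has already gone away.
type fakeHost struct{}

func (fakeHost) GetNetNS(sandboxID string) (string, error) {
	return "", fmt.Errorf("cannot find network namespace for sandbox %s", sandboxID)
}

func main() {
	h := fakeHost{}
	_ = teardownPodNetwork(h, "abc123") // tolerated: logged, teardown continues
	_ = getIP(h, "abc123")              // tolerated: logged, empty IP for status
}
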
@DreadPirateShawn
Author

@matchstick Yeah, the umbrella drift was unfortunate. In the original reply, @dchen1107 said the original ticket was due to "a small race window between pod status and cni execution."

Is there a ticket / action item for that particular issue, which I originally reported?

(Or am I misunderstanding, is the original issue inseparable from the other issues people added to the ticket?)
