Kubelet CNI nsenter failure #42735
Could someone from @kubernetes/sig-node-bugs and @kubernetes/sig-network-bugs please help with triage? Thanks!
This is not a 1.6 blocker. I have seen those nsenter errors before; they come from a small race window between pod status and CNI execution. The error logging is a red herring. Removing this from the 1.6 milestone.
This is also happening in v1.6.1 using the kubenet network plugin:
Are there other symptoms besides the log, like loss of pod IP or some such? IIUC, pod deletion and status checks run in separate goroutines. If a pod sandbox has just been stopped and a status check happens right after that, the status check may fail with this log.
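To make that race concrete, here is a minimal, self-contained Go sketch (not actual kubelet code; the sandbox type and its methods are purely illustrative) of a status check losing the race against a concurrent teardown and producing a benign error:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// sandbox stands in for a pod sandbox whose teardown and status check
// run in separate goroutines, as described above.
type sandbox struct {
	mu      sync.Mutex
	stopped bool
}

// stop simulates the pod-deletion goroutine tearing the sandbox down.
func (s *sandbox) stop() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.stopped = true
}

// status simulates the status-check goroutine inspecting the sandbox;
// it fails if the sandbox was already stopped.
func (s *sandbox) status() error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.stopped {
		return errors.New("cannot get netns: sandbox already stopped")
	}
	return nil
}

func main() {
	s := &sandbox{}
	go s.stop() // deletion goroutine wins the race
	time.Sleep(time.Millisecond)
	if err := s.status(); err != nil {
		// This is the noisy but harmless error path the comment above
		// calls a red herring.
		fmt.Println("status check failed:", err)
	}
}
```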
@freehan I see the Pods' IPs still being available. I can run The problem I'm encountering is that I get 503s when accessing HTTP services through the apiserver's proxy endpoints.
Do you see this log line a lot, or only during rolling updates?
I see this line a lot, regularly. I'm not even using rolling updates; I just have plain ReplicaSets / ReplicationControllers. And since there's this kubenet failure, pods are being restarted by the RS/RC.
@aespinosa Sounds like your problem is different from the issue here. We need to know more about your setup; we need the output of the following command: Just to confirm, the pods are able to start, and later they get recreated? The Pod IP changed, right?
kubelet:
Distro:
No, the kubelet did not restart. After a while, the pods are now running and able to start. ifconfig:
Could you elaborate? What is the sequence of symptoms and operations?
I tried to delete the pods and have them be resurrected by the ReplicaSet. The pod was created and allocated a Pod IP successfully. However, I still have the problem that the proxy endpoints at the master cannot be served. I'm using the AWS cloud provider and I don't see my route tables being updated either.
Are you still able to reproduce the problem related to pod IP? I am not familiar with the AWS cloud provider. cc @justinsb
Thanks for the help @freehan. I figured out my problem from the controller-manager's logs. It was saying it can't update route tables because of duplicate matching route tables when querying AWS.
@freehan So -- ignoring the sidebar issue @aespinosa had, what's the next step here? Is it to fix the small race window between pod status and CNI execution? (I'm not in a rush, just clarifying what to expect.)
@DreadPirateShawn We need the kubelet log to confirm it as a race between the kubelet pod status sync and CNI network plugin execution. cc: @yujuhong
I am a bit lost in all these discussions. What exactly is the problem (besides the error message in the log) that we are dealing with?
I hit (possibly) the same issue. With a cluster built from HEAD, kube-system pods that are not in the host network namespace fail to come up.
The kubelet logs have the following warning messages, with the infra container being restarted in a loop:
Just bumped the image versions to 2.1.5 in addons/networking.projectcalico.org/k8s-1.6.yaml manually and it started to work correctly. I see that Calico has already been upgraded to 2.1.5 in master, so we just need to wait for a new release.
#43879 may help this, though it has been reverted and needs some fixups.
Update?
I've seen the same errors in my cluster after turning on RBAC. The reason is that the default calico.yaml (http://docs.projectcalico.org/v2.2/getting-started/kubernetes/installation/hosted/calico.yaml) lacks cluster roles and bindings. After applying https://github.com/projectcalico/calico/blob/master/master/getting-started/kubernetes/installation/rbac.yaml everything is OK.
I am not sure this issue is related to RBAC, since it also occurs when not using Calico but CNI networking via flannel. I found out that the
@dchen1107 move to next milestone. Reading through the issue, there seem to be a couple of different intersecting issues going on here; it seems like it needs some more investigation.
This feels like an umbrella issue at this point. It keeps redefining itself.
I'm also getting this every now and then, and when it starts happening, the only workaround seems to be tearing down the cluster and bringing it back up. At least in my case it seems to be related to resource limits. I'm trying to run a pod both with and without a resource limit; every time I specify a memory limit, the pod is stuck at
Here are some logs from the kubelet trying to run the resource-limited pods. I'm running with a kubeadm install + flannel. EDIT: kubeadm, kubectl version 1.6.4
FWIW, we investigated @vishh's cluster in #42735 (comment), and found that it was not relevant to this thread, but forgot to come back and update.
Automatic merge from submit-queue (batch tested with PRs 46441, 43987, 46921, 46823, 47276)

kubelet/network: report but tolerate errors returned from GetNetNS() v2

Runtimes should never return "" and nil errors, since network plugin drivers need to treat netns differently in different cases. So return errors when we can't get the netns, and fix up the plugins to do the right thing. Namely, we don't need a NetNS on pod network teardown. We do need a netns for pod Status checks and for network setup.

V2: don't return errors from getIP(), since they will block pod status :( Just log them. But even so, this still fixes the original problem by ensuring we don't log errors when the network isn't ready.

@freehan @yujuhong Fixes: #42735 Fixes: #44307
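As an illustration of the pattern the merged PR describes, here is a hedged, self-contained Go sketch; the names getNetNS, teardownPod, and getIP are illustrative stand-ins, not the real kubelet/network API. The idea is that the netns lookup reports an error instead of ("", nil), teardown tolerates a missing netns, and the IP lookup logs failures rather than returning them so pod status is not blocked:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

var errNoNetNS = errors.New("network namespace not found")

// getNetNS returns an error instead of ("", nil) when the netns is
// unavailable, so each caller can decide how much it cares.
func getNetNS(ready bool) (string, error) {
	if !ready {
		return "", errNoNetNS
	}
	return "/proc/1234/ns/net", nil
}

// teardownPod tolerates a missing netns: teardown does not need one.
func teardownPod() error {
	if _, err := getNetNS(false); err != nil {
		log.Printf("ignoring missing netns during teardown: %v", err)
	}
	return nil
}

// getIP logs lookup failures instead of returning them, so a transient
// netns problem does not block the pod status sync.
func getIP(ready bool) string {
	ns, err := getNetNS(ready)
	if err != nil {
		log.Printf("could not read pod IP: %v", err)
		return ""
	}
	return "10.0.0.2 (from " + ns + ")"
}

func main() {
	_ = teardownPod()
	fmt.Println("ip:", getIP(true))
}
```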
@matchstick Yeah, the umbrella drift was unfortunate. In the original reply by @dchen1107 he said the original ticket was due to "a small race window between pod status and CNI execution." Is there a ticket / action item for that particular issue, which I originally reported? (Or am I misunderstanding, and the original issue is inseparable from the other issues people added to the ticket?)
Kubernetes version (use kubectl version):
Environment:
uname -a: 3.13.0-55-generic
What happened:
During an otherwise-normal rollingupdate for a replication controller, we see this error in the logs:
What you expected to happen:
Expected that successful rollingupdate wouldn't generate error-level logs without an error-level problem -- trying to determine what the error-level problem is.
How to reproduce it (as minimally and precisely as possible):
During our prod upgrade of various services, this occurred during 3 out of 70 rollingupdates.
Anything else we need to know:
Perhaps this is the same general issue as #25281? But I couldn't find any references to the "Unexpected command output nsenter" variation seen above, thus filing separately.